Network blog

Algorithmic power and East African languages: questioning search engine Autocomplete (PART 2)

Expanding our (research) toolkit

By Peter Chonka, Stephanie Diepeveen, and Yidnekachew Haile (17 December 2020)

Part 1 of this pair of blogposts highlighted problematic Autocomplete ‘predictions’ from the Google search engine for sample keywords in Somali and Amharic. We explained Autocomplete as a tool and highlighted some of the many unknowns of its interaction with languages other than English, with a focus on East African languages in particular. Because Google Search Autocomplete is partly based on users’ aggregated past online behaviour (what has been searched for previously), the predictions that pop up potentially reflect existing social, political or cultural dynamics. However, we also pointed out possible ‘feedback loops’ whereby the increased visibility of certain predictions may influence users’ search behaviour, further reinforcing particular semantic links between search terms. Here in Part 2, we explore Google Autocomplete predictions using Kiswahili sample keywords and consider some associated online tools that could be used to make sense of what these predictions actually show us, or their wider cultural and linguistic influence. We look at Google Trends (as a proxy for the data that may be feeding into the Google Autocomplete algorithm) and Answerthepublic.com (a commercial tool that creates a snapshot of autocomplete suggestions in a country at a given moment in time).

As with the Amharic and Somali tests, we chose a selection of search terms with contemporary political and cultural significance, with a view to identifying patterns in predictions. We used names of Kenyan politicians on their own and with a verb or conjunction (both in English and Kiswahili), common politicised or potentially divisive terms (uislamu – Islam, ukristo – Christianity, ufisadi – corruption, virusi vya corona – coronavirus) and gendered nouns (msichana – girl, wasichana – girls). The full data from our test can be found here.
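This kind of manual prediction-gathering can also be scripted. The sketch below uses the public but undocumented `suggestqueries.google.com` endpoint that browsers query for Autocomplete suggestions; the endpoint, its parameters (`hl` for language, `gl` for country) and its response format are assumptions based on commonly observed behaviour rather than an official API, and may change without notice.

```python
import json
import urllib.parse
import urllib.request

# Unofficial endpoint used by browsers to fetch search suggestions.
SUGGEST_BASE = "https://suggestqueries.google.com/complete/search"

def suggest_url(query: str, lang: str = "sw", country: str = "KE") -> str:
    """Build a request URL for the (unofficial) suggest endpoint.

    'hl' is assumed to set the interface language (e.g. 'sw' for
    Kiswahili) and 'gl' the country from which results are framed.
    """
    params = urllib.parse.urlencode(
        {"client": "firefox", "q": query, "hl": lang, "gl": country}
    )
    return f"{SUGGEST_BASE}?{params}"

def fetch_suggestions(query: str, lang: str = "sw") -> list[str]:
    """Make a live request and return the list of suggestion strings.

    With client=firefox the response is JSON shaped like
    [query, [suggestion, suggestion, ...]] (observed, not guaranteed).
    """
    with urllib.request.urlopen(suggest_url(query, lang)) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload[1]
```

Calling `fetch_suggestions("ufisadi")` would return whatever Google currently suggests for that seed; because predictions vary over time and by location, results will not reproduce our test data exactly.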

For the Kenyan politicians, their names in Kiswahili are often spelled in the same way that they would be in English (this is not the case for Amharic or Somali). Therefore, predictions for a search on ‘Uhuru Kenyatta’ will likely be influenced by English language searches, both within Kenya and globally. Also, unlike Amharic or Somali, English is one of Kenya’s official languages and is commonly used in online spaces (like Google Search). This makes identifying patterns in the Google Autocomplete predictions for Kiswahili keywords more challenging, because there is potentially a much wider range of linguistic influence on the prediction algorithm.

This digital linguistic context prompted us to adjust our approach to the Kiswahili predictions: we focused instead on the potential significance of code-switching practices online, and on the value of other Search-related tools for identifying digital linguistic patterns visible in Autocomplete. Code-switching can be construed as bi- or multilingual speakers moving between languages within an act of communication or a conversation. Both Google and third-party developers have created analytical tools to explore patterns of engagement on Google Search. These are designed to provide individual and commercial insights into user behaviour. Are these tools useful for studying Google Search predictions in African languages? We explore this question by looking at how Kiswahili words feature in Google Trends and Answerthepublic. While neither of these tools is designed primarily for academic research, both are increasingly being explored by scholars for their potential to interrogate patterns of engagement on search engines like Google. This post considers if and how they might have analytical value in relation to African indigenous languages. These tools present potential opportunities to unpack the ‘black boxes’ of search autocomplete algorithms and to explore how users interact with Google’s human and algorithmic operations. Equally, they bring their own limited gaze, one that also involves algorithmic processing and represents their creators’ interests. To what extent can these tools be used to gain insight into the rights and agency of users inputting African indigenous languages on Google Search?

Socio-linguistic and algorithmic code-switching practices in Kiswahili

Kiswahili provides a clear example of the dynamism of language. It is a Bantu language, but prominently features Arabic and English loan words, derived from East Africa’s history of interaction with Indian Ocean cultural flows and European colonialism. In Kenya, multilingualism and code-switching are common, with Kenyans often speaking a vernacular language associated with an ethnic group, Kiswahili, and/or English. In our experiment, we were interested in how Kiswahili in Google Search Autocomplete reflects, influences or reconstructs code-switching practices. Firstly, the algorithm appears to operate more effectively (i.e. it gives more suggestions) when English versions of verbs, or words that are recognisable in English, are used, in comparison with Kiswahili. At times, Google Autocomplete also initiates code-switching itself, for example changing the Kiswahili ‘na’ (and) to the English ‘and’ in the predictions.

Google Trends

Google Trends is a tool that provides insights into how a keyword or topic is ‘trending’ in Google Search – on its own or in comparison with other words or topics. It also reveals how different words, by language, are recognised in Google Search, and provides different options for analysing them as terms (simply looking for matches to the query as written) or topics (where Google groups terms that share the same concept). It is aimed at providing granular information for wider use, and isn’t explicitly marketed at advertisers (as, for example, AdWords search terms reports are). It provides information on the relative frequency of searches over time in a country, as well as what other topics and queries are being searched for by users. This could provide insight into the information that informs what comes up in Autocomplete suggestions.
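It is worth noting that Trends does not report raw search counts: its documented behaviour is to scale each series so that the peak within the selected time window and region equals 100. The sketch below illustrates that kind of relative scaling on hypothetical weekly counts; it is an illustration of the concept, not Google's actual implementation.

```python
def trends_style_scale(counts: list[float]) -> list[int]:
    """Scale a series of raw counts so the maximum becomes 100,
    mimicking Google Trends' relative 'interest over time' values."""
    peak = max(counts)
    if peak == 0:
        return [0] * len(counts)
    return [round(c * 100 / peak) for c in counts]

# Hypothetical weekly search counts for a keyword:
weekly = [12, 30, 60, 45]
print(trends_style_scale(weekly))  # -> [20, 50, 100, 75]
```

One consequence of this normalisation matters for low-volume languages: a Kiswahili term searched far less often than its English counterpart may be rescaled to look comparable in isolation, while in a direct comparison it can flatten towards zero.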

As with Autocomplete, Google gives suggestions for what someone might be looking for when using Trends. It gives both English and Kiswahili suggestions for Kiswahili words. For example, ‘ufisadi’ is met with 5 suggestions: analysing it as a search term ‘ufisadi’ (uncategorised), or a search term ‘corruption’, and then also as a topic in three English forms: as ‘corruption’ – as a political ideology, ‘anti-corruption commission’, and ‘ethics and anti-corruption commission’.

The potential for analysis is much greater in English forms of the word. Without the categorisation, the relative frequency of searches for a word is lower, and thus likely to have a less profound impact on suggestions. Also, the information about related queries is minimal for the Kiswahili version of ‘ufisadi’.

Answer the Public

Answerthepublic is a consumer insight tool that is more explicitly aimed at helping corporate customers gain insights into their target audiences’ patterns of thought. It has a limited free version, but much of its functionality sits behind a subscription-based paywall. It does not provide an option to search for Autocomplete suggestions in Kiswahili. However, inputting and comparing how it responds to different Kiswahili search terms helps to reveal both how the tool works, as well as something about how people engage with the search query. Answerthepublic is openly commercial, and mainly operates in western languages. Nonetheless, its investment in gathering and visualising Autocomplete results presents a potential opportunity to probe how autocomplete functions.

First, Answerthepublic breaks down Autocomplete suggestions for a term in combination with common verbs and question words. Here, the tool struggles with the Kiswahili words, only ‘picking’ them out when they are combined with English terms. Second, Answerthepublic breaks down Autocomplete suggestions alphabetically, showing what words come up most often when the search query is followed by a letter. Here, the tool presents what seems to be the most common word, not filtered or analysed for meaning. A Kiswahili term is followed by Kiswahili autocomplete suggestions, many of which are similar to what came up in our own search.
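The alphabetical breakdown described above amounts to a simple grouping: for a seed query, each suggestion is bucketed by the first letter of the word that follows the seed. A minimal sketch of that grouping, using hypothetical Kiswahili suggestions for the seed ‘msichana’:

```python
from collections import defaultdict

def alphabetical_breakdown(seed: str, suggestions: list[str]) -> dict[str, list[str]]:
    """Group autocomplete suggestions by the first letter of the word
    following the seed query, as in Answerthepublic's A-Z view."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s in suggestions:
        rest = s.removeprefix(seed).strip()
        if rest:
            groups[rest[0].lower()].append(s)
    return dict(groups)

# Hypothetical suggestions for the seed 'msichana' (girl):
sugg = ["msichana mrembo", "msichana mdogo", "msichana wa kisasa"]
print(alphabetical_breakdown("msichana", sugg))
# -> {'m': ['msichana mrembo', 'msichana mdogo'], 'w': ['msichana wa kisasa']}
```

Because this view groups by surface letters rather than meaning, it needs no language model of Kiswahili – which may be why it passes Kiswahili results through where the verb- and question-word views filter them out.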

Autocomplete analysis on Answerthepublic (Kenya) versus our Google search

Reflecting on the tools

What value might these tools bring to the understanding of Autocomplete predictions in relation to African indigenous languages? While our initial results are mixed, they show potential for exploring the way commercial tools can be employed for research purposes. For example, these tools help to show how Google operates more effectively and comprehensively in some languages than others. Google Trends helps to reveal how Google categorises different words, and how this relates to its ability to track and link user searches. At the same time, it indicates where and how we might need to employ caution when utilising a tool in different languages. Creators’ interests and areas of investment influence what we can see. There is often a commercial interest in developing more comprehensive tools for analysing data, and this reinforces power differentials in what can be analysed, by whom and for what purpose. Nonetheless, even where tools were not designed with African indigenous languages in mind, we find some areas where they could still help to gain insights into Autocomplete predictions in these languages. On Answerthepublic, there were some areas of its analysis where it did not filter out Kiswahili results (e.g. when collated by the first letter of a word), and the overall results had a strong overlap with our own test results.

Reflecting on the platform: Google Search Autocomplete

Overall, what insights can we gain by looking at Autocomplete (and associated tools) from the perspective of East African indigenous languages and their online users? It’s clear across our two blogposts that our experiments with Autocomplete raise many more questions than they answer. Nonetheless, each of these questions potentially opens up new and important areas of research and scrutiny. For future research, we are interested in shedding light on the wider impacts of algorithmic interactions with different languages on digital publics of debate and information access. This involves examining whether the types of phenomena we describe here are consciously or implicitly internalised, and whether they have the power to shape what people think and how they act. On the one hand, there is no guarantee that people will select Autocomplete predictions over what they initially intended to search for. On the other hand, even when people don’t click on predictions, it does not mean that they have not been influenced in the way they think about a particular issue, or by the semantic associations that have been made visible. The line between prediction and suggestion becomes important here: when and to what extent do predictions blur into suggestions, and does this blurring alter the power dynamics between human users and algorithmic design/functioning?

With this in mind, what kind of agency should we attribute to the algorithm, and how must this be balanced with the agency of users, individually and in the aggregate? The ability of the algorithm to shape people’s information environments is not clear-cut. How it moderates what is shown to an individual is determined by wider and past behaviours, which are rendered as data. Particular features of different languages matter. Amharic is more difficult to type on the QWERTY keyboard, potentially limiting the frequency of its use online. In the Kiswahili search, we see Google automatically switching to English translations in the autocomplete predictions in some cases.

In taking forward these investigations, there is a risk of over-emphasising the centrality of particular platforms (e.g. Google Search) to political and social behaviours. Our study of Google Autocomplete indicates that it is being used in these languages, but the scale and importance of this use is difficult to ascertain from the Autocomplete predictions alone. Comparing Google Search Autocomplete predictions on other platforms (on Google’s YouTube or Facebook for instance) and other search engines such as Bing might offer useful comparisons.

It is important to highlight how much we don’t know about how these tools actually work (their ‘black box’ nature), but to assume that Google Search has a particularly unique and perverse influence on the exercise of choice more widely may be a step too far. Here, there is a need to combine studies of how these tools appear to operate with research into users’ actual technology use – the particular platforms they engage with; their information retrieval capabilities and practices; and their perceptions, expectations, and preferences for functions such as Autocomplete. Ultimately, starting to interrogate algorithmic interaction with African indigenous languages forces us away from wider implicit assumptions about the global centrality of English-language digital experience. These tools were not designed with languages such as Amharic, Kiswahili or Somali in mind. However, they are engaging with them – and in some quite problematic ways, as our small-scale testing has shown. Algorithmic power is an important theme for digital culture research worldwide, and we hope that our pilot study illustrates how the interaction of globally dominant tools with a myriad of different cultural and linguistic contexts can have unexpected, largely unmonitored, and yet potentially significant impacts.

Peter Chonka is a Lecturer in Global Digital Cultures at King’s College London. Stephanie Diepeveen is a Research Associate in the Department of Politics and International Studies at the University of Cambridge. Yidnekachew Haile is a doctoral researcher in digital technologies and development at Royal Holloway, University of London.
