While working in a co-working space in the Rosebank neighborhood of Johannesburg, Jade Abbott opened a page on her laptop and asked ChatGPT to count from 1 to 10 in isiZulu, a language spoken by more than 10 million people in her native South Africa. The results, according to Abbott, a computer scientist and researcher, were “mixed and hilarious.”
She then typed a few phrases in isiZulu and asked the chatbot to translate them into English, but the results were far from satisfactory. Abbott said this shows that, despite efforts to incorporate more languages into AI models, the technology is still not fully inclusive of all of them, especially where training resources are limited.
Abbott’s experience mirrors the struggles of Africans who do not speak English. Language models such as ChatGPT often perform poorly for languages with fewer speakers, including many African languages. To address this gap, Abbott and Pelonomi Moiloa, a biomedical engineer, are collaborating on a new project called Lelapa AI, which aims to use machine learning to develop tools designed specifically for African users.
One of Lelapa’s recent innovations is Vulavula, an AI tool designed to transcribe speech into text, identify people and places mentioned in written text, and improve online searches and document summaries. So far, the team has integrated four languages spoken in South Africa (English, Sesotho, Afrikaans, and isiZulu) and is working to add languages from other regions of Africa.
Vulavula can be used on its own or alongside existing AI platforms such as ChatGPT and online chatbots, with the goal of making African languages more accessible on those platforms. Moiloa, the CEO and co-founder of Lelapa AI, stressed that AI built for African needs is essential to empowering people in the region and unlocking the technology’s potential benefits. “We are striving to address critical issues and empower our communities,” she said.
Africa is home to an estimated one-third of the world’s languages, which underscores the importance of linguistic inclusivity as the technology advances. English remains the dominant language on the internet and in AI systems, but efforts are under way to broaden language support. For instance, OpenAI’s GPT-4 has included additional languages such as Icelandic, and in February 2020 Google Translate added five new languages spoken by approximately 75 million people.
However, African AI researchers point to the shortcomings of current language models, citing cases where African languages are misidentified or poorly represented. At a recent AI event in Rwanda, African scholar Asmelash Teka Hadgu ran experiments with ChatGPT similar to Abbott’s and found that it struggled to interpret Tigrinya, his mother tongue, accurately. Companies such as Lelapa AI and Lesan are developing speech recognition resources for African languages to help close the gap in linguistic representation in AI.
Despite obstacles such as scarce funding and limited access to investors, these companies play a crucial role in advancing AI built for African languages and communities. By working with linguists and local communities to gather and refine data, companies like Lelapa AI are striving to create culturally relevant models that capture the linguistic nuances of African languages.
Vukosi Marivate, a data scientist at Lelapa AI, envisions a future in which AI systems in Africa are not only inclusive but also representative of the continent’s diverse linguistic landscape. Masakhane, an initiative co-founded by Marivate and Abbott in 2019, aims to drive natural language processing (NLP) research in African languages and engages thousands of volunteers, programmers, and researchers in developing Africa-centric NLP models.
Moiloa emphasizes that Africans should take the lead in developing AI that serves their linguistic needs. “We’re the stewards of our languages,” she said. By prioritizing tools like Vulavula that are created by and for African users, these initiatives aim to empower African communities and ensure that their languages are accurately represented in the digital realm.