Although 2000 of the world’s languages are African, they are hardly represented in technology. This is due to colonialism, which has impacted the promotion, preservation, and integration of African languages. As a result, their names, cultures, places, and history are not understood in the technological space. Masakhane wants to change that. The initiative aims to strengthen and advance Natural Language Processing research in African languages, for Africans and by Africans. This technological progress should be achieved in the spirit of human dignity, well-being, and equality. Through inclusive community building, open participatory research, and multidisciplinarity, Africans should be able to shape and own this progress. Masakhane has now been awarded the Ars Electronica Award for Digital Humanity by the Austrian Federal Ministry for European and International Affairs.
What are the challenges in processing and using African languages in the technology sector, and how is Masakhane trying to overcome them?
Chris Emezue: Challenges in processing African languages include the lack of tailored technologies and infrastructure, limited datasets, and funding constraints. Masakhane addresses these by building a community of researchers and facilitating data collection. It aims to overcome the challenges posed by linguistic diversity, data scarcity, and limited resources by providing guidance and support, fostering engagement, and enhancing corpus availability.
Are there specific projects or applications that have been advanced by Masakhane?
Chris Emezue: Key challenges in African language processing include tasks such as machine translation, text-to-speech, named entity recognition (NER), sentiment analysis, and part-of-speech (POS) tagging. Community projects such as MasakhaNER (https://arxiv.org/abs/2103.11811), MasakhaNEWS (https://arxiv.org/abs/2304.09972), and MasakhaPOS (https://arxiv.org/abs/2305.13989) have contributed to the development of datasets and benchmarks. Other initiatives focus on creating keyboards for easier language input and improving the quality of existing datasets. Developing speech recognition (https://arxiv.org/abs/2103.07762, https://arxiv.org/abs/2207.00688), machine translation (https://arxiv.org/abs/2204.04306, https://aclanthology.org/2022.naacl-main.223.pdf), African language datasets, and language resources and toolkits is also an essential objective. These efforts in turn support downstream NLP tasks like NER, POS tagging, and question answering.
What impact might Masakhane’s work have on education, economics, and cultural diversity in Africa?
Chris Emezue: Masakhane’s work has had a significant impact on education and cultural diversity in Africa. By creating datasets and models for African languages and making it easy for African researchers to contribute, it has lowered barriers to NLP research in Africa. The platform’s online presence, meetups, and workshops have given researchers opportunities to contribute to and collaborate on NLP research for African languages. Economically, Masakhane expands language technology markets in Africa and encourages local content creation. Culturally, Masakhane empowers indigenous communities by supporting African languages, fosters cultural diversity and exchange, and promotes inclusion by enhancing understanding and communication in one’s own language. Additionally, it acts as an incubator for research talent, positioning individuals to shape the future and influence developments in the field.
How can people interested in promoting African languages and technologies contribute to Masakhane?
Chris Emezue: To contribute to Masakhane’s mission of promoting African languages and technologies, individuals can take several steps. First, joining the online community (Slack and mailing list) as a Masakhane member is essential for capacity and community building. Second, individuals can support the work in various ways: collaborating on resource (datasets/tools) creation, on governance for existing models, or on project-specific funding, and engaging in language activism and policy work, with individuals working for governments and NGOs advocating on behalf of Masakhane. The key is to work inclusively and consider African languages as integral to the process, rather than treating them as separate entities.
What are Masakhane’s plans and goals for the future?
Chris Emezue: Masakhane’s future plans and goals include fostering collaboration on research specific to Africa, mentoring and training young/early career researchers in NLP, expanding language coverage, addressing underrepresented languages and dialectal variations, enhancing language technology applications, and advocating for the significance of African languages in policy discussions.
The project can be viewed during this year’s Ars Electronica Festival.
Masakhane (https://www.masakhane.io) (Multinational) is the open research, participatory, grassroots NLP initiative for Africans, by Africans, with the aim of putting African research in NLP on the map by holistically tackling the problems facing NLP. Founded in 2019, Masakhane has since attracted over 2000 researchers from over 30 African countries, published state-of-the-art research (including a 2021 Wikimedia award of the year) for over 38 African languages at various venues, and built a thriving community. Our goal is for Africans to shape and own these technological advances towards human dignity, well-being and equity, through inclusive community building, open participatory research and multidisciplinarity.