The ability to communicate and be understood in one’s own language is fundamental to digital and societal inclusion. Natural language processing (NLP) techniques have enabled critical artificial intelligence applications that facilitate digital inclusion and improvements in numerous fields, including education, finance, healthcare, agriculture, communication, and disaster response.
Many advances in both fundamental and applied NLP have stemmed from openly licensed and publicly available datasets. However, such open, publicly available machine learning datasets are scarce to non-existent for many African languages, and this means the benefits of NLP are not accessible to speakers of these languages.
Where relevant datasets do exist, they are often based on religious, missionary, or judiciary texts, leading to outmoded language and bias. There is a need for openly accessible text, speech, and other datasets to facilitate breakthroughs based on NLP technologies for African languages.
African Natural Language Processing Grant Funding
Through its call for Expressions of Interest (EOIs), Lacuna Fund seeks qualified organizations to develop open and accessible training and evaluation datasets for machine learning applications for NLP in sub-Saharan Africa. It is especially interested in datasets that would create significant impact regardless of the number of speakers of the included language, and it recognizes the need for multilingual datasets.
EOIs may include, but are not limited to:
- Collecting and/or annotating new data;
- Annotating or releasing existing data;
- Augmenting existing datasets in all areas to decrease bias;
- Creating small, higher-quality benchmark data for NLP.
The datasets should enable better execution of core NLP tasks in African languages, as well as the assessment of system performance in African languages. Examples include:
- Speech corpora, including for applications that allow illiterate or otherwise underprivileged groups to access technology tools, information, and/or services.
- Labeled text corpora for use as training or benchmark evaluation data, including parallel corpora for machine translation or corpora to support other fundamental or downstream NLP tasks.
- Unlabeled text corpora for language models that support multiple avenues of research or application.
- Datasets related to code-switched text or speech that improve the performance of NLP tasks in such situations.
- Domain-specific creation or augmentation of text and speech datasets, such as digit datasets, place names, or specific word pairs or sentences, that enable applications with significant social impact.
- Multimodal and other innovative datasets, such as video or audio captioning or other image-text interactions.
The total Lacuna Fund pool available is approximately $900,000 USD, with proposed budgets in the range of $10k to $100k for small to medium-sized projects and up to $200k for large, complex projects.
Apply Now! Deadline is December 1, 2021