The workshop will be divided into two parts: a tutorial where attendees will learn about Natural Language Processing (NLP) for a low-resource language, and a practice session where attendees will analyze different Moroccan Darija datasets and discuss their findings.

NLP is a field in high demand, where research progresses quickly. Whereas language technology for languages like English and French is highly developed, low-resource languages, like most African indigenous languages, have been marginalized. There are many opportunities to create new tools for languages with few resources. In this workshop, we take the example of Moroccan Darija, Morocco’s national vernacular. We will present the latest tools, such as transformer-based models, for processing Moroccan Darija, and we will practice different NLP tasks on different Moroccan Darija datasets.

Participants will first learn basic NLP tools for analyzing language. In the tutorial, we will go over NLP concepts including text pre-processing and tokenization, n-gram language modeling, n-gram frequency, and topic modeling. Then, we will introduce language models to perform different tasks such as text classification, dialect detection, and sentiment analysis. The tutorial consists of theoretical definitions and concrete examples in Python.
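As a rough illustration of the tokenization and n-gram notions listed above, here is a minimal sketch using only the Python standard library. It is not taken from the workshop materials: the function names (`tokenize`, `ngrams`, `bigram_probability`) are hypothetical, the regex tokenizer is a simplification, and the toy English sentence stands in for a real Moroccan Darija corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplified, regex-based)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word) from raw bigram counts."""
    bigram_counts = Counter(ngrams(tokens, 2))
    # Count every bigram that starts with prev_word.
    prev_count = sum(c for (first, _), c in bigram_counts.items() if first == prev_word)
    if prev_count == 0:
        return 0.0  # prev_word never appears as a bigram start: no estimate
    return bigram_counts[(prev_word, word)] / prev_count


# Toy English sentence standing in for a real Darija corpus (illustrative only).
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

unigram_freq = Counter(tokens)            # n-gram frequency with n = 1
bigram_freq = Counter(ngrams(tokens, 2))  # n-gram frequency with n = 2

print(unigram_freq.most_common(3))
print(bigram_freq.most_common(3))
print(bigram_probability(tokens, "the", "cat"))
```

A real pipeline for Darija would likely need more careful pre-processing (for example, handling both Arabic and Latin scripts and spelling variation) and smoothing for unseen n-grams, both of which this sketch leaves out.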

* Intermediate

  • Participants will gain NLP knowledge for a low-resource language
  • Participants will learn how to leverage modern pre-trained NLP models to solve multiple tasks such as text classification, sentiment analysis, and dialect detection
  • Participants will practice by analyzing text data and will present their results and findings in the second part of the workshop

The workshop is recommended for students who aspire to be data scientists, NLP and/or Machine Learning researchers and practitioners, and people interested in computational linguistics.

* Beginners can use this workshop to learn the basics of NLP for a low-resource language

December 22nd, 2021

16h GMT / 11h EST

Imane Khaouja

Ihsane Gryech

Khalil Mrini

Abderrahmane Issam

