moyogo's little blog

Blogging about Open Source, fonts, language technology, maps and random stuff happening wherever I am.

2005-12-04

Building resources for Lingala

There are a few things that need to be done before you can do any kind of language technology with a language, even more so if it is a minority language.


Lingala has had my attention for a while, for many reasons, the first one being that I used to understand it while living in the Congo. Unfortunately, I was reluctant to learn it when I was in Kinshasa back in the 80's, and by the time we settled in Belgium I had forgotten most of it. Until a few months ago I had very little interest in the details of the language. Then I noticed there wasn't much material available in Lingala online. I started looking for reasons, actually read the introduction of the dictionary I had, and found out the orthography is much more complex than what people actually use. Within a few weeks I had made a keyboard layout to be able to type with that orthography.


Jump a few months forward, and there I am typing up the 400-page Lingala dictionary that experts have advised me to use. I've already built the complete list of the book's entries, and now I'm entering all the definitions and the few example sentences found here and there. I've also got sentences and stories from a learning book, and text from translations of legal texts, religious texts and the declarations of human rights.


“All this for what?” you might ask. Well, it's really simple. I want to be able to learn the language and to write it. To make that task easier I have to complicate some things. I'm currently working on a word list to use for a spellchecker. Since I already have the "correctly" spelled words from the Lingala dictionary, I can use that with aspell. Unfortunately that's only a tiny part. I have to inflect (conjugate) all the verbs and add those forms to the list. That is not the easy part. Lingala is a very tricky language when it comes to verbs. There are about 21 tenses with possible contractions, 6 persons, one of which can take different forms in "classical" Lingala, and 6 modes (passive, etc.). So conjugating even one verb in all its forms is tricky indeed. And this is only one part of it: I'll also have to adjust the spellchecker so it knows what kinds of spelling mistakes people typically make. Eventually, I'd like to see a good spell checker and a grammar checker for Lingala. If I can do this quickly I'd also like to look into some assisted, or even automated, translation, or even some speech stuff. But that's a whole different story. I would need much more data and I'm not sure where I can get that. Maybe we need to find better ways to bootstrap language and speech technology.
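To give an idea of what "inflect all the verbs and add those to the list" could look like, here is a minimal sketch in Python. It is not the actual script behind the word list. The function names `inflect` and `build_word_list` are made up for illustration, and the affix tables are a heavily simplified assumption: six subject prefixes and only two tense endings, with no tone marks, contractions or modes. A real table would have to come from the dictionary and a grammar.

```python
from itertools import product

# ASSUMPTION: illustrative, simplified affix tables (tone marks omitted).
# This is NOT a complete or verified description of Lingala verb morphology.
SUBJECT_PREFIXES = ["na", "o", "a", "to", "bo", "ba"]  # six persons
TENSE_SUFFIXES = ["i", "aki"]  # only two of the roughly 21 tenses


def inflect(root, prefixes=SUBJECT_PREFIXES, suffixes=TENSE_SUFFIXES):
    """Return every prefix + root + suffix combination, sorted."""
    return sorted({p + root + s for p, s in product(prefixes, suffixes)})


def build_word_list(roots):
    """Merge the inflected forms of all roots into one sorted word list."""
    words = set()
    for root in roots:
        words.update(inflect(root))
    return sorted(words)


if __name__ == "__main__":
    # "ling" (from kolinga) is used here only as an example root.
    for word in build_word_list(["ling"]):
        print(word)
```

The script prints one form per line, so the output can be appended to the list of dictionary headwords before handing everything to the spellchecker. With 21 tenses, 6 persons and 6 modes the real tables get large quickly, which is why generating the forms beats typing them.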

2 Comments:

At 8/1/06 7:10 AM , Blogger pat said...

Verbix -- Bantu. Conjugate verbs in 50+ languages

Alas, no Lingala. :-/

 
At 18/1/06 5:34 AM , Blogger Congogirl said...

Do you know of any good Lingala learning resources?

 
