moyogo's little blog

Blogging about Open Source, fonts, language technology, maps and random stuff happening wherever I am.

2005-12-17

referendum in DRC

Ali has a short series about the referendum taking place this Sunday in the Congo DRC (first, second and third); very interesting read. If you don't know the context: the Congo DRC has been in a process of democratization since the early nineties, sidetracked by a couple of civil wars which ended in a transition government made up of the different components. The transition started in 2003 and is still nothing close to democracy (nobody has been elected).

Radio Okapi had a great series of shows over the last few weeks, talking with people for and people against the proposed constitution that would lead to democratic elections in June 2006.

I personally do not like some parts of the constitution. First of all the (stupid) ‘exclusive’ citizenship. Then the possibility for the government to give up sovereignty over some areas for the good of the ‘African Union’ (wazat mean?). And most of all, one of the last articles, which says the (non-democratic) institutions of the transition will hold until the new ones are in place. This is utterly silly. They will take as much time as they can. Since there is no limit set, I can't imagine how many years they will take. The only mention of a time limit is 36 months after the new institutions have been set up; and that's only about the 10 provinces becoming 26, thereby allowing provinces to share power with the central government.

Now if politicians turn out to be of good will, *cough*, then voting yes is probably the only, ... er... best option.

2005-12-04

Building resources for lingala

A few things need to be in place before you can do any kind of language technology with a language, even more so if it is a minority language.


Lingala has attracted my attention for a while, for many reasons, the first one being that I used to understand it while living in the Congo. Unfortunately, by the time we settled in Belgium I had forgotten most of it, and I was reluctant to learn it when in Kinshasa back in the 80's. Until a few months ago I had very little interest in the details of the language; then I noticed there wasn't much material available in Lingala online. I started looking for reasons, actually read the introduction of the dictionary I had, and found out the orthography is much more complex than what people actually use. In only a few weeks I made a keyboard layout to be able to type with that orthography.


Jump a few months forward, and here I am typing up the 400-page Lingala dictionary experts have advised me to use. I've already built the complete list of entries of the book, and now I'm getting all the definitions and the few example sentences scattered here and there. I've also got sentences and stories from a learning book, text from translations of legal texts, religious texts and the declarations of human rights.


“All this for what?” you might ask. Well, it's really simple. I want to be able to learn the language and to write it. To make that task easier I have to complicate some things. I'm currently working on a word list to use for a spellchecker. Since I already have the "correctly" spelled words from the Lingala dictionary, I can use that with aspell. Unfortunately that's only a tiny part. I have to inflect (conjugate) all the verbs and add those to the list. That is not the easy part. Lingala is a very tricky language when it comes to verbs. There are about 21 tenses with possible contractions, 6 persons with one that can take different forms in "classical" Lingala, and 6 modes (passive, etc.). So conjugating one verb in all its forms is kind of tricky indeed. This is only one part of it: I'll also have to adjust the spellchecker so it knows what kinds of spelling mistakes people typically make. Eventually, I'd like to see a good spell checker and a grammar checker for Lingala. If I can do this quickly I'd also like to look into some assisted, or even automated, translation, or even some speech stuff. But that's a whole different story; I would need much more data and I'm not sure where I can get that. Maybe we need to find better ways to bootstrap language and speech technology.
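
To give an idea of what "inflecting all the verbs" means for the word list, here is a rough Python sketch. It is only an illustration under big assumptions: the prefix and suffix tables are tiny, hypothetical stand-ins (six subject prefixes, three endings instead of the roughly 21 tenses), they ignore tones, contractions and the six modes, and the function names `inflect` and `build_word_list` are made up for this example. It is not a verified description of Lingala grammar.

```python
from itertools import product

# Hypothetical, deliberately tiny affix tables, for illustration only.
# A real table would need tones, contractions and many more endings.
SUBJECT_PREFIXES = ["na", "o", "a", "to", "bo", "ba"]  # one per person
TENSE_SUFFIXES = ["i", "aki", "aka"]                   # stand-ins for ~21 tenses


def inflect(root, prefixes=SUBJECT_PREFIXES, suffixes=TENSE_SUFFIXES):
    """Return every prefix + root + suffix combination for one verb root."""
    return [p + root + s for p, s in product(prefixes, suffixes)]


def build_word_list(dictionary_words, verb_roots):
    """Merge dictionary headwords with generated verb forms.

    The result is de-duplicated and sorted, one entry per word.
    """
    words = set(dictionary_words)
    for root in verb_roots:
        words.update(inflect(root))
    return sorted(words)


if __name__ == "__main__":
    # Sample headwords and roots, chosen only to exercise the code.
    for word in build_word_list(["mokolo", "ndako"], ["sal", "lob"]):
        print(word)
```

Even with these toy tables each root yields 18 forms, so the real thing (21 tenses, 6 persons, 6 modes) grows quickly. The printed list, one word per line, is the kind of plain word list a spellchecker such as aspell can be built from.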