
Re: [ossig] BM word list - release under what license?



Ditesh Kumar wrote:
As part of MIMOS's assistance with translating OpenOffice
into BM, we are compiling a BM wordlist for use in the
spellchecker.

Hi Imran,

Just wondering - will the collection of the words be manual (eg typed in
by some person) or via some automated system?

As for the licensing - it depends on what sort of software you want
to encourage in Malaysia - proprietary, open source, or both? The
licensing should then reflect the sort of software you wish to see.
My opinion is to allow it to be used in both open and closed source
software, but to require that any additions to the dictionary also be
contributed back or made publicly available (perhaps to MIMOS?). In
other words - the wordlist cannot be kept secret, but the source of the
software that uses it can.

There's no way to make a totally automated system, since nobody except
a human can check that a word is correct without a word list (chicken and egg).
Obviously, we can't simply scan or type in an existing word list (dictionary,
proprietary spelling file etc.) or we would be guilty of copyright infringement.
Such a dictionary counts as a database, and databases come under copyright law
(I believe).

So you are proposing something like the GFDL, but with no restrictions on
the use of the list itself, only that modifications remain open, right?


OK here's how we will generate the word list:

Load several hundred reasonable-quality BM online news articles, documents
etc. into OpenOffice as a single document.  That way there will be a low
level of spelling errors in the source documents, but there will be some.
In case of query, we will record which URLs we got these documents from.

Add a new custom dictionary in OpenOffice.

Add every BM word that is flagged as a spelling mistake to the custom
dictionary, manually referring to a paper dictionary in case of query.


That way we generate our own word list without copying anybody
else's.  We are not 'manually typing them in', but we are still
manually approving them.
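
As a rough illustration only: the real process uses the OpenOffice
spellchecker and a custom dictionary, but the same idea (collect words from
the source documents, skip those already approved, keep track of which URL
each one came from) could be sketched in Python.  The function name, the
regular expression and the data layout are assumptions made for this example,
and a human still has to approve every candidate.

```python
import re
from collections import Counter

# Hypothetical sketch, not the OpenOffice workflow itself.
# Matches runs of letters, allowing hyphenated forms such as "kanak-kanak".
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def candidate_words(documents, approved):
    """Collect words that are not yet in the approved list.

    documents: dict mapping source URL -> article text
    approved:  set of lowercase words a human has already accepted
    Returns (counts, sources): how often each candidate occurs, and
    the set of URLs it was seen in, for use in case of query.
    """
    counts = Counter()
    sources = {}
    for url, text in documents.items():
        for word in WORD_RE.findall(text.lower()):
            if word not in approved:
                counts[word] += 1
                sources.setdefault(word, set()).add(url)
    return counts, sources
```

Sorting the candidates by count would put the most common words in front of
the reviewer first; rare candidates are more likely to be typos in the
source articles.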

After that, we'll have some kind of revision phase where we publish
the word list and if anybody has any doubts about any word, we remove
it.  Our aim is to get a reasonable word list, good enough that in most
normal documents very few correctly spelled words are flagged as possible
spelling errors.  Our aim is not to produce a huge, definitive BM
word list.  But once we have laid the groundwork, the community can,
if it wishes, submit supplemental word lists based on their own
spellchecking experience.
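
The revision phase could be handled in a similarly simple way.  The sketch
below is only an assumption about how one might do it: the function name and
arguments are invented for the example, and it treats the word list as a
plain set of lowercase words.

```python
def revise(word_list, doubted, supplemental=()):
    """Apply one round of review to the word list (hypothetical helper).

    word_list:    iterable of currently published words
    doubted:      words somebody has raised doubts about; these are removed
    supplemental: iterable of extra word lists submitted by the community
    Returns the revised list, sorted and without duplicates.
    """
    revised = set(word_list) - set(doubted)
    for extra in supplemental:
        # Normalise submissions: trim whitespace, lowercase, skip blanks.
        revised |= {w.strip().lower() for w in extra if w.strip()}
    return sorted(revised)
```

Note that a supplemental list is merged after the doubted words are removed,
so submitted lists should themselves be reviewed before being passed in.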


Imran


--
Imran William Smith
Project Manager, Open Source Development,
MIMOS Berhad, Malaysia

Asian Open Source Centre : http://www.asiaosc.org
MIMOS Open Source        : http://opensource.mimos.my


