Re: [ossig] BM word list - release under what license?
> Done! Then sanity-check by hand - remove numbers, look up words,
> spellcheck, etc.
>
spellcheck? how? that's the whole point: we don't have a spellchecker
with an open word list; we are the ones creating the word list. if we
use a proprietary spellchecker to do the spellchecking, there could be
allegations that we built our open word list from a proprietary
word list.
also, i suspect your method would produce a large number of words
that need removing. i think it's better for the word list we create to
have some omissions than to let misspelled words slip into it.
that's why we are building the word list with a 'manually include'
rather than a 'manually exclude' policy.
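The difference between the two policies can be sketched in shell, the language already used in this thread. This is a minimal illustration under assumed inputs, not part of the actual process: the files decisions.txt and candidates.txt, their one-word-per-line format, and the functions manual_include and manual_exclude are all hypothetical.

```shell
#!/bin/sh
# Hypothetical input format (not from the original thread):
#   decisions.txt  - one "word y" or "word n" line per word a reviewer checked
#   candidates.txt - every candidate word, one per line

# 'manually include': a word enters the list only if a reviewer
# explicitly accepted it; unreviewed words stay out.
manual_include() {
  awk '$2 == "y" { print $1 }' "$1" | sort -u
}

# 'manually exclude' (the rejected alternative): every candidate enters the
# list unless a reviewer explicitly rejected it, so unreviewed typos get in.
manual_exclude() {
  rejected=$(mktemp)
  awk '$2 == "n" { print $1 }' "$1" | sort -u > "$rejected"
  # comm -23 keeps lines only in the first (sorted) input
  sort -u "$2" | comm -23 - "$rejected"
  rm -f "$rejected"
}
```

With decisions.txt holding "rumah y" and "rumha n", and candidates.txt also containing the unreviewed words "kereta" and "kretea", manual_include prints only rumah, while manual_exclude prints kereta, kretea and rumah: the unreviewed misspelling "kretea" slips in. That is the trade-off: omissions rather than errors.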
there's no need to use comm, sort, uniq, etc. in this case. as soon
as we accept a word as correct, every occurrence of that word in the
document is marked as correct, so openoffice is doing the 'uniq' for
us.
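The point that sort and uniq are unnecessary comes down to how many decisions the reviewer has to make: one per distinct word, not one per occurrence. A rough sketch for estimating that, assuming plain-text copies of the articles and splitting on anything that is not a letter; review_load is a hypothetical helper, and it counts every distinct spelling as written rather than only the words the spellchecker would actually flag.

```shell
#!/bin/sh
# review_load FILE (hypothetical): prints "<occurrences> <distinct words>".
# The first number is what a reviewer would face if every occurrence were
# flagged; the second is the number of accept/reject decisions needed when
# accepting a word clears it everywhere in the document.
review_load() {
  # tr -cs turns every run of non-letters into a single newline
  total=$(tr -cs '[:alpha:]' '\n' < "$1" | grep -c .)
  distinct=$(tr -cs '[:alpha:]' '\n' < "$1" | grep . | sort -u | grep -c .)
  echo "$total $distinct"
}
```

For a file containing "saya suka makan. saya suka minum." it prints 6 4: six occurrences, but only four decisions.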
imran
Christopher DeMarco wrote:
Imran William Smith <iwsmith@mimos.my> writes:
take several hundred reasonable-quality BM online news articles, documents,
etc. into open office as a single document. that way there will be a low
level of spelling errors in the source documents, but there will be some.
in case of query, we will record which urls we got these documents from.
add a new custom dictionary in open office.
add every BM word that is flagged as a spelling mistake to the custom
dictionary, manually referring to a paper dictionary in case of query.
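The provenance step (recording which urls the documents came from) could be kept machine-readable while the single document is being assembled. A sketch under assumptions not in the original: the articles have already been saved as local text files, sources.tsv lists one url and file name per line separated by a tab, and build_corpus and source_of are hypothetical helpers that log the line on which each article starts, so a queried word can be traced back to its url.

```shell
#!/bin/sh
# Hypothetical helpers; the file layout is an assumption, not from the thread.
# sources.tsv: one "url<TAB>local_file" line per downloaded article.

# build_corpus LIST CORPUS LOG
# Appends every article to CORPUS (the single document to open in
# open office) and writes "start_line<TAB>url" for each article to LOG.
build_corpus() {
  tab=$(printf '\t')
  : > "$2"
  : > "$3"
  while IFS="$tab" read -r url file; do
    [ -n "$file" ] || continue
    # this article starts on the line after the current last line
    printf '%s\t%s\n' "$(( $(wc -l < "$2") + 1 ))" "$url" >> "$3"
    cat "$file" >> "$2"
    printf '\n' >> "$2"   # make sure articles do not run together
  done < "$1"
}

# source_of LINE LOG: print the url of the article containing corpus line LINE
source_of() {
  awk -F'\t' -v n="$1" '$1 <= n { url = $2 } END { print url }' "$2"
}
```

After build_corpus sources.tsv corpus.txt sources.log, running source_of 120 sources.log prints the url of the article that contains line 120 of corpus.txt.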
OUCH! Why not:
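for a in <article list> ; do
  cat "$a" >> all_articles ; done
# one word per line, sorted and de-duplicated (comm needs sorted input)
tr -cs '[:alpha:]' '\n' < all_articles | sort -u > uniq_words
sort -u some_english_word_list > english_sorted
# keep only the words that are NOT in the English list
comm -23 uniq_words english_sorted > bm_words_only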
Done! Then sanity-check by hand - remove numbers, look up words,
spellcheck, etc.
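Part of that sanity check can be automated before anyone looks words up. A minimal sketch, assuming candidate words arrive one per line (as in bm_words_only); sanity_filter is a hypothetical name, and the hyphen rule is an assumption meant to keep reduplicated forms such as "kanak-kanak" intact. Looking words up still has to be done by hand.

```shell
#!/bin/sh
# sanity_filter (hypothetical): reads candidate words on stdin, one per line.
# Lowercases everything, keeps only tokens made of letters optionally joined
# by single hyphens (so "kanak-kanak" survives), which drops numbers such as
# "2004", mixed tokens such as "mp3" and stray punctuation, then de-duplicates.
sanity_filter() {
  tr '[:upper:]' '[:lower:]' |
    grep -E '^[[:alpha:]]+(-[[:alpha:]]+)*$' |
    sort -u
}
```

Given "Kanak-kanak", "2004", "mp3", "rumah" and "Rumah" on separate lines, it prints kanak-kanak and rumah.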
--
Imran William Smith
Project Manager, Open Source Development,
MIMOS Berhad, Malaysia
Asian Open Source Centre : http://www.asiaosc.org
MIMOS Open Source : http://opensource.mimos.my