Re: [ossig] BM word list - release under what license?
Imran William Smith <iwsmith@mimos.my> writes:
> take several hundred reasonable quality BM online news articles, documents
> etc into open office, as a single document. that way there will be a low
> level of spelling errors in the source documents, but there will be some.
> in case of query, we will record which urls we got these documents from.
>
> add a new custom dictionary in open office.
>
> add every BM word that is flagged as a spelling mistake to the custom
> dictionary, manually referring to a paper dictionary in case of query.
OUCH! Why not:
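for a in <article list> ; do    # <article list>: the plain-text article files
cat "$a" >> all_articles ; done
# one lowercase word per line, sorted and deduplicated
tr -cs '[:alpha:]' '\n' < all_articles | tr '[:upper:]' '[:lower:]' | sort -u > uniq_articles
# comm needs both inputs sorted; -23 keeps lines found only in uniq_articles
comm -23 uniq_articles some_english_word_list > bm_words_only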
Done! Then sanity-check by hand: remove numbers, look up words,
spellcheck, etc.
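To make the hand check less painful, something like the following might help. It is only a sketch: bm_candidates is a made-up helper name, and it assumes the articles are plain text and the English list has one word per line. It prints each non-English word with its count, rarest first, on the guess that words seen only once are the likeliest typos.

```shell
# Hypothetical helper (not an existing tool): list words from the article
# files that are absent from an English word list, with their counts,
# rarest first.
# Usage: bm_candidates english_list article_file...
bm_candidates() {
    english_list=$1
    shift
    cat "$@" |
        tr '[:upper:]' '[:lower:]' |      # fold case
        tr -cs '[:alpha:]' '\n' |         # one word per line; drops digits and punctuation
        grep . |                          # discard empty lines
        LC_ALL=C sort |
        uniq -c |                         # prefix each word with its count
        awk -v eng="$english_list" '
            # load the English list; it does not need to be sorted
            BEGIN { while ((getline w < eng) > 0) skip[tolower(w)] = 1 }
            !($2 in skip) { print $1, $2 }
        ' |
        LC_ALL=C sort -k1,1n -k2,2        # by count ascending, then alphabetically
}
```

For example, bm_candidates some_english_word_list all_articles > bm_words_by_count, then start reviewing from the top of the file.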
--
% You are in a maze of twisty passages, all alike.
Christopher DeMarco
cdemarco@fastmail.fm
+6013 389 5658