
Re: [ossig] BM word list - release under what license?



Imran William Smith <iwsmith@mimos.my> writes:


> take several hundred reasonable quality BM online news articles, documents
> etc into open office, as a single document.  that way there will be a low
> level of spelling errors in the source documents, but there will be some.
> in case of query, we will record which urls we got these documents from.
> 
> add a new custom dictionary in open office.
> 
> add every BM word that is flagged as a spelling mistake to the custom
> dictionary, manually referring to a paper dictionary in case of query.
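The quoted plan keeps the source URLs "in case of query". Below is a minimal sketch of how such a query could be answered once the articles are saved locally. It assumes a hypothetical manifest file `sources.tsv` with one `URL<TAB>local_file` line per article; the names `where_from` and `MANIFEST` are illustrative, not part of any existing tool.

```sh
#!/bin/sh
# Sketch: answer "which article did this word come from?"
# Assumption (hypothetical format): sources.tsv holds URL<TAB>local_file per line.
MANIFEST=${MANIFEST:-sources.tsv}

# where_from WORD: print the URL of every saved article that contains WORD
where_from() {
    tab=$(printf '\t')
    while IFS=$tab read -r url file; do
        # -q quiet, -i ignore case, -w whole words only (GNU/BSD grep)
        if grep -qiw -- "$1" "$file"; then
            printf '%s\n' "$url"
        fi
    done < "$MANIFEST"
}
```

For example, `where_from beirta` would list the URLs of the articles containing that suspicious spelling, so it can be checked against the original page.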

OUCH!  Why not:

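	export LC_ALL=C                  # same collation for sort and comm
	for a in *.txt ; do              # <article list>, assumed saved as *.txt
	        cat "$a" >> all_articles ; done
	# one word per line, lowercased, blanks dropped, each word once
	tr -cs '[:alnum:]' '\n' < all_articles | tr '[:upper:]' '[:lower:]' | grep . | sort -u > uniq_words
	# comm needs both inputs sorted; -23 keeps lines found only in the first
	tr '[:upper:]' '[:lower:]' < some_english_word_list | sort -u | comm -23 uniq_words - > bm_words_only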

Done! Then sanity-check by hand: remove numbers, look up words,
spellcheck, etc.
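Since the source articles will contain some spelling errors, one way to make the hand check cheaper is to count how often each word occurs: a word seen only once in hundreds of articles is more likely to be a typo than one seen dozens of times. This is a sketch of that frequency triage, an added heuristic rather than part of the pipeline above; the function names `count_words` and `rare_words` and the threshold are hypothetical.

```sh
#!/bin/sh
# Sketch: rank words by corpus frequency so rare (likely misspelt) words
# can be reviewed first. Function names and threshold are hypothetical.
export LC_ALL=C

# count_words FILE...: print "COUNT WORD" per lowercased word, most frequent first
count_words() {
    cat "$@" |
        tr -cs '[:alnum:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        grep . |
        sort | uniq -c | sort -k1,1nr -k2,2
}

# rare_words MIN FILE...: words seen fewer than MIN times (check these first)
rare_words() {
    min=$1; shift
    count_words "$@" | awk -v min="$min" '$1 < min { print $2 }'
}
```

On a file containing `Berita hari ini.` and `Berita semalam, beirta hari ini.`, `rare_words 2` prints `beirta` and `semalam`: the typo surfaces, but so does a legitimate word, so the count only orders the manual review and does not replace it.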



-- 
% You are in a maze of twisty passages, all alike.
  Christopher DeMarco
  cdemarco@fastmail.fm
  +6013 389 5658

------------------------------------------------------------
To unsubscribe: send mail to ossig-request@mncc.com.my
with "unsubscribe ossig" in the body of the message