
Re: [ossig] BM word list - release under what license?



> Done! Then sanity-check by hand: remove numbers, look up words,
> spellcheck, etc.
>


spellcheck?  how?  that's the whole point.  we don't have a spellchecker
with an open word list, we are the ones creating the word list.  if we
use a proprietary spellchecker to do the spellchecking, then there could
be allegations that we have built our open word list using a proprietary
word list.

also i suspect your method would produce a large number of words
that need removing.  i think it's better for the word list we create to
have some omissions than for misspelled words to slip into it.

that's why we are building the word list with a 'manually include'
rather than a 'manually exclude' policy.
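to make the concern concrete, here is a minimal sketch of the exclude-style step (the `comm -23` in Christopher's pipeline) run on tiny made-up data. the sample words and the stand-in english list are assumptions for illustration only, not real corpus data. anything that is missing from the english list survives, so a typo or a stray number comes out labelled as a BM word and has to be found and removed by hand.

```shell
#!/bin/sh
# Minimal sketch of the 'manually exclude' step: keep every corpus word
# that is not in an English word list. All data below is made up.
export LC_ALL=C   # sort and comm must agree on collation order
tmp=$(mktemp -d)

# hypothetical corpus tokens: two real BM words, one English word,
# a typo for "bahasa" and a stray number
printf '%s\n' makan nasi the bahsa 2005 | sort -u > "$tmp/corpus_words"

# hypothetical stand-in for some_english_word_list
printf '%s\n' eat rice the | sort -u > "$tmp/english_words"

# comm -23: suppress lines unique to file 2 (-2) and lines common to
# both (-3), leaving only the lines unique to the corpus
comm -23 "$tmp/corpus_words" "$tmp/english_words" > "$tmp/bm_words_only"
cat "$tmp/bm_words_only"
```

this prints 2005, bahsa, makan and nasi, one per line: only "the" is filtered out, and the typo and the number land in the "BM" list alongside the real words.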

there's no need for comm, sort, uniq, etc. in this case.  as soon
as we accept a word as correct, all future occurrences of that word are
marked as correct in the document, so OpenOffice does the 'uniq' for
us.
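for comparison, a minimal sketch of the 'manually include' policy as a plain shell loop. the function name `review_words`, the candidates file (one word per line, in corpus order) and the output word list are all hypothetical; this is not what OpenOffice does internally, it only mimics the behaviour described above: a word goes into the list only when a human says yes, and once accepted it is never asked about again. a rejected word stays out and is simply asked about again if it recurs, much as an unaccepted word stays flagged in the document.

```shell
#!/bin/sh
# Hypothetical sketch of a 'manually include' review loop.
# review_words CANDIDATES ACCEPTED
#   CANDIDATES: one candidate word per line, in corpus order
#   ACCEPTED:   the word list being built (created if missing)
# Answers (y/n) are read from standard input, one per prompt;
# prompts go to standard error.
review_words() {
    touch "$2"
    # candidates are read on fd 3 so stdin stays free for the answers
    while IFS= read -r word <&3; do
        [ -n "$word" ] || continue
        # once a word is accepted it is never asked about again:
        # this is the 'uniq' step
        if grep -qxF -- "$word" "$2"; then
            continue
        fi
        printf 'accept "%s"? [y/N] ' "$word" >&2
        read -r answer || answer=n
        case $answer in
            y|Y) printf '%s\n' "$word" >> "$2" ;;
            # anything else is left out: an omission is cheaper
            # than a misspelled word slipping into the list
            *) ;;
        esac
    done 3< "$1"
}
```

interactively it would be run from a terminal, e.g. `review_words candidates.txt bm_word_list.txt`, answering y or n at each prompt.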

imran


Christopher DeMarco wrote:
Imran William Smith <iwsmith@mimos.my> writes:



take several hundred reasonable-quality BM online news articles, documents,
etc. and load them into OpenOffice as a single document.  that way the
source documents will have a low level of spelling errors, though there
will be some.  in case of query, we will record which urls we got these
documents from.

add a new custom dictionary in OpenOffice.

add every BM word that is flagged as a spelling mistake to the custom
dictionary, manually referring to a paper dictionary in case of query.

OUCH!  Why not:

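	for a in <article list> ; do
	        cat "$a" >> all_articles ; done
	# split into one lowercase word per line (letters only),
	# then deduplicate
	tr -cs '[:alpha:]' '\n' < all_articles | tr '[:upper:]' '[:lower:]' | sort -u > uniq_words
	# comm expects both inputs sorted in the same order
	sort -u some_english_word_list > english_sorted
	comm -23 uniq_words english_sorted > bm_words_only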

Done! Then sanity-check by hand: remove numbers, look up words,
spellcheck, etc.




--
Imran William Smith
Project Manager, Open Source Development,
MIMOS Berhad, Malaysia

Asian Open Source Centre : http://www.asiaosc.org
MIMOS Open Source        : http://opensource.mimos.my


