Nutch & Lucene by Doug Cutting is about the Nutch open source search engine and the Lucene search/indexing framework.
Erik Hatcher gave me a few links on Lucene and document parsing:
[1] http://jakarta.apache.org/poi/
[2] http://www.pdfbox.org/
[3] http://www.textmining.org
[4] http://jakarta.apache.org/lucene/docs/lucene-sandbox/
[5] http://www.etymon.com/ is the main site, but you have to dig to get to
http://www.etymon.com/pub/software/pj/doc/ where the actual code
needed by the sandbox parser is found.
[6] http://www.brownsite.net/docsearch.htm
[7] http://www.searchblox.com
This is Argyn's blog. I comment on topics that interest me, such as software, math, finance, and music. I also write about local events in Northern Virginia, USA, and all things related to Kazakhstan.
Tuesday, February 17, 2004