Thursday, March 19, 2009


Apache Tika 0.3 released

Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Chris Mattmann just announced that the release of version 0.3 is official.

Go grab yourself a copy from a nearby mirror. Tika is also available through the central Maven repository.

There is also an article about Tika and Solr Cell on the Lucid Imagination web site.


Wednesday, March 18, 2009


Hadoop the easy edition

Cloudera has put together a nice-looking configurator for Apache Hadoop (see video).

They also offer a yum repository for installing an RPM-packaged version of Hadoop that can be managed as a standard Linux service and comes with local documentation and man pages.

All of this is, of course, available under the commercially friendly Apache License.