Harvest: A Distributed Search System

Home | Sites using Harvest | Download | Contributed Code | Todo List | Links
Contributors | User's Manual | FAQ | Installation | ChangeLog | NEWS

About

Harvest is a system that collects information and makes it searchable through a web interface. It can gather information from the Internet and from intranets over HTTP, FTP, and NNTP, as well as from local files on hard disks, CD-ROMs, and file servers. In addition to HTML, the currently supported formats include TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources, and many more. Stubs for PDF support are included in Harvest and use Xpdf or Acroread to process PDF files. Adding support for a new format is easy thanks to Harvest's modular design.
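The idea of per-format modules can be illustrated with a small sketch. This is not Harvest's code or configuration; the names `SUMMARIZERS`, `register`, and `summarize` are hypothetical and only show, under simplified assumptions, how a modular design lets a new format be added without touching the existing handlers.

```python
import re
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry (not part of Harvest): maps a file extension
# to a function that turns raw file contents into searchable text.
Summarizer = Callable[[bytes], str]
SUMMARIZERS: Dict[str, Summarizer] = {}


def register(extension: str) -> Callable[[Summarizer], Summarizer]:
    """Decorator that registers a handler for one file extension."""
    def decorator(func: Summarizer) -> Summarizer:
        SUMMARIZERS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def summarize_text(data: bytes) -> str:
    # Plain text needs no processing beyond decoding.
    return data.decode("utf-8", errors="replace")


@register(".html")
def summarize_html(data: bytes) -> str:
    # Crude tag stripping, good enough for an illustration only.
    text = re.sub(r"<[^>]+>", " ", data.decode("utf-8", errors="replace"))
    return " ".join(text.split())


def summarize(name: str, data: bytes) -> str:
    """Pick the handler by file extension and return the extracted text."""
    handler = SUMMARIZERS.get(Path(name).suffix.lower())
    if handler is None:
        raise ValueError(f"no handler registered for {name!r}")
    return handler(data)
```

In this sketch, supporting another format means writing one more function and decorating it with `register`; the dispatch code in `summarize` stays unchanged.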

Features

Harvest is a modular, distributed search system framework that ships with a working set of components to make it a complete search system. The default setup is a web search engine, but it can do much more and provides the following features:

Example Usage

Copyright

The core of Harvest is licensed under the GPL. The components distributed with Harvest are also under the GPL or a similar license.
