
1. Harvest

1.1 What is Harvest?

Harvest is a system for collecting information and making it searchable through a web interface. Harvest can collect information from the Internet and from intranets using HTTP, FTP, and NNTP, as well as from local files such as data on hard disks, CD-ROMs, and file servers. In addition to HTML, the currently supported formats include TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources, and many more. Stubs for PDF support are included in Harvest; they use Xpdf or Acroread to process PDF files. Adding support for a new format is easy thanks to Harvest's modular design.

1.2 Where can I get more information about Harvest?

See the Harvest homepage at http://harvest.sourceforge.net/ for more information about Harvest.

1.3 Where can I download Harvest?

Harvest is available from the Harvest download page at http://prdownloads.sourceforge.net/harvest/.

1.4 Is there any information about Harvest in Russian?

Andrei Malashevich has translated the Harvest User's Manual into Russian. It is available on his Harvest User's Manual page at http://baby.chg.ru/manual_harvest/.

1.5 What is Harvest-ng?

Harvest-ng is a reimplementation of Harvest's gatherer by Simon Wilkinson. More information is available on the Harvest-ng homepage at http://webharvest.sourceforge.net/ng/.

1.6 What is the copyright status of Harvest?

The core of Harvest, located in the src directory, is licensed under the GPL. Additional components, located in the components directory, are under the GPL or similar licenses.

1.7 Which operating system do I need to run Harvest?

Harvest should run on any Unix-like platform, including FreeBSD, Linux, and Solaris.

1.8 Does Harvest run under Windows NT/2000/XP?

Michael Schlenker has ported Harvest to Windows platforms using Cygwin http://sources.redhat.com/cygwin/.

1.9 What hardware do I need to use Harvest?

A 120 MHz Pentium with 64 MB of RAM should achieve reasonable performance for around 350 MB of full-text data in about 20,000 objects. A 650 MHz Pentium with 256 MB of RAM should be able to handle around 1.5 GB of full-text data in about 100,000 objects.
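
To compare a collection against these two reference points, it can help to reduce them to an average object size. The sketch below does only that arithmetic on the figures quoted above; it assumes decimal units (1 MB = 10**6 bytes), and the function name is illustrative rather than part of Harvest.

```python
# Figures quoted in the answer above: (total full-text bytes, object count).
# Decimal units (1 MB = 10**6 bytes) are an assumption of this sketch.
CONFIGS = {
    "Pentium 120 MHz, 64 MB RAM": (350 * 10**6, 20_000),
    "Pentium 650 MHz, 256 MB RAM": (1500 * 10**6, 100_000),
}


def average_object_size_kb(total_bytes, object_count):
    """Return the mean object size in kilobytes (1 kB = 1000 bytes)."""
    return total_bytes / object_count / 1000


for name, (total_bytes, count) in CONFIGS.items():
    size = average_object_size_kb(total_bytes, count)
    print(f"{name}: about {size:.1f} kB per object")
```

Both reference points work out to roughly 15 to 18 kB per object, so a collection with much larger objects may behave differently from what the figures suggest.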

1.10 Which version of Harvest should I use?

1.11 What are "harvest-modified-by-RL-Stajsic", "harvest-MathNet", and "harvest-1.5.20-kj"?

After the original authors stopped working on Harvest, there were periods when Harvest was unmaintained. During this time, the following forked versions of Harvest appeared:

All these forked trees were merged into Harvest 1.6.

1.12 What are the limits of Harvest?

1.13 Do I need root access to install and run Harvest?

For the initial setup, you must be able to modify the web server configuration and schedule cron jobs. After the initial setup, running Harvest as a different user is recommended for security reasons.

1.14 How do I block Harvest from my site? How do I identify Harvest?

Add lines like these to your robots.txt:

        User-agent: Harvest
        Disallow: /
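
Before deploying such a rule, it can be checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser; it shows how that parser interprets the rule, not how Harvest itself reads robots.txt. The host www.example.com, the agent name SomeOtherBot, and the helper is_allowed are placeholders introduced for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from the answer above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Harvest
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.com and "SomeOtherBot" are placeholders, not real values.
url = "http://www.example.com/index.html"
print("Harvest allowed:", is_allowed(ROBOTS_TXT, "Harvest", url))
print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

With this rule, the parser denies the agent named Harvest everywhere on the site while leaving other agents unaffected, because no other User-agent record is present.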

1.15 What can I do to help?

There are many ways to help improve Harvest, depending on your skills and the time you want to contribute:

