Harvest FAQ
Kang-Jin Lee
lee@arco.de
2003-11-08
Harvest frequently asked questions (FAQ) with answers
1. Harvest
1.1 What is Harvest?
1.2 Where can I get more information about Harvest?
1.3 Where can I download Harvest?
1.4 Is there any information about Harvest in Russian?
1.5 What is Harvest-ng?
1.6 What is the copyright status of Harvest?
1.7 Which operating system do I need to run Harvest?
1.8 Does Harvest run under Windows NT/2000/XP?
1.9 What hardware do I need to use Harvest?
1.10 Which version of Harvest should I use?
1.11 What are "harvest-modified-by-RL-Stajsic", "harvest-MathNet", and "harvest-1.5.20-kj"?
1.12 What are the limits of Harvest?
1.13 Do I need root access to install and run Harvest?
1.14 How do I block Harvest from my site? How do I identify Harvest?
1.15 What can I do to help?
2. Building Harvest
2.1 How do I uninstall Harvest?
2.2 Where can I get bison and flex?
2.3 How can I install Harvest in "/my/directory/harvest" instead of "/usr/local/harvest"?
2.4 How can I avoid the "syntax error before `regoff_t'" error message when compiling Harvest?
2.5 Where can I get more information for building Harvest on FreeBSD?
3. Gatherer
3.1 Does the Gatherer support cookies?
3.2 Why doesn't Local-Mapping work?
3.3 Does the Gatherer gather the Root- and LeafNode-URLs periodically?
3.4 Can Harvest gather https URLs?
3.5 When will Harvest be able to gather https URLs?
3.6 Does Harvest support client-side scripting or plugins such as JavaScript and Flash?
3.7 Why does the gatherer stop after gathering only a few pages?
3.8 How can I index local newsgroups? How can I put a hostname into a News URL?
3.9 What do the gatherer options "Search=Breadth" and "Search=Depth" do, and which keywords are available for the "Search=" option?
3.10 How can I index HTML pages generated by CGI scripts? How can I index URLs that have a "?" (question mark) in them?
3.11 Why is the gatherer so slow? How can I make it faster?
3.12 Why is the gatherer still so slow?
3.13 How do I request "304 Not Modified" answers from HTTP servers?
3.14 Why does Harvest gather different URLs from one gathering to the next?
3.15 Why has the Gatherer's database vanished after gathering?
3.16 How can I keep GDBM files from growing very big during gathering?
3.17 Can I use Htdig as a Gatherer? Can the Broker import data from Htdig?
3.18 How can I control access to the Gatherer's database?
3.19 Does Harvest's Gatherer support WAP/WML, Gnutella, or Napster?
3.20 How do I gather ftp URLs from wu-ftp daemons?
3.21 Why don't file URLs in LeafNodes work as expected?
3.22 Why does gathering from a site fail completely or for parts of the site?
4. Summarizer
4.1 Why doesn't Post-Summarizing work?
4.2 How can I summarize meta tags in HTML documents?
4.3 Why do raw HTML tags appear in some query results?
4.4 How can I summarize DVI files?
4.5 How can I summarize PDF files?
4.6 Where can I get pdftotext?
4.7 How can I improve the summarizer for Microsoft Word files?
4.8 Where can I get wvWare?
4.9 How can I add support for a new file type?
4.10 How can I use nsgmls instead of sgmls to summarize documents?
5. Broker
5.1 How can I start a Broker at boot time?
5.2 How can I start a Broker without starting a collection?
5.3 Why don't the documents I have just gathered show up in the Broker?
5.4 Why do I get error messages when I try to access "http://some.host/Harvest/brokers/your-broker-path/" after running $HARVEST_HOME/RunHarvest?
5.5 Why are NEWS URLs broken? Where are the hostnames in NEWS URLs? How can I follow NEWS URLs?
5.6 Why don't I get any results if I use a long or complex query string?
5.7 Can I use wildcards in attribute values for structured queries?
5.8 Are attribute names case sensitive?
5.9 Why doesn't collecting from a broker work?
5.10 How can I customize the Harvest user interface?
5.11 How do I localize/translate the user interface?
5.12 How can I replace the bundled Glimpse with another version of Glimpse?
6. Terms
6.1 What is a Gatherer?
6.2 What is Local-Mapping?
6.3 What is a Summarizer?
6.4 What is a Broker?
7. Miscellaneous
7.1 Who are the maintainers of Harvest?
7.2 I have found a bug. What should I do?
7.3 Is there a mailing list for Harvest? What about a newsgroup?