Changes in version 1.8.3:

* Merged with Harvest 1.9.12.
* Added Dutch, French, Italian and Swedish user interfaces.
* Added Microsoft PowerPoint summarizer.
* Compile fixes for GCC 3.3.1.
* Added support for URLs enclosed in single quotes.

Changes in version 1.8.2:

* Updated GDBM to 1.8.3.
* Harvest compiles under Cygwin. (Michael Schlenker)
* Fixed bogus result cache hit reported by Sutapa Ranjan.
* Added Russian query pages. (Andrew Malashevich)

Changes in version 1.8.1:

* Added Spanish user interface. (Javier Masa Marin, Harald Weinreich)
  Added Spanish query and result page.

Changes in version 1.8.0:

* glimpseindex is now a shell script which calls glimpseindex.bin to
  create the index. This makes it possible to run additional commands
  before or after creating the index.
* Character set for result pages is now configurable. (Harald Weinreich)

Changes in version 1.7.37:

* Fixed configure bug in gdbm.
  configure failed in gdbm when environment variables such as CC or
  CFLAGS were set. This is fixed.
* URI attribute added to all SOIF.
* Added German localization for the Broker. (Harald Weinreich)
* Added filter support for Broker. (Harald Weinreich)
  See query-glimpse.html for details on how to use the filter.

Changes in version 1.7.36:

* Improved RTF summarizer. (Dmitry Potapov)
  rtf2html now supports title and href tags.

Changes in version 1.7.35:

* Fixed search result paging.
  The URL pointing to other pages of the search result contained
  spaces, causing problems with some browsers.

Changes in version 1.7.34:

* Decreased latency for searches. (Harald Weinreich)
  search.cgi now stores search results in a single temporary file
  without any formatting. This makes displaying the first result page
  much faster. Subsequent pages are created on the fly by reading and
  parsing only the requested part of the temporary file.

Changes in version 1.7.33:

* Updated documentation.

Changes in version 1.7.32:

* Improved RTF summarizer.
  (Harald Weinreich)
  rtf2html no longer puts the full name of the temporary HTML file
  into the title, which led to bogus hits for "tmp", "gatherer", etc.
* Improved default HTML summarizer.
  HTML-lax.sum removes repeated whitespace and empty lines from
  full-text.
* New user interface. (Javier Masa Marin, Harald Weinreich)
  Major improvement of the user interface.

Changes in version 1.7.31:

* New user interface. (Harald Weinreich)

Changes in version 1.7.30:

* catdoc will be installed under Harvest's directory hierarchy even
  if no --prefix was given during the configure stage.
  Default prefix for catdoc was not /usr/local/harvest. This caused
  catdoc to be installed under /usr/local instead of
  /usr/local/harvest. This is fixed.

Changes in version 1.7.29:

* Fixed file:/// URL handling.
  File URLs with an empty hostname were detected as invalid URLs.
  This is fixed.

Changes in version 1.7.28:

* Changed default setting not to summarize nested types like tar and
  zip archives.
  The current implementation of essence lets the gdbm file grow too
  much to be useful.
* Improved PDF summarizer. (Harald Weinreich)
  pdftotext from the xpdf package creates huge text files for some PDF
  files, causing essence to run out of memory. A workaround for this
  problem has been added to the PDF summarizer.
* Changed RTF summarizer to use rtf2html.
  GNU unrtf creates "pictnnn.pict" files despite the "--nopict" flag.
* Cosmetic changes in result set presentation.
* Added summarizer for Microsoft Excel files and modified summarizer
  for Microsoft Word to use catdoc. (Harald Weinreich)
* Merged catdoc into Harvest distribution.

Changes in version 1.7.27:

* Fixed bug in essence. (Jason Downs)
  Essence sometimes died when writing into the gdbm file due to a
  buffer overflow. This caused the httpenum process to eat up all CPU
  time without doing anything. This is fixed.

Changes in version 1.7.26:

* Improved user interface. (Harald Weinreich)

Changes in version 1.7.25:

* Fixed attribute searches.
  (Harald Weinreich)

Changes in version 1.7.24:

* If-Modified-Since gathering is now a configuration option.
  To enable this feature, add a line like this to your gatherer's
  configuration file:

    HTTP-If-Modified-Since: Yes

* Fixed miscellaneous variable initializations with the wrong type.

Changes in version 1.7.23:

* Fixed IMS Gathering.
  Harvest's HTTP gatherers, httpenum-breadth and httpenum-depth, now
  support "If-Modified-Since" gathering. They can send an
  "If-Modified-Since" header and won't retrieve the HTML page if the
  server answers with "304 Not Modified". This should speed up
  gathering in most cases. To use this feature, point the environment
  variable HARVEST_GATHERER_DBS to the directory containing
  PRODUCTION.gdbm. For example, you might want to add the following
  line to your RunGatherer script:

    export HARVEST_GATHERER_DBS=/usr/local/harvest/gatherers/MY_Gatherer/data

* Merged HTML4 support for SGML based HTML summarizer from Leonhard
  Knauff.

Changes in version 1.7.22:

* Added workaround for gathering from wu-ftpd 2.6.x servers.
* The current default HTML summarizer HTML-lax.sum behaves more like
  the other two HTML summarizers. (Harald Weinreich)
* Fixed coredump in httpenum-breadth when using Local-Mapping.
  (Guido Kerkewitz)

Changes in version 1.7.21:

* No additional empty result page is printed when the number of hits
  modulo objects per page is 0.
* Print link to search page on every result page.
* Fixed epoch rollover bug in search.cgi.
  Temporary files created after the epoch rollover were not deleted
  from the temporary directory. If you are using the stock paging
  algorithm, check your $HARVEST_HOME/tmp directory and clean it up if
  necessary.
* Enabled code to shrink the "WORKING.gdbm" file while gathering.
  GDBM doesn't shrink the database file when entries are deleted. Even
  though GDBM tries to reuse the deleted space, the database file will
  keep growing with many deletes. Calling gdbm_reorganize() will
  shrink the database file.
  To control how often gdbm_reorganize() is called, use the Gatherer
  configuration option:

    Essence-Options: --max-deletions n

  The default is n = 0, which means not to shrink at all; n = 10000
  means to shrink every 10000 deletions. If your "WORKING.gdbm" file
  grows too much, try some different values for n.

Changes in version 1.7.20:

* JavaScript and https URLs are no longer logged as unknown URLs.
* Fixed temporary file leak in NEWS enumerator.

Changes in version 1.7.19:

* Fixed Local-Mapping in default http enumerator (httpenum-breadth).
* Fixed essence so that it does not unlink temporary files twice.
* Removed error message from CreateBroker when external "which" (as
  opposed to the internal builtin "which" in bash and tcsh) can't find
  wais.

Changes in version 1.7.18:

* Fixed file leak in local disc cache.

Changes in version 1.7.17:

* Harvest compiles on FreeBSD.
* It is now possible to build in objdir != srcdir.

Changes in version 1.7.16:

* The value in $HARVEST_MAX_LOCAL_CACHE is now the maximum local cache
  size in MB instead of bytes.
* Documentation updates.

Changes in version 1.7.15:

* No user visible changes.

Changes in version 1.7.14:

* ZQuery is now included in the contrib directory of the Harvest
  distribution.

Changes in version 1.7.13:

* Documentation updates.

Changes in version 1.7.12:

* Perl scripts now use localtime() instead of ctime.pl.

Changes in version 1.7.11:

* BrokerStats is now included in the contrib directory of the Harvest
  distribution.

Changes in version 1.7.10:

* Documentation is now included in the Harvest distribution.

Changes in version 1.7.9:

* The default HTML summarizer (HTML-lax.sum.c) now creates attribute
  names in mixed case instead of lower case, e.g. full-text becomes
  Full-Text.

Changes in version 1.7.8:

* Fixed broker bug introduced in 1.5.18, which prevented gathering
  from brokers. The broker should be able to export to and import from
  any version of Harvest.

Changes in version 1.7.7:

* Gatherer no longer gunzips All-Templates for gathering.
  This saves some CPU cycles and much space in $TMPDIR.
* Default enumeration method is breadth first.
  When the number of URLs gathered from a site is limited, this should
  give a more accurate overview of the site.
* Broker now uses 256 directories to store SOIF objects.
  If you start this version with data from earlier versions of
  Harvest, the Broker will create the additional directories but will
  complain about errors in the registry. To fix this problem, stop the
  broker, do "make realclean" in the $HARVEST_HOME/brokers/YOUR_BROKER
  directory and restart the broker.
* Memory usage for glimpseindex is no longer compiled in but is
  configurable in broker.conf.
  Edit your $HARVEST_HOME/brokers/admin/broker.conf and change the
  line "GlimpseIndex-Flags -n" to "GlimpseIndex-Flags -n -M 50" or
  whatever amount of memory you are willing to give to glimpseindex.
  With "-M 50", glimpseindex will use 50MB plus some MB of RAM.

Changes in version 1.7.6:

* Added uudecode and fixed unshar.

Changes in version 1.7.5:

* Support for bzip2 compressed files and tar archives.

Changes in version 1.7.4:

* Fixed Gatherer bug introduced in 1.7.3 which caused deletion of
  files when using the "Local-Mapping" feature.

Changes in version 1.7.3:

* Gatherer bugs fixed. Unnecessary temporary files will be cleaned up
  immediately after processing.

Changes in version 1.7.2:

* C Summarizer uses Darren Hiebert's Exuberant ctags by default.
* PDF added to the list of files to gather by default.
* Added RTF summarizer using rtf2html by Chuck Shotton and Dmitry
  Potapov.
* Bugfixes for dvi and rfc summarizers.

Changes in version 1.7.1:

* Sorting search results by relevance now works.

Changes in version 1.7.0:

* Started cleaning up the tree.
* Results display is now paged.

Changes in version 1.6.1:

* Fixes for mispackaged 1.6.0.

Changes in version 1.6.0:

* No user visible changes.

Changes in version 1.6.pre0:

* Minor documentation changes.

Changes in version 1.5.20-kj-0.10:

* No user visible changes.

Changes in version 1.5.20-kj-0.9:

* Updated glimpse to 4.12.6.

Changes in version 1.5.20-kj-0.8:

* Updated dvi2tty to 5.3, modified some html files.

Changes in version 1.5.20-kj-0.7:

* Internal cosmetic changes to various html files.

Changes in version 1.5.20-kj-0.6:

* Should now build on systems without regex library.

Changes in version 1.5.20-kj-0.5:

* Bugfixes in HTML-sum.pl.

Changes in version 1.5.20-kj-0.4:

* Default summarizer is now HTML-lax.sum. This should speed up
  gathering and indexing up to four times.

Changes in version 1.5.20-kj-0.3:

* Harvest can create brokers with swish as default indexer.

Changes in version 1.5.20-kj-0.2:

* Bugfixes for CreateBroker.

Changes in version 1.5.20-kj-0.1:

* newsget.pl should work with any news server now.
* Updated gdbm from 1.7.3 to 1.8.0.
* make realclean in broker directory will delete any auto generated
  data.
* glimpseindex now uses 20MB of RAM instead of 10MB.
* Switched from acrobat to xpdf for summarizing PDF.
* HTML-sum.pl now default summarizer for HTML.