Changes in version 1.9.16:

* Added OpenOffice document summarizer. (Steve Slaven)
* Updated to qdbm 1.8.20, yaz 2.0.30, idzebra 1.3.18.

Changes in version 1.9.15:

* zquery.pl displays the time used to complete searches.
  zquery.pl uses the Time::HiRes module to measure search time. For Perl
  versions prior to 5.8, the Time::HiRes module has to be installed
  separately.
* Location of the search page is now "/Harvest/brokers/".
  The new query interface is location independent and can be placed
  anywhere.
* Updated to qdbm 1.7.14 and idzebra 1.3.14.

Changes in version 1.9.14:

* Added a location independent template gatherer.
  A new gatherer can be created by copying the template gatherer
  $HARVEST_HOME/gatherers/template to anywhere on disk and editing
  gather.cf.
* Improved broker administration CGI script.
  The broker administration script was renamed to brokerAdmin.cgi. It is
  now location independent and doesn't rely on the HTML file created by
  $HARVEST_HOME/bin/Harvest.
* Improved Zebra integration. (Adam Dickmeiss)
  Added xsoif.abs.
* Improved Zebra based user interface.
  zquery.pl starts to display results before it has an answer from the
  Zebra server, which should make it feel faster. Summaries of search
  results are truncated at whitespace and end with "..." when necessary.
  Query terms are highlighted in search results. Added query tips, error
  messages and debugging aids.

Changes in version 1.9.13:

* Improved soif2xml.pl.
  soif2xml.pl uses a smarter approach to keep the XML tree in sync with
  the SOIF tree. It should also create fewer invalid XML files.
* Improved zquery.pl.
* Essence quits immediately when it can't compile quick-sum.cf.
  Essence crashed when a broken regular expression was used to summarize
  a candidate with Essence's internal regex based summarizer, causing
  loss of all data gathered until the crash. The new behaviour should
  avoid data loss and make it easier to write quick-sum.cf.
* Broker doesn't eliminate duplicate data from a gatherer based on MD5.
  (Suggested by jwaite@ti.com)
  The broker formerly avoided duplicate documents by comparing the MD5s
  of documents. If a gatherer sent summaries of two identical documents
  "a.html" and "b.html", only one of them was kept in the broker. This
  behaviour was changed. It is now possible to have identical documents
  under different names.
* Updated to curl 7.10.8.

Changes in version 1.9.12:

* Added Russian translation of Harvest User's Manual.
  (Andrei Malashevich)
* Improved PowerPoint summarizer.
* Improved Zebra integration.

Changes in version 1.9.11:

* Added Dutch localization of the user interface. (Rick Jansen)
* Added support for URLs enclosed in single quotes. (Dima Suliman)
* Compile fixes for GCC 3.3.1.
* Updated to idzebra 1.3.13, yaz 2.0.4 and curl 7.0.5.

Changes in version 1.9.10:

* Finished French translation of the user interface. (Ellen Reitnauer)
* Added Microsoft PowerPoint summarizer.
* Updated to idzebra 1.3.11 and yaz 2.0.1.
* Started transition from SOIF to XML.

Changes in version 1.9.9:

* Updated to qdbm 1.4.7.
  Harvest now uses the Curia API of qdbm, which makes it possible to
  split the database into a number of files instead of having a single
  file. It is now possible to have a database larger than 2GB on some
  32-bit systems.
* Added curl 7.10.4.
  With curl, it is now possible to gather FTP URLs from sites which use
  the wu-ftpd FTP server.

Changes in version 1.9.8:

* Updated to qdbm 1.4.5.
* Updated to idzebra 1.3.10.

Changes in version 1.9.7:

* Updated to qdbm 1.4.1.
* Updated to idzebra 1.3.9.
* Added French user interface.
* Improved integration of Zebra.
  The index for zebrasrv will be updated when glimpseindex updates its
  index, and zebrasrv will start when the broker starts.
  The basic query page for the Zebra fulltext engine is located at:
    http://your.server/Harvest/cgi-bin/zquery.pl
  A more detailed search page is available at:
    http://your.server/Harvest/brokers/zquery/index.html

Changes in version 1.9.6:

* Updated to yaz 2.0 and idzebra 1.3.7.
* zquery.pl now uses zoomsh instead of yclient.
* Changed default setting not to build WAIS support.
* Updated to qdbm 1.4.0.

Changes in version 1.9.5:

* Switched from GDBM 1.8.3 to QDBM 1.3.2.

Changes in version 1.9.4:

* Cosmetic changes in user interface. (Harald Weinreich)
* Portability fixes for OpenBSD. (Harald Weinreich)

Changes in version 1.9.3:

* Swedish user interface. (Anders Wandahl)

Changes in version 1.9.2:

* Minor changes for FreeBSD. (Tsuguru Kato)
* Italian user interface. (Roccon Graziano)

Changes in version 1.9.1:

* Updated to yaz-1.9.2 and idzebra-1.3.4.
* Added flags to query pages. (Harald Weinreich)

Changes in version 1.9.0:

* Added Indexdata's Zebra.

Changes in version 1.8.2:

* Updated GDBM to 1.8.3.
* Harvest compiles under Cygwin. (Michael Schlenker)
* Fixed bogus result cache hit reported by Sutapa Ranjan.
* Added Russian query pages. (Andrei Malashevich)

Changes in version 1.8.1:

* Added Spanish user interface. (Javier Masa Marin, Harald Weinreich)
  Added Spanish query and result page.

Changes in version 1.8.0:

* glimpseindex is now a shell script which calls glimpseindex.bin to
  create the index. This makes it possible to run additional commands
  before or after creating the index.
* Character set for result pages is now configurable. (Harald Weinreich)

Changes in version 1.7.37:

* Fixed configure bug in gdbm.
  configure failed in gdbm when some environment variables like CC or
  CFLAGS were set. This is fixed.
* URI attribute added to all SOIF.
* Added German localization for the Broker. (Harald Weinreich)
* Added filter support for Broker. (Harald Weinreich)
  See query-glimpse.html for details on how to use the filter.

Changes in version 1.7.36:

* Improved RTF summarizer.
  (Dmitry Potapov)
  rtf2html now supports title and href tags.

Changes in version 1.7.35:

* Fixed search result paging.
  The URL pointing to other pages of the search result contained spaces,
  causing problems with some browsers.

Changes in version 1.7.34:

* Decreased latency for searches. (Harald Weinreich)
  search.cgi now stores search results in a single temporary file
  without any formatting. This makes displaying the first result page
  much faster. Subsequent pages are created on the fly by reading and
  parsing only the requested part of the temporary file.

Changes in version 1.7.33:

* Updated documentation.

Changes in version 1.7.32:

* Improved RTF summarizer. (Harald Weinreich)
  rtf2html no longer puts the full name of the temporary HTML file into
  the title, which led to bogus hits for "tmp", "gatherer", etc.
* Improved default HTML summarizer.
  HTML-lax.sum removes repeated whitespace and empty lines from
  full-text.
* New user interface. (Javier Masa Marin, Harald Weinreich)
  Major improvement of the user interface.

Changes in version 1.7.31:

* New user interface. (Harald Weinreich)

Changes in version 1.7.30:

* catdoc is installed under Harvest's directory hierarchy even if no
  --prefix was given at the configure stage.
  The default prefix for catdoc was not /usr/local/harvest. This caused
  catdoc to be installed under /usr/local instead of
  /usr/local/harvest. This is fixed.

Changes in version 1.7.29:

* Fixed file:/// URL handling.
  File URLs with an empty hostname were detected as invalid URLs. This
  is fixed.

Changes in version 1.7.28:

* Changed default setting not to summarize nested types like tar and
  zip archives.
  The current implementation of Essence lets the gdbm file grow too
  much to be useful.
* Improved PDF summarizer. (Harald Weinreich)
  pdftotext from the xpdf package creates huge text files for some PDF
  files, making Essence run out of memory. A workaround for this
  problem has been added to the PDF summarizer.
* Changed RTF summarizer to use rtf2html.
  GNU unrtf creates "pictnnn.pict" files despite the "--nopict" flag.
* Cosmetic changes in result set presentation.
* Added summarizer for Microsoft Excel files and modified summarizer
  for Microsoft Word to use catdoc. (Harald Weinreich)
* Merged catdoc into Harvest distribution.

Changes in version 1.7.27:

* Fixed bug in Essence. (Jason Downs)
  Essence sometimes died when writing into the gdbm file due to a
  buffer overflow. This caused the httpenum process to eat up all CPU
  time without doing anything. This is fixed.

Changes in version 1.7.26:

* Improved user interface. (Harald Weinreich)

Changes in version 1.7.25:

* Fixed attribute searches. (Harald Weinreich)

Changes in version 1.7.24:

* If-Modified-Since gathering is now a configuration option.
  To enable this feature, add a line like this to your gatherer's
  configuration file:
    HTTP-If-Modified-Since: Yes
* Fixed miscellaneous variable initializations with wrong type.

Changes in version 1.7.23:

* Fixed IMS gathering.
  Harvest's HTTP gatherers, httpenum-breadth and httpenum-depth, now
  support "If-Modified-Since" gathering. They can send an
  "If-Modified-Since" request header and won't retrieve the HTML page
  if the server answers with "304 Not Modified". This should speed up
  gathering in most cases.
  To use this feature, point the environment variable
  HARVEST_GATHERER_DBS to the directory containing PRODUCTION.gdbm. For
  example, you might want to add the following line to your RunGatherer
  script:
    export HARVEST_GATHERER_DBS=/usr/local/harvest/gatherers/MY_Gatherer/data
* Merged HTML4 support for SGML based HTML summarizer from
  Leonhard Knauff.

Changes in version 1.7.22:

* Added workaround for gathering from wu-ftpd 2.6.x servers.
* The current default HTML summarizer HTML-lax.sum behaves more like
  the other two HTML summarizers. (Harald Weinreich)
* Fixed coredump in httpenum-breadth when using Local-Mapping.
  (Guido Kerkewitz)

Changes in version 1.7.21:

* Don't print an additional empty result page when the number of hits
  modulo objects per page is 0.
* Print link to search page on every result page.
* Fixed epoch rollover bug in search.cgi.
  Temporary files created after epoch rollover were not deleted from
  the temporary directory. If you are using the stock paging algorithm,
  check your $HARVEST_HOME/tmp directory and clean it up if necessary.
* Enabled code to shrink the "WORKING.gdbm" file while gathering.
  GDBM doesn't shrink the database file when entries are deleted. Even
  though GDBM tries to reuse the deleted space, the database file will
  keep growing with many deletes. Calling gdbm_reorganize() will shrink
  the database file. To control how often gdbm_reorganize() should be
  called, use the Gatherer configuration option:
    Essence-Options: --max-deletions n
  The default is n = 0, which means not to shrink at all; n = 10000
  means to shrink every 10000 deletions. If your "WORKING.gdbm" file
  grows too much, try some different values for n.

Changes in version 1.7.20:

* JavaScript and https URLs are no longer logged as unknown URLs.
* Fixed temporary file leak in NEWS enumerator.

Changes in version 1.7.19:

* Fixed Local-Mapping in default http enumerator (httpenum-breadth).
* Fixed Essence so that it doesn't unlink temporary files twice.
* Removed error message from CreateBroker when external which (as
  opposed to the builtin which in bash and tcsh) can't find wais.

Changes in version 1.7.18:

* Fixed file leak in local disc cache.

Changes in version 1.7.17:

* Harvest compiles on FreeBSD.
* It is now possible to build in objdir != srcdir.

Changes in version 1.7.16:

* The value in $HARVEST_MAX_LOCAL_CACHE is now the maximum local cache
  size in MB instead of bytes.
* Documentation updates.

Changes in version 1.7.15:

* No user visible changes.

Changes in version 1.7.14:

* ZQuery is now included in the contrib directory of the Harvest
  distribution.

Changes in version 1.7.13:

* Documentation updates.

Changes in version 1.7.12:

* Perl scripts now use localtime() instead of ctime.pl.
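
Note on --max-deletions (version 1.7.21): the option amounts to a simple
deletion counter. The Python sketch below only illustrates that policy under
the semantics stated in the 1.7.21 entry; it is not Essence's actual C code.
The names DeletionCounter, record_deletion and simulate are made up for this
example, and the place where a real gatherer would call gdbm_reorganize() is
only marked by a comment.

```python
class DeletionCounter:
    """Hypothetical helper: decides when a database reorganize is due."""

    def __init__(self, max_deletions: int = 0) -> None:
        if max_deletions < 0:
            raise ValueError("max_deletions must be >= 0")
        self.max_deletions = max_deletions  # n from "--max-deletions n"
        self.pending = 0                    # deletions since last reorganize

    def record_deletion(self) -> bool:
        """Count one deletion; return True when a reorganize is due."""
        if self.max_deletions == 0:
            # n = 0 is the default and means: never shrink.
            return False
        self.pending += 1
        if self.pending >= self.max_deletions:
            self.pending = 0
            # A real gatherer would call gdbm_reorganize() at this point.
            return True
        return False


def simulate(deletions: int, max_deletions: int) -> int:
    """Return how many reorganize calls a run of deletions would trigger."""
    counter = DeletionCounter(max_deletions)
    return sum(counter.record_deletion() for _ in range(deletions))
```

With n = 10000, a run of 25000 deletions would trigger two reorganize calls
in this sketch; with the default n = 0 it would trigger none.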

Changes in version 1.7.11:

* BrokerStats is now included in the contrib directory of the Harvest
  distribution.

Changes in version 1.7.10:

* Documentation is now included in the Harvest distribution.

Changes in version 1.7.9:

* The default HTML summarizer (HTML-lax.sum.c) now creates attribute
  names in mixed case instead of lower case, e.g. full-text became
  Full-Text.

Changes in version 1.7.8:

* Fixed broker bug introduced in 1.5.18, which prevented gathering
  from brokers. The broker should be able to export to and import from
  any version of Harvest.

Changes in version 1.7.7:

* Gatherer doesn't gunzip All-Templates for gathering. This saves some
  CPU cycles and much space in $TMPDIR.
* Default enumeration method is breadth first. When the number of URLs
  gathered from a site is limited, this should give a more accurate
  overview of the site.
* Broker now uses 256 directories to store SOIF objects.
  If you start this version with data from earlier versions of
  Harvest, the Broker will create the additional directories but will
  complain about errors in the registry. To fix this problem, stop the
  broker, do "make realclean" in the $HARVEST_HOME/brokers/YOUR_BROKER
  directory and restart the broker.
* Memory usage for glimpseindex is no longer compiled in but
  configurable in broker.conf.
  Edit your $HARVEST_HOME/brokers/admin/broker.conf and change the line
  "GlimpseIndex-Flags -n" to "GlimpseIndex-Flags -n -M 50" or whatever
  amount of memory you are willing to give to glimpseindex. With
  "-M 50", glimpseindex will use 50MB plus some MB of RAM.

Changes in version 1.7.6:

* Added uudecode and fixed unshar.

Changes in version 1.7.5:

* Support for bzip2 compressed files and tar archives.

Changes in version 1.7.4:

* Fixed Gatherer bug introduced in 1.7.3 which caused deletion of
  files when using the "Local-Mapping" feature.

Changes in version 1.7.3:

* Gatherer bugs fixed. Unnecessary temporary files will be cleaned up
  immediately after processing.

Changes in version 1.7.2:

* C summarizer uses Darren Hiebert's Exuberant ctags by default.
* PDF added to the list of files to gather by default.
* Added RTF summarizer using rtf2html by Chuck Shotton and
  Dmitry Potapov.
* Bugfixes for dvi and rfc summarizers.

Changes in version 1.7.1:

* Sorting search results by relevance works now.

Changes in version 1.7.0:

* Started cleaning up the tree.
* Results display is now paged.

Changes in version 1.6.1:

* Fixes for mispackaged 1.6.0.

Changes in version 1.6.0:

* No user visible changes.

Changes in version 1.6.pre0:

* Minor documentation changes.

Changes in version 1.5.20-kj-0.10:

* No user visible changes.

Changes in version 1.5.20-kj-0.9:

* Updated glimpse to 4.12.6.

Changes in version 1.5.20-kj-0.8:

* Updated dvi2tty to 5.3, modified some HTML files.

Changes in version 1.5.20-kj-0.7:

* Internal cosmetic changes to various HTML files.

Changes in version 1.5.20-kj-0.6:

* Should now build on systems without regex library.

Changes in version 1.5.20-kj-0.5:

* Bugfixes in HTML-sum.pl.

Changes in version 1.5.20-kj-0.4:

* Default summarizer is now HTML-lax.sum. This should speed up
  gathering and indexing by up to four times.

Changes in version 1.5.20-kj-0.3:

* Harvest can create brokers with swish as default indexer.

Changes in version 1.5.20-kj-0.2:

* Bugfixes for CreateBroker.

Changes in version 1.5.20-kj-0.1:

* newsget.pl should work with any news server now.
* Updated gdbm from 1.7.3 to 1.8.0.
* make realclean in the broker directory will delete any auto
  generated data.
* glimpseindex now uses 20MB of RAM instead of 10MB.
* Switched from acrobat to xpdf for summarizing PDF.
* HTML-sum.pl is now the default summarizer for HTML.