Harvest User's Manual
Contents
Acknowledgements
Copyright
Terms of Use
Derivative Works
History of Free Software Status
List of Figures
1 Introduction to Harvest
2 Subsystem Overview
Distributing the Gathering and Brokering Processes
3 Installing the Harvest Software
3.1 Requirements for Harvest Servers
3.1.1 Hardware
3.1.2 Platforms
3.1.3 Software
3.2 Requirements for Harvest Users
3.3 Retrieving and Installing the Harvest Software
3.3.1 Distribution types
3.3.2 Optional Harvest components
3.3.3 User-contributed software
3.3.4 Unpacking a binary distribution
3.4 Building the Source Distribution
Building individual Harvest components
3.5 Installing the Harvest software
3.5.1 Additional installation for the Harvest Broker
Checking the installation for HTTP access
Required modifications to the Broker's CGI programs
Required modifications to your HTTP server
CERN httpd v3.0
NCSA httpd v1.3 or v1.4; Apache httpd v0.8.8
GN HTTP server
Plexus HTTP server
Other HTTP servers
3.6 Upgrading versions of the Harvest software
3.6.1 Upgrading from version 1.3 to version 1.4
3.6.2 Upgrading from version 1.2 to version 1.3
3.6.3 Upgrading from version 1.1 to version 1.2
3.6.4 Upgrading to version 1.1 from version 1.0 or older
3.7 Starting up the system: RunHarvest and related commands
3.8 Harvest team contact information
4 The Gatherer
4.1 Overview
4.2 Basic setup
Gathering News URLs with NNTP
Cleaning out a Gatherer
4.3 RootNode specifications
4.3.1 RootNode filters
4.3.2 Generic Enumeration filter description
4.3.3 Example RootNode configuration
4.3.4 Using extreme values -- "robots"
4.3.5 Gatherer enumeration vs. candidate selection
4.4 Generating LeafNode/RootNode URLs from a program
4.5 Extracting data for indexing: The Essence summarizing subsystem
4.5.1 Default actions of "stock" summarizers
4.5.2 Summarizing SGML data
Location of support files
The SGML to SOIF table
Errors and warnings from the SGML Parser
Creating a summarizer for a new SGML-tagged data type
The SGML-based HTML summarizer
Adding META data to your HTML
Other examples
4.5.3 Summarizer components distribution
Using "Rainbow" to summarize MIF and RTF documents
The translation table
4.5.4 Customizing the type recognition, candidate selection, presentation unnesting, and summarizing steps
Customizing the type recognition step
Customizing the candidate selection step
Customizing the presentation unnesting step
Customizing the summarizing step
4.6 Post-Summarizing: Rule-based tuning of object summaries
The Rules file
Rewriting URLs
4.7 Gatherer administration
4.7.1 Setting variables in the Gatherer configuration file
4.7.2 Local file system gathering for reduced CPU load
4.7.3 Gathering from password-protected servers
4.7.4 Controlling access to the Gatherer's database
4.7.5 Periodic gathering and realtime updates
4.7.6 The local disk cache
4.7.7 Incorporating manually generated information into a Gatherer
4.8 Troubleshooting
5 The Broker
5.1 Overview
5.2 Basic setup
5.3 Querying a Broker
Example queries
Query options selected by menus or buttons
Result set presentation
Regular expressions
Default query settings
5.4 Customizing the Broker's Query Result Set
5.4.1 The BrokerQuery.cf configuration file
Defined Variables
List of Definitions
5.4.2 Example BrokerQuery.cf customization file
5.4.3 Integrating your customized configuration file
5.4.4 Displaying SOIF attributes in results
5.5 World Wide Web interface description
HTML files for graphical user interface
CGI programs
Help files for the user
5.6 Administrating a Broker
Deleting unwanted Broker objects
Command-line Administration
5.7 Tuning Glimpse indexing in the Broker
The glimpseserver program
5.8 Using different index/search engines with the Broker
Using WAIS as an indexer
Using Verity as an indexer
Using GRASS as an indexer
5.9 Collector interface description: Collection.conf
5.10 Troubleshooting
6 The Object Cache
6.1 Overview
6.2 Basic setup
6.3 Using the Cache as an httpd accelerator
6.4 Using the Cache's access control
6.5 Using the Cache's remote instrumentation interface
6.6 Setting up WWW clients to use the Cache
6.7 Running a Cache hierarchy
6.8 Using multiple disks with the Cache
6.9 Details of Cache operation
6.9.1 Cache access protocols
6.9.2 Cacheable objects
6.9.3 Unique object naming
6.9.4 Cache consistency
6.9.5 Negative caching and DNS caching
6.9.6 Security and privacy implications
6.9.7 Summary: object caching "flow chart"
6.10 Meanings of log files
6.11 Troubleshooting
7 The Replicator
7.1 Overview
7.2 Basic setup
CreateReplica usage line
7.3 Customizations
7.4 Distributing the load among replicas
7.5 Troubleshooting
References
A Programs and layout of the installed Harvest software
A.1 $HARVEST_HOME
A.2 $HARVEST_HOME/bin
A.3 $HARVEST_HOME/brokers
A.4 $HARVEST_HOME/cgi-bin
A.5 $HARVEST_HOME/gatherers
A.6 $HARVEST_HOME/lib
A.7 $HARVEST_HOME/lib/broker
A.8 $HARVEST_HOME/lib/cache
A.9 $HARVEST_HOME/lib/gatherer
B The Summary Object Interchange Format (SOIF)
B.1 Formal description of SOIF
B.2 List of common SOIF attribute names
C Gatherer Examples
C.1 Example 1 - A simple Gatherer
C.2 Example 2 - Incorporating manually generated information
C.3 Example 3 - Customizing type recognition and candidate selection
C.4 Example 4 - Customizing type recognition and summarizing
Using regular expressions to summarize a format
Using programs to summarize a format
Running the example
Index
About this document ...
Duane Wessels
Wed Jan 31 23:46:21 PST 1996