A Gatherer is a system that retrieves documents from various sources (web servers, news servers, FTP servers, local files) for processing. In the HTTP/HTML context it is also often called a crawler, robot, or spider.
To reduce CPU load and speed up gathering, Harvest can map URLs to local files. The Gatherer can then bypass the server and read the local files directly, while presenting the objects to the rest of the Harvest system as if they had been gathered as usual.
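The idea behind such a mapping can be sketched as a simple prefix table. This is only an illustration, not Harvest's actual configuration syntax; the URL, directory, and function names below are hypothetical.

```python
# Hypothetical table mapping URL prefixes to local directories,
# illustrating how a gatherer could bypass the HTTP server.
LOCAL_MAP = {
    "http://www.example.org/docs/": "/var/www/docs/",
}

def resolve(url):
    """Return a local file path if the URL is mapped, else None."""
    for prefix, root in LOCAL_MAP.items():
        if url.startswith(prefix):
            # Serve the object from disk instead of fetching it.
            return root + url[len(prefix):]
    return None  # not mapped: fall back to a normal HTTP fetch

print(resolve("http://www.example.org/docs/intro.html"))
```

Objects resolved this way would still enter the pipeline tagged with their original URL, so downstream components cannot tell the difference.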
A Summarizer transforms a document into a form that is better suited for full-text searching.
The HTML summarizer, for example, extracts the title of a document, removes all HTML tags, generates a word list, and so on.
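A toy version of these steps can be written with a standard HTML parser. This is a minimal sketch of the idea, not Harvest's actual HTML summarizer:

```python
from html.parser import HTMLParser

class Summarizer(HTMLParser):
    """Toy HTML summarizer: grabs the <title> and collects body words."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Tags themselves are discarded; only text content survives.
        if self.in_title:
            self.title += data
        else:
            self.words.extend(data.split())

s = Summarizer()
s.feed("<html><head><title>Harvest</title></head>"
       "<body><p>full text search</p></body></html>")
print(s.title)               # Harvest
print(sorted(set(s.words)))  # ['full', 'search', 'text']
```

The extracted title and word list are exactly the kind of reduced representation a full-text indexer can work with.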
A Broker processes search requests passed on from the user by a CGI script and presents the search results.
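The query-handling side of such a broker can be sketched as follows. The in-memory index, parameter name, and function are all hypothetical; they only illustrate the CGI-style request flow, not Harvest's actual broker.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory index: term -> list of matching URLs.
INDEX = {
    "harvest": ["http://www.example.org/docs/intro.html"],
}

def handle_query(query_string):
    """Parse a CGI-style query string and return matching URLs."""
    params = parse_qs(query_string)
    # Assume the CGI script passes the search terms as "query=...".
    terms = params.get("query", [""])[0].lower().split()
    hits = []
    for term in terms:
        hits.extend(INDEX.get(term, []))
    return hits

print(handle_query("query=Harvest"))
```

A real broker would rank and deduplicate the hits and render them as an HTML result page for the CGI script to return.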