To reduce wide-area network bandwidth demand and to reduce the load on HTTP servers around the Web, Harvest caches resolve misses through other caches higher in a cache hierarchy, as illustrated in Figure 3. For example, several of the Harvest project members are running caches on their home workstations, configured as children of caches running in laboratories at their universities. Each cache in the hierarchy independently decides whether to fetch the referenced object from the object's home site or from the cache or caches above it in the hierarchy.
Figure 3: Hierarchical Cache Arrangement
The cache resolution algorithm also distinguishes parent from neighbor caches. A parent cache is a cache higher up the hierarchy; a neighbor cache is one at the same level in the hierarchy, provided to distribute cache server load. When a cache receives a request for a URL that misses, it uses UDP "pings" to try to determine which of the neighbor caches, parent caches, or object home site can satisfy the request most efficiently.
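The selection step can be pictured with a small sketch. It assumes, purely for illustration, that "most efficiently" means the lowest round-trip time among the candidates that answered; the `Probe` record and the `choose_source` function are hypothetical names, not part of Harvest, and the round-trip times are passed in rather than measured so that the example runs without a network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Probe:
    """Result of one UDP ping (hypothetical record, not a Harvest structure)."""
    host: str
    kind: str                 # "parent", "neighbor", or "home"
    rtt_ms: Optional[float]   # None means no reply arrived before the timeout


def choose_source(probes):
    """Return the probe with the smallest round-trip time, or None if nobody replied.

    Assumption: lowest round-trip time stands in for "most efficient".
    """
    answered = [p for p in probes if p.rtt_ms is not None]
    if not answered:
        return None
    return min(answered, key=lambda p: p.rtt_ms)
```

Given one parent that answers in 12 ms, one neighbor that answers in 3.5 ms, and one neighbor that never answers, `choose_source` returns the 3.5 ms neighbor. The real cache may weigh other factors that this sketch leaves out, such as whether a responder actually holds the object.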
Note that, as the hierarchy deepens, the root caches become responsible for more and more clients. For this reason, we recommend that the hierarchy terminate at the first place in the regional or backbone network where bandwidth is plentiful.
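A rough calculation shows how quickly the load on a root cache grows. The sketch below assumes a uniform tree in which every cache has the same number of children; that uniformity, the function name `caches_below_root`, and the fan-out of 4 are illustrative assumptions, not measurements of any real hierarchy.

```python
def caches_below_root(fan_out, depth):
    """Count the leaf caches under one root in a uniform tree.

    fan_out: children per cache (assumed equal everywhere)
    depth:   number of levels below the root
    """
    if fan_out < 1 or depth < 0:
        raise ValueError("fan_out must be >= 1 and depth must be >= 0")
    return fan_out ** depth


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, caches_below_root(4, depth))
```

With four children per cache, one level below the root yields 4 leaf caches, two levels yield 16, and three levels yield 64, so each added level multiplies the population that the root may have to serve on a miss.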
To place your cache in a hierarchy, use the cache_host variable in the cached.conf file to specify the parent and neighbor nodes. For example, the following cached.conf file on littleguy1.usc.edu configures its cache to retrieve data from one parent cache and two neighbor caches:
    #
    #  cached.conf - On the host: littleguy1.usc.edu
    #
    #  Format is: hostname  type  ascii_port  udp_port
    #
    cache_host bigserver.usc.edu    parent    3128 3130
    cache_host littleguy2.usc.edu   neighbor  3128 3130
    cache_host littleguy3.usc.edu   neighbor  3128 3130
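One way to sanity-check such a file before deploying it is to parse the cache_host lines and confirm that each carries the four fields named in the format comment. The sketch below is not part of Harvest: `CacheHost` and `parse_cache_hosts` are hypothetical names, and it assumes that comments occupy whole lines starting with `#` and that any other variables in the file can be skipped.

```python
from typing import NamedTuple


class CacheHost(NamedTuple):
    """One cache_host entry (hypothetical helper type)."""
    hostname: str
    kind: str        # "parent" or "neighbor"
    ascii_port: int
    udp_port: int


def parse_cache_hosts(text):
    """Collect cache_host entries from cached.conf-style text."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comment lines
        fields = line.split()
        if fields[0] != "cache_host":
            continue                      # ignore other variables
        if len(fields) != 5:
            raise ValueError(f"expected 4 values after cache_host: {raw!r}")
        _, hostname, kind, ascii_port, udp_port = fields
        if kind not in ("parent", "neighbor"):
            raise ValueError(f"unknown cache type {kind!r} in: {raw!r}")
        hosts.append(CacheHost(hostname, kind, int(ascii_port), int(udp_port)))
    return hosts
```

Run against the example file above, this returns three entries: bigserver.usc.edu as the parent, and littleguy2.usc.edu and littleguy3.usc.edu as neighbors, each with ASCII port 3128 and UDP port 3130. A line with a missing or extra field raises `ValueError` instead of being silently accepted.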
Note: earlier versions of the cache supported a tcp_port field. It is no longer supported as of Version 1.3.