Web Caching and Replication BOF

- Opening the floor to server-server and cache-server protocols, and does the IETF need to standardize protocols?
- Keith Moore
-- Concerned about the transparent caching problem
-- SONAR and mirroring: SONAR keeps a database of mirrored sites, client queries SONAR, SONAR sends back a list, client chooses "closest" site.
-- Replication is used in the context of pre-demand distribution, whereas caching is used in the context of on-demand distribution.
- Ingrid Malve, UNINET Norway
-- Overview of existing work: problems with HTTP 1.0, help is on the way with HTTP 1.1
-- Client configuration is still a problem, even with the Jscript configuration URL
-- Traffic interception is EVIL!
--- L4 switches
--- Firewall redirectors
--- Router redirectors
-- Inter Cache Communication
--- CARP, WCCP: Clustering and distribution of the client request
--- ICP, HTCP, and Cache Digest: Cooperating caches
--- HTCP is "HyperText Cache Protocol"
-- URN resolution
--- There is a lot of work done in the web proxy to do redirection based on the URL.
--- Especially popular for FTP traffic.
-- What do proxy caches do? They route queries.
-- Application level routing is another area of research
- Comments:
-- There were some suggestions to incorporate proxy caches into the WWW MIB, but they were subsequently rejected.
-- Caches are a limited form of traffic engineering. Most people use caches to reduce latency, and they tend to do both equally well.
- Ivan Lovric, France Telecom
-- Cache expectations
--- Increase performance
--- Should be helpful for forthcoming services, such as push and prefetching.
-- Three requirements
--- Caches must exchange information so that they can locate content locally.
--- Push must be supported between caches (sounds like he wanted to support data dissemination)
--- Data compression (to reduce bandwidth)
-- draft-lovric-icp-ext-00.txt
-- ICP: Have an existing protocol, let's extend it.
-- "Easily extensible"
-- New OP codes and flags for exchanging information and pushing data.
-- New mechanisms for compressing data and/or sending it over a different protocol.
-- Format of a list file for information and commands
-- Duane Wessels commented that he wants to discourage extensions to ICP since ICP was an ad-hoc protocol at the time. He doesn't want to see people adding extensions to it.
- Joe Touch put in a plug for his LSAM research
- Duane Wessels, HTCP (Paul Vixie), and Squid
-- HTCP is similar to ICP, with message-response queries, but fixes some of its problems. For example, it passes around full HTTP headers in addition to the URL.
-- It also supports authentication.
-- Age is passed around, so that a request can indicate how old an object it is willing to accept.
-- Paul has implemented HTCP in his product WGI.
-- Rudimentary support exists in Squid.
-- Duane didn't know if Paul has any plans for pushing HTCP forward as an RFC.
-- Discussion ensued that indicated that Paul wanted the BOF to find somewhere to push it forward.
-- Look for the 02 HTCP draft. (draft-vixie-htcp-proto-02.txt)
-- Brian Carpenter was disturbed that procedure is violated by pure electronic submission, even if Paul has done great things in the past. Apparently the draft is headed for last call, informational RFC.
-- Duane went on to say a few more things about Squid
--- Cache digests are one extension they added recently.
--- Rudimentary support for URNs.
-- Questions:
--- Josh Cohen: Does Squid support CARP? Answer: Yes, some rudimentary support exists, but there's no way to discover the CARP array because the configuration is statically described in the squid.conf file.
- CacheCaster (Karsten Borman)
-- Based on NewsCaster, which offers lightweight, unreliable, multicast distribution of compressed batches of USENET articles.
-- http://www.newscaster.org/ is where the software lives.
-- CacheCaster is vaporware, but the goal is to do the same thing for web cache objects.
- CARP (Josh Cohen)
-- Looking at caching from an enterprise perspective, where an enterprise has a number of caches attached to a firewall.
-- Problem is that caches tend to become perfect replicas of each other over time.
-- Hash the URL to distribute load across the caches
-- CARP is not moving forward as an RFC because MS and UPenn haven't really been pushing it.
-- Dave Karker's work deals with what happens when you add and remove servers. This is the fundamental problem with CARP: the problem case arises when servers join and leave the CARP cache array.
- Keith Moore
-- His approach uses DNS to find the "oracle" which leads you to where the content is.
-- Decomposes the URL, sends out NAPTR queries to lead you to the oracle that has the content "closest" to you.
-- See his RCDS Resource Catalog work.
- Scott Michel, UCLA, and Adaptive Web Caching
-- Self-organizing system of caches
-- Caches form federations and do application level routing of URLs between the federations.
-- Load-based content dissemination down through the federations to dissipate load away from the origin servers.
-- http://irl.cs.ucla.edu/AWC/
- Jim Gettys: All of this work is generally a research problem, but the overwhelming problem is the client autoconfiguration problem. It doesn't matter what the cache-cache and cache-server protocols are: if the client can't find the cache, caching doesn't matter.
- Josh Cohen: MS, Inktomi, and Sun are writing a spec for doing cache discovery via DNS. Look for the WPAD draft submitted by Inktomi. Web site is http://egg.microsoft.com/wpad, mailing list is at majordomo@egg.microsoft.com.
- Lixia: Redirecting of requests is a violation of Internet architecture. Packets are delivered to the indicated destination, otherwise it's called 'hijacking'.
- Joe Touch: Are we ready to move forward with a working group? Everything here is preliminary and things are moving forward, but it is useful to get together to make sure that RFCs aren't violated.
- Keith Moore: Sometimes getting to the cache I trust is more important than getting to the "closest" cache.
- Lixia: Of course, there's policy which modifies your definition of closest.
- Keith Moore: Caching isn't completely divorced from replication. You might want to make the interface look consistent to the client, who doesn't care if things are cached or replicated.
- Patrick achieved consensus that we should form a WG and write an architecture and requirements document.
- Josh Cohen: Agrees. Will autodiscovery be included?
- Copyright problem discussion:
-- Want to make sure that the requirements document includes some language about copyright issues.
-- HTTP 1.1 solves some of these problems.
-- HTTP 1.1 has an odd trust directive, where I trust people to obey my directives and give me feedback about them.
-- IETF is the appropriate forum to get these things hammered out.
-- Patrick found that there was interest in creating a security scenario document.
-- Crypto and caching is another issue that needs to be looked at, assuming that the key was shared and known.
-- The Mouse Problem: Legal restrictions on ISPs where every packet is monitored to detect copyright violations. There are some people who don't like caches.
-- Ingrid: It costs money to move web pages from one place to another. Therefore, she will do what she has to do to reduce the costs of using bandwidth. Not too concerned about the copyright issue except as necessary.
-- Joe Touch: Copyright issues are a red herring because we don't have copyright cops next to all of the Xerox machines.
-- Keith Moore: Not very happy with the copyright mechanisms in place today, but we're doing something good for the consumer. We may need to define an access control mechanism, but caches are not the right place to put this mechanism.
-- Josh Cohen: We may supply a mechanism but the mechanism isn't morality.
-- Xerox PARC digital copyrights and language: Includes a copyright language and protocol negotiation which allow a browser to handle the copyright issues.
-- We should put some cycles into trying to address the copyright problem even if we can't solve it completely.
-- Another set of problems is what happens if the wrong page is cached (misrepresentation) or what happens when the cache manipulates the page (by adding new advertisements or replacing existing ones).
-- Ingrid: Cache hit rates may be small, but it still feels like the right thing to do.
-- Gettys: A W3C group is studying web cache hit rates.
-- Michah Michaelson: HTTP may not be the right protocol to deliver all of these features, and application level protocols need to be examined.
-- ??: Mechanisms in HTTP NG as independent hooks to support replication.
- ??: Australian cache system
-- Trying to achieve hit rates of 50-60% using 300-400MB of disk space.
-- Currently locked into vendor-specific solutions and interested in some IETF standardization effort so as to get some choice.
- Mailing list: webrepl[-request]@cs.utk.edu
- Want to start the WG: Yes.
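
Note on the CARP discussion: the idea of hashing the URL to spread load across a cache array, and the question of what happens when members join or leave, can be illustrated with a minimal Python sketch. This is not the hash function from the CARP draft. It assumes SHA-256 over the member name and URL with a highest-score-wins rule, and the cache names, URLs, and function names are hypothetical.

```python
import hashlib


def score(member: str, url: str) -> int:
    """Combined hash of a cache member name and a URL (illustrative, not CARP's hash)."""
    digest = hashlib.sha256(f"{member}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(members: list[str], url: str) -> str:
    """Return the member with the highest combined score for this URL."""
    if not members:
        raise ValueError("cache array is empty")
    return max(members, key=lambda m: score(m, url))


def moved_urls(before: list[str], after: list[str], urls: list[str]) -> int:
    """Count URLs whose owning cache differs between two array configurations."""
    return sum(1 for u in urls if pick_cache(before, u) != pick_cache(after, u))


# Hypothetical array and URLs, for illustration only.
caches = ["cache-a", "cache-b", "cache-c", "cache-d"]
urls = [f"http://example.com/page{i}" for i in range(1000)]

# Record which member owns each URL, then remove one member from the array.
owner = {u: pick_cache(caches, u) for u in urls}
remaining = [c for c in caches if c != "cache-d"]

moved = moved_urls(caches, remaining, urls)
orphaned = sum(1 for u in urls if owner[u] == "cache-d")
print(f"{moved} of {len(urls)} URLs changed owner; {orphaned} were on the removed cache")
```

In this sketch only the URLs that lived on the removed member change owner, whereas a plain hash(url) mod N would typically remap many more. Those objects still have to be refetched by their new owners, and every client needs the updated member list, which is where the statically configured array mentioned for Squid becomes a limitation.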