Web Caching and Replication BOF
- Opening the floor to server-server and cache-server protocols: does
the IETF need to standardize these protocols?
- Keith Moore
-- Concerned about the transparent caching problem
-- SONAR and mirroring: SONAR keeps a database of mirrored sites, client
queries SONAR, SONAR sends back a list, client chooses "closest" site.
-- Replication is used in the context of pre-demand distribution
vs. caching is used in the context of on-demand distribution
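A minimal Python sketch of the client-side selection step described for SONAR above. The notes do not say how "closest" is measured, so the numeric distance estimate, the function name choose_closest, and the example hostnames are all assumptions made for illustration.

```python
def choose_closest(mirrors):
    """Return the hostname with the smallest distance estimate.

    `mirrors` is a list of (hostname, distance) pairs, standing in for the
    list a SONAR-like service might return. The distance metric is
    hypothetical (for example a round-trip time in milliseconds).
    """
    if not mirrors:
        raise ValueError("no mirrors to choose from")
    hostname, _distance = min(mirrors, key=lambda pair: pair[1])
    return hostname


# Hypothetical reply for one mirrored site.
reply = [
    ("mirror-a.example.org", 120.0),
    ("mirror-b.example.org", 35.5),
    ("mirror-c.example.org", 80.2),
]
closest = choose_closest(reply)
```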
- Ingrid Malve, UNINET Norway
-- Overview of existing work: problems with HTTP 1.0, help is on the way
with HTTP 1.1
-- Client configuration is still a problem, even with the Jscript
configuration URL
-- Traffic interception is EVIL!
--- L4 switches
--- Firewall redirectors
--- Router redirectors
-- Inter Cache Communication
--- CARP, WCCP: Clustering and distribution of the client request
--- ICP, HTCP, and Cache Digest: Cooperating caches
--- HTCP is "HyperText Cache Protocol"
-- URN resolution
--- There is a lot of work done in the web proxy to do redirection
based on the URL.
--- Especially popular for FTP traffic.
-- What do proxy caches do? They route queries.
-- Application level routing is another area of research
- Comments:
-- There were some suggestions to incorporate proxy caches into the WWW
MIB, but they were subsequently rejected.
-- Caches are a limited form of traffic engineering. Most people use
caches to reduce latency, and they tend to do both equally well.
- Ivan Lovric, France Telecom
-- Cache expectations
--- Increase performance
--- Should be helpful for forthcoming services, such as push and
prefetching.
-- Three requirements
--- Caches must exchange information so that they can locate content
locally.
--- Push must be supported between caches (sounds like he wanted to
support data dissemination)
--- Data compression (to reduce bandwidth)
-- draft-lovric-icp-ext-00.txt
-- ICP: Have an existing protocol, let's extend it.
-- "Easily extensible"
-- New OP codes and flags for exchanging information and pushing data.
-- New mechanisms for compressing data and/or sending it over a different
protocol.
-- Format of a list file for information and commands
-- Duane Wessels commented that he wants to discourage extensions to ICP
since ICP was an ad-hoc protocol at the time. He doesn't want to see
people adding extensions to it.
- Joe Touch put in a plug for his LSAM research
- Duane Wessels, HTCP (Paul Vixie), and Squid
-- It's similar to ICP with message-response queries but fixes some of
the problems. For example it passes around full HTTP headers in
addition to the URL.
-- It also supports authentication.
-- Age is passed around, so that a request can indicate how old the
content it is willing to accept may be.
-- Paul has implemented HTCP in his product WGI.
-- Rudimentary support exists in Squid.
-- Duane didn't know if Paul has any plans for pushing HTCP forward as an
RFC.
-- Discussion ensued that indicated that Paul wanted the BOF to find
somewhere to push it forward.
-- Look for the 02 HTCP draft. (draft-vixie-htcp-proto-02.txt)
-- Brian Carpenter was disturbed that procedure is violated by pure
electronic submission, even if Paul has done great things in the
past. Apparently the draft is headed for last call, informational RFC.
-- Duane went on to say a few more things about Squid
--- Cache digests are one extension they added recently.
--- Rudimentary support for URNs.
-- Questions:
--- Josh Cohen: Does Squid support CARP? Answer: Yes, some rudimentary
support exists, but there's no way to discover the CARP array
because the configuration is statically described in the
squid.conf file.
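Cache digests are commonly described as Bloom-filter summaries of the URLs a cache holds, which neighbors can consult without sending a query per request. The sketch below illustrates that idea only; the class name, bit-array size, number of hash functions, and use of SHA-256 are assumptions and do not reproduce Squid's actual digest format.

```python
import hashlib


class CacheDigest:
    """Toy Bloom filter over URLs (illustrative, not Squid's format)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url):
        # Derive num_hashes bit positions by salting the URL with an index.
        for i in range(self.num_hashes):
            data = f"{i}:{url}".encode("utf-8")
            value = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
            yield value % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, url):
        # False means "definitely not cached"; True means "probably cached".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


digest = CacheDigest()
digest.add("http://www.example.com/index.html")
hit = digest.may_contain("http://www.example.com/index.html")
```

A neighbor holding this digest can skip querying the peer whenever may_contain returns False; a True answer can still be a false positive.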
- CacheCaster (Karsten Borman)
-- Based on NewsCaster, which offers lightweight, unreliable multicast
distribution of compressed batches of USENET articles.
-- http://www.newscaster.org/ is where the software lives.
-- CacheCaster is vaporware, but it is intended to do the same thing for
web cache objects.
- CARP (Josh Cohen)
-- Looking at caching from an enterprise perspective, where an enterprise
has a number of caches attached to a firewall.
-- Problem is that caches tend to become perfect replicas of each other
over time.
-- Hash the URL to distribute load across the caches
-- CARP is not moving forward as an RFC because MS and UPenn haven't
really been pushing it.
-- Dave Karker's work deals with what happens when you add and remove
servers. This is the fundamental problem with CARP: the problem case
arises when servers join and leave the CARP cache array.
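A minimal sketch of hashing URLs across a cache array, plus a helper for counting how many URLs change owner when a member leaves, which is the case the notes flag. The scoring scheme (SHA-256 over member name and URL, highest score wins) is an assumption chosen for illustration; it is not the hash function defined in the CARP draft, and the member names are hypothetical.

```python
import hashlib


def score(member, url):
    """Combined hash of a cache name and a URL, as an integer."""
    data = f"{member}|{url}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def pick_cache(members, url):
    """Route `url` to the member with the highest combined score."""
    return max(members, key=lambda m: (score(m, url), m))


def moved_urls(before, after, urls):
    """URLs whose owning cache differs between two membership lists."""
    return [u for u in urls if pick_cache(before, u) != pick_cache(after, u)]


array = ["cache1", "cache2", "cache3", "cache4"]
urls = [f"http://www.example.com/page{i}.html" for i in range(1000)]

# Remove one member and see which URLs have to be fetched elsewhere.
smaller = [m for m in array if m != "cache3"]
moved = moved_urls(array, smaller, urls)
```

With this scoring scheme, only URLs that were owned by the departed member change owner. A plain hash(url) % len(members) assignment would, by contrast, reshuffle most URLs whenever the array size changes.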
- Keith Moore
-- His approach uses DNS to find the "oracle" which leads you to where
the content is.
-- Decomposes the URL, sends out NAPTR queries to lead you to the oracle
that has the content "closest" to you.
-- See his RCDS Resource Catalog work.
- Scott Michel, UCLA, and Adaptive Web Caching
-- Self-organizing system of caches
-- Caches form federations and do application level routing of URLs
between the federations.
-- Load-based content dissemination down through the federations to
dissipate load away from the origin servers.
-- http://irl.cs.ucla.edu/AWC/
- Jim Gettys: All of this work is generally a research problem, but the
overwhelming problem is the client autoconfiguration problem. It doesn't
matter what the cache-cache and cache-server protocols are: if the
client can't find the cache, caching doesn't matter.
- Josh Cohen: MS, Inktomi, and Sun are writing a spec for doing cache
discovery via DNS. Look for WPAD draft submitted by Inktomi. Web site
is http://egg.microsoft.com/wpad, mailing list is at
majordomo@egg.microsoft.com.
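The notes do not describe how DNS-based discovery works. One common reading of the WPAD approach, assumed here, is that the client derives well-known "wpad" hostnames from its own domain name and tries them in order, from most to least specific. The function name and the rule for where to stop (keep at least two labels) are simplifying assumptions, not text from the draft.

```python
def wpad_candidates(client_fqdn, min_labels=2):
    """List candidate "wpad" hostnames for a client, most specific first.

    `min_labels` is a hypothetical stop rule: candidates are generated only
    while the remaining domain still has at least that many labels, so the
    search never reaches a bare top-level domain.
    """
    labels = client_fqdn.rstrip(".").split(".")
    domain = labels[1:]  # drop the client's own host label
    names = []
    while len(domain) >= min_labels:
        names.append("wpad." + ".".join(domain))
        domain = domain[1:]  # move one level up the DNS tree
    return names


candidates = wpad_candidates("laptop.dept.example.com")
# candidates == ["wpad.dept.example.com", "wpad.example.com"]
```

A client would then look each name up in turn and use the first one that resolves to locate its cache configuration.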
- Lixia: Redirecting of requests is a violation of Internet
architecture. Packets are delivered to the indicated destination,
otherwise it's called 'hijacking'.
- Joe Touch: Are we ready to move forward with a working group? Everything
here is preliminary and things are moving forward, but it is useful to
get together to make sure that RFCs aren't violated.
- Keith Moore: Sometimes getting to the cache I trust is more important
than getting to the "closest" cache.
- Lixia: Of course, there's policy which modifies your definition of
closest.
- Keith Moore: Caching isn't completely divorced from replication. You
might want to make the interface look consistent to the client, who
doesn't care if things are cached or replicated.
- Patrick achieved consensus that we should form a WG and write an
architecture and requirements document.
- Josh Cohen: Agrees. Will autodiscovery be included?
- Copyright problem discussion:
-- Want to make sure that the requirements document includes some
language about copyright issues.
-- HTTP 1.1 solves some of these problems.
-- HTTP 1.1 has an odd trust directive, where I trust people to obey my
directives and give me feedback about them.
-- IETF is the appropriate forum to get these things hammered out.
-- Patrick found that there was interest in creating a security scenario
document.
-- Crypto and caching is another issue that needs to be looked at,
assuming that the key was shared and known.
-- The Mouse Problem: Legal restrictions on ISPs where every packet is
monitored to detect copyright violations. There are some people who
don't like caches.
-- Ingrid: It costs money to move web pages from one place to
another. Therefore, she will do what she has to do to reduce the costs
of using bandwidth. Not too concerned about the copyright issue except
as necessary.
-- Joe Touch: Copyright issues are a red herring because we don't have
copyright cops next to all of the Xerox machines.
-- Keith Moore: Not very happy with the copyright mechanisms in place
today, but we're doing something good for the consumer. We may need to
define an access control mechanism but caches are not the right place
to put this mechanism.
-- Josh Cohen: We may supply a mechanism but the mechanism isn't
morality.
-- Xerox PARC digital copyrights and language: Includes a copyright
language and protocol negotiation which allow a browser to handle the
copyright issues.
-- We should put some cycles into trying to address the copyright problem
even if we can't solve it completely.
-- Another set of problems is what happens if the wrong page is cached
(misrepresentation) or what happens when the cache manipulates the
page (by adding new advertisements or replacing existing ones).
-- Ingrid: Cache hit rates may be small, but it still feels like the
right thing to do.
-- Gettys: There is a W3C group which is studying web cache hit rates.
-- Michah Michaelson: HTTP may not be the right protocol to deliver all
of these features and application level protocols need to be examined.
-- ??: Mechanisms in HTTP NG as independent hooks to support replication.
- ??: Australian cache system
-- Trying to achieve hit rates of 50-60% using 300-400MB of disk space.
-- Currently locked into vendor specific solutions and interested in some
IETF standardization effort so as to get some choice.
- Mailing list: webrepl[-request]@cs.utk.edu
- Want to start the WG: Yes.