The Officially Unofficial minutes

From: Scott Michel (scottm@rush.aero.org)
Date: Thu Sep 17 1998 - 15:29:10 MDT


                      Web Caching and Replication BOF

- Opening the floor to discussion of server-server and cache-server
  protocols: does the IETF need to standardize them?

- Keith Moore
  
  -- Concerned about the transparent caching problem
  
  -- SONAR and mirroring: SONAR keeps a database of mirrored sites; the
     client queries SONAR, SONAR sends back a list of mirrors, and the
     client chooses the "closest" site.

  -- Replication refers to pre-demand distribution, whereas caching refers
     to on-demand distribution.
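  The client-side step of the SONAR exchange above (pick the "closest"
  entry from the returned list) can be sketched as follows. This is a
  minimal illustration, not SONAR's wire format: the `Mirror` record, its
  `distance` field, and the example hostnames are hypothetical, and the
  distance values stand in for whatever proximity estimate the reply
  carries.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    url: str
    distance: float  # hypothetical proximity metric, e.g. estimated RTT in ms


def choose_closest(mirrors: list[Mirror]) -> Mirror:
    """Pick the mirror with the smallest distance (ties: first listed)."""
    if not mirrors:
        raise ValueError("no mirrors returned")
    return min(mirrors, key=lambda m: m.distance)


# Hypothetical reply listing mirrors of the same content.
reply = [
    Mirror("ftp://mirror-a.example.net/pub/", 120.0),
    Mirror("ftp://mirror-b.example.org/pub/", 35.5),
    Mirror("ftp://mirror-c.example.com/pub/", 80.0),
]
print(choose_closest(reply).url)  # ftp://mirror-b.example.org/pub/
```

  Keeping the choice on the client leaves room for local policy (for
  example, filtering out untrusted mirrors before calling
  `choose_closest`).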

- Ingrid Melve, UNINETT Norway

  -- Overview of existing work: HTTP 1.0 has problems for caching, but help
     is on the way with HTTP 1.1.

  -- Client configuration is still a problem, even with the JavaScript
     proxy auto-configuration URL.
  
  -- Traffic interception is EVIL!

     --- L4 switches
     --- Firewall redirectors
     --- Router redirectors

  -- Inter-cache communication
     
     --- CARP, WCCP: Clustering and distribution of client requests
     --- ICP, HTCP, and Cache Digest: Cooperating caches
     --- HTCP is "HyperText Cache Protocol"

  -- URN resolution

     --- A lot of work has been done in web proxies to redirect requests
         based on the URL.
     --- Especially popular for FTP traffic.

  -- What do proxy caches do? They route queries.
  -- Application level routing is another area of research

- Comments:

  -- There were some suggestions to incorporate proxy caches into the WWW
     MIB, but the idea was subsequently rejected.
     
  -- Caches are a limited form of traffic engineering. Most people use
     caches to reduce latency, and caches tend to do both (traffic
     engineering and latency reduction) equally well.

- Ivan Lovric, France Telecom
     
  -- Cache expectations

     --- Increase performance

     --- Should be helpful for forthcoming services, such as push and
         prefetching.

  -- Three requirements

     --- Caches must exchange information so that they can locate content
         locally.

     --- Push must be supported between caches (sounds like he wanted to
         support data dissemination)

     --- Data compression (to reduce bandwidth)

  -- draft-lovric-icp-ext-00.txt

     --- ICP: We already have an existing protocol, so let's extend it.

     --- "Easily extensible"

     --- New opcodes and flags for exchanging information and pushing data.

     --- New mechanisms for compressing data and/or sending it over a
         different protocol.

     --- Format of a list file for information and commands

  -- Duane Wessels commented that he wants to discourage extensions to ICP,
     since ICP was developed as an ad-hoc protocol.
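  The extensions proposed above would add opcodes and flags to ICP's fixed
  message header. For reference, a minimal sketch of how a standard ICPv2
  query is laid out (per RFC 2186) is shown below using Python's `struct`
  module. It encodes only the existing `ICP_OP_QUERY` opcode with the
  options field left at zero, implements nothing from
  draft-lovric-icp-ext-00.txt, and the function names are illustrative.

```python
import socket
import struct

# Opcode and version values from RFC 2186 (ICPv2).
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# Fixed 20-byte header, network byte order: opcode, version, message length,
# request number, options, option data, sender host address.
HEADER = struct.Struct("!BBHIIII")


def build_icp_query(request_number: int, url: str,
                    requester_ip: str = "0.0.0.0") -> bytes:
    # Query payload: requester host address, then a NUL-terminated URL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)  # length covers header + payload
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0, 0)
    return header + payload


def parse_icp_header(packet: bytes) -> dict:
    opcode, version, length, reqnum, options, opt_data, sender = \
        HEADER.unpack_from(packet)
    return {"opcode": opcode, "version": version, "length": length,
            "request_number": reqnum, "options": options}


pkt = build_icp_query(42, "http://www.example.com/")
# 48 bytes: 20-byte header + 4-byte requester address + 23-byte URL + NUL
print(len(pkt), parse_icp_header(pkt))
```

  New opcodes would be new values in the first byte, and new flags would
  be bits in the 32-bit options field, which is why ICP looks "easily
  extensible" at first glance.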

- Joe Touch put in a plug for his LSAM research

- Duane Wessels, HTCP (Paul Vixie), and Squid

  -- HTCP is similar to ICP, using query-response messages, but fixes some
     of ICP's problems. For example, it passes full HTTP headers around in
     addition to the URL.

  -- It also supports authentication.
 
  -- Age is passed around, so that a request can indicate how old a
     response it is willing to accept.

  -- Paul has implemented HTCP in his product WGI.

  -- Rudimentary support exists in Squid.

  -- Duane didn't know whether Paul had any plans to push HTCP forward as an
     RFC.

  -- The ensuing discussion indicated that Paul wanted the BOF to find a
     venue for pushing it forward.

  -- Look for the 02 HTCP draft. (draft-vixie-htcp-proto-02.txt)

  -- Brian Carpenter was disturbed that procedure was being violated by
     purely electronic submission, even though Paul has done great things
     in the past. Apparently the draft is headed for last call as an
     informational RFC.

  -- Duane went on to say a few more things about Squid:

     --- Cache digests are one extension they added recently.
     --- Rudimentary support for URNs.

  -- Questions:

     --- Josh Cohen: Does Squid support CARP? Answer: Yes, some rudimentary
         support exists, but there's no way to discover the CARP array
         because the configuration is statically described in the
         squid.conf file.
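  Cache digests, mentioned above, let a cache publish a compact summary of
  its contents so that peers can predict hits without sending a query per
  request. Squid's digests are Bloom filters; the sketch below shows the
  idea. The key format (`"METHOD URL"`), the use of four 32-bit slices of
  an MD5 hash as the hash functions, and the class and method names are
  illustrative assumptions, not Squid's actual implementation.

```python
import hashlib


class CacheDigest:
    """Bloom filter summarizing which URLs a cache holds (sketch)."""

    def __init__(self, num_bits: int):
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, method: str, url: str):
        # Split the 128-bit MD5 of the (assumed) request key into four
        # 32-bit values; each, modulo the digest size, selects one bit.
        key = hashlib.md5(f"{method} {url}".encode()).digest()
        for i in range(4):
            yield int.from_bytes(key[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, method: str, url: str) -> None:
        for pos in self._positions(method, url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, method: str, url: str) -> bool:
        # False means "definitely not cached"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(method, url))


digest = CacheDigest(num_bits=8192)  # size chosen arbitrarily for the example
digest.add("GET", "http://www.example.com/index.html")
print(digest.may_contain("GET", "http://www.example.com/index.html"))  # True
```

  The trade-off is that a peer consulting a digest can get a false hit,
  which then costs one wasted request, in exchange for avoiding a query
  round trip on every miss.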

- CacheCaster (Carsten Bormann)

  -- Based on NewsCaster, which offers lightweight, unreliable multicast
     distribution of compressed batches of USENET articles.

  -- http://www.newscaster.org/ is where the software lives.

  -- CacheCaster is vaporware for now; the idea is to do the same thing for
     web cache objects.

- CARP (Josh Cohen)

  -- Looking at caching from an enterprise perspective, where an enterprise
     has a number of caches attached to a firewall.

  -- The problem is that caches tend to become perfect replicas of each
     other over time.

  -- Hash the URL to distribute load across the caches

  -- CARP is not moving forward as an RFC because MS and UPenn haven't
     really been pushing it.

  -- David Karger's work deals with what happens when you add and remove
     servers. This is the fundamental problem for CARP, which has to cope
     with servers joining and leaving the CARP cache array.
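  Hashing the URL means every client maps a given URL to the same cache,
  so the array members stop duplicating each other's contents. A minimal
  sketch of this, using highest-random-weight selection (the general
  scheme CARP builds on), is shown below. MD5 stands in for CARP's own
  hash and combining functions, per-member load factors are omitted, and
  the cache and URL names are hypothetical. It also illustrates the
  membership question raised above: when one cache leaves, only the URLs
  it owned get remapped.

```python
import hashlib


def score(member: str, url: str) -> int:
    # Combined member+URL score. CARP specifies its own hash and combining
    # functions; MD5 is only a stand-in here.
    digest = hashlib.md5(f"{member}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(members: list[str], url: str) -> str:
    """Route a URL to the array member with the highest combined score."""
    return max(members, key=lambda m: score(m, url))


# Hypothetical cache array and URL set.
caches = ["cache1.example.com", "cache2.example.com", "cache3.example.com"]
urls = [f"http://www.example.com/page{i}.html" for i in range(1000)]

before = {u: pick_cache(caches, u) for u in urls}
after = {u: pick_cache(caches[:2], u) for u in urls}  # cache3 leaves the array

moved = [u for u in urls if before[u] != after[u]]
# Only URLs that cache3 owned are reassigned; the rest stay where they were.
assert all(before[u] == "cache3.example.com" for u in moved)
print(f"{len(moved)} of {len(urls)} URLs moved")
```

  A naive `hash(url) % len(caches)` would also spread load, but removing
  one cache would remap most URLs across the whole array; scoring each
  member independently limits the disruption to the departed member's
  share.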

- Keith Moore

  -- His approach uses DNS to find the "oracle" which leads you to where
     the content is.

  -- It decomposes the URL and sends out NAPTR queries that lead you to the
     oracle that has the content "closest" to you.

  -- See his RCDS Resource Catalog work.
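  The minutes don't record RCDS's record formats, but any client that
  follows NAPTR answers has to rank them the way the NAPTR specification
  (RFC 2168, later revised as RFC 2915) requires: lowest ORDER first, and
  among records with equal ORDER, lowest PREFERENCE first. A minimal sketch
  of that ranking step follows; the `X-ORACLE` service tag and the oracle
  hostnames are hypothetical.

```python
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int        # lower values must be processed first
    preference: int   # tie-breaker among records with equal order
    flags: str
    service: str
    regexp: str
    replacement: str


def rank(records: list[Naptr]) -> list[Naptr]:
    """Return NAPTR records in the order a client should try them."""
    return sorted(records, key=lambda r: (r.order, r.preference))


# Hypothetical answers for one step of the URL decomposition; the service
# tag and the oracle hostnames are invented for illustration.
answers = [
    Naptr(20, 10, "", "X-ORACLE", "", "oracle-c.example.org."),
    Naptr(10, 50, "", "X-ORACLE", "", "oracle-b.example.net."),
    Naptr(10, 10, "", "X-ORACLE", "", "oracle-a.example.com."),
]
for rr in rank(answers):
    print(rr.replacement)
# oracle-a.example.com.
# oracle-b.example.net.
# oracle-c.example.org.
```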

- Scott Michel, UCLA, and Adaptive Web Caching

  -- Self-organizing system of caches

  -- Caches form federations and do application level routing of URLs
     between the federations.

  -- Load-based content dissemination down through the federations to
     dissipate load away from the origin servers.

  -- http://irl.cs.ucla.edu/AWC/

- Jim Gettys: Most of this work is still a research problem; the
  overwhelming problem is client autoconfiguration. Whatever the
  cache-cache and cache-server protocols turn out to be, if the client
  can't find the cache, caching doesn't matter.
  
- Josh Cohen: MS, Inktomi, and Sun are writing a spec for doing cache
  discovery via DNS. Look for the WPAD draft submitted by Inktomi. The web
  site is http://egg.microsoft.com/wpad, and the mailing list is at
  majordomo@egg.microsoft.com.
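  The minutes don't say how the DNS-based discovery works. In the WPAD
  drafts that followed, one method is a DNS heuristic: the client
  prepends `wpad` to successively shorter suffixes of its own domain and
  fetches a proxy auto-configuration file from the first host that
  answers. A sketch of the candidate list is below, assuming a fully
  qualified client name and the `/wpad.dat` path used in those later
  drafts; it only builds URLs and performs no lookups.

```python
def wpad_candidates(client_fqdn: str) -> list[str]:
    """Candidate PAC-file URLs from the WPAD DNS heuristic (sketch)."""
    # Drop the client's own host label, e.g. "johndoe".
    labels = client_fqdn.rstrip(".").lower().split(".")[1:]
    candidates = []
    # Prefix "wpad." to successively shorter domain suffixes, stopping
    # before the top-level domain on its own (never "wpad.com").
    while len(labels) >= 2:
        candidates.append(f"http://wpad.{'.'.join(labels)}/wpad.dat")
        labels = labels[1:]
    return candidates


for url in wpad_candidates("johndoe.tgt.qual.domain.com"):
    print(url)
# http://wpad.tgt.qual.domain.com/wpad.dat
# http://wpad.qual.domain.com/wpad.dat
# http://wpad.domain.com/wpad.dat
```

  The simple stopping rule is also the heuristic's weak point: for a host
  under `example.co.uk` it would eventually try `wpad.co.uk`, a name
  outside the organization's control.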
  
- Lixia: Redirecting requests violates the Internet architecture. Packets
  should be delivered to the indicated destination; otherwise it's called
  'hijacking'.
  
- Joe Touch: Are we ready to move forward with a working group? Everything
  here is preliminary and still evolving, but it is useful to get together
  to make sure that RFCs aren't violated.
  
- Keith Moore: Sometimes getting to the cache I trust is more important
  than getting to the "closest" cache.
  
- Lixia: Of course, there's policy which modifies your definition of
  closest.
  
- Keith Moore: Caching isn't completely divorced from replication. You
  might want to make the interface look consistent to the client, who
  doesn't care if things are cached or replicated.
  
- Patrick achieved consensus that we should form a WG and write an
  architecture and requirements document.
  
- Josh Cohen: Agrees. Will autodiscovery be included?
  
- Copyright problem discussion:

  -- Want to make sure that the requirements document includes some
     language about copyright issues.
     
  -- HTTP 1.1 solves some of these problems.
     
  -- HTTP 1.1 has an odd trust directive, where I trust people to obey my
     directives and give me feedback about them.

  -- IETF is the appropriate forum to get these things hammered out.
     
  -- Patrick found that there was interest in creating a security scenario
     document.
     
  -- Crypto and caching are another issue that needs to be looked at, for
     example assuming that the key is shared and known.

  -- The Mouse Problem: Legal restrictions on ISPs where every packet is
     monitored to detect copyright violations. There are some people who
     don't like caches.

  -- Ingrid: It costs money to move web pages from one place to
     another. Therefore, she will do what she has to do to reduce the costs
     of using bandwidth. Not too concerned about the copyright issue except
     as necessary.

  -- Joe Touch: Copyright issues are a red herring because we don't have
     copyright cops next to all of the Xerox machines.
     
  -- Keith Moore: Not very happy with the copyright mechanisms in place
     today, but we're doing something good for the consumer. We may need to
     define an access control mechanism but caches are not the right place
     to put this mechanism.

  -- Josh Cohen: We may supply a mechanism but the mechanism isn't
     morality.
     
  -- Xerox PARC digital copyright work: includes a copyright language and
     a negotiation protocol that allow a browser to handle copyright
     issues.

  -- We should put some cycles into trying to address the copyright problem
     even if we can't solve it completely.
     
  -- Another set of problems is what happens if the wrong page is cached
     (misrepresentation), or what happens when the cache manipulates the
     page (by adding or replacing advertisements).
     
  -- Ingrid: Cache hit rates may be small, but it still feels like the
     right thing to do.
     
  -- Gettys: There is a W3C group studying web cache hit rates.
     
  -- Michah Michaelson: HTTP may not be the right protocol to deliver all
     of these features, and other application-level protocols need to be
     examined.
     
  -- ??: Mechanisms in HTTP NG as independent hooks to support replication.
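  The "trust" point raised earlier in this discussion is that HTTP 1.1
  caching directives are advisory: the origin server states its wishes in
  the `Cache-Control` header and relies on each cache to honor them. As a
  small illustration of RFC 2616's rules, a shared proxy cache might check
  storability as sketched below. The function names are hypothetical, and
  other directives (`max-age`, `s-maxage`, `no-cache`) and
  `Authorization` handling are deliberately ignored.

```python
def parse_cache_control(value: str) -> dict:
    """Map each Cache-Control directive name to its argument (or None)."""
    directives = {}
    # Naive comma split: quoted field lists containing commas are not handled.
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(cache_control: str) -> bool:
    """Would a shared (proxy) cache be allowed to store this response?"""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return False
    # Bare "private" forbids shared caching; private="field" only restricts
    # the named header fields, so the response itself may still be stored.
    if "private" in d and d["private"] is None:
        return False
    return True


print(shared_cache_may_store("public, max-age=3600"))  # True
print(shared_cache_may_store("private"))               # False
```

  Nothing in the protocol enforces this check; a cache that skips it
  still interoperates, which is exactly why the directives amount to
  trust rather than access control.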
     
- ??: Australian cache system

  -- Trying to achieve hit rates of 50-60% using 300-400MB of disk space.

  -- Currently locked into vendor-specific solutions and interested in an
     IETF standardization effort so as to get some choice.

- Mailing list: webrepl[-request]@cs.utk.edu
     
- Want to start the WG: Yes.


