Re: Taxonomy draft, draft-melve-wrec-taxonomy-00.txt

From: Ian Cooper (ian@mirror-image.com)
Date: Wed Jun 16 1999 - 10:50:43 MDT


I've made some more comments on the terminology section, which I'm
including below. Apologies to John Dilley if I haven't attributed him
in all the places where I've included/modified his previous set of
comments.

Ian
----------------------------------------------------------------------
# 2. Terminology

# Where possible, existing definitions [5, 6] have been used in this
# document. Additional terminology has been agreed upon and defined in
# this document. All of the terminology used in this document is
# considered to be standardized with respect to IETF WREC working group
# RFCs.

# In this document a number of terms are used to refer to the roles
# played by participants in, and objects of, the HTTP communication.
# The following definitions are used in the HTTP/1.1 specification [6]
# :

The terms can also be split into three general areas, so there should
probably be three separate lists:

1) General terms (including those taken from HTTP/1.1)

2) Caching device topology terms

3) Cache-to-device communication terms [Not too keen on that
wording, but it's a start]

# client
# An application program that establishes connections for the
# purpose of sending requests.

# user agent
# The client which initiates a request. These are often
# browsers, editors, spiders (web-traversing robots), or
# other end user tools.

# server
# An application program that accepts connections in order to
# service requests by sending back responses. Any given
# program may be capable of being both a client and a server;
# our use of these terms refers only to the role being
# performed by the program for a particular connection,
# rather than to the program's capabilities in
# general. Likewise, any server may act as an origin server,
# proxy, gateway, or tunnel, switching behavior based on the
# nature of each request.

# origin server
# The server on which a given resource resides or is to be
# created.

# proxy
# An intermediary program which acts as both a server and a
# client for the purpose of making requests on behalf of
# other clients. Requests are serviced internally or by
# passing them, with possible translation, on to other
# servers. A proxy must interpret and, if necessary,
# rewrite a request message before forwarding it. Proxies are
# often used as client-side portals through network firewalls
# and as helper applications for handling requests via
# protocols not implemented by the user agent.

I go with John Dilley's comments on using the terminology
"intermediary system".

# tunnel
# An intermediary program which is acting as a blind relay
# between two connections. Once active, a tunnel is not
# considered a party to the HTTP communication, though the
# tunnel may have been initiated by an HTTP request. The
# tunnel ceases to exist when both ends of the relayed
# connections are closed.

Again, agree with John Dilley's comments.

# cache
# A program's local store of response messages and the
# subsystem that controls its message storage, retrieval, and
# deletion. A cache stores cacheable responses in order to
# reduce the response time and network bandwidth consumption
# on future, equivalent requests. Any client or server may
# include a cache, though a cache cannot be used by a server
# while it is acting as a tunnel.

You have introduced the term "cacheable", but there is no definition
for this in the rest of the document. Is it appropriate to try and
come up with a concise definition?

# caching proxy
# A proxy with a cache, acting as server to the client, and
# client to the server.

Moved from later. You probably need to watch the ordering - does it
need to be alphabetical, or an order in which you can follow things
through in a logical progression?

# cache mesh
# a set of cooperating caching servers

[Beginning 2nd section of terminology now]

# To these definitions we add definitions specifically concerned with
# Web cache systems:

# local Web cache server
# caching server running on the same LAN as a client

I disagree with the use of "LAN" there, and I don't feel that it adds
any useful distinction between a "local" device (whatever local may
mean...) and an upper level device.

I think this section still needs some changes... replace all these
"server"s with the text below? (Also, I think the word "server" is
redundant - the definition of "proxy" indicates that a proxy is both
client and server, and we are dealing with caching proxies.)

# local cache , first level Web cache server
# the Web cache server an end user client connects to
# [Ed note: confusion on usage of "local cache" and
# "user agent cache"]

# upper level Web cache server
# seen from the clients view, all caches participating in the
# caching mesh that are not the clients first level cache are
# upper level caches

# top-level Web cache server
# one or more servers in a hierarchical caching mesh,
# normally few requests are made to other caching servers
# from the top level, serves first level Web cache servers

... In the order in which caches are accessed for any given request:

        user-agent cache
               the cache held within the user-agent program

        local caching proxy
               the caching proxy a user-agent connects to

        upper level caching proxy
               seen from the user-agent's view, all caches
               participating in the caching mesh that are not the
               user-agent's local caching proxy.
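
To make that access order concrete, here is a rough sketch. The class
and function names are my own invention for illustration and are not
taken from the draft; it also assumes every cache that misses keeps a
copy on the way back, which real deployments need not do.

```python
# Illustrative only: Cache and fetch are hypothetical names.
class Cache:
    def __init__(self, name):
        self.name = name
        self.store = {}  # URL -> response body


def fetch(url, chain, origin):
    """Try each cache in access order; fall back to the origin server."""
    missed = []
    for cache in chain:
        if url in cache.store:
            body, source = cache.store[url], cache.name
            break
        missed.append(cache)
    else:
        # No cache in the chain held the response.
        body, source = origin[url], "origin server"
    # Assumption: every cache that missed stores the response.
    for cache in missed:
        cache.store[url] = body
    return body, source


chain = [
    Cache("user-agent cache"),
    Cache("local caching proxy"),
    Cache("upper level caching proxy"),
]
origin = {"http://example.com/": "<html>hello</html>"}

first = fetch("http://example.com/", chain, origin)
second = fetch("http://example.com/", chain, origin)
```

The first request falls through to the origin server; the repeat is
answered by the user-agent cache without touching either proxy.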

Is a "top-level" caching proxy needed? I don't see anywhere in the
document that refers to it directly. (The diagrams include them but
don't make any distinction, in my view, between any other upper
level caching proxy and a top level caching proxy.)

The key point about all upper level caching proxies is that they are
aggregation points. The use of "proxy" implies that they have a point
of egress for requests. That means we need to add another term for
alternative (existing/deployed) environments that don't follow this
model:

        central cache
               a centralized server of requests made by local
               and upper level caching proxies, but which does not
               act as a proxy itself

(I know that doesn't explain how to get the content into the central
cache [at Mirror Image we fetch out-of-band from the origin server and
local/upper level caching proxies].)

# network element
# router or switch

[That probably belongs in the first section of the terminology.]

# transparent proxying
# Transparency means that the user should not be aware of the
# existence of the proxy.

That text doesn't really convey the correct meaning to me.
Transparent proxying is the art of both getting hold of the traffic
(WCCP, WPAD(?), policy routing, switch redirection, etc.) without the
user having to configure the use of a proxy, and of carrying out the
task of being a caching proxy without interfering with typical
interaction.

Traffic interception is then the technology used to grab hold of
traffic (transparently) when it isn't being directed to a caching
proxy but we would like it to be. This can be achieved with a
redirecting network element or an in-path transparent caching proxy
(which I note isn't mentioned yet). Thus (borrowing from John
Dilley):

# traffic interception
# ???

        transparent proxy
               a proxy that intercepts user agent requests without
               the user agent's knowledge, in order to proxy the user
               agent's requests

That doesn't consider WPAD, which sits halfway between user
configuration (someone, if only the programmer of the user agent, has
to turn on the WPAD functionality) and traffic interception (where an
external system routes the traffic elsewhere).

        traffic interception
               detection and redirection of traffic to another
               system, often to a transparent proxy

Hmm, that doesn't feel precise enough to me... but is it better for
that?
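
For what it's worth, the behaviour I have in mind for a redirecting
network element can be sketched as below. The packet fields and the
"TCP port 80" matching rule are assumptions made purely for
illustration; the draft doesn't specify how traffic is detected.

```python
# Hypothetical sketch: field names and the port-80 rule are assumptions.
HTTP_PORT = 80


def next_hop(packet, proxy_addr):
    """Return where a redirecting network element would send the packet."""
    if packet["protocol"] == "tcp" and packet["dst_port"] == HTTP_PORT:
        # Detected web traffic: divert it to the transparent proxy.
        return proxy_addr
    # Everything else is forwarded to its original destination.
    return packet["dst_addr"]


web = {"protocol": "tcp", "dst_addr": "192.0.2.10", "dst_port": 80}
mail = {"protocol": "tcp", "dst_addr": "192.0.2.10", "dst_port": 25}
```

Here next_hop(web, ...) yields the proxy's address, while
next_hop(mail, ...) leaves the packet on its way to 192.0.2.10.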

# out-of-path transparent caching proxy
# A transparent caching proxy not in the forwarding path
# between client and server. Used with a redirecting network
# element.

This doesn't consider systems where the proxy doing the interception
sits in the path of traffic. That said, do we need to distinguish
between systems that are in- and out-of-path?

# redirecting network element
# A network element which intercepts web traffic and
# redirects it to an out-of-path transparent caching proxy.

If we can arrange a better definition of traffic interception, this
term might be unnecessary?

[3rd part of the terminology - cache-to-device communication terms]

# proxy cluster
# load sharing, tightly coupled

Again, taking from John Dilley:

        proxy cluster
               a tightly coupled set of proxies acting together to
               share load

# proxy mesh
# loosely coupled co-operating proxies

        proxy mesh
               a loosely coupled set of co-operating proxies or proxy
               clusters, acting independently but sharing cacheable
               content between themselves

John suggests adding that these are often arranged into hierarchies.
I'm a little concerned about that as it possibly implies the largest
aggregation point has to be a proxy.

# Temporal Domain, sparse working set cache
# collection of caching machines storing temporarily,
# a subset of data sets
# [Ed note: this term is very difficult to capture in a
# concise articulate statement]

I agree with John that this is confusing. If it can't be defined in
a concise statement then it's a good candidate for being defined with
a different term in a different way... John's suggestions seem to
make some good sense:

> Authoritative Reference (The logical owner of the data, maybe on a
> publishing system)

> Full Replica (A complete replica of a data set, typically defined as a
> subtree/.)

Hmm... subtrees? Does that mean subtrees in relation to relative
URLs?

> Partial Replica (A replica of a portion of a content subtree.)

> Partial Cache (this one is harder - differentiating between cache and
> replica gets to the temporal relationship between the content and the
> origin data in the cache... any other ways to clarify this?)

Hmm, isn't there a distinction between caches and replicas... caches
being built as a result of requests from user agents (erk, shows that
"user agent" might be too generic a term) whereas replicas are built
from the agents of the authoritative reference? [I know that isn't
quite the case and is a bit woolly.]

# Persistent Domain
# collection of origin servers maintaining complete
# persistent data sets
# [Ed note: this term is very difficult to capture in a
# concise articulate statement]

# Replica Origin Server
# origin server storing a persistent replica of a data set

# Browser
# A browser is a special instance of an end user client (a
# user agent) that acts as a content presentation device for
# the end user.

# Diffused Arrays
# tightly coupled array of caching proxy servers, acting
# logically as one service and partitioning the URL name
# space across the array
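
One way to read "partitioning the URL name space" is that each URL
maps to exactly one member of the array. A minimal sketch, assuming a
simple hash-modulo scheme (my assumption; the draft doesn't say how
the partitioning is done) and made-up member names:

```python
import hashlib


def owner(url, members):
    """Pick the array member responsible for a URL (hash modulo size)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]


# Hypothetical member names, for illustration only.
members = ["proxy-a", "proxy-b", "proxy-c"]
chosen = owner("http://example.com/index.html", members)
```

The same URL always lands on the same member, so the array acts
logically as one service. Note that a plain modulo reshuffles most
URLs whenever the array changes size.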

# [Ed note: The section above is incomplete and needs a lot of work.
# Need to use generic terms, seek agreement on terms.]

John suggested adding the following:

> Passive Cache
> Active Cache

What do you mean by those?

> Reverse Proxy Cache

Initially I didn't think that this deserved a special mention, but I
guess that since it's somewhere between a cache and a replica we need
to say something. (Also, we've mentioned that traffic goes from a
user agent, through a proxy, to the origin server - while the origin
server might well be a reverse proxy cache.)

> Cookie

Erk, yes, we need to say something. The reference is, at least,
easy:
     [xx] D. Kristol and L. Montulli, Bell Laboratories, Lucent
     Technologies and Netscape Communications. HTTP State Management
     Mechanism, RFC2109. Available from .....rfc2109.txt

I have a feeling that this will also require the definition of
"cacheable".


