Re: wrec round 2

From: Keith Moore (moore@cs.utk.edu)
Date: Sat Sep 26 1998 - 00:25:34 MDT


> - There are clients. There are origin servers. There may exist
> structure between clients and origin servers. Clients shouldn't
> be manually configured to use this intermediate structure,
> otherwise, things like packet hijacking and request hijacking
> become the norm.

autoconfiguration can be a Good Thing, and request hijacking is a
Bad Thing, but I think this could be stated better.

Here's a slightly different view as an illustration:

when I tell my client to download a web page, I'm giving it
very specific instructions. whether those instructions come
from a URL that I typed in or one found in an HREF doesn't
matter. a cache/proxy/whatever has no right to circumvent those
instructions unless it can guarantee *exactly* the same results,
because otherwise it has to know what I want better than I do.
so my client shouldn't use a cache/proxy/whatever unless it's
explicitly told to do so.

now OTOH #1, if I can somehow tell my client "it's okay to use the
local cache/proxy/thingy if you can find it", and my client can
figure out where it is, that's a Good Thing. it frees me from
having to type in the config information, and it also frees me from
keeping track of changes, and it can probably adapt to more complex
configuration scenarios. (but if the client somehow acquires the
wrong config info, and the autoconfig system is not very carefully
designed, diagnosing and fixing the problem can be a real bear...)
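
A minimal sketch of that opt-in rule, in Python. `discover_proxy` is a hypothetical stand-in for whatever autoconfiguration mechanism might eventually be defined. It is not an existing API.

```python
from typing import Callable, Optional

def choose_route(user_allows_proxy: bool,
                 discover_proxy: Callable[[], Optional[str]]) -> str:
    """Return "direct" or the address of the proxy/cache to use.

    discover_proxy is a hypothetical autoconfiguration hook that returns
    a proxy address, or None if nothing could be found.
    """
    # Without explicit permission, follow the URL exactly as given.
    if not user_allows_proxy:
        return "direct"
    proxy = discover_proxy()
    # Permission granted but nothing discovered: still go direct.
    return proxy if proxy else "direct"
```

Discovery is never even attempted unless the user has opted in, so a bad autoconfig source cannot redirect requests the user never authorized.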

OTOH #2, the meaning of a URL http://x/y is not necessarily "file
y at host x". Another way to look at it is that the meaning of the
URL is determined by whoever "owns" the URL...whoever that particular
URL was delegated to by the domain owner. If there is some way
for the owner of a URL to advertise, on the net, "http://x/y is
equivalent to the other resources with names a://b/c, and d://e/f",
and my browser can be reasonably certain that the definition really
does come from the owner of http://x/y, then the client can access the
resource I want from an alternate location, maybe using a better protocol
than HTTP (who knows, maybe it's available via multicast, or streaming
video that adapts better to net losses than http), make more effective
use of bandwidth by fetching a nearby copy in preference to a distant
one, maybe even be able to fail over from one resource location to
another in mid-transfer, and still accomplish exactly what I asked it to do.
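
A rough sketch of the "nearby copy first, fail over to the next" part, assuming the owner's list of equivalent locations has already been obtained and authenticated (that step is not shown). `rtt_ms` and `fetch` are hypothetical inputs, and this only fails over between whole fetches, not mid-transfer.

```python
from typing import Callable, Dict, List, Optional

def fetch_with_failover(locations: List[str],
                        rtt_ms: Dict[str, float],
                        fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Try owner-declared equivalent locations nearest-first.

    locations: URLs the owner of the source URL has declared equivalent.
    rtt_ms: hypothetical round-trip estimates; locations without an
        estimate are tried last.
    fetch: hypothetical protocol-specific fetcher returning the body,
        or None if that location could not deliver it.
    """
    ordered = sorted(locations, key=lambda u: rtt_ms.get(u, float("inf")))
    for url in ordered:
        body = fetch(url)
        if body is not None:
            return body
    # Every advertised location failed; the caller decides what to do next.
    return None
```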

Or if the owner of http://x/y says "this resource is currently equivalent
to the file with MD5 fingerprint .....", and that file happens to be available
from a nearby cache, then the browser can fetch the file from the cache
and still do exactly what I asked it to do.
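
A minimal sketch of the check that makes this safe: the cached copy is used only if its MD5 matches the fingerprint the owner published; otherwise the client goes to the origin. `fetch_from_origin` is a hypothetical callback.

```python
import hashlib
from typing import Callable, Optional

def md5_matches(body: bytes, expected_md5_hex: str) -> bool:
    """True if body has the MD5 fingerprint published by the URL's owner."""
    return hashlib.md5(body).hexdigest() == expected_md5_hex.lower()

def pick_copy(cached: Optional[bytes],
              fetch_from_origin: Callable[[], bytes],
              expected_md5_hex: str) -> bytes:
    """Prefer a nearby cached copy, but only if it is byte-for-byte the
    resource the owner vouched for; otherwise fetch from the origin."""
    if cached is not None and md5_matches(cached, expected_md5_hex):
        return cached
    return fetch_from_origin()
```

MD5 follows the text above; it is no longer considered collision-resistant, so a present-day design would more likely publish a SHA-256 digest (`hashlib.sha256` is used the same way).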

If a client is forced/expected to use autoconfiguration,
I really don't see much difference between this and request hijacking.
It is still request hijacking, it's just happening at a different layer.

> - Packet and request hijacking is A Really Evil Thing. So, one goal
> is the design of protocol and mechanisms for getting clients to
> discover this additional structure and use it. It can be as simple
> as discovering an enterprise's egress proxy to discovering Keith's
> "oracle".

in my mind, there are two kinds of structure:

1. structure provided and maintained by the content provider
2. structure provided and maintained by the consumer or by the ISP

and a client may need slightly different mechanisms to discover each.
infrastructure of type #2 could also reference infrastructure of type #1.
(and perhaps use the same discovery mechanism that clients use)

> - There are a lot of caching systems out there, the dominant one
> appears to be Squid (and Harvest). There will be others as well
> (successful PhD research notwithstanding). What kind of metadata
> could or should these caching systems exchange with each other?
> Do we even want caching structures to interoperate with each other?

for the case where you consult type #1 structure and find a unique name
or md5 or whatever for the resource you want, it would certainly be
nice to be able to ask a local cache whether that resource is available.
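
Answering "do you have the resource with this fingerprint?" is easy if the cache indexes its contents by that fingerprint. A toy in-memory sketch; the class and method names are hypothetical and not taken from Squid, Harvest, or any other cache.

```python
import hashlib
from typing import Dict, Optional

class FingerprintCache:
    """Toy content-addressed cache: entries are keyed by the MD5 of their
    bytes, so a lookup by fingerprint can only return matching content."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, body: bytes) -> str:
        # The key is computed from the content, never supplied by a peer.
        key = hashlib.md5(body).hexdigest()
        self._store[key] = body
        return key

    def has(self, md5_hex: str) -> bool:
        return md5_hex.lower() in self._store

    def get(self, md5_hex: str) -> Optional[bytes]:
        return self._store.get(md5_hex.lower())
```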

beyond that, the usual stuff: expire dates, time-to-live, maybe
some primitive IPR policy beyond what is in HTTP now (e.g. "you can
cache this for your own use but you cannot give this to others outside
your administrative domain")... but that's probably getting out of scope.
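
A sketch of what such exchanged metadata might look like, covering a time-to-live and the "own use only, not outside your administrative domain" policy. All field names are hypothetical; HTTP does not define this record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CacheMetadata:
    """Hypothetical per-object metadata a cache might pass to its peers."""
    fetched_at: float      # when the copy was obtained (seconds since epoch)
    ttl_seconds: float     # how long the copy may be served after that
    holder_domain: str     # administrative domain that holds the copy
    local_use_only: bool   # "cache for your own use, don't give to others"

    def is_fresh(self, now: float) -> bool:
        return now < self.fetched_at + self.ttl_seconds

    def may_serve(self, requester_domain: str, now: float) -> bool:
        # Expired copies are never served.
        if not self.is_fresh(now):
            return False
        # IPR restriction: only requesters inside the holder's domain.
        if self.local_use_only and requester_domain != self.holder_domain:
            return False
        return True
```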

> - Structure requires some kind of architecture and requirements doc to
> describe the various entities and their interactions.
>
> As Keith pointed out, "proxy" has some semantic context which is
> too restrictive. This structure encompasses mirrors, caches, proxies,
> tunnels, gateways, etc. RFC 2068 simply calls it 'servers' (1.3,
> Terminology). We need terminology...
>
> - One thing I tried to stay away from is the copyright issue. As Ingrid
> Melve pointed out to me in a private e-mail, there is a per-copy charge
> in the fine country of Norway for xeroxing documents. However, we might
> incorporate some language about this in a security scenarios document.

IPR is certainly one of the justifications for providing alternatives
to consumer-side caching...content providers certainly do have the legal
right to say "you must not cache this file", and there are good reasons why
they might want to say this. (and ISPs presumably have the right to provide
only trickle levels of bandwidth to non-cacheable files). But if the
content-provider has a way of (in effect) "paying for the extra bandwidth"
(by arranging for replicas to be maintained at various net sites) then
they can still reduce the load on the net without giving up control.
(You *really* want this for video-on-demand sites...)

> I don't think we want the scope of wrec (for lack of a better name) to
> have too large a scope. If we can at least come up with a few approaches
> and possible standards-track RFC for client<>"structure" autodiscovery
> I think we'd be able to declare the WG a success on that point alone.

I think we need to go a tad further...not just being able to discover
the "structure" but being able to define the complete process by which
a client or proxy can map a "source URI" onto a list of
"resource location URLs".

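A rough sketch of that mapping, assuming two hypothetical inputs: owner advertisements (type #1 structure, assumed already authenticated) and a local cache index keyed by fingerprint (type #2 structure). A local copy is listed only when the owner has published a fingerprint for it, and the source URI is always kept as the last resort.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class OwnerAdvert:
    """Hypothetical type #1 record published by the owner of a source URI."""
    equivalents: List[str] = field(default_factory=list)
    md5_hex: Optional[str] = None

def resolve(source_uri: str,
            adverts: Dict[str, OwnerAdvert],
            local_cache: Dict[str, str]) -> List[str]:
    """Map a source URI onto an ordered list of resource location URLs.

    local_cache maps an MD5 fingerprint to the URL of a local copy
    (hypothetical type #2 index).
    """
    locations: List[str] = []
    advert = adverts.get(source_uri)
    if advert is not None:
        # Only use a local copy the owner has vouched for by fingerprint.
        if advert.md5_hex and advert.md5_hex in local_cache:
            locations.append(local_cache[advert.md5_hex])
        for url in advert.equivalents:
            if url not in locations:
                locations.append(url)
    if source_uri not in locations:
        locations.append(source_uri)
    return locations
```
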
Keith

p.s. I still think "webrepl" is a good name, since caching is a
subset of the more general term replication ... but "webrec"
is also kind of catchy. (I like the pun with "wreck".) OTOH,
"-rec" also sounds like "recreation".


