I can see that I am not really helping this group towards consensus on
terminology, and I may be introducing some points of view which are not
intuitive. I can only say that I have spent a bit of time trying to make
semantic sense out of service replication, and this is what I came up with.
There is clearly a lot of disagreement on the use of the terms "cache" and
"mirror". Some people want a "push cache" to be a cache, others see it as
a kind of mirror. Our intuitions are opposite - something has to give!
I would say that wget is a cache, just a somewhat manual one. It's arguable,
but at least with my suggestion there is a clear way of deciding. The notion
that mirroring creates a "new site" while caching is fine grained and
transparent is a historical one. I2-DSI is trying to create an automated,
transparent mirror. Since we copy source objects, I don't think anyone would
want to call it a cache.
Most of our terminology comes from this: If it's a hacked form of Squid with
a new policy then it's a kind of a cache. If it's done using rdist or
some similar (manually configured) file copying mechanism then it's a mirror.
My terminology suggestion matches this - things like wget don't quite fit in
because they are neither of these. Sorry.
I question the notion of static vs. active content as a useful distinction.
In my model, an HTTP server is a kind of interpreter just like a Perl
interpreter, but HTML is a declarative rather than a procedural language.
This takes account of processing of HREFS on the way out as well as server
side includes. In this view, every service request is processed by finding
a file (object) and passing it through an interpreter (method). Perl scripts
are just a more complex programming model and API than HTML or JPEG.
We actually use the term "API" when we talk about the rules for writing
HTML which can, for example, be mirrored on I2-DSI.
Even if you don't agree with this picture, at least you can see that it has
some regularity to it? All Web content is in some sense "source" which is
interpreted, although for some objects the interpretation is just the identity
function. Then there is a notion of source and of output, although output
can itself sometimes be treated as source (if it meets the source API) - this
is recursion! A Web service request is then just a kind of remote method
invocation. The languages are just simplified and so easier to use than other
programming systems.
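To make the picture above a little more concrete, here is a minimal sketch
in Python of "find an object, pass it through an interpreter". Everything in
it is an assumption made for illustration: the paths, the replica.example
host, the toy ".script" type, and the function names are hypothetical and do
not describe I2-DSI or any real server.

```python
from typing import Callable, Dict

# Hypothetical object store: path -> source bytes (the "objects").
SOURCES: Dict[str, bytes] = {
    "/index.html": b'<a href="/logo.jpg">logo</a>',
    "/logo.jpg": b"\xff\xd8\xff\xe0",
    "/hello.script": b"world",
}


def interpret_html(source: bytes) -> bytes:
    # Declarative case: site-relative HREFs are rewritten on the way out.
    return source.replace(b'href="/', b'href="http://replica.example/')


def interpret_jpeg(source: bytes) -> bytes:
    # For some objects the interpretation is just the identity function.
    return source


def interpret_script(source: bytes) -> bytes:
    # Toy stand-in for a procedural interpreter such as Perl.
    return b"hello, " + source


# The "methods": one interpreter per object type (keyed by file suffix).
INTERPRETERS: Dict[str, Callable[[bytes], bytes]] = {
    ".html": interpret_html,
    ".jpg": interpret_jpeg,
    ".script": interpret_script,
}


def serve(path: str) -> bytes:
    """Handle a request: find the object, then apply its interpreter."""
    source = SOURCES[path]
    suffix = path[path.rfind("."):]
    return INTERPRETERS[suffix](source)
```

In this sketch a replica that copies SOURCES and runs the same interpreters
is copying source objects, while one that stores only the results of serve()
is copying output.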
Bill Maggs asks:
> Do you see any difference between say, having an Altavista search engine
> replica in Sweden and having only a user interface there to a US-based
> engine ?
> would you consider both of them the same 'service replicas' ?
I would consider both of these "service replicas." Clearly, "service
replication" is a very broad term, so its usefulness might be questioned.
However, consider some examples:
1. A cache may sometimes store a response and sometimes pass the request
through, based on its internal policy ("we only cache .edu", for instance).
Both behaviors are examples of service replication.
2. A resolution mechanism might consider three ways to get the same service:
from the origin server, from a "true copy" (mirror or cache), and from an
interface to a fast private network which gets results from the US. All are
service replicas, and any one can be chosen.
3. Is distributed update of a database mirroring or caching? Probably neither,
but a different, distributed form of service replication.
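Examples 1 and 2 can be sketched in Python as follows. This is only an
illustration under stated assumptions: the names origin, PolicyCache, relay,
only_edu and resolve are hypothetical, the example.edu and example.com URLs
are placeholders, and the "origin server" is a stub function rather than a
real network service.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit

# A "service" here is anything that maps a request URL to a result.
Service = Callable[[str], str]


def origin(url: str) -> str:
    # Stub for the origin server.
    return f"content of {url}"


def only_edu(url: str) -> bool:
    # Hypothetical policy: "we only cache .edu".
    host = urlsplit(url).hostname or ""
    return host.endswith(".edu")


class PolicyCache:
    """Stores some results and passes other requests through."""

    def __init__(self, upstream: Service, should_store: Callable[[str], bool]):
        self.upstream = upstream
        self.should_store = should_store
        self.store: Dict[str, str] = {}

    def __call__(self, url: str) -> str:
        if url in self.store:
            return self.store[url]          # served from storage
        result = self.upstream(url)         # passed through on the wire
        if self.should_store(url):
            self.store[url] = result
        return result


def relay(upstream: Service) -> Service:
    # A pure interface to a remote service: it never stores anything.
    return lambda url: upstream(url)


def resolve(replicas: List[Service], url: str, choice: int = 0) -> str:
    # Any replica may be chosen; the caller gets the same service either way.
    return replicas[choice](url)
```

Whichever replica resolve() picks, the caller sees the same result; only
where the value comes from (storage or the wire) differs.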
You have to pardon me - my background is one which leads me to look for
abstract generalizations which unify things which most people consider very
different. I have done work on unifying functional and imperative programming
languages, and in some ways this is very similar: whether a value is stored or
passed on a wire are just two ways of getting the same thing.
I feel that the work of WREC is one which requires bringing out
the duality between transmitting a value and retrieving it from storage.
Also, the similarities between the Web and other remote invocation mechanisms
should be brought out in order to avoid reinventing too many concepts.
Otherwise it's just a process of documenting the particular choices which
have been made in current systems (which is also important, but difficult to
make a coherent framework for).
I wouldn't presume to push my semantically tortured view on this group. There
is a danger that the community wouldn't accept it anyway. If you understand
my view, then probably you can take from it whatever you find valuable.
I just hope that WREC does not adopt terminology which makes I2-DSI very
difficult to describe, since it violates many people's assumptions about what
mirroring systems can and cannot do, and ultimately hopes to unify mirroring,
caching, and other service replication approaches.
/micah