Re: I-D ACTION:draft-tewari-webi-wcdp-00.txt,.ps

From: Ted Anderson (ota@transarc.com)
Date: Wed Apr 10 2002 - 10:36:14 MDT


Comments on Web Content Distribution Protocol (WCDP) based on
draft-tewari-webi-wcdp-00.txt, by Tewari, Niranajan [sic], and
Ramamurthy dated February 2002.

Looks like Niranjan's name is spelled wrong. There are some non-ASCII
characters in the .txt file which appear okay in the postscript version.

In the terminology section, "heartbeat" is described as being sent from
the invalidation server to the client caches. However, neither of these
two entities is defined in this section. I'm guessing that these are the
WCDP server and WCDP client, respectively, but maybe this should be
clarified.

Odd that heartbeats travel from server to client, not the more
traditional direction, from client to server. Is there a justification
for this? I suppose the reason is that invalidations are emitted from
the server, so it is the one that wants to determine proactively if an
invalidation might time out due to a client failure. In the case of
delta consistency, a missing heartbeat indicates that an invalidation
could also have been missed. In the case of strong consistency, the
receipt of the heartbeat effectively extends the lease granted by the
server to the client. So too many missed heartbeats allow the lease to
expire and invalidate the corresponding cached objects. [ Just
thinking out loud here. ]
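
A minimal sketch of the lease reading above, in Python. The class name,
the method names, and the lease length are assumptions made up for
illustration; none of them come from the draft.

```python
class LeasedCache:
    """Client cache whose entries are valid only while a lease is live.

    Hypothetical sketch: names and lease handling are assumptions,
    not taken from the WCDP draft.
    """

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.lease_expiry = None  # no lease until the first heartbeat
        self.objects = {}

    def store(self, url, body):
        self.objects[url] = body

    def on_heartbeat(self, now):
        # Each heartbeat received from the server renews the lease.
        self.lease_expiry = now + self.lease_seconds

    def lookup(self, url, now):
        # Once the lease has lapsed an invalidation may have been missed,
        # so drop every cached object and report a miss.
        if self.lease_expiry is None or now >= self.lease_expiry:
            self.objects.clear()
            return None
        return self.objects.get(url)
```

With a 30 second lease, a heartbeat at t=20 keeps objects usable until
t=50; after that the client must go back to the server.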

Does the scalability goal include the ability for WCDP clients to
communicate with very many WCDP servers?

It doesn't seem like the refresh directive avoids any extra messages
compared to simple invalidation. Is it only intended to reduce latency?

The delayed refresh directive would seem to put the onus of spreading
delay on the server which must provide explicit scheduling to the
clients. It would seem better for the refresh directive to provide a
time range from which clients select delays with uniform probability,
at least as an option. Assuming the server knows how many clients it
has and how many requests per second it can handle, this time range
should be easy to derive. In the absence of such information the
probabilistic mechanism would be much easier to configure than a static
value.
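
A rough sketch of the probabilistic option, in Python. Both function
names and their parameters are hypothetical; the draft defines nothing
like them. The window is simply the client count divided by the request
rate the server can sustain, which bounds the average load, not the peak.

```python
import random


def refresh_window_seconds(num_clients, max_requests_per_second):
    """Server side: window that keeps the average refresh rate within limit."""
    if max_requests_per_second <= 0:
        raise ValueError("max_requests_per_second must be positive")
    return num_clients / max_requests_per_second


def pick_refresh_delay(window_seconds, rng=random):
    """Client side: choose a delay uniformly from [0, window_seconds]."""
    return rng.uniform(0.0, window_seconds)
```

For example, 10000 clients and a server that can handle 500 requests per
second give a 20 second window, and each client independently waits a
random delay within it before refreshing.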

I don't understand implicit subscriptions. It sounds like the
administrative step means that each fetch registers a subscription for
the corresponding (individual) object, not a content group.

In explicit subscriptions it says: "send invalidations to objects
belonging" but you probably mean "for objects" not "to objects".

There seems to be some confusion between the WCDP server and the Origin
server. The text seems inconsistent about which one sends invalidation.
Since content groupings are part of metadata they are reported in the
HTTP headers which presumably come from the origin server. Clearly, it
must know about group membership so these groups are not something the
WCDP server can define on its own. Maybe it would be good to clarify
the distinction of roles more clearly or describe the practical limits
to their separation.

In section 4.4 the text says that the differences between "strong and
weak consistencies are not significant...". It seems that the key
difference is that for strong consistency when sending invalidations the
server must wait until all responses are received. For weak it doesn't
need to wait. Maybe this sentence doesn't add anything and can be
removed.
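
The distinction I have in mind, as a small Python sketch. This is my
reading, not the draft's wording: the function name is invented, and
each client is modeled as a callable that returns True when it
acknowledges the invalidation.

```python
def publish_update(clients, url, strong):
    """Send an invalidation for url; return True if the update may proceed.

    clients: list of callables taking a url and returning True on ack.
    Hypothetical model, not an API from the WCDP draft.
    """
    # The invalidation goes to every client in both modes.
    acks = [send_invalidation(url) for send_invalidation in clients]
    if strong:
        # Strong consistency: every client must have acknowledged.
        return all(acks)
    # Weak consistency: the server does not wait on the acknowledgements.
    return True
```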

In node-level strong consistency, why do servers have to wait for
clients to retrieve the new content before making it live? This seems
sure to delay the process (especially if the delayed refresh directive
is used) but doesn't improve the consistency. Also in this case, the
clients need a special way to get the not-yet-live content from the
origin server. How do they do this?

The force option in the invalidate request seems dubious. What happens
if a client declines to perform the invalidation in response to a force,
or an update, perhaps due to a network outage? What does it
"mean" for the server to specify force? How does force impact the
semantics of the consistency guarantees?

The wave description of propagating invalidations in section 4.6 is
confusing. How does the invalidation wave carry with it the list of
notified clients and how does this influence the pulling of data? Also
we have clients fetching data from WCDP servers, not origin servers
here, but I thought that was only the role of origin servers. Maybe the
mapping of roles across levels of the hierarchy needs to be better
explained.

Is the organization of a WCDP client/server hierarchy outside the
scope of this protocol? It would seem that an NTP-like stratum mechanism
could be used to self-organize CDN intermediates into a tree. This
would allow automatic mechanisms to construct the hierarchy, which
should be much easier to maintain and define than a static
configuration.
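
A sketch of what an NTP-like stratum rule could look like, in Python.
Everything here is an assumption for illustration: the function name,
the tuple layout, and the convention that the root server sits at
stratum 0.

```python
def choose_parent(candidates):
    """Pick the reachable candidate with the lowest stratum.

    candidates: iterable of (name, stratum, reachable) tuples.
    Returns (parent_name, own_stratum), or (None, None) if no candidate
    is reachable. Ties on stratum are broken by name.
    Hypothetical helper, not part of WCDP or NTP.
    """
    reachable = [(stratum, name) for name, stratum, ok in candidates if ok]
    if not reachable:
        return None, None
    stratum, name = min(reachable)
    # A node sits one level below the parent it attaches to.
    return name, stratum + 1
```

Each intermediate would rerun this whenever its parent stops responding,
so the tree repairs itself without an administrator editing a static
configuration.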

Explicit consistency is typically much weaker than strong consistency,
unless the HTTP cache directives are all set appropriately (i.e. "do not
cache"). Thus it would seem that falling from strong back to explicit
consistency rules during a server outage would be unsafe. Maybe
reverting to uncached (as while waiting for a commit) would be safer for
content groups that desire strong consistency?

In section 4.7 it sounds like servers don't detect failed clients via
heartbeat timeouts. Is the assumption that the transport mechanism
won't notify the server of a failure to deliver a heartbeat request too
strong? This means of failure detection would help reduce the latency
of invalidations when some clients have failed. For servers with many
clients, or at the root of large hierarchies, this will likely be a
crucial optimization.
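
A sketch of the optimization, in Python, assuming the transport does
report whether a heartbeat was delivered. The class and method names are
invented for illustration and do not appear in the draft.

```python
class InvalidationServer:
    """Tracks which clients are believed alive, based on heartbeat delivery.

    Hypothetical sketch: assumes the transport reports delivery failures.
    """

    def __init__(self, clients):
        self.live = set(clients)

    def on_heartbeat_result(self, client, delivered):
        # A failed delivery marks the client dead; a success revives it.
        if delivered:
            self.live.add(client)
        else:
            self.live.discard(client)

    def pending_acks(self, acked):
        # Only clients believed alive can hold up an invalidation.
        return self.live - set(acked)
```

An invalidation then completes as soon as pending_acks() is empty,
rather than waiting for a timeout on clients already known to be down.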

In handling failures of refresh directive, this section says that the
client will try another WCDP server. But that won't help if the origin
server is down, because all WCDP servers share the same Origin server. I
guess I just don't understand the Origin/WCDP server distinction. Also I
thought the refresh directive is basically a performance optimization,
but it sounds from this paragraph (i.e. a failure to refresh caused the
invalidation to fail) that there are semantic implications that affect
the consistency guarantees. I don't understand what those are.

I am sorry these comments are so late in coming, but I hope they are of
some use.

Ted Anderson


