2. Section 2.3 IP Multicast
Two main comments. First, I have tried to re-organize the section to
make the fundamental issues more clear and to separate the "short-term
hack" for application-level reliability to support unreliable IP
multicast from the rest of the discussion (since that is the section
most likely to change or get deleted over time.) Second, I chew on the
application-level reliability discussion a bit more and try to
generalize it to accommodate the Yu et al. approach.
<<<<< BEGIN proposed update to 3.2.3 >>>>>>>
3.2.3 IP multicast
In this mode, an IP multicast group is allocated for the
invalidation channel and its address is advertised as part of the
channel information. The invalidation client subscribes to this
multicast group to receive cache invalidations and/or object
updates. Ideally, it's a single-source multicast group, meaning that
the invalidation client subscribes to the sender and group address
pair <S, G>, where S is the invalidation server address and G is the
multicast group address.
IP multicast removes the scalability concern at the invalidation
server in that the invalidation server now only needs to send one
copy of any message. Plus, it doesn't maintain per-client state. A
multicast invalidation channel is much more efficient than unicast-
based cache consistency schemes.
Worth noting, however, is that anything the invalidation server
sends to the invalidation channel goes to every
subscriber. Therefore, objects covered by a multicast invalidation
channel need to be correlated so that if an invalidation client is
interested in or has cached some objects of the channel, it is
highly likely that it will cache the other ones. For example, CNNfn
top stories should belong to one channel while ESPN top stories
belong to another.
There are two issues in implementing WCIP over IP multicast: (1)
establishing synchronization and (2) retaining synchronization
despite packet loss.
(1) Establishing synchronization
To establish synchronization, the client must ensure that all
cached objects' versions (Last modified times or Etags) match the
most recent versions known to the consistency servers. The standard
unicast approach for establishing synchronization -- the client
sends a Registration message to the server containing the version
numbers of cached objects and the server replies with current
version numbers of those objects (see Section 5.2.1) -- has two
problems. First, it is not scalable in that it can overload the
consistency server. Second, because the server's reply would have to
be sent on the multicast channel (in order to maintain ordering and
reliability constraints), all clients would see all registration
replies, which would be inefficient.
Therefore, multicast channels should use a server-driven approach to
establish synchronization: the invalidation server periodically
transmits resynchronization data (e.g., a list of objects' current
Etags) to allow clients to re-synchronize their object freshness
state.
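As a rough illustration of the client side of server-driven
resynchronization, the sketch below compares cached Etags against one
resynchronization list. The function name, the dict-based message
shape, and the sample URLs are assumptions made for illustration; the
draft does not define them.

```python
def apply_resync(cached_etags, resync_etags):
    """Compare cached Etags against a server resynchronization list.

    Returns (synchronized, stale): URLs whose cached Etag matches the
    server's current Etag, and URLs whose cached copy is out of date.
    Cached objects not mentioned in the list stay unsynchronized.
    """
    synchronized, stale = set(), set()
    for url, current in resync_etags.items():
        if url not in cached_etags:
            continue  # not cached here, so nothing to reconcile
        if cached_etags[url] == current:
            synchronized.add(url)
        else:
            stale.add(url)
    return synchronized, stale


# Hypothetical cache contents and resynchronization list.
cache = {"foo": "A", "bar": "y", "qux": "k"}
synced, stale = apply_resync(cache, {"foo": "A", "bar": "z", "baz": "q"})
```

Here "foo" becomes synchronized, "bar" is stale, and "qux" remains
unsynchronized until a later resynchronization message covers it.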
The client-driven and server-driven resynchronization protocols are
discussed in more detail in section TBD. <see MDD main comment #3>
(2) Retaining synchronization despite packet loss
To provide consistency guarantees, invalidation/heartbeat message
channels must maintain the invariant that the invalidation client
never receives a heartbeat without first receiving all
preceding invalidations sent to it.
This can be done in one of two ways.
First, the system may use a reliable multicast transport protocol
(e.g., PGM, SRM, etc.) by specifying the protocol to be used in the
channel information header (see section 4.1). The invalidation
protocol then proceeds as it does for the reliable unicast case:
clients invalidate objects when they receive invalidation messages,
and they re-register (but using server-driven re-registration as
described above and in section TBD) if the transport layer detects
a lost packet. Note: the WCIP channel information header does not
currently define any headers for reliable multicast protocols.
Second, because no reliable multicast transport protocols are
widely deployed, WCIP provides an application-level reliability
protocol to allow it to run on top of unreliable transports such as
raw IP multicast.
MDD:
OK. I think we may agree on everything up to here. Then the remaining
task is to define how application-level reliability should be done.
After chewing on the specified protocol a bit, I finally understand it
and basically like it. I particularly like the new idea in the draft of
incrementally resynchronizing. In fact, this notion might be worth
generalizing to the unicast reconnection case (see main comment 3 in
notes.11.20.2000b.txt).
The potential disadvantage is that if packet loss --> sync loss,
synchronization losses may be common and this protocol requires work
in proportion to the size of a volume to re-establish
synchronization. I can easily imagine situations where clients spend a
majority of their time unsynchronized. Fundamentally, the other option
is to pay a cost proportional to the number of objects being
invalidated to reduce the probability that synchronization is lost by
retransmitting.
Note that since this only reduces the probability, there still needs
to be a way to establish synchronization, and I like your approach for
that.
Now that I finally understand your protocol, however, it appears to me
that it is simple to add this second mode of operation as an optional
optimization. This is attractive for two reasons (1) it lets an
implementation balance the "per-object" overhead against the
"per-invalidation" overhead; I can imagine there are some systems
where one is the dominant factor and others where the other is; (2)
it is general enough to accommodate, say, Yu, Breslau, and Shenker's
protocol (SIGCOMM99).
As always the question is whether the added complexity is worth the
performance benefit. Especially given that this is a "stop gap" until
real reliable multicast comes along... I could be swayed either way on
that. (My instinct is that if we support unreliable IP multicast at
all, then this is worth adding.)
/MDD
MDD
Protocol from current draft (wording slightly modified; one bug fixed;
optimizations added at the end):
/MDD
Absent an off-the-shelf real-time reliable multicast protocol, WCIP
allows for an unreliable transport protocol with application-level
recovery from failures.
(1) The invalidation server marks invalidation channel messages
with incrementing sequence numbers;
MDD
NOTE: (bug fix I think) heartbeat packets need sequence numbers
too. Otherwise, we cannot detect the case when the last invalidation
message before a heartbeat is lost.
/MDD
(2) Whenever the invalidation client sees a sequence number gap, it
considers itself to have lost synchronization with the
channel. The client marks all objects as "no-object-lease"
and reverts to following the normal HTTP Cache-control
directives.
MDD
I don't care what this state is called, but we refer to this state in
a bunch of situations. Rather than saying "revert to following normal
HTTP Cache Control directives" each time, it would be more clear to
describe the state machine for each object and then just say "set all
objects to state X". See main comment 3 in notes.11.20.2000b.txt
/MDD
(3) The invalidation client resynchronizes according to the
resynchronization protocol specified for the channel.
Typically, this is server-driven resynchronization in which the
server periodically transmits the current version numbers
of objects. Note that resynchronization messages must
also include sequence numbers.
(4) Once the invalidation client resynchronizes the freshness state
of certain objects, it switches those objects from HTTP cache-
control back to WCIP freshness guarantees.
(5) As more re-synchronization messages arrive, the invalidation
client gradually reinstates all its objects back to WCIP
freshness guarantees. In fact, a cache proxy may join the
multicast channel and become gradually synchronized this way
without ever directly contacting the invalidation server via
unicast.
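Steps (1) through (5) can be sketched as a small client-side state
machine. This is only an illustration under stated assumptions: the
class, the message kinds ("invalidate", "resync", "heartbeat"), and
the payload shapes are hypothetical, the Etag carried by an
invalidation is assumed to name the object's current version, and
late packets are simply ignored rather than buffered.

```python
class ChannelClient:
    """Hypothetical client-side state for one invalidation channel."""

    def __init__(self, cached_etags):
        self.etags = dict(cached_etags)  # url -> cached Etag
        self.leased = set()              # urls under WCIP freshness guarantees
        self.next_seqno = None           # unknown until the first packet arrives

    def receive(self, seqno, kind, payload=None):
        if self.next_seqno is not None:
            if seqno < self.next_seqno:
                return                   # duplicate or late packet: ignore
            if seqno > self.next_seqno:
                # Step 2: a gap means every object is "no-object-lease".
                self.leased.clear()
        self.next_seqno = seqno + 1
        if kind == "invalidate":
            url, etag = payload
            self._check(url, etag, grant=False)
        elif kind == "resync":
            # Steps 3-5: each resync message restores some objects.
            for url, etag in payload.items():
                self._check(url, etag, grant=True)
        # A heartbeat carries no object data; its seqno was checked above.

    def _check(self, url, etag, grant):
        if url not in self.etags:
            return
        if self.etags[url] != etag:
            del self.etags[url]          # cached copy is out of date
            self.leased.discard(url)
        elif grant:
            self.leased.add(url)         # back under WCIP guarantees


# A proxy joins cold and synchronizes from the channel alone.
client = ChannelClient({"foo": "A", "bar": "y"})
client.receive(10, "resync", {"foo": "A", "bar": "z"})
after_resync = set(client.leased)
client.receive(11, "heartbeat")
client.receive(13, "heartbeat")          # seqno 12 was lost
after_gap = set(client.leased)
```

Because the heartbeat at 13 carries a sequence number, the loss of
packet 12 is detected even though no later invalidation arrives, and
"foo" drops back to HTTP cache-control until the next
resynchronization message.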
MDD
Here are the new optimizations for discussion
/MDD
A client MAY process or buffer invalidation packets received with
out of order sequence numbers. A client MUST NOT process a
heartbeat until (a) it has processed all preceding sequence numbers
or (b) it has declared a loss of synchronization and set the state
of all objects in the channel to state "no-object-lease" (see
Section TBD).
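One way to honor the heartbeat rule is a reorder buffer that releases
packets strictly in sequence order. This sketch is stricter than the
text requires (invalidations MAY be processed out of order, but here
they wait too), and the class, method names, and packet tuples are
hypothetical.

```python
class ReorderBuffer:
    """Hold out-of-order packets and release them in seqno order."""

    def __init__(self, next_seqno):
        self.next_seqno = next_seqno
        self.pending = {}  # seqno -> (kind, payload)

    def push(self, seqno, kind, payload=None):
        """Buffer a packet; return the packets now deliverable in order."""
        if seqno >= self.next_seqno:
            self.pending[seqno] = (kind, payload)
        ready = []
        while self.next_seqno in self.pending:
            ready.append((self.next_seqno, *self.pending.pop(self.next_seqno)))
            self.next_seqno += 1
        return ready

    def declare_loss(self):
        """Option (b): stop waiting for the missing packets.

        The caller must first set all objects in the channel to
        "no-object-lease"; the buffered packets are then released.
        """
        ready = [(s, *self.pending[s]) for s in sorted(self.pending)]
        if ready:
            self.next_seqno = ready[-1][0] + 1
        self.pending.clear()
        return ready


buf = ReorderBuffer(100)
first = buf.push(100, "invalidate", ("foo", "A"))
held = buf.push(102, "heartbeat")                  # 101 not yet seen
released = buf.push(101, "invalidate", ("bar", "z"))
```

The heartbeat at 102 is held until 101 arrives, which satisfies
condition (a). When to give up and call declare_loss() is a timeout
policy that this sketch leaves open.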
An invalidation server MAY send a packet multiple times
to reduce the probability that synchronization is lost.
To reduce the cost of retransmission, an invalidation server MAY
group multiple packets into a single retransmission. In this case,
the sequence number field indicates a range of sequence numbers,
specifying which numbers are included in the packet. Such a packet
MAY include "new" information, in which case the sequence number
range is extended to also include a new sequence number.
For example, an invalidation server could include
a range of preceding invalidations in each heartbeat message to
make sure that invalidation clients "catch up" with missing
messages by the end of a heartbeat interval (Yu, Breslau, and
Shenker, SIGCOMM99).
seqno=100 invalidate "foo" Etag=A
seqno=101 invalidate "bar" Etag=z
seqno=102 invalidate "baz" Etag=q
seqno=100-103 invalidate "foo" Etag=A, invalidate "bar"
Etag=z, invalidate "baz" Etag=q, heartbeat
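The client-side check for a range-numbered packet can be sketched as
follows. The function name and return shape are hypothetical, and the
sketch assumes that re-applying an invalidation the client has
already processed is harmless.

```python
def accept_range(next_seqno, lo, hi):
    """Return (new_next_seqno, in_sync) after a packet covering lo..hi.

    next_seqno is the lowest sequence number the client has not yet
    processed.
    """
    if hi < next_seqno:
        return next_seqno, True   # pure retransmission of known data
    if lo <= next_seqno:
        return hi + 1, True       # packet fills any hole up to hi
    return hi + 1, False          # seqnos next_seqno..lo-1 are lost


# Client saw 100 but lost 101 and 102; the 100-103 packet repairs it.
caught_up = accept_range(101, 100, 103)
# Client last saw 97; packets 98 and 99 are not covered by the range.
lost_sync = accept_range(98, 100, 103)
```

In the first case the client processes the packed invalidations and
then the heartbeat without losing synchronization; in the second it
must fall back to "no-object-lease" and resynchronize.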
<<<< END proposed update to 3.2.3 >>>>