Lots of comments, some editorial, some that are issues I think the
group needs to address. In general I'd say there isn't enough
introductory material in many of the sections - we need to justify
some of the background before diving into plain requirements. Also,
the numbered lists don't seem to work, in part because they've
created very long paragraphs (you can use <vspace> to break them if
you need to, Dan).
The concepts of synchronization and notification groups need to be
laid out and explained in much more detail.
(Note: In (belated, apologies) response to comments at the London
IETF, Mark and I are setting up an issues list to try to help things
move along so that we and the group are aware of what still needs to
be worked on. If you have issues that you think need to be addressed
please let us know either on the list or privately. We're working on
a way of getting the list/status available to all (possibly a cron'ed
mailing to the list), but right now need to get it populated :)
Depending on the volume of issues we may provide a template, but in
the meantime please be intelligent in what you send (e.g. title,
section number, description).)
>1. Introduction
>
> Cache invalidation and cache coherence protocols enable cooperation
> among content servers and web intermediaries by eliminating the round
> trips in a per-request cache validation model using HTTP if-modified-
> since directive.
"... using HTTP conditional request directives (e.g. If-Modified-Since)."
> One of the main problems addressed here is the fact that the current
> practices proved the existing HTTP cache control mechanisms
> unsatisfactory. Existing mechanisms are based on the assumption that
> the content server knows the document expiration time at the moment
> when this document is delivered. This assumption in many cases is
> not correct, and we need a new mechanism for server-driven updates.
I don't think this is the most useful way of phrasing the issue, and
believe this relates to Joe Touch's comments at the Minneapolis IETF.
The point seems to be that a very great number of content providers
want the ability to publish new material and ensure that everyone
sees that new content upon the next request.
The rationale behind RUP is, surely, the ability for content
providers to allow caching of their content (to "help" the network)
but to have the benefits of "no-cache" or immediate expiration within
user agents.
We also need to be careful about using the term "document";
"resource" would be the correct term, or "entity" if resource is
implying too much.
> A number of cache coherence or cache invalidation protocols have been
> proposed by the research community and the caching and content
> distribution industry. Approaches vary, with some proponents seeking
> to enhance existing protocols and others developing new protocols
> either specifically for this purpose or which include this
> functionality. Examples include WCIP [1], PSI [2] and DOCP [3].
It might be useful to give those examples in context with the
previous sentence, identifying which protocols are enhancements, which
are new, etc.
> A carefully developed mechanism for the communication of information
> about changes to Internet resources offers the potential for other
> functions above and beyond invalidation of cached objects. Resource
> updates may also be an appropriate way of informing systems which
> generate content dynamically that the underlying data which they
> manipulate (e.g. to produce HTML pages) has changed. While RUP may
> facilitate these functions in the future, they are currently outside
> of RUP scope.
Perhaps "... they are currently outside the scope of consideration in
this requirements document."
> The IETF's Web Intermediaries working group (WEBI) has been chartered
> to develop a Protocol based on requirements to be gathered here. The
> main goal of a RUP protocol is to improve cache coherence compared to
> HTTP's client-polling protocol in order to (a) provide stronger
> guarantees at a similar cost or (b) provide similar guarantees at a
> lower cost.
"The goal of RUP is to improve cache coherence over HTTP's
conditional request polling technique in order to ...."
The first sentence is unnecessary.
> For the reasons described above, we will refer to an abstract
> Resource Update Protocol (RUP, or simply 'the protocol') whose
> functionality will initially be limited to simple invalidation of
> cached objects, as a basis for future extensions to richer
> functionality. However, general "meta data" is allowed, and
> accommodated as concrete payload types and arbitrary payload options
> (see the section on "notification and extensibility"). Note that RUP
> is at least conceptually a new protocol, but may in practice be based
> wholly or partly on existing protocols.
I think this paragraph goes beyond the scope of a requirements
document - you've certainly got one statement in there saying that
metadata is allowed, and some of the other text reads like there's a
protocol already "out there" that we have in mind.
I'm also a little wary of mentioning what the protocol (huh, this is
a requirements document...) will do in the future. I have a feeling
IESG wouldn't react too well to that.
>2. Terminology
>
> This document uses terms defined and explained in the WREC Taxonomy
> [4], and the HTTP/1.1 specification [5]. The reader should be
> familiar with both of these documents.
"This document uses terms defined in the Internet Web Replication
Taxonomy [4]" or simply mention the RFC numbers. "The reader is
expected to be familiar with these documents."
[snip]
> A "RUP server" is the entity that knows the state of the content and
> generates resource updates. A "RUP client" is the entity that needs
> to know the state of the content and receives resource updates from
> the RUP server.
There was a brief discussion in London on the possibility of getting
rid of "client" and "server" since they can lead to confusion. The
minutes identify some suggestions of possible replacements:
>>??: "invalidator" for client, "listener" for server
>>
>>Imran Chaudhri: content viewer and content publisher
>>
>>Paul Gleichauf suggested we examine references in the distributed
>>processing realm for assistance here.
> An invalidation is a signal from a RUP server to a RUP client to
> indicate that the master copy of a certain resource has changed. RUP
> must clearly define the actions this signal implies or mandates and
> the semantics the actions accomplish, in relationship to the RUP
> coherence models. E.g., RUP may specify (among many things) that a
> RUP client must not serve an invalidated cached object without an if-
> modified-since HTTP query.
This paragraph is more than terminology... doesn't feel right to me.
I'm also starting to see words like "must" and "may" and I'm
wondering whether there's a case for using the language of RFC2119
accurately in this document.
>3. Design Guidelines
Of course, this is where the language in 2119 would cause problems :-)
[With apologies for saying things I know have been said before.]
> 1. The protocol must be simple and extensible, and it should be
> possible to systematically extend the protocol to accommodate
> richer functionalities in future iterations of the protocol.
Is this a useful guideline? I'd prefer "should" to "must" - at least
that way we would hopefully not get bogged down in determining
whether we'd got the extensibility or simplicity aspect right.
> 2. The protocol should not reinvent the same technologies, but
> leverage existing technologies (e.g. XML, HTTP, URIs) that have
> proven themselves with wide bases of deployment.
"The protocol should leverage existing technologies where possible
(e.g. XML, HTTP, URIs)."
> 3. RUP should allow other protocols of similar nature to
> interoperate or co-exist with it. The protocol should be easy to
> integrate into applications such as content management engines
> and Web server-side software components.
This sounds like two requirements rather than one design guideline.
> 4. The protocol must be scalable to tens of thousands of surrogates
> and caching proxies.
Do we have a way to measure this...?
(And what are the issues if user agents with the protocol are brought
into the environment...)
> 5. The protocol must define a set of minimal functionality that all
> implementations must support. If other functions exist, the
> protocol must define a default reaction to those functions.
I'm not sure there's any value in defining that guideline.
>4. Scoping Requirement
>
> RUP should work across a wide range of deployment environments and
> should be designed for use by origin servers, its delegates, and Web
> intermediaries, including surrogates, CDNs, and caching proxies
> alike. User agents (e.g., browsers or crawlers) may in the future
> find RUP or its derivatives suitable for purposes beyond the core
> charter established at the writing of this document.
I'm not sure the part about user agents really conveys the issues, or
why we're limiting our considerations to intermediaries.
We probably want to clearly state that we are specifically excluding
consideration of user agents in the requirements/design, but we also
need to state (probably in the security considerations section?) what
we know could happen if such protocols (based on these requirements)
were built into more systems than we were expecting.
(And I apologize for the fact that the concepts in the above
paragraph are so amazingly wooly that I could probably make myself a
nice winter sweater right here and now.)
> RUP is motivated primarily by surrogates and CDNs, because there's
> much more urgent need for RUP among the more tightly coupled content
> delivery systems. Therefore, if there are conflicts in requirements
> between the surrogate/CDN entities and standalone caching proxies,
> RUP should accommodate those of the surrogates and CDNs but make them
> optional (e.g., whether to join a RUP session, whether to perform
> prefetch, and whether to accept per-object state). A RUP client
> should not have to support every operation.
This paragraph seems far too complex.
Taking the issues one by one:
1) I'm assuming the "urgent need" for tightly coupled systems like
CDNs applies to cases such as CDI where there is a greater need for
multi-vendor support for a common protocol? Does this mean between
interoperating CDNs, or between content provider and contracted
CDN(s)?
2) The language in what is optional is confusing me a little. There
should be a base set of requirements that is applicable to all, and
CDNs may have more that are optional for support in caching proxies?
3) If there is some degree of optionality, the last sentence seems
somewhat redundant. Is there a case for defining levels of
compliance (such as was the case in WPAD)?
> RUP should accommodate performance requirements of all classes of Web
> intermediaries by making support of potentially expensive features
> optional. RUP may guarantee support to specific features by
> mandating all Web intermediaries to acknowledge support of an
> optional feature at server request.
See #3 above.
>5. Use Cases
>
> Please note that the protocol level details discussed here are only
> hypothetical at this stage, but necessary to support the examples.
I don't see protocol level details below, other than for requirements
of the protocol itself...
>
> In terms of administrative domains (scenarios and roles) the protocol
> runs in, following are the use cases:
>
> 1. A CDN needs to distribute invalidations from a single point under
> its control to a large number of intermediaries under its
> control, on behalf of content publishers, in a scalable fashion,
> while maintaining certain consistency guarantee for the cached
> resources.
>
> 2. A CDN needs to distribute invalidations from a single point under
> its control to a number of other peering CDNs, on behalf of
> content publishers, in a scalable fashion, while maintaining
> certain consistency guarantee for the cached resources.
>
> 3. A web site or its delegate needs to distribute invalidations from
> a single point under its control to a large number of
> intermediaries not under its control, on behalf of content
> publishers, in a scalable fashion, while maintaining certain
> consistency guarantee for the cached resources. At the same
> time, the intermediaries here also need to maintain certain
> consistency guarantee on behalf of their end-users.
Is there a suggestion that invalidations will be handled differently
depending on whether they are occurring within the same
administrative domain or not? I'm not sure I see a difference
between 2 and 3; 2 only differs from 1 with respect to the inter-CDN
aspect.
For what it's worth, 1 could easily be a situation where a national
caching proxy environment needs to pass invalidations down to
subordinate systems (having received notification from the web site
as in 3) ... does that configuration deserve being called a CDN?
> In terms of operations that one would expect to use the protocol to
> perform, following are the use cases:
>
> 1. Server-driven invalidation: in this scenario the RUP server would
> send object or resource group invalidations to the RUP clients.
> This mode allows the server (and its administrator) to control the
> RUP activity according to the server's own load, scheduling, and
> configured preference. The connection between RUP client and
> server could be established by either party, and could be
> persistent - so as to facilitate monitoring of the update
> guarantee, e.g., through heartbeat packets or positive
> acknowledgements.
Would this not be one of those places where we should say something
like the connection SHOULD be persistent? If the connection is
initiated from the RUP server (I'm presuming this is the case where a
content provider knows the CDN or major caching proxies it wishes to
communicate with) then it is to its advantage to maintain a
persistent connection, surely.
If the RUP client initiates the connection then surely it's more
related to the scenario below... the "client" wants to do something.
For sure, there perhaps needs to be a negotiation of activity based
on the loads of both systems, but I'm not sure I see the distinction
between 1 and 2 here in quite the way they're presented.
> 2. Client-driven validation: in this scenario the RUP client would
> take the lead, querying the RUP server for the freshness status
> of an object or group of objects (e.g., denoted by a URI). This
> mode allows the client to control the RUP activity according to
> the client's own load, failure recovery needs, and configured
> preferences. The RUP server would reply with the latest changes
> since the last time the client asked - based on information such
> as Etag, timestamp, and/or version number. Whether and when the
> RUP client asks the RUP server is determined by the consistency
> guarantee the client is committed to provide, and should follow
> the semantic rules defined by the RUP protocol.
Isn't the notion of "version number" related to ETag?
Also, you seem to have introduced the notion of a consistency
guarantee here without mentioning it before.
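On the ETag point: if the server can compare validators, a separate
version number may add nothing. A minimal Python sketch of
client-driven batch validation, assuming (hypothetically; the draft
doesn't say this) that the client sends the ETags it holds for a group
in one query and the server answers with the URIs whose ETag has
changed. The function name is illustrative only.

```python
from typing import Dict, List


def stale_uris(server_etags: Dict[str, str], client_etags: Dict[str, str]) -> List[str]:
    """Hypothetical server-side check for one client-driven validation query.

    server_etags: current ETag for every resource in the group.
    client_etags: ETags the client holds for resources it has cached.
    Returns, sorted, the URIs whose cached copy is no longer current.
    """
    stale = []
    for uri, etag in client_etags.items():
        # A resource the server no longer lists is reported as stale too.
        if server_etags.get(uri) != etag:
            stale.append(uri)
    return sorted(stale)
```

One exchange like this replaces a conditional request per object;
whether the reply is keyed on ETags, timestamps or versions then
becomes a protocol detail rather than a requirement.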
> 3. Content location update: in this scenario the RUP server is able
> to designate a new location as the source for a cached object.
> It could do that even if the object is fresh. The RUP server
> would use the location update to notify RUP clients that an
> object is to be fetched, e.g., from a regular web server, the
> parent caching proxy, a CDN peer, or a multicast object
> distribution channel. RUP may make this an optional use so that
> a standalone caching proxy may ignore the alternate location.
This is a use case that's conveying requirements, it seems. My
recommendation would be to move any requirements to a later section
and then reference that section here - otherwise I think we run the
risk of requirements being "lost" in odd parts of the document.
My reading says this is performing a "push" version of an HTTP 30x
response... in the cases you identify above I'm not sure if it's
related to an HTTP/1.1 301, 302 or 305 response code (or whether it
can take the functionality of each of them depending on the specific
use).
1) If it's related to a pushed 30x then it would probably be clearer
to state so here - it should help folks understand what's being
described;
2) We need to be *very* clear about which of the 30x responses are
actually possible... there are some obviously serious implications if
the "wrong one" is used.
> 4. Content prefetch hint: in this scenario the RUP server may tag
> resources or resource groups as suitable for prefetch, and RUP
> clients may prefetch the content and pin it in their cache. Such
> use is expected to be common in surrogate, CDN, and CDN peering
> deployment scenarios, to provide updates for prepositioned
> content besides on-demand content. RUP may make this an optional
> use.
Again, there's a requirement tagged onto the end of a use case.
The language doesn't seem to work either, "may" and "optional" in the
same sentence... ouch. This is the requirements document - it's
either optional or it's not (my vote is on the former).
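Whichever way "optional" is settled, the client side of both hints
(location update and prefetch) is simple. A minimal Python sketch,
with hypothetical option names "location" and "prefetch" and a
hypothetical per-client policy, of a client that acts on these hints
only when configured to and otherwise ignores them, as the draft says
a standalone caching proxy may:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClientPolicy:
    # Hypothetical knobs; a standalone caching proxy might leave both off.
    accept_location_updates: bool = False
    accept_prefetch_hints: bool = False


def optional_actions(uri: str, options: Dict[str, str], policy: ClientPolicy) -> List[str]:
    """Return the optional actions a client takes for one notification."""
    actions = []
    location = options.get("location")
    if location and policy.accept_location_updates:
        actions.append(f"fetch {uri} from {location}")
    if options.get("prefetch") == "yes" and policy.accept_prefetch_hints:
        actions.append(f"prefetch-and-pin {uri}")
    # Any other option, or any option the policy declines, is ignored.
    return actions
```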
> A use case that is out of scope for the first RUP standard is
> "content updates". In this scenario, the RUP server would, instead
> of sending a cache invalidation and/or location update signal to the
> client, send the RUP client either the full content of a
> modified object or a delta update showing changes against the
> previous revision. The reason for not supporting it right now is
> twofold.
"Actual content updates are out of scope for consideration at this
time. Content updates refer to a RUP server sending either the full
content, or a delta update of changes (e.g. <reference to HTTP
deltas>), of updated resources rather than an invalidation message."
> First, relative to cache invalidation, there's much less
> understanding of the kinds of content updates RUP may need. In
> particular, mixing signaling with data leads to problems including
> scaling, object consistency and security issues that are not well
> understood. Second, there are existing mechanisms addressing content
> retrieval, e.g., HTTP [5] and Delta Encoding [6], which also
> demonstrate the high complexity of such a functionality. RUP,
> however, will provide hooks for "content updates", i.e., through
> "content location update" (see use case 3). This allows RUP to
> leverage, instead of reinventing, existing mechanisms. RUP will not
> preclude future protocols that build on RUP to integrate more
> advanced content update functionality in the future.
This is a rather rambling paragraph. I think it would be better to
just define the issues we know:
1) Mixing signalling and data leads to problems of scaling,
consistency and security
2) Such functionality (e.g. delta updates) is very complex
Beware of words like "hooks". Just say something like "updates to
content can, of course, be provided by content prefetch hints (see
above)." I would also suggest that, if the group thinks this is
something that might be an area of future development, we mandate
reservation of a content push operation within the protocol. (Maybe
reserve the operator, but leverage the flexibility of the payload to
enable folks to come back to this at a later point in time.)
> In short, RUP will not be used to update content or HTTP meta
> information of cached entities. As use case 3 illustrates, such
> updates can be implemented as a sequence of invalidate-and-fetch
> operations.
Hmm, my memory is rather vague at the moment on whether we've
discussed whether RUP should not be used for updating metadata at
all, though I think Joe might have mentioned this in a related way at
some point...
What we seem to be suggesting is that in a RUP enabled world, the RUP
clients will ignore any cacheability heuristics they might otherwise
employ in favor of receiving updates from the RUP server... but will
switch over to using the heuristics if they are unable to communicate
with the RUP server for some reason. Yes?
In that case, is it appropriate for content providers to attach
expiry metadata to their content for consumption by user agents
(which will revalidate with RUP enabled intermediaries, or as normal
with the origin server), but have the ability to modify that expiry
metadata on the intermediaries if "it's midnight and all is well"?
(Rather than the intermediary doing a conditional (pre)fetch.)
In either case I see a requirement that needs to be stated here.
> Another use case that is out of scope for RUP is the dynamic
> discovery of RUP services. For example, the URI of a particular RUP
> resource group could be manually configured, sent as header
> information in the HTTP responses from the origin server, or
> distributed via a separate out-of-bound mechanism.
This isn't a use case; this is a statement of fact. It probably
belongs in the introduction and/or design guidelines.
Hmm...
I see us heading down the road of developing a protocol and the
intermediaries having no way of knowing which resource groups
exist.......
Is this the right document in which to specify how those groups are
discovered, or are we going to have another document for that?
>6. Functional Requirements
>
>6.1 Coherence Model
These might work better as sub-sub-sections rather than as a numbered
list. Also there's a lot of descriptive prose (good) around
requirements (bad)... my suggestion would be to have the
description/commentary, possibly containing the requirements, and
then at the end to make the requirements explicit.
> 1. The protocol should provide an operation mode where the RUP
> server requires "confirmation of actions" from its RUP clients,
> e.g., to return positive acknowledgement upon receiving an
> invalidation message to signal the completion of cache
> invalidation or even content retrieval. If relay points are
> used, the relay node must preserve the semantics of positive
> acknowledgement, e.g., by waiting for every child client's
> acknowledgement before sending its acknowledgement upstream. The
> protocol shall allow a RUP server to specify that an invalidation
> message be acknowledged or that an invalidation message not be
> acknowledged. Explicit acknowledgement may, for example, be used
> to facilitate fail-over at CDN content routers. Conversely,
> running in no-acknowledgements mode may improve scalability.
OK, there seem to be a few issues covered in here... a biggie is the
possibility of relays.
Is it beneficial for the RUP server to know that the intermediary it
is speaking with is a relay? Is there a possibility that the RUP
server might be willing to delegate confirmation to the relay, so the
relay acknowledges receipt of the message for itself and then
monitors its subordinates?
In an inter-CDN case I could see operators being a little concerned
about releasing such key metrics of their propagation characteristics
to the outside world.
Also, I don't see what happens in the case of a relay's subordinate
constantly failing to acknowledge the confirmation of actions... do
we really have to wait indefinitely (forever) for an acknowledgement
that might never arrive?
I also see a mixing of "shoulds" and "musts" that don't seem to be compatible.
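The "waiting forever" problem could be bounded with a deadline. A
minimal Python sketch, assuming a hypothetical per-invalidation
timeout configured at the relay (nothing in the draft specifies one):
the relay acknowledges upstream only when every child has
acknowledged, and otherwise, once the deadline passes, knows which
children are outstanding. How much of that it reveals upstream is
exactly the inter-CDN policy question raised above.

```python
import time
from typing import Iterable, Optional, Set


class RelayAck:
    """Hypothetical tracker for child acknowledgements of one invalidation."""

    def __init__(self, children: Iterable[str], timeout_s: float, now: Optional[float] = None):
        start = time.monotonic() if now is None else now
        self.pending: Set[str] = set(children)
        self.deadline = start + timeout_s

    def ack(self, child: str) -> None:
        # Duplicate or unknown acknowledgements are harmless.
        self.pending.discard(child)

    def status(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if not self.pending:
            return "ack"       # safe to send our own acknowledgement upstream
        if now >= self.deadline:
            return "timeout"   # self.pending lists the children that never answered
        return "waiting"
```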
> 2. The protocol should also make loose consistency available, for
> applications that do not require tight coupling, e.g.,
> traditional batch mode mirroring applications. In particular,
> the system should provide delta consistency guarantees in which a
> RUP server may specify a maximum staleness "delta" in seconds.
> If an object that a client is caching is updated, within delta
> seconds, the client will either (a) be notified of the update or
> (b) detect that its cache is no longer synchronized with the
> server. Note that this allows a client to enforce a worst-case
> staleness guarantee of delta seconds.
I've re-read the second part of that paragraph twice, and I'm still
not sure I can really make sense of it.
The key, of course, is that there is a period of at most n seconds
during which a cache may be inconsistent. But it's not really clear
when the clock starts ticking.
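One reading that answers "when does the clock start": it starts when
the resource changes at the server. A minimal Python sketch under
explicitly assumed conditions (none of them stated in the draft): the
server stamps every message, invalidation or heartbeat, with its send
time, sends an invalidation the moment a resource changes, messages
arrive in order, and clocks are loosely synchronized. The class name
is hypothetical.

```python
class DeltaConsistencyClient:
    """Hypothetical client-side check for a delta-consistency guarantee."""

    def __init__(self, delta_s: float):
        self.delta_s = delta_s
        self.last_server_ts = float("-inf")  # nothing heard from the server yet

    def on_message(self, server_ts: float) -> None:
        # A stale stamp never moves the clock backwards.
        self.last_server_ts = max(self.last_server_ts, server_ts)

    def is_synchronized(self, now: float) -> bool:
        # If an update at time t was never delivered, every stamp seen is <= t,
        # so this check fails by t + delta: the client detects it is out of sync.
        return now - self.last_server_ts < self.delta_s
```

With delta = 30 and a heartbeat stamped at 100, the client may serve
RUP-controlled data at 120 but must treat itself as desynchronized
from 130 onwards.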
> 3. While participating in a RUP session, a cache should never return
> potentially stale RUP-controlled data without a warning that the
> data are potentially stale, e.g., via HTTP Warning: 110/111.
This is an interaction with RFC2616 so should be labeled as such.
This is for cases where the HTTP metadata states that the object is
stale and yet RUP does not, right?
I'm not sure I see the rationale for choosing Warning 110 or 111
unless the cache has become disconnected from the RUP server. That
said... do we want to register HTTP warning codes?
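For reference, RFC2616 (section 14.46) already defines 110 "Response
is stale" and 111 "Revalidation failed". A minimal Python sketch of
one possible policy, which is my assumption rather than anything in
the draft: warn when an invalidated copy is served anyway, and add
111 when the cache has lost contact with the RUP server. The function
name and agent string are hypothetical.

```python
from typing import List

# Codes and texts as defined for the Warning header in RFC2616, 14.46.
WARN_TEXT = {110: "Response is stale", 111: "Revalidation failed"}


def rup_warnings(invalidated: bool, rup_reachable: bool, agent: str) -> List[str]:
    """Warning header values for a RUP-controlled cached response."""
    codes = []
    if invalidated or not rup_reachable:
        codes.append(110)  # the copy is, or must be assumed to be, stale
    if not rup_reachable:
        codes.append(111)  # could not reach the RUP server to confirm freshness
    return [f'{code} {agent} "{WARN_TEXT[code]}"' for code in codes]
```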
> 4. If a cache wants to transition a resource from RUP-controlled
> coherence to HTTP cache-control based coherence, it must first
> consider the resource stale and revalidated via HTTP (e.g., if-
> modified-since). Similarly, to transition a resource from HTTP
> cache-control to RUP-controlled coherence, it must first consider
> the resource stale and revalidate via RUP.
Drop the HTTP example, stating that an HTTP revalidation must occur
is sufficient.
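The rule reads naturally as a small state machine. A minimal Python
sketch; the class and field names are hypothetical, and "revalidated
via RUP" is left abstract since the draft doesn't define it.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    uri: str
    mode: str = "http"  # "http" (cache-control based) or "rup"
    fresh: bool = True

    def switch_mode(self, new_mode: str) -> None:
        """Move to another coherence mode; the entry is then stale until revalidated."""
        if new_mode not in ("http", "rup"):
            raise ValueError(f"unknown coherence mode: {new_mode}")
        if new_mode != self.mode:
            self.mode = new_mode
            self.fresh = False

    def revalidated(self, via: str) -> None:
        # Only a revalidation under the entry's current mode restores freshness.
        if via == self.mode:
            self.fresh = True
```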
> 5. It is essential that the protocol support "express
> resynchronization". I.e., if a RUP client becomes de-
> synchronized from a RUP server, the client should be able to
> reconnect (resynchronize) quickly. There are a number of ways to
> support this, e.g., batch revalidation based on version numbers,
> incremental background revalidation, incremental foreground
> revalidation, delayed invalidation, log playback, etc. It's up
> to the RUP protocol design to decide on the specific mechanisms
> for revision control and express resynchronization of resources.
I'm not particularly comfortable with the way this requirement is
presented, though I understand the rationale behind leaving it
somewhat open in scope to allow different protocols to provide
different/better ways of doing things.
What do you mean by "foreground" and "background" revalidation?
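To pin down at least one of the listed mechanisms: a minimal Python
sketch of log playback with a fallback to batch revalidation. It
assumes, hypothetically, a bounded per-group change log on the RUP
server with one version number per change; a reconnecting client
sends the last version it saw.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ChangeLog:
    """Hypothetical bounded change log for one synchronization group."""

    def __init__(self, max_entries: int):
        self.version = 0
        self.entries: Deque[Tuple[int, str]] = deque(maxlen=max_entries)

    def record(self, uri: str) -> int:
        self.version += 1
        self.entries.append((self.version, uri))
        return self.version

    def since(self, client_version: int) -> Optional[List[str]]:
        """URIs changed after client_version, or None if the log no longer
        reaches back that far and the client must revalidate the whole group."""
        oldest = self.entries[0][0] if self.entries else self.version + 1
        if client_version < oldest - 1:
            return None
        return [uri for v, uri in self.entries if v > client_version]
```

The log size is the trade-off: a longer log lets clients that were
away longer resynchronize cheaply, at the cost of server state.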
> 6. Resource update guarantees must propagate correctly through the
> scaling mechanisms even if multiple levels of intermediary are
> used.
Again, this doesn't seem particularly well stated, and the motivation
for stating it is somewhat hidden.
>6.2 Naming and Framing - Synchronization groups
We *definitely* need a paragraph here outlining what synchronization
groups are, and why it is beneficial to address groups of entities
rather than just single entities.
> 1. The protocol must enable definition of a "synchronization group"
> of objects, which is a group of objects about which a client can
> subscribe to receive notifications. Synchronization groups
> represent the granularity of synchronization. RUP servers must
> only send notifications about resources in a synchronization
> group to RUP clients that have requested notifications for that
> group.
Is it worth mandating how groups must be addressed, or is that open
to interpretation in candidate protocols?
There's also a slightly buried requirement here that RUP clients and
servers negotiate and that clients specify which groups they want
notifications for. I don't think that's really been mentioned before
(note: don't assume things are obvious). I'm uneasy about the idea
of some as-yet unstated "negotiation" in light of a requirement like
this.
Could you explain the last sentence to me?
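My best guess at the last sentence, for what it's worth, is sketched
below in Python (names hypothetical): the server keeps a set of
subscribers per group and fans a notification out only to that set.
Note this is per-group state, which is presumably what makes it
acceptable when #2 below rules out per-object callbacks.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SubscriptionRegistry:
    """Hypothetical server-side record of group subscriptions."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, client: str, group: str) -> None:
        self.subscribers[group].add(client)

    def unsubscribe(self, client: str, group: str) -> None:
        self.subscribers[group].discard(client)

    def recipients(self, group: str) -> List[str]:
        # Clients that never subscribed to this group are never notified.
        return sorted(self.subscribers.get(group, set()))
```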
> 2. The policy to group resources into synchronization groups is
> outside the scope of RUP. Grouping may be determined by the
> content provider, CDN operator, traffic analysis tools, or other
> means. RUP is not required to provide dynamic negotiation
> between the RUP server and RUP client over the composition of a
> resource group. In other words, "targeted invalidation", in
> which a server only sends an invalidation about object X to
> clients that have registered callbacks on object X, is out of
> scope for the initial version of RUP. This restriction is
> motivated by complexity and scalability concerns about servers
> (and clients) having to negotiate and maintain individual views
> of resource groups for all the RUP clients (and servers) they
> speak to. It's anticipated that predefined resource groups will
> fit well with the majority of the RUP deployment cases
> (surrogates, mirror sites, and CDNs).
This paragraph/requirement is much too complex for a single
paragraph, and the first sentence clearly suggests that this
shouldn't even belong in this section.
This requirement also reads as though it's in direct opposition to #1
above: servers must only send notifications about resources for which
a subscription exists, but it's too complex right now to keep track
of everything that individual clients want for targeting...... I'm
lost.
> 3. RUP must support "in band" and "out of band" means to describe
> the composition of a synchronization group. Out-of-band
> assignment of an object to a synchronization group means that
> assignment occurs outside of RUP information exchange procedure,
> (e.g., in the object's HTTP header.) In-band assignment would
> include a synchronization group definition message in RUP. An
> out-of-band HTTP header based approach specification is simple
> and makes it efficient to determine to which group an object
> belongs, while in-band specification should be supported so that
> RUP is self-contained. In particular, CDN operators would prefer
> not having to change the origin web server before anything can be
> put into a resource group, but rather self-describe the
> composition within the group.
OK, so I'm seeing that in-band would have to be for relationships
that were hard-coded into systems in some way (there's no kind of
"discovery"). Out of band can be discovered by having the
synchronization group URI specified in an HTTP header (note: I'd
suggest this is the document in which that header and URI are
defined).
These might be better split into separate requirements in order to
make the description easier to follow.
> 4. The protocols for describing synchronization group composition
> should be efficient with respect to both network transmission and
> client-matching logic for both in band and out of band protocols.
> Network transmission should support descriptions that grow less
> than linearly with the number of objects in a volume (e.g., URI
> prefix or regular expression matching) but they may also support
> listing of objects. For "out of band" protocols, they may
> support allowing a header to indicate that the current object is
> part of a particular synchronization group (rather than fully
> specifying the membership of the synchronization group). In
> order to service reads efficiently, it must be possible to
> implement matching logic so that work to determine which
> synchronization group(s) a cached object is a member of grows
> much more slowly than linear in the number of synchronization
> groups. (For example, consider a hypothetical RUP protocol where
> the membership of a synchronization group is specified by a short
> list of URI prefixes. Such URI prefixes can be organized into a
> tree so that given an object URI, the enclosing URI prefix can be
> found in work logarithmic to the number of URIs. Conversely, it
> may be more difficult to determine which (if any) arbitrary
> regular expression matches a given URL, so allowing
> synchronization groups to be defined by arbitrary regular
> expressions may limit scalability of RUP clients.)
This really needs to be broken up. You've got a lot of background
material (which in my opinion belongs earlier so that you can
describe what synchronization groups actually are [we have a
terminology section, it should probably be used to define what the
things are]) interspersed with requirements.
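The URI prefix example in #4 could be made concrete like this. A
minimal Python sketch; I've assumed prefixes match on whole path
segments and used a trie rather than a balanced tree, so lookup work
grows with the depth of the object's URI rather than logarithmically
with the number of prefixes. Either would satisfy "much more slowly
than linear". The class name is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlsplit


class PrefixIndex:
    """Hypothetical index from URI prefixes to synchronization groups."""

    def __init__(self) -> None:
        self.root: dict = {}

    @staticmethod
    def _key(uri: str) -> List[str]:
        parts = urlsplit(uri)
        segments = [s for s in parts.path.split("/") if s]
        return [parts.scheme.lower(), parts.netloc.lower()] + segments

    def add(self, prefix: str, group: str) -> None:
        node = self.root
        for part in self._key(prefix):
            node = node.setdefault(part, {})
        node[None] = group  # the None key marks "a group's prefix ends here"

    def lookup(self, uri: str) -> Optional[str]:
        """Return the group with the longest matching prefix, if any."""
        node, best = self.root, None
        for part in self._key(uri):
            if None in node:
                best = node[None]
            node = node.get(part)
            if node is None:
                return best
        return node.get(None, best)
```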
> 5. The protocol should allow RUP servers to be common with or
> disjoint from data servers. Therefore, in addition to specifying
> the collection of URIs that belong to a synchronization group, an
> in- or out-of-band definition of a synchronization group must
> also specify protocol and server information that indicate with
> whom to communicate (e.g., "wcip://example.acm.org/channel7").
I'd recommend against using a candidate protocol in the example.
Maybe just replace with:
protocol://example.com/channel7
>
>
>6.3 Naming and Framing - Notification groups
>
> 1. The protocol must enable atomic notification regarding an
> arbitrary "notification group" of resources. For example, the
> protocol must be able to invalidate a list of multiple URIs with
> one message.
>
> 2. The protocol must enable efficient notification regarding a pre-
> specified group of resources (i.e., it must be possible to define
> a group where a notification that applies to all members of a
> group can be transmitted with network bandwidth that grows less
> than linearly with the size of the group.) For example, the
> protocol might implement this requirement by allowing a
> notification group to be specified in different ways such as (1)
> by a list of URIs included in the message, (2) by a regular
> expression or path prefix that refers to a set of URIs, and/or
> (3) by a URI that itself refers to a (possibly hierarchical) list
> of URIs.
This seems a slightly long-winded way of saying that a RUP candidate
must allow the invalidation of an arbitrary number of resources, by
their URIs, within a single message. Given that we're doing this
work in the IETF, I wonder whether we actually have to say things
about optimization, or whether it's simply sufficient to state that
"the protocol should support means of aggregating URIs by use of
wildcards/regular expressions"?
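Whichever wording we settle on, the three forms listed in item 2 combine easily on the receiving side. A minimal Python sketch, assuming a RUP client that has already parsed a notification group into the hypothetical NotificationGroup structure below; fetch_list is an assumed hook standing in for however a client would retrieve a referenced list. The message carries only a handful of prefixes and patterns, however many URIs they cover.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class NotificationGroup:
    """Hypothetical notification-group descriptor (not a proposed wire format)."""
    uris: list = field(default_factory=list)       # (1) explicit list of URIs
    prefixes: list = field(default_factory=list)   # (2) path prefixes ...
    patterns: list = field(default_factory=list)   #     ... or regular expressions
    list_refs: list = field(default_factory=list)  # (3) URIs naming further lists

    def matches(self, uri: str,
                fetch_list: Callable[[str], "NotificationGroup"],
                seen: Optional[Set[str]] = None) -> bool:
        if uri in self.uris:
            return True
        if any(uri.startswith(p) for p in self.prefixes):
            return True
        if any(re.fullmatch(p, uri) for p in self.patterns):
            return True
        seen = set() if seen is None else seen
        for ref in self.list_refs:
            if ref in seen:
                continue  # skip list references already followed (guards against cycles)
            seen.add(ref)
            if fetch_list(ref).matches(uri, fetch_list, seen):
                return True
        return False


# Stand-in for however a client would retrieve a referenced list.
lists = {"http://example.com/lists/archive":
         NotificationGroup(prefixes=["http://example.com/archive/"])}
group = NotificationGroup(
    uris=["http://example.com/index.html"],
    patterns=[r"http://example\.com/img/.*\.png"],
    list_refs=["http://example.com/lists/archive"],
)
print(group.matches("http://example.com/archive/2001/01.html", lists.__getitem__))  # True
```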
>6.4 Naming and Framing - Notification and extensibility
>
> 1. As an extension mechanism, a RUP message may carry a set of
> options for a notification group of resources. The set of options
> may be empty. RUP must specify a generic option format and
> define the content prefetch hint and content location update as
> options. No option is mandatory. A RUP client can ignore any
> options it doesn't recognize or doesn't want to support. If
> positive acknowledgement is requested, the acknowledgement must
> indicate which options were carried out.
There's a requirement needed here concerning acknowledgements, so
that we have a base understanding of what MUST be acknowledged. If
options are not ignored, does the server have any understanding of
what has taken place...? Do we need to specify that the protocol
must provide an option specification language that defines whether an
acknowledgement is required?
As a general comment, is something still an option if it is mandatory...?
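To illustrate the acknowledgement question, here is a minimal Python sketch of item 1's "ignore what you don't recognize" rule. The message layout, the option names ("prefetch-hint", "location-update", "x-unknown") and the shape of the ack are all assumptions for illustration. Without an ack the server learns nothing about which options ran; with one, it learns only what the client chooses to report.

```python
from dataclasses import dataclass, field


@dataclass
class RupMessage:
    """Hypothetical RUP message: a notification group plus optional options."""
    group: list                                   # URIs the message applies to
    options: dict = field(default_factory=dict)   # option name -> value; may be empty
    ack_requested: bool = False


class RupClient:
    """Hypothetical client; option names below are illustrative only."""

    def __init__(self, cache):
        self.cache = cache            # URI -> cached body
        self.prefetch_queue = []      # URIs hinted for prefetch
        self.locations = {}           # old URI -> new URI
        # Options this client chooses to support; anything else is ignored.
        self.handlers = {
            "prefetch-hint": self._prefetch,
            "location-update": self._relocate,
        }

    def _prefetch(self, uris, value):
        self.prefetch_queue.extend(uris)

    def _relocate(self, uris, value):
        self.locations.update(value)

    def receive(self, msg):
        for uri in msg.group:
            self.cache.pop(uri, None)         # base action, assumed to be invalidation
        done = []
        for name, value in msg.options.items():
            handler = self.handlers.get(name)
            if handler is None:
                continue                      # unrecognized option: ignored
            handler(msg.group, value)
            done.append(name)
        if msg.ack_requested:
            # The ack names only the options actually carried out; the server
            # can infer the ignored ones by comparing with what it sent.
            return {"status": "ok", "options_done": done}
        return None


client = RupClient({"http://example.com/a.html": b"old"})
ack = client.receive(RupMessage(
    group=["http://example.com/a.html"],
    options={"prefetch-hint": None, "x-unknown": 1},
    ack_requested=True,
))
print(ack)  # {'status': 'ok', 'options_done': ['prefetch-hint']}
```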
> 2. The protocol must define an extensible format for RUP messages
> that is capable of carrying a variety of payloads. Possible
> payloads include (1) cache invalidation, (2) content location
> update, (3) content prefetch hints, (4) removal and addition of
> resources to a resource group, (5) adjustments to cache
> consistency parameters, etc. While the above payloads may share
> the same RUP mechanism, it's not a requirement for the initial
> protocol to address all of them simultaneously.
I'm confused.
[OK, my apologies but comments will be a little less detailed from here on.]
>6.5 Client-Server Interaction
>
> 1. The protocol must define "RUP client" and "RUP server" roles. It
> should be possible for either the RUP server or the client to
> initiate information exchange.
Why does that belong in the protocol document? You already stated
the latter point earlier in the document.
> 2. We anticipate that the primary RUP clients and servers will be
> Web intermediaries and origin servers, although the protocol
> should not be so designed as to preclude use by other entities.
> For example, the origin server or servers may delegate the role
> of RUP server to a CDN which operates dedicated content signaling
> channels and servers.
This has also already been mentioned in an earlier section.
> 3. The protocol should be designed to scale to systems where there
> are a large number (more than 10,000) of surrogates of a given
> origin server. This may require multiple levels of intermediary
> relay points and/or IP multicast.
Ditto (though the last sentence is new).
Note: We need to support relay points, and we need to support
scaling. DNS scales well because of how it's distributed. If our
scaling is dependent on "someone" putting special relay boxes out
there then ... that's not so good.
> 4. The protocol should be capable of operating efficiently on a wide
> variety of underlying media; high-latency satellite links in
> particular will need to be considered. For example, caching vendors
> have TCP optimizations that an administrator can turn on if the
> link is a satellite link, and a reliable multicast protocol would use
> more FEC if the link is asymmetric. The analogy in RUP is that RUP
> should be able to turn off any client->server messages (such as
> ACKs and client-driven updates) if the link is a satellite link or if
> the transport is IP multicast.
This sounds very dangerous to me as there's almost a notion of doing
some layer violation.
Just say something like "RUP must be able to operate in a
unidirectional manner with no feedback". But you need to closely
consider interaction of such a statement with the other requirements
in the document... such as the one below.
> 5. To support sequential consistency and monitoring of the RUP
> clients, it must be possible to determine whether resource update
> messages have been missed, e.g. due to a RUP client or server
> being down or unreachable. There must be a feedback mechanism
> which enables the RUP server to determine the extent to which
> resource updates have propagated to surrogates and been carried out.
> E.g., if the feedback from a surrogate never comes back or comes
> back as failed, the RUP server may either delay publishing the
> content, syslog the failure, disable the surrogate that sent the
> failure code, or stop content routing to that CDN peer, etc.
> Specific deployments must be able to choose whether or not to
> operate with feedback.
>
> 6. It should be possible to reach a consistent state on all
> surrogates of a given origin server and collection of resources.
> I.e., RUP must guarantee that a RUP client either sees an update
> as intended or is able to detect that it might have missed an
> update.
I'm thinking I should ignore the comments about turning off ACKs etc.
... since there seem to be too many requirements for having a feedback link.
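Requirements 5 and 6 (and the question of whether feedback can be turned off) are easier to weigh with a concrete mechanism in mind. One common approach, which the draft does not prescribe, is per-channel sequence numbers: a gap tells a client it may have missed an update, even on a one-way link, while acknowledging the highest number applied gives the server its view of propagation. A minimal Python sketch; the class, function and channel names are hypothetical.

```python
class SequencedClient:
    """Hypothetical client tracking per-channel sequence numbers.

    Assumes each channel numbers its updates 1, 2, 3, ... and that the client
    has been subscribed since update 1.
    """

    def __init__(self):
        self.last_seen = {}   # channel -> highest sequence number applied
        self.suspect = set()  # channels on which an update may have been missed

    def on_update(self, channel, seq):
        expected = self.last_seen.get(channel, 0) + 1
        if seq > expected:
            # Gap: updates expected..seq-1 never arrived. The client cannot
            # recover them here, but it now knows to revalidate the group.
            self.suspect.add(channel)
        if seq >= expected:
            self.last_seen[channel] = seq
        # Duplicates and stale retransmissions (seq < expected) are ignored.
        return self.last_seen.get(channel, 0)  # value the client would acknowledge


def lagging(acks, target):
    """Server side: surrogates whose acknowledged sequence number is below target.

    acks maps surrogate -> last acknowledged sequence number, or None if no
    feedback has arrived.
    """
    return sorted(s for s, seq in acks.items() if seq is None or seq < target)


c = SequencedClient()
for seq in (1, 2, 4):        # update 3 is lost in transit
    c.on_update("channel7", seq)
print(c.suspect)             # {'channel7'}
print(lagging({"s1": 4, "s2": 2, "s3": None}, 4))  # ['s2', 's3']
```

Note that a lost final update is only noticed once a later message arrives, which is one reason such a scheme is usually paired with a heartbeat or lease.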
> 7. The protocol should define the failure mode, i.e., the
> interactions and assumptions of the RUP server and client in the
> presence of failure, in order to clearly define and preserve the
> semantic guarantees that can be offered in failure modes. In
> particular, if so desired, a RUP server must be able to detect if
> some RUP clients didn't receive an update or didn't carry out an
> action, e.g., via positive acknowledgement, even if there's a
> network failure or client failure. Similarly, a RUP client must
> be able to detect if the RUP server or the network connection to
> the RUP server has failed and, if so, automatically perform
> appropriate actions to expire content that is potentially
> stale.
>
>
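For the client half of item 7, a simple mechanism (again an assumption, not something the draft specifies) is a lease: if nothing, including heartbeats, arrives from the RUP server within an agreed interval, the client treats its server-driven freshness as void. This needs only server->client traffic, so it is compatible with one-way operation. A minimal Python sketch; the 30-second lease, the "fresh" flag and the names are illustrative.

```python
import time


class LeaseMonitor:
    """Hypothetical client-side detector for a silent RUP server.

    Assumes a healthy server sends something (an update or a heartbeat)
    at least every `lease` seconds.
    """

    def __init__(self, lease, clock=time.monotonic):
        self.lease = lease
        self.clock = clock
        self.last_heard = clock()

    def heard_from_server(self):
        self.last_heard = self.clock()

    def server_presumed_failed(self):
        return self.clock() - self.last_heard > self.lease

    def expire_if_needed(self, cache):
        # Once the server (or the path to it) has been silent past the lease,
        # stop trusting server-driven freshness and force revalidation.
        if self.server_presumed_failed():
            for entry in cache.values():
                entry["fresh"] = False
            return True
        return False


now = [0.0]                                  # fake clock for the example
mon = LeaseMonitor(lease=30, clock=lambda: now[0])
cache = {"http://example.com/a.html": {"fresh": True}}
now[0] = 45.0                                # 45 s of silence exceeds the 30 s lease
print(mon.expire_if_needed(cache), cache["http://example.com/a.html"]["fresh"])  # True False
```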
>6.6 Network and Host Environment
>
> 1. The protocol should be useable both in a surrogate/origin server
> relationship and a traditional caching proxy/origin server
> relationship. The protocol should also be general enough to be
> useable in content delivery network (CDN) environments to allow
> freshness control of CDN delivery nodes.
This has already been said, no?
> 2. It must be possible for the protocol to be used in an environment
> where some or all communications are mediated through a firewall
> or other intermediary device. The protocol design must identify
> issues involved in firewall traversal and provide ways by which
> these may be avoided or circumvented. Some of these issues may not
> be explicitly security-related, e.g., working around any
> problems caused by the use of Network Address Translation.
>
> 3. It must be possible for the protocol information to be relayed
> (single source, single destination) and/or broadcast (single
> source, multiple destinations) by RUP proxies. It must be
> possible for the protocol information to be cached (e.g., for
> broadcasting purposes) by RUP proxies. RUP must guarantee that
> the invalidation effect of a relayed message on a compliant
> destination is the same as if the message reached the destination
> directly from the source.
Do you mean RUP relays?
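The last sentence of item 3 is essentially an equivalence property, which a test can check directly. A minimal Python sketch, assuming invalidation is the only message effect and that a message is a plain dict with a "uris" list (both assumptions); RupRelay and apply_invalidation are hypothetical names.

```python
def apply_invalidation(cache, message):
    """Effect of a message on a compliant destination: drop the listed URIs."""
    for uri in message["uris"]:
        cache.pop(uri, None)


class RupRelay:
    """Hypothetical relay that retains messages and fans them out unmodified."""

    def __init__(self):
        self.downstream = []  # destination caches served by this relay
        self.retained = []    # cached copies of relayed messages

    def subscribe(self, cache):
        self.downstream.append(cache)

    def relay(self, message):
        self.retained.append(message)
        for cache in self.downstream:
            # Forward the message as received; no rewriting, merging or dropping.
            apply_invalidation(cache, message)


direct = {"http://example.com/a": 1, "http://example.com/b": 2}
via_relay_1 = dict(direct)
via_relay_2 = dict(direct)
msg = {"uris": ["http://example.com/a"]}

apply_invalidation(direct, msg)   # message delivered straight from the source
relay = RupRelay()
relay.subscribe(via_relay_1)
relay.subscribe(via_relay_2)
relay.relay(msg)                  # same message delivered through the relay
print(direct == via_relay_1 == via_relay_2)  # True
```

The property holds trivially here because the relay never alters a message; a relay that aggregates or rewrites messages would have to preserve it explicitly.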
>6.7 Host-to-host Communication
>
> 1. The protocol should layer cleanly and independently on top of the
> underlying communication layers, e.g., TCP, HTTP, BEEP, or SOAP.
> The protocol semantics and message formats should be self-
> contained in that they stay the same regardless of the underlying
> transport, and thus portable to different transports.
I'm not sure that including HTTP there is a good idea.
The common point in the above is the notion of guaranteed delivery...
is that something you want to require?
[OK, my brain gave out... no comments on the last little bit, except this one]
<snip>
> [4] Cooper, I., Melve, I. and G. Tomlinson, "Replication and
> Caching Taxonomy", RFC 3040, January 2001.
Just an observation that might be applicable to other references...
that's not the published title of that RFC.
This archive was generated by hypermail 2b29 : Thu Nov 18 2004 - 11:23:00 MST