Re: Cache _log_ summaries

From: Jens-S. Voeckler (voeckler@rvs.uni-hannover.de)
Date: Thu May 18 2000 - 15:28:41 MDT


Martin Hamilton wrote:

> It's been a bugbear of mine for a long time that although we have
> (more or less) a standard log file format for proxy caches (well, OK,
> plain vanilla Common Logfile and Squid), there isn't a commonly
> accepted format for summaries of those log files.

Considering the native log format style, now supported by the major
cache vendors, there might be even more interesting things to report.
Yes, I know about the offer from squid-dev to implement them. I (still)
have nothing concrete, but I am thinking along the lines of what Adrian
Cockcroft did with the Sun web servers... Speaking of which, a meta
definition of log file formats is missing, too, i.e. a generally
accepted way to describe what a log file reports, so that, given the
meta description, you can parse a variety of logs and are not limited
to the Squid native format.
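
To make the meta description idea a little more concrete, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the description syntax (whitespace-separated "name:type"
pairs) and the names FIELD_TYPES, compile_description and parse_line are
invented, and no such standard exists. The example description follows
the usual column order of the Squid native access.log; the sample line
is made up.

```python
# Hypothetical meta description: whitespace-separated "name:type" pairs,
# one per column of a whitespace-delimited log line.
SQUID_NATIVE = (
    "timestamp:float elapsed_ms:int client:str result:str bytes:int "
    "method:str url:str ident:str hierarchy:str content_type:str"
)

# Converters for the (invented) type names used in a description.
FIELD_TYPES = {"int": int, "float": float, "str": str}


def compile_description(description):
    """Turn a description string into a list of (name, converter) pairs."""
    fields = []
    for item in description.split():
        name, _, type_name = item.partition(":")
        fields.append((name, FIELD_TYPES[type_name]))
    return fields


def parse_line(fields, line):
    """Parse one log line into a dict according to the compiled fields."""
    columns = line.split()
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(columns)}")
    return {name: convert(col) for (name, convert), col in zip(fields, columns)}


fields = compile_description(SQUID_NATIVE)
entry = parse_line(
    fields,
    "958685321.123 250 192.0.2.7 TCP_HIT/200 4312 GET "
    "http://www.example.org/ - NONE/- text/html",
)
```

With a different description string the same parse_line would handle a
different whitespace-delimited log; formats with quoted or bracketed
fields (such as Common Logfile) would need a richer description than
this sketch offers.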

> In case it's not clear, I'm thinking in terms of stats _such as_ (but
> not necessarily including or limited to!) a periodic breakdown of
> sites visited (bytes and requests shipped vs. HTTP status codes) and
> clients visiting (ditto), plus performance measures for the cache
> itself (e.g. median and standard deviation for hit service times per N
> time units). It's worth bearing in mind that this is also the sort of
> thing most logfile analysis tools (e.g. Calamaris) have to collate
> internally before they can do their thing.
>
> With one of my hats on I have to write code to process ~80 million
> proxy cache logfile entries/day (aren't Service Level Agreements a
> wonderful innovation? :-), and would dearly like to make some or all
> of the resulting summaries available for people doing caching
> research. Of course, if lots of people were to do this independently
> using incompatible file formats it would be a real pain.

Ahem, if you could be a little more specific, I might be tempted to
extend Seafood in that direction (why didn't you mention this in
Lund?). Maybe the topic can be elaborated on at the tf-cache meeting in
Lisbon, too?

> So... anyone interested in getting together (metaphorically :-) to
> work something out ? Mail me, or post to the list if you have points
> which you think other people would like to hear... !

Medians sound like a good thing. One of the special features of
Seafood is the ASN (autonomous system number) for DIRECT. Doing all the
lookups (DNS, whois) is what kills any decent analysis - even with
DNS/whois caches involved. Seafood takes 25s for parsing and
accumulating a log file, but another 510s for doing the lookups (DNS
cache miss), i.e. roughly 95% of the total run time. So we might want
to focus on things which do not need lookups of external services.
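
As a rough illustration of a summary that needs no external lookups,
the following Python sketch computes requests, bytes, and the median
and standard deviation of hit service times per time interval. The
function name summarize, the tuple layout of the input, the one-hour
default interval, and the rule that any result code containing "HIT"
counts as a hit are all assumptions of this sketch, not an agreed
format. The sample data is made up.

```python
from collections import defaultdict
from statistics import median, pstdev


def summarize(entries, interval=3600):
    """Summarize (timestamp, elapsed_ms, result, bytes) tuples per interval.

    Returns a dict keyed by the start time of each interval (in seconds).
    """
    buckets = defaultdict(lambda: {"requests": 0, "bytes": 0, "hit_ms": []})
    for timestamp, elapsed_ms, result, size in entries:
        start = int(timestamp // interval) * interval
        bucket = buckets[start]
        bucket["requests"] += 1
        bucket["bytes"] += size
        # Assumption: any result code containing "HIT" is a cache hit.
        if "HIT" in result.split("/")[0]:
            bucket["hit_ms"].append(elapsed_ms)

    summary = {}
    for start, bucket in sorted(buckets.items()):
        hits = bucket["hit_ms"]
        summary[start] = {
            "requests": bucket["requests"],
            "bytes": bucket["bytes"],
            "hits": len(hits),
            # Population standard deviation; None if the interval had no hits.
            "hit_median_ms": median(hits) if hits else None,
            "hit_stddev_ms": pstdev(hits) if hits else None,
        }
    return summary


sample = [
    (0.5, 10, "TCP_HIT/200", 100),
    (10.0, 30, "TCP_MEM_HIT/200", 200),
    (20.0, 500, "TCP_MISS/200", 300),
    (3700.0, 20, "TCP_HIT/304", 50),
]
report = summarize(sample, interval=3600)
```

Everything here comes straight from the log lines themselves, so the
cost stays in the parsing-and-accumulating part rather than in DNS or
whois round trips.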

-- 
Le deagh dhùrachd,
Dipl.-Ing. Jens-S. Vöckler (voeckler@rvs.uni-hannover.de)
Institute for Computer Networks and Distributed Systems
University of Hanover, Germany; +49 511 762 4726


