$Id: Release-Notes-1.0.txt,v 1.1.2.9 1996/06/06 05:07:40 wessels Exp $

Release Notes for version 1.0 of the Squid cache.

TABLE OF CONTENTS:

	Private Objects
	Proper parsing of HTTP reply codes
	Support for If-Modified-Since GET
	Improvements to the access log
	Metadata reloads in the background
	Unlinking swap files on restart and the -U option
	Changes to debugging
	New Access Control scheme
	Changes to cache shutdown
	Using SIGHUP to reconfigure the cache
	ftpget server
	Assigning weights to cache neighbors
	Converting 'cache/log' from cached-1.4.pl3
	Notes on stoplists vs. ttl_pattern
	SIGUSR1 now rotates log files


Private Objects
==============================================================================

The Squid cache uses the notions of ``private'' and ``public''
objects.  An object can start out as being private, but may later be
given public status.  Private objects are associated with only a single
client whereas a public object may be sent to multiple clients at the
same time.  When the cache finishes retrieving an object, if the object
is private it will be ejected from the cache.  Only public objects
are saved on disk.

There are a few ways to determine whether an object should be private
or public.  One is the request method.  Only URLs requested with
the ``GET'' method can be public.  Another way is by examining the 
URL string.  URLs which match one of the stoplist entries will 
always be private objects.  Usually this includes ``cgi-bin'' scripts.
A third way is by checking the HTTP request and reply headers.  For 
example, if the request includes user authentication information, then
the object should never be made public.  Additionally, some HTTP
replies such as ``401 Unauthorized'' should also never be made public.

For these reasons, Squid starts all objects out as private and changes
them to public only after the HTTP reply headers have been read.
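The rules above can be summarized as a single predicate. This is a minimal Python sketch, not Squid's C code. The function name `may_become_public` is hypothetical. Treating stoplist entries as plain substrings is an assumption. Treating an `Authorization` request header as "user authentication information" is also an assumption. Only 401 is listed as a never-public reply code, because it is the one example named in the text; HTTP-codes.txt holds the full list.

```python
# Hypothetical sketch of the private/public decision; not Squid's actual code.
NEVER_PUBLIC_CODES = {401}  # 401 Unauthorized; see HTTP-codes.txt for the full list


def may_become_public(method, url, request_headers, reply_code, stoplist):
    """Return True if an object that started out private may be made public.

    Meant to be called only after the HTTP reply headers have been read,
    because the reply code is one of the inputs.
    """
    if method != "GET":                              # only GET requests can be public
        return False
    if any(entry in url for entry in stoplist):      # assumed: substring match
        return False
    # Assumption: an Authorization header means the request carries credentials.
    if any(name.lower() == "authorization" for name in request_headers):
        return False
    if reply_code in NEVER_PUBLIC_CODES:
        return False
    return True
```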

Unfortunately, this causes some problems with the UDP-based Internet
Cache Protocol (ICP) used to query neighboring caches.  Specifically,
an ICP reply packet contains only the object URL, which is not
sufficient to locate private objects in the cache metadata.  To get
the additional information needed to locate private objects, we decided
to use the ``reqnum'' field of the ICP packet.  This is an acceptable
solution, except that, as implemented in cached-1.4.pl3 and earlier,
all ICP replies have the reqnum field reset to zero!

Squid will make use of private objects until it notices that one of
its neighbors is sending ICP replies with the reqnum field set to zero.
From then on it will use private keys only for objects which are never
queried via ICP.  These include objects matching the stoplist and
If-Modified-Since requests.
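The fallback described above can be sketched as a small policy object. This is a minimal Python sketch under stated assumptions. The class name, its methods, and both key formats are hypothetical; Squid's real store keys are not described in these notes. The only behavior taken from the text is this: once any neighbor replies with reqnum 0, private keys are kept only for objects that will never be queried via ICP.

```python
class KeyPolicy:
    """Hypothetical sketch: decides whether a new object gets a private key."""

    def __init__(self):
        # Becomes True once any neighbor sends an ICP reply with reqnum == 0.
        self.neighbor_zeroes_reqnum = False

    def note_icp_reply(self, reqnum):
        if reqnum == 0:
            self.neighbor_zeroes_reqnum = True

    def use_private_key(self, will_query_icp):
        # Until the problem is detected, every object starts out private.
        if not self.neighbor_zeroes_reqnum:
            return True
        # Afterwards, only objects never queried via ICP (stoplist matches,
        # If-Modified-Since requests) keep private keys.
        return not will_query_icp

    @staticmethod
    def make_key(reqnum, method, url, private):
        # Hypothetical key format: a private key embeds the request number,
        # which is why ICP replies must echo reqnum to locate the object.
        return f"{reqnum}:{method}:{url}" if private else f"{method}:{url}"
```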

Proper parsing of HTTP reply codes
==============================================================================

Squid parses HTTP replies to extract the reply code.  The codes are used
to determine which objects should be cached, which should be ejected,
and which should be negative-cached. 

See HTTP-codes.txt for a list of HTTP response codes, and how they are
cached.

The HTTP codes are now logged to "access.log" in the native format
(i.e., with 'emulate_httpd_log off').
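Extracting the reply code amounts to reading the second field of the HTTP status line. This is a minimal Python sketch that assumes the status line has already been separated from the headers. The function name is hypothetical. Returning 555 when no code is present mirrors the "555" access-log convention described under "Improvements to the access log".

```python
def parse_reply_code(status_line, missing=555):
    """Return the numeric reply code from an HTTP status line.

    'HTTP/1.0 200 OK' -> 200.  If the line carries no usable code (for
    example, a reply that has no status line at all), return `missing`.
    """
    parts = status_line.split(None, 2)   # version, code, reason phrase
    if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return missing
```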

Support for If-Modified-Since GET
==============================================================================
Squid supports IMS GET retrievals, but not through any neighbor caches.
Whenever an IMS GET request is received, Squid will bypass the cache
hierarchy and fetch the object on its own.
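The routing rule above fits in a few lines. This Python sketch is hypothetical and deliberately narrow: it covers only the IMS bypass and leaves out every other routing rule, such as stoplist handling. Header names are compared case-insensitively, as HTTP requires.

```python
def route_request(method, request_headers):
    """Hypothetical sketch: IMS GET requests bypass the cache hierarchy."""
    is_ims = any(name.lower() == "if-modified-since" for name in request_headers)
    if method == "GET" and is_ims:
        return "direct"       # fetch from the origin server, skipping neighbors
    return "hierarchy"        # normal path: query neighbors via ICP first
```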


Improvements to the access log
==============================================================================
The "access.log" file has been improved in a number of ways.  There is now
only one log entry per client request and the size is always correct. 
The format is now

   timestamp  elapsed  src-address  type/code  size  method  URL

	- timestamp:	The time the request completed, with millisecond
			resolution
	- elapsed:	Elapsed time of the request, in milliseconds
	- src-address: 	IP address of the requesting client
	- type:		An indication of how the request was handled
			by the cache.  These are described further below
	- code: 	The HTTP reply code when available.  For ICP
			requests this is always "000".  If the reply code
			was not given, it will be logged as "555".
	- size:		The size in bytes: for TCP requests, the amount
			of data written to the client; for UDP requests,
			the size of the request.
	- method:	The request method (GET, POST, etc.).
	- URL:		The URL of the request
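A native-format line can be split into its seven fields as follows. This is a minimal Python sketch. The names `AccessLogEntry` and `parse_access_line` are hypothetical. The timestamp is assumed to be written as Unix seconds with a fractional millisecond part; this section does not spell out that format. The reply code is kept as text so that "000" and "555" survive unchanged.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    timestamp: float   # completion time; assumed Unix seconds with .mmm fraction
    elapsed_ms: int
    src_address: str
    log_type: str      # e.g. TCP_HIT, UDP_MISS
    code: str          # kept as text: "000" and "555" are meaningful strings
    size: int
    method: str
    url: str


def parse_access_line(line):
    """Split one native-format access.log line into its fields."""
    # maxsplit=6 keeps the URL in one piece; strip() drops the trailing newline.
    ts, elapsed, src, type_code, size, method, url = line.strip().split(None, 6)
    log_type, _, code = type_code.partition("/")
    return AccessLogEntry(float(ts), int(elapsed), src, log_type, code,
                          int(size), method, url)
```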

Access Log Types:

"TCP_" refers to requests on the HTTP port (3128)

	TCP_HIT		A valid copy of the requested object was in the cache
	TCP_MISS	The requested object was not in the cache
	TCP_EXPIRED	The object was in the cache, but it had expired
	TCP_REFRESH	The user forced a refresh ("reload")
	TCP_IFMODSINCE	An If-Modified-Since GET request.
	TCP_SWAPFAIL	The object was believed to be in the cache,
			but could not be accessed.
	TCP_DENIED	Access was denied for this request

"UDP_" refers to requests on the ICP port (3130)

	UDP_HIT		A valid copy of the requested object was in the cache
	UDP_MISS	The requested object was not in the cache
	UDP_DENIED	Access was denied for this request
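The type field is what most hit-rate summaries are built from. This is a minimal Python sketch of such a tally; `summarize_types` is a hypothetical helper that takes type strings already extracted from the log. One design choice deserves a warning: every entry with a given prefix, including DENIED entries, counts toward that protocol's denominator. Other definitions of "hit ratio" are equally reasonable.

```python
from collections import Counter


def summarize_types(log_types):
    """Tally log types and compute separate TCP and UDP hit ratios."""
    counts = Counter(log_types)

    def ratio(prefix):
        # Denominator: every entry with this prefix (HIT, MISS, DENIED, ...).
        total = sum(n for t, n in counts.items() if t.startswith(prefix))
        hits = counts[prefix + "HIT"]       # Counter returns 0 for missing keys
        return hits / total if total else 0.0

    return counts, ratio("TCP_"), ratio("UDP_")
```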


Metadata reloads in the background
==============================================================================
Upon restart, Squid automatically loads cache metadata in the
background.  It will be able to service new requests immediately.  As
new objects are added, there may be some "clashes" with old objects
using the same swap file on disk.  In these cases you may see a message
in the cache logfile about "Active clash."  This means the old object
has been discarded since it was replaced by a new object.


Unlinking swap files on restart and the -U option
==============================================================================
When the cache reloads object metadata from disk some of the objects
will be expired or otherwise invalid.  In the interest of speed, these
invalid objects will not be removed from the filesystem by default.  They
will eventually be overwritten as new objects enter the cache and
get saved to disk.

The -U option can be used to actually remove the invalid objects from
disk.  

In addition, the -z option will not cause 'rm -rf [0-9][0-9]' to be
executed unless the -U option is also given.  

When swap files are not removed during restart, the internal counters
for disk space taken will not match the actual disk space used.  If you
have a large cache or plenty of extra disk space, this should not be a
problem.  However, if space is an issue, you may want to use the -U
option at the cost of a slower restart.


Changes to debugging
==============================================================================
Squid has a flexible debugging scheme.  You can enable more debugging
for certain functions and less for others.  For example, if you needed
to figure out why your access controls were behaving strangely, you
could enable debugging for section 28 at level 9.  Currently, each
section corresponds to a separate source code file:

	main.c:              Section 1
	cache_cf.c:          Section 3
	errorpage.c:         Section 4
	comm.c:              Section 5
	disk.c:              Section 6
	fdstat.c:            Section 7
	filemap.c:           Section 8
	ftp.c:               Section 9
	gopher.c:            Section 10
	http.c:              Section 11
	icp.c:               Section 12
	icp_lib.c:           Section 13
	ipcache.c:           Section 14
	neighbors.c:         Section 15
	objcache.c:          Section 16
	proto.c:             Section 17
	stat.c:              Section 18
	stmem.c:             Section 19
	store.c:             Section 20
	tools.c:             Section 21
	ttl.c:               Section 22
	url.c:               Section 23
	wais.c:              Section 24
	mime.c:              Section 25
	connect.c:           Section 26
	send-announce.c:     Section 27
	acl.c:               Section 28

Debugging levels are set in the configuration file with the 'debug_options'
line.  For example:

	debug_options ALL,1 28,9 22,5
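The effect of a 'debug_options' line can be modelled as a table of per-section levels. This Python sketch rests on several assumptions, none confirmed by these notes. The section count of 50 and the default level of 0 are hypothetical. Entries are applied left to right, so a later entry overrides an earlier one. A message is printed when its level is less than or equal to the level configured for its section.

```python
def parse_debug_options(spec, max_section=50, default=0):
    """Turn e.g. 'ALL,1 28,9 22,5' into a per-section level table.

    Assumptions: max_section and default are hypothetical values, and
    entries apply left to right, so 'ALL,1' first sets every section to 1
    and later entries override individual sections.
    """
    levels = [default] * max_section
    for item in spec.split():
        section, level = item.split(",")
        if section.upper() == "ALL":
            levels = [int(level)] * max_section
        else:
            levels[int(section)] = int(level)
    return levels


def debug_enabled(levels, section, level):
    """Assumed rule: print a message if its level <= the section's level."""
    return level <= levels[section]
```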


New Access Control scheme
==============================================================================
The old IP-based access controls have been replaced with a much more
flexible scheme.  First you must define a set of access control lists. 
There are eight types of lists:

	'src'		client IP address
	'dst'		server IP address**
	'method'	method of the request (eg, GET, POST)
	'proto'		protocol of the request (eg HTTP, WAIS)
	'domain'	domain of the URL request (eg .foo.org)
	'port'		port number of the URL request (eg 80, 21)
	'time'		time-of-day and day-of-week
			format: [SMTWHFA] [hh:mm-hh:mm]
	'pattern'	regular expression matching on the URL-path

After the access lists have been defined, you can combine them
in various ways to allow or deny access.

For example, your cache might be configured to accept requests 
from both inside and outside of your organization.  In that case you'd
probably want to allow internal clients to access anything, but limit
outside access to only sites within your organization.  It could be
done like this:

	acl ourclients src  128.138.0.0/255.255.0.0  198.117.213.0/24
	acl ourservers domain .whatsamattu.edu

	http_access deny !ourclients !ourservers
	http_access allow ourclients
	http_access allow ourservers

If you wanted to limit FTP requests to off-peak hours, you could use:

	acl daytime time  MTWHF 08:00-17:00
	acl FTP proto FTP
	http_access deny FTP daytime
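A 'time' ACL such as "MTWHF 08:00-17:00" can be parsed and matched as follows. The day letters come from the format shown above: S=Sunday, M=Monday, T=Tuesday, W=Wednesday, H=Thursday, F=Friday, A=Saturday. This is a minimal Python sketch with hypothetical function names. It treats both ends of the time range as inclusive, and the notes do not say whether Squid does the same. Either part may be omitted; a missing part matches every day or the whole day.

```python
import datetime

# Day letters of the 'time' ACL, mapped to datetime.weekday() numbers (Mon=0).
DAY_LETTERS = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def _minutes(hhmm):
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def parse_time_acl(spec):
    """Parse e.g. 'MTWHF 08:00-17:00'; both parts are optional."""
    days, start, end = set(range(7)), 0, 24 * 60   # default: every day, all day
    for token in spec.split():
        if "-" in token and ":" in token:
            lo, hi = token.split("-")
            start, end = _minutes(lo), _minutes(hi)
        else:
            days = {DAY_LETTERS[c] for c in token.upper()}
    return days, start, end


def time_acl_matches(acl, when):
    days, start, end = acl
    minute = when.hour * 60 + when.minute
    return when.weekday() in days and start <= minute <= end   # assumed inclusive
```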

Any of the access list types can accept multiple values on the
same line, except for 'time'.  Multiple values of an 'acl'
definition are treated with OR logic.  Multiple ACLs of
an 'http_access' line are treated with AND logic.
That is, all ACLs must match for the 'allow' or 'deny' to take effect.
The order of the 'http_access' lines is important.  When a line
matches, any following lines are not considered at all.
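The evaluation rules above (OR within an 'acl', AND within an 'http_access' line, first matching line wins) can be sketched in Python as follows. The function names are hypothetical, and requests are modelled as dicts with "src" and "host" keys. The rules encode the first two 'http_access' lines of the 'ourclients'/'ourservers' example. The `default` parameter exists because the text above does not say what happens when no line matches. The domain match is a simplified suffix test.

```python
import ipaddress


def _acl_matches(acls, name, request):
    """Evaluate one ACL reference; a leading '!' negates it."""
    if name.startswith("!"):
        return not acls[name[1:]](request)
    return acls[name](request)


def check_access(rules, acls, request, default):
    """Return the action of the first 'http_access' line whose ACLs all match.

    rules:   list of (action, [acl names]) in configuration-file order.
    acls:    dict mapping an ACL name to a predicate over the request; each
             predicate ORs together the values given on its 'acl' line.
    default: returned when no line matches (an assumption of this sketch).
    """
    for action, names in rules:
        if all(_acl_matches(acls, n, request) for n in names):  # AND within a line
            return action                                        # first match wins
    return default


OUR_NETS = [ipaddress.ip_network("128.138.0.0/255.255.0.0"),
            ipaddress.ip_network("198.117.213.0/24")]
acls = {
    # acl ourclients src 128.138.0.0/255.255.0.0 198.117.213.0/24
    "ourclients": lambda r: any(ipaddress.ip_address(r["src"]) in net for net in OUR_NETS),
    # acl ourservers domain .whatsamattu.edu  (simplified suffix match)
    "ourservers": lambda r: r["host"].endswith(".whatsamattu.edu"),
}
rules = [("deny", ["!ourclients", "!ourservers"]),
         ("allow", ["ourclients"])]
```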

'icp_access' is the same as 'http_access' but it applies to the ICP
port.  However, it is not yet fully implemented.  It is only able to check
'src' and 'method' ACLs.

**Note: the 'dst' ACL type was added in version 1.0.beta12.  In
that version it is implemented in a "lazy" manner.  If the URL hostname
is not already in the IP cache, the ACL checks will not match it, but
they will start a DNS lookup so that it will likely be present for
future ACL checks.  This means some users may occasionally get oddball
results.  For example, a page may fail the first time, but succeed on
the second try, or vice-versa.
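The "lazy" behavior explains the fail-then-succeed pattern. This Python sketch is hypothetical. The IP cache is modelled as a plain dict, and `start_lookup` is an injected callback standing in for Squid's asynchronous DNS lookup. In the usage below, the callback fills the cache immediately, so the second check succeeds where the first one failed.

```python
import ipaddress


class LazyDstAcl:
    """Hypothetical sketch of the "lazy" 'dst' ACL described above."""

    def __init__(self, networks, ip_cache, start_lookup):
        self.networks = networks          # networks listed on the 'acl' line
        self.ip_cache = ip_cache          # dict: hostname -> IP address string
        self.start_lookup = start_lookup  # begins a DNS lookup for a hostname

    def matches(self, hostname):
        addr = self.ip_cache.get(hostname)
        if addr is None:
            # Not cached yet: report "no match" now, but start a lookup so a
            # later check for the same host can be answered.
            self.start_lookup(hostname)
            return False
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self.networks)
```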

Changes to cache shutdown
==============================================================================
Squid attempts to implement a "nice shutdown" upon receipt of a SIGTERM
signal.  Rather than simply breaking all current connections, it waits
a configurable number of seconds for active requests to complete.  The
default 'shutdown_lifetime' value is 30 seconds.

As soon as the SIGTERM is received, the incoming HTTP socket is closed
so that no further requests will be accepted.  
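The timing of the "nice shutdown" can be modelled as a small state machine. This Python sketch is hypothetical and takes the current time as an argument so that it is easy to test. The only facts taken from the text are three: new connections stop when SIGTERM arrives, Squid waits up to 'shutdown_lifetime' seconds, and that value defaults to 30. Exiting early once no requests remain active is an assumption about what "wait for active requests to complete" means.

```python
class ShutdownTimer:
    """Hypothetical sketch of the "nice shutdown" timing described above."""

    def __init__(self, shutdown_lifetime=30):
        self.shutdown_lifetime = shutdown_lifetime  # seconds; documented default 30
        self.deadline = None
        self.accepting = True

    def on_sigterm(self, now):
        self.accepting = False                      # close the incoming HTTP socket
        self.deadline = now + self.shutdown_lifetime

    def should_exit(self, now, active_requests):
        if self.deadline is None:
            return False                            # no SIGTERM received yet
        # Exit once all active requests finish, or when the lifetime runs out.
        return active_requests == 0 or now >= self.deadline
```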


Using SIGHUP to reconfigure the cache
==============================================================================
Sending the squid process a HUP signal will prompt it to re-read its
configuration file.  Before it can be reconfigured, it must make sure
that all active connections are closed.  For this purpose Squid
pretends to do a shutdown as described above.  That is, it will wait up
to 'shutdown_lifetime' seconds (30 by default) for active requests to
complete before re-reading the configuration file.


ftpget server
==============================================================================
The ftpget program has been modified to act as a server for FTP
requests.  You may now notice that an "ftpget -S" process is always
present while the cache is running.  The benefit of using an ftpget
server is that the cache process (which may be very large) no longer
needs to fork itself for FTP requests.


Assigning weights to cache neighbors
==============================================================================
Squid allows you to assign weights to parent caches.  The weights are 
used to calculate the ``first miss parent.''  The weight is specified in
the 'options' field of the 'cache_host' line.  For example:

     cache_host  big.foo.org parent 3128 3130 weight=5

The weight must be a non-zero integer.  It is used as a divisor to
calculate a weighted round-trip-time (RTT).  Higher weights will cause
a parent to have a ``better'' RTT.

Weights are only involved when all parent caches return MISS.  Squid still
fetches an object from the first parent or neighbor to reply with a HIT,
regardless of any weight values.
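The weighted-RTT rule can be sketched as follows. This is a hypothetical Python helper. It assumes each parent's round-trip time is measured in milliseconds for the current query; the notes do not say which RTT Squid uses. The weight divides the RTT, so in the usage below a weight=5 parent with a 120 ms RTT (weighted 24) beats a weight=1 parent with a 40 ms RTT.

```python
def first_miss_parent(parent_replies):
    """Pick the 'first miss parent' when every parent replied MISS.

    parent_replies: list of (name, rtt_ms, weight).  The weight is a divisor,
    so a higher weight gives a smaller ("better") weighted RTT.
    """
    for name, _, weight in parent_replies:
        if weight == 0:
            raise ValueError(f"{name}: weight must be a non-zero integer")
    name, _, _ = min(parent_replies, key=lambda p: p[1] / p[2])
    return name
```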

Converting 'cache/log' from cached-1.4.pl3
==============================================================================
Squid uses a slightly different format for the 'cache/log' file.  In 
particular, the words 'FILE:' and 'URL:' have been removed from each
line.  To save your on-disk cache, you will need to convert this log
file before starting Squid.  To do that, use a simple awk command:

     mv log log.old
     awk '{print $2,$4,$5,$6,$7}' < log.old > log
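If awk is not available, the same conversion can be done in Python. This sketch mirrors awk's behavior: it splits on runs of whitespace, joins the output fields with single spaces, and prints missing fields as empty strings. The sample line in the usage is hypothetical. This section only says that 'FILE:' and 'URL:' were removed, and the awk command implies they are fields 1 and 3.

```python
def convert_log_line(line):
    """Python equivalent of awk '{print $2,$4,$5,$6,$7}' for one line."""
    fields = line.split()
    wanted = (2, 4, 5, 6, 7)                    # awk field numbers are 1-based
    return " ".join(fields[i - 1] if i <= len(fields) else "" for i in wanted)


def convert_log(old_path, new_path):
    """Rewrite a whole cached-1.4.pl3 'log' file into the new format."""
    with open(old_path) as src, open(new_path, "w") as dst:
        for line in src:
            dst.write(convert_log_line(line) + "\n")
```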


Notes on stoplists vs. ttl_pattern
==============================================================================
You can use the stoplists ('http_stop', etc.) in the configuration file
to prevent objects from being cached.  Using a 'ttl_pattern' with the
TTL set to zero will also prevent objects from being saved.

There is one important difference between these two methods, however.
Squid never makes ICP queries for objects which match the stoplists.
Instead, the object will be fetched directly (unless on the other side
of a firewall).  We recommend that you use the stoplist for cgi-bin
scripts and use the ttl_pattern rules to prevent caching of normal
objects.
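The difference between the two methods can be sketched as a function that returns whether neighbors are queried and whether the object is saved. This Python sketch is hypothetical. Stoplist entries are assumed to be plain substrings. 'ttl_pattern' entries are modelled as (regex, TTL) pairs where the first match wins, which is also an assumption. The firewall case is ignored.

```python
import re


def plan_fetch(url, stoplist, ttl_patterns, default_ttl):
    """Hypothetical sketch: return (query_icp, save_to_disk) for a URL.

    stoplist:     substrings such as "cgi-bin" (assumed matching rule)
    ttl_patterns: list of (regex, ttl_seconds); first match wins (assumption)
    """
    if any(entry in url for entry in stoplist):
        return False, False        # no ICP query, fetched directly, never saved
    ttl = default_ttl
    for pattern, pattern_ttl in ttl_patterns:
        if re.search(pattern, url):
            ttl = pattern_ttl
            break
    return True, ttl > 0           # neighbors are still queried; TTL 0 is not saved
```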

SIGUSR1 now rotates log files
==============================================================================
In order to be more consistent with other daemon programs, SIGHUP is used
to reconfigure the running process.  This means that we needed to change
the signal used to rotate the log files.  We now use SIGUSR1 to rotate the logs.
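Together with SIGTERM for shutdown, these notes assign three signals. This Python sketch shows the wiring on a POSIX system. The class and attribute names are hypothetical. The handlers only set flags, on the assumption that a main loop performs the actual reconfiguration, log rotation, or shutdown later. This keeps the handlers short and is a common pattern for daemons.

```python
import signal


class SignalState:
    """Hypothetical sketch of the signal assignments described in these notes."""

    def __init__(self):
        self.reconfigure_pending = False
        self.rotate_pending = False
        self.shutdown_pending = False

    def install(self):
        signal.signal(signal.SIGHUP, self._on_hup)    # re-read the configuration file
        signal.signal(signal.SIGUSR1, self._on_usr1)  # rotate log files
        signal.signal(signal.SIGTERM, self._on_term)  # begin a "nice shutdown"

    # Handlers only record the request; a main loop is assumed to act on it.
    def _on_hup(self, signum, frame):
        self.reconfigure_pending = True

    def _on_usr1(self, signum, frame):
        self.rotate_pending = True

    def _on_term(self, signum, frame):
        self.shutdown_pending = True
```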