WWWOFFLE - World Wide Web Offline Explorer - Version 2.8
========================================================

The WWWOFFLE programs simplify World Wide Web browsing from computers that use
intermittent (dial-up) connections to the internet.

Description
-----------

The WWWOFFLE server is a proxy web server with special features for use with
dial-up internet links.  This means that it is possible to browse web pages and
read them without having to remain connected.

Basic Features
    - Caching of HTTP, FTP and finger protocols.
    - Allows the 'GET', 'HEAD', 'POST' and 'PUT' HTTP methods.
    - Interactive or command line control of online/offline/autodial status.
    - Highly configurable.
    - Low maintenance, start/stop and online/offline status can be automated.

While Online
    - Caching of pages that are viewed for later review.
    - Conditional fetching to only get pages that have changed.
        - Based on expiration date, time since last fetched or once per session.
    - Non cached support for SSL (Secure Socket Layer e.g. https).
    - Can be used with one or more external proxies based on web page.
    - Control which pages cannot be accessed.
        - Allow replacement of blocked pages.
    - Control which pages are not to be stored in the cache.
    - Requests compressed pages from web servers (compile time option).
    - Requests chunked transfer encoding from web servers.

While Offline
    - Can be configured to use dial-on-demand for pages that are not cached.
    - Selection of pages to download next time online
        - Using normal browser to follow links.
        - Command line interface to select pages for downloading.
    - Control which pages can be requested when offline.
    - Provides non-cached access to intranet servers.

Automated Download
    - Downloading of specified pages non-interactively.
    - Options to automatically fetch objects in requested pages
        - Understands various types of pages
            - HTML 4.0, Java classes, VRML (partial), XML (partial).
        - Options to fetch different classes of objects
            - Images, Stylesheets, Frames, Scripts, Java or other objects.
        - Option to not fetch webbug images (images of 1 pixel square).
    - Automatically follows links for pages that have been moved.
    - Can monitor pages at regular intervals to fetch those that have changed.
    - Recursive fetching
        - To specified depth.
        - On any host or limited to same server or same directory.
        - Chosen from command line or from browser.
        - Control over which links can be fetched recursively.

Convenience
    - Optional information footer on HTML pages showing date cached and options.
    - Options to modify HTML pages
        - Remove scripts.
        - Remove Java applets.
        - Remove stylesheets.
        - Remove shockwave flash animations.
        - Indicate cached and uncached links.
        - Remove the blink tag.
        - Remove refresh tags.
        - Remove links to pages that are in the DontGet list.
        - Remove inline frames (iframes) that are in the DontGet list.
        - Replace images that are in the DontGet list.
        - Replace webbug images (images of 1 pixel square).
        - Demoronise HTML character sets.
        - Stop animated GIFs.
        - Remove Cookies in meta tags.
    - Provides information about cached pages
        - Headers, raw and modified.
        - Contents, images, links etc.
        - Source code unmodified by WWWOFFLE.
    - Automatic proxy configuration for Netscape.
    - Searchable cache with the addition of the ht://Dig, mnoGoSearch
      (UdmSearch) or Namazu programs.
    - Built in simple web-server for local pages.
        - Allows CGI scripts.
    - Timeouts to stop proxy lockups
        - DNS name lookups.
        - Remote server connection.
        - Data transfer.
    - Continue or stop downloads interrupted by client.
        - Based on file size or fraction downloaded.
    - Purging of pages from cache
        - Based on URL matching.
        - To keep the cache size below a specified limit.
        - To keep the free disk space above a specified limit.
        - Interactive or command line control.
        - Compression of cached pages based on age.
    - Provides compressed pages to web browser (compile time option).
    - Use chunked transfer-encoding to web browser.

Indexes
    - Multiple indexes of pages stored in cache
        - Servers for each protocol (http, ftp ...).
        - Pages on each server.
        - Pages waiting to be fetched.
        - Pages requested last time offline.
        - Pages fetched last time online.
        - Pages monitored on a regular basis.
    - Configurable indexes
        - Sorted by name, date, server domain name, type of file.
        - Options to delete, refresh or monitor pages.
        - Selection of complete list of pages or hide un-interesting pages.

Security
    - Works with pages that require basic username/password authentication.
    - Automates proxy authentication for external proxies that require it.
    - Control over access to the proxy
        - Defaults to local host access only.
        - Host access configured by hostname or IP address.
        - Optional proxy authentication for user level access control.
    - Optional password control for proxy management functions.
    - Can censor incoming and outgoing HTTP headers to maintain user privacy.

Configuration
    - All options controlled using a configuration file.
    - Interactive web page to allow editing of the configuration file.
    - User customisable error and information pages.

Changes
-------

Since version 2.8-beta:

Bug Fixes:
  Fix some portability problems (C++ comments, flex specifics).
  Increase the recursion limit for Location headers (now 8).

From version 2.7h to version 2.8-beta.

Bug Fixes:
  Allow viewing Javascript source in info pages.
  Make parsing of monitor option day/hour ranges more robust.
  Show correct times for URLs monitored every hour.
  Remove newline from end of line when calling syslog.
  Preserve username and password in FTP links.
  Fix some small memory leaks.
  Purge unmatched O* and U* files from outgoing.
  Improve spool error messages.
  Validate the -d option in wwwoffled.
  Keep the config file permissions when writing new one.
  Don't call freeaddrinfo(NULL).
  Running 'wwwoffle URL' when online now actually fetches the URL.
  Improve lexical analyser code for EOF condition, speed and new version of flex.
  Remove some lockfile race conditions.
  Better handling of non-ASCII URLs when parsing.
  The info page for a URL now shows all links.
  Index sorting by file type is case insensitive.
  Handle &amp; in HTML tags like '&'.
  Better memory freeing in certain cases.
  Make the wwwoffle -[drR] options handle spaces before number.
  Allow wwwoffle program to request recursive fetching of depth 0.
  FTP requests with passwords work like HTTP.
  Running 'wwwoffle http://www/bar#foo' now does the right thing.
  Correctly handle recursive fetch options.
  Running 'wwwoffle http://aaa:bbb@www.foo/bar' now does the right thing.
  Allow files on the local web server with spaces in them.
  Fix overwriting of old error message with page.
  Remove title attributes from DontGet images when modifying.
  Check form entries for unwanted whitespace.

New Features:
  Chunked encoding from servers and to clients is now possible.
  Changed all HTML output to HTML 4.01 DTD and validated most output pages.
  Added in a WWWOFFLE stylesheet and some styles to the internal web pages.
  Removed the use of temporary files between server/cache and client.
  Added a parser for CSS (Stylesheets) to detect included files and images.
  The default directory for configuration files is /etc/wwwoffle.
  Remove the enable-modify-online option (no penalty for modifying online).
  Handle deflate compression (the commonly implemented but wrong version).
  Guess the compression type coming from servers (don't believe the headers).
  Stop infinite recursion when following Location headers or Meta Refresh tags.
  Stop infinite recursion if images are actually HTML (e.g. non-404 error pages).
  Fetch images etc in pages with error status codes.

New Options:
  Added options to enable chunked encoding from servers and to clients.
  Added an option to disable the use of Etag as a cache validator.
  Added an option to force the insertion of a User-Agent header.
  Added an option to not make conditional requests to specified hosts.
  Added an option to fetch the favourite/shortcut icons automatically.
  The *-no-cache options now appear in OnlineOptions and OfflineOptions.
  Add an option to disable cookies from being set by HTML meta tags.

Programs:
  Added '-g' option to wwwoffle to fetch no images, stylesheets etc.

Availability
------------

Version 2.8 uploaded, but may not be available yet

FTP server: ftp://ftp.ibiblio.org/pub/Linux/apps/www/servers/wwwoffle-2.8.tgz
FTP server: ftp://ftp.demon.co.uk/pub/unix/httpd/wwwoffle-2.8.tgz
Web page:   http://www.gedanken.demon.co.uk/wwwoffle/

Author & Copyright
------------------

This program is copyright Andrew M. Bishop 1996,97,98,99,2000,01,02,03
(amb@gedanken.demon.co.uk) and distributed under GPL.

email: amb@gedanken.demon.co.uk  [Please put wwwoffle in the subject line]