P3P Architecture Working Group WD-P3P-arch

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at: http://www.w3.org/TR/

This document represents a work in progress. It is not intended to be advanced toward W3C Recommendation status. Rather, it should be used, along with the P3P Grammatical Model and Data Design Model Working Draft, as a basis for the Protocols and Data Transport Working Group's deliverable: a specification that fully defines the conversational framework for user-agent/service interaction. It is strongly recommended that only experimental software be implemented to this specification. The Platform for Privacy Preferences Project will not allow early implementations to affect its ability to make changes to the framework described in this document.

Comments on this working draft should be sent to the P3P Project Manager, Philip DesAutels

The purpose of this document is to provide a general overview of the P3P architecture. We define the terms and concepts that underpin the Platform for Privacy Preferences (P3P) system.

P3P addresses the twin goals of meeting the data privacy expectations of consumers on the Web while assuring that the medium remains available and productive for electronic commerce. Both goals can be achieved by following the principles of consumer notice, choice, and control with respect to site privacy practices and data control.

The goal of this document is to clearly delineate the design space from the implementation and to determine which issues remain to be addressed. This document does not provide specific details of the privacy practice grammar, the data design, or the transport and negotiation protocols. Please see the relevant working group drafts for that information.

The term "privacy" covers a very wide range of concerns, and it is important to understand from the outset the precise scope of the P3P work. P3P will enable sites to express their privacy practices and users to express their preferences about those practices, and to have their user agents act on those preferences accordingly. The user agent can then provide the user with a safe and seamless experience.

A P3P interaction will result in an agreement between the service and the user agent regarding the practices associated with a user's implicit (i.e., click stream) or explicit (i.e., user-answered) data. The agreement may include service side permissions regarding the storage and release of data written by the service and accepted by the user agent. Allowing client side storage of user data in a data repository increases the user's access to and control of data, while providing a mechanism so that the user need not repeatedly enter frequently solicited information. This architecture enhances personal privacy while providing richer, easier access to Web services.

The larger goal of P3P is to create a framework that promotes trust and confidence in the Web. We believe that the key ideas are:

Access	For P3P purposes, a clause that expresses the ability of users to obtain and correct information that an entity has collected about them. A vocabulary may define various degrees of access.
Agreement	A statement that a service and a user agent have agreed to abide by.
Clause	The "parts of speech" from which P3P statements are constructed.
Credentials	Signed statements of authorization, identification, or practice (e.g. certificates granting authority or identity, or signed metadata). These credentials may be presented or be requested by either the user agent or the service.
Data category	A quality of a data element or class that may be used by a trust engine to determine what type of element is under discussion (such as anonymous demographics versus personal contact information).
Data class	A grouping of data elements such as mailing address (which includes, e.g., name, street address, city, state, and country).
Data element	A single data entity such as last name or phone number.
Grammar	(From Computing Dictionary) A formal definition of the syntactic structure of a language (see syntax), normally given in terms of production rules which specify the order of constituents and their sub-constituents in a sentence (a well-formed string in the language). Each rule has a left-hand side symbol naming a syntactic category (e.g. "noun-phrase" for a natural language grammar) and a right-hand side which is a sequence of zero or more symbols. Each symbol may be either a terminal symbol or a non-terminal symbol. A terminal symbol corresponds to one "lexeme" - a part of the sentence with no internal syntactic structure (e.g. an identifier or an operator in a computer language). A non-terminal symbol is the left-hand side of some rule. One rule is normally designated as the top-level rule which gives the structure for a whole sentence. A grammar can be used either to parse a sentence (see parser) or to generate one. Parsing assigns a terminal syntactic category to each input token and a non-terminal category to each appropriate group of tokens, up to the level of the whole sentence. Parsing is usually preceded by lexical analysis. Generation starts from the top-level rule and chooses one alternative production wherever there is a choice.
Grammar (P3P)	In P3P, the grammar defines the structure of the P3P clauses used to make a valid P3P statement. The grammar is the set of rules for properly ordering clauses. The following example structures clauses (in the parentheses) to make a simple privacy practice statement: for (these URIs) the following (practices) apply to this (set of data)
P3P data repository	A mechanism for storing data under the control of P3P preferences over a period of time. These data might include personal data.
Permissions	A set of conditions (e.g., access mode) which are specified on requested P3P data. A service asks the user agent to initially consent to permissions in conjunction with the service's commitment to follow its declared privacy practices.
Persona	A persona is the combination of a set of user preferences and P3P data. Personae allow the user to create different views of themselves by changing the data given to a service. The persona may be based upon the service's purpose (e.g., business, gaming, home, etc.), credentials (e.g., level of associated trust), consequences and practices (e.g., personalization, shipping, mailing list), or any user defined rationale (e.g., time of day, phase of moon, etc). A user may have multiple personae.
Policies	The collection of all user defined preferences, including, but not limited to, P3P preferences.
Protocol	(From the Computing Dictionary) A set of formal rules describing how to transmit data, especially across a network. Low level protocols define the electrical and physical standards to be observed, bit- and byte-ordering and the transmission and error detection and correction of the bit stream. High level protocols deal with the data formatting, including the syntax of messages, the terminal to computer dialogue, character sets, sequencing of messages etc.
Practice	A P3P clause that describes what a service plans to do with data.
Preference	A rule, or set of rules, that determines what action(s) a user agent will take or allow when involved in a conversation or negotiation with a service. A preference might be expressed as a formally defined computable statement; e.g., a PICSRules rule. In this document, preferences govern the types of agreements that can be reached between a user agent and a service. Within this document, "preferences" are assumed to be P3P preferences.
Proposal	A series of statements. A proposal is used when a user agent and a service are negotiating to form an agreement.
Request	A message in which a service asks a user agent to transmit (read request) or store (write request) a data element or set of data elements.
Result set	The user's data sent to the service by the user agent.
Service	A program, for P3P purposes, requesting data from, or providing data to, a user agent. By this definition, a service may be a server, a local application, a piece of locally active code, such as an ActiveX control or Java applet, or even another user agent.
Statement	A description of what data a service will request, what the service will do with it, and the consequence to the user.
Syntax	(From Computing Dictionary) The structure of strings in some language. A language's syntax is described by a grammar. For example, the syntax of a binary number could be expressed as: binary_number = bit [binary_number]; bit = "0" | "1". This means that a binary number is a bit optionally followed by a binary number, and a bit is a literal zero or one digit. The meaning of the language is given by its semantics.
Trust Engine	A mechanism for evaluating incoming statements to make a decision. For P3P purposes, the trust engine evaluates P3P proposals and requests.
User	An individual or group of individuals acting as a single entity. For the purposes of this document, the user is further qualified as an entity for which personal data exists and/or can be collected.
User agent	A program that acts on a user's behalf. The user agent may act on preferences (rules) for a broad range of purposes, such as content filtering, trust decisions, or privacy. For P3P purposes, a user agent acts on a user's privacy preferences. Users may use different user agents at different times.
Vocabulary (schema)	The defined set of words or statements that are allowable in a clause. For instance, a vocabulary may define the practice clause to be one of the two values: 'for system administration', 'for research'.
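
As an illustration of how the glossary terms fit together, the following minimal sketch models a statement with the three clauses from the grammar example ("for these URIs the following practices apply to this set of data") and checks its practice clause against a vocabulary. The class name, field names, and validation rule are assumptions made for this sketch; the draft does not define a concrete representation.

```python
from dataclasses import dataclass

# Assumed vocabulary: the two practice values the glossary uses as an example.
PRACTICE_VOCABULARY = {"for system administration", "for research"}


@dataclass(frozen=True)
class Statement:
    """Hypothetical in-memory form of a P3P statement (not defined by the draft)."""
    uris: tuple       # "for (these URIs)"
    practices: tuple  # "the following (practices) apply"
    data: tuple       # "to this (set of data)"

    def is_valid(self, vocabulary=PRACTICE_VOCABULARY):
        # Valid here means: no clause is empty and every practice
        # is a value the vocabulary allows.
        clauses_present = bool(self.uris and self.practices and self.data)
        return clauses_present and all(p in vocabulary for p in self.practices)


stmt = Statement(
    uris=("http://www.random-site.com/",),
    practices=("for research",),
    data=("last name", "phone number"),
)
```

Under these assumptions, `stmt.is_valid()` is true, while a statement whose practice clause uses a value outside the vocabulary is rejected.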

The P3P Architecture Working Group is charged to create a robust and efficient architecture for the Platform for Privacy Preferences.

It is important to note that the P3P architecture provides mechanisms that are policy-neutral. It provides a general architecture that can allow a range of social and commercial policies to be implemented.

The P3P architecture contains two basic entities: the user agent and the service. Each is defined in terms of its functionality, and each is treated largely as a black box. The service requests data from a user. In the P3P architecture, the service states either that it collects no data or that it collects a specified set of data according to a specified set of practices.

This functionality-based architecture is distinct from a client-server model. For example, the P3P architecture has only one user agent, but that user agent is a black box defined by its function: an implementation may consist of any number of different agents on any number of machines. The user agent may be in close communication with the user through some graphical user interface, or it may act as an automatic proxy. The user agent may even act on behalf of multiple users (e.g., a family) or for privileged users (e.g., a parent). From the viewpoint of P3P, the relations among users and user agents are all outside the scope of this project.

The implementation of a service (e.g., the relationship between a service and its servers) is also not specified within this architecture.

In the P3P architecture, it is presumed that the user agent has access to any data that the user wishes to safeguard. This data is kept within the data repository. Neither the nature of the data repository nor the way the data is accessed is specified by this architecture. For example, in the case of a hand-held device with little onboard memory and storage, the user agent may act through another agent to obtain access to a data repository held by a third party.

Data would be read from the data repository to provide personal information to a service. Services might also write information to the data repository; this would allow the capabilities provided by "cookies" in HTTP today.

The functional description of the data repository is provided in the Vocabulary Working Group's working draft. The data repository may include data elements written by both user agents and services. The user agent can help the user evaluate whether to allow reads from or writes to the data repository. Note that the reading and writing of data are always under the control (implicitly or explicitly) of the user.
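
The rule that every read and write stays under the user's control can be sketched as a repository that consults a consent function before each operation. This is only an illustration: the `DataRepository` class, the `consent` callback, and the example policy are hypothetical, since the architecture deliberately leaves the repository unspecified.

```python
class DataRepository:
    """Toy in-memory repository; every read and write passes a consent check."""

    def __init__(self, consent):
        # consent(operation, service, element) -> bool stands in for the
        # user's implicit (stored preference) or explicit (prompted) decision.
        self._consent = consent
        self._data = {}

    def write(self, service, element, value):
        """Store a value if the user allows this service to write it."""
        if not self._consent("write", service, element):
            return False
        self._data[element] = value
        return True

    def read(self, service, element):
        """Return the value if the user allows this service to read it, else None."""
        if not self._consent("read", service, element):
            return None
        return self._data.get(element)


def consent(operation, service, element):
    # Assumed policy for the example: any service may write,
    # but only shop.example may read.
    return operation == "write" or service == "shop.example"


repo = DataRepository(consent)
repo.write("shop.example", "last name", "Doe")
```

With this policy, `repo.read("shop.example", "last name")` returns the stored value, and the same read from any other service returns `None`.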

Users evaluate Web services on the basis of many criteria in addition to privacy considerations. These "trust" criteria may include, among other issues, content, authority, cost, and governmental regulations. For example, a user may decide whether to look at a specific Web page because it contains sports, was authored by someone at the Boston Globe, and costs less than five cents. Users' evaluations are complex and based upon many personal nuances.

User agent implementations, then, may need to check for the existence of a variety of inputs: statements from services, labels on content, credentials, and other environmental information (IP address, time of day, etc.). User agents will react to these statements. For instance, the user agent might restrict access to a site, control the flow of information to a service, or allow the execution of active content. The user agent would act according to a broad set of preferences (rules) a user has established with that agent. (Some or all of these preferences may have been obtained from a third party, such as a government-sanctioned preference bureau, a socially or religiously affiliated service, or another trusted source.) Furthermore, these statements may be acted on individually or in combination.

User agents, then, will serve as proxies in the evaluation process by users. Within the user agent, there will be some type of trust engine that makes this evaluation on behalf of the user. A variety of mechanisms may be employed by trust engine implementations. For example, one might implement such a trust engine using expert system rules or a neural network.
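
As one possible illustration of the rule-based approach mentioned above, the sketch below accepts a proposal only if every practice it declares is allowed for every data element it covers. The dictionary shapes and the "accept"/"reject" results are assumptions for this example; the draft treats the trust engine as an unspecified part of the user agent.

```python
def evaluate(proposal, preferences):
    """Return "accept" if the proposal is within the user's preferences, else "reject".

    proposal:    {"data": [elements], "practices": [practices]}  (assumed shape)
    preferences: {element: set of practices the user allows}     (assumed shape)
    """
    for element in proposal["data"]:
        # An element with no stored preference allows nothing.
        allowed = preferences.get(element, set())
        for practice in proposal["practices"]:
            if practice not in allowed:
                return "reject"
    return "accept"


preferences = {
    "last name": {"for system administration", "for research"},
    "phone number": {"for system administration"},
}
```

Here a proposal to use the last name for research is accepted, while the same practice applied to the phone number is rejected. Denying by default for unknown elements is a design choice of this sketch, not a requirement of the architecture.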

Users may specify their preferences using a variety of interfaces (determined by the implementation of the user agent they use). At some point these preferences might be stored using a standard language. They might be stored as purpose-specific practices (e.g., PICSRules for PICS labels, another language for P3P privacy preferences) or in a more general language. The set of all stored preferences is a user's policy.

A user’s policy might include preferences regarding P3P statements, signer credentials, RSACi labels, digital signature algorithms, safe-code labels, domain name restrictions (the server is in *.domain.com or *.edu), or locally defined statements/labels. For example, a user may restrict interest to the domains *.foobar.com or *.edu. As another example, the user might keep a database of sites she’s visited and generate personally meaningful labels for them. Or if a site presents no statements or labels, the user might fetch the page and generate one for it ("contains no profanity" or "does not include Java applets."). The user’s policy might also include rules for identifying how all of the practices should interact.
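
The domain name restriction in the example above can be sketched with shell-style pattern matching from the Python standard library. The function name and the choice of `fnmatchcase` are assumptions of this sketch; the draft does not say how such patterns should be evaluated.

```python
from fnmatch import fnmatchcase

# Patterns taken from the example in the text.
ALLOWED_DOMAINS = ["*.foobar.com", "*.edu"]


def domain_allowed(host, patterns=ALLOWED_DOMAINS):
    """True if the host name matches at least one allowed pattern."""
    # Host names are case-insensitive, so compare in lower case.
    host = host.lower()
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Note that under this matching rule the bare name `foobar.com` does not match `*.foobar.com`, because the pattern requires a label before the dot. A real preference language would need to state whether that case is included.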

Users who have a different set of preferences based upon type of sites, time of day, etc., are creating policies that combine P3P preferences with preferences about other input statements and credentials. Together, P3P statements and preferences are part of a larger picture. The trust engine, as mentioned, will evaluate many types of incoming data, including P3P statements. One of the reasons to keep the user agent and service as black boxes is to restrict the scope of P3P to privacy considerations. By treating the user agent as a black box, no consideration of the trust engine implementation is required. We merely assume that, in some way, the user agent is able to evaluate P3P practice statements on behalf of the user.

There are 8 types of statements or requests that can be made in P3P: The form of the statements and requests is still to be specified.

Statements 1, 4, and 6 may be combined by a service. The request for data and the request for the transfer of data have been separated to allow different negotiation mechanisms, and they may be combined in future P3P protocols.

Additionally, some negotiation between the user agent and the service may take place. The negotiation statements will be specified by a future working group.

User agents must deal with unsolicited proposals. As the user moves between servers within the same experience space, the servers (within the same service) may need to send unsolicited proposals.

It is recommended that user agents at least keep the last agreement in the current experience space. The user agent can then compare requests and agreements, allowing optimization.
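
A minimal sketch of this recommendation follows. It assumes that an agreement can be reduced to a set of (data element, practice) pairs and that an experience space can be identified by a key; both are simplifications introduced here, and the class and method names are hypothetical.

```python
class AgreementCache:
    """Remembers the last agreement reached in each experience space."""

    def __init__(self):
        self._last = {}

    def remember(self, space, agreement):
        # Only the most recent agreement per space is kept.
        self._last[space] = frozenset(agreement)

    def covers(self, space, request):
        """True if every pair in the request is already in the last agreement."""
        last = self._last.get(space)
        return last is not None and set(request) <= last


cache = AgreementCache()
cache.remember("www.random-site.com", [("last name", "for research")])
```

When `covers` returns true, the user agent could answer a request without evaluating a new proposal, which is the kind of optimization the text describes.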

In the simplest scenario, a service understands P3P but collects no information. If the browser does not understand P3P, the service's response should look to the browser as it does currently. If the browser understands P3P, it has the option of returning an agreement as an acknowledgement.

In a standard scenario, both the service and user agent are P3P-compliant, and the service will request personal data from the user agent.

Steps 4 to 6 may be done in any order and may be done multiple times. Steps 1 through 6 may be bundled together in various ways (e.g., 2 and 4 together) for optimization.

The following are implementation recommendations considered important by the architecture group:

Simplest scenario (the service collects no information):
Step	Service	User agent
1		Request of "index.html" from www.random-site.com.
2	Service sends P3P statement and index.html.
3		(optional) User agent returns agreement.

Standard scenario (the service requests personal data):
Step	Service	User agent
1		Request of "index.html" from www.random-site.com; user agent makes request for proposal from service.
2	Service sends P3P proposal.
3		User agent returns agreement.
4	Service sends request for data.
5		User agent transfers data to service.
6	Service sends index.html.
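
The six steps of the standard scenario can be traced with a short simulation. This is a sketch under stated assumptions: the message strings, the function and parameter names, and the behavior when no agreement is reached are all invented for illustration, because the draft leaves message formats and negotiation to later work.

```python
def standard_scenario(user_data, acceptable, proposal, requested):
    """Return the list of messages exchanged, numbered as in the table.

    user_data:  {element: value} held by the user agent
    acceptable: practices the user's preferences allow
    proposal:   practices the service proposes
    requested:  data elements the service asks for
    """
    log = [
        "1 UA->S: request index.html and a proposal",
        "2 S->UA: proposal " + ", ".join(sorted(proposal)),
    ]
    if not set(proposal) <= set(acceptable):
        # Assumption: negotiation is out of scope, so the sketch simply stops.
        log.append("3 UA->S: no agreement")
        return log
    log.append("3 UA->S: agreement")
    log.append("4 S->UA: request for " + ", ".join(requested))
    # The result set contains only requested elements the user agent holds.
    result_set = {name: user_data[name] for name in requested if name in user_data}
    log.append("5 UA->S: result set " + repr(result_set))
    log.append("6 S->UA: index.html")
    return log
```

Calling the function with a proposal the user's preferences allow yields all six messages in order; a proposal outside those preferences ends the exchange at step 3.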

WD-P3P-arch-971022

P3P Architecture Working Group

General Overview of the P3P Architecture

W3C Working Draft 22-October-97