This NOTE is a technical overview and introduction to the Platform for Privacy Preferences (P3P), submitted for publication in the Communications of the ACM. Whereas the NOTE's topic is the W3C's P3P Activity, it is not a direct result of a W3C Interest or Working Group and its publication indicates no endorsement of its content by the W3C.
Comments are welcome and should be addressed to the authors above.
Copyright © 1998 AT&T and the Massachusetts Institute of
Technology. All Rights Reserved. Distribution policies are governed by the
W3C intellectual property
terms.
Introduction
P3P in a Nutshell
Technical Mechanisms
Anonymity and Cookies
User Data Repository
Implementation and Deployment
Conclusions
References
Sidebar: W3C Specifications and Notes
Sidebar: The P3P Vocabulary
The World Wide Web Consortium (W3C)'s Platform for Privacy Preferences Project (P3P) provides a framework for informed online interactions. The goal of P3P is to enable users to exercise preferences over Web sites' privacy practices. P3P applications will allow users to be informed about Web site practices, delegate decisions to their computer agent when they wish, and tailor relationships with specific sites. We believe that users' confidence in online transactions will increase when they are presented with meaningful information and choices about Web site privacy practices.
P3P is not a silver bullet; it is complemented by other technologies as well as regulatory and self-regulatory approaches to privacy. Some technologies have the ability to technically preclude practices that may be unacceptable to a user. For example, digital cash, anonymizers, and encryption limit the information that the recipient or eavesdroppers can collect during an interaction. Laws and industry guidelines codify and enforce expectations regarding information practices as the default or baseline for interactions.
A compelling feature of P3P is that localized decision making enables flexibility in a medium that encompasses diverse preferences, cultural norms, and regulatory jurisdictions. However, for P3P to be effective, users must be willing and able to make meaningful decisions when presented with disclosures. This requires the existence of (1) easy-to-use tools that allow users of P3P to delegate much of the information processing and decision-making to their computer agents when they wish, as well as (2) a framework that promotes the use and integrity of disclosures by Web sites.
P3P is a project of the W3C, an international industry consortium that specifies protocols that promote the evolution of an open and interoperable World Wide Web. The development of P3P occurred within a consensus process involving representatives from more than a dozen W3C member organizations, as well as invited experts from around the world 1.
P3P is designed to help users reach agreements with services (Web sites and applications that declare privacy practices and make data requests). As the first step towards reaching an agreement, a service sends a machine-readable proposal in which the organization responsible for the service declares its identity and privacy practices. A proposal applies to a specific realm, identified by a URI or set of URIs. Figure 1 provides an example privacy proposal in both English and P3P syntax. Notice that this privacy proposal enumerates the data elements that the service proposes to collect and explains how each will be used, with whom data may be shared, and whether data will be used in an identifiable manner. The set of statements that may be made in a proposal is defined by the harmonized vocabulary, which is a core set of information practice disclosures. These disclosures are designed to describe what a service does rather than whether it is compliant with a specific law. The harmonized vocabulary is described in more detail in the sidebar.
Proposals can be automatically parsed by user agents -- such as Web browsers, browser plug-ins, or proxy servers -- and compared with privacy preferences set by the user. Thus, users need not read the privacy policies at every Web site they visit. If a proposal matches the user's preferences, the user agent may accept it automatically by returning a fingerprint of the proposal, called the propID. If the proposal and preferences are inconsistent, the agent may prompt the user, reject the proposal, send the service an alternative proposal, or ask the service to send another proposal.
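The comparison a user agent performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the matching algorithm defined by the specification: the `Statement` and `Preferences` classes, the `evaluate` function, and the element name `User.Home.Phone` are hypothetical, and only three of the possible outcomes (accept, prompt, reject) are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"   # proposal matches the user's preferences
    PROMPT = "prompt"   # inconsistent: ask the user what to do
    REJECT = "reject"   # proposal requests data the user never releases


@dataclass(frozen=True)
class Statement:
    purposes: frozenset    # purpose codes declared by the service
    recipients: frozenset  # recipient codes declared by the service
    data_refs: frozenset   # names of the data elements requested


@dataclass
class Preferences:
    # Hypothetical preference model: plain sets of codes and element names.
    allowed_purposes: set = field(default_factory=set)
    allowed_recipients: set = field(default_factory=set)
    releasable: set = field(default_factory=set)
    never_release: set = field(default_factory=set)


def evaluate(statements, prefs):
    """Compare every statement in a proposal with the user's preferences."""
    decision = Decision.ACCEPT
    for st in statements:
        # A request for blocked data rejects the whole proposal.
        if st.data_refs & prefs.never_release:
            return Decision.REJECT
        matches = (
            st.purposes <= prefs.allowed_purposes
            and st.recipients <= prefs.allowed_recipients
            and st.data_refs <= prefs.releasable
        )
        if not matches:
            # Anything outside the preferences is left to the user.
            decision = Decision.PROMPT
    return decision
```

A real agent could also answer an inconsistent proposal with an alternative proposal or ask the service for another one; those branches are omitted here for brevity.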
The procedure in which a service sends a proposal and the user agent accepts or rejects it happens in a flexible manner, sometimes referred to as "negotiation." Similar negotiations occur in systems designed for negotiating whether a telephone caller sends his or her identification and whether a call recipient's telephone should ring [4, 5].
P3P achieves flexibility by allowing services to offer multiple proposals. For example, a Web site that provides information about movies might offer to supply movie reviews. In addition, it might provide local movie schedules to visitors who provide their zip codes. The P3P protocol also supports multi-round negotiation; however, the P3P specification recommends that services offer all proposals to the user on first contact. In many cases, the same agreement can be reached whether the service engages in multi-round negotiation or sends all proposals at once. Occasionally, multi-round negotiation may benefit one of the participants [2]. However, such benefits are realized at the common expense of additional communication delays and poor network caching.
Some P3P implementations will likely support a data repository where users store information they are willing to release to certain services. If they reach an agreement that allows the collection of specific data elements, such information can be transferred automatically from the repository. Services may also request to store data in the user's repository. These read and write requests are governed by the P3P agreement reached with the user. In addition, the repository can be used to store site-specific identifiers that allow for pseudonymous interactions with Web sites that are governed by a P3P agreement. The P3P specification defines a standard set of base data elements that all P3P user agents should be aware of. It also provides a mechanism that services can use to define new data elements.
Although P3P provides a technical mechanism for ensuring that information is released only under an acceptable agreement, it does not provide a technical mechanism for making sure services act according to their agreements. However, laws and self-regulatory programs can provide enforcement mechanisms. For example, P3P proposals may include reference to an assuring party that may take legal or other punitive action against the service provider if it violates an agreement. TRUSTe is one such organization that has provided Web site privacy assurances, even before the development of P3P. In addition to accountability to an assuring party, services may also be held accountable to industry guidelines and laws.
In the following sections we describe some of the technical mechanisms behind
P3P, explain how P3P can be used to facilitate pseudonymous interactions,
describe the role of the P3P user data repository, and discuss P3P implementation
and deployment issues. We conclude with a discussion of P3P's relationship
to larger privacy questions.
CoolCatalog makes the following statement for the Web pages at http://www.CoolCatalog.com/catalogue/. We collect clickstream data in our HTTP logs. We also collect your first name, age, and gender to customize our catalog pages for the type of clothing you are likely to be interested in and for our own research and product development. We do not use this information in a personally-identifiable way. We do not redistribute any of this information outside of our organization. We do not provide access capabilities to information we may have from you, but we do have retention and opt-out policies, which you can read about at our privacy page http://CoolCatalog.com/PrivacyPractice.html. The third party PrivacySeal.org provides assurance that we abide by this agreement.
<PROP realm="http://CoolCatalog.com/catalogue/"
      entity="CoolCatalog" propID="94df1293a3e519bb">
  <USES>
    <STATEMENT purpose="1" recipient="0" id="0">
      <REF name="Web.Abstract.ClientClickStream"/>
    </STATEMENT>
  </USES>
  <USES>
    <STATEMENT purpose="2,3" recipient="0" id="0"
               consequence="a site with clothes you'd appreciate.">
      <WITH>
        <PREFIX name="User.">
          <REF name="Name.First"/>
          <REF name="Bdate.Year" OPTIONAL="1"/>
          <REF name="Gender"/>
        </PREFIX>
      </WITH>
    </STATEMENT>
  </USES>
  <DISCLOSURE discURI="http://CoolCatalog.com/PrivPractice.html"
              access="3" other="0,1"/>
  <ASSURANCE org="http://PrivacySeal.org" text="third party"
             image="http://PrivacySeal.org/Logo.gif"/>
</PROP>
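Because the proposal is XML, a general-purpose parser can extract its main fields. The following Python sketch reads the Figure 1 proposal with the standard library. The `summarize` function and its output layout are hypothetical, and joining each PREFIX name to the REF names beneath it is our reading of the figure rather than a rule quoted from the specification.

```python
import xml.etree.ElementTree as ET

# The Figure 1 proposal, reproduced verbatim.
PROPOSAL = """
<PROP realm="http://CoolCatalog.com/catalogue/"
      entity="CoolCatalog" propID="94df1293a3e519bb">
  <USES><STATEMENT purpose="1" recipient="0" id="0">
    <REF name="Web.Abstract.ClientClickStream"/>
  </STATEMENT></USES>
  <USES><STATEMENT purpose="2,3" recipient="0" id="0"
        consequence="a site with clothes you'd appreciate.">
    <WITH><PREFIX name="User.">
      <REF name="Name.First"/>
      <REF name="Bdate.Year" OPTIONAL="1"/>
      <REF name="Gender"/>
    </PREFIX></WITH>
  </STATEMENT></USES>
  <DISCLOSURE discURI="http://CoolCatalog.com/PrivPractice.html"
              access="3" other="0,1"/>
  <ASSURANCE org="http://PrivacySeal.org" text="third party"
             image="http://PrivacySeal.org/Logo.gif"/>
</PROP>
"""


def summarize(xml_text):
    """Collect the realm, propID, and per-statement data requests."""
    root = ET.fromstring(xml_text)
    statements = []
    for st in root.iter("STATEMENT"):
        # REF elements that sit directly under the statement.
        refs = [ref.get("name") for ref in st.findall("REF")]
        # REF elements nested under a PREFIX get the prefix prepended
        # (assumed interpretation of the figure).
        for prefix in st.iter("PREFIX"):
            for ref in prefix.findall("REF"):
                refs.append(prefix.get("name") + ref.get("name"))
        statements.append({
            "purposes": st.get("purpose").split(","),
            "recipients": st.get("recipient").split(","),
            "refs": refs,
        })
    return {
        "realm": root.get("realm"),
        "entity": root.get("entity"),
        "propID": root.get("propID"),
        "statements": statements,
    }


summary = summarize(PROPOSAL)
```

The resulting dictionary is the kind of structure a user agent could hand to its preference-matching logic.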
At a high level, P3P can be viewed simply as a protocol for exchanging structured data. An extension mechanism of HTTP/1.1 is used to transport information (proposals and data elements) between a client and service. At a more detailed level, P3P is a specification of syntax and semantics for describing information practices and data elements. The specification uses XML and RDF to capture the syntax, structure, and semantics of the information. (XML is a language for creating elements used to structure data; RDF provides a restricted data model for structuring data such that it can be used for describing Web resources and their relations to other resources.)
An important distinction between P3P and previous W3C meta-data activities, such as the Platform for Internet Content Selection (PICS), is that P3P provides an opportunity for flexible access to resources. Services could use PICS labels to describe their privacy practices statically [6], and agreements could be inferred when user agents send data to a service after having received a privacy label. However, P3P allows services to offer multiple choices to users; data that is returned is accompanied by the propID of the governing agreement.
After reaching an agreement with a service, user agents with sufficient storage space should make note of the agreement by indexing the proposal by its propID. This allows user agents and services to refer to past agreements. Rather than sending a new proposal to the user agent on every contact, a service may send the propID of an existing agreement. In doing so, the service is (1) asserting that the service and the user agent have already agreed to a proposal, (2) signaling which proposal and privacy practices it is operating under, and (3) requesting those data elements referenced in the agreement. The user agent may turn away, respond with the requested data, or request a full proposal (if it has no record of such an agreement or it desires a new one). Future P3P revisions are likely to require that propIDs be digitally signed, thus providing irrefutable evidence of an agreement.
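The bookkeeping described above amounts to a lookup table keyed by propID. The Python sketch below is one possible shape for it; the `AgreementStore` class, the `Reply` names, and the `still_acceptable` callback are hypothetical and are not taken from the specification.

```python
from enum import Enum


class Reply(Enum):
    SEND_DATA = "send the data elements named in the agreement"
    REQUEST_FULL_PROPOSAL = "ask the service for a full proposal"
    TURN_AWAY = "decline to continue"


class AgreementStore:
    """Remembers accepted proposals, indexed by their propID."""

    def __init__(self):
        self._by_prop_id = {}

    def record(self, prop_id, proposal):
        # Called once the user agent has accepted a proposal.
        self._by_prop_id[prop_id] = proposal

    def handle_prop_id(self, prop_id, still_acceptable=lambda proposal: True):
        """Decide how to answer a service that sends only a propID."""
        proposal = self._by_prop_id.get(prop_id)
        if proposal is None:
            # No record of this agreement: ask for the full proposal.
            return Reply.REQUEST_FULL_PROPOSAL, None
        if not still_acceptable(proposal):
            # The user's preferences may have changed since acceptance.
            return Reply.TURN_AWAY, None
        return Reply.SEND_DATA, proposal
```

An agent that wants a fresh agreement could equally return `REQUEST_FULL_PROPOSAL` in the second branch; the choice is left to the implementation.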
Many commercial Web sites are eager to maintain persistent relationships with users; most do not require personally-identifiable information. For those that simply wish to track the number of unique visitors to their sites, customize pages for repeat visitors, or serve advertising tailored to each visitor's interests, anonymous and pseudonymous relationships work well.
Currently, many Web sites use HTTP cookies to develop relationships with visitors. By accepting a cookie containing a unique identifier from a Web site, users can identify themselves to the site when they return. However, the current HTTP cookie protocol provides minimal information to users, and cookie implementations in popular Web browsers don't make it easy for users to control which sites to accept cookies from -- though this may change 2. P3P includes two identifiers that users can exchange with services in place of cookies. The pairwise or site ID (PUID) is unique to every agreement the agent reaches with the service. If a user agrees to the use of a PUID, the user agent will return the PUID and propID to URIs specified in the agreement's realm, as shown in Figure 2. The temporary or session ID (TUID) is used only for maintaining state during a single session. If a user returns to a site during another online session, a new TUID will be generated. PUIDs and TUIDs are solicited as part of a proposal and consequently have disclosures relevant to the purpose, recipients, and the identifiable use associated with them.
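The different lifetimes of the two identifiers can be illustrated with a short Python sketch. The `IdentifierManager` class, its method names, and the use of random 128-bit hexadecimal strings are our assumptions; the specification's actual identifier format is not reproduced here.

```python
import secrets


class IdentifierManager:
    """Hands out PUIDs (one per agreement) and TUIDs (one per session)."""

    def __init__(self):
        self._puids = {}  # propID -> PUID, kept across sessions
        self._tuids = {}  # realm -> TUID, valid for the current session only

    def puid_for(self, prop_id):
        # The same agreement always yields the same PUID, so the service
        # can recognize a returning visitor without a cookie.
        if prop_id not in self._puids:
            self._puids[prop_id] = secrets.token_hex(16)
        return self._puids[prop_id]

    def tuid_for(self, realm):
        # Stable within one session so that state can be maintained.
        if realm not in self._tuids:
            self._tuids[realm] = secrets.token_hex(16)
        return self._tuids[realm]

    def end_session(self):
        # Dropping the TUIDs means a later session gets fresh ones.
        self._tuids.clear()
```

Because a PUID is tied to a single agreement, two services cannot link a user's visits by comparing the identifiers they receive.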
Although users can enter into agreements with services in which they agree to provide a PUID or TUID and no personally identifying data, their interactions may still be traceable, perhaps through their IP address. Users who are concerned about this can use P3P in conjunction with an anonymizing service or tool such as the Anonymizer, LPWA, Onion Routing, or Crowds.
Whereas anonymous browsing may account for many Web interactions, there will be times when services solicit information from a user. Often such information is necessary in order to complete a transaction that a user has initiated. For example, services may require real world contact information for payment or delivery services. The P3P user data repository stores information on behalf of the user and releases it to services in accordance with P3P agreements.
The P3P user data repository provides benefits to users and services. Users benefit from not having to retype or keep track of data elements that may be requested multiple times. Services benefit from receiving consistent data from users each time they return to a Web site. There are also privacy benefits associated with coupling data solicitation and exchange with the P3P mechanism for disclosing privacy practices. Services may retrieve data from a user's repository as needed rather than storing it in a central database, thus giving users more control over their information. In addition this coupling reduces ambiguities. Rather than promoting general disclosures over types of information collected -- which may be poorly characterized -- the disclosures apply directly to the information solicited.
It will be important that implementors take care to implement data repositories in such a way that a user's data are protected from rogue applets, viruses, and other processes that might try to gain unauthorized access to this data.
P3P defines a base set of commonly requested data elements that are familiar to all P3P user agents. These elements include the user's name, birth date, postal address, phone number, email address, and similar information. Each element is identified by a standard name and assigned a data format. Whenever possible, the elements have been assigned formats consistent with other standards in use on the Internet, for example vCard [7]. Both individual elements and sets of elements may be requested. For example, a service might request an entire birth date or just a year of birth. By standardizing these elements, the P3P designers hoped to reduce accidental -- or purposeful -- user confusion resulting from requests by different services for the same data element but under different names.
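Requests for a whole set versus a single element can be pictured as prefix matching over dotted element names. The Python sketch below assumes that reading, which we infer from names such as `User.Bdate.Year` in Figure 1; the `expand_request` function, the `User.Bdate.Month` name, and the stored values are hypothetical.

```python
def expand_request(name, repository):
    """Return the stored elements equal to `name` or nested beneath it."""
    prefix = name + "."
    return {
        key: value
        for key, value in repository.items()
        if key == name or key.startswith(prefix)
    }


# Illustrative repository contents (made-up values).
repository = {
    "User.Name.First": "Pat",
    "User.Bdate.Year": "1970",
    "User.Bdate.Month": "6",
    "User.Gender": "F",
}

whole_birth_date = expand_request("User.Bdate", repository)
year_only = expand_request("User.Bdate.Year", repository)
```

Here `whole_birth_date` holds both birth date elements, while `year_only` holds just the year.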
User agent implementations may allow users to pre-populate their repositories with data. Alternatively, user agents may prompt users to enter data when it is requested and save it to the repository automatically for future requests. Note that even when a data element is already stored in a repository, user agents do not send it to a service without the user's consent. Information is sent only after an agreement has been reached.
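The rule that stored data is released only under an agreement can be enforced at the repository boundary. The following Python sketch shows one way to do so; the `DataRepository` class and its methods are hypothetical, and the stored values are made up for illustration.

```python
class DataRepository:
    """Stores user data and releases it only under a recorded agreement."""

    def __init__(self):
        self._values = {}      # element name -> value
        self._agreements = {}  # propID -> set of element names covered

    def store(self, name, value):
        self._values[name] = value

    def record_agreement(self, prop_id, element_names):
        # Called after the user (or the user's agent) accepts a proposal.
        self._agreements[prop_id] = set(element_names)

    def release(self, prop_id, requested):
        """Return only the requested elements that the agreement covers."""
        allowed = self._agreements.get(prop_id)
        if allowed is None:
            raise PermissionError("no agreement recorded for this propID")
        return {
            name: self._values[name]
            for name in requested
            if name in allowed and name in self._values
        }
```

A request that names elements outside the agreement simply receives nothing for them, and a request under an unknown propID is refused outright.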
In addition to frequently requested data elements, the base set also includes the PUID and TUID elements and a set of abstract elements that do not have static values stored in the repository. Rather, abstract elements represent information exchanged in the course of an HTTP interaction. For instance, the abstract elements include client and server click stream data, server-stored negotiation history, and form data. The abstract form element is used to indicate that a service proposes to collect data through an HTML form rather than through P3P mechanisms, thus giving the service greater control over the presentation of user prompts.
The service can use a standard set of categories defined in the harmonized vocabulary to describe the type of information to be collected through the form. For example, if the service wishes to give the user an opinion survey, rather than enumerating all of the questions on the survey it might simply declare that it is collecting form data of the "preference data" category and explain the purposes for which the data will be used. Furthermore, the form element can be used to signal user agents to look for elements in the data repository that match the fields in a form a service presents to the user. Some user agents may be able to automatically fill in these fields with data from the repository.
Because many services are likely to want to collect information not contained in the base set, P3P includes a mechanism for services to declare their own sets of data elements and to request that users add them to their repositories.
Some user agents might also allow users to specify multiple personae and associate a different set of data element values with each persona. Thus users might specify different work and home personae, specify different personae for different kinds of transactions, and even make up a set of completely fictitious personae. By storing the data values that correspond to each persona in their repository, users will not have to keep track of which values go with each persona to maintain persistent relationships with services. A system similar to LPWA might be used to automatically generate pseudonymous personae with corresponding email addresses and other information.
The truest test of a technology is in how it is implemented, deployed, and actually used. Although implementation and deployment issues are beyond the scope of the P3P specification, they are critical to the usefulness of P3P. Consequently, we briefly touch on issues related to deployment, interface, and usability.
Because it is unlikely that P3P will be adopted by Web sites and users universally and immediately, it is important that P3P implementations be designed to support incremental adoption and to provide incentives for individuals and Web sites to begin using P3P. Thus, for example, user agent implementations should not make it difficult for users to access sites that do not offer P3P proposals, but they should make users aware of whether sites offer P3P proposals. They might also use heuristics to identify and warn users about sites that do not offer P3P proposals and appear to collect personal information.
P3P could suffer from somewhat of a "chicken and egg problem" where Web sites are reluctant to adopt it until they see user demand, but user demand is minimal without widespread adoption by Web sites. It is our hope that good user agent implementations will help drive user demand, and regulatory and self-regulatory pressures will help speed adoption by Web sites. As P3P-compliant user agents are implemented and individuals begin using them, there will be more incentives for Web sites to offer P3P proposals and to take advantage of P3P mechanisms to bind data collection to specific privacy disclosures. As more Web sites offer P3P proposals and users grow accustomed to the added convenience and confidence that P3P can provide, their tolerance for non-P3P-compliant Web sites may decrease.
The P3P specification does not address or place requirements on the user interface. Although good user interfaces are very important, they need not be standardized. Indeed, the P3P developers have stated they wish to encourage creativity and innovation in P3P user interfaces and not place unnecessary restrictions on them. At the same time, there are certain principles that P3P participants hope implementors will keep in mind so that their implementations, including their user interfaces, achieve the goals of P3P; these principles are captured in the P3P Guiding Principles Note.
If P3P is to prove valuable to users, implementations must be user friendly. The P3P vocabulary provides a level of granularity that not everyone may wish to configure initially. However, because people have varying sensitivity towards privacy, we cannot afford to reduce the amount of information expressed to all users to the granularity that is desired by the "lowest common denominator" -- those who want the least information. Consequently, well-designed abstractions and layered interfaces -- by which users initially choose from a small number of basic settings, and then if they desire, access advanced interfaces -- are critical to the success of P3P [1].
Because users may not be comfortable with default settings developed by a software company, they might prefer to select a configuration developed by a trusted organization, their system administrator, or a friend. Thus P3P includes a mechanism for exchanging recommended settings. These "canned" configuration files are expressed by APPEL, A P3P Preference Exchange Language. Rather than manually configuring a user agent, a user can select a trusted source from which to obtain a recommended setting. These are the settings the user agent will use when browsing the Web on behalf of its user.
Furthermore, all configuration need not happen the first time someone uses P3P. Users should not have to configure all settings on installation. Instead, users can use a recommended setting to configure the most basic preferences. Afterwards, users can grow their trust relationships over time by accruing agreements with the services they frequent.
The ability to offer explicit agreements on the basis of specific privacy disclosures is a compelling method of addressing policy concerns in the context of a decentralized and global medium. P3P applications will aid users' decision-making based on such disclosures. Consequently, if widely deployed, P3P will likely be one of the first applications that enables trust relationships to be created and managed by the majority of Web users. However, these interactions must happen in a comprehensible, almost intuitive way -- akin to how we make sophisticated (but taken-for-granted) decisions in real life. Fortunately, P3P supports two mechanisms that can make this possible. First, it enables users to rely on the trusted opinions of others through the identity of the service and assuring party, as well as through recommended settings. Second, trust relationships can be interactively established over time, just as they are in the real world.
P3P is designed to accommodate the balance desired by individuals, markets, and governments with respect to information exchange, data protection, and privacy. However, this type of solution must be cast in a context cognizant of its primary assumption: decentralized, agent-assisted decision-making tools allow users to make meaningful decisions. P3P's success will be determined by how well users believe their privacy expectations are being met when using P3P. This is dependent on the quality of the implementations, the abilities of users, and the presence of a framework that promotes the use and integrity of disclosures.
Furthermore, other external factors will have a significant effect on how technologies like P3P fare as they are implemented and adopted. Will expectations of higher-levels of privacy than are currently offered force a change in market practices? Will the tenacity of practices require people to modify their expectations? Or in practice, do people actually care about privacy to the degree that recent surveys would have us believe? The answers to these questions will not be determined by technology alone.
W3C working groups initially produce specifications in the form of working drafts. Eventually, a working draft may be put forward to the W3C membership as a proposed recommendation. After an advisory vote by the membership, the W3C Director may issue the proposal as a recommendation. W3C also publishes notes, which are simply public records of ideas. The publication of a W3C Note does not imply any endorsement by W3C. The following W3C documents are relevant to P3P and this paper.
The P3P harmonized vocabulary specifies a core set of information practice disclosures. These disclosures are designed to describe what a service does rather than whether it is compliant with a specific law or whether it upholds a particular principle.
A P3P proposal consists of a series of disclosures that make assertions about a service's information practices. The set of assertions defined by the harmonized vocabulary is listed below. Numbers in parentheses indicate the codes used to represent each assertion in a P3P proposal.
The following proposal elements make assertions that apply to an entire proposal.
Realm - The Uniform Resource Identifier (URI) or set of URIs covered by the proposal.
Disclosure URI - The URI of a service's human-readable privacy policy. This policy must include contact information for the service provider.
Access to Identifiable Information - The ability of the individual to view personally-identifiable information and address questions or concerns to the service provider. Services may disclose one or more of the following access categories and provide a human-readable statement about their access practices at the disclosure URI:
Assurance (accountability) - An assuring party that attests that the service will abide by its proposal, asserts that the service follows guidelines in the processing of data, or makes other relevant assertions. Assurance may come from the service provider or an independent assuring party.
Other Disclosures - Services may indicate that they make either of the following additional disclosures as part of their human-readable policy:
The following proposal elements make assertions that apply to a set of data elements or data categories. Some may be applied to an entire proposal as well.
Consequence - A human-readable description of the benefits or other results of agreeing to a proposal.
Data category - A quality of a data element or class that may be used by the user's agent to determine what type of element is under discussion. The following data categories have been defined in the harmonized vocabulary:
Purpose - The reason a data element is collected. The following purposes have been defined in the harmonized vocabulary:
Identifiable Use - A declaration as to whether data is used in a way that is personally identifiable -- including linking it with identifiable information obtained from other sources. While some data is obviously identifiable (such as a person's full name), other data (such as zip code, salary, birth date) could allow a person to be identified, depending on how it is used.
Recipients - An organizational area, or domain, beyond the service provider and its agents where data may be distributed. The following recipient choices are provided in the harmonized vocabulary:
Joseph Reagle (reagle@w3.org) is a policy analyst at the World Wide Web Consortium, and co-chair of the P3P Interest group; http://www.w3.org/People/Reagle/
Lorrie Faith Cranor (lorrie@acm.org) is a senior technical staff member at the AT&T Labs-Research Shannon Laboratory in Florham Park, NJ, and co-chair of the P3P Interest group; http://www.research.att.com/~lorrie/