Data Stewardship and Accountability at
the U.S. Census Bureau
Nancy A. Potok
Principal Associate Director and Chief Financial Officer
U.S. Census Bureau
Gerald W. Gates
Chief, Policy Office
U.S. Census Bureau
“Benefits and Stewardship of Linked Survey and Administrative Data”
Federal Committee on Statistical Methodology Statistical Policy Seminar
November 6-7, 2002
This paper has undergone a review more limited in scope than that given to official Census Bureau publications. It is released to inform interested parties about the Census Bureau’s data stewardship approach to balancing confidentiality protections while providing quality data and to encourage discussion of these important issues.
Statistical agencies have long recognized the fundamental tension between their mandate to provide high-quality data that informs sound research and public policy development and their requirement to protect the privacy and confidentiality of their respondents. These dynamics often operate at odds with one another, as demands for richer data products face off against increasing public concerns about privacy, the increased availability of personal information on the internet, and newer, cheaper desktop data processing capability. However, a statistical agency’s reputation for respecting privacy and confidentiality is critical to maintaining high response rates and, thus, the quality of its data.1 The U.S. Census Bureau’s mission to be the “preeminent collector and provider of data on people and the economy of the United States,” requires that this tension be balanced successfully.
The Census Bureau’s legal mandate, Title 13 of the United States Code, authorizes the collection of data, but it also establishes strict requirements for maintaining the confidentiality of data collected from its respondents. Indeed, the Census Bureau may not publish data about a particular establishment or individual that allows them to be identified. Even when the Census Bureau requires expert consultation from outside the agency, such experts are not permitted access to the data unless they are brought on as “Special Sworn Status” individuals2 – effectively temporary staff – who are sworn to uphold the Census Bureau’s confidentiality standards. Criminal penalties, specifically up to $250,000 in fines and 5 years’ imprisonment, further help to create an environment intolerant of such disclosures. Given the agency’s strong legal mandate and ethical commitment to privacy and data confidentiality, how does it ensure that collected data result in useful, relevant, and timely products?
A sound data stewardship structure within which such issues can be weighed provides a forum where the Census Bureau can make balanced business decisions, with data quality and access on one side of the scale and privacy and confidentiality on the other. The concept of “stewardship” is borrowed from environmentalists – the objective being to create a sustainable balance that supports one’s needs over the long term.
Establishing a Basic Data Stewardship Structure
While data stewardship principles may exist, they are not always well coordinated or integrated, and/or they are applied in an ad hoc manner, depending on the particular circumstances involved. Chart 1 demonstrates how business decisions that affect data-related operations -- collections, processing, analysis, dissemination, and archiving -- can become unbalanced and lose a corporate focus when there is no integration of strategies, policies, controls or practices, or they are not used systematically to make business decisions.
Chart 1 --
If strategies, policies, controls, and practices are fully integrated, the organization has a better chance of ensuring that business decisions will lead to the desired outcome. Chart 2 illustrates how an otherwise ad hoc approach can be stabilized, achieving balance between business objectives and constraints. This better supports the data-related operations.
Chart 2 --
Chart 3 --
One goal of the DSEP Committee is to ensure that strategic goals, corporate ethics, policies, controls, and operational practices are integrated and consistent. This means that strategic goals are shaped by corporate ethics and drive policies. Policies in turn drive the creation of organizational controls, and these controls incorporate practices that ensure compliance. For example, as shown in Chart 4, one of the Census Bureau’s strategic goals is to foster trust and cooperation through privacy and confidentiality. In support of this goal, the Census Bureau developed a set of ethical standards called Privacy Principles, one of which is Confidentiality. This Privacy Principle resulted in the Census Bureau adopting a policy prohibiting the browsing of records with personal identifiers by employees and others who may have access to those records. The Census Bureau is currently working to establish access control and auditing procedures, such as identifying data custodians in each division responsible for monitoring access to personal identifiers. The result will be that fewer employees will have access to sensitive records, and those who do will have all their interactions with the data tracked and monitored by an automated audit system.
Chart 4 --
The DSEP structure has been successful in systematically establishing policies and procedures in several key areas. Accomplishments include releasing an Administrative Records Handbook and documenting procedures for the negotiation, acquisition, access, and use of administrative record data. The DSEP Committee also has finalized a policy on appropriate data access and use for non-employees with Census Bureau Special Sworn Status. It is currently completing an analysis of how well existing policies support the Privacy Principles.
While the primary responsibility of the DSEP Committee is to serve as the policy-making body, it also gives considerable attention to controls and practices. However, translating policy decisions into day-to-day operational practices is a highly human resource-intensive activity. As a result, policy implementation is moving ahead more slowly than was originally anticipated. The Census Bureau has handled this challenge, in part, by establishing a new Policy Associates Program, which details competitively selected Census Bureau program staff for one year to the Policy Office to help implement new data stewardship policies.
Data Stewardship and the Use of Administrative Records
The benefits and stewardship of linked survey and administrative data, the subject of this panel, are of great interest to the Census Bureau’s DSEP Committee, which uses its data stewardship framework to guide and support use of administrative records for statistical purposes. Using the approach introduced in Chart 4 above, the Census Bureau first looked to its strategic plan and asked whether administrative record data would support its goals. The Bureau’s strategic goal of “Fostering an environment that supports innovation, reduces respondent burden, and ensures individual privacy” supports use of data from administrative records. Administrative records minimize the cost of direct data collection, reduce the burden on respondents, improve and enhance census and survey collections, and enable the development of improved data products that inform public policy. This strategic goal drives the development of policies that balance the benefits of administrative record use against privacy and confidentiality concerns, particularly given that these benefits are primarily derived from linking administrative records to other datasets.
Policy issues surrounding use of administrative records are identified by the DSEP Committee, with subsequent policy analysis and recommendations developed by the CARPP (see Chart 3 above). In addition to weighing the needs of the data user community and the public, the CARPP must give special consideration to the Census Bureau’s data providers, including managing and safeguarding data in accordance with their legal authorities and policy requirements. The CARPP and the DSEP Committee have established a number of procedures for managing the use of administrative records at the Bureau.
Procedures for managing administrative records include consistent review criteria for all proposed projects; centralized custodial functions to control data access on a “need-to-know” basis; and centralized tracking of administrative record projects. In addition, personal identifiers on administrative records (e.g., Social Security Number and name) are maintained in a restricted environment by the custodian. Identifiers are stripped from the records before they are released to researchers. When necessary, the custodian replaces the personal identifiers with a “Protected Identification Key,” or “PIK,” to enable record linkage. Currently, the CARPP is developing a policy to guide the Bureau’s record linkage activities, again seeking the balance between developing relevant, high-quality data products and providing appropriate privacy and confidentiality protections to respondents.
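The custodial steps described above (strip identifiers, substitute a linkage key, release only the protected file) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's actual PIK assignment method, which this paper does not describe. The keyed hash (HMAC-SHA256), the secret `CUSTODIAN_KEY`, the field names `ssn` and `name`, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the custodian (an assumption for this sketch).
CUSTODIAN_KEY = b"example-key-held-only-by-the-custodian"

# Assumed names of the personal identifier fields on an incoming record.
IDENTIFIER_FIELDS = ("ssn", "name")


def make_pik(ssn: str) -> str:
    """Derive a stable pseudonymous key from an SSN (illustrative only)."""
    normalized = ssn.replace("-", "").strip()
    digest = hmac.new(CUSTODIAN_KEY, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def protect(record: dict) -> dict:
    """Return a copy of the record with identifiers stripped and a PIK added."""
    released = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    released["pik"] = make_pik(record["ssn"])
    return released


def link(survey: list, admin: list) -> list:
    """Join two already-protected files on the PIK alone."""
    admin_by_pik = {row["pik"]: row for row in admin}
    return [
        {**row, **admin_by_pik[row["pik"]]}
        for row in survey
        if row["pik"] in admin_by_pik
    ]


# Made-up records: the same person appears in both files with different formatting.
survey_file = [protect({"ssn": "000-00-0001", "name": "A. Example", "tenure": "owner"})]
admin_file = [protect({"ssn": "000000001", "name": "Example, A.", "program": "X"})]
linked = link(survey_file, admin_file)
```

In this sketch the researcher only ever sees `survey_file`, `admin_file`, and `linked`, none of which carries a name or Social Security Number. Linkage remains possible because the same person receives the same key in both files.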
Although the basic data stewardship structure provides a mechanism for balancing data quality and access with privacy and confidentiality, that balance is still somewhat precarious. Looking back at the generic framework in Chart 2, it is useful, then, to consider ways to further stabilize this structure.
The Census Bureau has considered a number of sources for guidance in strengthening its data stewardship approach. First, it conducted a benchmarking exercise, making structured inquiry of six best practice-oriented private and government organizations about their policies, agency structures, and roles with regard to privacy. It also conducted a literature review consisting of recent privacy research both at the Census Bureau and elsewhere. The Census Bureau also drew on a General Accounting Office report issued in April 2001, Record Linkage and Privacy: Issues in Creating New Federal Research and Statistical Information, which provides a toolkit of approaches to support data stewardship.3 Lastly, the DSEP Committee commissioned an evaluation of the DSEP structure (executive body plus four staff committees). The evaluation targeted four areas for improvement -- the need to focus on employee awareness of the data stewardship structure; include stakeholders in policy discussions; be more systematic in assessing the operational impacts of policies; and restructure the role of the Security staff committee. The assessment activities also identified four key components that can help stabilize the data stewardship structure – culture and tradition, technical and administrative tools, awareness and outreach, and an integrating authority.
As shown in Chart 5, adding these components to the data stewardship pyramid helps achieve a more stable balance between data access and use, on the one hand, and data protection, on the other.
Chart 5 --
Culture and tradition form the basis for a statistical agency’s approach to data stewardship. Al Zarate, Confidentiality Officer at the National Center for Health Statistics (NCHS), describes the Census Bureau as having a “culture of confidentiality.”4 Some organizations have cultures that focus predominantly on access to information. In an academic environment, for example, information sharing is the lifeblood of learning. The primary focus is on sharing research, not limiting access. Other organizations, like the National Security Agency, place a priority on keeping information highly controlled, and access limitation is paramount. Survey organizations would not continue to do business without a focus on both confidentiality and access. The Census Bureau’s culture and tradition fit this model well.
Technical and administrative tools play an important role in a well-grounded data stewardship structure. Today, most organizations control disclosure by providing safe settings, where data can be used for legitimate statistical purposes, and by releasing safe data, where the data have been modified to hamper those who attempt to identify individual respondents. These tools allow organizations to more effectively accomplish the business objective of providing access to data while also ensuring confidentiality. They also play a role in restricting access and limiting uses within the organization. Need-to-know access and file-level auditing ensure that employees are not tempted to browse records or give others access, regardless of the motive. In deciding what tools to apply, the organization must be aware of external threats, assess the physical constraints on users, and take into consideration the impact on utility of the data for intended research.
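As a rough illustration of how need-to-know access and file-level auditing might work together, the sketch below grants access only when a user is approved for the requesting project and records every attempt, granted or not. The approval table, user and project names, file name, and in-memory log are hypothetical and do not describe any system named in this paper.

```python
from datetime import datetime, timezone

# Hypothetical need-to-know table: user -> set of approved projects.
APPROVED = {"analyst_a": {"project_1"}}

# In-memory audit trail for the sketch; a real system would use an append-only store.
audit_log = []


def request_access(user: str, project: str, file_name: str) -> bool:
    """Grant access only for approved projects, and log every attempt."""
    granted = project in APPROVED.get(user, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "project": project,
        "file": file_name,
        "granted": granted,
    })
    return granted


ok = request_access("analyst_a", "project_1", "admin_extract.dat")
denied = request_access("analyst_a", "project_2", "admin_extract.dat")
```

Logging denied requests as well as granted ones is a deliberate choice in this sketch, since attempted browsing is as relevant to a data custodian as successful access.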
Awareness and outreach activities help ensure that business decisions are based on the valid concerns of external stakeholders, including respondents, privacy advocacy groups, and the data user community. Without adequate research and data on privacy attitudes and behaviors and data needs, it is easy to fall into an endless loop of supposition and speculation in the policy development process. The Census Bureau has conducted privacy attitude surveys for the past decade to measure the public’s awareness of confidentiality requirements and gauge concerns over the use of administrative records. Attitude surveys, focus groups, and cognitive interviews play an important role in understanding awareness of organization practices and identifying practices that may be misunderstood or may not be acceptable. Messages that are conveyed to employees and to the public help reassure them that data uses are important and that protections are appropriate. Message wording benefits from cognitive testing to ensure that what is intended is what is understood.
An agency’s marketing activities also support the agency’s outreach efforts by emphasizing the organization’s objectives and constraints and how its culture, tools and legal authority enforce its approach to data stewardship. It is critical, however, that messages accurately reflect practice (i.e., the “talk matches the walk”) -- saying you do something when you don’t can be worse than not saying anything at all.
An integrating authority is critical to ensure integration of strategies, policies, controls and practices and to make most effective use of culture, tools and awareness. This typically entails a role for persons or groups to decide or advise on policies, controls and practices. The National Center for Health Statistics (NCHS) enlists its confidentiality officer for this purpose, who provides internal advice on data protection and access decisions. The Canadian government has established a Privacy Commissioner, who provides counsel and direction on matters affecting the privacy of Canadian citizens. Statistics Canada also has a privacy and confidentiality officer. In other instances, agencies are subject to Institutional Review Boards that review and approve survey research affecting human subjects. NCHS and the Census Bureau have also established Disclosure Review Boards to review and approve all publicly released data. Lastly, there is a trend among U.S. institutions to name a Chief Privacy Officer whose responsibility it is to implement privacy policies across the organization. Legislation recently enacted to establish a Department of Homeland Security requires affected federal agencies to establish a Chief Privacy Officer.
In short, there are several non-mutually exclusive options for establishing an integrating authority, all providing varying degrees of control. Some are purely internal, some external, and some provide a combination of the two orientations. The use of external decision makers is controversial and often resisted, but part of that resistance stems from a concern that such counsel generally lends itself to advocacy of privacy and confidentiality to the exclusion of balancing those concerns against the agency’s need to provide quality data products. A redirection of the integrating authority’s focus to a balanced data stewardship approach may alleviate this concern.
At this writing, the Census Bureau is deliberately working towards full implementation of the enhanced data stewardship framework illustrated in Chart 5. There are several data stewardship issues that will influence the way the Census Bureau, and the federal statistical community in general, will function this decade. The impact of recent legislation like the USA Patriot Act and future implementation of new data sharing legislation (H.R. 2458), which passed through Congress in November 2002, need to be assessed and addressed. Additional challenges continue to arise.
As the Census Bureau explores the potential of using administrative records for statistical purposes, it needs a clear policy on record linkage methodology and standards for obtaining informed consent from respondents to conduct such matches. Also, administrative record procedures must include adequate controls on access and use of these data, which must be maintained in accordance with the requirements of the providing agencies. The Census Bureau is currently responding to new Office of Management and Budget requirements for Privacy Impact Assessments, building on the Privacy Principles developed within the parameters of the data stewardship structure. A broad range of disclosure limitation approaches that permit safe release of data for public policy uses must be developed, including contracting with experts to attempt unauthorized links of public data sets, and developing synthetic data sets to permit public users access to data while reducing the risk of identifying respondents.
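One simple screen that could precede an expert re-identification attempt is to measure how many records in a proposed public file are unique on a handful of characteristics that an outsider might also hold. The sketch below is only an illustration of that idea. The quasi-identifier fields, the function name `unique_record_share`, and the sample rows are assumptions made for the example, not a method described in this paper.

```python
from collections import Counter

# Assumed quasi-identifiers: fields an outsider might match against another file.
QUASI_IDENTIFIERS = ("age_group", "sex", "county")


def unique_record_share(rows, keys=QUASI_IDENTIFIERS):
    """Share of rows whose quasi-identifier combination appears only once."""
    if not rows:
        return 0.0
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows if counts[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)


# Made-up public-use records for illustration.
public_file = [
    {"age_group": "30-39", "sex": "F", "county": "A", "income": 41000},
    {"age_group": "30-39", "sex": "F", "county": "A", "income": 52000},
    {"age_group": "80+", "sex": "M", "county": "B", "income": 18000},
    {"age_group": "40-49", "sex": "M", "county": "A", "income": 60000},
]
risk = unique_record_share(public_file)
```

A high share suggests that coarser categories, suppression, or synthetic data may be needed before release. A low share does not by itself establish that a file is safe.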
Lastly, a key point bears repeating: developing and maintaining a viable data stewardship structure requires a significant commitment and investment of resources from an agency. Nevertheless, this more structured approach to data stewardship is integral to striking a balance between the tensions inherent in meeting data user needs and honoring the privacy and confidentiality of its respondents. In the end, privacy and confidentiality, which are typically perceived as business constraints, can actually enable an agency’s mission and business objectives by establishing the public’s trust and cooperation as respondents.
The authors wish to thank Eloise Parker for her role in the preparation of this paper and Eloise Parker, Wendy Alvey, and Ta Shunna Marshall for their assistance with the presentation.
Recommended Resources on Data Access and Confidentiality
Doyle, Pat, Julia I. Lane, Jules J.M. Theeuwes, and Laura V. Zayatz. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. North-Holland, published in conjunction with the Census Bureau (2001).
Duncan, George T., Thomas B. Jabine, and Virginia A. de Wolf, Eds. Private Lives and Public Policies. Panel on Confidentiality and Data Access. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education, and National Research Council. Washington, DC: National Academy Press (1993).
Federal Committee on Statistical Methodology (FCSM) (May 1994). Report on Statistical Disclosure Limitation Methodology. (Statistical Working Paper 22). Washington, DC: Office of Management and Budget, Office of Information and Regulatory Affairs, Statistical Policy Office.
Hotz, V. Joseph, Robert Goerge, Julie Balzekas, and Francis Margolin. Administrative Data for Policy-Relevant Research: Assessment of Current Utility and Recommendations for Development. A Report of the Advisory Panel on Research Uses of Administrative Data of the Northwestern University/University of Chicago Joint Center for Poverty Research (January 1998).
H.R. 2458. E-Government Act of 2002, containing the Confidential Information Protection and Statistical Efficiency Act of 2002. Legislation passed both Houses of Congress by November 15, 2002; pending presidential signature at this writing.
Privacy Protection Study Commission. Personal Privacy in an Information Society (July 1977). The principle of functional separation is addressed in Chapter 15.
U.S. Census Bureau. Administrative Records Handbook. May 2001. For inquiries about the handbook, please contact Eloise Parker, Administrative Records Coordinator, U.S. Census Bureau, FOB 3, Room 2430, Washington, DC 20233; (301) 763-2520; firstname.lastname@example.org.
U.S. General Accounting Office. Record Linkage and Privacy: Issues in Creating New Federal Research and Statistical Information. April, 2001. GAO-01-126SP.
Zarate, Alvan. Government Perspective on Data Stewardship for Statistical Data. Panel, “Statistical Data Stewardship in the 21st Century,” Joint Statistical Meetings, New York, NY, August 11, 2002. (American Statistical Association CD-ROM Proceedings in process.)
Zarate, Alvan O., Jacob Bournazian, and Virginia de Wolf. Integrating Federal Statistical Information and Processes. Federal Committee on Statistical Methodology (FCSM) Committee on Data Access and Confidentiality (November 8-9, 2000).