Datasets and Numeric Data Services
The Heard Library provides access to several sources of data. The Data Services Librarian is available to assist Vanderbilt faculty, staff, and students in locating numeric data sets for secondary data analysis.
Through the Heard Library, Vanderbilt is a member of the Inter-university Consortium for Political and Social Research (ICPSR). Established in 1962 at the University of Michigan in Ann Arbor, ICPSR is the world's largest online archive of social science data. In addition to preserving the data sets and making them available, ICPSR offers training in analyzing social science data. For more information, visit the Research Guide for Finding & Using Data from ICPSR.
The Roper Center for Public Opinion Research is another source of social science data provided by the library. Members of the Vanderbilt community have access to the iPOLL database, which lets you search public opinion polls at the question level. In many cases, you will be able to download the poll and its data with a few mouse clicks. To locate other public opinion data sets, you may search the Roper Center's catalog. Contact the Data Services Librarian for more information about Vanderbilt's access to the Roper Center's data.
The Data Services Librarian can help you make sure you can open a data set in a statistical software package such as SPSS, SAS, or Stata, but she does not provide statistical consulting or instruction in these programs.
What are Data?
Data vs. Statistics
Data are the raw ingredients from which statistics are created. Statistics are useful when you just need a few numbers to support an argument (e.g., in 2003, 98.2% of American households had a television set, according to the Statistical Abstract of the United States). Statistics are usually presented in tables. Statistical analysis can be performed on data to show relationships among the variables collected. Through secondary data analysis, many different researchers can re-use the same data set for different purposes.
Aggregate/Macro Data vs. Microdata
Aggregate or Macro Data are higher-level data that have been compiled from smaller units of data. For example, the Census data that you find on American FactFinder have been aggregated to preserve the confidentiality of individual respondents. Microdata contain individual cases, usually individual people or, in the case of Census data, individual households. The Integrated Public Use Microdata Series (IPUMS) for the Census provides access to the actual survey data from the Census, but eliminates information that would identify individuals.
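To make the distinction concrete, here is a minimal Python sketch of how microdata might be collapsed into aggregate data. The household records, the county labels, and the function name aggregate_by_county are all invented for illustration; they do not come from the Census or IPUMS.

```python
from collections import defaultdict

# Hypothetical microdata: one record per household (values invented for illustration).
households = [
    {"county": "A", "has_tv": True},
    {"county": "A", "has_tv": True},
    {"county": "A", "has_tv": False},
    {"county": "B", "has_tv": True},
]

def aggregate_by_county(records):
    """Collapse household-level records into one summary row per county."""
    totals = defaultdict(lambda: {"households": 0, "with_tv": 0})
    for rec in records:
        row = totals[rec["county"]]
        row["households"] += 1
        row["with_tv"] += int(rec["has_tv"])
    # Add a percentage to each county row; the individual households are no longer visible.
    return {
        county: {**row, "pct_with_tv": round(100 * row["with_tv"] / row["households"], 1)}
        for county, row in totals.items()
    }

aggregate = aggregate_by_county(households)
for county, row in sorted(aggregate.items()):
    print(county, row)
```

Once the records are aggregated, only the county-level counts and percentages remain, which is why aggregate tables can be published without exposing individual respondents.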
Data Sets, Studies, and Series
In ICPSR, a data set or study is made up of the raw data file and any related files, usually the codebook and setup files. The codebook is your guide to making sense of the raw data. For survey data, the codebook usually contains the actual questionnaire and the values for the responses to each question. The setup files help you read the raw data into a statistical software package; without them, the data will not display properly.
ICPSR uses the term series to describe collections of studies that have been repeated over time. For example, the National Health Interview Survey is conducted annually. In the ICPSR archive, you will find a description of the series that provides an overview. You will also find individual descriptions of each study (e.g., National Health Interview Survey, 2004). The study number in ICPSR refers to the individual survey.
Types of Data
Cross-Sectional describes data that are only collected once.
Time Series describes data that track the same variables over time. The National Health Interview Survey is an example of time series data because the questions generally remain the same over time, but the individual respondents vary.
Longitudinal Studies describe surveys that are conducted repeatedly and in which the same group of respondents is surveyed each time. This allows for examining changes over the life course. The Project on Human Development in Chicago Neighborhoods (PHDCN) Series contains a longitudinal component that tracks changes in the lives of individuals over time through interviews.
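As a rough illustration of the three types above, the following Python sketch classifies a study by looking at which respondents appear in each round (wave) of data collection. The function name classify_design and the respondent IDs are hypothetical, and the rule is a simplification: real longitudinal studies lose and add respondents over time, so this only checks whether any respondents were followed across every wave.

```python
def classify_design(waves):
    """Classify a study from a list of sets of respondent IDs, one set per wave.

    Simplifying assumption: a study counts as longitudinal if at least one
    respondent appears in every wave.
    """
    if len(waves) == 1:
        return "cross-sectional"
    followed = set.intersection(*waves)  # respondents present in all waves
    if followed:
        return "longitudinal"
    return "time series"

# Hypothetical respondent IDs, invented for illustration.
print(classify_design([{1, 2, 3}]))
print(classify_design([{1, 2, 3}, {4, 5, 6}]))
print(classify_design([{1, 2, 3}, {1, 2, 3}, {1, 2}]))
```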
For more definitions, I highly recommend the Glossary of Selected Social Science Computing Terms and Social Science Data Terms compiled by Jim Jacobs, Data Services Librarian, UCSD.
Phone: (615) 343-3081
Office: Central Library 800 BC