The datasets underlying research findings are increasingly acknowledged as valuable scholarly products. Through data publication, researchers can give more visibility to their work while making their research data available to others for reuse. There are several good reasons for publishing data, including meeting funding agency requirements, advancing knowledge within your discipline, and increasing the exposure of your research.
The benefits of data sharing to advance knowledge are becoming increasingly clear, and data-intensive research has come to be known as the Fourth Paradigm of Discovery (along with empirical, theoretical, and computational modes). By performing new analyses and meta-analyses of datasets, researchers can use existing data sources to answer new questions.
Sharing the datasets that support research findings also makes the research process more open, permitting others to replicate the findings of a study. To advance this goal, a number of journals (e.g. Nature and the Public Library of Science [PLOS]) require authors to submit datasets along with articles for peer review, and to describe how they will make those datasets publicly available.
Researchers who share datasets are also seeing increased citations to their work. One recent study found that
“cancer clinical trials which share their microarray data were cited about 70% more frequently than clinical trials which do not. This result held even for lower-profile publications and thus is relevant to authors of all trials.”
While these findings are specific to cancer clinical trials, we can expect to see more studies addressing citation rates for datasets in a variety of disciplines.
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
On February 22, 2013, the Office of Science and Technology Policy (OSTP) in the Executive Office of the President released the memo "Increasing Access to the Results of Federally Funded Scientific Research," directing federal agencies to act to ensure that "the direct results of federally funded scientific research" [including peer-reviewed publications and digital data] "are made available to and useful for the public, industry, and the scientific community."
In response, grant-funding agencies in the United States, including the National Institutes of Health (NIH) and the National Science Foundation (NSF), now require grant applicants to submit data management plans describing how research data will be managed during the course of a study and shared at the study's conclusion.