"A method to track dataset reuse in biomedicine: Filtered GEO accession numbers in PubMed Central" by Heather A. Piwowar

DataONE Sociocultural and Usability & Assessment Working Groups

Title

A method to track dataset reuse in biomedicine: Filtered GEO accession numbers in PubMed Central

Source Publication

Proceedings of the American Society for Information Science and Technology

Authors

Heather A. Piwowar, National Evolutionary Synthesis Center

Document Type

Conference Proceeding

Publication Date

11-2010

DOI

10.1002/meet.14504701450

Abstract

Reusing research data has important potential benefits: generative science and efficient resource use. Tracking the reuse of research datasets would allow us to understand whether the potential benefits are indeed realized, enable recognition of investigators who produce, annotate, and share useful data, and inform data sharing and reuse initiatives, tools, and policies.

Unfortunately, the lack of clear attribution practices for data makes automated tracking of data reuse difficult. I present a method for tracking research data reuse that takes advantage of the community norms around sharing gene expression microarray data and the rich NCBI Entrez resources. Specifically, the full text of papers stored in PubMed Central is queried for accession numbers of datasets archived in NCBI's Gene Expression Omnibus (GEO) repository. Studies known to have created microarray data are excluded through automated filters and guided manual curation. MeSH terms attached to the data-creation and data-reuse studies provide additional information for analysis. Finally, I extrapolate the findings to all of PubMed.
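The core filtering step (find GEO accession numbers in a paper's full text, then set aside papers that created the dataset) can be sketched as below. This is a minimal illustration under stated assumptions, not the author's released code: it assumes accessions follow the GEO series (GSE) and curated dataset (GDS) patterns, that full texts are already available as strings keyed by a paper ID, and that a hypothetical `creators` mapping (accession to the IDs of papers that generated it) stands in for the automated filters. The guided manual curation and MeSH analysis are not modelled.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumed pattern: GEO series (GSE) and curated dataset (GDS) accessions,
# e.g. "GSE2034" or "GDS1234". Word boundaries avoid matching inside longer tokens.
GEO_ACCESSION = re.compile(r"\b(G(?:SE|DS)\d+)\b")


def find_accessions(full_text: str) -> Set[str]:
    """Return the distinct GEO accession numbers mentioned in a paper's text."""
    return set(GEO_ACCESSION.findall(full_text))


def candidate_reuse(
    papers: Dict[str, str],
    creators: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Pair each paper with the accessions it mentions but did not create.

    papers:   paper ID -> full text (hypothetical input format).
    creators: accession -> IDs of papers that generated that dataset
              (hypothetical; stands in for the automated data-creation filters).
    """
    pairs = []
    for paper_id, text in sorted(papers.items()):
        for acc in sorted(find_accessions(text)):
            # Skip mentions by the study that produced the dataset itself.
            if paper_id not in creators.get(acc, set()):
                pairs.append((paper_id, acc))
    return pairs


if __name__ == "__main__":
    papers = {
        "PMC1": "We deposited our arrays as GSE100.",
        "PMC2": "We reanalysed GSE100 and GDS42 from GEO.",
    }
    creators = {"GSE100": {"PMC1"}}
    print(candidate_reuse(papers, creators))
    # → [('PMC2', 'GDS42'), ('PMC2', 'GSE100')]
```

Here PMC1's mention of GSE100 is dropped because PMC1 created that dataset, while both mentions in PMC2 survive as reuse candidates. In practice such candidates would still need review, since a paper can cite an accession without actually reusing the data.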

Automated portions of this method have been implemented in Python and are openly available. Although imperfect, the resulting dataset is a valuable initial resource for research into patterns of data reuse.

Recommended Citation

Piwowar, H. (2010). A method to track dataset reuse in biomedicine: Filtered GEO accession numbers in PubMed Central. Proceedings of the American Society for Information Science and Technology, 47(1), 1-2.

Submission Type

Post-print
