Research Guides: Text and Data Mining: Resources that allow TDM

Library Databases

Text and data mining rights vary by publisher. If the needed resource is not listed, please contact Research Support for assistance.

Publisher	Content Available	Access Method	Registration Process	For More Information
Adam Matthew	All Vanderbilt licensed content	API	Contact Research Support.	Adam Matthew data mining / text mining statement; Adam Matthew API overview
Annual Reviews	All Vanderbilt licensed content		Contact Research Support.
American Association for the Advancement of Science (AAAS)	All Vanderbilt licensed content	Download from the AAAS online platform	Contact Research Support.	Science Online Journals Institutional License Agreement
Clarivate Analytics	Web of Science	API	Create a user account for the Clarivate Developer Portal. Because this site shares credentials with other Clarivate services, you may already have an account.	Available Web of Science APIs; Clarivate Developers Portal
Duke University Press	Vanderbilt licensed ebooks and Project Euclid		Contact Research Support.
Elsevier	ScienceDirect	API	Request an API key via the Elsevier developers portal.	Elsevier text and data mining policy
Gale	19th Century Collections Online; Archives of Sexuality and Gender; Associated Press Collections; British Library Newspapers; Early Arabic Printed Books from the British Library; Financial Times Historical Archive; Making of Modern Law; State Papers Online; Full title list		Contact Research Support.
HistoryMakers	HistoryMakers Digital Archive		Contact Research Support.
JSTOR	Available data includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets may include data for up to 25,000 documents.	Zip files containing .txt, .xml, or n-grams	A JSTOR account is required to request a dataset. Register for a free JSTOR account.	JSTOR Data for Research
Linguistic Data Consortium	All Vanderbilt licensed content		Vanderbilt users can select corpora published from 2022 to the present. Contact your subject librarian for access.	LDC corpora by year
OCLC	WorldCat	API	Contact Research Support.	WorldCat Search API overview
Oxford University Press	Oxford Historical Treaties		Contact Research Support.
ProQuest	British Periodicals I-IV; American Periodicals; History Vault: Latino Civil Rights during the Carter Administration; ProQuest History Vault. American Federation of Labor Records: The Samuel Gompers era, 1877-1937		Contact Research Support.
Royal Society	Vanderbilt users may perform automated searches of licensed content.		Contact Research Support.
Sage Journals	All Vanderbilt licensed content	Download from the Sage platform or use the CrossRef Public API	No registration is required. Follow publisher instructions and terms of use.	Text and Data Mining on Sage Journals
Springer Nature	All Vanderbilt licensed and open access content	API	Register via the Springer Nature API Portal.	Springer Nature text and data mining policy
Taylor & Francis	All Vanderbilt licensed journal content	Arranged by request	Contact Research Support.	Taylor & Francis Text and Data Mining Policy
TDS Health	Vanderbilt licensed content in Stat!Ref	API	Contact Research Support.	TDS Health OpenSearch Support
Wiley	All Vanderbilt licensed content	API	Review the Wiley Text and Data Mining statement and scroll to the Get a Text and Data Mining Token section. Users must log in with their Wiley Online Library credentials. If you are not registered, please do so at the registration page.	Wiley Text and Data Mining statement
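
The Sage Journals row above points to the CrossRef Public API as one access method. As a rough illustration of what that can look like, the Python sketch below builds a search URL for the Crossref REST API and picks out the full-text links that a publisher has flagged for text mining. The function names, the example email address, and the User-Agent string are placeholders invented for this sketch, not part of any publisher's instructions. It also assumes that a record carries a "link" list with an "intended-application" field, which not every record does. Always check the publisher's terms of use before downloading full text.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_works_url(query, rows=5, mailto=None):
    """Build a Crossref /works search URL (no request is sent here)."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Optional contact email so Crossref can reach you about your usage.
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


def text_mining_links(response):
    """Map each DOI to the full-text URLs flagged for text mining."""
    links = {}
    for item in response.get("message", {}).get("items", []):
        urls = [
            link["URL"]
            for link in item.get("link", [])
            if link.get("intended-application") == "text-mining"
        ]
        if urls:  # skip records without any text-mining link
            links[item["DOI"]] = urls
    return links


def fetch_works(query, rows=5, mailto=None):
    """Send the search request and decode the JSON reply (needs network access)."""
    request = Request(
        build_works_url(query, rows, mailto),
        headers={"User-Agent": "tdm-example/0.1"},  # placeholder identifier
    )
    with urlopen(request, timeout=30) as reply:
        return json.load(reply)
```

A typical call would be text_mining_links(fetch_works("digital humanities", rows=20, mailto="you@example.edu")). Whether the returned links resolve to full text depends on Vanderbilt's license with each publisher.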

Freely Available Content for TDM Projects

In addition to the specific resources listed below, check out this list of Open Access disciplinary repositories.

Publisher	Content Available	Access Method	Registration Process	For More Information
arXiv	Offers public API access to e-print content and metadata in the areas of physics, mathematics and computer science.	API	None	arXiv API access documentation
BioMed Central	Open access content published by BMC	API	Register via the Springer Nature API Portal.	BMC API overview
Caselaw Access Project	All U.S. federal and state case law	API	Some access requires registration for a free API key.	Usage and access
CrossRef	Metadata records with CrossRef DOIs	API	None	Text and data mining for researchers
Digital Public Library of America	Metadata on items and collections	API	Request an API key	DPLA API Codex
Folger Shakespeare Library	Downloadable files of the Folger Shakespeare texts in six different digital formats.	Downloads available from https://shakespeare.folger.edu/download/.	None	Additional API tools
HathiTrust	Use the HathiTrust APIs to query and retrieve data when you have a known identifier. HathiTrust APIs are not search APIs (e.g., where you use a keyword to search across the collection).	API	To use the Data API, request an API key.	HathiTrust Data Availability and API Options
Internet Archive	20 million freely downloadable books and texts	Individual works are downloadable from the Internet Archive website. Bulk downloads require a terminal emulator and wget.	None	Instructions for downloading in bulk
Library of Congress	Chronicling America: Historic American Newspapers	API	None	About the site and API directions
Library of Congress	LC for Robots provides machine-readable access to the Library of Congress' digital collections, including images, laws and regulations, and bibliographic information.	Varies		LC for Robots documentation
National Library of Medicine	Multiple text mining tools for accessing various NLM databases and biomedical literature.	Varies		Text Mining Tools NLM Products and Services
OECD	Programmatically access a selection of the most-used datasets covering OECD countries and selected non-member economies. OECD datasets are dynamically updated. Vanderbilt researchers are advised to start with the OECD iLibrary subscription, which allows more data to be exported in a single request.	API. RSS data feeds are also available; contact your subject librarian if RSS is needed.	None	OECD data for developers
PLOS (Public Library of Science)	Access to article corpus and article metadata.	API	None	PLOS text and data mining documentation
Project Gutenberg	Over 60,000 books, usually out of copyright.	TXT, HTML, ePUB	None	Project Gutenberg Permissions, Licensing and other Common Requests
PubMed Central	Selected datasets including PMC Open Access Subset and the PMC Author Manuscript Dataset	API	None	PMC for Developers
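
Several of the sources above, such as arXiv, need no registration and return results in standard formats. As a minimal sketch of that workflow, the Python example below builds an arXiv API query URL and reads the Atom feed it returns into a list of records. The function names are invented for illustration. The sketch assumes the public query endpoint at export.arxiv.org and the standard Atom namespace, so consult the arXiv API access documentation for current parameters and rate limits before running large jobs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ARXIV_API_URL = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # Atom feed namespace


def build_query_url(search_query, start=0, max_results=10):
    """Build an arXiv API query URL (no request is sent here)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


def _clean(text):
    """Collapse line breaks and repeated spaces inside a field."""
    return " ".join(text.split())


def parse_entries(atom_xml):
    """Turn an Atom feed string into a list of id/title/summary dicts."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "summary": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
        })
    return entries


def fetch_entries(search_query, start=0, max_results=10):
    """Send the query and parse the reply (needs network access)."""
    url = build_query_url(search_query, start, max_results)
    with urlopen(url, timeout=30) as reply:
        return parse_entries(reply.read().decode("utf-8"))
```

For example, fetch_entries("all:electron", max_results=5) would request five records that match "electron" in any field. When paging through many results with the start parameter, pause between requests.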