Resources available for text and data mining vary by publisher. If you do not see the resource you are looking for, contact your librarian about obtaining access or finding corpora for your research needs.
Publisher | Content Available | Access Method | Registration Process | For More Information |
---|---|---|---|---|
Adam Matthew | API | Contact your subject librarian. | ||
Annual Reviews | All Vanderbilt licensed content | Contact your subject librarian. | ||
American Association for the Advancement of Science (AAAS) | All Vanderbilt licensed content | Download from the AAAS online platform | Contact your subject librarian. | Science Online Journals Institutional License Agreement |
Clarivate Analytics | Web of Science | API | Create a user account for the Clarivate Developer Portal. Because this site shares credentials with other Clarivate services, you may already have an existing account. | |
Duke University Press | Vanderbilt licensed ebooks and Project Euclid | Contact your subject librarian. | ||
Elsevier | ScienceDirect | API | Request an API key via the Elsevier developers portal. | Elsevier text and data mining policy |
Gale | 19th Century Collections Online; Archives of Sexuality and Gender; Associated Press Collections; British Library Newspapers; Early Arabic Printed Books from the British Library; Financial Times Historical Archive; Making of Modern Law; State Papers Online | Contact your subject librarian. | ||
HistoryMakers | HistoryMakers Digital Archive | Contact your subject librarian. | ||
JSTOR | Available data includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets may include data for up to 25,000 documents. | Zip files containing .txt, .xml, or n-grams | A JSTOR account is required to request a dataset. Register for a free JSTOR account. | JSTOR Data for Research |
Linguistic Data Consortium | All Vanderbilt licensed content | Vanderbilt users can select corpora published from 2022 - present. Contact your subject librarian for access. | LDC corpora by year | |
OCLC | WorldCat | API | Contact your subject librarian. | WorldCat Search API overview |
Oxford University Press | Oxford Historical Treaties | Contact your subject librarian. | ||
ProQuest | British Periodicals I-IV; American Periodicals; History Vault: Latino Civil Rights during the Carter Administration; ProQuest History Vault. American Federation of Labor Records: The Samuel Gompers era, 1877-1937 | Contact your subject librarian. | ||
Royal Society | Vanderbilt users may perform automated searches of licensed content. | Contact your subject librarian. | ||
Sage Journals | All Vanderbilt licensed content | Download from the Sage platform or use the CrossRef Public API | No registration is required. Follow publisher instructions and terms of use. | Text and Data Mining on Sage Journals |
Springer Nature | All Vanderbilt licensed and open access content | API | Register via the Springer Nature API Portal. | Springer Nature text and data mining policy |
Taylor & Francis | All Vanderbilt licensed journal content | Arranged by request | Contact your subject librarian. | Taylor & Francis Text and Data Mining Policy |
TDS Health | Vanderbilt licensed content in Stat!Ref | API | Contact your subject librarian. | TDS Health OpenSearch Support |
Wiley | All Vanderbilt licensed content | API | Review the Wiley Text and Data Mining statement and scroll to the Get a Text and Data Mining Token section. Users must login using their Wiley Online Library credentials. If you are not registered, please do so at the registration page. | Wiley Text and Data Mining statement |
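
Several rows above point to the CrossRef Public API as an access route. The following is a minimal Python sketch of how a search request might be built and how text-mining links might be pulled from a returned record. It assumes the public `https://api.crossref.org/works` endpoint, its `query`, `rows`, and `mailto` parameters, and the `link` field with an `intended-application` value of `text-mining`. The function names are hypothetical, and publisher terms of use still apply to anything you download.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public Crossref REST API.
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_works_url(query, rows=5, mailto=None):
    """Build a Crossref /works search URL (hypothetical helper)."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Supplying a contact address identifies the requester to Crossref.
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urllib.parse.urlencode(params)


def text_mining_links(work):
    """Return the URLs in one work record that are flagged for text mining."""
    return [
        link["URL"]
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


def fetch_works(query, rows=5, mailto=None):
    """Run a live search and return the list of work records."""
    url = build_works_url(query, rows, mailto)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]["items"]
```

Calling `fetch_works("text mining", rows=3, mailto="you@example.edu")` would perform a live request; the address shown is a placeholder. Not every record carries text-mining links, so `text_mining_links` may return an empty list.
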
In addition to the specific resources listed below, check out this list of Open Access disciplinary repositories.
Publisher | Content Available | Access Method | Registration Process | For More Information |
---|---|---|---|---|
arXiv | Offers public API access to e-print content and metadata in the areas of physics, mathematics and computer science. | API | None | arXiv API access documentation |
BioMed Central | Open access content published by BMC | API | Register via the Springer Nature API Portal. | BMC API overview |
Caselaw Access Project | All U.S. federal and state case law | API | Some access requires registration for a free API key. | Usage and access |
CrossRef | Metadata records with CrossRef DOIs | API | None | Text and data mining for researchers |
Digital Public Library of America | Metadata on items and collections | API | Request an API key | DPLA API Codex |
Folger Shakespeare Library | Downloadable files of the Folger Shakespeare texts in six different digital formats. | Downloads available from https://shakespeare.folger.edu/download/. | None | Additional API tools |
HathiTrust | Use the HathiTrust APIs to query and retrieve data when you have a known identifier. HathiTrust APIs are not search APIs (e.g., where you use a keyword to search across the collection). | API | To use the Data API, request an API key. | HathiTrust Data Availability and API Options |
Internet Archive | 20 million freely downloadable books and texts | Individual works are downloadable from the Internet Archive website. Bulk downloads require a terminal emulator and wget. | None | Instructions for downloading in bulk |
Library of Congress | Chronicling America: Historic American Newspapers | API | None | About the site and API directions |
Library of Congress | LC for Robots provides machine-readable access to the Library of Congress' digital collections, including images, laws and regulations, and bibliographic information. | Varies | LC for Robots documentation | |
National Library of Medicine | Multiple text mining tools for accessing various NLM databases and biomedical literature. | Varies | ||
OECD | Programmatically access a selection of the most-used datasets covering OECD countries and selected non-member economies. OECD datasets are dynamically updated. VU researchers are advised to start with the OECD iLibrary subscription, where more data can be exported in one request. | API; RSS data feeds are also available if needed. Contact your subject librarian if RSS is needed. | None | OECD data for developers |
PLOS (Public Library of Science) | Access to article corpus and article metadata. | API | Register to obtain an API key. | PLOS text and data mining documentation |
Project Gutenberg | Over 60,000 books, usually out of copyright. | TXT, HTML, ePUB | None | Project Gutenberg Permissions, Licensing and other Common Requests |
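
As an illustration of the no-registration APIs in this table, here is a minimal Python sketch of querying the arXiv API and reading its Atom response. It assumes the `http://export.arxiv.org/api/query` endpoint with the `search_query`, `start`, and `max_results` parameters. The helper names are hypothetical, and the arXiv API access documentation remains the authority on rate limits and terms.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint of the public arXiv API; responses are Atom XML.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(search_query, start=0, max_results=5):
    """Build an arXiv API query URL (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def _clean(text):
    """Collapse runs of whitespace, which Atom titles often contain."""
    return " ".join(text.split())


def parse_feed(atom_xml):
    """Extract id, title, and summary from each entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    return [
        {
            "id": _clean(entry.findtext(ATOM + "id", default="")),
            "title": _clean(entry.findtext(ATOM + "title", default="")),
            "summary": _clean(entry.findtext(ATOM + "summary", default="")),
        }
        for entry in root.iter(ATOM + "entry")
    ]


def search(search_query, max_results=5):
    """Run a live query and return the parsed entries."""
    url = build_query_url(search_query, max_results=max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_feed(resp.read())
```

Calling `search("all:electron", max_results=3)` would perform a live request. For larger harvests, page through results with `start` and pause between requests rather than issuing them in a tight loop.
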