Research Guides: Text and Data Mining: Home

Contact us

Text Data Mining, AI, and Research Support
- Follow this link to tell us about your TDM project and a librarian will contact you shortly to provide support.
Digital Lab
- The Digital Lab helps with text and data mining instruction and project consultation. The Lab provides self-paced, asynchronous learning modules for a variety of topics in digital scholarship.
Data Science Institute
- Contact the Data Science Institute for large-scale data mining projects.
McGee Applied Research Center for Narrative Studies
- Contact the McGee Center for support and information on text and data mining projects related to media, audiovisual objects, and journalism.

What is Text and Data Mining?

Text and data mining (TDM) uses computational methods to extract information from large quantities of text files or data sets and analyze it, quickly identifying patterns and relationships. Text analytics methods include information retrieval, named entity recognition, part-of-speech tagging, sentiment analysis, network analysis, and topic modelling.
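
As a small illustration of identifying patterns in text, the sketch below counts word frequencies using only the Python standard library. The function name term_frequencies, the sample sentence, and the stopword list are hypothetical and chosen for illustration; real projects typically use a dedicated text analysis library and a much larger corpus.

```python
import re
from collections import Counter

def term_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens in text, skipping any stopwords."""
    # A deliberately simple tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Hypothetical sample text and stopword list, for illustration only.
sample = "The whale surfaced. The crew watched the whale."
counts = term_frequencies(sample, stopwords={"the"})
print(counts.most_common(2))
```

Word counts like these are often a first step before methods such as topic modelling or sentiment analysis.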

Don't know where to start? Check out Ted Underwood's "Seven ways humanists are using computers to understand text."

Planning Your Project

Finding Corpora

You can find and collect text for analysis from a variety of resources including library content we subscribe to, open access content, social media, and online web resources. Your librarian can help you find corpora suitable for analysis.

Some textual resources are born-digital (e.g., Wikipedia, social media); other works are digitized and converted to text by OCR (optical character recognition). The accuracy of OCR varies by resource, and your corpora may require OCR cleaning depending on your research needs.
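
What OCR cleaning involves depends on the source, but a minimal sketch, assuming plain-text OCR output with hard line breaks, might look like the following. The function name clean_ocr_text and the sample string are hypothetical, and the specific cleanup steps are assumptions about common OCR artifacts rather than a recommended pipeline.

```python
import re
import unicodedata

def clean_ocr_text(raw):
    """Apply a few conservative cleanup steps to OCR output."""
    # Fold compatibility characters, e.g. the ligature U+FB01 into "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Unwrap single line breaks but keep blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

# Hypothetical OCR output, for illustration only.
raw = "The \ufb01rst chap-\nter  begins\nhere.\n\nNext paragraph."
cleaned = clean_ocr_text(raw)
print(cleaned)
```

Note that the hyphen rule will also merge genuine compounds that happen to break across lines (for example, "well-" followed by "known"), so it is worth spot-checking the output against the page images.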

TDM Use and Copyright

Many textual resources are still under copyright, which complicates full-text access. A library subscription to a journal or database does not necessarily mean that we have TDM rights to that content. Please contact your librarian to find out whether you have TDM rights to the library resource you are interested in. In some cases, we can negotiate additional access for you (often at additional cost that may be borne by the researcher). Please allow sufficient lead time to negotiate additional access.

Budgeting

We are happy to consult with researchers on projects using existing content. However, we may be unable to provide licensing or funding for individual text-mining projects whose needs are not covered by university-wide licenses. The library is unable to pay project-by-project fees but will attempt to negotiate a more institutional solution with the vendor. Therefore, as noted above, we highly encourage scholars to consider research or grant funding in these cases.