What if you wanted to compare 50+ texts? Or count the number of times certain phrases were said? You could do it on your own, but it would take a long time.
Digital text analysis can help. Text analysis uses computer scripts to “read” a text and identify patterns. It provides a range of outputs, including numerical counts of particular words or phrases and the identification of positive or negative language (i.e. sentiment analysis). With additional temporal and/or geographic data, we can even identify how language trends changed over time and space.
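To make the idea of counting phrases concrete, here is a minimal Python sketch. It is only an illustration, not the method used by any of the toolkits in this guide. The function name `count_phrases` and the sample sentence are assumptions chosen for the example.

```python
import re
from collections import Counter

def count_phrases(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase.

    Hypothetical helper written for this illustration.
    """
    counts = Counter()
    for phrase in phrases:
        # \b keeps "a" from matching inside words such as "man"
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Illustrative sample: the opening sentence of Pride and Prejudice
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

for phrase, count in count_phrases(sample, ["in want of", "fortune", "a"]).items():
    print(phrase, count)
```

The same function could be run over 50 texts in a loop, which is where a script starts to save real time compared with counting by hand.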
There are a ton of ready-made toolkits for text analysis. This guide shares some good toolkits to try, tutorials, plus tips on preparing your dataset.
Tools with Embedded Content
Tools like Constellate, Scopus, and HathiTrust Digital Library let you search for published articles and reviews and then analyze the results. Users aren’t required to provide a dataset because the texts are pulled from a database. These tools are great for learning the basics of text analysis and work well as hands-on tools for the classroom. Learn the basics of Constellate and Scopus with this tutorial.
Preparing your Dataset
Unlike the above tools, which pull texts from databases like JSTOR, there are a number of tools that allow users to upload their own dataset. Datasets can be any type of text – whole books (of any genre), scripts, etc. You could create a dataset by distributing a survey to respondents, or by scraping Tweets or product/place reviews (i.e. pulling them from a specific website).
Prepping this dataset will take some time, and specific formatting decisions will depend on 1) what tool you’re using and 2) your research questions/intended analyses. Preparing the dataset is called “cleaning”. You can learn the basics of cleaning text data, but be prepared for further tweaks once you start analyzing.
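As a rough idea of what cleaning can involve, the Python sketch below applies a few common steps: lowercasing, removing URLs, stripping punctuation, and collapsing extra whitespace. These particular steps and the function name `clean_text` are assumptions made for illustration. The right steps depend on your tool and your research questions, and some analyses need punctuation or capitalization left in place.

```python
import re
import unicodedata

def clean_text(raw):
    """Apply a few common cleaning steps to one piece of text.

    Hypothetical helper; adjust or remove steps to suit your own analysis.
    """
    # Normalize Unicode so visually identical characters compare as equal
    text = unicodedata.normalize("NFKC", raw)
    # Convert curly apostrophes to straight ones so "don’t" and "don't" match
    text = text.replace("\u2019", "'")
    text = text.lower()
    # Remove web addresses, which are common in scraped posts and reviews
    text = re.sub(r"https?://\S+", " ", text)
    # Replace anything that is not a letter, digit, apostrophe, or whitespace
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("  Hello,   WORLD! Visit https://example.com now "))
```

Note that this sketch drops accented and non-English letters along with punctuation, so it would need changes for multilingual datasets.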
Think about your dataset
Before you can choose an analytical tool/platform, you should ask yourself some questions:
- Am I analyzing trends within a single text, or am I interested in comparing texts? Check out the examples below to figure this out:
- Single Text Example: What are the twenty most popular terms in Pride and Prejudice?
- Comparing Texts Example: Does Jane Austen’s vocabulary expand over the course of her six major novels?
- Make a list of some analytical questions you want to answer. Make sure it’s possible to answer those questions with the data you have.
- Am I interested in change over time? Does my dataset record any dates/times?
- Do I want to show change across space? Are there locations in my dataset?
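The questions above map onto fairly simple computations. The Python sketch below shows one possible version of each: the most frequent terms in a single text, vocabulary size across several texts, and counts of a term grouped by year. All function names and the data shapes (a dictionary of titles to text, a list of year/text pairs) are assumptions made for illustration. Your dataset would need to record dates for the last function to be usable.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=20):
    """Single text: return the n most frequent terms with their counts."""
    return Counter(tokenize(text)).most_common(n)

def vocabulary_sizes(texts):
    """Comparing texts: count distinct terms in each text.

    `texts` is assumed to be a dict mapping a title to its full text.
    """
    return {title: len(set(tokenize(body))) for title, body in texts.items()}

def term_counts_by_year(records, term):
    """Change over time: total occurrences of one term per year.

    `records` is assumed to be a list of (year, text) pairs.
    """
    counts = defaultdict(int)
    for year, text in records:
        counts[year] += tokenize(text).count(term.lower())
    return dict(counts)

# Tiny made-up inputs, only to show how the functions are called
print(top_terms("the cat and the hat", n=1))
print(vocabulary_sizes({"Text A": "one two two", "Text B": "one"}))
print(term_counts_by_year([(1811, "sense and sense"), (1813, "pride")], "sense"))
```

Without further filtering, very common words such as "the" and "and" will dominate the top terms, which is one reason many tools let you exclude a list of stop words.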
Choosing a Tool
With a clean dataset and a list of some analytical questions, you can make an informed decision about what tool/platform to choose.
The chart below summarizes key features across a few text analysis platforms. Click here for an accessible PDF version. You may need to experiment with a few to ensure the platform meets your needs.
If you’re curious about whether text analysis is appropriate for your work, I recommend looking at Voyant and Orange first. Voyant is a great option for beginners and you can learn it quickly using this tutorial. Orange has a steeper learning curve but is still a good fit for beginners who need a powerful interface with a wide array of functions.
If you want an in-depth understanding of text analysis in an instructional setting, Dr. Margarita Corral teaches an entry-level workshop, Text Analysis using R (a coding language). If you are working on any projects using political data, survey responses, or any other type of Social Sciences data, it’s highly recommended that you reach out to Dr. Corral to hear about the options and best practices in your field.