Finding the Promising Areas in COVID-19 Research

Man wearing glasses, talking and gesturing with another man who has his back to the camera.
Mike Lovett
James Pustejovsky

More than 50,000 academic articles have been written about COVID-19 since the virus appeared in November 2019. This volume isn’t necessarily a good thing. The sheer number of articles makes it challenging for scientists to home in on accurate, promising research that should be studied further.

Computer science and linguistics professor James Pustejovsky, whose research focuses on language and extracting information from large amounts of text, is helping to create an artificial intelligence platform called Semantic Visualization of Scientific Data (SemViz). The platform can sort through the growing mass of published work on the coronavirus and help biologists gain insights and notice patterns across research that could lead to a treatment or a cure.

In this Q&A, Pustejovsky explains his work’s implications.

How would a biologist studying coronavirus use SemViz?

This tool gives biologists a rapid way to see a global overview of the inhibitors, regulators and activators of genes and proteins involved in the disease.

SemViz creates a visualization landscape that helps biologists make both global and specific connections among human genes, drugs, proteins and viruses. The overall program I’m working on contains three components: two semantic visualization outputs based on the entire coronavirus research data set, as well as a natural language-based question-answering application.

What’s the language application piece, and how does it work?

It is essentially a computer-based “reading machine” that interprets tens of thousands of research articles on coronavirus and presents the results to biologists in a form that is visually accessible and easy to analyze and interpret.

It is more informative than a search engine because it uses a host of language-understanding tools and artificial intelligence that can be applied to different domains (economics, news, science, literature) and text types (tweets, articles, books, email).

What are the implications of SemViz?

It’s hard to overstate the challenge brought about by information overload, particularly now with the coronavirus literature.

Biologists are interested in the mechanisms and functions of specific chemicals and proteins. SemViz can be the road map scientists use to sort through large amounts of research to find these functions and relationships.