NJIT Software May Help Scientists Communicate About COVID
Every complex scientific field needs an ontology, and soon the primary one that covers COVID-19 will be easier for drug and vaccine researchers to understand, thanks to new interpretive methods and software developed by experts at NJIT's Ying Wu College of Computing.
Ontologies are essentially dictionaries and maps of medical terms. Terms with the same meaning, such as heart attack and myocardial infarction, are grouped together. Each group is called a concept. Concepts in turn are connected to one another, drawn as boxes and arrows, to indicate which are general and which are specific. More specific concepts are called children and more general ones are called parents, as in genealogy.
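To make the dictionary-and-map idea concrete, here is a minimal Python sketch of a tiny ontology in which each concept groups synonymous terms and lists its more general parents. The names `Concept`, `find_concept` and `ancestors`, and the three toy concepts, are hypothetical illustrations; they are not taken from CIDO or from the NJIT software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept: a group of terms (synonyms) that share one meaning."""
    name: str
    synonyms: set[str]
    parents: list[str] = field(default_factory=list)  # more general concepts


# Hypothetical toy ontology for illustration; these entries are not taken from CIDO.
ONTOLOGY = {
    "disease": Concept("disease", {"disease", "disorder"}),
    "heart disease": Concept("heart disease", {"heart disease", "cardiac disease"}, ["disease"]),
    "myocardial infarction": Concept(
        "myocardial infarction", {"myocardial infarction", "heart attack"}, ["heart disease"]
    ),
}


def find_concept(term: str) -> str | None:
    """Return the concept that groups `term`, matching any of its synonyms."""
    for name, concept in ONTOLOGY.items():
        if term.lower() in concept.synonyms:
            return name
    return None


def ancestors(name: str) -> list[str]:
    """Follow child-to-parent links upward, collecting every more general concept."""
    found: list[str] = []
    stack = list(ONTOLOGY[name].parents)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(ONTOLOGY[parent].parents)
    return found


print(ancestors(find_concept("Heart attack")))  # ['heart disease', 'disease']
```

Looking up either synonym lands on the same concept, and walking the parent links moves from the specific toward the general, just as a genealogy moves from child to ancestors.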
The medical community's open-source Coronavirus Infectious Disease Ontology (CIDO) was released in January 2020 by University of Michigan Associate Prof. Oliver He. It quickly grew into a behemoth of 5,138 concepts and 113 relationship types by its July iteration. Professors Yehoshua Perl and James Geller and their students in the Structural Analysis of Biomedical Ontologies Center realized that their own Ontology Abstraction Framework (OAF), originally developed from 2015 to 2017 by former postdoctoral student Christopher Ochs, could be adapted to tame the complexity of CIDO.
Figure 1 (above the article headline) illustrates such complexity. Every white dot represents a medical term. The colorful lines are generalization links, and all lines emanating from concepts at the same level share the same color. For example, the term recombinant vaccine vector is connected by a line to the more general concept vaccine component. Zooming in doesn't help because the other ends of the lines emanating from a concept are out of view.
Their method is called the weighted aggregate taxonomy (WAT): the visualization software shows custom views of the requested ontology sub-topics according to their importance, which is computed from the number of concepts each sub-topic summarizes.
"We've been summarizing ontologies for 20 years. But until now we were managing with the old way of doing this," explained Perl, noting that the old methods summarized and visualized information based on its relationships, whereas the new one visualizes it by depicting hierarchical relationships. "The great thing about the OAF tool is you give it any ontology and it creates a view with whatever granularity level you want."
"A partial-area taxonomy is a network which provides a summarized view of an ontology for display on a screen in an easily comprehensible form. However, for a large ontology this summarization network may still be too large. For example, the CIDO consisted of 5,138 concepts. Its partial-area taxonomy had 519 partial-areas, which cannot be displayed as 519 boxes on one screen," Geller stated.
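The article does not spell out how partial-areas are formed. In the group's earlier published work, as we understand it, concepts with exactly the same set of relationship types form an area, and each concept whose parents all lie outside its own area roots a partial-area made up of that root and its descendants within the same area. The Python sketch below implements that simplified reading on made-up data; `partial_area_taxonomy`, `PARENTS` and `REL_TYPES` are hypothetical names, and the real OAF tool likely handles details this sketch omits.

```python
from collections import defaultdict

# Hypothetical toy data, not CIDO: each concept's is-a parents...
PARENTS = {
    "entity": [],
    "process": ["entity"],
    "diagnostic process": ["process"],
    "serological assay process": ["diagnostic process"],
    "vaccine": ["entity"],
    "viral vaccine": ["vaccine"],
}
# ...and the set of relationship types each concept has (children inherit their parents').
REL_TYPES = {
    "entity": frozenset(),
    "process": frozenset({"has participant"}),
    "diagnostic process": frozenset({"has participant"}),
    "serological assay process": frozenset({"has participant"}),
    "vaccine": frozenset({"has component"}),
    "viral vaccine": frozenset({"has component"}),
}


def partial_area_taxonomy(parents, rel_types):
    """Return {root: set of member concepts} for every partial-area."""
    # Concepts share an area when their relationship-type sets are identical,
    # so rel_types[c] doubles as the label of c's area.
    roots = [c for c, ps in parents.items()
             if all(rel_types[p] != rel_types[c] for p in ps)]
    children = defaultdict(list)
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)
    partial_areas = {}
    for root in roots:
        # Collect the root and its descendants, stopping at concepts in another area.
        members, stack = set(), [root]
        while stack:
            c = stack.pop()
            if c not in members and rel_types[c] == rel_types[root]:
                members.add(c)
                stack.extend(children[c])
        partial_areas[root] = members
    return partial_areas


for root, members in partial_area_taxonomy(PARENTS, REL_TYPES).items():
    print(root, len(members))
```

Six concepts collapse into three boxes here; on CIDO the same kind of grouping still left 519, which is the problem the weighted aggregate taxonomy addresses.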
"To obtain an even more compact summary of an ontology, we defined the weighted aggregate taxonomy. The idea is to differentiate between major partial-areas summarizing many concepts and minor ones summarizing just a few, by defining a cutoff value. In a WAT, only partial-areas above the cutoff are displayed as boxes. Each such node (box) summarizes a major subject in the topic modeled by the ontology."
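A cutoff split like the one Geller describes can be sketched in a few lines of Python. The article does not say how the cutoff value is chosen, so `cutoff_for_screen`, which picks the smallest cutoff that leaves at most a given number of boxes, is a hypothetical helper, and the partial-area sizes are made up.

```python
from __future__ import annotations

# Made-up partial-area sizes (number of concepts each summarizes); not CIDO data.
SIZES = {"A": 120, "B": 35, "C": 20, "D": 7, "E": 3, "F": 1}


def split_by_cutoff(sizes: dict[str, int], cutoff: int) -> tuple[set[str], set[str]]:
    """Major partial-areas (more than `cutoff` concepts) are drawn as boxes; minor ones are hidden."""
    major = {pa for pa, n in sizes.items() if n > cutoff}
    return major, set(sizes) - major


def cutoff_for_screen(sizes: dict[str, int], max_boxes: int) -> int:
    """Hypothetical helper: the smallest cutoff that leaves at most `max_boxes` major partial-areas."""
    # The number of major partial-areas only changes at the sizes themselves,
    # so it is enough to try 0 and every distinct size in ascending order.
    for cutoff in sorted(set(sizes.values()) | {0}):
        if sum(n > cutoff for n in sizes.values()) <= max_boxes:
            return cutoff
    raise ValueError("max_boxes must be non-negative")


cutoff = cutoff_for_screen(SIZES, max_boxes=3)
major, minor = split_by_cutoff(SIZES, cutoff)
print(cutoff, sorted(major), sorted(minor))  # 7 ['A', 'B', 'C'] ['D', 'E', 'F']
```

Choosing the smallest cutoff that fits keeps as much detail on screen as the space allows; other rules, such as a fixed threshold, would work with the same split.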
"The WAT worked well for many ontologies, but for CIDO it created a long and narrow diagram that did not make good use of a computer screen. The new 'child-of-based layout' was shown to overcome this problem: it generates a balanced layout for the CIDO summarization that fits well on screen (see Figure 2). Examples of major subjects of CIDO, shown in Figure 2, include process (summarizing 301 concepts), viral vaccine (standing for 58 concepts) and viral protein (summarizing 43)."
"However, the partial-areas below the cutoff value are not deleted; they are just hidden. Their contribution is aggregated into the closest large parent or ancestor partial-area. By clicking with the mouse on a major subject node, our OAF software tool can expand it back to show the hidden details."
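The hide-and-aggregate step might look like the Python sketch below, which adds each minor partial-area's concept count to its closest major ancestor and records what was hidden where, so it can be expanded later. For simplicity it assumes every partial-area has a single parent partial-area; the function name `aggregate` and the data are hypothetical.

```python
from __future__ import annotations

# Made-up partial-area hierarchy for illustration; not CIDO data.
SIZES = {"A": 40, "A1": 8, "A2": 7, "A2a": 3, "B": 30}  # concepts per partial-area
PARENT = {"A1": "A", "A2": "A", "A2a": "A2"}  # child partial-area -> parent partial-area
MAJOR = {"A", "B"}  # partial-areas above the cutoff


def aggregate(sizes, parent, major):
    """Fold each hidden (minor) partial-area into its closest major ancestor.

    Returns the weighted size of every major partial-area and the list of
    minor partial-areas hidden inside it.
    """
    weights = {pa: sizes[pa] for pa in major}
    hidden_in: dict[str, list[str]] = {pa: [] for pa in major}
    for pa, n in sizes.items():
        if pa in major:
            continue
        anc = parent.get(pa)
        while anc is not None and anc not in major:  # climb until a major ancestor
            anc = parent.get(anc)
        if anc is not None:  # a minor partial-area with no major ancestor stays unassigned
            weights[anc] += n
            hidden_in[anc].append(pa)
    return weights, hidden_in


weights, hidden_in = aggregate(SIZES, PARENT, MAJOR)
print(weights["A"], hidden_in["A"])  # 58 ['A1', 'A2', 'A2a']
```

Nothing is thrown away: the hidden partial-areas survive in `hidden_in`, and their concepts are counted in the weight of the box that stands for them.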
"For example, when clicking on the major subject process, the OAF software tool generates the network of secondary subjects of the major subject process shown in Figure 3. Among these secondary subjects we find Coronavirus infectious disease process (summarizes 8 concepts), COVID-19 diagnostic process (7 concepts), and its child COVID-19 diagnostic process by serological assay (3 concepts)."
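Expanding a major node can be sketched in Python as the reverse lookup: collect the hidden partial-areas whose closest major ancestor is the clicked node, together with their child-of links. The three secondary subjects and their counts come from Geller's example; the size of process itself, the extra vaccine node, the direct links to process, and the function names `closest_major` and `expand` are assumptions for illustration.

```python
# Secondary subjects and counts follow Geller's example; the size of "process",
# the "vaccine" node and the links to "process" are hypothetical.
SIZES = {
    "process": 25,
    "Coronavirus infectious disease process": 8,
    "COVID-19 diagnostic process": 7,
    "COVID-19 diagnostic process by serological assay": 3,
    "vaccine": 20,
}
PARENT = {
    "Coronavirus infectious disease process": "process",
    "COVID-19 diagnostic process": "process",
    "COVID-19 diagnostic process by serological assay": "COVID-19 diagnostic process",
}
MAJOR = {"process", "vaccine"}


def closest_major(pa, parent, major):
    """Climb child-of links until reaching a major partial-area (or run out)."""
    anc = parent.get(pa)
    while anc is not None and anc not in major:
        anc = parent.get(anc)
    return anc


def expand(node, sizes, parent, major):
    """Secondary subjects hidden under `node`, with their child-of edges."""
    members = [pa for pa in sizes
               if pa not in major and closest_major(pa, parent, major) == node]
    edges = [(pa, parent[pa]) for pa in members]  # (child, parent) pairs
    return {pa: sizes[pa] for pa in members}, edges


nodes, edges = expand("process", SIZES, PARENT, MAJOR)
for child, par in edges:
    print(f"{child} ({nodes[child]}) -> {par}")
```

Keeping the child-of edges is what lets the expanded view show that the serological-assay subject sits beneath COVID-19 diagnostic process rather than directly under process.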
Geller, Perl, Michigan's He, and Monmouth University Assistant Prof. Ling Zheng, formerly an NJIT doctoral student, are now documenting their work in an article recently submitted to the Journal of Biomedical Informatics. The research is also part of a $3 million National Institutes of Health proposal for the team's design of a dedicated interface terminology for annotating clinical notes of COVID-19 patients. Geller and Perl received a letter of support from Rosemont, Ill.-based Intelligent Medical Objects, a provider of clinical interface terminology software, which will help evaluate the prototype for potential commercialization.
The general public's interest in COVID may wane after a vaccine becomes mainstream, which could happen as soon as spring 2021, according to recent news reports on companies such as Moderna and Pfizer. But CIDO and the software to make sense of it all will be relevant for years to come.
"Even when a vaccine is available, it will take time for the whole world population to get vaccinated. Furthermore, medicine will still have to deal with the leftovers of the pandemic: all those symptoms and problems which people have after they are already cured. Another issue is that CIDO will be very helpful when the next virus hits. It will need to be modified, but the framework and many concepts will be useful," Perl explained.
"People who have overcome the disease and are virus-free are still tired, cold or hot, and [have] other symptoms. Nobody knows why some people and not all, what the mechanism is, how long this will last on average or in extremis, if this is open to treatment," Geller added. "In short, we are not out of the COVID woods whatsoever."