A simple method to improve life sciences patent searches using the cyberinfrastructure at the National Institutes of Health

Abstract

In the life sciences, an immense publicly funded cyberinfrastructure contributes significantly to the rapid accumulation of knowledge and innovation. Much of the traditional public infrastructure consists of physical materials such as journal articles, data repositories, and repositories for biological and chemical samples. In recent years, however, the US National Institutes of Health (NIH), and in particular the NIH's National Center for Biotechnology Information (NCBI), has built a cyberinfrastructure in the form of electronic databases for journals, gene sequences, protein structures, and other experimental data. (The NCBI's cyberinfrastructure is described in detail elsewhere [1].) The NCBI's cyberinfrastructure is notable because it is both extensive and highly interconnected across diverse types of data. That is, scientists who contribute an individual "unit" to one of these resources have their data standardized and cross-indexed against other databases, which facilitates discoveries that may only come to light in the context of other, related scientific resources. For example, consider the PubMed entry shown in Figure 1. In addition to bibliographical information, the entry is tagged with numerous words and phrases that were manually added by an independent scientist reading the manuscript. The largest set of these tags is called the "MeSH headings," referring to the NIH's "MEdical Subject Headings": a hierarchical ontology developed to classify the biomedical literature.
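
To make the tagging concrete, the following minimal sketch shows how the MeSH headings attached to a PubMed record might be read programmatically. It assumes the record is available in PubMed's XML export format, where each tag appears as a MeshHeading element containing a DescriptorName and optional QualifierName elements. The sample record, its PMID, and the helper name mesh_headings are hypothetical and written only for illustration; they do not reproduce the entry in Figure 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written record for illustration only.
# It mimics the layout of a PubMed XML export but is not a real entry.
SAMPLE_RECORD = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>00000000</PMID>
    <Article>
      <ArticleTitle>An illustrative article title</ArticleTitle>
    </Article>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName MajorTopicYN="Y">Patents as Topic</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName MajorTopicYN="N">Databases, Factual</DescriptorName>
        <QualifierName MajorTopicYN="N">standards</QualifierName>
      </MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</PubmedArticle>
"""


def mesh_headings(record_xml):
    """Return (descriptor, is_major_topic, qualifiers) for each MeSH heading."""
    root = ET.fromstring(record_xml.strip())
    headings = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is None:
            # Skip malformed headings that carry no descriptor.
            continue
        qualifiers = [q.text for q in heading.findall("QualifierName")]
        is_major = descriptor.get("MajorTopicYN") == "Y"
        headings.append((descriptor.text, is_major, qualifiers))
    return headings


if __name__ == "__main__":
    for name, is_major, qualifiers in mesh_headings(SAMPLE_RECORD):
        marker = "*" if is_major else " "
        suffix = " / " + ", ".join(qualifiers) if qualifiers else ""
        print(f"{marker} {name}{suffix}")
```

Because the headings come from a controlled vocabulary rather than free text, the same descriptor string can be matched exactly across records, which is what makes this kind of cross-indexing practical.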

Publication
First Monday
Date