Bioinformatics

What is bioinformatics?

Bioinformatics uses computers to make sense of the vast amount of data researchers can now glean from living things. These things can be as seemingly simple as a single cell or as complex as the human immune response. Bioinformatics is a tool that helps researchers decipher the human genome, look at the global picture of a biological system, develop new biotechnologies, or perfect new legal and forensic techniques, and it will be used to create the personalized medicine of the future.

Bioinformatics holds the potential to recognize patterns in huge datasets that would be difficult or impossible for humans to recognize manually. It is made possible by improved observational tools and the increased speed and capacity of computers. By applying computation and analysis, researchers can better capture, manage, and interpret biological data from modern biology and medicine.

Before bioinformatics, there were only two ways to perform biological experiments: within a living organism (in vivo) or in an artificial environment (in vitro). Now, a few hours of computer experiments (in silico) can rival the output of an entire lab with expensive, state-of-the-art equipment staffed by an army of postdocs with endless resources. Bioinformatics researchers also sift through biological databases with sophisticated tools to make meaningful discoveries that had previously been hidden.

Digital screen with DNA data background shows the sequence of only a short portion of the human genome’s 3.1 billion DNA base pairs. (Image by JuSun | istock.com)

The history of bioinformatics

The genesis of bioinformatics came in the early 1960s while researchers worked to decipher the molecular sequences of proteins. If researchers know the sequence of a protein, they can better identify the structure of the protein and understand how it works in cellular processes. When a sequence is known, it can also be connected to the gene that encodes it. Before the advent of modern computers, protein sequences were assembled, analyzed, and compared manually on sheets of paper taped side by side.

As soon as early computers became available, forerunners of today’s bioinformaticians created methods from scratch to manipulate and analyze molecular sequences. Through this process, a new area of research emerged: analyzing protein sequences using computers.

The first large-scale bioinformatics effort was the Human Genome Project, which ran from October 1990 through April 2003. In this project, an international team of researchers sequenced and mapped the 3.1 billion DNA base pairs that together make up the human genome. Even given the span of many lifetimes, humans could not have manually sifted through this vast amount of data. Therefore, bioinformatics did not make the Human Genome Project faster; it made it possible.

Bioinformatics: why and how

Experiments now produce so much biological data that it cannot be meaningfully understood without the aid of computers. Bioinformatics is used to analyze and interpret biological data; to develop computer programs that efficiently access, manage, and use biological information; and to create mathematical formulas and statistical approaches that evaluate relationships in large datasets.

Bioinformatics allows researchers to perform experiments on a timeline that matters in a person’s lifetime. For example, during the coronavirus pandemic, researchers were able to sequence the virus’s genome, identify what was likely allowing the virus to enter cells, and develop a vaccine based on the sequenced genome within a matter of months. This could not have been accomplished if researchers had needed to work through handwritten sequences to find meaningful connections. Additional bioinformatic approaches are being used to identify and treat other diseases like diabetes and cancer.

To glean significant findings from biological data, bioinformatics uses computer tools like data mining, pattern recognition, visualization, and machine learning. By combining domain science knowledge, statistics, and data science, bioinformatics researchers can create more targeted wet lab experiments to more quickly home in on a specific area of research, such as creating a new drug. They can also comb through information from existing databases to discover the gene responsible for a disease or discover common immune responses to evaluate unknown pathogens.

Bioinformatics researchers use computer software programs such as the National Center for Biotechnology Information’s (NCBI) BLAST, which compares nucleotide or protein sequences to sequence databases to find regions of similarity. NCBI’s BLASTP, for example, is the BLAST program that searches protein databases using a protein query. Other software programs sort, retrieve, predict, store, and analyze protein sequence data.
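To make "regions of similarity" concrete, the sketch below finds the best-scoring local alignment between two short sequences using the textbook Smith-Waterman dynamic programming method. This is not how BLAST itself works (BLAST uses faster heuristics to search whole databases); the function name, the scoring values, and the example sequences are all assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below 0, which is what
    # makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score returns to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share the stretch ACGTACG.
result = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(result)
```

The shared stretch is recovered even though the flanking bases differ, which is the kind of similarity a database search tool is looking for at much larger scale.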

Internet databases, such as NCBI’s GenBank and Protein, are also important to bioinformatics research. These include both primary and secondary databases, many of which are available publicly. Primary databases hold archival data derived from experiments and submitted by researchers. Secondary databases comprise analysis results of primary data that can be used to gain more understanding from former experiments. These secondary databases have enormous potential for new discoveries through applying bioinformatic tools. In combination with statistical techniques and theory, bioinformatics software and databases can manage and analyze biological data to solve problems critical to living systems and human health.
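The distinction between primary and secondary data can be sketched in a few lines. Below, a small block of FASTA-formatted text stands in for primary records, and a derived table of sequence length and GC fraction stands in for secondary, analysis-level data. The record names, the sequences, and the helper functions `parse_fasta` and `summarize` are hypothetical and are not part of any NCBI tool.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def summarize(records):
    """Derive simple per-record statistics from the raw sequences."""
    summary = {}
    for name, seq in records.items():
        gc = seq.count("G") + seq.count("C")
        summary[name] = {
            "length": len(seq),
            "gc_fraction": gc / len(seq) if seq else 0.0,
        }
    return summary


# Made-up "primary" records, as a researcher might submit them.
primary = """>seq1 demo record
ACGT
GGCC
>seq2
ATAT
"""

secondary = summarize(parse_fasta(primary))
print(secondary)
```

Real secondary databases hold far richer analyses, but the pattern is the same: the archived sequences stay untouched while new information is computed from them.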

Bioinformatics can be used to better understand the complex amino acid structures in a protein. (Image by Timothy Holland | Pacific Northwest National Laboratory)

The benefits of bioinformatics

Bioinformatics makes previously unthinkable areas of study possible through data management and analysis. Because bioinformatics tools can be used to sift through enormous amounts of data from multiple studies, they exponentially increase the usefulness of past data as researchers mine information to make new connections. This can be done using data from different parts of the globe and generated or analyzed by researchers who have never met.

The tools can also be used to improve current experiments. Because of the relative ease of in silico experiments, researchers can iterate on their experimental design by using data analysis to determine targets to study or how many samples they need to make a statistically valid discovery. In addition, the tools can sort through scant data to better inform researchers as they make interpretations or identify never-before-seen protein sequences. And functional bioinformatics allows researchers to model these possible protein structures to determine their potential interactions within cells.
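As one illustration of estimating how many samples an experiment needs, the sketch below uses a standard normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 (z for significance + z for power)^2 / d^2, where d is the expected difference divided by the standard deviation. The source does not name a specific method, so the formula choice, the function name `samples_per_group`, and the default significance and power values are assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples needed per group for a two-sided, two-group comparison.

    effect_size is the standardized difference (difference in means
    divided by the standard deviation). Normal approximation only; an
    exact t-test calculation gives slightly larger numbers.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


# A medium effect needs far more samples than a large one.
medium = samples_per_group(0.5)
large = samples_per_group(1.0)
print(medium, large)
```

Running a calculation like this before collecting samples lets researchers adjust the design on a computer rather than discovering too late that the experiment was underpowered.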

The limits of bioinformatics

More complicated experiments are possible as technology gets more sophisticated, but there is a limit to how far bioinformatics can reach. Even though bioinformatics can transform enormous amounts of biological data into patterns humans can read and comprehend, biologists who have specialized in relevant fields are still required to interpret those patterns. Indeed, bioinformatics as a field often must bring researchers from many different domains together to communicate along every part of the experimental chain.

And computational resources will always limit how quickly bioinformatics can decipher data and how much of it. Although current projects are a long way from earlier efforts that solicited computing power from individuals to complete their queries, most tools still must run on large computers and servers, often in the cloud.

What’s next for bioinformatics

Improvements in high-throughput computing expand the amount and complexity of data that researchers can examine using bioinformatics. Emerging methods allow researchers to examine gene responses more closely in individual cells, gain more insight into larger medical databases, and move beyond discovering the structure of proteins to determining how they fold. Protein folding is an exceptionally complex but vital cellular process, with millions of these molecule-specific folds happening every second. Proteins that fold correctly into their three-dimensional shapes maintain healthy cellular operations and protect from or fight diseases. Researchers have used bioinformatics and other approaches to identify an ever-increasing number of protein structures; discovering the folds that create dynamic movement within these protein structures could be key to advanced medicines and materials.

Bioinformatics will likely aid in developing new therapeutics by allowing researchers to simulate drug interactions. There is also the possibility to tease out new clues to health from human microbiomes. And as bioinformatic researchers continue to link data from genes to the movement of fats, proteins, and metabolites in cells, bioinformatics can be used to predict the virulence of new or emerging pathogens within individuals.

Bioinformatics at Pacific Northwest National Laboratory

PNNL researchers are using bioinformatics as part of an effort to build a legal basis for forensic proteomics, an emerging forensic method that focuses on identifying proteins and that could become as widespread and trustworthy as DNA profiling. Forensic proteomics can fill in when DNA is missing, ambiguous, or was never present to begin with. For example, forensic proteomics can provide meaningful evidence in a search centered on a protein-based toxin, drug, or sports doping hormone. It can reveal clues about whether a disease occurred naturally or as the result of a biological attack. And forensic proteomics can aid investigations after DNA has identified which individuals were present, but not what type of tissue, fluid, or fingerprint residue they left behind. Publishing, finding acceptance in the scientific community, and developing precise probabilities for forensic proteomics are key to a forensic method being admissible in court.

Bioinformatics is integral to PNNL’s work in understanding pathogens as we study bacteria, viruses, and human cell immune responses. This work could be used to create targeted therapies or monitor and prevent future viral outbreaks. It is also leading to a capability to more quickly identify the presence of harmful bacteria without necessarily identifying particular pathogens. Even a few pathogens among the billions of bacteria in dirt, water, or food could cause severe disease. And faster identification among unknown species could become particularly useful as rising temperatures thaw bacteria in ice caps that have been frozen for thousands of years.

PNNL’s bioengineering efforts leverage our expertise in data science for energy and nuclear security. High-throughput computing is helping researchers identify genes that can make crops more resilient and contribute to higher biofuel yields. And PNNL researchers are deciphering the difference between lab-grown and wild strains of toxins and potential bioweapons. As part of this effort, researchers are identifying markers that could also trace the provenance of toxins and potential bioweapons.

Our bioinformatics work is also contributing to efforts to better understand metabolites, the small molecules cells use for structure and to grow, signal, and defend themselves. Natural colors, fragrances, pharmaceuticals, and many common chemicals are metabolites derived from biological systems. Bioinformatics could help discover and identify many more to improve human life.

PNNL researchers use bioinformatics tools to make groundbreaking scientific discoveries from biological data. (Image: m/q initiative | Pacific Northwest National Laboratory)