What is Biophysics?

Molecular biology itself grew out of biophysics. The British Biophysical Society defines biophysics as:

“an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function”

More information about the various facets of the discipline can be found at the society’s site hosted at Birkbeck College, London.

Mike Goodrich wrote to ask what the status of biophysics was given the definition of computational biology submitted by Paul Schulte (below). A recent article in The Scientist [free registration required] dealt with this question—thanks to Jo Wixon (Managing Editor of Comparative and Functional Genomics) for the reference.

What is Computational Biology?

Computational biologists might object (please do), but I find that people use “computational biology” when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.

Computational biologists concern themselves more with evolutionary, population and theoretical biology than with cell and molecular biomedicine. Molecular biology is inevitably of profound importance in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology, computational biologists seem to have tended to prefer statistical models of biological phenomena over physico-chemical ones. This is often wise…

One computational biologist (Paul J Schulte) did object to the above and made the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells. He points out that biological fluid dynamics is a field of computational biology in itself. He argues that this, and any application of computing to biology, can be described as “computational biology” (see also the “loose” definition of bioinformatics below). Where we disagree, perhaps, is in the conclusion he draws from this, which I reproduce in full:

“Computational biology is not a “field”, but an “approach” involving the use of computers to study biological processes and hence it is an area as diverse as biology itself.”

Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:

“I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particularly genetic information.”

What is Medical Informatics?

The Medical Informatics FAQ (no relation) provides the following definition:

“Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information.”

That FAQ also links back to this page.

Aamir Zakaria, the author of the FAQ, emphasises that medical informatics is more concerned with structures and algorithms for the manipulation of medical data, rather than with the data itself.

This suggests that one difference between bioinformatics and medical informatics as disciplines lies with their approaches to the data; there are bioinformaticians interested in the theory behind the manipulation of that data and there are bioinformatics scientists concerned with the data itself and its biological implications. (I believe that a good bioinformatics researcher should be interested in both of these aspects of the field.)

Medical informatics, for practical reasons, is more likely to deal with data obtained at “grosser” biological levels (that is, information from super-cellular systems right up to the population level), while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.

On both of these points I’d be happy for any medical informatics specialists to correct me.

What is Cheminformatics?

The Web advertisement for Cambridge Healthtech Institute’s Sixth Annual Cheminformatics conference describes the field thus:

“the combination of chemical synthesis, biological screening, and data-mining approaches used to guide drug discovery and development”

but this, again, sounds more like a field being identified by some of its most popular (and lucrative) activities, rather than by including all the diverse studies that come under its general heading.

The story of penicillin, one of the most successful drugs of all time, seems bizarre, but even now the way we discover and develop drugs is similar: the result of chance, observation and a lot of slow, intensive chemistry. Until recently, drug design always seemed doomed to remain a labour-intensive, trial-and-error process. The possibility of using information technology to plan intelligently and to automate processes related to the chemical synthesis of possible therapeutic compounds is very exciting for chemists and biochemists. The rewards for bringing a drug to market more rapidly are huge, so naturally this is what a lot of cheminformatics work is about.

Here is a page with a commercial slant which links to some interesting discussions of the term “cheminformatics”, what it means, whether or not it exists as a distinct discipline, and even whether it should be replaced by “chemoinformatics”.

The span of academic cheminformatics is wide and is exemplified by the interests of the cheminformatics groups at the Centre for Molecular and Biomolecular Informatics at the University of Nijmegen in the Netherlands. These interests include:

  • Synthesis Planning
  • Reaction and Structure Retrieval
  • 3-D Structure Retrieval
  • Modelling
  • Computational Chemistry
  • Visualisation Tools and Utilities

Trinity University’s Cheminformatics Web page, for another example, concerns itself with cheminformatics as the use of the Internet in chemistry.

What is Genomics?

Genomics is a field which existed before the completion of the sequences of genomes, but only in the crudest of forms: for example, the oft-repeated estimate of 100 000 genes in the human genome derived from a(n) (in)famous piece of “back of an envelope” genomics, guessing the weight of chromosomes and the density of the genes they bear. Genomics is any attempt to analyze or compare the entire genetic complement of a species or of several species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.
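As a toy illustration of comparing genomes through subsets of their genes, the sketch below computes the Jaccard index (shared gene families divided by all gene families) for two genomes. The family labels are hypothetical placeholders; in practice they would come from an orthology or gene-family assignment, and the Jaccard index is only one of many possible similarity measures.

```python
def jaccard(families_a: set[str], families_b: set[str]) -> float:
    """Fraction of gene families shared between two genomes (Jaccard index)."""
    if not families_a and not families_b:
        return 1.0  # two empty sets are treated as identical
    return len(families_a & families_b) / len(families_a | families_b)


# Hypothetical gene-family labels for two made-up species
species_x = {"fam_actin", "fam_tubulin", "fam_histone_h3", "fam_rhodopsin"}
species_y = {"fam_actin", "fam_tubulin", "fam_histone_h3", "fam_chlorophyll_ab"}

print(jaccard(species_x, species_y))  # 3 shared / 5 in total = 0.6
```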

What is Mathematical Biology?

Mathematical biology is easier to distinguish from bioinformatics than computational biology is. Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware. Indeed, such methods need not “solve” anything; in mathematical biology it would be considered reasonable to publish a result which merely establishes that a biological problem belongs to a particular general class.

The distinction between bioinformatics and mathematical biology was illuminated by an email I received from Alex Kasman at the College of Charleston. According to his working definition, he distinguished bioinformatics which (under the tight definition at least)…

“…seems to focus almost exclusively on specific algorithms that can be applied to large molecular biological data sets…”

…from mathematical biology which…

“…includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data.”

What is Proteomics?

A recent review on proteomics in the journal Nature defined the field this way:

“The term proteome was first coined to describe the set of proteins encoded by the genome[1]. The study of the proteome, called proteomics, now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything ‘post-genomic’.”

Michael J. Dunn, the Editor-in-Chief of Proteomics, defines the “proteome” as:

“the PROTEin complement of the genOME”

and proteomics to be concerned with:

“qualitative and quantitative studies of gene expression at the level of the functional proteins themselves”

that is:

“an interface between protein biochemistry and molecular biology”

Characterizing the many tens of thousands of proteins expressed in a given cell type at a given time (whether measuring their molecular weights or isoelectric points, identifying their ligands or determining their structures) involves the storage and comparison of vast amounts of data. Inevitably this requires bioinformatics. Here is a constructively skeptical review by Lukas Huber.
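To make the molecular-weight part concrete, here is a minimal sketch that estimates a protein's average mass from its sequence by summing standard average residue masses and adding one water molecule for the free termini. The function name is illustrative, the masses are rounded, and the estimate ignores post-translational modifications, which real proteomics measurements would pick up.

```python
# Average residue masses in daltons (free amino acid minus one water), rounded
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def average_mass(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = protein.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognised residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(average_mass("GG"), 2))  # glycylglycine, about 132.12 Da
```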

What is Pharmacogenomics?

Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets. Examples include trawling entire genomes for potential receptors by bioinformatic means, investigating patterns of gene expression in both pathogens and hosts during infection, and examining the characteristic expression patterns found in tumours or patients' samples for diagnostic purposes (possibly in the pursuit of potential cancer therapy targets).

The term “pharmacogenomics” is also used for the more “trivial” (but arguably more useful) application of bioinformatics approaches to the cataloguing and processing of information relating to pharmacology and genetics, for example the accumulation of information in databases like this one. (Thanks to Ivanovi.)

What is Pharmacogenetics?

All individuals respond differently to drug treatments: some positively, others with little obvious change in their conditions and yet others with side effects or allergic reactions. Much of this variation is known to have a genetic basis. Pharmacogenetics is a subset of pharmacogenomics which uses genomic/bioinformatic methods to identify genomic correlates, for example SNPs (Single Nucleotide Polymorphisms), characteristic of particular patient response profiles, and uses those markers to inform the administration and development of therapies. Strikingly, such approaches have been used to “resurrect” drugs previously thought to be ineffective but subsequently found to work in a subset of patients. They can also be used to optimize chemotherapy doses for particular patients.
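One simple way to ask whether a marker such as a SNP is associated with drug response is to count carriers and non-carriers among responders and non-responders in a 2×2 table and summarise it with an odds ratio. The sketch below uses made-up counts and adds the Haldane-Anscombe correction (0.5 to every cell) when a cell is empty; a real study would also need a significance test and correction for testing many SNPs at once.

```python
def odds_ratio(resp_carrier: int, resp_noncarrier: int,
               nonresp_carrier: int, nonresp_noncarrier: int) -> float:
    """Odds of responding for marker carriers divided by the odds for non-carriers."""
    cells = [resp_carrier, resp_noncarrier, nonresp_carrier, nonresp_noncarrier]
    if 0 in cells:
        # Haldane-Anscombe correction avoids division by zero
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    # (a / c) / (b / d) simplifies to (a * d) / (b * c)
    return (a * d) / (b * c)


# Hypothetical cohort: 30 of 45 carriers respond, 10 of 55 non-carriers respond
print(odds_ratio(30, 10, 15, 45))  # 9.0
```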

Overview of the most common bioinformatics programs

Everyday bioinformatics is done with sequence search programs like BLAST, sequence analysis programs like the EMBOSS and Staden packages, structure prediction programs like THREADER or PHD, and molecular imaging/modelling programs like RasMol and WHATIF.

Overview of the most common bioinformatics technology

Currently, a lot of bioinformatics work is concerned with the technology of databases (thanks again to Ivanovi). These databases include both “public” repositories of gene data like GenBank or the Protein Data Bank (the PDB) and private databases, like those used by research groups involved in gene mapping projects or those held by biotech companies. Making such databases accessible via open standards is very important. Consumers of bioinformatics data use a range of computer platforms: from the more powerful and forbidding UNIX boxes favoured by the developers and curators to the far friendlier Macs often found populating the labs of computer-wary biologists.
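As a small example of why simple, openly documented formats help, sequences from repositories such as GenBank can be downloaded as FASTA text: a `>` header line followed by one or more sequence lines. The reader below is a minimal sketch for well-formed input (it skips blank lines and ignores any text before the first header); mature libraries such as Biopython handle many more formats and edge cases.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: list[str] = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # header text without the '>' marker
            chunks = []
        elif header is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 toy example\nACGTACGT\nGGCC\n>seq2\nTTAA\n")
records = list(read_fasta(example))
print(records)  # [('seq1 toy example', 'ACGTACGTGGCC'), ('seq2', 'TTAA')]
```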

Databases of existing sequence data can be used to identify homologues of new molecules that have been amplified and sequenced in the lab. Homology, the property of sharing a common ancestor, can be a very powerful indicator in bioinformatics (see below).

Acquisition of sequence data

Bioinformatics tools can be used to obtain sequences of genes or proteins of interest, either from material obtained, labelled, prepared and examined in electric fields by individual researchers/groups or from repositories of sequences from previously investigated material.

Analysis of data

Both types of sequence can then be analysed in many ways with bioinformatics tools.

They can be assembled. Note that this is one of the occasions when the meaning of a biological term differs markedly from a computational one (see the amusing confusion over the issue at Web-based geek forum Slashdot). Computer scientists, banish from your mind any thought of assembly language. Sequencing can only be performed for relatively short stretches of a biomolecule and finished sequences are therefore prepared by arranging overlapping “reads” of monomers (single beads on a molecular chain) into a single continuous passage of “code”. This is the bioinformatic sense of assembly.
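The greedy sketch below shows the idea of assembly on error-free toy reads: repeatedly find the pair of reads with the longest suffix/prefix overlap and merge them. This is a teaching simplification (the reads, the `min_len` threshold and the helper names are assumptions); real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors, repeats and reads contained within other reads, which this version simply leaves unmerged.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a matching a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap; returns the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


toy_reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(toy_reads))  # ['ATTAGACCTGCCGGAATAC']
```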

They can be mapped—that is, their sequences can be parsed to find sites where so-called “restriction enzymes” will cut them.
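Mapping in this sense is essentially a string search. The sketch below locates recognition sites and computes fragment lengths for a linear DNA molecule, using EcoRI (site GAATTC, cut between the G and the first A) as the example enzyme. The toy sequence and helper names are hypothetical, and circular molecules and other practical complications are ignored.

```python
import re


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site, overlaps included."""
    # A zero-width lookahead lets overlapping occurrences be reported too
    pattern = f"(?={re.escape(site.upper())})"
    return [m.start() for m in re.finditer(pattern, dna.upper())]


def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear molecule cut_offset bases into each site."""
    cuts = [pos + cut_offset for pos in find_sites(dna, site)]
    bounds = [0, *cuts, len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAGAATTCTTGGAATTCAA"
print(find_sites(toy, "GAATTC"))                      # [2, 11]
print(fragment_lengths(toy, "GAATTC", cut_offset=1))  # [3, 9, 7]
```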

They can be compared, usually by aligning corresponding segments and looking for matching and mismatching letters in their sequences. Genes or proteins that are sufficiently similar are likely to be related and are therefore said to be “homologous” to each other, although the whole truth is rather more complicated than this. Such cousins are called “homologues”.
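Comparison by alignment can be sketched with the classic Needleman-Wunsch dynamic-programming algorithm for global alignment. The scores used here (+1 for a match, -1 for a mismatch, -2 per gap) are arbitrary illustrative choices; practical tools use substitution matrices such as BLOSUM62 and affine gap penalties, and BLAST uses a faster heuristic local search rather than this exhaustive method.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align two letters
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment
    top: list[str] = []
    bottom: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```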

If a homologue (a related molecule) of known structure exists, then a newly discovered protein may be modelled; that is, the three-dimensional structure of the gene product can be predicted without doing laboratory experiments.

Bioinformatics is used in primer design. Primers are short sequences needed to make many copies of (amplify) a piece of DNA as used in PCR (the Polymerase Chain Reaction).
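Two routine calculations in primer design are easy to sketch: the reverse primer is the reverse complement of the far end of the target, and a rough melting temperature for a short oligonucleotide can be estimated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The `primer_pair` helper and its fixed-length choice are hypothetical simplifications; real primer-design tools also check GC content, self-complementarity and primer dimers, and use nearest-neighbour thermodynamic models for Tm.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (degrees C) for short oligos: 2(A+T) + 4(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Hypothetical helper: primers flanking the whole template."""
    forward = template[:length]                        # same strand as the template
    reverse = reverse_complement(template[-length:])   # binds the opposite strand
    return forward, reverse


fwd, rev = primer_pair("GATTACAGGCCTTAAGCTAG", length=6)
print(fwd, rev, wallace_tm(fwd), round(gc_fraction(fwd), 2))  # GATTAC CTAGCT 16 0.33
```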

Bioinformatics is used to attempt to predict the function of actual gene products.

Information about the similarity, and, by implication, the relatedness of proteins is used to trace the “family trees” of different molecules through evolutionary time.
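A minimal sketch of tree building: compute uncorrected p-distances between pre-aligned sequences and cluster them with UPGMA, one of the simplest distance methods. The sequences and names are made-up toy data, only the topology is reported (in Newick format, without branch lengths), and UPGMA assumes a constant evolutionary rate, which is one reason modern studies usually prefer likelihood or Bayesian methods.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Return the UPGMA tree topology (no branch lengths) in Newick format."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters at the smallest distance
        a, b = sorted(closest)
        merged = f"({a},{b})"
        for other in sizes:
            if other in (a, b):
                continue
            # size-weighted average of the distances from a and b to the other cluster
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
        for key in [k for k in dist if a in k or b in k]:
            del dist[key]  # drop every distance involving the merged clusters
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


toy_alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACGAACTTAA",
    "s4": "TCGAATTTGA",
}
print(upgma(toy_alignment))  # (((s1,s2),s3),s4);
```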

There are various other applications of computer analysis to sequence data, but, with so much raw data being generated by the Human Genome Project and other initiatives in biology, computers are presently essential for many biologists just to manage their day-to-day results.

Molecular modelling / structural biology is a growing field which can be considered part of bioinformatics. There are, for example, tools which allow you (often via the Net) to make pretty good predictions of the secondary structure of proteins arising from a given amino acid sequence, often based on known “solved” structures and other sequenced molecules acquired by structural biologists.

Structural biologists use “bioinformatics” to handle the vast and complex data from X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy investigations and create the 3-D models of molecules that seem to be everywhere in the media.

Unfortunately, the word “map” is used in several different ways in biology/genetics/bioinformatics. The definition given above is the one most frequently used in this context, but a gene can also be said to be “mapped” when its parent chromosome has been identified, when its physical or genetic distance from other genes is established and, less frequently, when the structure and locations of its various coding components (its “exons”) are established.
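Genetic distance, one of the senses of “map” above, is estimated from how often two loci are separated by recombination. A common conversion is Haldane's mapping function, d = -½ ln(1 - 2r) Morgans for a recombination fraction r, which assumes crossovers occur independently of one another (no interference). The sketch below expresses it in centimorgans; the function name is illustrative, and other mapping functions (such as Kosambi's) make different assumptions.

```python
import math


def haldane_distance_cm(r: float) -> float:
    """Map distance in centimorgans from recombination fraction r (Haldane)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d = -1/2 ln(1 - 2r) Morgans, multiplied by 100 to give centimorgans
    return -50.0 * math.log(1.0 - 2.0 * r)


print(round(haldane_distance_cm(0.1), 2))  # about 11.16 cM
```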

Bioinformatics Books

I’ve divided suggested reading into books of general interest, those best suited to people coming from a computational/mathematical background and books for biologists interested in bioinformatics. Links to other lists of bioinformatics books follow this section of suggested reading.

General Introductions

Computational/Mathematical aspects

Applying bioinformatics in biological research