Two sequences can be aligned so that identical residues are placed on top
of each other. Unfortunately, there are many ways to align two sequences.
Consider the two words WORD and BIRD. Intuitively, the best alignment is
W O R D
    | |
B I R D
Aligning WORD with BEARD is more difficult: there are several solutions of
similar quality, for example (a dot marks a gap):
B E A R D    B E A R D    B E A R D
      | |          | |          | |
. W O R D    W . O R D    W O . R D
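To see why these three candidates are hard to choose between, a minimal sketch can count the identical columns in each one, writing a gap as '.' as in the figure. The helper name `count_identities` is hypothetical and only illustrates the idea of "identical residues on top of each other".

```python
def count_identities(top: str, bottom: str) -> int:
    """Count columns where both rows carry the same residue (gaps '.' never count)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(top, bottom) if x == y and x != ".")

# The three candidate alignments of BEARD and WORD, gap written as '.'
alignments = [
    ("BEARD", ".WORD"),
    ("BEARD", "W.ORD"),
    ("BEARD", "WO.RD"),
]

for top, bottom in alignments:
    print(top, bottom, count_identities(top, bottom))
```

Each alignment lines up only R and D, so all three get 2 identities. Counting identities alone cannot rank them, which is why a finer-grained, weighted scoring is needed.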
In mathematical terms, we can edit a word with the unit operations of
insertion, deletion and replacement. A series of such unit operations can
transform WORD into BEARD. The minimum number of operations needed is the
so-called Levenshtein distance between the two words; for WORD and BEARD it
is 3 (insert B, then replace W by E and O by A). In sequence alignment
we operate with a weighted version of this distance: we assign specific
weights to insertion, deletion and replacement, as described in detail under
Database searching. Replacement weights are taken from a matrix. For proteins,
a classic choice is the Dayhoff (PAM) matrix, which represents the evolutionary
weight of amino acid replacements. The BLOSUM matrices, which are specifically
designed for database searches, are also widely used in protein comparisons.
Suffice it to say for now that we have a distance function that is zero
for identical sequences. Alternatively, and essentially equivalently, we can
use a similarity score, which is maximal for identical sequences.
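Both the unit-cost distance and its weighted generalization can be computed with the standard dynamic-programming recurrence over prefixes of the two words. The following is a minimal sketch, assuming a pluggable replacement cost `sub_cost` and a single gap cost `indel_cost`. Both are hypothetical parameter names, not taken from any particular package. With the default unit costs, the function returns the Levenshtein distance.

```python
from typing import Callable


def edit_distance(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float] = lambda x, y: 0.0 if x == y else 1.0,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of insertions, deletions and replacements turning a into b.

    With the default unit costs this is the Levenshtein distance.
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest cost of transforming the prefix a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost  # delete every character of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost  # insert every character of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                        # delete a[i-1]
                d[i][j - 1] + indel_cost,                        # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # replace (or keep if equal)
            )
    return d[n][m]


print(edit_distance("WORD", "WORD"))   # → 0.0
print(edit_distance("WORD", "BIRD"))   # → 2.0
print(edit_distance("WORD", "BEARD"))  # → 3.0
```

Passing a different `sub_cost` gives a weighted distance. A matrix such as Dayhoff or BLOSUM, however, stores similarity scores (higher means more alike). Those scores are normally maximized rather than minimized, which is the Needleman-Wunsch formulation of alignment. That variant is not shown here.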