Checklist for evaluating the plausibility of weak hits.
Examinations on the hit
Is the amino acid distribution consistent with globular (cytosolic,
extracellular), integral membrane, coiled-coil, fibrous or random-coil
structure?
These are mutually exclusive structural classes that should not overlap
within a domain (although they can be juxtaposed in multidomain proteins).
A high-scoring match between random-coil regions is not a good indicator of homology.
Knowledge of the three-dimensional structure greatly facilitates
the evaluation, as constraints from the hydrophobic core, catalysis etc. can
be included.
Immediately rules out potential similarity. By definition, globular
domains do not overlap (although they can be inserted into loops in other
domains).
Globular structure is stabilised by interactions in the hydrophobic
core. Half a globular domain is a meaningless concept and very rarely observed.
Conserved blocks usually indicate secondary structural elements.
Is there a match to all highly conserved hydrophobic
residues?
These are essential to the given hydrophobic core. Very few exceptions
are tolerated.
Surface residues are usually hydrophilic, and are unconserved unless
they bind other molecules. Multiple mismatched hydrophobic residues are contrary
indicators. (Surprisingly often, transmembrane regions are erroneously
aligned to cytosolic proteins.)
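The hydrophobic-core check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hydrophobic alphabet, the `min_fraction` threshold and the function name `hydrophobic_mismatches` are hypothetical choices, not part of the checklist, and the alignment rows are assumed to be equal-length strings.

```python
# Assumed hydrophobic alphabet; the checklist itself does not define one.
HYDROPHOBIC = set("AVILMFWC")

def hydrophobic_mismatches(family_rows, hit_row, min_fraction=0.9):
    """Return alignment columns that are conserved-hydrophobic in the
    family but carry a non-hydrophobic residue (or a gap) in the hit."""
    mismatches = []
    for col in range(len(hit_row)):
        column = [row[col] for row in family_rows]
        frac = sum(res in HYDROPHOBIC for res in column) / len(column)
        if frac >= min_fraction and hit_row[col] not in HYDROPHOBIC:
            mismatches.append(col)
    return mismatches

# Toy alignment (invented for illustration, not real sequences).
family = ["LKVAGIDE", "IKVSGLDE", "VRIAGVDE"]
print(hydrophobic_mismatches(family, "MKLTGVNE"))
print(hydrophobic_mismatches(family, "MKDTGVNE"))
```

Any column reported here would count as one of the "very few exceptions"; several reported columns in a weak hit are a contrary indicator.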
Pro is favoured in the N-terminal 3 residues of an α-helix.
Any deeper and it breaks H-bonds. It is allowed on edge strands. It
breaks H-bonds on internal strands. Exceptions are rare and cannot
be arbitrarily invoked for weak hits.
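The helix part of the Pro rule can be checked mechanically when a secondary-structure string is available for the alignment. The sketch below assumes a hypothetical encoding in which `H` marks helix columns and that the hit row and the structure row have the same length; strands are not handled, because edge and internal strands cannot be told apart from such a string alone.

```python
import re

def deep_helix_prolines(hit_row, ss_row, n_cap=3):
    """Return alignment columns where the hit places Pro inside a helix,
    beyond the first n_cap (N-terminal) helix positions."""
    flagged = []
    for helix in re.finditer(r"H+", ss_row):
        # Skip the N-terminal positions where Pro is favoured.
        for col in range(helix.start() + n_cap, helix.end()):
            if hit_row[col] == "P":
                flagged.append(col)
    return flagged

ss = "--HHHHHHHH--"  # toy secondary-structure string (assumed encoding)
print(deep_helix_prolines("AKPELLKKLAGS", ss))  # Pro at the helix N-terminus
print(deep_helix_prolines("AKAELPKKLAGS", ss))  # Pro deeper in the helix
```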
The lack of a side chain reduces helix and strand stability. Gly aligned
to small hydrophobic residues (Ala, Val, Cys) may indicate a plausible
tight packing arrangement; otherwise, only occasional exceptions may be
tolerated.
Is a segment rich in Gly, Pro, Asn, Ser aligned to a
block poor in these residues?
Indicates a loop region is erroneously aligned to a secondary
structure element.
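One rough way to screen for this is to compare the fraction of Gly, Pro, Asn and Ser in the two aligned segments. The thresholds `rich` and `poor` below are arbitrary assumptions chosen for illustration, as are the function names; they would need tuning on real alignments.

```python
LOOP_RESIDUES = set("GPNS")

def loop_fraction(segment):
    """Fraction of Gly/Pro/Asn/Ser among the non-gap residues of a segment."""
    residues = [r for r in segment if r != "-"]
    if not residues:
        return 0.0
    return sum(r in LOOP_RESIDUES for r in residues) / len(residues)

def loop_vs_block_conflict(query_segment, hit_segment, rich=0.5, poor=0.15):
    """True when one segment is rich in loop residues and the other is poor."""
    fq = loop_fraction(query_segment)
    fh = loop_fraction(hit_segment)
    return (fq >= rich and fh <= poor) or (fh >= rich and fq <= poor)

# Toy segments (invented for illustration).
print(loop_vs_block_conflict("GSGPNSGA", "LVIAVLKE"))
print(loop_vs_block_conflict("LVIAVLKE", "IVLAALRE"))
```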
Secondary structures of matched blocks should be identical. In addition
to the above rules, amino acid preferences may be indicative: e.g. aligning
a sequence composed of β-strand-preferring residues such as Ile, Val, Thr, Ser
onto an α-helix would be highly implausible (unless these were already favoured
in the aligned sequences).
Have new insertions/deletions appeared in conserved
regions?
Alignment blocks are usually conserved due to structural or functional
constraints: therefore large or frequent insertions and deletions are unlikely.
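A simple count of gaps inside conserved blocks makes this check concrete. The sketch assumes that the conserved blocks are supplied as half-open column ranges `(start, end)` and that gaps are written as `-`; both conventions, and the function name, are assumptions made for the example.

```python
def indels_in_blocks(family_rows, hit_row, blocks):
    """For each conserved block, count hit deletions (gap in the hit) and
    hit insertions (residue in the hit where every family row has a gap).
    Only blocks with at least one indel are reported."""
    report = []
    for start, end in blocks:
        deletions = sum(hit_row[c] == "-" for c in range(start, end))
        insertions = sum(
            hit_row[c] != "-" and all(row[c] == "-" for row in family_rows)
            for c in range(start, end)
        )
        if deletions or insertions:
            report.append((start, end, deletions, insertions))
    return report

# Toy alignment (invented for illustration).
family = ["LKVA-GIDE", "IKVS-GLDE"]
hit = "MK-TWGVNE"
print(indels_in_blocks(family, hit, [(0, 6), (6, 9)]))
```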
For Cys-rich sequences, do the Cys patterns match
or not?
Number and spacing of Cys residues distinguish between classes of extracellular
disulphide-rich modules, as well as (often with His) intracellular
zinc fingers e.g. the GAL4 example.
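Comparing Cys number and spacing is easy to automate. The following sketch works on ungapped sequences and compares the number of residues between consecutive Cys; the `tolerance` parameter, which allows small loop-length differences, is an assumption and not something the checklist prescribes.

```python
def cys_spacing(sequence):
    """Number of residues between each pair of consecutive Cys."""
    positions = [i for i, res in enumerate(sequence) if res == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

def cys_patterns_match(seq_a, seq_b, tolerance=0):
    """True when both sequences have the same number of Cys and every
    spacing differs by at most `tolerance` residues."""
    if seq_a.count("C") != seq_b.count("C"):
        return False
    return all(
        abs(x - y) <= tolerance
        for x, y in zip(cys_spacing(seq_a), cys_spacing(seq_b))
    )

# Toy sequences (invented for illustration, not real modules).
print(cys_spacing("ACDDCKKKC"))
print(cys_patterns_match("ACDDCKKKC", "CEECRRRCG"))
print(cys_patterns_match("ACDDCKKKC", "ACDCKKKKC"))
```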
Are the functions of the hits compatible?
On the one hand one should not overinterpret results to fit a tempting
functional context; on the other hand, some functional aspects (e.g. query
proteins are extracellular, hit is a metabolic enzyme) should be considered.
Does additional functional or biochemical information
provide some clues as to homology?
Already identified catalytic residues, disulphide bridges, mutation data
etc. add constraints that can be helpful in excluding false positives.