Most of the tools needed to find or construct sequence patterns are
available on the Internet at various Web sites. A few programs
need to be installed on your own computer. For the purposes of this course
at ICGEB, we have installed some of the necessary programs, and in the following
we go through the major steps of pattern building and pattern search. The
goal of this exercise is to establish whether a newly determined sequence contains
a known pattern, whether it belongs to a protein family with known function,
and/or whether it contains a novel pattern. You can use the sequence example
given as homework or your own sequence. Read this scheme first, establish
your strategy, and then start submitting your sequence to the servers or
to the programs available at ICGEB.
Checking for known patterns is a relatively simple task. Submit your sequence
to one of the pattern search servers. Get the help file for the server you use.
If you find one or several known patterns in your sequence, you will have to
decide whether the regions not covered by the known patterns are long enough
to be worth searching for novel patterns.
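One way to make the "long enough" decision concrete is to list the stretches of your sequence that no known pattern covers. The sketch below is only an illustration: the function name, the 0-based end-exclusive coordinates and the cutoff of 20 residues are our assumptions, not values taken from any server.

```python
def uncovered_regions(seq_length, matches, min_length=20):
    """Return (start, end) stretches not covered by any known pattern.

    Coordinates are 0-based and end-exclusive. `matches` is a list of
    (start, end) pairs reported for the known patterns. `min_length` is
    an arbitrary cutoff (an assumption) below which a stretch is ignored.
    """
    regions = []
    position = 0
    for start, end in sorted(matches):
        if start - position >= min_length:
            regions.append((position, start))
        # max() keeps the scan correct when matches overlap
        position = max(position, end)
    if seq_length - position >= min_length:
        regions.append((position, seq_length))
    return regions


# Hypothetical example: a 300-residue sequence with two known patterns.
print(uncovered_regions(300, [(40, 55), (120, 180)]))
```

Stretches returned by this function are the candidates for the novel pattern search described in the following steps.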
Establish whether your sequence has homologs in the databases. You can use
a number of e-mail or WWW servers to run the search over the Internet.
The most commonly used program for fast searching of the daily updated
nucleic acid and protein databases is BLAST.
BLITZ is available
for the most detailed searches on Swiss-Prot, based on the Smith-Waterman
algorithm. You can also use FASTA (as a part of the GCG package or through a
server). The choice
depends greatly on what output processing program you want to use in the
following steps. If you do not know whether your sequence is homologous to any
functional domain, submit your sequence to SBASE and/or domain.
Independent of the further strategy, inspect your search outputs and
extract the important pairwise alignments into a separate file. Retrieve
some or all of the apparently homologous sequences for multiple alignment.
Perform a multiple alignment on the retrieved sequences. If the sequences
are long, first extract the aligned regions with 10 flanking residues into
a new file. You can use GCG PILEUP or the multiple sequence
alignment server (CLUSTAL-W, MAP and PIMA alignments).
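The extraction of aligned regions with 10 flanking residues can be sketched as follows. This is a minimal illustration, assuming you have already read the start and end of each aligned region from your search output; the function names, the hit names and the sequences are made up for the example.

```python
def extract_with_flanks(sequence, start, end, flank=10):
    """Return the aligned region plus up to `flank` residues on each side.

    `start` and `end` are 0-based, end-exclusive positions of the aligned
    region. The flanks are truncated at the ends of the sequence.
    """
    left = max(0, start - flank)
    right = min(len(sequence), end + flank)
    return sequence[left:right]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, `width` residues per line."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical hits: (name, full sequence, start, end of the aligned region)
hits = [
    ("hit1", "A" * 30 + "GKSTLLIQ" + "C" * 30, 30, 38),
    ("hit2", "MK" + "GKSSLLVQ" + "D" * 5, 2, 10),
]
records = [(name, extract_with_flanks(seq, s, e)) for name, seq, s, e in hits]
print(to_fasta(records))
```

The FASTA text can be saved to a new file and submitted to the alignment program of your choice.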
Formulate your pattern in generally known terms like PROSITE signatures
or regular expressions.
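The two notations are closely related. The sketch below converts a PROSITE-style signature into a Python regular expression and scans a sequence with it. It is a simplified converter written for this exercise, not the PROSITE software itself: it handles only the common elements (x, [..], {..}, repeat counts, < and >) and does not handle rarer cases such as > inside square brackets. The function names and the short test sequence are our own.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style signature to a regular expression string."""
    regex = pattern.strip().rstrip(".")      # drop the trailing period
    regex = regex.replace("-", "")           # element separators
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    regex = regex.replace("x", ".")          # any residue
    regex = regex.replace("<", "^").replace(">", "$")    # sequence ends
    return regex


def find_pattern(pattern, sequence):
    """Return (1-based position, matched text) for every match.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))      # N[^P][ST][^P]
print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))   # [AG].{4}GK[ST]
print(find_pattern("[AG]-x(4)-G-K-[ST]", "MTEYKLVVVGAGGVGKSALT"))
```

Writing your pattern in one of these forms lets you test it on your own aligned sequences before you submit it to a server.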
Build a profile from a multiple alignment. Use the GCG PROFILESEARCH
program to process your PILEUP output.
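To see what a profile contains, the sketch below builds a simple position-specific scoring table from an ungapped block of aligned sequences and slides it along a query. This is a deliberately simplified stand-in and not the profile format or algorithm of the GCG programs: it assumes a uniform background frequency, a pseudocount of 1, and no gap penalties. The function names and the small alignment are made up for the illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dictionary per alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # uniform background (an assumption)
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r in AMINO_ACIDS]   # gaps are skipped
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


def best_hit(profile, sequence):
    """Return (0-based start, score) of the best ungapped window, or None."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # residues outside the 20 standard letters score 0
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Made-up alignment block of four sequences, four columns wide.
profile = build_profile(["GKST", "GKSS", "GKTT", "AKST"])
print(best_hit(profile, "MAGKSTLL"))
```

Unlike a regular expression, a profile of this kind gives every residue at every position a score, so a sequence that deviates from the consensus at one position can still be found.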
At this step you will have one or several patterns. You can perform
pattern searches by submitting them to the following servers:
For PROFILESEARCH (using a profile in the GCG format), you can use the
BIOACCELERATOR server at the Weizmann Institute of Science, Israel.