IV. AN EXERCISE

Most of the tools needed to find or construct sequence patterns are available on the Internet at various Web sites. A few less common programs must be installed on your own computer. For the purposes of this course at ICGEB, we have installed some of the necessary programs, and in the following we go through the major steps of pattern building and pattern search. The goal of this exercise is to establish whether a newly determined sequence contains a known pattern, whether it belongs to a protein family with known function, and/or whether it contains a novel pattern. You can use the example sequence given as homework or your own sequence. Read this scheme first, decide on your strategy, and then start submitting your sequence to the servers or to the programs available at ICGEB.

Step 1. Finding known patterns

This is a relatively simple task. Submit your sequence to one of the pattern search servers, and consult each server's help file. If you find one or more known patterns in your sequence, you will have to decide whether the regions not covered by the known patterns are long enough to be worth searching for novel patterns.
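
To make concrete what a pattern match is, the sketch below translates a PROSITE pattern into a Python regular expression and scans a sequence with it. It is a minimal illustration, assuming only a simplified subset of PROSITE syntax (x, [..], {..}, repeat counts such as x(2,4), and the < and > terminal anchors); other notations, such as a > inside brackets, are not handled. The function names are hypothetical and do not belong to any server's interface.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Simplified subset only (an assumption of this sketch): x, [ABC], {ABC},
    repeat counts such as x(3) or x(2,4), and the terminal anchors < and >.
    """
    pattern = pattern.strip().rstrip(".")
    prefix, suffix = "", ""
    if pattern.startswith("<"):      # pattern anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # pattern anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(2,4)" into its residue part and repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse PROSITE element {element!r}")
        residue, low, high = match.groups()
        if residue == "x":                       # any amino acid
            residue = "."
        elif residue.startswith("{"):            # any amino acid except those listed
            residue = "[^" + residue[1:-1] + "]"
        # Single letters and [..] classes are already valid regex syntax.
        if high is not None:
            residue += "{" + low + "," + high + "}"
        elif low is not None:
            residue += "{" + low + "}"
        parts.append(residue)
    return prefix + "".join(parts) + suffix


def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match, overlaps included."""
    # Wrapping the regex in a lookahead lets successive matches overlap.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
```

For example, scanning MKNGSAANLTV with the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}. reports NGSA at position 3 and NLTV at position 8.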

Step 2. Finding similar sequences

Establish whether your sequence has homologs in the databases. You can use a number of e-mail or WWW servers to do the search over the Internet. The most commonly used program for fast searching of the daily updated nucleic acid and protein databases is BLAST. BLITZ, based on the Needleman-Wunsch algorithm, is available for more detailed searches of Swiss-Prot. You can also use FASTA, either as part of the GCG package or through a server. The choice depends largely on which output-processing program you want to use in the following steps. If you do not know whether your sequence is homologous to any functional domain, submit it to SBASE and/or a domain database.
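
The Needleman-Wunsch algorithm named above finds an optimal global alignment of two sequences by dynamic programming. The sketch below is a minimal illustration, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) in place of the substitution matrices and gap models that real search programs use; it returns the score and one optimal alignment.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b with a simple, assumed scoring scheme."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap            # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # match or mismatch
                              score[i - 1][j] + gap,              # gap in b
                              score[i][j - 1] + gap)              # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, aligning ACGT with AGT under these scores gives a score of 1 and the alignment ACGT over A-GT.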

Whatever strategy you choose next, inspect your search outputs and extract the important pairwise alignments into a separate file. Retrieve some or all of the apparently homologous sequences for multiple alignment.

Step 3. Multiple alignment

Perform a multiple alignment of the retrieved sequences. If the sequences are long, first extract the aligned regions, together with 10 flanking residues, into a new file. You can use GCG PILEUP or the Multiple Sequence Alignment server (CLUSTAL-W, MAP and PIMA alignments). Formulate your pattern in a standard notation such as a PROSITE signature or a regular expression.
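
Extracting the aligned regions with their flanks is simple coordinate arithmetic. A minimal sketch, assuming 1-based inclusive coordinates as printed in BLAST and FASTA alignments and 10 flanking residues on each side, clipped at the sequence ends; the helper names and the name/start-end header style are hypothetical choices, not a required format.

```python
def extract_region(seq: str, start: int, end: int, flank: int = 10) -> tuple[str, int, int]:
    """Return seq[start..end] (1-based, inclusive) plus up to `flank` residues
    on each side, together with the new 1-based start and end coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("alignment coordinates outside the sequence")
    lo = max(0, start - 1 - flank)     # 0-based slice start, clipped at the N-terminus
    hi = min(len(seq), end + flank)    # 0-based exclusive slice end, clipped at the C-terminus
    return seq[lo:hi], lo + 1, hi


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with a fixed line width."""
    lines = [">" + name] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def write_regions(hits: dict[str, tuple[str, int, int]], path: str, flank: int = 10) -> None:
    """Write the flanked regions of several hits into one FASTA file.

    `hits` maps a (hypothetical) sequence name to (sequence, start, end).
    """
    with open(path, "w") as handle:
        for name, (seq, start, end) in hits.items():
            region, new_start, new_end = extract_region(seq, start, end, flank)
            handle.write(to_fasta(f"{name}/{new_start}-{new_end}", region))
```

The resulting FASTA file can then be given to the multiple alignment program of your choice.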

Step 4. Pattern building

Use the PIMA program available on the BCM server.
Use the program ICGEBPROT for automated pattern building. This program runs on Swiss-Prot and detects patterns only in FASTA-versus-Swiss-Prot search outputs. It accepts only queries shorter than 200 residues. Formulate your pattern as a PROSITE signature or a regular expression.
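
One naive way to turn an ungapped block of a multiple alignment into a PROSITE-style signature is to write fully conserved columns as single residues, columns with only a few alternatives as [..] classes, and variable columns as x. The sketch below follows that heuristic; it is an assumption for illustration, not the method used by ICGEBPROT or PIMA, it rejects gapped blocks, and a real signature still has to be checked against the database for false positives and false negatives.

```python
def block_to_prosite(block: list[str], max_class: int = 4) -> str:
    """Derive a PROSITE-style pattern from an ungapped alignment block.

    Heuristic (assumed, not a published method): a fully conserved column
    becomes one residue, a column with at most `max_class` different residues
    becomes a [..] class, and any more variable column becomes x.
    Consecutive x elements are written as x(n).
    """
    if not block:
        raise ValueError("empty alignment block")
    length = len(block[0])
    if any(len(s) != length or "-" in s for s in block):
        raise ValueError("block must be ungapped and of equal length")

    elements = []
    for col in range(length):
        residues = sorted({s[col] for s in block})
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_class:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")

    # Collapse runs of x into a single x(n) element.
    runs: list[list] = []
    for element in elements:
        if element == "x" and runs and runs[-1][0] == "x":
            runs[-1][1] += 1
        else:
            runs.append([element, 1])
    parts = [e if n == 1 else f"{e}({n})" for e, n in runs]
    return "-".join(parts) + "."
```

For example, the block RGDSC, RGDTC, RGDAC yields R-G-D-[AST]-C. The max_class threshold controls how permissive the resulting pattern is.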

Step 5. PROFILE building

Build a profile from a multiple alignment. Use the GCG PROFILEMAKE program to build a profile from your PILEUP output; the profile can then be used with PROFILESEARCH.
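
As a simplified illustration of what a profile contains, the sketch below builds a position-specific log-odds table from an ungapped alignment block. It assumes add-one pseudocounts and a uniform background frequency over the 20 amino acids; GCG's profile programs use a different, more elaborate scheme (including position-specific gap penalties), so these numbers will not match GCG output.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Position-specific log-odds profile from an ungapped alignment block.

    Assumptions of this sketch: uniform background frequency (1/20) and a
    fixed pseudocount added to every amino acid in every column.
    """
    length = len(block[0])
    if any(len(s) != length for s in block):
        raise ValueError("all sequences in the block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for s in block:
            counts[s[col]] += 1        # raises KeyError for gaps or non-standard codes
        total = sum(counts.values())
        # log2(observed frequency / background frequency) for each residue
        profile.append({aa: math.log2(counts[aa] / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def score_window(profile: list[dict[str, float]], window: str) -> float:
    """Sum of the position-specific scores for a window as long as the profile."""
    if len(window) != len(profile):
        raise ValueError("window length must equal profile length")
    return sum(column[aa] for column, aa in zip(profile, window))
```

Residues seen often in a column receive positive scores and unseen residues negative ones, so windows resembling the family score higher than unrelated ones.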

Step 6. Pattern and profile searching

At this step you will have one or more patterns. You can perform pattern searches by submitting them to the following servers:

For PROFILESEARCH (using a profile in GCG format), you can use the BIOACCELERATOR server at the Weizmann Institute of Science, Israel.
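
In essence, a profile search slides the profile along every database sequence and reports the windows whose summed score exceeds a threshold. The sketch below shows only that idea, on a toy hand-written profile with hypothetical scores; it ignores insertions and deletions, which a real profile search allows, and makes no attempt at the hardware acceleration the BIOACCELERATOR provides.

```python
def profile_search(profile: list[dict[str, float]], sequences: dict[str, str],
                   threshold: float, default: float = -1.0) -> list[tuple[str, int, str, float]]:
    """Slide an ungapped profile along each sequence and report high-scoring windows.

    Residues absent from a profile column score `default` (an assumed value).
    Returns (name, 1-based start, window, score) tuples, best scores first.
    """
    width = len(profile)
    hits = []
    for name, seq in sequences.items():
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(column.get(aa, default) for column, aa in zip(profile, window))
            if score >= threshold:
                hits.append((name, start + 1, window, score))
    hits.sort(key=lambda hit: hit[3], reverse=True)   # stable sort: ties keep input order
    return hits


# Toy three-column profile with hypothetical scores: C, then K or R, then C.
TOY_PROFILE = [{"C": 3.0}, {"K": 1.0, "R": 1.0}, {"C": 3.0}]
```

With these toy scores, searching the sequences AACKCAA and CRCGGCAC at threshold 5 reports CKC and CRC (score 7) followed by CAC (score 5).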

Exercises

1. Finding domain and functional site homologies with database search (short)
2. Finding and establishing patterns in a new sequence (longer)