The syntax of PROSITE patterns

The PROSITE patterns are described using the following conventions:

  1. The standard IUPAC one-letter codes for the amino acids are used.
  2. The symbol `x' is used for a position where any amino acid is accepted.
  3. Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets `[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
  4. Ambiguities are also indicated by listing between a pair of curly brackets `{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
  5. Each element in a pattern is separated from its neighbor by a `-'.
  6. Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range in parentheses. Examples: x(3) corresponds to x-x-x; x(2,4) corresponds to x-x or x-x-x or x-x-x-x.
  7. When a pattern is restricted to either the N- or C-terminus of a sequence, that pattern starts with a `<' symbol or ends with a `>' symbol, respectively.
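
The conventions above map fairly directly onto regular expressions. The following is a minimal Python sketch of such a translation, not an official PROSITE tool: the function name `prosite_to_regex` is hypothetical, and the mapping (x to `.`, `{...}` to `[^...]`, repetition counts to `{n}` or `{n,m}`, `<` and `>` to `^` and `$`) is an assumption that covers only the seven rules listed here.

```python
import re

# One pattern element: x, a single residue, [..] or {..}, with an optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression string."""
    pattern = pattern.strip()
    prefix = suffix = ""
    if pattern.startswith("<"):  # rule 7: restricted to the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # rule 7: restricted to the C-terminus
        suffix, pattern = "$", pattern[:-1]

    parts = []
    for element in pattern.split("-"):  # rule 5: elements are separated by '-'
        element = element.strip()
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":  # rule 2: any amino acid
            core = "."
        elif core.startswith("{"):  # rule 4: excluded amino acids
            core = "[^" + core[1:-1] + "]"
        # rules 1 and 3: single letters and [..] carry over unchanged
        if low is not None:  # rule 6: repetition count or range
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))
print(prosite_to_regex("<A-x-[ST](2)-x(0,1)-V"))
```

The resulting string can be passed to `re.search` to scan a sequence written in one-letter codes. Note that `.` in a regular expression matches any character, not only valid amino acid codes, so the input sequence is assumed to be clean.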

    Examples

    [AC]-x-V-x(4)-{ED}
    

    This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

    <A-x-[ST](2)-x(0,1)-V
    

    This pattern, which must be at the N-terminus of the sequence (`<'), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
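
To see how the two example patterns behave on actual sequences, they can be written by hand as regular expressions and tried on a few inputs. This is only an illustrative sketch: the regular expressions are a manual rendering of the patterns (an assumption, not part of PROSITE), and the test sequences are invented.

```python
import re

# Hand-written equivalents of the two example patterns:
#   any -> ".", {any but Glu or Asp} -> "[^ED]", N-terminal restriction -> "^"
FIRST = re.compile(r"[AC].V.{4}[^ED]")
SECOND = re.compile(r"^A.[ST]{2}.{0,1}V")


def hits(regex: re.Pattern, sequence: str) -> bool:
    """Return True if the pattern occurs in the one-letter-code sequence."""
    return regex.search(sequence) is not None


# Invented sequences, chosen to exercise each rule.
print(hits(FIRST, "MKCGVLLLLK"))   # True: C-G-V-LLLL-K fits the first pattern
print(hits(FIRST, "MKCGVLLLLE"))   # False: the last position is Glu, which is excluded
print(hits(SECOND, "AGSTV"))       # True: the optional position is absent
print(hits(SECOND, "AGSTLV"))      # True: the optional position is present
print(hits(SECOND, "MAGSTV"))      # False: the match would not start at the N-terminus
```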