Using the MEROPS Database for Proteolytic Enzymes and Their Inhibitors and Substrates

UNIT 1.25

Neil D. Rawlings,1,2 Alan J. Barrett,1 and Alex Bateman2

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
2 EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

MEROPS is a database of proteolytic enzymes as well as their inhibitors and substrates. Proteolytic enzymes and protein inhibitors are organized into protein domain families. In turn, families are organized into clans. Each peptidase, inhibitor, family, and clan has associated annotation, a multiple sequence alignment, a phylogenetic tree, literature references, and links to other databases. Interactions between proteolytic enzymes and inhibitors and between proteolytic enzymes and substrates are also presented. The entries in MEROPS are available via the World Wide Web. This unit contains detailed information on how to access and utilize the information present in the MEROPS database. Details on running MEROPS both remotely and locally are presented. © 2014 by John Wiley & Sons, Inc.

Keywords: proteolytic enzyme • peptidase • protease • inhibitor • substrate

How to cite this article: Rawlings, N.D., Barrett, A.J., and Bateman, A. 2014. Using the MEROPS Database for Proteolytic Enzymes and Their Inhibitors and Substrates. Curr. Protoc. Bioinform. 48:1.25.1-1.25.33. doi: 10.1002/0471250953.bi0125s48

INTRODUCTION

A proteolytic enzyme cleaves the peptide bond between adjacent amino acids. All peptides and proteins undergo proteolysis, and many proteins undergo a series of controlled proteolysis events during their lifetimes. Proteolysis determines the final destination of a protein either within the cell or outside it. It is one of the last post-translational modifications leading to maturation of many proteins including enzymes and peptide hormones, and is irreversible. It leads to inactivation of many proteins, either to terminate activity, switch biological signals on or off, or bring about the death of a cell. Proteolysis is also responsible for complete degradation of proteins to soluble amino acids and peptides to recycle a protein, to destroy unwanted proteins such as foreign proteins and misfolded proteins, or to obtain components for nutrition.

Proteins are large and complex biological polymers, and it is no surprise that every organism possesses an equally large and complex suite of enzymes to perform all the tasks outlined above. In common with many enzymes for which the substrates are biological polymers, proteolytic enzymes are extremely difficult to classify by the reaction catalyzed, because the precise reaction is difficult to describe. Along with endonucleases, glycosyl hydrolases, and protein kinases, proteolytic enzymes are poorly served by standard enzyme nomenclature.

The MEROPS database was established in 1996 to implement the classification system for proteolytic enzymes devised by Rawlings and Barrett (1993). The classification, database, and Web site are widely used by researchers around the world. Recent studies that have made extensive use of MEROPS include genome analyses (Consiglio et al., 2011; Mulenga and Erikson, 2011), drug target analyses (e.g., Cleynen et al., 2011; Gutlapalli et al., 2012; Greening et al., 2013), evolution studies (e.g., Laskar et al., 2011; Buller and Townsend, 2013), peptidase specificity analyses and predictions (e.g., Song et al., 2012; Fuchs et al., 2013), and cleavage identification (Lange et al., 2012).

Current Protocols in Bioinformatics 1.25.1-1.25.33, December 2014. Published online December 2014 in Wiley Online Library (wileyonlinelibrary.com). doi: 10.1002/0471250953.bi0125s48. Copyright © 2014 John Wiley & Sons, Inc.

BASIC PROTOCOL 1

SEARCHING MEROPS BY SEQUENCE

The most common use of the MEROPS database is to retrieve information on a known peptidase or inhibitor and the family and clan to which it belongs; the methods for retrieving such information are described in Basic Protocol 2. The second most common use is to determine to which clan, family, and peptidase or inhibitor species a protein of known sequence belongs. This protocol describes how to carry out such sequence-based analyses.

Necessary Resources

Hardware
Workstation with Internet connection

Software
Javascript-enabled browser (e.g., Mozilla Firefox, Google Chrome, or Microsoft Internet Explorer)

Files
Protein sequence of interest in FASTA format (see APPENDIX 1B for description of FASTA format)

Access MEROPS via the Web

1. Access the MEROPS Web site, which is currently available at http://merops.sanger.ac.uk.

Search strategies

2a. Search MEROPS with a novel protein sequence: From the MEROPS homepage (Fig. 1.25.1), click on BLAST MEROPS from the menu on the left-hand side. Paste the sequence into the text box that appears. For most purposes, the default settings will be sufficient.

The MEROPS BLAST search is an augmented BLASTP search. The search is performed against all the sequences in the MEROPS collection. The results (see Fig. 1.25.2) are augmented to highlight active site residues (which are shown as white text on a red background) and metal ligands (white text on a blue background). If any of these residues are not conserved, then these are shown as white text on a black background. This allows immediate visual inspection to decide whether the sequence submitted is a peptidase or a non-peptidase homolog. Cysteine residues that are known to form disulfide bonds are shown as white text on a yellow background. This method is applicable to both peptidases and their inhibitors.
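Both search strategies in this protocol take FASTA input, and the batch BLAST (step 2b) rejects files with more than 5000 sequences or with any sequence longer than 10,000 residues. The following is a minimal Python sketch of a local check that could be run before submission. It is not part of MEROPS: the function names are hypothetical, the identifier is assumed to be the first word of each header line, and the requirement that every identifier in the file be unique is an assumption generalized from the unit's statement that segments of a divided sequence need unique identifiers.

```python
from typing import Iterator, List, Tuple

MAX_SEQUENCES = 5000     # file-size limit stated for the batch BLAST
MAX_RESIDUES = 10_000    # sequence-length limit stated for the batch BLAST


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Assumption: the identifier is the first word of the header line.
            words = line[1:].split()
            identifier = words[0] if words else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def check_library(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    seen = set()
    count = 0
    for identifier, sequence in read_fasta(text):
        count += 1
        if identifier in seen:
            problems.append(f"duplicate identifier: {identifier}")
        seen.add(identifier)
        if len(sequence) > MAX_RESIDUES:
            problems.append(
                f"{identifier}: {len(sequence)} residues (limit {MAX_RESIDUES})"
            )
    if count > MAX_SEQUENCES:
        problems.append(f"{count} sequences in file (limit {MAX_SEQUENCES})")
    return problems


if __name__ == "__main__":
    # Hypothetical two-sequence library used only to exercise the check.
    example = ">seq1 hypothetical protein\nMKTAYIAK\nQRQISFVK\n>seq2\nGSHMLEDP\n"
    print(check_library(example))  # prints []
```

A file that passes this check can still be rejected by the server for reasons the sketch does not cover, so the on-screen error message remains the authority.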


2b. Search MEROPS with a library of protein sequences: The MEROPS Web site can be used to analyze a genome for peptidase and peptidase inhibitor homologs. The recommendation is to analyze a proteome rather than a genomic DNA sequence (either is supported). Files will be submitted to the MEROPS batch BLAST (Rawlings and Morton, 2008) and must be prepared in advance. A file to be submitted must contain ≤5000 sequences, and no sequence should be longer than 10,000 residues. Each sequence should be in FASTA format (Pearson and Lipman, 1988). From the MEROPS homepage, click on batch BLAST from the left-hand side menu. Use the browse option to select the file containing the library, or type in the drive, folder, and filename for the library. In the textbox marked “email to:” enter your e-mail address.

If there are problems, such as too many sequences or a sequence that is too long, then an error message will appear in red on screen. For a sequence that is too long, the identifier will be presented. You must then remove this sequence from the FASTA file library (or alternatively, divide the sequence into segments, but each segment must have a unique identifier). It is unusual for a peptidase sequence to be longer than 5000 residues, unless it is part of a viral polyprotein, and most sequences longer than 5000 residues can be deleted. Results will be returned as a tab-delimited list by e-mail.

The MEROPS batch BLAST will compare each sequence in the file supplied against a library of selected peptidase sequences (merops_scan.lib) rather than all the sequences in the MEROPS collection, to speed up searching. The library contains the sequence of the peptidase unit from every holotype (i.e., a unique and biochemically characterized peptidase; see Commentary, below): it does not contain every peptidase sequence. By restricting the search to peptidase units, false positives resulting from matching domains other than the peptidase domain are prevented. However, because only holotype sequences are present in the library, the user should be aware that matches to a distant homolog in a family might not be detected. This method is applicable to both peptidases and their inhibitors.

Figure 1.25.1 The MEROPS menu.

Figure 1.25.2 Pairwise alignment from a MEROPS BLAST search. The sequences are: ubiquitin specific protease 1, from Pan troglodytes (MER061938), and the SINCAMP00000020284 protein, from Callorhinchus milii. Active site residues are shown as white text on a red background.

BASIC PROTOCOL 2

SEARCHING MEROPS BY NAME, IDENTIFIER, ACCESSION, OR PUBLICATION

Another common use of the MEROPS Web site is to retrieve information knowing one of the following: the name of a peptidase, inhibitor, family, or clan; a MEROPS identifier; a primary database accession for a peptidase or inhibitor sequence; a gene name or locus; the name of the source organism; or the PubMed identifier for a publication. Because inhibitor clans, families, and species are accessed from a different menu, a search strategy that differs from that for a peptidase is described in a separate paragraph in each of the following sections.

Necessary Resources

Hardware
Workstation with Internet connection

Software
Javascript-enabled browser (e.g., Mozilla Firefox, Google Chrome, or Microsoft Internet Explorer)

Search strategies

1a. Search MEROPS with a peptidase or inhibitor name: From the MEROPS homepage (Fig. 1.25.1), click on SEARCHES, and then “Search for a Peptidase or Inhibitor by name”. In the text box, type or paste the name, or part of a name, and then select “begins with”, “contains”, or “is” depending on whether you want results starting with the entered text, containing the text, or equaling the text. Then, click the “submit query” button. The search is performed against all names in MEROPS, and the results show the found name, the name recommended by the MEROPS curators, and the MEROPS identifier, which is a link to the relevant page. This method is applicable to both peptidases and their inhibitors.

Alternatively, click on “Name” from the left-hand, green menu, and the first page of an alphabetical listing of all peptidase names will appear. Clicking on a particular letter from across the top of the screen will load the page of names starting with that letter. Each row in the table shows the protein name, the name recommended by the MEROPS curators, and the MEROPS identifier, which is a link to the relevant peptidase summary.

For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Name”. The pages returned are similar to the peptidase name pages.


1b. Search MEROPS for small-molecule inhibitors: Click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Small-molecule inhibitor”. The first page of an alphabetical listing of all small-molecule inhibitor names will appear. Clicking on a particular letter from across the top of the screen will load the page of names starting with that letter. Each row in the table shows a small-molecule inhibitor name and the name recommended by the MEROPS curators, which is a link to the relevant small-molecule inhibitor summary. This feature is not available for peptidases.


1c. Search MEROPS by family name: Click on Family from the left-hand, green menu; an alphabetical listing of MEROPS identifiers for families will appear. The list is divided into sections corresponding to catalytic type; to move to a section lower down the page, click on an item in the list of catalytic types across the top of the page. Each row in the table shows the MEROPS identifier for the family, which is a link to the relevant family summary, the MEROPS identifiers of the subfamilies (if any) into which the family is divided, and the type enzyme for the family or subfamily.

For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Family”. An alphabetical listing of MEROPS identifiers for families will appear. The format of the table is the same as that for peptidases.

1d. Search MEROPS by clan name: Click on Clan from the left-hand, green menu; an alphabetical listing of MEROPS identifiers for clans will appear. The list is divided into sections corresponding to catalytic type; to move to a section lower down the page, click on an item in the list of catalytic types across the top of the page. Each row in the table shows the MEROPS identifier for the clan, which is a link to the relevant clan summary, the MEROPS identifiers of the families into which the clan is divided (each of which is a link to the relevant family page), the type enzyme for the clan, and whether a tertiary structure has been solved for any member of the clan.

For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Clan”. An alphabetical listing of MEROPS identifiers for clans will appear. The format of the table is the same as that for peptidases.

1e. Search MEROPS with a MEROPS identifier: From the MEROPS homepage (Fig. 1.25.1), click on SEARCHES, and then “Search for a Peptidase or Inhibitor by MEROPS Identifier”. In the text box, type or paste a MEROPS clan, family, peptidase, or inhibitor identifier. Then, click the “submit query” button. The MEROPS recommended name and a link to the relevant page are returned. This method is applicable to both peptidases and their inhibitors.

Alternatively, click on Identifier from the left-hand, green menu; the first page of an alphabetical listing of all peptidase identifiers will appear. There is a page for each catalytic type (aspartic, cysteine, glutamic, metallo, asparagine lyase, mixed, serine, threonine, or unknown, which start with the letters A, C, G, M, N, P, S, T, or U, respectively). Each row in the table lists the MEROPS identifier, which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators, and the numbers of human and mouse expressed sequence tags (ESTs) from all tissues.
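The letter-to-catalytic-type rule given for the identifier pages can be written as a small lookup. This is a minimal Python sketch, not MEROPS code: the function name is hypothetical, and applying the rule to the family A22 and the clan AD (both shown in the figures of this unit) as well as to peptidase identifiers is an assumption of the sketch.

```python
# First letter of an identifier -> catalytic type, as listed in the text.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic",
    "M": "metallo",
    "N": "asparagine lyase",
    "P": "mixed",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}


def catalytic_type(identifier: str) -> str:
    """Return the catalytic type implied by the first letter of an identifier."""
    letter = identifier.strip()[:1].upper()
    if letter not in CATALYTIC_TYPES:
        # Identifiers starting with any other letter are outside this table.
        raise ValueError(f"no catalytic type for identifier {identifier!r}")
    return CATALYTIC_TYPES[letter]


if __name__ == "__main__":
    for example in ("A22", "AD"):
        print(example, catalytic_type(example))
    # prints:
    # A22 aspartic
    # AD aspartic
```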

For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Identifier”. The page for simple peptidase inhibitors, with an alphabetical listing of all identifiers, will appear. To view the list of compound inhibitors, select the “Compound peptidase inhibitors” button at the top of the page.

1f. Search MEROPS with an accession from the UniProt, GenBank, or EMBL databases: From the MEROPS homepage (Fig. 1.25.1), click on SEARCHES and then “Search for a Peptidase or Inhibitor by Database Accession Number”. In the text box, type or paste an identifier or accession. Then click the “submit query” button. The results show the source database, the recommended protein name, the organism name, and the MEROPS identifier, which is a link to the relevant page. This method is applicable to both peptidases and their inhibitors.

1g. Search MEROPS with a gene name: From the MEROPS homepage (Fig. 1.25.1), click on SEARCHES, and then “Search for a Peptidase or Inhibitor Gene by gene name”. In the text box, type or paste the name, or part of a name, and then select “begins with”, “contains”, or “is”, depending on whether you want results starting with the entered text, containing the text, or equaling the text. Then, click the “submit query” button. The search is performed against all gene names in MEROPS, and the results show the source organism and the recommended protein name, which is a link to the relevant peptidase or inhibitor page. This method is applicable to both peptidases and their inhibitors.

Alternatively, click on “Gene name” from the left-hand, green menu; the first page of an alphabetical listing of all peptidase gene names will appear. Clicking on a particular letter from across the top of the screen will load the page of names starting with that letter. Each row in the table shows the gene name, the source organism, which is a link to the relevant organism page, and the MEROPS identifier, which is a link to the relevant peptidase summary. For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Gene name”. The pages returned are similar to those for peptidases.

1h. Search MEROPS by species: Click on Organism from the left-hand, green menu; the first page of an alphabetical listing of all organism names will appear. Clicking on a particular letter from across the top of the screen will load the page of names starting with that letter. Each row in the table shows the organism name, the scientific name of the organism, which is a link to the relevant organism page, and counts of the number of peptidases and non-peptidase homologs known for that organism. A row highlighted in peach represents an organism with a completely sequenced genome. For an inhibitor, click on the “Switch to inhibitors” button, and then from the left-hand, gray menu, select “Organism”. The pages returned have a similar format to the peptidase pages.

1i. Search MEROPS by PubMed identifier: Click on SEARCHES from the left-hand, green menu, select “Search for references in MEROPS by PubMed identifier”, then enter a PubMed identifier and click “Submit search”. The full reference is returned, as well as a list of peptidases or inhibitors and the relevant MEROPS identifiers to which the reference has been assigned. This method is useful if it is unclear to which peptidase or inhibitor the reference refers, for example, if an unusual name has been used.

Alternatively, click on SEARCHES from the left-hand, green menu, select “Search for references in MEROPS by author name or text”, and enter the text to be searched in the text box, for example an author’s name or a peptidase name. Then, click “Submit query”. All references containing the text are returned (in blocks of up to 100), and each is followed by a list of peptidases or inhibitors and the relevant MEROPS identifiers to which the reference has been assigned. This method is useful for books, book chapters, and publications not submitted to PubMed. Each reference is stored as a text string containing the author names, title, and journal name, and it is this string that is searched.
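The three match options offered by the name and gene-name searches (steps 1a and 1g) differ only in how the entered text is compared with each stored name. The following minimal Python sketch illustrates the distinction on names that appear in the figures of this unit. It is an illustration rather than the MEROPS implementation: the function names are hypothetical, and ignoring letter case is an assumption, because the unit does not say how case is handled.

```python
from typing import Iterable, List

MODES = ("begins with", "contains", "is")


def name_matches(name: str, query: str, mode: str) -> bool:
    """Compare one stored name with the entered text under the chosen mode."""
    # Assumption: matching ignores letter case.
    candidate, wanted = name.casefold(), query.casefold()
    if mode == "begins with":
        return candidate.startswith(wanted)
    if mode == "contains":
        return wanted in candidate
    if mode == "is":
        return candidate == wanted
    raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")


def search_names(names: Iterable[str], query: str, mode: str) -> List[str]:
    """Return the names that match, in their original order."""
    return [name for name in names if name_matches(name, query, mode)]


if __name__ == "__main__":
    names = ["presenilin 1", "ubiquitin specific protease 1", "semagacestat"]
    for mode in MODES:
        print(mode, search_names(names, "presenilin", mode))
    # prints:
    # begins with ['presenilin 1']
    # contains ['presenilin 1']
    # is []
```

The “contains” comparison is also the closest analogue of the free-text reference search, in which the entered text is looked for anywhere in the stored reference string.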

View strategies

View MEROPS annotation for a peptidase or protein inhibitor

2. Click on the MEROPS identifier from any MEROPS Web page to access the peptidase or inhibitor page, which allows viewing of the MEROPS annotation.

Fig. 1.25.3 shows a summary page for a peptidase. The page consists of a list of alternative names for the peptidase; a diagram to show the domain organization of the holotype protein; the MEROPS classification and details of the holotype; details of catalytic activity including physiological roles, pharmacological relevance, and the effects of gene knockout; two diagrams to show specificity (a logo and a heatmap); and details of human and mouse genetics. Finally, there is a comment to indicate relevant inhibitors, not only those that inhibit, but also those that might be expected to inhibit but do not.

The domain image shows the main protein chain in blue, the signal peptide as a small black box, the propeptide as a small brown box, the peptidase unit as a green box, and any other domain as a red box. Disulfide bridges are shown as open boxes. Positions of N-linked carbohydrates are shown as orange lollipops. Active site residues are shown as red lollipops and metal ligands as blue lollipops. The buttons across the top of the screen provide access to supplementary pages.

A similar summary page exists for an inhibitor, but without the details of activity and diagrams of specificity. The domain image is similar to that for the peptidase, except that the inhibitor unit is shown in gray.

View MEROPS annotation for a family

3. Click on the family identifier from any MEROPS Web page to access the family page, which allows viewing of the MEROPS family annotation.

Fig. 1.25.4 shows a summary page for a family that has been written by the MEROPS curators. It is divided into sections describing the types of peptidase activity found in the family, a brief history, descriptions of the active site, specificity, inhibitors, molecular structure, biological functions, pharmacological and biotechnological relevance, cross-references to other databases such as Pfam, InterPro (Hunter et al., 2011), CATH (Sillitoe et al., 2013), and SCOP (Andreeva and Murzin, 2010), and a list of peptidases in the family. If the family is divided into subfamilies, then there is a section for each subfamily. The summary for an inhibitor family has been written by the MEROPS curators and is divided into sections describing the peptidases inhibited, mechanism of inhibition, molecular structure, and biological functions. The row of buttons across the top of the page gives access to supplementary pages.

View MEROPS annotation for a clan

4. Click on the clan identifier from any MEROPS Web page to access the clan page, which allows viewing of the MEROPS clan annotation.

Fig. 1.25.5 shows a clan summary page that has been written by the MEROPS curators. It is divided into sections describing the peptidase activities present in the clan, the catalytic mechanism, activation mechanism, the protein fold, and evolution of the family. Cross-references are provided to the equivalent Pfam clan and SCOP superfamily. All the families in the clan are listed, and the family name is a link to the family summary page. The distribution of homologs among organisms at the kingdom taxonomic level is shown. The summary page for an inhibitor clan shows only the list of families and distribution. Buttons across the top of the page provide access to the supplementary pages.


Figure 1.25.3 A peptidase summary. The summary for presenilin 1 is shown. The row of buttons across the top of the page provides access to supplementary pages. The domain image shows the peptidase unit (in green), a transmembrane region (in black), and active site residues (as colored lollipops). Peptidase specificity is shown as a logo and as a heatmap. In the logo, the binding pockets are numbered 1 to 8, where 1 corresponds to S4 and 8 to S4′; the larger the letter, the more the specificity is directed towards that amino acid. In the heatmap, the number of times an amino acid occurs in the binding pockets S4 to S4′ from cleavages in all known substrates is shown; the lighter the shade of green, the greater the preference for the amino acid within that pocket. Cells highlighted in black indicate that the amino acid has not been observed to occupy that binding site for any of the known substrate cleavages.


Figure 1.25.4 A family summary. The summary for peptidase family A22 is shown. The three rows of buttons across the top of the page provide access to supplementary pages at the family and subfamily levels. There are separate sections for each of the two subfamilies into which the family is divided. Each section lists the peptidases contained within the subfamily, with links to the relevant peptidase summaries.


Figure 1.25.5 A clan summary. The summary for peptidase clan AD is shown. The row of buttons across the top of the page provides access to supplementary pages. The families section lists the families contained within the clan, and the organism distribution (at the kingdom level) is shown.

Figure 1.25.6 A summary page for a small-molecule inhibitor. The summary page for semagacestat, which is an inhibitor of presenilins 1 and 2, is shown.

View MEROPS annotation for a small-molecule inhibitor

5. Click on a small-molecule inhibitor name from any MEROPS Web page to access the small-molecule inhibitor page, which allows viewing of the MEROPS small-molecule inhibitor annotation.

Fig. 1.25.6 shows a summary page for a small-molecule inhibitor that has been written by the MEROPS curators. It is divided into sections describing alternative names of the inhibitor, a brief history, peptidases inhibited, mechanism of inhibition, properties, synthesis, and details of the inhibitor class to which it belongs. The structure and chemical name are also shown. Cross-references are provided to the PubChem (Li et al., 2010) and ChEBI (Hastings et al., 2013) databases. The buttons across the top of the page provide access to the supplementary pages described in the subsequent steps.

View primary database identifiers for a peptidase or protein inhibitor

6. From a peptidase or protein inhibitor summary, click on Sequences to see a list of all primary database cross-references for all the sequences included in the same MEROPS identifier.

Sequences are organized alphabetically by source organism and then by MERNUM. Cross-references relating to what is considered the same protein in MEROPS are given the same MERNUM. Cross-references are to UniProt (with separate links to the curated SwissProt section and uncurated TrEMBL sections), the NCBI non-redundant protein sequence database (“GenPept”), Pfam, EMBL nucleotide sequence, GenBank nucleotide sequence, GenBank coding sequence (“CDS”), and a comment where appropriate. The organism name is a link to the organism page in MEROPS, and the MERNUM is a link to the sequence page in MEROPS. Click “Align variants” to view a dynamically generated alignment of all variants of the sequence from the same organism.

View cross references to pharmacological databases for a peptidase

7. From a peptidase summary, click on Pharma to see cross-references to the pharmacological databases BindingDB (Liu et al., 2007) and ChEMBL (Gaulton et al., 2012). This page is not available for an inhibitor.

View literature for a peptidase or protein inhibitor

8. From a peptidase or protein inhibitor summary, click on Literature to see a bibliography for the peptidase or inhibitor.

References are ordered by date descending, and then first author. For each publication, cross-references are given to PubMed, Europe PubMed, PubMed Central (if available), and digital object identifier (“DOI”) to a full-text version on the publisher’s Web site (for which a subscription may be required before the text can be viewed). Any of the following letters may be attached to the publication to indicate contents: A (assay), S (structure), M (mutation), I (inhibitor), P (specificity), E (expression), or V (review). All the references assigned the same annotation can be selected by clicking on the relevant letter at the top of the screen.
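The ordering and the content letters described for the Literature page can be mirrored when working with an exported bibliography. The following is a minimal Python sketch under stated assumptions: the `Reference` class and the function names are hypothetical, the date is reduced to a publication year, first authors are taken to sort in ascending alphabetical order, and the three entries are placeholders rather than real publications.

```python
from dataclasses import dataclass
from typing import List

# Content letters and their meanings, as listed for the Literature page.
FLAGS = {
    "A": "assay",
    "S": "structure",
    "M": "mutation",
    "I": "inhibitor",
    "P": "specificity",
    "E": "expression",
    "V": "review",
}


@dataclass(frozen=True)
class Reference:
    first_author: str
    year: int        # assumption: the date is reduced to a publication year
    flags: str = ""  # e.g. "AS" marks an assay and a structure


def sort_references(refs: List[Reference]) -> List[Reference]:
    """Order by date descending, then by first author ascending."""
    return sorted(refs, key=lambda r: (-r.year, r.first_author.casefold()))


def with_flag(refs: List[Reference], flag: str) -> List[Reference]:
    """Select the references carrying one content letter."""
    if flag not in FLAGS:
        raise ValueError(f"unknown content letter {flag!r}")
    return [r for r in refs if flag in r.flags]


def describe(ref: Reference) -> List[str]:
    """Spell out the content letters attached to a reference."""
    return [FLAGS[letter] for letter in ref.flags]


if __name__ == "__main__":
    # Placeholder entries, not real publications.
    refs = [
        Reference("Author B", 2012, "AS"),
        Reference("Author A", 2012, "V"),
        Reference("Author C", 2013, "I"),
    ]
    for ref in sort_references(refs):
        print(ref.year, ref.first_author, describe(ref))
```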

View literature for a small-molecule inhibitor

9. From a small-molecule inhibitor summary, click on Literature to see a bibliography relevant to the small-molecule inhibitor. The format is identical to that for literature at the peptidase or inhibitor level.

View literature for a family

10. From a family summary, click on Literature to see a bibliography relevant to the family (or subfamily). The format is identical to that for literature at the peptidase level.

View literature for a clan

11. From a clan summary, click on Literature to see a bibliography relevant to the clan. The format is identical to that for literature at the peptidase or inhibitor level.

BASIC PROTOCOL 3

EXPLORING PEPTIDASE SPECIFICITY

A peptidase can be identified and distinguished from a homologous peptidase by its actions on substrates and its interactions with inhibitors. Several views are available to aid with the exploration of peptidase specificity. These features are not available for inhibitors.

Necessary Resources

Hardware
Workstation with Internet connection

Software
Javascript-enabled browser (e.g., Mozilla Firefox, Google Chrome, or Microsoft Internet Explorer)

Search strategies

1a. Search MEROPS by substrate name: Click on Substrate from the left-hand, green menu and then Substrate Name Index. An alphabetical listing of names for substrates will appear. Clicking on a particular letter from across the top of the screen will load the page of names starting with that letter. There is a row in the table for each peptidase known to cleave the substrate. Each row in the table shows the substrate name, the UniProt accession for the protein sequence, which is a link to the substrate page, the MEROPS identifier of one of the peptidases known to cleave this substrate, which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators for the peptidase, and the total number of cleavages in the substrate made by the peptidase.

1b. Search MEROPS by number of substrates: Click on Substrate from the left-hand, green menu, and then “Count of substrates per peptidase”. The first page of an alphabetical listing of all peptidase identifiers will appear. There is a page for each catalytic type. Each row in the table lists the MEROPS identifier, which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators for the peptidase, and counts of the total number of known substrates, substrates thought to be physiological, and substrates that are not physiological.

1c. Search MEROPS by peptidase specificity: Click on Substrate from the left-hand, green menu and then “Peptidase specificity by amino acid”. An alphabetical listing of all peptidase identifiers will appear. Each row in the table lists the MEROPS identifier, which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators for the peptidase, the total number of cleavages from all known substrates of the peptidase, and preferences in the substrate binding pockets S4 to S4′ calculated from residues occupying positions P4 to P4′ in the known substrate cleavages. Up to two amino acids are listed for each binding pocket. The brighter the green highlighting, the greater the preference for the amino acid listed.

Alternatively, click on Substrate from the left-hand, green menu and select an amino acid from the pull-down menu labeled “Peptidase specificity by selected amino acid”. An alphabetical listing of all peptidase identifiers will appear. Each

1.25.12 Supplement 48

Current Protocols in Bioinformatics

Figure 1.25.7 Peptidase specificity. Part of the peptidase specificity for amino acid groups is shown. The MEROPS identifier and recommended name for each peptidase is shown along with the total number of cleavages in our collection. Preferences for residues in positions P4 to P4 are shown; the lighter the shade of green, the greater the preference. Amino acids are grouped into aliphatic, aromatic, acidic, basic, small, and others.

row in the table lists the MEROPS identifier, which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators for the peptidase, the total number of cleavages from all known substrates of the peptidase, and preferences in the substrate binding pockets S4 to S4 calculated from residues occupying positions P4 to P4 in the known substrate cleavages. The number indicates the times the selected amino acid occupies that binding pocket from all substrate cleavages. The brighter the green highlighting, the greater the preference for the amino acid selected. Cells are colored black if the selected amino acid has not been observed in that binding pocket from any substrate. Alternatively, click on Substrate from the left-hand, green menu and select “Peptidase specificity by amino acid type”. An alphabetical listing of all peptidase identifiers will appear (see Fig. 1.25.7). Each row in the table lists the MEROPS identifier which is a link to the relevant peptidase summary, the name recommended by the MEROPS curators for the peptidase, the total number of cleavages from all known substrates of the peptidase, and preferences in the substrate binding pockets S4 to S4 calculated from residues occupying positions P4 to P4 in the known substrate cleavages. Amino acids are grouped into aliphatic (Ile, Leu, Val), aromatic (Phe, Trp, Tyr), acidic (Asp, Glu), basic (Arg, His, Lys), small (Ala, Cys, Gly, Ser), and other (Asn, Gln, Met, Pro, Thr), and the group name is shown where there is a preference for the group in a binding pocket. The brighter the green highlighting, the greater the preference for that amino acid group. Cells are colored gray where the binding pocket does not exist. 1d. Finding peptidase substrates from the cleavage site: From the left-hand side menu, select SEARCHES and then “What peptidase can cleave this bond?”. 
In the boxes labeled P4, P3, P2, P1, P1′, P2′, P3′, and P4′, select an item from the pull-down menu and then submit the search. Items in each pull-down menu include amino acids, groups of amino acids, and blocking and reporter groups for synthetic substrates. A portion of the results returned is shown in Fig. 1.25.8. In this example, the specificity of presenilin 1 was entered by selecting ALI (for an aliphatic residue) in P4, P3, and P1′ to P4′, and allowing any residue to occupy P2 and P1. The results


Figure 1.25.8 Search of the MEROPS database for which peptidase can cleave a selected bond. Part of the results from the search page “What peptidase can cleave this bond” is shown. The search entered was for aliphatic amino acids in P4, P3, P1′, P2′, P3′, and P4′, with any amino acid permitted in P2 and P1. Known cleavages that match this pattern are shown, with details of the cleavage site and the peptidase responsible.

show that the specificity is insufficient to distinguish cleavages by this peptidase from those of many other peptidases.

1e. Finding peptidases affected by an inhibitor: Click on SEARCHES from the left-hand, green menu, select “What peptidase is affected by this inhibitor?”, select an inhibitor from the pull-down menu, and click “Submit query”. Results can be ordered by MEROPS identifier or peptidase name. All peptidases known to be inhibited are returned. Each row in the table shows the MEROPS identifier, which is a link to the peptidase page, the recommended name, and a reference, which is a link to the reference page. This method works for both protein and small-molecule inhibitors.
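The cleavage-site search in step 1d can be thought of as matching an eight-position pattern (P4 to P4′) against known cleavage sites. The following is a minimal Python sketch of that idea, not the MEROPS implementation. It assumes each site is stored as an eight-letter string; only the code ALI comes from the protocol, while the other group codes, the function names, and the example sites are hypothetical.

```python
# Amino-acid groups as listed in the text. Only the code "ALI" appears in the
# protocol; the other three-letter codes are assumptions for illustration.
GROUPS = {
    "ALI": set("ILV"),   # aliphatic
    "ARO": set("FWY"),   # aromatic
    "ACI": set("DE"),    # acidic
    "BAS": set("RHK"),   # basic
    "SMA": set("ACGS"),  # small
}

def position_matches(residue, item):
    """Return True if a residue satisfies one pattern item.

    item is None (any residue), a group code, or a single-letter amino acid.
    """
    if item is None:
        return True
    if item in GROUPS:
        return residue in GROUPS[item]
    return residue == item

def site_matches(site, pattern):
    """site: eight residues for P4..P4'; pattern: eight items in the same order."""
    if len(site) != 8 or len(pattern) != 8:
        raise ValueError("expected eight positions, P4 to P4'")
    return all(position_matches(r, p) for r, p in zip(site, pattern))

# The query described in the text: ALI in P4, P3 and P1' to P4', any residue in P2, P1.
query = ["ALI", "ALI", None, None, "ALI", "ALI", "ALI", "ALI"]
# Made-up example windows, not real MEROPS records.
sites = ["VIATVIVI", "GGVVIATV", "LLKRILVL"]
hits = [s for s in sites if site_matches(s, query)]
print(hits)
```

A pattern this permissive matches any site with aliphatic residues in six positions, which illustrates why the text notes that such a specificity cannot single out one peptidase.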

View strategies

View peptidase specificity


2. Click on the MEROPS identifier from any MEROPS Web page to access the peptidase page, which allows viewing the specificity of the peptidase. Specificity is shown as a logo and as a heatmap when ten or more cleavage sites in substrates are known (see Fig. 1.25.3).
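The counts behind a specificity logo or heatmap can be illustrated by tallying which residue occupies each position from P4 to P4′ across the known cleavage sites. This is a minimal sketch under the assumption that sites are eight-letter strings; the function names and the example sites are hypothetical and the data are made up.

```python
from collections import Counter

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

def specificity_counts(sites):
    """Count how often each residue occupies each position P4..P4'."""
    counts = {pos: Counter() for pos in POSITIONS}
    for site in sites:
        for pos, residue in zip(POSITIONS, site):
            counts[pos][residue] += 1
    return counts

def preferred(counts, pos):
    """Most frequent residue at a position and its fraction of all sites."""
    total = sum(counts[pos].values())
    residue, n = counts[pos].most_common(1)[0]
    return residue, n / total

# Made-up sites with a basic residue in P1 (fourth letter).
sites = ["AGVKSLAT", "LLPRGSAA", "TTGKAVLE", "GSAKIVGG"]
counts = specificity_counts(sites)
print(preferred(counts, "P1"))

# The protocol states that the logo and heatmap need ten or more cleavage sites.
enough_for_display = len(sites) >= 10
print(enough_for_display)
```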


View substrates for a peptidase

3. From a peptidase summary, click on Substrates to see a list of substrates where the cleavage position(s) are known. By default, the list is ordered by substrate name and then residue number. Each row in the table shows the substrate name, the UniProt accession, the residue range of the fragment used in the experiment, a description of the cleavage site, whether the cleavage is physiological or non-physiological, how the cleavage site was identified, the residues corresponding to positions P4 to P4′ in the substrate sequence, a reference, and a cross-reference to the CutDB database (Igarashi et al., 2007). It is possible to re-order this list by clicking on the table headings. The UniProt accession is a link to the substrate page, and the reference is a link to the reference page. The symbols for the cleavage type are: D, pathological; N, non-physiological; P, physiological; and S, synthetic. The symbols for how the cleavage was identified are: CS, consensus sequence; MS, mass spectroscopy; MU, site-directed mutagenesis; and NT, N-terminal sequencing.

Alternatively, click on SEARCHES from the left-hand, green menu, select “What cleavages does this peptidase make?”, select a peptidase from the pull-down menu, and click “Submit query”. All known cleavages for the selected peptidase are returned. Each row in the table shows the substrate name, the UniProt accession, which is a link to the substrate page if the substrate is a protein, a description of the cleavage, and the residues occupying P4 to P4′ in the substrate.
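The P4 to P4′ column of the substrate table can be reproduced from a sequence and the residue number of P1. The sketch below assumes 1-based residue numbering and pads with "-" when the window runs past either end of the sequence; the padding convention, function name, and example sequence are assumptions for illustration. The two dictionaries simply restate the symbols given in the text.

```python
# Symbols as listed in the protocol text.
CLEAVAGE_TYPE = {"D": "pathological", "N": "non-physiological",
                 "P": "physiological", "S": "synthetic"}
EVIDENCE = {"CS": "consensus sequence", "MS": "mass spectroscopy",
            "MU": "site-directed mutagenesis", "NT": "N-terminal sequencing"}

def cleavage_window(sequence, p1, flank=4, pad="-"):
    """Return residues P4..P4' around the bond after residue p1 (1-based)."""
    if not 1 <= p1 < len(sequence):
        raise ValueError("P1 must be followed by at least one residue")
    residues = []
    for number in range(p1 - flank + 1, p1 + flank + 1):
        if 1 <= number <= len(sequence):
            residues.append(sequence[number - 1])
        else:
            residues.append(pad)  # assumed padding for positions off the end
    return "".join(residues)

seq = "MKTAYIAKQRQISFVK"  # made-up sequence
print(cleavage_window(seq, 8))  # P1 is residue 8
print(cleavage_window(seq, 2))  # window runs past the N-terminus
print(CLEAVAGE_TYPE["P"], EVIDENCE["NT"])
```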

View inhibitors for a peptidase or peptidase for an inhibitor

4. From a peptidase summary, click on Inhibitors to see a list of inhibitors that affect a peptidase. Each row in the table shows the MEROPS identifier (if this is a protein inhibitor), the inhibitor name, the inhibition constant (Ki) if known, a comment (“conditions”), and a reference. The MEROPS identifier is a link to the relevant protein inhibitor page, and if the inhibitor name is shown in blue, this is a link to the small-molecule inhibitor page. The reference is a link to the reference page. The table can be re-ordered by clicking on any of the column headings. From a protein inhibitor summary, click on Inhibits to see a list of peptidases affected by the inhibitor.

View peptidases inhibited by a small-molecule inhibitor

5. From a small-molecule inhibitor summary, click on Inhibits to see a list of peptidases affected by the inhibitor. The format of the page is identical to that for peptidases inhibited by a protein inhibitor (see above).

View MEROPS annotation for a substrate 6. From the substrate name index, click on the UniProt accession to see the annotation for a substrate. The display (see Fig. 1.25.9) shows the sequence in single-letter code (see APPENDIX 1A). If the bond character “-” is replaced by “†”, this indicates a cleavage;


Figure 1.25.9 Search of the MEROPS database for known cleavages of a substrate. The results from the search for known cleavages of a protein are shown. The user enters the UniProt accession of the protein in question, in this case the human amyloid A4 beta protein precursor. The amino acid sequence is shown in single-letter code. A bond that is cleaved is indicated by a dagger character, and a bond that is not cleaved is shown as a dash. All known cleavages are listed as a table ordered by the P1 residue number. Residue range indicates whether the cleavage occurred in the complete protein or a fragment. Cleavage type indicates if the cleavage is physiological, non-physiological, or pathological. The evidence column indicates how the cleavage site was determined experimentally.


Figure 1.25.10 Conservation of a cleavage site in a protein substrate. Part of the alignment for homologs of the human amyloid A4 beta protein precursor is shown. The MEROPS identifiers of the peptidases known to cleave the substrate are shown above the alignment, with arrows indicating whether the full length of the protein or a fragment is cleaved. A cleavage site is indicated by a dagger symbol. The substrate sequence is highlighted in green. Residues corresponding to P4 to P4′ around the cleavage site (cleavage at His622 by membrane-type matrix metallopeptidase-3, MEROPS ID M10.016) are highlighted to show conservation. A residue identical to that in the known substrate is highlighted in pink; a replacement is highlighted in orange if that amino acid is known to occupy that binding site from examination of all cleavages known for this peptidase, but is shown as white text on a black background if the residue does not occur at that binding site in any of the substrates. Secondary structure elements are shown below the alignment: a helix is highlighted in red and a strand in green.

by hovering the mouse cursor over this character, the peptidases that perform this cleavage are listed. For cleavages thought not to be physiologically relevant, the name of the peptidase is shown in italics. Beneath this display is a table of all known cleavages in this protein ordered by residue number. For each cleavage, the peptidase name, the residue range of the substrate fragment used in the experiment, whether the cleavage is physiologically relevant or not, a comment, the method by which the cleavage position was determined, and a reference are shown.
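The sequence display described in step 6 joins residues with a bond character that is a dash for an intact bond and a dagger for a cleaved one. A minimal sketch of that rendering, assuming cleavages are given as 1-based residue numbers of the P1 residue; the function name and the example input are hypothetical.

```python
def render_bonds(sequence, cleaved_p1):
    """Join residues with '-' for an intact bond and '†' for a cleaved one.

    cleaved_p1 holds 1-based residue numbers of the P1 residue of each cleavage.
    """
    cleaved = set(cleaved_p1)
    parts = []
    for number, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if number < len(sequence):  # no bond after the last residue
            parts.append("†" if number in cleaved else "-")
    return "".join(parts)

# Made-up peptide with cleavages after residues 2 and 5.
print(render_bonds("DAEFRH", [2, 5]))
```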

View conservation of a substrate cleavage site 7. From the substrate page, click on “Click here to display alignment and conservation of cleavage sites of this sequence with close homologues” to see an alignment of the known substrate with closely related homologs, i.e., those contained in the same


UniRef50 entry (Suzek et al., 2007). Sequences are aligned dynamically with MUSCLE (Edgar, 2004) and show conservation around each known cleavage site. The conservation is shown not only in terms of residues that are common to all homologs (shown as a pink background), but also in terms of peptidase specificity. Where a residue has changed from the known cleavage site but is known to occur in that binding pocket from other substrates of the peptidase, it is shown with an orange background. Where a residue has changed but is not known to occur in that binding pocket from any substrate of the peptidase, it is shown as white text on a black background. The underlying assumption is that physiologically relevant cleavages will be conserved, whereas those that are non-physiological or even pathological are unlikely to be conserved. The more residues highlighted in black, the less likely it is that the cleavage is physiological. Fig. 1.25.10 shows conservation around one of the known cleavage sites (at residue 622) in the amyloid beta A4 precursor by membrane-type metallopeptidase 3 (Ahmad et al., 2006). The presence of residues highlighted in black indicates that this is unlikely to be a physiological cleavage.

BASIC PROTOCOL 4

EXPLORING PEPTIDASE OR INHIBITOR SECONDARY AND TERTIARY STRUCTURE

The MEROPS Web site includes a number of views to explore secondary and tertiary structures of peptidases and protein inhibitors.

Necessary Resources

Hardware

Workstation with Internet connection

Software

JavaScript-enabled browser (e.g., Mozilla Firefox, Google Chrome, or Microsoft Internet Explorer)

View strategies

View tertiary structure for a peptidase or protein inhibitor


1. From a peptidase or protein inhibitor summary, click on Structure to see the structure page. One or more Richardson diagrams (Richardson, 1985) are shown at the top of the page on the left-hand side, each derived from a Protein Data Bank (PDB; Rose et al., 2013) entry. A helix is shown in red, a strand in green, and random coil and turns in cyan. Active site residues are shown as ball-and-stick. On the right-hand side, the same PDB entry is loaded into the Astex viewer (Hartshorn, 2002), allowing the image to rotate and the user to manipulate it. Click on “Show surface” to see a representation of the surface of the molecule. Beneath these images, the secondary structure elements of the molecule are shown plotted along the sequence. A helix is shown in red and a strand in green. The positions of active site residues and metal ligands, if appropriate, are shown. Beneath this is a table of PDB entries relating to the peptidase, ordered by organism and PDB identifier. Each row in the table lists a description of the structure (“comment”), the resolution in Angstroms, and links to displays and analyses of the structure from PDB, DSSP (Kabsch and Sander, 1983), RasMol (Sayle and Milner-White, 1995), JMol (Hanson, 2010), Spice (Prlić et al., 2005), and Proteopedia (Hodis et al., 2008). RasMol and JMol must be pre-installed.

View tertiary structure of a small-molecule inhibitor 2. From a small-molecule inhibitor summary, click on Structure to view the structure page.


The format of the page is similar to that for the structure pages for a peptidase or a protein inhibitor (see above).

View secondary structure for a family 3. From a family summary, click on Structure to see a graphical representation of secondary structure for selected holotypes in the family and its conservation. Secondary structure is mapped to the sequence, which is shown as a cyan line. A helix is shown as a red box and a strand as a green box. The lines and boxes are to scale. The positions of active site residues, and metal ligands if relevant, are shown. For each structure, the MEROPS identifier, recommended name, and source PDB identifier are shown. The PDB identifier is a link to the peptidase structure page.

View secondary structure for a clan 4. From a clan summary, click on Structure to see a graphical representation of secondary structure for selected holotypes in the clan and its conservation. The style of the display is the same as that at the family level.

BASIC PROTOCOL 5

EXPLORING EVOLUTIONARY RELATIONSHIPS

The MEROPS Web site includes a number of views to explore evolutionary relationships. Protein sequence alignments are available at all levels of the hierarchy. Phylogenetic trees derived from the distance matrices of the alignments at the peptidase, subfamily, and family levels are also presented. There are also views that show distribution of peptidases and inhibitors among organisms, common sequence features, and structures of genes.

NOTE: For all alignments, the sequence of the holotype or family type example is highlighted in green. Numbers across the top of the screen relate to residue numbers for the holotype or type example with letters to indicate insertions. Active site residues are shown in white on a red background. Metal ligands are shown in white on a blue background where appropriate. A disulfide bridge is indicated by forward and backward slashes, and the participating cysteines are given the same number.

Necessary Resources

Hardware

Workstation with Internet connection

Software

JavaScript-enabled browser (e.g., Mozilla Firefox, Google Chrome, or Microsoft Internet Explorer)

View strategies

View sequence features for a peptidase or protein inhibitor

1. From a peptidase summary, click on “Sequence features” to see a list of features from all sequences assigned to the same MEROPS identifier, showing the MERNUM, source organism, sequence length, extent of the peptidase or inhibitor unit, active site residue numbers (not for an inhibitor), metal ligand residue numbers if appropriate (not for an inhibitor), and primary sequence database cross-reference that is the source of the sequence in MEROPS. The MERNUM is a link to the sequence page in MEROPS, the organism name is a link to the organism page in MEROPS, and the sequence source is a link to the relevant primary sequence database entry. The list can be re-ordered by clicking on the species name.


View gene structure for a peptidase or protein inhibitor 2. From a peptidase or protein inhibitor summary, click on “Gene structure” to see a display of known gene structures for the peptidase or inhibitor. Structures are shown for a selection of eukaryotes: human, mouse, rat, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae. The gene-coding sequence is shown as a brown bar, and exons are shown as lilac boxes. Exons are numbered. The phase of each intron/exon junction is shown. Conservation of gene structures gives an indication of how recently genes have diverged. Isoforms are included, so it is possible to see which exons have been included or excluded from which isoform.

View expressed sequence tags for a peptidase 3. From a peptidase summary, click on Human EST or Mouse EST to see alignments of expressed sequence tags (ESTs) from all tissues of that species. There are two alignments—the first shows ESTs with greater than 95% identity to the peptidase sequence, and the second shows ESTs with less than 95% identity but which match the peptidase in question more closely than any other peptidase in the proteome. Residues outside the peptidase unit are highlighted in peach. The nucleotide accession for the EST is given at the left- and right-hand sides, and this is a link to the entry in GenBank. This page is not available for an inhibitor.
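The split of ESTs into the two alignments can be sketched as a simple binning rule. The example below assumes percent identities of each EST against several peptidases have already been computed; the accessions, peptidase labels, and function name are hypothetical, and how an EST with exactly 95% identity is treated is an assumption, since the text does not say.

```python
def bin_ests(identities, target, threshold=95.0):
    """Split ESTs into the two alignments described in the text.

    identities maps EST accession -> {peptidase identifier: percent identity}.
    Returns (high, low): ESTs above the threshold for the target, and ESTs at
    or below it whose best-matching peptidase is still the target.
    """
    high, low = [], []
    for est, scores in identities.items():
        if target not in scores:
            continue
        if scores[target] > threshold:
            high.append(est)
        elif max(scores, key=scores.get) == target:
            low.append(est)
    return high, low

# Made-up identities for three hypothetical ESTs against two peptidases.
identities = {
    "EST_A": {"pepX": 99.1, "pepY": 80.0},
    "EST_B": {"pepX": 88.0, "pepY": 70.2},
    "EST_C": {"pepX": 60.0, "pepY": 91.5},  # closer to pepY, so excluded
}
high, low = bin_ests(identities, "pepX")
print(high, low)
```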

View organism distribution for a peptidase or protein inhibitor 4. From a peptidase or protein inhibitor summary, click on Distribution to see a taxonomic tree of all organisms possessing a homolog in the same family as the peptidase in question. If a homolog from one of these organisms is assigned to the MEROPS identifier of the peptidase in question, then the organism name is shown in capitals and in blue. If the genome of the organism has been completely sequenced, this is indicated by the word COMPLETE next to the organism name. At the top of the screen, text indicates the highest taxonomic group to which all species belong that possess the peptidase.

View alignment for a peptidase or protein inhibitor 5. From a peptidase or protein inhibitor summary, click on Alignment to see a dynamically generated alignment of the peptidase units for all the sequences assigned to the same MEROPS identifier. The alignment is generated with MUSCLE (Edgar, 2004). The MERNUM of each sequence is shown at the left- and right-hand sides, and is a link to the sequence page.

View phylogenetic tree for a peptidase or protein inhibitor 6. From a peptidase or protein inhibitor summary, click on Tree to see a dynamically generated phylogenetic tree derived from the alignment.

View human or mouse sequences in a family


7. From a family summary, click on “H-seq” (human) or “M-seq” (mouse) to see all the sequences from that organism in the family. Each sequence is shown in FASTA format. The header line for each sequence contains the MERNUM, the recommended name, the MEROPS identifier, the extent of the peptidase or inhibitor unit, active site residue numbers, the source organism, and the database and accession of the source sequence. For the sequence, the peptidase unit is shown in red text; active site residues are shown in white text on a red background; metal ligands are shown in white text on a blue background; a cysteine known to form a disulfide bridge is shown with a


yellow background; an asparagine to which carbohydrate is attached is shown with a peach background; and a substrate-binding residue is shown with a gray background.

View distribution of a family 8. From a family summary, click on Genomes to see the taxonomic distribution of sequences in the family among organisms that have had their genomes completely sequenced. The name of an organism possessing a homolog is shown in blue text in capital letters. The number of paralogs present in each proteome is shown in brackets after the organism name. At the top of the page, there is a summary describing the distribution.

View domain images for holotypes in a family 9. From a family summary, click on Architecture to see architectures of all the holotypes in the family. Each domain image displayed is identical to that from the equivalent peptidase or inhibitor summary page.

View alignment for a family 10. From a family summary, click on Alignment to see an alignment of the peptidase or inhibitor units for all the sequences in the family. Each alignment is precalculated and generated with MAFFT (Katoh and Standley, 2013). Each sequence is numbered at the left- and right-hand sides, and the number is a link to the sequence page. By default, the alignment is shown with one sequence per line, but by clicking “Go to printer friendly view”, the alignment is shown in blocks of sixty residues. Clicking on “View key to alignment” opens another browser window, so that both the alignment and key can be seen simultaneously. Each row in the key shows the MEROPS identifier, which is a link to the relevant peptidase or inhibitor summary, the organism name, which is a link to the species page, the name of the peptidase recommended by the MEROPS curators, the MERNUM, which is a link to the sequence page, and the extent of the peptidase or inhibitor unit. The family type example is highlighted in green, and each peptidase or inhibitor holotype is highlighted in peach. An architecture string, which is derived from Pfam, shows the domain organization in the protein, and is shown once above a cluster of sequences with identical domain architectures. An alignment is shown for each family and for each subfamily within a family. Each alignment is filtered to exclude fragments, and, for a peptidase family, sequences where the active site residue, and metal ligands if relevant, are not conserved. The order of sequences in the alignment matches the order in the phylogenetic tree.
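The printer-friendly view breaks an alignment into blocks of sixty residues. A minimal sketch of that wrapping, assuming the alignment is held as a mapping from a label to a gapped sequence of equal length; the function name and example rows are hypothetical.

```python
def alignment_blocks(aligned, width=60):
    """Yield blocks of an alignment, each at most `width` columns wide.

    aligned maps a sequence label to its gapped sequence; all rows must have
    the same length.
    """
    lengths = {len(row) for row in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned rows must all have the same length")
    total = lengths.pop()
    for start in range(0, total, width):
        yield {label: row[start:start + width] for label, row in aligned.items()}

# Made-up 70-column alignment of two rows.
aligned = {
    "seq1": "ACDEFGHIKL" * 7,
    "seq2": "ACDEFGHIKL" * 6 + "-" * 10,
}
blocks = list(alignment_blocks(aligned))
for block in blocks:
    for label, row in block.items():
        print(label, row)
    print()
```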

View alignment of holotypes for a family 11. From a family summary, click on Holotypes to see an alignment of the peptidase units for all the holotype sequences in the family. The style of the alignment is the same as in the previous section, except that the MEROPS identifier is shown on the left- and right-hand side for each sequence, and this is a link to the peptidase summary. By hovering the mouse over the MERNUM, the recommended name and source organism are shown. This view is recommended for a family with many homologs, because an alignment of all sequences in such a family is much larger than the screen or page can accommodate.


View phylogenetic tree for a family 12. From a family summary, click on Tree to see a tree derived from the alignment of the peptidase or inhibitor units for all sequences in the family. Each tree is pre-calculated using the UPGMA algorithm implemented in QuickTree (Howe et al., 2002). The tree is displayed so that the longer branches are at the bottom of the page. An approximation of the average percentage identity between sequences is shown across the top of the tree. Click on a node to view an alignment of sequences derived from that node. Click “View key for tree” to open a new browser window and see the key for the tree. The key is identical to the alignment key described above. A tree is shown for each family and for each subfamily within a family, provided there are at least five homologs in the family or subfamily.
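To show what UPGMA does with a distance matrix, here is a small self-contained Python sketch. It is not QuickTree and makes no claim about how MEROPS computes its distances; the function name, the nested-tuple output format, and the example distances are assumptions for illustration.

```python
def upgma(names, dist):
    """Cluster with UPGMA; dist[i][j] is the distance between names[i] and names[j].

    Returns a nested tuple (left, right, height); leaves are plain names.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance to every other cluster is the size-weighted average.
        new_d = {}
        for c in clusters:
            dac = d[min(a, c), max(a, c)]
            dbc = d[min(b, c), max(b, c)]
            new_d[c] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, v in new_d.items():
            d[(c, next_id)] = v  # c is always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b, dab / 2), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Made-up distances: A and B are close, C is distant from both.
names = ["A", "B", "C"]
dist = [[0, 2, 6],
        [2, 0, 6],
        [6, 6, 0]]
tree = upgma(names, dist)
print(tree)
```

Because UPGMA assumes a constant rate of change, every leaf ends up at the same distance from the root, which is what allows a single identity scale to be drawn across the top of such a tree.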

View alignment for a peptidase clan 13. From a clan summary, click on Alignment to see conservation around the active site residues, and metal ligands if relevant, for all family type examples within the clan. Up to ten residues either side of each active site residue or metal ligand are shown. This view is not available for an inhibitor clan.
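The clan view shows up to ten residues either side of each active site residue or metal ligand. Extracting such windows from a sequence can be sketched as follows, assuming 1-based residue numbers; the function name and the example sequence and positions are hypothetical.

```python
def active_site_windows(sequence, positions, flank=10):
    """Return {position: subsequence} with up to `flank` residues either side.

    positions are 1-based residue numbers of active site residues or metal ligands.
    """
    windows = {}
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        start = max(0, pos - 1 - flank)         # clip at the N-terminus
        end = min(len(sequence), pos + flank)   # clip at the C-terminus
        windows[pos] = sequence[start:end]
    return windows

seq = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"  # made-up 30-residue sequence
windows = active_site_windows(seq, [3, 15])
for pos, window in windows.items():
    print(pos, window)
```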

Finding peptidases common to two organisms 14. Click on SEARCHES from the left-hand, green menu, select “What are the common peptidases between two genomes?”, then from the pull-down menu, select the names of two organisms. Then, select “MEROPS identifier” to see results at the peptidase or inhibitor level, or Family to see results at the family level. Then select Peptidase or Inhibitor to see results for peptidases or inhibitors. Click “Submit query”. A list of MEROPS identifiers or families is returned with the number of homologs for each organism. The MEROPS identifier or family name is a link to the relevant page in MEROPS. When results are displayed at the family level, the counts are subdivided into known/putative peptidases and non-peptidase homologs. This method can be used to identify peptidases or inhibitors found in one organism but not the other, or to find a peptidase or inhibitor common to both organisms, and may be useful when selecting a drug target.
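Once homolog counts for two organisms are in hand, the comparison reduces to set operations. The sketch below assumes each organism is represented as a mapping from a family name (or MEROPS identifier) to a homolog count; the function name and the counts are made up for illustration.

```python
def compare_genomes(counts_a, counts_b):
    """Compare homolog counts per family (or MEROPS identifier) in two organisms.

    Each argument maps an identifier to the number of homologs in that organism.
    """
    ids_a = {k for k, n in counts_a.items() if n > 0}
    ids_b = {k for k, n in counts_b.items() if n > 0}
    return {
        "common": sorted(ids_a & ids_b),
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
    }

# Made-up counts for a hypothetical host and pathogen.
host = {"C19": 50, "S08": 9, "M23": 0}
pathogen = {"S08": 3, "M23": 4, "C86": 1}
result = compare_genomes(host, pathogen)
print(result)
```

Entries under "only_b" in such a comparison correspond to the case the text highlights: a peptidase present in one organism but absent from the other.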

Finding peptidases common to strains of the same organism 15. Click on SEARCHES from the left-hand, green menu, select “What are the common peptidases in different strains of bacteria or archaea?”, then select an organism name from the pull-down menu, select Peptidase or Inhibitor, and click “Submit query”. Each row in the table returned shows: the MEROPS identifier, which is a link to the peptidase or inhibitor page, the recommended name of the peptidase or inhibitor, and for each strain of the organism, a count of the number of homologs. Click on Alignment to see a dynamically generated alignment of the sequence variants. This method is provided only for bacteria and archaea, and allows the user to identify a peptidase or inhibitor found in one strain but not another, or to compare sequences of the same peptidase or inhibitor from different strains.

GUIDELINES FOR UNDERSTANDING RESULTS

Batch BLAST

The example used here is the proteome of the elephant shark (Callorhinchus milii), the first cartilaginous fish to have its genome completely sequenced (Venkatesh et al., 2014), and of evolutionary interest because sharks existed before bony fish and other vertebrates. The proteome was downloaded from the University of Singapore Web site


(http://esharkgenome.imcb.a-star.edu.sg/download/) as a compressed file. This FASTA file was extracted and then split into six files, each of which contains fewer than 5000 sequences.

Table 1.25.1 shows a portion of results from the MEROPS batch BLAST. The columns show the submitted sequence identifier, the peptidase (or inhibitor) family to which the sequence belongs, the extent of the predicted peptidase unit, prediction of the active site residues, prediction of metal ligands (if the sequence is predicted to be a metallopeptidase), the identifier of the closest homolog found in the peptidase library, the extent of the peptidase unit of that homolog, and the E value of the match. For each active site residue and metal ligand, the detected amino acid (in single-letter code; see APPENDIX 1A) is shown, followed by the residue number. If a letter follows this number, it means that the amino acid found was not what would be expected, and the expected amino acid is shown (if more than one amino acid is possible at this position, then more than one letter is shown). It is unlikely that a protein in which the expected active site residues or metal ligands have been replaced functions as a peptidase, and such a protein is termed a non-peptidase homolog in the MEROPS database. It is also possible that a protein match in which most of the active site residues have been replaced is a false positive. The user should then check the E value. The lower the E value, the more significant the match. When filing the results of the MEROPS batch BLAST, we ignore any non-peptidase homolog where the E value is greater than 1e-10. Sequence SINCAMP00000020029 from Table 1.25.1 would not be filed because the metal ligands have been replaced and the E value is high. Sometimes the match detected corresponds to only a fragment of the holotype peptidase unit.
There are several reasons for this: BLASTP is unable to cope with large inserts in a sequence, and the missing fragment may have been found but is presented as another match; there was an error in the gene build and a frameshift has occurred, which means part of the sequence is missing; the wrong methionine has been assumed to initiate translation, and the protein is erroneously truncated at the N-terminal end; or the match is a false positive. The absence of an active site residue or a metal ligand is annotated in different ways. Sequence SINCAMP00000020284 is predicted to be a homolog in family C19, but it is a fragmentary match and two matches are returned. In the first, the active site residues are shown as D/N, and the right-pointing arrows show that the last two residues are missing, and the C-terminal portion of the peptidase unit is missing. If we look at the predicted extent of the peptidase unit from these two matches, 190-643 and 80-113, then it is clear that the complete peptidase unit is present, but that protein SINCAMP00000020284 has an insert (residues 114-189) which prevents BLASTP from matching the whole of the peptidase unit. This insert is known as a nested domain. The peptidase unit will be filed in the MEROPS database as “80-113, 190-643” to indicate a nested domain; the nested domain should not be included in the peptidase unit because it may match domains in other proteins that are not peptidases or peptidase homologs. The active site residues are therefore Asn95, Cys90, and His559, but a gap character (“-”) replaces the fourth member. This may be an error in the BLASTP alignment, or may indicate a frameshift at the C-terminus, or protein SINCAMP00000020284 may be a non-peptidase homolog. Only further examination using the single-sequence MEROPS BLAST will resolve this (see above). It should be remembered that some proteins will contain more than one peptidase or inhibitor unit, so merging of BLASTP hits should be done with caution. 
Peptidase inhibitors often comprise more than one inhibitor unit, with no active site residues or metal ligands, so again, merging hits should be done cautiously.
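The reading of a batch BLAST row described above can be sketched in Python. This is an illustration only: the function names are hypothetical, the regular expression covers just the simple annotation form (detected residue, number, optional expected residues) and deliberately leaves fragment notations such as "N/D" unparsed, and the filing rule is the 1e-10 cutoff for non-peptidase homologs stated in the text.

```python
import re

# Detected residue (or a gap shown as "-" or the minus sign), residue number,
# then zero or more expected residues.
TOKEN = re.compile(r"^([A-Z\-\u2212])(\d+)([A-Z]*)$")

def parse_residue(token):
    """Parse e.g. 'K776H' -> ('K', 776, 'H') and 'N2329' -> ('N', 2329, '')."""
    m = TOKEN.match(token.strip())
    if m is None:
        return None  # fragment notation such as 'N/D' is not handled here
    found, number, expected = m.groups()
    return found, int(number), expected

def is_non_peptidase_homolog(tokens):
    """True if any parsed residue differs from what was expected."""
    parsed = [parse_residue(t) for t in tokens]
    return any(p is not None and p[2] != "" for p in parsed)

def would_file(tokens, e_value, cutoff=1e-10):
    """Skip non-peptidase homologs whose E value is greater than the cutoff."""
    return not (is_non_peptidase_homolog(tokens) and e_value > cutoff)

def merge_units(ranges):
    """Format the fragments of one peptidase unit in sequence order."""
    return ", ".join(f"{start}-{end}" for start, end in sorted(ranges))

# Rows taken from Table 1.25.1.
print(would_file(["K776H", "Q683H", "K687D", "E778H"], 1.20e-06))
print(would_file(["N2329", "C2334", "H2594", "N2617"], 5.20e-218))
# The two matches from the nested-domain example in the text.
print(merge_units([(190, 643), (80, 113)]))
```

As the text cautions, merging fragments automatically is risky when a protein carries more than one peptidase or inhibitor unit, so a helper like merge_units should only be applied after the matches have been inspected.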


Table 1.25.1 Results from the MEROPS Batch BLAST^a

Query identifier    Family  Query peptidase unit  Active site residues        Metal ligands        Hit identifier  Hit peptidase unit  E value
SINCAMP00000020029  M23B    650-858               K776H                       Q683H, K687D, E778H  MER083041       260-474             1.20e-06
SINCAMP00000020031  M23B    95-344                -203H                       -120H, K122D, -203H  MER145504       282-515             2.90e-06
SINCAMP00000020035  S08A    7-454                 D, >S                                            MER066474       412-459             5.40e-05
SINCAMP00000020145  C19     2298-2679             N2329, C2334, H2594, N2617                       MER014780       1716-2098           5.20e-218
SINCAMP00000020157  C19     1386-1778             N1428, C1433, H1693, N1716                       MER014780       1706-2098           9.40e-227
SINCAMP00000020184  C19     282-457               N/D                                              MER049717       850-1011            2.10e-06
SINCAMP00000020215  C86     23-181                C33, H138                                        MER125334       26-182              1.90e-61
SINCAMP00000020220  C86     58-136
