Gene, 98 (1991) 153-159
Elsevier
GENE 03870
Multiple sequence alignment of protein families showing low sequence homology: a methodological approach using database pattern-matching discriminators for G-protein-linked receptors
(Residue conservation; protein model building; colour multiple alignment; membrane proteins; screen-oriented multiple alignment procedure; SOMAP)

Teresa K. Attwood, Elias E. Eliopoulos and John B.C. Findlay

Received by J.K.C. Knowles: 7 March 1990; Revised: 10 July 1990; Accepted: 20 September 1990
SUMMARY
A multiple alignment has been constructed, containing 37 sequences from related families of membrane-bound receptors believed to share the same structural framework as rhodopsin. Sequence homology within families was high (occasionally > 90%), but homology between them was generally low (20% or less). Database pattern-scanning methods were therefore used to construct a set of discriminators to aid both the task of alignment and the identification of distantly related sequences showing similar rhodopsin-like transmembrane helices. The results indicate that these discriminators are uniquely able to identify each of the transmembrane helices without major cross-reaction with similar regions in unrelated integral membrane proteins. This ability engenders more accurate alignments of the sequences and facilitates structural analysis and model building of the receptors.
INTRODUCTION
If the present rate of discovery is maintained over the next few years, it seems likely that membrane-bound receptors linked to G-proteins (Birnbaumer, 1990) will become recognised as the most widespread and diverse family of signal transduction mediators. Their activating ligands vary widely in structure and character, yet the proteins appear faithfully to have conserved their basic structural framework. Extensive protein chemistry with rhodopsin indicates that this framework essentially consists of seven transmembrane helices. Together, these provide the ligand-binding site and, at the least, the mechanism for initiating activation. In a cascade-like phenomenon, the activated receptors stimulate the activity of specific G-proteins, which in turn influence a variety of effector proteins (Birnbaumer, 1990; Casey and Gilman, 1988).

Correspondence to: Dr. J.B.C. Findlay, Department of Biochemistry, University of Leeds, Leeds LS2 9JT (U.K.) Tel. (44-532)333140; Fax (44-532)333167.

Abbreviations: aa, amino acid(s); G-proteins, guanine nucleotide-binding proteins; LUPES, Leeds University Protein Engineering Software; SOMAP, Screen-Oriented Multiple Alignment Procedure.
There is only a limited amount of information available on the 3D structure of these receptors. What exists, however, is consistent with a helix packing arrangement similar to that of bacteriorhodopsin (Sardet et al., 1976; Findlay et al., 1988). On the basis of the labelling of opsin with hydrophilic and hydrophobic probes, it is therefore possible to construct a representation of the receptor for use in protein engineering studies. Critical examination of such a model, and its transmogrification into receptors of different specificities, relies on accurate pattern searches and sequence alignments.

For the purposes of this study, we wished not merely to generate a multiple alignment of the known rhodopsin-like membrane-bound receptors. We also wanted to adopt a methodology that would be both rigorous and readily applicable to the identification, and subsequent alignment, of the more distantly related sequences. Particular attention was paid to three principal sub-families: the opsins, the β-adrenergic receptors and the muscarinic receptors. Overall, sequence homology between families was low (12-20%). Certain regional homologies, corresponding to the seven putative transmembrane helices, were higher (up to 30%) (Kerlavage et al., 1986; Dohlman et al., 1987). The use of discriminators therefore suggested itself as a viable approach to their alignment.

A discriminator is a pattern-recognition matrix that attempts to distinguish a motif, defined from a set of aligned sequences (the motif-set), from all other sequences. The simplest discriminator is a numerically weighted matrix based on residue frequencies at each position of the motif-set, i.e., a frequency matrix. The scoring potential of such matrices depends on the number of sequences in the motif-set, so it can be very poor where only a limited number of sequences is available. More powerful matrices may be designed, however, by the manual insertion of increased positive and negative weights. Alternatively, they may be generated semi- or fully automatically using matrix creation programs (Parry-Smith and Attwood, 1990; Akrigg et al., 1988). For particularly far-reaching analyses, where the performance of a discriminator should be quantified, its 'diagnostic power' can be measured objectively by reference to the results of true positive database searches.

The value of protein sequence databases lies in the ability to detect homologies, and in the information such homologies may carry, by inference, about structure and function. LUPES (Akrigg et al., 1988) is a publicly available software package that permits database exploration and includes programs for pattern matching and sequence alignment. A more accessible alternative alignment system is provided by SOMAP (Parry-Smith and Attwood, 1990). This procedure is designed to assist in the creation of multiple protein sequence alignments by means of a full-screen interface. It expands on the basic facilities of simple gap insertion and deletion offered in the LUPES package. Amongst other functions, it provides strict and 'fuzzy' pattern-matching and the calculation of frequency and PAM-weighted matrices (Dayhoff, 1978).

Using these facilities, we wished to develop pattern-recognition matrices that would aid multiple sequence alignment. The basis of this approach was the construction of a set of seven discriminators from a series of motifs of approx. 20 residues. Each discriminator would have a single unique, but nevertheless generally applicable, solution: i.e., each should be able to distinguish a sequence with the same specific rhodopsin-like features or periodicities (Findlay and Pappin, 1986; Findlay et al., 1988), while only preserving similar overall residue composition. We thus hoped to be able to identify each of the rhodopsin-like transmembrane motifs, both within the major sub-families and within the more distantly related sequences. This would facilitate the alignment of the entire super-family of sequences and, ultimately, aid model-building and analysis of any particular member of the family.

The approach used here differs from previous and recent methods for pattern-searching (Taylor, 1986; Lathrop et al., 1987; Gribskov et al., 1987; Smith et al., 1990; Barton and Sternberg, 1990) in three respects. It does not rely on 3D-structure alignments, it does not include explicit definitions of secondary structure positions, and it does not introduce gap penalties. Instead, the method relies on building up a character profile for each position in the discriminators, independently of any structural interpretation. It is used in a qualitative manner to aid multiple sequence alignment.
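The simplest discriminator described above, a frequency matrix built from a motif-set, can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the LUPES or SOMAP implementation:

- the function names `frequency_matrix` and `scan` are hypothetical;
- motifs are gap-free strings of equal length;
- a window's score is simply the sum of raw residue counts at each position, with no gaps allowed during the scan.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(motif_set):
    """Build a position-by-residue count matrix from aligned, equal-length motifs."""
    width = len(motif_set[0])
    if any(len(motif) != width for motif in motif_set):
        raise ValueError("all motifs in the motif-set must have the same length")
    matrix = []
    for position in range(width):
        counts = Counter(motif[position] for motif in motif_set)
        # Residues never observed at this position score zero.
        matrix.append({aa: counts.get(aa, 0) for aa in AMINO_ACIDS})
    return matrix


def scan(sequence, matrix):
    """Slide the matrix along the sequence without gaps; return (start, score) pairs."""
    width = len(matrix)
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i].get(residue, 0) for i, residue in enumerate(window))
        results.append((start, score))
    return results


if __name__ == "__main__":
    # Toy motif-set, invented for illustration only.
    matrix = frequency_matrix(["LLAF", "LIAF", "VLAW"])
    hits = scan("KKLLAFKK", matrix)
    print(max(hits, key=lambda hit: hit[1]))  # best window: (2, 9)
```

Because the scores are raw counts, their range grows with the size of the motif-set. This is one way to see why a frequency matrix built from only a few sequences has poor scoring potential.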
RESULTS AND DISCUSSION

(a) Construction of database discriminators
The starting point for discriminator design was multiple alignment within the principal sub-families chosen for investigation, namely the opsins, the adrenergics and the muscarinics. Sequence homology within these families was generally high, so generating meaningful multiple alignments using SOMAP was straightforward. Only small numbers of gaps had to be inserted or deleted manually, in appropriate positions. We used the alignment of the opsin sequences, together with data from hydropathy profiles, to select the seven hydrophobic domains that had been given transmembrane assignments (Findlay and Pappin, 1986). SOMAP was then used to generate frequency matrices for each of these regions. LUPES pattern-scanning software (Akrigg et al., 1988) was used to scan an adrenergic sequence with the matrices. In spite of the increased homology in the transmembrane regions, however, the matrices could not identify seven unambiguous solutions (Fig. 1a).
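The text does not define "seven unambiguous solutions" numerically. One hypothetical way to make the idea operational, assumed here purely for illustration, is as follows:

- for each helix matrix, take the top-scoring window;
- require that window to beat the best non-overlapping rival by a chosen margin;
- require the hits to fall in helix order along the sequence.

The names `min_margin`, `best_hit` and `assign_helices` are our own. The matrices are plain `{residue: weight}` dictionaries per position, so either frequency or hand-weighted matrices can be used.

```python
def scan(sequence, matrix):
    """Ungapped sliding-window scores; matrix is a list of {residue: weight} dicts."""
    width = len(matrix)
    return [
        (start, sum(matrix[i].get(res, 0) for i, res in enumerate(sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(scores, width, min_margin):
    """Top window, plus a flag saying whether it clearly beats every non-overlapping rival."""
    if not scores:
        raise ValueError("sequence is shorter than the matrix")
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)  # stable for ties
    top_start, top_score = ranked[0]
    rivals = [score for start, score in ranked[1:] if abs(start - top_start) >= width]
    runner_up = rivals[0] if rivals else float("-inf")
    return top_start, top_score, top_score - runner_up >= min_margin


def assign_helices(sequence, matrices, min_margin):
    """Per-matrix hits, and whether the whole set is unambiguous and in helix order."""
    hits = [best_hit(scan(sequence, m), len(m), min_margin) for m in matrices]
    starts = [start for start, _, _ in hits]
    in_order = all(a < b for a, b in zip(starts, starts[1:]))
    return hits, in_order and all(clear for _, _, clear in hits)


if __name__ == "__main__":
    # Two toy three-residue "helix" matrices, invented for illustration.
    aromatic = [{"W": 2, "F": 2}] * 3
    basic = [{"K": 2, "R": 2}] * 3
    print(assign_helices("AAAWWWAAAKKKAAA", [aromatic, basic], min_margin=2))
```

A sequence containing two equally good aromatic patches, such as `"WWWAAAWWW"`, fails the margin test. That is the kind of ambiguity reported for the first set of frequency matrices.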
[Fig. 1. Discriminator scan results for transmembrane helices I-VII (panels a-h); figure not reproduced.]
…tion by position, using the requisite part of the opsin alignment. We attempted to interpret the predominant character of each position (viz., polar-neutral, positively charged, negatively charged, aromatic or nonpolar), and assigned positive, negative or zero weights accordingly. For example, if a position in an alignment always held an aromatic residue, the scores of aromatics were given positive boosts, hydrophilics were negatively weighted and larger nonpolars were left with zero score. Similarly, if a position tended always to hold a neutral hydrophilic residue, other hydrophilics of that type were positively weighted, charged hydrophilics were left with zero score and all hydrophobics received negative weights. All positions in the matrix were treated in the same way.

It was essential to ensure that the choice of weights and of residue groupings had not been totally inappropriate; in short, that the sequence information contained within the discriminators was still rhodopsin-like. A representative rhodopsin sequence was therefore scanned with the matrices, and the result showed seven discrete solutions (Fig. 1b). This indicated that the matrices still discriminated well for this particular sequence and hence, by inference, for the whole opsin family. An adrenergic sequence was then scanned for the corresponding set of transmembrane motifs, and the result revealed seven unambiguous hits (Fig. 1c). Concomitant database scans (Akrigg et al., 1988) with each discriminator yielded the seven precise matches, not just for this sequence but for the entire family of adrenergics. This permitted alignment with the opsins using the pattern-matching facilities within SOMAP.

Each discriminator was then analysed, and subsequently modified, in the light of the additional information provided by the adrenergic sequences. In several cases this widened the range of residues allowed at given positions. The new, more general set of matrices was used to scan the database for the muscarinic transmembrane motifs. Again, seven hits were clearly apparent (Fig. 1d). This allowed the muscarinic sequences to be incorporated into the opsin/adrenergic alignment, and further sequence information to be added to the discriminators. The database was then searched for the more weakly homologous receptors. Several further sequences, including those of the substance-K receptor and the mas oncogene, were thereby added to the alignment, yielding a super-family of 37 sequences.

This iterative process has generated a potent set of discriminators for transmembrane helices with specific, rhodopsin-like features. However, because these regions are generally hydrophobic, the matrices could also crudely identify 'similar' transmembrane helices that are not rhodopsin-like. This point is illustrated in Fig. 1, f and g. There, several transmembrane helices appear to have been identified, but according to the multiple alignment criteria they are not sufficiently well-defined to be described as rhodopsin-like. Fig. 1h shows an example of a haemoglobin sequence scanned with these discriminators. It reinforces the point that the discriminators do not pick out non-transmembrane helices.

TABLE I
Identity and character conservation in the superfamily of 37 G-protein-linked receptors (a)
Columns: Identity; Identity-1 (b); Character; Character-1 (b). Rows: helices I-VII and loops.
[Table entries (three-letter aa code, ovine rhodopsin numbering) are not legible in this copy.]
(a) This table shows the 14 residues that are absolutely conserved in an alignment of 37 G-protein-linked receptor sequences, part of which is illustrated in Fig. 2. In addition, five positions are indicated where the residue has been conserved in all but one of the sequences. The table also shows 17 positions where residue character (according to the criteria set out in Fig. 2) has been conserved in all of the sequences, and four positions where character has been conserved in all but one of them. The three-letter aa code is used. Numbering is as for ovine rhodopsin.
(b) Minus one (-1) indicates conservation in all sequences except one.

Fig. 2. Part of the alignment of 37 G-protein-linked receptors showing the first three transmembrane segments, generated using SOMAP. The symbols in the background of the figure are as follows:
- an asterisk denotes identities in adjacent sequences;
- a dot indicates similarities in adjacent sequences;
- an exclamation mark shows identities or similarities in sequences further down the stack;
- a dash indicates insertions within sequences.
Protein identification codes are shown in the left margin. Colours are assigned according to the following criteria: (i) polar positive, blue; (ii) polar negative, red; (iii) polar neutral, green; (iv) non-polar, grey; (v) non-polar aromatic, purple; (vi) Pro or Gly, brown; (vii) Cys, yellow. Completely conserved residues or residue groups stand out strikingly. Conserved hydrophobic (grey) and hydrophilic (blue/green) patches, such as those in the transmembrane and loop regions respectively, are readily apparent.
Sequences included in the alignment are as follows:
- Opsins: OPSD$HUM, rhodopsin, human; OOBO, rhodopsin, bovine; OOSH, rhodopsin, sheep; OOHUB, blue-sensitive opsin, human; OOHUG, green-sensitive opsin, human; OOHUR, red-sensitive opsin, human; OOFF, rhodopsin, fruit fly; OOFF2, opsin 2, fruit fly; OPS3$DRO, opsin RH3, fruit fly; OPS4$DRO, opsin RH4, fruit fly; S00610, rhodopsin, Doflein's octopus.
- 5-hydroxytryptamine and dopamine receptors: 5HTA$HUM, 5-hydroxytryptamine 1A receptor, human; 5HTC$RAT, 5-hydroxytryptamine 1C receptor, rat; D2$RAT, dopamine D2 receptor, rat.
- α-adrenergic receptors: PLA2$HUM, platelet α-2 receptor, human; A2AR$HUM, α-2-adrenergic receptor, human; A2BAR$HU, α-2B-adrenergic receptor, human; B1AR$HAM, α-1 receptor, hamster.
- β-adrenergic receptors: B1AR$HUM, β-1-adrenergic receptor, human; B1AR$MEL, β-adrenergic receptor, turkey; B2AR$HUM, β-2-adrenergic receptor, human; HUMBARR, β-adrenergic receptor, human; RATADBC, β-adrenergic receptor, rat; QRHYB2, β-2-adrenergic receptor, hamster.
- Muscarinic acetylcholine receptors: ACM1$RAT, M1, rat; ACM1$PIG, M1, pig; ACM2$PIG, M2, pig; ACM2$HUM, M2, human; ACM3$RAT, M3, rat; ACM4$HUM, M4, human; ACM4$RAT, M4, rat; A29476, M4, rat; ACM5$HUM, M5, human; ACM5$RAT, M5, rat; RATCHRM, muscarinic cholinergic receptor, rat.
- Others: SKR$BOVI, substance-K receptor (skr), bovine; TVHUAS, transforming protein (mas), human.

(b) Analysis of the alignment
A representation of the final alignment was made in colour (Fig. 2). This allowed rapid visual inspection of the sequences. In monochrome, inspecting an alignment of this size would have been extremely difficult, since significant features or residue patterns were not obvious. The use of colour is not an artistic whim; it is integral to the approach used here. It provides an extremely powerful, flexible method for rapid interpretation of multiple alignments. With an appropriate choice of colour, any user-defined property can be illustrated in an immediately informative way, no matter how large the alignment. For the purposes of this analysis, the alignment was coloured according to residue character using the criteria listed in Fig. 2.
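The character-based weighting described in (a) and the colour scheme of Fig. 2 both rest on the same residue classes. The sketch below is a simplified, hypothetical rendering, and several parts of it are our own assumptions:

- the assignment of individual amino acids to the seven Fig. 2 classes (e.g. His treated as polar positive);
- the `boost` and `penalty` values;
- the general rule: same class scores positive, same broad polarity scores zero, opposite polarity scores negative, and Pro/Gly and Cys score zero unless they are the predominant class.

This rule generalizes the two worked cases given in the text. It treats all nonpolars alike, whereas the text zero-weights only the larger ones.

```python
from collections import Counter

# Assumed residue-to-class mapping for the seven Fig. 2 colour classes.
CHARACTER = {
    **dict.fromkeys("KRH", "polar positive"),
    **dict.fromkeys("DE", "polar negative"),
    **dict.fromkeys("STNQ", "polar neutral"),
    **dict.fromkeys("AVLIM", "non-polar"),
    **dict.fromkeys("FWY", "non-polar aromatic"),
    **dict.fromkeys("PG", "Pro or Gly"),
    "C": "Cys",
}

COLOUR = {
    "polar positive": "blue",
    "polar negative": "red",
    "polar neutral": "green",
    "non-polar": "grey",
    "non-polar aromatic": "purple",
    "Pro or Gly": "brown",
    "Cys": "yellow",
}

# Broad polarity used to choose between zero and negative weights (assumption).
POLARITY = {
    "polar positive": "hydrophilic",
    "polar negative": "hydrophilic",
    "polar neutral": "hydrophilic",
    "non-polar": "hydrophobic",
    "non-polar aromatic": "hydrophobic",
    "Pro or Gly": None,
    "Cys": None,
}


def colour(residue):
    """Fig. 2 colour for a single residue (one-letter code)."""
    return COLOUR[CHARACTER[residue]]


def character_weights(column, boost=2, penalty=-2):
    """Weights for one discriminator position from the residues observed there."""
    predominant = Counter(CHARACTER[aa] for aa in column).most_common(1)[0][0]
    target = POLARITY[predominant]
    weights = {}
    for aa, cls in CHARACTER.items():
        if cls == predominant:
            weights[aa] = boost  # same character: positive boost
        elif target and POLARITY[cls] and POLARITY[cls] != target:
            weights[aa] = penalty  # opposite polarity: negative weight
        else:
            weights[aa] = 0  # related class, Pro/Gly or Cys: left at zero
    return weights


if __name__ == "__main__":
    aromatic_column = character_weights("FWYF")
    print(aromatic_column["W"], aromatic_column["K"], aromatic_column["L"])
```

Applying `character_weights` to every column of a motif alignment gives a weighted matrix of the same shape as a frequency matrix, usable by any ungapped sliding-window scan.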
The first and most striking observation was the apparent hydrophilicity of certain transmembrane regions, especially in some of the muscarinic receptors. Helical wheels were used to examine these regions. With them, the disposition of virtually all of these hydrophilic residues could be reconciled with the standard 3,4 alternating pattern characteristic of amphipathic helices. In specific cases (particularly helix VII), however, this was only true if the distortions from regular α-helicity predicted by the rhodopsin model (Findlay and Pappin, 1986) were taken into account. This tends to reinforce the validity of at least some aspects of the model.

It was then possible to examine the alignment in detail for conserved features of diagnostic importance. Such conserved features and identities were noted for each of the principal sub-families. Differences between the sub-families may point to areas of functional significance, whereas residues conserved in all three families give strong indications of critical features of structural importance. The model for ovine rhodopsin was constructed when this family of receptors was not even suspected. As a means of refining, or correcting, that model, such information should prove extremely valuable. This analysis revealed 14 complete identities, five identities except at one position, 17 locations with completely conserved character, and four with conserved character except at one position (Table I). Strictly conserved features included two Cys (believed to form a disulphide bridge in ovine rhodopsin), one Gly, two Asn, an Arg in helix III, a Glu in helix VI and a Leu in helix I. Of particular interest was the occurrence of paired Pro and of aromatic residues in helices IV, V, VI and VII.

(c) Comparison of sequence information with structural information
The full value of this analysis can only be realised by superimposing the information it yields onto the model for ovine rhodopsin. It is then possible to establish the relative positions in space of each of the conserved and/or semi-conserved residues, and to begin making value judgements about the model itself. The picture that emerges is very revealing and indicates a distinct pattern of residue conservation. Virtually all conserved features appear in the transmembrane and cytoplasmic regions of the molecule, i.e., not at the extracellular face. From this we infer that conservation in the transmembrane regions will have a predominantly structural role. It may maintain the integrity of helix packing, or contribute to the fairly rigid framework that defines the binding pocket.

We would mention, in this context, two groups of conserved residues. The first is the four prolines that appear to surround the putative ligand-binding site. The second is the three aromatic residues on helix VI, which present, in effect, an aromatic back-drop behind the chromophore. These aromatics may seal a potential gap created by a possible distortion of helix VII, associated with the polar triplet adjacent to the chromophore attachment site (Lys296). Conversely, the disparity between the conserved residues at the cytoplasmic and extracellular faces carries a functional inference. It is consistent with a common interaction, at the cytoplasmic end, with G-proteins. The acidic-Arg-aromatic triplet (aa residues 134-136) at the top of helix III should prove to be a particularly strong candidate for such interactions. Other residues that might be important are Glu247, Lys248 and Arg314.
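The four conservation categories tallied in Table I can be computed column by column: identity, identity except at one position, conserved character, and conserved character except at one position. The sketch below rests on three assumptions of ours:

- gaps are written as '-' and never count towards conservation;
- the checks are applied in the order listed, since the text does not say how overlapping cases were resolved;
- a simplified, hypothetical residue-class table stands in for the full Fig. 2 criteria.

```python
from collections import Counter

# Simplified residue classes (assumed), one label per amino acid.
CLASS_OF = {}
for members, label in [("KRH", "positive"), ("DE", "negative"), ("STNQ", "neutral"),
                       ("AVLIM", "nonpolar"), ("FWY", "aromatic"), ("PG", "pro_gly"), ("C", "cys")]:
    CLASS_OF.update(dict.fromkeys(members, label))


def classify_column(column):
    """Return a Table I style category for one alignment column, or None."""
    n = len(column)
    residues = [aa for aa in column if aa != "-"]  # gaps never count as conserved
    if not residues:
        return None
    top_residue_count = Counter(residues).most_common(1)[0][1]
    if top_residue_count == n:
        return "identity"
    if top_residue_count == n - 1:
        return "identity-1"
    top_class_count = Counter(CLASS_OF[aa] for aa in residues).most_common(1)[0][1]
    if top_class_count == n:
        return "character"
    if top_class_count == n - 1:
        return "character-1"
    return None


def classify_alignment(sequences):
    """Classify every column of a gapped alignment of equal-length strings."""
    return [classify_column(column) for column in zip(*sequences)]


if __name__ == "__main__":
    toy = ["ACDF", "ACEW", "ACDY", "GCKF"]  # invented four-sequence alignment
    print(classify_alignment(toy))
    print(Counter(classify_alignment(toy)))
```

Tallying the categories with `collections.Counter` over a full alignment gives counts of the same kind as those reported above. Whether the values match depends on the class table and the precedence rule assumed here.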
It was also interesting to note three further points:
(i) Conserved polar residues within the transmembrane regions were almost always positioned on internal or interfacial sides of the helices. One group of polar residues could well form an interacting triplet: Asn55 and Asn302 (or Asp in two sequences) may sandwich the almost totally conserved intramembranous negative charge, Asp83.
(ii) All but one of the conserved aromatics were located on external faces. Closer inspection of the model revealed that all of them resided on the helices that house the four prolines. The sole internally oriented aromatic was the one associated with the acid-Arg-aromatic triplet on helix III.
(iii) A series of conserved nonpolar and aromatic residues at the cytoplasmic end of the molecule appears to form a kind of collar. This collar defines, in effect, an apolar interface between the inner-membranous and the headgroup portions of the helices in this region.
It is not unreasonable to suppose that the conformational changes that precipitate activation of the receptor will be largely the same throughout the family. Which of the residues identified above are key elements in this process is unknown.
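Observations (i)-(iii) assign residues to internal, interfacial or external faces. Those assignments, and the helical-wheel reconciliation described in (b), rest on the roughly 100° rotation per residue of an ideal α-helix. The sketch below projects a segment onto a helical wheel and computes a hydrophobic moment, i.e. the vector sum of per-residue hydrophobicities around the wheel. Several simplifications are assumed:

- a binary hydrophobicity scale (+1 for hydrophobic, -1 otherwise);
- a ±90° face boundary;
- an ideal, undistorted helix, which the text notes is not adequate for helix VII.

The function names are hypothetical.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")  # assumed binary scale: +1 for these, -1 otherwise
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix, 3.6 residues per turn


def wheel_angle(index):
    """Angular position (degrees) of residue `index` on the helical wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def hydrophobic_moment(segment):
    """Magnitude and direction (degrees) of the summed hydrophobicity vector."""
    x = y = 0.0
    for i, aa in enumerate(segment):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        theta = math.radians(i * DEGREES_PER_RESIDUE)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    return math.hypot(x, y), math.degrees(math.atan2(y, x)) % 360.0


def face(index, moment_direction):
    """Label a residue by its angular distance from the hydrophobic moment direction."""
    delta = abs((wheel_angle(index) - moment_direction + 180.0) % 360.0 - 180.0)
    return "hydrophobic face" if delta <= 90.0 else "opposite face"


if __name__ == "__main__":
    segment = "LKKLLKKLKKKLKKLLKK"  # invented amphipathic 18-mer
    magnitude, direction = hydrophobic_moment(segment)
    print([face(i, direction) for i in range(len(segment))])
```

For a transmembrane helix, the hydrophobic face would be expected to face the lipid. Polar residues on the opposite face are therefore candidates for the internal side, as in observation (i). The sketch does not model helix packing or helix distortions.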
(d) Conclusions
Arguably, the most straightforward method for multiple sequence alignment would be to use one of the publicly available automatic packages, such as CLUSTAL (Higgins and Sharp, 1988). However, where a 3D x-ray structure is unavailable and sequence homology is low (20% or less), the results of automatic alignment procedures must be interpreted with considerable caution. An alignment rationale based specifically on maximising identities and minimising gaps need not provide a meaningful result in a strictly biological context. In this study at least, the use of pattern-matching discriminators to aid the alignment of sequences with low homology has proved extremely successful. It should be emphasised that this was not intended as a quantitative study, but rather as an approach specifically to assist alignment. To date, we have confined our analysis to sequences for which there is already a weight of evidence from other sources placing them all in the same rhodopsin-like family.

In future, however, it should be possible to look for even more distantly related receptors. At that stage, it would seem advisable to evaluate the diagnostic performance of our analysis. Relative operating characteristic methodology (Metz, 1986; Swets, 1988; D.J. Parry-Smith and J.C. Wootton, unpublished data) might provide the means of achieving this. The approach described here could easily be adopted for other extended families; the sugar transporters and the mitochondrial carriers are obvious examples. Once good alignment and residue analysis have been achieved, attempts to build 3D models for the proteins will be facilitated. In the case of the G-protein receptors, the development of these discriminators has greatly aided the construction of models for a number of the sub-families. This in turn has allowed more rigorous analysis of the 3D representation.

ACKNOWLEDGEMENTS
Dr. T.K.A. is grateful to Glaxo Institute for Molecular Biology, Geneva, for providing the funds that supported this study.

REFERENCES
Akrigg, D., Bleasby, A.J., Dix, N.I.M., Findlay, J.B.C., North, A.C.T., Parry-Smith, D.J., Wootton, J.C., Blundell, T.L., Gardner, S.P., Hayes, F., Islam, S., Sternberg, M.J.E., Thornton, J.M., Tickle, I.J. and Murray-Rust, P.: A protein sequence/structure database. Nature 335 (1988) 745-746.
Barton, G.J. and Sternberg, M.J.E.: Flexible protein sequence patterns. A sensitive method to detect weak structural similarities. J. Mol. Biol. 212 (1990) 389-402.
Birnbaumer, L.: G proteins in signal transduction. Annu. Rev. Pharmacol. Toxicol. 30 (1990) 675-705.
Casey, P.J. and Gilman, A.G.: G-protein involvement in receptor-effector coupling. J. Biol. Chem. 263 (1988) 2577-2580.
Dayhoff, M.O.: A model of evolutionary change in proteins. Matrices for detecting distant relationships. In: Dayhoff, M.O. (Ed.), Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, DC, 1978, pp. 345-358.
Dohlman, H.G., Caron, M.G. and Lefkowitz, R.J.: A family of receptors coupled to guanine nucleotide regulatory proteins. Biochemistry 26 (1987) 2657-2663.
Findlay, J.B.C. and Pappin, D.J.C.: The opsin family of proteins. Biochem. J. 238 (1986) 625-642.
Findlay, J.B.C., Pappin, D.J.C. and Eliopoulos, E.E.: The primary structure, chemistry and molecular modelling of rhodopsin. Prog. Ret. Res. 7 (1988) 63-87.
Gribskov, M., McLachlan, A.D. and Eisenberg, D.: Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84 (1987) 4355-4358.
Higgins, D.G. and Sharp, P.M.: CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73 (1988) 237-244.
Kerlavage, A.R., Fraser, C.M. and Venter, J.C.: Molecular structure and evolution of adrenergic and cholinergic receptors. Prot. Struct. Funct. Genet. 1 (1986) 287-301.
Lathrop, R.H., Webster, T.A. and Smith, T.F.: Ariadne: pattern-directed inference and hierarchical abstraction in protein structure recognition. Commun. ACM 30 (1987) 909-921.
Metz, C.E.: ROC methodology in radiologic imaging. Invest. Radiol. 21 (1986) 720-733.
Parry-Smith, D.J. and Attwood, T.K.: A novel approach to multiple sequence alignment. CABIOS (1990) submitted.
Sardet, C., Tardieu, A. and Luzzati, V.: Shape and size of bovine rhodopsin: a small angle x-ray scattering study of a rhodopsin-detergent complex. J. Mol. Biol. 105 (1976) 383-407.
Smith, H.O., Annau, T.M. and Chandrasegaran, S.: Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. USA 87 (1990) 826-830.
Swets, J.A.: Measuring the accuracy of diagnostic systems. Science 240 (1988) 1285-1293.
Taylor, W.R.: Identification of protein sequence homology by consensus template alignment. J. Mol. Biol. 188 (1986) 233-258.