www.proteomics-journal.com

Proteomics

VIEWPOINT

The twentieth anniversary of Proteomics and some of its origins

Ian Humphery-Smith

Skolkovo RusInnovations, Posledniy Pereulok, 17, 107045 Moscow, Russian Federation.

Abstract: The term ‘proteome’ was first introduced into the scientific literature in July 1995. Almost 20 years ago, attempts to characterize the ‘total protein complement able to be encoded by a given genome’ only became possible due to privileged access to what were then the world’s most complete sets of genomic data. Today, proteomics has become an important pillar in the fields of disease diagnosis and drug research and development, while also playing a critical role in the much larger field of Healthcare Analytics and Biomarker Discovery and Detection. It is important to note that this industry originated mostly from building blocks in analytical science that predated the term ‘proteomics’ by many decades. However, proteomics, as a discipline, has allowed protein scientists to compete more favourably in the face of highly fashionable Big Science and, more specifically, genomics.

Keywords: Proteome, whole genome analysis, historical context, replication-induced protein synthesis, signature peptides

Received: 08-Dec-2014; Revised: 19-Jan-2015; Accepted: 12-Feb-2015

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1002/pmic.201400582. This article is protected by copyright. All rights reserved.

At the first Gene-to-Protein conference in Siena, Italy, in August 1994, my co-authors and I made last-minute additions to a poster (two pieces of A4 paper stuck to the bottom of our poster) that showed more protein identifications than most other groups in Siena had been able to achieve in total. Thus, upon our return to Australia, we realized that we should rush to conclude a manuscript based on our work in mycoplasmology and exploiting the term ‘proteome’. As a result, the first use of the term ‘proteome’ in the scientific literature occurred in July 1995, now almost twenty years ago [1]. The article was submitted in late 1994 and was given the honour of being the 2000th article published in the scientific journal Electrophoresis by the then Chief and Founding Editor, Prof. Bertold J. Radola. One of the Editors, Mike Dunn, requested that we define this term as part of the scientific results being reported. As corresponding author, I completed this task as an addition to the manuscript proofs. A graduate student at Macquarie University, Mark Wilkins, had thought up the idea of a new word to define what we were doing in tracking the total protein content of whole living cells, namely, those of the simplest living organism on planet Earth, Mycoplasma genitalium. The article turned out to be Mark’s first scientific publication, at a time when his major passion in life was the extracellular matrix of the social amoeba, Dictyostelium discoideum. Earlier notions we discussed included ‘proteo-genome’, for which ‘proteome’ became a logical contraction. It is noteworthy that researchers from several Australian universities contributed to obtaining the various analytical results that allowed accurate identification of our Mycoplasma proteins.

As first defined: “The ‘Proteome’ refers to the total protein complement able to be encoded by a given genome”. In the discussion, we went on to mention that it was “unlikely that the totality of this potential for protein expression will be realised at any one given instant...”. This definition was intended to provide a definable outcome and to highlight an important distinction between more traditional protein science and an underlying need to deliver genome-wide gene-product mapping at the level of proteins, i.e. so as to compete more effectively head-to-head with genomics. This concept continues to be overlooked by many in the proteomics community today [2, 3].

While this manuscript was in preparation, a special issue of the Journal of Proteomics appeared in August 2014, celebrating the 20th anniversary of both the well-known Siena meeting and the birth of the term ‘proteomics’. Several personal accounts by some of the major players in the field are worth reading [4-9], while underlining the fact that such historical surveys are mostly devoted to proteomics, whereas the present review embraces genomics and transcriptomics as well. It is noteworthy that the term ‘transcriptome’ did not come into use until 1997 [10]. It is arguable, therefore, that the subsequent ‘omics’ explosion may have been initially stimulated by this second son of genomics, ‘proteomics’.

The associated article (Figure 1) focused on delivering high levels of statistical confidence for the protein identifications. These were based on N-terminal protein sequence; amino acid composition analysis (some 16 numerical parameters); peptide mass fingerprinting (a half dozen or so further numerical parameters); and observed versus predicted Mr and pI. Even with what appeared at the time to be high numbers of orthogonal data points combining to provide statistical confidence, this confidence in our research findings was only made possible by an important event late in 1992, when we requested and received all the fruits of the first serious effort to sequence a whole genome under the auspices of the Human Genome Project, namely, the DNA content of Mycoplasma capricolum. These data did not become publicly available until almost two years later. The Nobel Laureate Walter Gilbert and his team at Harvard University had been given the honour of, and serious financial support for, this first effort at whole-genome sequencing. However, with some 27 people on board, the project was closed down earlier that same year. At the time, Walter Gilbert was most upset about this turn of events, but was happy to pass on the fruits of his endeavours to our small group in Australia. For the cost of one bottle of single malt whiskey, I was able to have all this DNA sequence data parsed into FASTA format by the team of Alex Reisner at the Australian National Genome Information Service, a spin-out group from our Department of Microbiology at the University of Sydney driven by Peter Reeves. Thus, in December 1992, I was a new university recruit with a young team of graduate students at the University of Sydney, and we had become the proud owners of the world’s most complete
DNA sequence anywhere, and this for a congeneric Mycoplasma species. We then set about complementing this information with that from other DNA sequencing initiatives, namely, data obtained in a privileged manner via Claire Fraser at TIGR for Mycoplasma genitalium and similar DNA sequence data from Heidelberg University via Richard Hermann for Mycoplasma pneumoniae. This high level of genomic DNA coverage made meaningful proteomic analyses feasible for the first time, and then only in the simplest living organisms on the planet. These approaches complemented expertise acquired with the same simplest of life forms, the Mollicutes, using 2DGE and advanced image analysis in France in the years since 1987. The French work focused on understanding multigenic phenomena linked to bacterial pathogenicity, which, we knew, ran to dozens of genes. Here, proteins offered an attractive alternative avenue for deciphering molecular complexity, i.e. as opposed to the time-consuming molecular techniques in use at the time.

Walter Gilbert later opened the world’s first ‘by-name’ Proteomics Laboratory on 26 March 1996 at the Australian National Innovation Centre, a facility financed by Glaxo, then the world’s largest pharmaceutical company. It was one of the highest-specification laboratories in the country and featured designer offices for students and managers alike in what had originally been a railroad-engine manufacturing site in Redfern, at the Australian Technology Park in Sydney. We were founding tenants. The Laboratory also played home to the Sydney University Protein And Macromolecular Analysis Centre (SUPAMAC), which went on to service some 120 institutes nationally for their DNA sequencing requirements. The final decision by Glaxo to invest in our work was made in August 1995. This decision was heavily influenced by a ‘pre-print’ from the journal Science that had been forwarded earlier to Glaxo executives.
This editorial comment from July 1995 (almost the same day as ‘proteome’ came into existence) coincided with the publication of the DNA sequence of Haemophilus influenzae by a group led by Craig Venter at TIGR [11]. The Science editorial stated very clearly that the molecular workhorses of cells were the gene-products, namely, proteins (and RNA), and thus that DNA was only a starting point for understanding molecular function in health and disease.

Another of Walter Gilbert’s major contributions to proteomics remains a little-known observation with respect to his work on the ‘lac-operon’ and, more specifically, the ‘lac-repressor’. His experimental observations provide the best evidence for the existence of Replication-Induced Protein Synthesis (RIPS) and the molecular knowledge that, during cellular division, a few protein gene-products will be created for every single ORF in a genome. (NB: not all possible posttranslational modifications across organismal tissue space and time, but at least protein synthesis for each and every ORF within a genome.) The starting-point cellular matrix post-division, the half-life of those proteins derived by RIPS, additional protein synthesis and decay over time, and posttranslational modifications dictate the abundance and nature of cellular proteins. The existence of RIPS remains all-important to proteomics and our potential to follow near-to-total coverage of cellular protein content, at least in rapidly replicating cells [12]. During cellular division, RIPS occurs following the passage of the replisome due to transcription initiated by RNA polymerase in association with: 1) negative supercoiling trailing the replisome / replication fork; 2) hemimethylation prior to the action of deoxyadenosine methylase; 3) transient de-repression following passage of the replisome / replication fork and prior to renewed synthesis of the repressor gene-product; and 4) ‘sliding clamp’ accessory DNA-binding proteins binding to the lagging strand DNA duplex to retard rotational upstream propagation of supercoils.

Mycoplasmas turned out to be great taskmasters. When the fully sequenced genome of M. genitalium came out in October 1995 [13], we learned to our dismay that what we had thought just a few months earlier to be 75% coverage of the protein complement of M. genitalium (based on predicted gene number from the most complete DNA sets available) was in fact far less than anticipated (i.e. < 30%). Elsewhere in protein science, groups were increasingly pleased with their prowess at high-throughput protein characterization. Mycoplasmas kept telling us that the techniques we were employing were not very good, not sufficiently sensitive and not sufficiently scalable to deliver true high-throughput proteome screening with near-to-total proteomic coverage. It took us another five years to catch up to where we thought we were in July 1995, i.e. by way of protein coverage in this, the simplest of living organisms, using the same traditional technologies [14].

Thus, in 1996, we started working on alternative strategies for protein display that could move beyond 2DGE by way of total protein coverage. The technique of 2DGE had been a cornerstone of protein screening for many years. We went on to exploit protein chip technology using the principle of parallel monoclonal antibody synthesis and parallel specificity testing, with a view to producing at least one reliable affinity ligand for synthetically derived protein gene-products for each Open Reading Frame in a given genome [15]. Very rapidly, Mycoplasmas also taught us that not all peptides were equal in proteomic space, and thus the concept of ‘Signature Peptides’ emerged [16]. These statistically significant peptides are found in all proteomes; are conserved by evolution across species boundaries; and are invariably linked to important biological function.
They occur in both two-dimensional and three-dimensional space. The latter are notably not detectable routinely by mass spectrometry, for example, the catalytic triad. (A catalytic triad is an extreme example whereby three amino acid residues function together at the centre of an active site of certain hydrolase and transferase enzymes. This active site is typically not derived from contiguous linear amino acid sequence, but rather arises from conformational proximity brought about by protein folding in three-dimensional space, i.e. to produce a biochemically active site.)

By the time the first “Human Proteome Project: Genes Were Easy” meeting was held in Virginia in February 2001, the ‘next phase of the Human Genome project’ had become more fashionable; but even with what appeared to be widespread enthusiasm, it had remained difficult to secure buy-in from leading lights in the world of proteomics during the preceding 18 months. This changed on 15 February 2001 with the publication of the Human Genome sequence by Celera Genomics and a call for the next phase of this project to include proteome screening, i.e. ‘And now for the proteome ...’ [17]. The following day, a formal press release announcing the creation of HUPO, backed by a 21-member inaugural council, was issued to coincide with the ‘completion’ of the Human Genome.

Today, proteomics has become an important pillar in the fields of disease diagnosis and drug research and development, representing some $4bn-$5bn USD in sales globally per annum (http://bccresearch.blogspot.ru/2014/04/global-markets-for-proteomics-toreach.html#.VK_CpCusUXw). It also constitutes a critical part of the much larger Healthcare Analytics and Biomarker industry, valued in the tens of billions of USD per annum. It is important to
note that this industry has been built mostly upon building blocks in analytical science that predated the term ‘proteomics’ by many decades. However, proteomics, as a discipline, has allowed practitioners to compete more favourably in the face of highly fashionable big science and, more specifically, genomics. This has facilitated access to increased levels of research funding, investment into the sector and the development of international consortia encompassing both private and public sectors. For proteomics, this has perhaps been most effective as somewhere for practicing protein scientists to ‘hang their coats’, namely, a unifying tag or umbrella, which still has much to offer all of biology and medicine in the years to come.

The author has declared no conflict of interest.

References

[1] Wasinger, V.C., Cordwell, S.J., Cerpa-Poljak, A., Yan, J.X. et al., Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis 1995, 16, 1090-1094.
[2] Zubarev, R.A., The challenge of the proteome dynamic range and its implications for in-depth proteomics. Proteomics 2013, 13, 723-726.
[3] Humphery-Smith, I., A human proteome project with a beginning and an end. Proteomics 2004, 4, 2519-2521.
[4] Calvete, J.J., Bini, L., Hochstrasser, D., Sanchez, J-C., Truck, N., The magic of words. J. Proteomics 2014, 107, 1-4.
[5] Rabilloud, T., Paleoproteomics explained to youngsters: how did the wedding of two-dimensional electrophoresis and protein sequencing spark proteomics on: Let there be light. J. Proteomics 2014, 107, 5-12.
[6] Williams, K.L., Gooley, A.A., Wilkins, M.R., Packer, N.H., A Sydney proteome story. J. Proteomics 2014, 107, 13-23.
[7] Anderson, L., Six decades searching for a meaning in the proteome. J. Proteomics 2014, 107, 24-30.
[8] Celis, J.E., A personal account of the early stages of proteomics at Aarhus University. J. Proteomics 2014, 107, 31-38.
[9] Righetti, P.G., The Monkey King: A personal view of the long journey towards a proteomic nirvana. J. Proteomics 2014, 107, 39-49.
[10] Velculescu, V.E., Zhang, L., Zhou, W., Vogelstein, J. et al., Characterisation of the yeast transcriptome. Cell 1997, 88, 243-251.
[11] Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A. et al., Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269, 496-512.
[12] Humphery-Smith, I., Replication-induced protein synthesis and its importance to proteomics. Electrophoresis 1999, 20, 653-659.
[13] Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D. et al., The minimal gene complement of Mycoplasma genitalium. Science 1995, 270, 397-403.
[14] Wasinger, V.C., Pollack, J.D., Humphery-Smith, I., The proteome of Mycoplasma genitalium. Chaps-soluble component. Eur. J. Biochem. 2000, 267, 1571-1582.
[15] Albala, J.S., Humphery-Smith, I., Protein arrays, biochips and proteomics. The next phase of genomic discovery, 2003, Marcel Dekker, New York, pp 1-422.
[16] Karaoglu, H., Humphery-Smith, I., Signature peptides. From analytical chemistry to functional genomics. Methods Mol. Biol. 2000, 146, 63-94.
[17] Abbott, A., And now for the proteome. Nature 2001, 409, 747.

Correspondence: Ian Humphery-Smith, Skolkovo RusInnovations, 17 Posledney Pereulok, 107045, Moscow, Russian Federation.

E-mail: [email protected]; Fax: +7 495 668 1300

Abbreviation: RIPS, replication-induced protein synthesis

Figure 1: First use of the term, ‘proteome’, (circled) within the scientific literature, as displayed in PubMed.gov
