

Australasian Biotechnology (backfiles)
AusBiotech
ISSN: 1036-7128
Vol. 6, Num. 3, 1996
Australasian Biotechnology, Volume 6 Number 3, May/June 1996, pp. 162-167

Analytical Biotechnology and Proteome Analysis

By Keith L. Williams, Andrew A. Gooley and Nicki H. Packer,

Macquarie University Centre for Analytical Biotechnology (MUCAB), School of Biological Sciences, Macquarie University, Sydney, NSW, Australia 2109. ph 61-2-98508212, fax 61-2-98508174, email: keith.williams@mq.edu.au


Code Number: AU96002

[TABLES AND FIGURES AT END OF TEXT]

Developments in analytical biotechnology, combined with ready access to DNA sequence databases and bioinformatics, have opened up a new field in biology: Proteome studies. Proteome means the Protein complement expressed by a genome (in a subcellular fraction, cell, tissue, organism etc). This article discusses the new Proteome analytical biotechnology instruments that are revolutionising protein science.

Protein identification was a slow and tedious art

Until recently, the identity of a protein was established by protein sequencing using Edman degradation in which amino acids are cleaved from the protein sequentially starting at the N-terminus. This involved use of preparative chromatography to obtain sufficient pure protein for the protein sequence analysis. It was not unusual for an experienced post-doctoral fellow to spend 1-2 years purifying a particular protein, prior to the commencement of the interesting work of characterising it using analytical instrumentation. Protein sequencing equipment is still the sharp edge of analytical biotechnology, but it is expensive, the reagents costly, and the technology slow (20-30 amino acids from a protein in a day). For several decades this has been the definitive way to confirm the identity of the protein if it was already known, or to establish its identity if it was a new protein.

Cutting Corners

With the advent of monoclonal antibodies, a protein could be identified without purification using polyacrylamide gel electrophoresis (PAGE) and Western blotting, provided a specific antibody was available. These simple analytical techniques provided a quick identity, but allowed only limited further study of the protein. Subsequently Matsudaira (1987) reported Edman sequencing of proteins transferred to polyvinylidene difluoride (PVDF) membrane after PAGE. This revolutionised protein sequencing, as it meant that the proteins in a partially purified mixture could be separated by PAGE, provided they differed in apparent molecular mass, and then sequenced as individual bands transferred to PVDF membrane.

The next development in PVDF sequencing of proteins was to marry it to a 20-year-old technology, that of separating proteins by 2-dimensional (2-D) gel electrophoresis. Using this technology, proteins are first separated according to their isoelectric point, and then in a second dimension by apparent molecular mass. With the advent of highly reproducible 2-D gels using immobilised pH gradients and the ability to load mg quantities of protein on a single 2-D gel (Bjellqvist et al. 1993), a revolution in protein science has arrived. Once a preparative 2-D gel of a tissue has been run, several hundred or even thousands of proteins have been purified in high ng to low µg quantities, in a single step.

This is a revolution which opens the way for the development of a new suite of analytical biotechnology instruments.

Protein Identification using Attribute Matching

The above discoveries, which have come together in the past three years, mean that no longer is protein purification an issue. Hundreds or thousands of proteins are purified in a single step; hence the creation of a new field of Proteome research where the protein complement of complex tissues (or even whole organisms) is now accessible for study. The old way of protein identification using Edman protein sequencing remains a major stumbling block because it is still expensive and slow.

While the protein revolution has been quietly beginning, another much more public revolution has occurred in the field of genome sequencing. The complete DNA sequence of a small group of organisms has been determined, and even the mammoth task of sequencing the human genome is likely to be completed within a decade (or at least all of the gene products identified and sequenced). The DNA genomes of Mycoplasma genitalium (Fraser et al. 1995), Haemophilus influenzae (Fleischmann et al. 1995), Escherichia coli and Saccharomyces cerevisiae have been fully sequenced or will be finished very soon. This means that, for these organisms at least, all of the proteins have been sequenced (using DNA techniques). There remains the task of identifying which proteins are made when and where, as well as how they are modified.

The 2-D gel presents the purified proteins, but we have seen already that Edman sequencing is too slow and expensive to be the sole tool for protein identification in Proteome screening. For the organisms whose DNA has been sequenced, the complete sequence of every protein lies in the relevant DNA database. It is a simple task to extract from the DNA database the sequence of every protein. Complexities arise of course due to post-translational modifications.
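As a minimal sketch of that extraction step (not the authors' software), the following Python translates a single open reading frame using the standard genetic code. The function name and the toy sequence are illustrative assumptions. Some organisms deviate from the standard code (Mycoplasma species, for instance, read TGA as tryptophan rather than stop), so a real pipeline would need the organism's own codon table.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(orf):
    """Translate an open reading frame (DNA, 5'->3') up to the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[i:i + 3]]
        if residue == "*":  # stop codon: end of the predicted protein
            break
        protein.append(residue)
    return "".join(protein)


# Toy example (hypothetical sequence, not taken from any database).
print(translate("ATGGCCAAATGA"))  # MAK
```

The predicted sequence is only the primary translation product; as noted above, the mature protein may differ because of processing and post-translational modifications.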

Using a bioinformatics approach a protein can be described in a number of ways (see Table 1). By determining several features of a protein it is possible to identify it without necessarily needing to obtain an extended amino acid sequence (Wilkins et al. 1995). The DNA database (translated into protein) is searched for the combination of the parameters. Some of these features are obtained as part of separating the proteins using 2-D gels (isoelectric point, apparent molecular mass), while others can be rapidly obtained using automated or semiautomated analytical biotechnology instrumentation (eg amino acid composition, peptide mass fingerprint). Surprisingly, even a single, simple technology such as amino acid analysis, which gives the composition of 16 amino acids, can give an accurate estimate of the identity of the protein under study, without recourse to any protein sequencing at all.
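The idea of searching a database for a combination of parameters can be sketched as a simple window filter on the two attributes read directly from the 2-D gel. This is an illustration only: the `Entry` record, the tolerance values and the database contents are hypothetical and are not taken from the article or from any published matching program.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    pi: float        # isoelectric point predicted from the database sequence
    mass_kda: float  # molecular mass predicted from the database sequence


def candidates(db, pi_obs, mass_obs, pi_tol=0.25, mass_frac=0.2):
    """Keep entries whose predicted pI and mass fall inside windows around
    the values estimated from the gel. Both tolerances are assumed values."""
    return [
        e for e in db
        if abs(e.pi - pi_obs) <= pi_tol
        and abs(e.mass_kda - mass_obs) <= mass_frac * mass_obs
    ]


# Hypothetical database entries with made-up attribute values.
toy_db = [Entry("A", 5.1, 30.0), Entry("B", 6.8, 31.0), Entry("C", 5.2, 60.0)]
print([e.name for e in candidates(toy_db, pi_obs=5.0, mass_obs=32.0)])  # ['A']
```

A filter of this kind only narrows the field; further attributes from Table 1 would then be applied to the surviving candidates.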

This approach using attribute matching is at the core of Proteome studies as it will allow the rapid identification of large numbers of proteins. The challenge for analytical instrument manufacturers is to design new instruments for such rapid screens, and to interface them so that much of the routine work can be robotised.

Cross-species Matching

DNA genome sequencing is a major undertaking and most of the sequence information in the DNA databases has been obtained from a small group (approx 10) of organisms. In the absence of another breakthrough in DNA sequencing technology, it is unlikely that more than a small group of eukaryotic organisms will be sequenced at the DNA level. On the other hand the proteins of hundreds or even thousands of organisms can be readily displayed using 2D gel technology. The attribute matching algorithms (Table 1) can be used across species, although these studies are not so straightforward as the protein sequence differs between species (Wasinger et al. 1995). Nevertheless, a large proportion of proteins will be able to be identified using this approach. The relevance of this to the biotechnology industry is obvious, especially for those studying medically or agriculturally important animals, plants and microorganisms.

Hence the way is open for Proteome studies on any organism or tissue. Since a subcellular fraction, cell type, or tissue can be displayed on a 2D gel, the way is open to get a complete picture of both normal and diseased states. The implications for both basic research and biotechnological applications are major.

Post-translational Modifications

Knowing the DNA sequence of a protein tells you about the sequence of amino acids along the protein. It gives some guidance about where the protein might start and finish, but many proteins are cleaved at the N- or C-terminus with the result that the mature protein is different from the protein actually translated. In addition, many genes are spliced so that the same gene specifies very different proteins. Finally there are many post-translational modifications to proteins, some of which can be predicted from the DNA sequence, but all of which can be different in different cell types etc. The only way you can be certain of the precise nature of a protein (gene product) and its modifications in any particular tissue is to study the protein itself.

For example, glycoproteins are commonly observed as "fuzzy" bands on 1D gels, and this is attributed to heterogeneity of the glycoforms on the protein. A surprising outcome of 2D gel studies is that in many cases glycoproteins are much less heterogeneous than expected. Indeed most glycoproteins appear as a series of discrete spots on 2D gels. Different migrations in the first dimension of 2D gels are presumably due to charge differences in the different forms of the protein (eg differential phosphorylation, altered sialylation, protein processing etc), although insufficient studies have yet been made to be able to predict from the migration pattern the change in post-translational modification.

Analytical Biotechnology Instruments for use in Proteome Research

The components of Proteome research are the 2D gels, the technology for protein identification and the means to study post-translational modifications to the proteins, all at the level of a single 2D gel spot (ie at µg amounts of protein). We have reviewed these advances recently (Wilkins et al. 1995).

2-D gels : This work involves developing appropriate extraction(s) of the proteins from the tissue/cells for each project and preparing firstly analytical 2D gels to establish a suitable separation, followed by a series of preparative 2D gels for protein chemistry analyses (including post-translational modifications). It is now routine to load and separate mg quantities of proteins.

Amino acid analysis : Amino acid analysis is a powerful identification technique for matching proteins in the databases (Wilkins et al. 1996). With Melbourne-based GBC Scientific Equipment, a fully automated amino acid analyser (AminoMate) has been developed. This instrument uses FMOC chemistry and precolumn derivatisation for amino acid analysis. It performs multitasking and also prepares the results in a form ready for submission to the databases for interrogation. Protein identification is achieved by matching the composition of the 16 amino acids determined for the unknown protein against the amino acid composition of every protein in a database. A single HPLC system allows study of up to 150 proteins weekly. This system is very cost effective.
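Composition matching of this kind can be sketched as follows. The sketch rests on assumptions that the article does not spell out: that the 16 measured categories are Asx, Glx, Ser, His, Gly, Thr, Ala, Pro, Tyr, Arg, Val, Met, Ile, Leu, Phe and Lys (with Asn and Gln pooled into Asp and Glu after hydrolysis, and Cys and Trp not determined), and that candidates are ranked by a sum of squared differences in mole percent. The scoring function and the toy sequences are illustrative, not the published algorithm.

```python
from collections import Counter

# Assumption: hydrolysis converts Asn to Asp and Gln to Glu, so they are pooled.
POOL = {"N": "D", "Q": "E"}
# Assumption: the 16 measured categories (D = Asx, E = Glx); C and W are omitted.
MEASURED = "DESHGTAPYRVMILFK"


def composition(sequence):
    """Mole percent of each measured category for a protein sequence."""
    counts = Counter(POOL.get(aa, aa) for aa in sequence.upper())
    total = sum(counts[aa] for aa in MEASURED)
    if total == 0:
        raise ValueError("sequence contains no measurable residues")
    return {aa: 100.0 * counts[aa] / total for aa in MEASURED}


def score(observed, reference):
    """Sum of squared mole-percent differences; lower means a closer match."""
    return sum((observed[aa] - reference[aa]) ** 2 for aa in MEASURED)


def rank(observed, database):
    """Return (score, name) pairs for every database entry, best match first."""
    return sorted((score(observed, composition(seq)), name)
                  for name, seq in database.items())


# Hypothetical two-entry database; the 'unknown' is entry protA measured perfectly.
toy_db = {"protA": "GAVLGAVLKK", "protB": "DDEEDDEEFF"}
unknown = composition("GAVLGAVLKK")
print(rank(unknown, toy_db)[0][1])  # protA
```

In practice the measured composition carries experimental error, so the gap between the best and second-best scores is as informative as the best score itself.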

Phosphorylation : In our laboratory, phosphoamino acids have been detected from single 2-D gel spots, using a minor modification to the chromatography used for the amino acid analysis.

Monosaccharide analysis : Many eukaryotic proteins are glycosylated. We have developed a rapid screen for monosaccharides from single 2D gel spots. The technique involves sequential treatments of a single spot to release first sialic acids and then neutral sugars; finally the sample is hydrolysed for amino acid composition analysis and hence protein identification (Packer et al. 1996).

N-terminal Tag sequencing : By detuning a protein sequencer and adding a multiple sample delivery system, we have developed an instrument which delivers a three amino acid N-terminal sequence tag in 1 hour (Wilkins et al. 1996). This instrument will handle at least 100 samples/week, thereby making mass protein screening, from the ultrapure 2D gel spots, possible and economic. After the 3 cycles of Edman chemistry the sample (on PVDF) is removed, hydrolysed and amino acid analysis conducted. The combination of the sequence tag (even a blocked N-terminus is informative), amino acid composition, pI and MW (estimated from the 2D gel) gives a very powerful group of parameters for identification of proteins in the database, or to show that a new protein has been encountered. It is also very useful to know where the protein actually starts, as many proteins in the DNA databases have inferred start sites.
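A short sequence tag can be used both to filter candidates and to report where the mature protein starts relative to the database entry. The sketch below is an assumption-laden illustration: the function names, the 30-residue search window and the toy sequences are hypothetical, and the approach of searching near the predicted N-terminus is one plausible way to allow for removed initiator methionines or signal peptides, not necessarily the authors' procedure.

```python
def find_tag(sequence, tag, max_offset=30):
    """Return the offset of the tag near the predicted N-terminus, or None.

    Offset 0 means the protein starts where the database says it does;
    a positive offset suggests N-terminal processing of the translated product.
    max_offset is an assumed search window.
    """
    pos = sequence.upper().find(tag.upper(), 0, max_offset + len(tag))
    return None if pos < 0 else pos


def tag_hits(database, tag, max_offset=30):
    """Map each matching database entry name to the offset of the tag."""
    hits = {}
    for name, seq in database.items():
        offset = find_tag(seq, tag, max_offset)
        if offset is not None:
            hits[name] = offset
    return hits


# Hypothetical sequences: protX matches after loss of its first residue.
toy_db = {"protX": "MKTAYIAKQR", "protY": "MSDNGKTLVV"}
print(tag_hits(toy_db, "KTA"))  # {'protX': 1}
```

The surviving entries would then be scored on amino acid composition, pI and MW, so that the tag acts as one attribute among several rather than as a stand-alone identification.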

Peptide mass fingerprinting : While amino acid analysis, N-terminal tag sequencing, pI and MW estimates produce successful identification of proteins in the DNA databases, further study is needed for a confident match when low abundance samples are being analysed. Peptide mass fingerprinting offers a rapid approach to protein identification at very high sensitivity (subpicomole levels) (Wilm et al. 1996). Here, proteins from a 2D gel, or after transfer to PVDF, are digested with a site specific protease (eg trypsin) to generate peptides. The peptide masses are measured by mass spectrometry and matched against the masses of the (tryptic) peptides predicted for each database entry, ie the entries in the database are rearranged to display tryptic peptides rather than the whole protein. Only a few peptide matches are needed to identify some proteins, although complete peptide coverage of a protein is desirable for confident identification.
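The database side of this procedure can be sketched as an in silico digest followed by a mass comparison. The sketch makes several assumptions that are not stated in the article: trypsin is taken to cleave after K or R except before P, no missed cleavages or modifications are considered, masses are neutral monoisotopic values, and the 0.5 Da tolerance and the toy sequence are arbitrary.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.5):
    """Number of observed masses within tol Da of a predicted tryptic peptide."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)


# Hypothetical sequence and observed masses.
print(tryptic_peptides("AKRPGKML"))                  # ['AK', 'RPGK', 'ML']
print(count_matches([217.14, 500.0], "AKRPGKML"))    # 1
```

Ranking every database entry by its match count gives a crude fingerprint search; a spectrum from a real instrument would first need its ion m/z values converted to neutral masses.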

By combining the abovementioned characteristics (amino acid composition etc) with peptide mass fingerprinting, the success rate and confidence of protein identification are increased (Wasinger et al. 1995; Cordwell et al. 1995). Another advantage of peptide mass fingerprinting technology is that protein mixtures (eg 2 or more proteins migrating to the same spot on the 2-D gel) can be analysed. This technology is cheap once the capital equipment has been purchased. Mass spectrometry is a fast evolving technology that is likely to impact increasingly on Proteome studies, especially as the techniques become automated (Houthaeve et al. 1995).

Our approach to Proteome research is to use a hierarchy of techniques such that many proteins are screened quickly and cheaply, followed by more intensive (and expensive) work on a small group of proteins of special interest. Therefore, Proteome studies can be viewed as a sequence of operations.

Protein identification using mass spectrometry or Edman sequencing : While many of the proteins separated on the 2D gels will be identified in the quick screens described above, some will not. These will include proteins present at levels below the sensitivity of the above screening methods and of course proteins that have not been identified by DNA sequencing ("new" proteins, not found in the databases). These proteins will require more detailed study including MS/MS sequencing of peptides, and extended Edman sequencing using high sensitivity protein sequencing.

Biotechnological Applications

The exciting thing about a paradigm shift is that a whole new world opens up. This is certainly the case for Proteome studies. In the first instance there are exciting opportunities in development of new analytical instrumentation. Clearly the next step for mass screening applications of this technology is to interface and robotise the technology, and we predict this will be a major issue in the next five years.

Of course biotechnology is about using biology for technological developments, and so at the end of the day the advances have an impact on biology. We predict a major effect on discovery in biology from the new Proteome technology. Indeed any system which is complex is amenable to the Proteome approach. For example researchers studying subcellular organelles can look at the complexity of their system merely by purifying the organelle and subjecting it to 2-D electrophoresis, followed by protein identification and study of post-translational modifications. This technology is not only of relevance to the basic researcher as, effectively, any changed state (eg disease) can be studied to understand what has altered. With the establishment of APAF (Australian Proteome Analysis Facility; see story this issue), Australian scientists are well placed to be at the forefront of this revolution.

References

Bjellqvist, B., Sanchez, J-C., Pasquali, C., Ravier, F., Paquet, N., Frutiger, S., Hughes, G.J. & Hochstrasser, D.F. (1993). Micropreparative 2-D electrophoresis allowing the separation of milligram amounts of proteins. Electrophoresis 14.1375-1378.

Cordwell, S.J., Wilkins, M.R., Cerpa-Poljak, A., Gooley, A.A., Duncan, M., Williams, K.L. & Humphery-Smith, I. (1995). Cross-species identification of proteins separated by two-dimensional gel electrophoresis using matrix-assisted laser desorption ionisation/time-of-flight mass spectrometry and amino acid composition. Electrophoresis 16.438-443.

Fleischmann, R.D. et al (1995). Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 269.496-512.

Fraser, C.M. et al (1995). The Minimal Gene Complement of Mycoplasma genitalium. Science 270.397-403.

Houthaeve, T., Gausepohl, H., Mann, M. & Ashman, K. (1995). FEBS Lett. 376.91-94.

Matsudaira, P. (1987). Sequence of picomole quantities of proteins electroblotted onto polyvinylidene difluoride membranes. J.Biol.Chem. 262.10035-10038.

Packer, N.H., Wilkins, M.R., Golaz, O., Lawson, M.A., Gooley, A.A., Hochstrasser, D.F., Redmond, J.W. & Williams, K.L. (1996). Characterization of human plasma glycoproteins separated by two-dimensional gel electrophoresis. Bio/Technology 14.66-70.

Wasinger, V.C., Cordwell, S.J., Poljak, A., Yan, J.X., Gooley, A.A., Wilkins, M.R., Duncan, M., Harris, R., Williams, K.L. & Humphery-Smith, I. (1995). Progress with gene-product mapping of the mollicutes: Mycoplasma genitalium. Electrophoresis 16.1090-1094.

Wilkins, M.R., Sanchez, J-C., Gooley, A.A., Appel, R.D., Humphery-Smith, I., Hochstrasser, D.F. & Williams, K.L. (1995). Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol. Genet. Eng. Rev. 13.19-50.

Wilkins, M.R., Pasquali, C., Appel, R.D., Ou, K., Golaz, O., Sanchez, J-C., Yan, J.X., Gooley, A.A., Hughes, G., Humphery-Smith, I., Williams, K.L. & Hochstrasser, D.F. (1996). From proteins to proteomes: large scale protein identification by 2-dimensional electrophoresis and amino acid analysis. Bio/Technology 14.61-65.

Wilkins, M., Ou, K., Appel, R.D., Sanchez, J-C., Yan, J.X., Golaz, O., Farnsworth, V., Cartier, P., Hochstrasser, D., Williams, K.L. & Gooley, A.A. Rapid protein identification using N-terminal sequence tag and amino acid analysis. Biochemical and Biophysical Research Communications (In Press).

Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T. & Mann, M. (1996). Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature 379.466-469.

Acknowledgements

We thank the Australian Research Council, the Australian Medical Research Council, and our commercial partners (Beckman Instruments, GBC Scientific Equipment, Gradipore Ltd, Fisons Instruments) who have supported our development of Proteome research.

--------------------------------------------------------------
Table 1. Attributes of Proteins used in their Identification
--------------------------------------------------------------
Isoelectric point 
Apparent molecular mass (from 2-D gel)  
Real molecular mass (by mass spectrometry) 
Amino acid composition 
N-terminal sequence 
C-terminal sequence 
Peptide mass fingerprint 
Internal sequence tags 
Staining characteristics 
Monoclonal antibody epitope 
Presence of modifications (phosphate, sulfate, sugars) 
--------------------------------------------------------------

    Figure 1. Two-dimensional gel electrophoresis of Tammar Wallaby whey proteins. The first dimension IPG strip was of pI range 4 to 7. The second dimension slab gel was an 8-17% T gradient gel. Proteins were visualised by silver staining (courtesy of Mark Molloy, MUCAB).

Copyright 1996 Australian Biotechnology Association Ltd.

