Photo by National Cancer Institute
The genome, the complete set of an organism’s DNA, carries the code for life in its genes. Genes dictate the traits and characteristics an organism exhibits. However, genes alone do not tell us everything about the body’s diseases and disorders. The next step in decoding the genome is analysing the proteome: the full set of proteins, including peptides (short chains of amino acids, the basic units of a protein), that are produced according to the information in the genes. These proteins are essential components of our cells and keep our body functioning.
The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, a decade after the Human Genome Project released its decoded sequence of the human genome. The HPP is an international collaboration that aims to assemble, analyse and understand the molecular nature of the proteome. In a recent study, HUPO researchers, including researchers from the Indian Institute of Technology Bombay (IIT Bombay), discuss the highly stringent standards the HPP uses to process and classify human proteins. The study was published in the journal Nature Communications.
The Project has two main objectives. First, it aims to catalogue the parts of the complex human proteome by establishing reliable standards. Second, it intends to make proteomics an integral part of life science studies, to help understand the myriad roles proteins play in diseases. The HPP also provides crucial biochemical data that cannot be obtained from the genome alone, such as the changes a protein undergoes after it is synthesised.
“The progressing technology of bioinformatics depends largely on data analysis, and still faces limitations in false discovery of protein function. The growth of the proteomics arena has understood this problem and provided stringencies in terms of protein and peptide identifications,” says Prof Sanjeeva Srivastava from IIT Bombay, the lead author of this study from India.
The HPP relies on four resources to enforce its stringency in correctly identifying and classifying proteins. First, it uses antibodies to identify proteins; here, the Project details antibody-based techniques for locating proteins and understanding their roles. Second, it employs mass spectrometry (MS), a method that identifies proteins by measuring the mass-to-charge ratios of their ionised fragments, and sets standards for the instruments and workflows used to process the raw MS data.
Third, the HPP draws on pathology to provide the epidemiological evidence, access to clinical samples and diagnostic regulatory policies needed to find the proteins responsible for several disorders. Finally, it compiles all this information in a knowledge base (KB) that holds the structural and functional information about proteins and makes it available to the community. One such KB, neXtProt, includes MS data (obtained from various other databases), antibody data, and information on the interactions among proteins and the influence of genes.
The neXtProt database sorts proteins into five levels of credibility called Protein Existence (PE) levels. PE1 covers proteins with clear experimental evidence at the protein level. PE2 covers proteins supported only by evidence at the transcript level. PE3 covers proteins inferred from their similarity to related proteins. PE4 covers proteins that are only predicted from genomic data. PE5 holds uncertain or dubious entries, usually the result of incorrect analysis. As of 2020, 90.4% of the human proteome (approximately 17,900 proteins) has credible PE1 evidence. The remaining 9.6% (about 1,900 proteins, also called the “missing proteome”) sits at the PE2, PE3 and PE4 levels and is yet to be identified at high stringency.
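For readers who think in code, the grouping described above can be sketched in a few lines of Python. This is only an illustration, not neXtProt's own software or API: the function name `summarise` and the toy protein names are hypothetical. It also assumes, following the percentages quoted above, that PE5 entries are left out of the total when the PE1 share is calculated.

```python
from collections import Counter

# Grouping taken from the text: PE1 is confidently detected,
# PE2 to PE4 make up the "missing proteome", PE5 is set aside.
MISSING_LEVELS = {2, 3, 4}


def summarise(entries):
    """Return (pe1_count, missing_count, pe1_share) for (name, pe_level) pairs.

    Assumption: PE5 entries are excluded from the denominator, so the
    PE1 share and the missing share add up to 100%.
    """
    counts = Counter(level for _, level in entries)
    pe1 = counts[1]
    missing = sum(counts[level] for level in MISSING_LEVELS)
    considered = pe1 + missing
    share = pe1 / considered if considered else 0.0
    return pe1, missing, share


# Hypothetical toy entries, not real neXtProt records.
toy_entries = [
    ("protein_A", 1),
    ("protein_B", 1),
    ("protein_C", 2),
    ("protein_D", 4),
    ("protein_E", 5),
    ("protein_F", 1),
]

pe1, missing, share = summarise(toy_entries)
print(f"PE1: {pe1}, missing (PE2-PE4): {missing}, PE1 share: {share:.1%}")
```

Run on real counts instead of the toy list, the same calculation would yield the kind of headline figure quoted above, a PE1 share of about 90%.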
Protein assay tests have long been used in medical diagnostics, but they are prone to inaccuracy. The tools of proteomics, used alongside genomics, can improve the detection of many pathogenic infections, including infection with SARS-CoV-2 (the virus that causes COVID-19), and deepen our understanding of disorders such as cancer and cardiovascular conditions.
“The initiative of HPP in identifying and characterizing human proteome has opened new avenues in the field of Proteomics. The advancement of time and technology will add new milestones which will enhance the understanding of human biology and expedite the role of proteomics in diagnosis, prognosis and precision medicine-based applications,” adds Mr Deeptarup Biswas, a PhD scholar at IIT Bombay who was a part of this study, about the further goals of the HPP.
This article has been run past the researchers whose work is covered, to ensure accuracy.