Journal of Comprehensible Results

Wuchty, S., Rajagopala, S. V., Blazie, S. M., Parrish, J. R., Khuri, S., Finley, R. L., & Uetz, P. (2017).
The Protein Interactome of Streptococcus pneumoniae and Bacterial
Meta-interactomes Improve Function Predictions
DOI: 10.1128/mSystems.00019-17

Translated by Farhana Khan

Experiment 3: Functional Predictions of Unknown Proteins in S. pneumoniae

Although many protein interactions were found via the yeast two-hybrid method and by assembling a meta-interactome from other bacteria, a large number of proteins still had no known function. Therefore, in the study, a random set of proteins with known functions was chosen 1,000 times and used to predict the functions of the unknown proteins. This approach rests on the observation that neighboring proteins in the same network tend to have similar functions. In a given protein network (interactome), the interaction partners of a protein can indicate the functional classes of that protein, provided the functions of the partners are known [3: Meier M, Sit RV, Quake SR. 2013. Proteome-wide protein interaction measurements of bacterial proteins of unknown function. Proc Natl Acad Sci U S A 110:477.]. The EggNOG database was used as the source of protein function annotations.
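The idea that interaction partners hint at a protein's function can be sketched in a few lines of Python. This is only an illustration based on simple neighbor counting, not the stochastic model used in the study; the protein names, the toy network, and the function labels are all made up for the example.

```python
from collections import Counter

def function_profile(protein, network, annotations):
    """Return the share of each functional class among a protein's annotated partners."""
    counts = Counter()
    for partner in network.get(protein, ()):
        # Partners without a known function contribute nothing.
        for func in annotations.get(partner, ()):
            counts[func] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no annotated partners, so no prediction
    return {func: n / total for func, n in counts.items()}

# Hypothetical toy data: one unknown protein with three annotated partners.
network = {"P_unknown": ["A", "B", "C"]}
annotations = {"A": ["transport"], "B": ["transport"], "C": ["translation"]}

profile = function_profile("P_unknown", network, annotations)
print(profile)
```

In this toy case two of the three partners are annotated as transport proteins, so "transport" receives the larger share of the profile.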

A stochastic model was used to build a profile for each protein in the strain. This profile describes the probability that the protein has a specific function. These probabilities were then used to create receiver operating characteristic (ROC) curves to assess the accuracy of the predictions. ROC curves are commonly used in diagnostic testing, and the area under an ROC curve indicates how well a parameter distinguishes between two groups (in this case, whether or not the protein has the specified function). The first graph to the right shows that adding the meta-interactome gave better functional predictions for the unknown proteins.
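To make the area under the ROC curve more concrete, here is a minimal sketch that computes it directly from its pairwise interpretation: the chance that a randomly chosen protein that truly has the function scores higher than one that does not. The scores and labels are invented for illustration and do not come from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true labels (1 = has the function).
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

A value of 1.0 would mean the scores separate the two groups perfectly, while 0.5 is what random guessing would give.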

Once the probabilities were determined for all unknown proteins, a Z test was applied to obtain a P value for each of these scores.
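As a rough sketch of how a Z test turns a score into a P value: the score is compared with the mean and standard deviation of a reference set of scores, and the resulting Z value is converted to a one-sided P value under a normal distribution. The reference ("null") scores below are an assumption made for the example; the summary does not say how the comparison distribution was built.

```python
import math
from statistics import mean, stdev

def z_test_p_value(score, null_scores):
    """Return (z, one-sided p) for a score against assumed null scores."""
    mu = mean(null_scores)
    sigma = stdev(null_scores)  # sample standard deviation
    z = (score - mu) / sigma
    # Upper-tail probability of a standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Hypothetical null scores, e.g. scores obtained for randomized annotations.
null_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
z_mid, p_mid = z_test_p_value(0.3, null_scores)
z_high, p_high = z_test_p_value(0.9, null_scores)
print(p_mid, p_high)
```

A score equal to the null mean gives a P value of about 0.5, and scores far above the mean give small P values.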

Fig. 5: Visualization of the ROC curve. With a sample of 20% from the bacterial strain, the area under the curve was calculated to measure the quality of the function predictions for the orthologous proteins.