Journal of Comprehensible Results

Wuchty, S., Rajagopala, S. V., Blazie, S. M., Parrish, J. R., Khuri, S., Finley, R. L., & Uetz, P. (2017).
The Protein Interactome of Streptococcus pneumoniae and Bacterial Meta-interactomes Improve Function Predictions
DOI: 10.1128/mSystems.00019-17

Translated by Farhana Khan

Support 2: Ortholog Determination via InParanoid

The BLASTp algorithm was used to compare the protein sequences of Streptococcus pneumoniae with those of the other bacterial strains named in the introduction. For each comparison, sequence pairs that were each other's best-scoring match were selected as the main ortholog pairs. This was done with the InParanoid script, which is based on two-way (reciprocal) best pairwise matches, comparing two proteomes at a time in both directions. InParanoid identifies orthologs automatically while also distinguishing them from paralogs [def Genes related through a duplication event, which can evolve different functions]. The script outputs ortholog clusters built around the pairs with the best scores [def The highest similarity scores that BLAST reports between two sequences] and marks the paralogs separately [4 Remm M, Storm CE, Sonnhammer EL. 2001. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 314:1041–1052.].
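The two-way best-match idea can be illustrated with a minimal Python sketch. It assumes the BLASTp bit scores for both search directions are already held in dictionaries keyed by (query, subject); the protein identifiers (`Spn_*`, `Oth_*`) and the function names are hypothetical, and the additional score and overlap cutoffs that InParanoid applies are left out.

```python
# Minimal sketch of two-way (reciprocal) best hits, the idea InParanoid builds on.
# Assumption: scores are dicts mapping (query, subject) -> BLASTp bit score.
# All identifiers below are hypothetical.


def best_hits(scores):
    """Return {query: subject}, keeping the highest-scoring subject per query."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B AND a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores: S. pneumoniae proteins (Spn_*) vs. another strain (Oth_*).
    spn_vs_oth = {
        ("Spn_1", "Oth_1"): 250.0,
        ("Spn_1", "Oth_2"): 90.0,
        ("Spn_2", "Oth_2"): 180.0,
        ("Spn_3", "Oth_2"): 120.0,
    }
    oth_vs_spn = {
        ("Oth_1", "Spn_1"): 245.0,
        ("Oth_2", "Spn_2"): 175.0,
        ("Oth_2", "Spn_3"): 118.0,
    }
    print(reciprocal_best_hits(spn_vs_oth, oth_vs_spn))
    # [('Spn_1', 'Oth_1'), ('Spn_2', 'Oth_2')]
```

Here `Spn_3` is not paired: its best hit is `Oth_2`, but the best hit of `Oth_2` in the reverse search is `Spn_2`, so the match is only one-way.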
Process of the InParanoid script: workflow for running the two BLAST passes in parallel, in which the pairs found in pass 1 are combined with the results of pass 2 [5 Sonnhammer EL, Östlund G. 2015. InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res.].

To run the InParanoid script, two sets of protein sequences are provided: one from the S. pneumoniae strain and one from another bacterial strain whose proteins are tested for orthology to those of S. pneumoniae. The reciprocal best hits between the two data sets are then identified; these are the main ortholog pairs. Each main pair serves as the center of one of the ortholog clusters that the article aims to visualize. To find additional members of each cluster, further BLASTp comparisons are run, including comparisons of each proteome against itself.
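The scores used in these comparisons come from BLASTp output files. The Python sketch below assumes each search was saved in BLAST+'s 12-column tabular format (`-outfmt 6`), where the last column is the bit score; the function name is hypothetical, and keeping only the highest-scoring alignment per protein pair is a simplification chosen for illustration.

```python
# Sketch: reading BLASTp tabular output (-outfmt 6) into pairwise scores.
# Columns: qseqid sseqid pident length mismatch gapopen
#          qstart qend sstart send evalue bitscore
import io


def read_blast_tabular(handle):
    """Return {(query, subject): highest bit score} from -outfmt 6 lines."""
    scores = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        bitscore = float(fields[11])
        key = (query, subject)
        # One protein pair can yield several alignment lines; keep the best.
        if bitscore > scores.get(key, float("-inf")):
            scores[key] = bitscore
    return scores


if __name__ == "__main__":
    # Made-up lines for hypothetical proteins Spn_1, Oth_1 and Oth_2.
    example = io.StringIO(
        "Spn_1\tOth_1\t62.5\t300\t110\t1\t1\t300\t1\t298\t1e-80\t250.0\n"
        "Spn_1\tOth_1\t40.0\t50\t30\t1\t310\t360\t305\t355\t1e-3\t35.2\n"
        "Spn_1\tOth_2\t30.1\t280\t190\t9\t5\t285\t2\t281\t1e-10\t90.0\n"
    )
    print(read_blast_tabular(example))
    # {('Spn_1', 'Oth_1'): 250.0, ('Spn_1', 'Oth_2'): 90.0}
```

The resulting dictionaries have the same `(query, subject) -> score` shape used for the best-hit and clustering steps.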

Once these additional comparisons are done, the score S is used to determine the clusters. S is the similarity score between the two main orthologs (labelled A1 and B1 in Fig. 8): the higher the score, the shorter the distance between them. An additional sequence from the same species is placed in a main ortholog's cluster if it is closer to that main ortholog than the two main orthologs are to each other, that is, if its score against the main ortholog is higher than S.
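The S-based rule can be sketched in Python as follows. This is a simplified illustration, not InParanoid's own code: it assumes within-species BLASTp scores are stored as `{(query, subject): score}`, the identifiers and function names are hypothetical, and the handling of sequences that could belong to more than one cluster is not modelled.

```python
# Simplified sketch of the S-based clustering rule.
# A sequence joins a main ortholog's cluster if it scores higher against that
# main ortholog than the main pair's score S (higher score = shorter distance).


def members_closer_than_s(main, s_score, within_species):
    """Return sequences scoring above S against `main`, best-scoring first."""
    hits = [
        (score, other)
        for (query, other), score in within_species.items()
        if query == main and other != main and score > s_score
    ]
    return [other for _, other in sorted(hits, reverse=True)]


def build_cluster(main_pair, s_score, within_a, within_b):
    """Return both halves of one ortholog cluster, main orthologs first."""
    a1, b1 = main_pair
    cluster_a = [a1] + members_closer_than_s(a1, s_score, within_a)
    cluster_b = [b1] + members_closer_than_s(b1, s_score, within_b)
    return cluster_a, cluster_b


if __name__ == "__main__":
    # Hypothetical scores; S is the score between main orthologs Spn_1 and Oth_1.
    s = 250.0
    spn_vs_spn = {("Spn_1", "Spn_4"): 310.0, ("Spn_1", "Spn_5"): 200.0}
    oth_vs_oth = {("Oth_1", "Oth_7"): 260.0}
    print(build_cluster(("Spn_1", "Oth_1"), s, spn_vs_spn, oth_vs_oth))
    # (['Spn_1', 'Spn_4'], ['Oth_1', 'Oth_7'])
```

`Spn_5` is left out because its score against `Spn_1` (200) is lower than S, meaning it is farther from `Spn_1` than `Oth_1` is.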

A confidence value is then calculated for each member of an ortholog cluster. It indicates how far that sequence is from the main ortholog pair. The main orthologs themselves are assigned a confidence value of 100%.
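One way to express such a confidence value is a minimal sketch that assumes a linear scale: a member scoring as high against the main ortholog as the main ortholog scores against itself gets 100%, and a member scoring only S gets 0%. This mirrors the in-paralog score described for InParanoid [4], but the exact definition should be checked there; the function name and all numbers below are hypothetical.

```python
# Sketch of a confidence value on a 0-100% scale (assumed linear scaling).
# 100% = as close to the main ortholog as the main ortholog is to itself;
# 0%   = only as close as the main pair (score == S).


def confidence_percent(score_to_main, s_score, main_self_score):
    """Scale a member's score against its main ortholog to a percentage."""
    if main_self_score <= s_score:
        raise ValueError("the main ortholog's self-score must exceed S")
    fraction = (score_to_main - s_score) / (main_self_score - s_score)
    # Clamp so that values stay within 0-100%.
    return max(0.0, min(1.0, fraction)) * 100.0


if __name__ == "__main__":
    s = 250.0           # hypothetical score between main orthologs A1 and B1
    self_score = 650.0  # hypothetical score of A1 against itself
    print(confidence_percent(self_score, s, self_score))  # main ortholog: 100.0
    print(confidence_percent(350.0, s, self_score))       # cluster member: 25.0
```

The main ortholog always lands at 100% under this scaling, which matches the convention stated above.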

Fig. 8: Clustering of ortholog data. A1 and B1 are the main ortholog sequences, and S is the similarity score between the main pair, which acts as an inverse distance (the higher S is, the closer A1 and B1 are).