The prior knowledge network (PKN) used by COSMOS is a network of heterogeneous causal interactions: it contains protein-protein, reactant-enzyme and enzyme-product interactions. It is a combination of multiple resources:
- Genome-scale metabolic model (GEM) from Chalmers Sysbio (Wang et al., 2021)
- Network of chemical-protein interactions from STITCH (https://stitch.embl.de/)
- Protein-protein interactions from OmniPath (Türei et al., 2021)
This function downloads, processes and combines the resources above. Including all downloads and processing, the build might take 30-40 minutes. Data is cached at various levels of processing, which shortens subsequent builds. With all data downloaded and the HMDB ID translation data preprocessed, the build takes 3-4 minutes. The complete PKN is also saved in the cache; if it is available, loading it takes only a few seconds.
Arguments
- organism
Character or integer: name or NCBI Taxonomy ID of an organism. Supported organisms vary by resource: the Chalmers GEM is available only for human, mouse, rat, fish, fly and worm. OmniPath can be translated by orthology, but very few orthologues are available for non-vertebrate or less researched taxa. STITCH is available for a large number of organisms; please refer to its web page: https://stitch.embl.de/.
- protein_ids
Character: translate the protein identifiers to these ID types. Each ID type results in two extra columns in the output, for the "source" and "target" sides of the interaction, respectively. The default ID type for proteins depends on the resource, hence the "source" and "target" columns are heterogeneous. By default UniProt IDs and Gene Symbols are included. The Gene Symbols used in the COSMOS PKN are provided by Ensembl, and do not completely agree with the ones provided by UniProt and used in OmniPath data by default.
- metabolite_ids
Character: translate the metabolite identifiers to these ID types. Each ID type results in two extra columns in the output, for the "source" and "target" sides of the interaction, respectively. The default ID type for metabolites depends on the resource, hence the "source" and "target" columns are heterogeneous. By default HMDB IDs and KEGG IDs are included.
- chalmers_gem_metab_max_degree
Numeric: remove metabolites with degrees larger than this from the Chalmers GEM network. Useful for removing cofactors and overly promiscuous metabolites.
- stitch_score
Include interactions from STITCH with combined confidence score larger than this.
- ...
Further parameters to omnipath_interactions.
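The two filtering arguments above, chalmers_gem_metab_max_degree and stitch_score, can be illustrated on toy data. The following is only a minimal sketch in base R, not the function's actual implementation: the data frames, the column names ("source", "target", "combined_score") and the threshold values are all hypothetical, and "larger than" is taken literally as a strict inequality.

```r
# Toy metabolite-enzyme edges; all names and values are made up.
edges <- data.frame(
    source = c("ATP", "ATP", "ATP", "glucose", "lactate"),
    target = c("ENZ_1", "ENZ_2", "ENZ_3", "ENZ_1", "ENZ_4"),
    stringsAsFactors = FALSE
)
metabolites <- c("ATP", "glucose", "lactate")

# Degree: number of edges in which a metabolite appears on either side.
metab_degree <- vapply(
    metabolites,
    function(m) sum(edges$source == m | edges$target == m),
    integer(1)
)

# Hypothetical threshold: drop metabolites with degree larger than this.
max_degree <- 2
too_connected <- names(metab_degree)[metab_degree > max_degree]
filtered <- edges[
    !(edges$source %in% too_connected | edges$target %in% too_connected),
]

# Toy chemical-protein edges with a confidence score (hypothetical column).
stitch <- data.frame(
    source = c("CHEM_1", "CHEM_1", "CHEM_2"),
    target = c("PROT_1", "PROT_2", "PROT_3"),
    combined_score = c(950, 420, 700),
    stringsAsFactors = FALSE
)

# Hypothetical threshold: keep interactions scoring strictly above it.
min_score <- 700
stitch_kept <- stitch[stitch$combined_score > min_score, ]
```

In this toy network "ATP" takes part in three edges, so it is treated as a cofactor-like hub and removed together with its edges, while the two less connected metabolites remain.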
Value
A data frame of binary causal interactions with effect signs and resource-specific attributes, translated to the desired identifiers. The "record_id" column identifies the original records within each resource. If one "record_id" yields multiple records in the final data frame, this is the result of one-to-many ID translation or other processing steps. Before use, it is recommended to select one pair of ID type columns (by combining the preferred ones) and apply "distinct" on the identifier columns and the sign.
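The recommended post-processing step can be sketched on a toy data frame. This is an illustration under stated assumptions, not output of the function: the column names (for example "source_genesymbol", "target_hmdb") and all identifiers are hypothetical, and base R unique() stands in for dplyr's distinct().

```r
# Toy PKN: record 1 appears twice because of a one-to-many ID translation.
# All column names and identifiers are made up for illustration.
pkn <- data.frame(
    record_id         = c(1, 1, 2, 3),
    source_uniprot    = c("UP_1", "UP_2", NA, "UP_3"),
    source_genesymbol = c("GENE_A", "GENE_A", NA, "GENE_B"),
    source_hmdb       = c(NA, NA, "HMDB_X", NA),
    target_genesymbol = c("GENE_B", "GENE_B", "GENE_A", NA),
    target_hmdb       = c(NA, NA, NA, "HMDB_Y"),
    sign              = c(1, 1, -1, 1),
    stringsAsFactors = FALSE
)

# Combine the preferred protein and metabolite ID types into one column
# per side: use the gene symbol where present, otherwise the HMDB ID.
pick_id <- function(protein_id, metabolite_id) {
    ifelse(is.na(protein_id), metabolite_id, protein_id)
}

simple <- data.frame(
    source = pick_id(pkn$source_genesymbol, pkn$source_hmdb),
    target = pick_id(pkn$target_genesymbol, pkn$target_hmdb),
    sign   = pkn$sign,
    stringsAsFactors = FALSE
)

# Keep each (source, target, sign) combination once.
simple <- unique(simple)
```

The two rows that came from record 1 differ only in a UniProt column that is no longer selected, so they collapse into a single interaction.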
References
Wang H, Robinson JL, Kocabas P, Gustafsson J, Anton M, Cholley PE, et al. Genome-scale metabolic network reconstruction of model animals as a platform for translational research. Proceedings of the National Academy of Sciences. 2021 Jul 27;118(30):e2102344118.
Türei D, Valdeolivas A, Gul L, Palacio‐Escat N, Klein M, Ivanova O, et al. Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology. 2021 Mar;17(3):e9923.