Prior knowledge network (PKN) for COSMOS

The prior knowledge network (PKN) used by COSMOS is a network of heterogeneous causal interactions: it contains protein-protein, reactant-enzyme and enzyme-product interactions. It is a combination of multiple resources:

Genome-scale metabolic model (GEM) from Chalmers Sysbio (Wang et al., 2021)
Network of chemical-protein interactions from STITCH (https://stitch.embl.de/)
Protein-protein interactions from Omnipath (Türei et al., 2021)

This function downloads, processes and combines the resources above. A build from scratch, including all downloads and processing, might take 30-40 minutes. Data is cached at several stages of processing, which shortens subsequent builds: with all data downloaded and the HMDB ID translation data preprocessed, the build takes 3-4 minutes. The complete PKN is also saved in the cache; if it is available there, loading it takes only a few seconds.

Usage

cosmos_pkn(
  organism = "human",
  protein_ids = c("uniprot", "genesymbol"),
  metabolite_ids = c("hmdb", "kegg"),
  chalmers_gem_metab_max_degree = 400L,
  stitch_score = 700L,
  ...
)

Arguments

organism: Character or integer: name or NCBI Taxonomy ID of an organism. Supported organisms vary by resource: the Chalmers GEM is available only for human, mouse, rat, fish, fly and worm. OmniPath can be translated by orthology, but for non-vertebrate or less researched taxa very few orthologues are available. STITCH is available for a large number of organisms; please refer to their web page: https://stitch.embl.de/.
protein_ids: Character: translate the protein identifiers to these ID types. Each ID type results in two extra columns in the output, one for the "source" and one for the "target" side of the interaction. The default ID type for proteins depends on the resource, hence the "source" and "target" columns are heterogeneous. By default UniProt IDs and Gene Symbols are included. The Gene Symbols used in the COSMOS PKN are provided by Ensembl, and do not completely agree with the ones provided by UniProt and used in OmniPath data by default.
metabolite_ids: Character: translate the metabolite identifiers to these ID types. Each ID type results in two extra columns in the output, one for the "source" and one for the "target" side of the interaction. The default ID type for metabolites depends on the resource, hence the "source" and "target" columns are heterogeneous. By default HMDB IDs and KEGG IDs are included.
chalmers_gem_metab_max_degree: Numeric: remove metabolites from the Chalmers GEM network with degrees larger than this. Useful to remove cofactors and over-promiscuous metabolites.
stitch_score: Include interactions from STITCH with combined confidence score larger than this.
...: Further parameters to omnipath_interactions.
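
As an illustration of how the arguments above fit together, the following is a minimal sketch of a call with non-default values. It assumes that cosmos_pkn is provided by the OmnipathR package; the wrapper name build_mouse_pkn and the chosen threshold values are hypothetical and serve only as an example. The call is wrapped in a function so that nothing is downloaded until the function is invoked.

```r
# Sketch only: assumes cosmos_pkn() is exported by the OmnipathR package.
# build_mouse_pkn is a hypothetical helper name used for this illustration.
build_mouse_pkn <- function() {
  OmnipathR::cosmos_pkn(
    organism = 10090L,                      # NCBI Taxonomy ID of mouse
    protein_ids = "genesymbol",             # translate proteins to Gene Symbols only
    metabolite_ids = "hmdb",                # translate metabolites to HMDB IDs only
    chalmers_gem_metab_max_degree = 200L,   # stricter than the default of 400
    stitch_score = 900L                     # stricter than the default of 700
  )
}
```

Calling build_mouse_pkn() triggers the actual build, which can take a long time on the first run when nothing is cached yet.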

Value

A data frame of binary causal interactions with effect signs and resource specific attributes, translated to the desired identifiers. The "record_id" column identifies the original records within each resource. If one "record_id" yields multiple records in the final data frame, it is the result of one-to-many ID translation or other processing steps. Before use, it is recommended to select one pair of ID type columns (by combining the preferred ones) and perform "distinct" by the identifier columns and sign.
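
The recommended post-processing can be sketched on a toy data frame as follows. This is only an illustration under stated assumptions: the translated column names (genesymbol_source, hmdb_target, and so on), the placeholder identifiers and the use of dplyr are not taken from the actual output of cosmos_pkn and should be adapted to the columns present in the real data frame.

```r
library(dplyr)

# Toy stand-in for the output of cosmos_pkn(). The column names and the
# placeholder identifiers are assumptions made for this illustration.
pkn <- data.frame(
  record_id         = c(1L, 1L, 2L, 3L),
  sign              = c(1L, 1L, -1L, 1L),
  genesymbol_source = c("GENE_A", "GENE_A", NA, "GENE_C"),
  genesymbol_target = c(NA, NA, "GENE_B", "GENE_D"),
  hmdb_source       = c(NA, NA, "HMDB_X", NA),
  hmdb_target       = c("HMDB_Y", "HMDB_Y", NA, NA),
  # One-to-many translation: record 1 maps to two KEGG IDs, hence two rows.
  kegg_target       = c("KEGG_1", "KEGG_2", NA, NA),
  stringsAsFactors  = FALSE
)

pkn_simple <- pkn %>%
  mutate(
    # Combine the preferred ID types: Gene Symbol for proteins,
    # HMDB ID for metabolites.
    source_id = coalesce(genesymbol_source, hmdb_source),
    target_id = coalesce(genesymbol_target, hmdb_target)
  ) %>%
  # Keep one row per unique interaction and sign.
  distinct(source_id, target_id, sign)
```

In this toy case the two rows that share record_id 1 differ only in the unused KEGG column, so they collapse into a single interaction after the "distinct" step.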

References

Wang H, Robinson JL, Kocabas P, Gustafsson J, Anton M, Cholley PE, et al. Genome-scale metabolic network reconstruction of model animals as a platform for translational research. Proceedings of the National Academy of Sciences. 2021 Jul 27;118(30):e2102344118.

Türei D, Valdeolivas A, Gul L, Palacio‐Escat N, Klein M, Ivanova O, et al. Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology. 2021 Mar;17(3):e9923.

Examples

if (FALSE) { # \dontrun{
    human_cosmos <- cosmos_pkn(organism = "human")
} # }
