This is the most generic method for accessing data from the OmniPath web service. All other functions retrieving data from OmniPath call this function with various parameters. In general, every query can retrieve data in tabular or JSON format, with the tabular format (data frame) being the default.
Usage
omnipath_query(
query_type,
organism = 9606L,
resources = NULL,
datasets = NULL,
types = NULL,
genesymbols = "yes",
fields = NULL,
default_fields = TRUE,
silent = FALSE,
logicals = NULL,
download_args = list(),
format = "data.frame",
references_by_resource = TRUE,
add_counts = TRUE,
license = NULL,
password = NULL,
exclude = NULL,
json_param = list(),
strict_evidences = FALSE,
genesymbol_resource = "UniProt",
cache = NULL,
...
)
Arguments
- query_type
Character: "interactions", "enzsub", "complexes", "annotations", or "intercell".
- organism
Character or integer: name or NCBI Taxonomy ID of the organism. OmniPath is built from human data, and the web service provides orthology-translated interactions and enzyme-substrate relationships for mouse and rat. For other organisms and query types, orthology translation is called automatically on the downloaded human data before the result is returned.
- resources
Character vector: name of one or more resources. Restrict the data to these resources. For a complete list of available resources, call the `<query_type>_resources` function for the query type of interest.
- datasets
Character vector: name of one or more datasets. The interactions query type offers a number of datasets. The default is called "omnipath", and corresponds to the curated causal signaling network published in the 2016 OmniPath paper.
- types
Character vector: one or more interaction types, such as "transcriptional" or "post_translational". For a full list of interaction types see `query_info("interaction")$types`.
- genesymbols
Character or logical: TRUE or FALSE, or "yes" or "no". Include the `genesymbols` column in the results. OmniPath uses UniProt IDs as the primary identifiers; gene symbols are optional.
- fields
Character vector: additional fields to include in the result. For a list of available fields, call `query_info("interactions")`.
- default_fields
Logical: if TRUE, the default fields will be included.
- silent
Logical: if TRUE, no messages will be printed. By default a summary message is printed upon successful download.
- logicals
Character vector: fields to be cast to logical.
- download_args
List: parameters to pass to the download function, which is `readr::read_tsv` for the default tabular format, and `jsonlite::safe_load` for JSON.
- format
Character: if "json", JSON will be retrieved and processed into a nested list; any other value will return a data frame.
- references_by_resource
Logical: if TRUE, the PubMed IDs in the `references` column will be prefixed with the names of the resources they come from. If FALSE, the `references` column will be a list of unique PubMed IDs.
- add_counts
Logical: if TRUE, the number of references and number of resources for each record will be added to the result.
- license
Character: license restrictions. By default, data from resources allowing "academic" use is returned by OmniPath. If you use the data for work in a company, you can provide "commercial" or "for-profit", which will restrict the data to those records which are supported by resources that allow for-profit use.
- password
Character: password for the OmniPath web service. You can provide a special password here which enables the `license = "ignore"` option, completely bypassing the license filter.
- exclude
Character vector: resource or dataset names to be excluded. The data will be filtered after download to remove records of the excluded datasets and resources.
- json_param
List: parameters to pass to `jsonlite::fromJSON` when processing JSON columns embedded in the downloaded data. Such columns are "extra_attrs" and "evidences". These are optional columns which provide a lot of extra details about interactions.
- strict_evidences
Logical: reconstruct the "sources" and "references" columns of interaction data frames based on the "evidences" column, strictly filtering them to the queried datasets and resources. Without this, the "sources" and "references" fields for each record might contain information for datasets and resources other than the queried ones, because the downloaded records are a result of a simple filtering of an already integrated data frame.
- genesymbol_resource
Character: "uniprot" (default) or "ensembl". The OmniPath web service uses the primary gene symbols as provided by UniProt. By passing "ensembl" here, the UniProt gene symbols will be replaced by the ones used in Ensembl. This translation results in a loss of a few records, and multiplication of another few records due to ambiguous translation.
- cache
Logical: use caching, load data from and save to the cache. The cache directory by default belongs to the user, is located in the user's default cache directory, and is named "OmnipathR". Find out its location with `getOption("omnipathr.cachedir")`. It can be changed with `omnipath_set_cachedir`.
- ...
Additional parameters for the OmniPath web service. These parameters will be processed, validated and included in the query string. Many parameters are already explicitly set by the arguments above. A number of query type specific parameters are also available; learn more about these with the `query_info` function. For functions more specific than `omnipath_query`, arguments for all downstream functions are also passed here.
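To make the `references_by_resource` and `add_counts` arguments more concrete, the base R sketch below works on a made-up `references` string. It assumes the prefixed form looks like `resource:PMID` with entries separated by semicolons. That layout, the resource names, the PubMed IDs and the helper name `strip_resource_prefix` are all illustrative assumptions, not taken from the package.

```r
# Made-up value of the `references` column as it might look with
# references_by_resource = TRUE (assumed format: "resource:PMID;...").
refs_prefixed <- "ResourceA:11111111;ResourceB:11111111;ResourceA:22222222"

# Hypothetical helper: drop the resource prefixes and keep unique PubMed IDs,
# similar in spirit to what references_by_resource = FALSE is described
# to return.
strip_resource_prefix <- function(refs) {
    entries <- strsplit(refs, ";", fixed = TRUE)[[1]]
    unique(sub("^[^:]*:", "", entries))
}

pmids <- strip_resource_prefix(refs_prefixed)

# Counting the unique PubMed IDs gives a number comparable to the
# `n_references` column that add_counts = TRUE adds.
n_references <- length(pmids)
print(pmids)
print(n_references)
```

The same PubMed ID reported by two resources is counted once here, which is why keeping or dropping the resource prefix changes how the column reads.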
Value
Data frame (tibble) or list: the data returned by the OmniPath web service (or loaded from cache), after processing. Nested list if the "format" parameter is "json", otherwise a tibble.
Examples
interaction_data <- omnipath_query("interaction", datasets = "omnipath")
interaction_data
#> # A tibble: 81,529 × 15
#> source target source_genesymbol target_genesymbol is_directed is_stimulation
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 P0DP23 P48995 CALM1 TRPC1 1 0
#> 2 P0DP25 P48995 CALM3 TRPC1 1 0
#> 3 P0DP24 P48995 CALM2 TRPC1 1 0
#> 4 Q03135 P48995 CAV1 TRPC1 1 1
#> 5 P14416 P48995 DRD2 TRPC1 1 1
#> 6 Q99750 P48995 MDFI TRPC1 1 0
#> 7 Q14571 P48995 ITPR2 TRPC1 1 1
#> 8 P29966 P48995 MARCKS TRPC1 1 0
#> 9 Q13255 P48995 GRM1 TRPC1 1 1
#> 10 Q13586 P48995 STIM1 TRPC1 1 1
#> # ℹ 81,519 more rows
#> # ℹ 9 more variables: is_inhibition <dbl>, consensus_direction <dbl>,
#> # consensus_stimulation <dbl>, consensus_inhibition <dbl>, sources <chr>,
#> # references <chr>, curation_effort <dbl>, n_references <int>,
#> # n_resources <int>
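Several of the arguments described above can be combined in one call. The sketch below restricts the query to two resources and asks for mouse data with plain PubMed IDs. The resource names and the organism are only example values, and the call is guarded so that it runs only when the OmnipathR package is installed (it also needs network access).

```r
# Illustrative argument values; the argument names come from the Usage section.
query_args <- list(
    query_type = "interactions",
    resources = c("SIGNOR", "SignaLink3"),  # example resource names
    organism = 10090L,                      # NCBI Taxonomy ID of mouse
    references_by_resource = FALSE,         # unique PubMed IDs, no prefixes
    silent = TRUE                           # suppress the summary message
)

# Run the query only if OmnipathR is available; this step contacts the
# OmniPath web service (or the local cache).
if (requireNamespace("OmnipathR", quietly = TRUE)) {
    mouse_interactions <- do.call(OmnipathR::omnipath_query, query_args)
    print(mouse_interactions)
}
```

Collecting the arguments in a list first makes it easy to inspect or reuse them; calling `omnipath_query` directly with the same named arguments is equivalent.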