Protein and gene annotations from OmniPath

Protein and gene annotations about function, localization, expression, structure and other properties, from the https://omnipathdb.org/annotations endpoint of the OmniPath web service. Note: a few miRNAs might also be annotated. The vast majority of protein complex annotations are inferred from the annotations of the members: if all members carry the same annotation, the complex inherits it.

Usage

annotations(proteins = NULL, wide = FALSE, ...)

Arguments

proteins

Vector containing the genes or proteins for which annotations will be retrieved (UniProt IDs, HGNC Gene Symbols or miRBase IDs). It is also possible to download annotations for protein complexes. To do so, write 'COMPLEX:' right before the gene symbols of the genes that make up the complex. Check the vignette for examples.

wide

Convert the annotation table to wide format, which corresponds more or less to the original resource. If the data comes from more than one resource, a list of wide tables will be returned. See examples at pivot_annotations.

...

Arguments passed on to omnipath_query

organism: Character or integer: name or NCBI Taxonomy ID of the organism. OmniPath is built from human data, and the web service provides orthology-translated interactions and enzyme-substrate relationships for mouse and rat. For other organisms and query types, orthology translation will be called automatically on the downloaded human data before returning the result.
resources: Character vector: name of one or more resources. Restrict the data to these resources. For a complete list of available resources, call the `<query_type>_resources` functions for the query type of interest.
genesymbols: Character or logical: TRUE or FALSE or "yes" or "no". Include the `genesymbols` column in the results. OmniPath uses UniProt IDs as the primary identifiers; gene symbols are optional.
fields: Character vector: additional fields to include in the result. For a list of available fields, call `query_info("interactions")`.
default_fields: Logical: if TRUE, the default fields will be included.
silent: Logical: if TRUE, no messages will be printed. By default a summary message is printed upon successful download.
logicals: Character vector: fields to be cast to logical.
format: Character: if "json", JSON will be retrieved and processed into a nested list; any other value will return data frame.
download_args: List: parameters to pass to the download function, which is readr::read_tsv by default, and jsonlite::stream_in if format = "json". Note: as both are wrapped into a downloader using curl::curl, a curl handle can also be passed here under the name handle.
add_counts: Logical: if TRUE, the number of references and number of resources for each record will be added to the result.
license: Character: license restrictions. By default, data from resources allowing "academic" use is returned by OmniPath. If you use the data for work in a company, you can provide "commercial" or "for-profit", which will restrict the data to those records which are supported by resources that allow for-profit use.
password: Character: password for the OmniPath web service. You can provide a special password here which enables the use of `license = "ignore"` option, completely bypassing the license filter.
exclude: Character vector: resource or dataset names to be excluded. The data will be filtered after download to remove records of the excluded datasets and resources.
strict_evidences: Logical: reconstruct the "sources" and "references" columns of interaction data frames based on the "evidences" column, strictly filtering them to the queried datasets and resources. Without this, the "sources" and "references" fields for each record might contain information for datasets and resources other than the queried ones, because the downloaded records are a result of a simple filtering of an already integrated data frame.
genesymbol_resource: Character: "uniprot" (default) or "ensembl". The OmniPath web service uses the primary gene symbols as provided by UniProt. By passing "ensembl" here, the UniProt gene symbols will be replaced by the ones used in Ensembl. This translation results in a loss of a few records, and multiplication of another few records due to ambiguous translation.
cache: Logical: use caching, load data from and save to the cache. The cache directory by default belongs to the user, is located in the user's default cache directory, and is named "OmnipathR". Find its path with getOption("omnipathr.cachedir"). It can be changed by omnipath_set_cachedir.
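
The 'COMPLEX:' prefix described for the proteins argument can be sketched as below. The helper `complex_id` is hypothetical and not part of OmnipathR; it assumes that the member gene symbols are joined by underscores, which is how complex names appear to be written in OmniPath. The vignette remains the reference for the exact form.

```r
# Hypothetical helper, not part of OmnipathR: build a complex identifier
# from the gene symbols of its members.
# Assumption: members are separated by underscores after the prefix.
complex_id <- function(genesymbols) {
    paste0("COMPLEX:", paste(genesymbols, collapse = "_"))
}

# Mix a single protein and a complex in one query vector
query <- c("TP53", complex_id(c("SMAD2", "SMAD3", "SMAD4")))
print(query)

# Passing the vector on requires OmnipathR and network access:
# library(OmnipathR)
# complex_annot <- annotations(proteins = query)
```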

Value

A data frame or list of data frames:

If wide=FALSE (default), all the requested resources will be in a single long format data frame.
If wide=TRUE: one or more data frames with columns specific to the requested resources. If more than one resource is requested, a list of data frames is returned.
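
To illustrate the difference between the two shapes, here is a toy sketch in base R. It is not the package's implementation (see pivot_annotations for that). The column names `uniprot`, `source`, `label`, `value` and `record_id` are assumed to mirror the long table, and the rows are made up for illustration only.

```r
# Toy long-format table: one row per (record, label) pair.
# The values are invented and do not come from OmniPath.
long <- data.frame(
    uniprot = c("P04637", "P04637", "P02545", "P02545"),
    source = "HPA_subcellular",
    label = c("location", "status", "location", "status"),
    value = c("Nucleoplasm", "Supported", "Nuclear membrane", "Enhanced"),
    record_id = c(1, 1, 2, 2)
)

# Wide format: one row per record, one column per label
wide <- reshape(
    long,
    idvar = c("uniprot", "source", "record_id"),
    timevar = "label",
    direction = "wide"
)
print(wide)
```

Because each resource has its own set of labels, the wide columns differ between resources, which is why a query over several resources yields a list of data frames rather than a single one.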

Details

Downloading the full annotations dataset is disabled by default because the size of this data is around 1 GB. We recommend retrieving the annotations for a set of proteins or only from a few resources, depending on your interest. You can always download the full database from https://archive.omnipathdb.org/omnipath_webservice_annotations__recent.tsv using any standard R or readr method.
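
One possible way to fetch the full table is sketched below. The helper `fetch_full_annotations` is hypothetical and not part of OmnipathR; it assumes that readr (version 2.0 or later) is installed and that there is enough disk space and memory for a file of around 1 GB. The download timeout is raised temporarily because the default of base R is likely too short for a file of this size.

```r
# URL as given in the Details section
full_url <- paste0(
    "https://archive.omnipathdb.org/",
    "omnipath_webservice_annotations__recent.tsv"
)

# Hypothetical helper, not part of OmnipathR: download the file once,
# then read it from disk. Calling it transfers around 1 GB.
fetch_full_annotations <- function(
    dest = file.path(tempdir(), basename(full_url))
) {
    if (!file.exists(dest)) {
        # Raise the timeout for this call only, restore it on exit
        old <- options(timeout = max(3600, getOption("timeout")))
        on.exit(options(old), add = TRUE)
        utils::download.file(full_url, destfile = dest, mode = "wb")
    }
    readr::read_tsv(dest, show_col_types = FALSE)
}

# Not run here because of the size of the download:
# full_annot <- fetch_full_annotations()
```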

Examples

library(OmnipathR)

# Subcellular localization of two proteins from a single resource
annotations <- annotations(
    proteins = c("TP53", "LMNA"),
    resources = c("HPA_subcellular")
)
