Translate gene, protein and small molecule identifiers from multiple columns
Source: R/id_mapping.R
Especially when translating network interactions, where two ID columns exist
(source and target), it is convenient to call the same ID translation on
multiple columns. The translate_ids function can already translate to
multiple ID types in one call, but it works from only one source column.
Here too, multiple target IDs are supported. The source columns can be listed
explicitly, or they can share a common stem. In the latter case the first
element of ... is used as the stem, and the column names are created by
adding the suffixes. The suffixes are also used to name the target columns.
If no suffixes are provided, the names of the source columns are added to
the names of the target columns. ID types can be defined the same way as for
translate_ids. The only limitation is that, if the source columns are
provided as stem+suffixes, they must all be of the same ID type.
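As a rough illustration of the stem and suffix convention described above, the following base R sketch composes the source column names that would be looked up in the data frame. The stem `id`, the suffixes `a` and `b`, and the toy data frame are made up for this illustration. The snippet does not call OmnipathR and only mirrors the naming rule as it is described here.

```r
# Hypothetical stem and suffixes, chosen only for illustration.
stem <- "id"
suffixes <- c("a", "b")
suffix_sep <- "_"  # same value as the default of `suffix_sep`

# Source column names are composed as stem + separator + suffix.
source_cols <- paste(stem, suffixes, sep = suffix_sep)

# A toy data frame with UniProt IDs in both source columns.
d <- data.frame(
  id_a = c("P0DP23", "Q03135"),
  id_b = c("P48995", "P48995")
)

# Check that every composed name exists in the data frame.
all(source_cols %in% names(d))
```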
Usage
translate_ids_multi(
  d,
  ...,
  suffixes = NULL,
  suffix_sep = "_",
  uploadlists = FALSE,
  ensembl = FALSE,
  hmdb = FALSE,
  chalmers = FALSE,
  entity_type = NULL,
  keep_untranslated = TRUE,
  organism = 9606,
  reviewed = TRUE
)
Arguments
- d
A data frame.
- ...
At least two arguments, with or without names. These arguments describe identifier columns, either the ones we translate from (source) or the ones we translate to (target). Columns that exist in the data frame will be used as source columns; all the rest will be considered target columns. Alternatively, the source columns can be defined as a stem and a vector of suffixes, plus a separator between the stem and suffix. In this case, the source columns will be the ones that exist in the data frame with the suffixes added. The values of all these arguments must be valid identifier types as shown at translate_ids. If an ID type is provided only for the first source column, the rest of the source columns will be assumed to have the same ID type. For the target identifiers, new columns will be created with the desired names, with the suffixes added. If no suffixes are provided, the names of the source columns will be used instead.
- suffixes
Column name suffixes in case the names should be composed of stem and suffix.
- suffix_sep
Character: separator between the stem and suffixes.
- uploadlists
Force using the `uploadlists` service from UniProt. By default the plain query interface is used (implemented in uniprot_full_id_mapping_table in this package). If any of the provided ID types is only available in the uploadlists service, it will be automatically selected. The plain query interface is preferred because in the long term, with caching, it requires less download and data storage.
- ensembl
Logical: use data from Ensembl BioMart instead of UniProt.
- hmdb
Logical: use HMDB ID translation data.
- chalmers
Logical: use ID translation data from Chalmers Sysbio GEM.
- entity_type
Character: "gene" and "smol" are short symbols for proteins or genes, and for small molecules, respectively. Several other synonyms are also accepted.
- keep_untranslated
In case the output is a data frame, keep the records where the source identifier could not be translated. In these records the target identifier will be NA.
- organism
Character or integer, name or NCBI Taxonomy ID of the organism (by default 9606 for human). Matters only if uploadlists is FALSE.
- reviewed
Translate only reviewed (TRUE), only unreviewed (FALSE) or both (NULL) UniProt records. Matters only if uploadlists is FALSE.
Value
A data frame with all source columns translated to all target identifiers. The number of new columns is the product of the number of source columns and the number of target identifiers. The target columns are distinguished by the suffixes added to their names.
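To make the column count concrete, here is a minimal base R sketch that enumerates the new column names for every pair of target identifier and source column. It assumes the naming pattern visible in the example output (target name, separator, source column name, as in `ensp_source`), and it assumes `"_"` as the separator. The second target, `genesymbol`, is included only for illustration. The snippet does not call OmnipathR.

```r
# Source columns and target ID types, for illustration only.
sources <- c("source", "target")
targets <- c("ensp", "genesymbol")
sep <- "_"  # assumed separator between target name and source name

# One new column per (target, source) pair.
new_cols <- as.vector(outer(targets, sources, paste, sep = sep))
new_cols

# The number of new columns equals the product of both counts.
length(new_cols) == length(sources) * length(targets)
```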
Examples
ia <- omnipath()
translate_ids_multi(ia, source = uniprot, target, ensp, ensembl = TRUE)
#> # A tibble: 560,056 × 17
#> source target source_genesymbol target_genesymbol is_directed is_stimulation
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 P0DP23 P48995 CALM1 TRPC1 1 0
#> 2 P0DP23 P48995 CALM1 TRPC1 1 0
#> 3 P0DP25 P48995 CALM3 TRPC1 1 0
#> 4 P0DP25 P48995 CALM3 TRPC1 1 0
#> 5 P0DP25 P48995 CALM3 TRPC1 1 0
#> 6 P0DP25 P48995 CALM3 TRPC1 1 0
#> 7 P0DP24 P48995 CALM2 TRPC1 1 0
#> 8 P0DP24 P48995 CALM2 TRPC1 1 0
#> 9 Q03135 P48995 CAV1 TRPC1 1 1
#> 10 Q03135 P48995 CAV1 TRPC1 1 1
#> # ℹ 560,046 more rows
#> # ℹ 11 more variables: is_inhibition <dbl>, consensus_direction <dbl>,
#> # consensus_stimulation <dbl>, consensus_inhibition <dbl>, sources <chr>,
#> # references <chr>, curation_effort <dbl>, n_references <int>,
#> # n_resources <int>, ensp_source <chr>, ensp_target <chr>