Translate a column of identifiers by orthologous gene pairs

Usage

orthology_translate_column(
  data,
  column,
  id_type = NULL,
  target_organism = "mouse",
  source_organism = "human",
  resource = "oma",
  replace = FALSE,
  one_to_many = NULL,
  keep_untranslated = FALSE,
  translate_complexes = FALSE,
  uniprot_by_id_type = "entrez"
)

Arguments

data

A data frame with the column to be translated.

column

Name of a character column with identifiers of the source organism of type `id_type`.

id_type

Type of identifiers in `column`. Available ID types are "uniprot", "entrez", "ensg", "refseq" and "swissprot" for OMA, and "uniprot", "entrez", "genesymbol", "refseq" and "gi" for NCBI HomoloGene. To translate an ID type that your preferred resource does not support directly, first use translate_ids to convert it to one that the resource does support. If not provided, the column name is assumed to be the ID type.

target_organism

Name or NCBI Taxonomy ID of the target organism.

source_organism

Name or NCBI Taxonomy ID of the source organism.

resource

Character: source of the orthology mapping. Currently Orthologous Matrix (OMA) and NCBI HomoloGene are available; refer to them as "oma" and "homologene", respectively.

replace

Logical or character: whether to replace the column with the translated identifiers or to create a new column. A character value is used as the name of the new column.

one_to_many

Integer: maximum number of orthologous pairs allowed for one gene of the source organism. Genes that map to more orthologues than this are dropped.

keep_untranslated

Logical: keep records without orthologous pairs. If `replace` is TRUE, this option is ignored, and untranslated records will be dropped. Genes with more than `one_to_many` orthologues will always be dropped.
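
The interaction of `one_to_many` and `keep_untranslated` can be illustrated with a toy sketch in base R. This is not the package's implementation: `translate_toy`, the `orthologs` table and all identifiers in it are hypothetical and only mimic the behaviour described above.

```r
# Hypothetical orthology table with made-up identifiers (not real OMA data).
orthologs <- data.frame(
  source = c("A", "B", "B", "C", "C", "C"),
  target = c("a1", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Input records; "D" has no orthologue in the toy table.
data <- data.frame(id = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

# Toy stand-in for the documented filtering rules.
translate_toy <- function(data, orthologs,
                          one_to_many = NULL,
                          keep_untranslated = FALSE) {
  if (!is.null(one_to_many)) {
    # Count orthologues per source gene and find genes above the limit.
    n_targets <- table(orthologs$source)
    allowed <- names(n_targets)[n_targets <= one_to_many]
    dropped <- setdiff(names(n_targets), allowed)
    # Genes above the limit are always removed, from both tables.
    orthologs <- orthologs[orthologs$source %in% allowed, , drop = FALSE]
    data <- data[!data$id %in% dropped, , drop = FALSE]
  }
  # all.x = TRUE keeps records without an orthologue (target is NA).
  merge(data, orthologs, by.x = "id", by.y = "source",
        all.x = keep_untranslated)
}

# "C" has three orthologues, so it is dropped in both calls;
# "D" survives only when untranslated records are kept.
strict  <- translate_toy(data, orthologs, one_to_many = 2)
lenient <- translate_toy(data, orthologs, one_to_many = 2,
                         keep_untranslated = TRUE)
```

In this sketch `strict` contains only the records for "A" and "B", while `lenient` additionally keeps "D" with a missing target.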

translate_complexes

Logical: translate the complexes by translating their components.

uniprot_by_id_type

Character: translate NCBI HomoloGene to UniProt by this ID type. One of "genesymbol", "entrez", "refseq" or "gi".

Value

The data frame with the identifiers translated to the target organism.
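
Examples

A minimal usage sketch, assuming the OmnipathR package is installed and the orthology table can be downloaded. The input data frame and the new column name `uniprot_mouse` are chosen here for illustration only; the argument values follow the Usage section above.

```r
# Two human UniProt accessions: EGFR (P00533) and TP53 (P04637).
proteins <- data.frame(
  uniprot = c("P00533", "P04637"),
  stringsAsFactors = FALSE
)

# The call needs OmnipathR and network access, so it is guarded here.
if (requireNamespace("OmnipathR", quietly = TRUE)) {
  translated <- OmnipathR::orthology_translate_column(
    proteins,
    column = "uniprot",
    id_type = "uniprot",
    target_organism = "mouse",
    source_organism = "human",
    resource = "oma",
    replace = "uniprot_mouse"  # character value: name of the new column
  )
  print(translated)
}
```

Passing a character value to `replace` keeps the original `uniprot` column and adds the translated identifiers under the given name.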