Translate a column of identifiers by orthologous gene pairs

Usage

orthology_translate_column(
  data,
  column,
  id_type = NULL,
  target_organism = "mouse",
  source_organism = "human",
  resource = "oma",
  replace = FALSE,
  one_to_many = NULL,
  keep_untranslated = FALSE,
  translate_complexes = FALSE,
  uniprot_by_id_type = "entrez"
)

Arguments

data: A data frame with the column to be translated.
column: Name of a character column with identifiers of the source organism of type `id_type`.
id_type: Type of identifiers in `column`. Available ID types include "uniprot", "entrez", "ensg", "refseq" and "swissprot" for OMA, and "uniprot", "entrez", "genesymbol", "refseq" and "gi" for NCBI HomoloGene. To translate an ID type that is not directly available in your preferred resource, first use translate_ids to convert it to an ID type that the orthology resource supports. If not provided, the column name is assumed to be the ID type.
target_organism: Name or NCBI Taxonomy ID of the target organism.
source_organism: Name or NCBI Taxonomy ID of the source organism.
resource: Character: source of the orthology mapping. Currently Orthologous Matrix (OMA) and NCBI HomoloGene are available; refer to them as "oma" and "homologene", respectively.
replace: Logical or character: whether to replace the column with the translated identifiers or create a new column. If it is a character string, it is used as the name of the new column.
one_to_many: Integer: maximum number of orthologous pairs for one gene of the source organism. Genes mapping to a higher number of orthologues will be dropped.
keep_untranslated: Logical: keep records without orthologous pairs. If `replace` is TRUE, this option is ignored, and untranslated records will be dropped. Genes with more than `one_to_many` orthologues will always be dropped.
translate_complexes: Logical: translate the complexes by translating their components.
uniprot_by_id_type: Character: translate NCBI HomoloGene to UniProt by this ID type. One of "genesymbol", "entrez", "refseq" or "gi".
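
The interplay of `one_to_many` and `keep_untranslated` described above can be illustrated with a toy model in base R. This is only a sketch of the documented behaviour, not the function's actual implementation: the helper `filter_orthologues` and all identifiers in it are hypothetical.

```r
# Hypothetical orthology pairs (made-up identifiers, not real mappings)
pairs <- data.frame(
  source = c("A", "B", "B", "C", "C", "C"),
  target = c("a1", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Input records; "D" has no orthologue in the table above
records <- data.frame(id = c("A", "B", "C", "D"), stringsAsFactors = FALSE)

# Hypothetical helper mimicking the documented filtering rules
filter_orthologues <- function(records, pairs, one_to_many, keep_untranslated) {
  # Count the orthologues of each source gene
  n_targets <- table(pairs$source)
  too_many <- names(n_targets)[as.vector(n_targets) > one_to_many]
  # Genes above the limit are always dropped, whatever `keep_untranslated` is
  pairs <- pairs[!pairs$source %in% too_many, , drop = FALSE]
  records <- records[!records$id %in% too_many, , drop = FALSE]
  # A left join keeps untranslated records (with NA), an inner join drops them
  merge(records, pairs, by.x = "id", by.y = "source", all.x = keep_untranslated)
}

strict <- filter_orthologues(records, pairs, one_to_many = 2L, keep_untranslated = FALSE)
lenient <- filter_orthologues(records, pairs, one_to_many = 2L, keep_untranslated = TRUE)
```

In this toy setting, "C" is absent from both results because it has three orthologues, while "D" appears only in `lenient`, with a missing value in the `target` column.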

Value

The data frame with the identifiers translated to the target organism.
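
Examples

A minimal usage sketch based on the signature above. It assumes that the function is provided by the OmnipathR package (not stated on this page) and that the orthology resource can be downloaded at run time. The input data frame and the new column name `uniprot_mouse` are made up for illustration.

```r
# Assumption: orthology_translate_column is exported by OmnipathR
library(OmnipathR)

# Hypothetical input: human UniProt accessions in a column named "uniprot"
human_proteins <- data.frame(
  uniprot = c("P00533", "P04637"),
  stringsAsFactors = FALSE
)

mouse_proteins <- orthology_translate_column(
  human_proteins,
  column = "uniprot",
  id_type = "uniprot",
  target_organism = "mouse",
  source_organism = "human",
  resource = "oma",
  # A character value is used as the name of the new column
  replace = "uniprot_mouse",
  # Drop genes with more than one orthologue
  one_to_many = 1L
)
```

Because `keep_untranslated` is left at its default of FALSE, records without an orthologous pair are expected to be absent from the result.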