Processes KEGG Pathways data extracted from a KGML file. Joins the entries and relations into a single data frame and translates the Gene Symbols to UniProt IDs.
Arguments
- entries
A data frame with entries extracted from a KGML file by kegg_pathway_download.
- relations
A data frame with relations extracted from a KGML file by kegg_pathway_download.
- max_expansion
Numeric: the maximum number of relations derived from a single relation record. As one entry might represent more than one molecular entity, a single relation record might yield a large number of relations during processing. The expansion is combinatorial: e.g. if the two entries represent 3 and 4 entities, the result is 12 relations. If NULL, all relations will be expanded.
- simplify
Logical: remove KEGG's internal identifiers and the pathway annotations, keep only unique interactions with direction and effect sign.
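The combinatorial expansion described for max_expansion can be illustrated with a minimal base R sketch. The gene symbols and the limit of 10 below are hypothetical, chosen only for illustration; the sketch reproduces the 3 × 4 = 12 arithmetic and does not show how kegg_process itself treats a record that exceeds the limit.

```r
# Hypothetical entries: each KEGG entry stands for several molecular entities.
source_genes <- c("SMAD2", "SMAD3", "SMAD4")               # 3 entities
target_genes <- c("TGFBR1", "TGFBR2", "ACVR1B", "ACVR1C")  # 4 entities

# One relation record between the two entries expands to every
# source-target pair.
expanded <- expand.grid(
    source = source_genes,
    target = target_genes,
    stringsAsFactors = FALSE
)

n_expanded <- nrow(expanded)  # 3 * 4 = 12

# Hypothetical limit, only to show the comparison against max_expansion.
max_expansion <- 10
exceeds_limit <- n_expanded > max_expansion
```

With these inputs, a single relation record already yields more relations than the illustrative limit, which is the situation max_expansion is meant to control.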
Value
A data frame (tibble) of interactions. In rare cases when a pathway doesn't contain any relation, returns NULL.
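Because the return value may be NULL, downstream code may want to guard against it before treating the result as a data frame. The helper below is a hypothetical sketch, not part of the package; it counts the interactions and treats NULL as zero.

```r
# Hypothetical helper (not part of the package): number of interactions
# in a processed pathway, with a NULL result counted as zero.
n_interactions <- function(pathway) {
    if (is.null(pathway)) {
        return(0L)
    }
    nrow(pathway)
}

# Stand-in values instead of real pathway data:
empty_result <- NULL
toy_result <- data.frame(
    source = c("a", "b", "c"),
    target = c("b", "c", "a")
)

n_empty <- n_interactions(empty_result)
n_toy <- n_interactions(toy_result)
```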
Examples
hsa04350 <- kegg_pathway_download('hsa04350', process = FALSE)
tgf_pathway <- kegg_process(hsa04350$entries, hsa04350$relations)
tgf_pathway
#> # A tibble: 53 × 12
#> source target type effect arrow relati…¹ kegg_…² genes…³ unipr…⁴ kegg_…⁵
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 51 49 PPrel activation --> hsa0435… hsa:70… TGFB1 P01137 hsa:70…
#> 2 57 55 PPrel activation --> hsa0435… hsa:15… GDF7 Q7Z4P5 hsa:26…
#> 3 34 32 PPrel activation --> hsa0435… hsa:36… INHBA P08476 hsa:92…
#> 4 20 17 PPrel activation --> hsa0435… hsa:48… NODAL Q96S42 hsa:92…
#> 5 60 46 PPrel activation --> hsa0435… hsa:40… SMAD1 Q15797 hsa:40…
#> 6 43 41 PPrel activation --> hsa0435… hsa:40… SMAD2 Q15796 hsa:40…
#> 7 22 16 PPrel activation --> hsa0435… hsa:40… SMAD2 Q15796 hsa:40…
#> 8 19 15 PPrel activation --> hsa0435… hsa:40… SMAD2 Q15796 hsa:40…
#> 9 27 26 PPrel inhibition --| hsa0435… hsa:46… MYC P01106 hsa:10…
#> 10 47 43 PPrel inhibition --| hsa0435… hsa:40… SMAD6 O43541 hsa:40…
#> # … with 43 more rows, 2 more variables: genesymbol_target <chr>,
#> # uniprot_target <chr>, and abbreviated variable names ¹relation_id,
#> # ²kegg_id_source, ³genesymbol_source, ⁴uniprot_source, ⁵kegg_id_target
# # A tibble: 50 x 12
# source target type effect arrow relation_id kegg_id_source
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 51 49 PPrel activ. --> hsa04350:1 hsa:7040 hsa:.
# 2 57 55 PPrel activ. --> hsa04350:2 hsa:151449 hs.
# 3 34 32 PPrel activ. --> hsa04350:3 hsa:3624 hsa:.
# 4 20 17 PPrel activ. --> hsa04350:4 hsa:4838
# 5 60 46 PPrel activ. --> hsa04350:5 hsa:4086 hsa:.
# # . with 45 more rows, and 5 more variables: genesymbol_source <chr>,
# # uniprot_source <chr>, kegg_id_target <chr>,
# # genesymbol_target <chr>, uniprot_target <chr>