Building protein networks around drug-targets using OmnipathR

Abstract

Many applications require to connect drugs to proteins in signaling networks. OmnipathR provides easy access to curated pathway resources from OmniPath. Here we use data from DrugBank to find direct protein targets of drugs and to connect them to downstream signaling proteins using OmnipathR.

Introduction

In many applications we would like to understand how a specific drug interacts with the protein signaling network through its targets.

library(dplyr)
library(ggplot2)
library(OmnipathR)
library(igraph)
library(ggraph)
library(magrittr)

Initialise OmniPath database

We query protein-protein interactions from the webservice of OmniPath [1,2] at https://omnipathdb.org/ using OmnipathR package:

# Download protein-protein interactions
interactions <- import_omnipath_interactions() %>% as_tibble()

# Convert to igraph objects:
OPI_g <- interaction_graph(interactions = interactions)

Querying drug targets

For direct drug targets we will use DrugBank [3] database accessed via the dbparser package. Please note, that the following few chuncks of code is not evaluated. DrugBank requires registrations to access the data, therefore we ask the reader to register at DrugBank and download the data from here.

The next block of code is used to process the DrugBank dataset.

library(dbparser)
library(XML)

## parse data from XML and save it to memory
get_xml_db_rows("..path-to-DrugBank/full database.xml")

## load drugs data
drugs <- parse_drug() %>% select(primary_key, name)
drugs <- rename(drugs,drug_name = name)

## load drug target data
drug_targets <-
    parse_drug_targets() %>%
    select(id, name,organism,parent_key) %>%
    rename(target_name = name)

## load polypeptide data
drug_peptides <-
    parse_drug_targets_polypeptides()  %>%
    select(
        id,
        name,
        general_function,
        specific_function,
        gene_name,
        parent_id
    ) %>%
    rename(target_name = name, gene_id = id)

# join the 3 datasets
drug_targets_full <-
    inner_join(
        drug_targets,
        drug_peptides,
        by = c("id" = "parent_id", "target_name")
    ) %>%
    inner_join(drugs, by = c("parent_key" = "primary_key")) %>%
    select(-other_keys)

Here we declare the names of drugs of interest.

drug_names = c(
    "Valproat"         = "Valproic Acid",
    "Diclofenac"       = "Diclofenac",
    "Paracetamol"      = "Acetaminophen",
    "Ciproflaxin"      = "Ciprofloxacin",
    "Nitrofurantoin"   = "Nitrofurantoin",
    "Tolcapone",
    "Azathioprine",
    "Troglitazone",
    "Nefazodone",
    "Ketoconazole",
    "Omeprazole",
    "Phenytoin",
    "Amiodarone",
    "Cisplatin",
    "Cyclosporin A"    = "Cyclosporine",
    "Verapamil",
    "Buspirone",
    "Melatonin",
    "N-Acetylcysteine" = "Acetylcysteine",
    "Vitamin C"        = "Ascorbic acid",
    "Famotidine",
    "Vancomycin"
)

drug_target_data_sample <-
   drug_targets_full %>%
   filter(organism == "Humans", drug_name %in% drug_names)

We only use a small sample of the database:

drug_targets <-
   OmnipathR:::drug_target_data_sample %>%
   filter(organism == "Humans", drug_name %in% drug_names)

Quality control

Check which drug targets are in Omnipath:

drug_targets %<>%
   select(-target_name, -organism) %>%
   mutate(in_OP = gene_id %in% c(interactions$source))
   # not all drug-targets are in OP.
   print(all(drug_targets$in_OP))

## [1] FALSE

# But each drug has at least one target in OP.
drug_targets %>% group_by(drug_name) %>% summarise(any(in_OP))

## # A tibble: 19 × 2
##    drug_name      `any(in_OP)`
##    <chr>          <lgl>       
##  1 Acetaminophen  TRUE        
##  2 Acetylcysteine TRUE        
##  3 Amiodarone     TRUE        
##  4 Ascorbic acid  TRUE        
##  5 Azathioprine   TRUE        
##  6 Buspirone      TRUE        
##  7 Ciprofloxacin  FALSE       
##  8 Cisplatin      TRUE        
##  9 Diclofenac     TRUE        
## 10 Famotidine     TRUE        
## 11 Ketoconazole   TRUE        
## 12 Melatonin      TRUE        
## 13 Nefazodone     TRUE        
## 14 Omeprazole     FALSE       
## 15 Phenytoin      TRUE        
## 16 Tolcapone      FALSE       
## 17 Troglitazone   TRUE        
## 18 Valproic Acid  TRUE        
## 19 Verapamil      TRUE

Downstream signaling nodes

We would like to investigate the effect of the drugs on some selected proteins. For example, the activity of these proteins are measured upon the drug perturbation. We’ll build a network from the drug targets to these selected nodes.

First we declare protein of interest (POI):

POI <- tibble(
    protein = c(
        "NFE2L2", "HMOX1", "TP53",
        "CDKN1A", "BTG2", "NFKB1",
        "ICAM1", "HSPA5", "ATF4",
        "DDIT3", "XBP1"
    )
)

Quality control

Checking which POI are in Omnipath

POI <- POI %>% mutate(in_OP = protein %in% interactions$target_genesymbol)
# all POI is in Omnipath
print(all(POI$in_OP))

## [1] TRUE

Build network between drug targets and POI

First, we find paths between the drug targets and the POIs. For the sake of this simplicity we focus on drug targets of one drug, Cisplatin.

The paths are represented by a set of nodes:

source_nodes <-
    drug_targets %>%
    filter(in_OP, drug_name == "Cisplatin") %>%
    pull(gene_name)

target_nodes <- POI %>% filter(in_OP) %>% pull(protein)

collected_path_nodes <- list()

for(i_source in 1:length(source_nodes)){

    paths <- shortest_paths(
        OPI_g,
        from = source_nodes[[i_source]],
        to = target_nodes,
        output = "vpath"
    )
    path_nodes <- lapply(paths$vpath, names) %>% unlist() %>% unique()
    collected_path_nodes[[i_source]] <- path_nodes

}

collected_path_nodes %<>% unlist %>% unique

The direct drug targets, the POIs and the intermediate pathway members give rise to the network.

cisplatin_nodes <-
    c(source_nodes,target_nodes, collected_path_nodes) %>%
    unique()

cisplatin_network <- induced_subgraph(graph = OPI_g, vids = cisplatin_nodes)

We annotate the nodes of the network and plot it.

V(cisplatin_network)$node_type <-
    ifelse(
        V(cisplatin_network)$name %in% source_nodes,
        "direct drug target",
        ifelse(
            V(cisplatin_network)$name %in% target_nodes,
            "POI",
            "intermediate node"
        )
    )

# temporary fix against segfault that happens pre-4.4 R devel builds
# only if the vignette is built within R CMD build and only on our
# Ubuntu 22.04 machine
# likely we can remove this conditional a few weeks later
if (Sys.info()["user"] != "omnipath") {

ggraph(
    cisplatin_network,
    layout = "lgl",
    area = vcount(cisplatin_network)^2.3,
    repulserad = vcount(cisplatin_network)^1.2,
    coolexp = 1.1
) +
geom_edge_link(
    aes(
        start_cap = label_rect(node1.name),
        end_cap = label_rect(node2.name)),
        arrow = arrow(length = unit(4, "mm")
    ),
    edge_width = .5,
    edge_alpha = .2
) +
geom_node_point() +
geom_node_label(aes(label = name, color = node_type)) +
scale_color_discrete(
    guide = guide_legend(title = "Node type")
) +
theme_bw() +
xlab("") +
ylab("") +
ggtitle("Cisplatin induced network")

}

The above network represents a way how Cisplatin can influence the POIs. One can for example filter out edges based on the number fo resources reporting the edge or based on the number of papers mentioning it. However, this is already covered by previous pypath tutorials.

Acknowledgements

The above pipeline was inspired by the post of Denes Turei available here.

References

[1] D Turei, A Valdeolivas, L Gul, N Palacio-Escat, M Klein, O Ivanova, M Olbei, A Gabor, F Theis, D Modos, T Korcsmaros and J Saez-Rodriguez (2021) Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology 17:e9923

[2] D Turei, T Korcsmaros and J Saez-Rodriguez (2016) OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods 13(12)

[3] Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi: 10.1093/nar/gkx1037.

Session info

## R version 4.3.3 (2024-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Arch Linux
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas.so.3.12.0 
## LAPACK: /usr/lib/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Madrid
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] magrittr_2.0.3   ggraph_2.2.1     igraph_2.0.3     OmnipathR_3.13.1
## [5] ggplot2_3.5.0    dplyr_1.1.4      BiocStyle_2.30.0
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.4        xfun_0.42           bslib_0.6.1        
##  [4] ggrepel_0.9.5       tzdb_0.4.0          vctrs_0.6.5        
##  [7] tools_4.3.3         generics_0.1.3      curl_5.2.1         
## [10] tibble_3.2.1        fansi_1.0.6         highr_0.10         
## [13] pkgconfig_2.0.3     checkmate_2.3.1     desc_1.4.3         
## [16] readxl_1.4.3        lifecycle_1.0.4     farver_2.1.1       
## [19] compiler_4.3.3      stringr_1.5.1       textshaping_0.3.7  
## [22] progress_1.2.3      munsell_0.5.0       ggforce_0.4.2      
## [25] graphlayouts_1.1.1  htmltools_0.5.7     sass_0.4.9         
## [28] yaml_2.3.8          tidyr_1.3.1         pillar_1.9.0       
## [31] pkgdown_2.0.7       later_1.3.2         crayon_1.5.2       
## [34] jquerylib_0.1.4     MASS_7.3-60.0.1     cachem_1.0.8       
## [37] viridis_0.6.5       zip_2.3.0           tidyselect_1.2.1   
## [40] rvest_1.0.4         digest_0.6.35       stringi_1.8.3      
## [43] purrr_1.0.2         bookdown_0.38       labeling_0.4.3     
## [46] polyclip_1.10-6     fastmap_1.1.1       grid_4.3.3         
## [49] colorspace_2.1-0    cli_3.6.2           logger_0.3.0       
## [52] tidygraph_1.3.1     XML_3.99-0.16.1     utf8_1.2.4         
## [55] readr_2.1.5         withr_3.0.0         prettyunits_1.2.0  
## [58] scales_1.3.0        backports_1.4.1     rappdirs_0.3.3     
## [61] bit64_4.0.5         lubridate_1.9.3     timechange_0.3.0   
## [64] rmarkdown_2.26      httr_1.4.7          bit_4.0.5          
## [67] gridExtra_2.3       cellranger_1.1.0    ragg_1.2.7         
## [70] hms_1.1.3           memoise_2.0.1       evaluate_0.23      
## [73] knitr_1.45          viridisLite_0.4.2   rlang_1.1.3        
## [76] Rcpp_1.0.12         glue_1.7.0          selectr_0.4-2      
## [79] tweenr_2.0.3        BiocManager_1.30.22 xml2_1.3.6         
## [82] vroom_1.6.5         jsonlite_1.8.8      R6_2.5.1           
## [85] systemfonts_1.0.6   fs_1.6.3

Attila Gabor

Alberto Valdeolivas

Julio Saez-Rodriguez

Introduction

Initialise OmniPath database

Querying drug targets

Quality control

Downstream signaling nodes

Quality control

Build network between drug targets and POI

Acknowledgements

References

Session info