dbscrape: scrape HTML tables from DrugBank

Export HTML tables from DrugBank to data frames. This function is handy when downloading and processing the full database is not viable. The code is kept as simple as possible: it uses only the XPath of each table and splits the URL around the page number so that pagination is taken into account.
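To illustrate the XPath idea, here is a minimal sketch that pulls a table out of an inline HTML snippet. It assumes the rvest package is installed. The HTML string, the XPath, and the helper name `table_from_html` are made up for illustration and are not taken from the dbscrape source, which may work differently.

```r
library(rvest)

# Hypothetical helper (not part of dbscrape): parse HTML and return the
# table found at `xpath` as a plain data frame.
table_from_html <- function(html, xpath = "//table") {
  page <- read_html(html)
  node <- html_element(page, xpath = xpath)
  as.data.frame(html_table(node))
}

# Inline stand-in for one results page (not real DrugBank markup).
html <- "<html><body><table>
  <tr><th>Drug</th><th>Gene</th></tr>
  <tr><td>Aspirin</td><td>PTGS1</td></tr>
  <tr><td>Warfarin</td><td>VKORC1</td></tr>
</table></body></html>"

tbl <- table_from_html(html)
```

Working on an inline string keeps the sketch independent of the network; for a live page, a URL would be passed to `read_html()` instead.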

Note

After fetching finishes, some tables may need further processing before the information is presentable. I suggest using regular expressions to fix those issues.
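As a rough sketch of that kind of cleanup, the snippet below collapses runs of whitespace and trims every character column using base R only. The sample data and the helper name `clean_table` are invented for illustration; the problems in a real scraped table may need different patterns.

```r
# Made-up example of a scraped table with stray whitespace and line breaks.
raw <- data.frame(
  Drug = c("  Aspirin\n", "Warfarin  "),
  Effect = c("ADR\n  Inferred", "Effect   Directly Studied"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dbscrape): collapse whitespace runs to a
# single space and trim both ends of every character column.
clean_table <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    trimws(gsub("[[:space:]]+", " ", x))
  })
  df
}

clean <- clean_table(raw)
```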

Install:

# Install remotes if it is missing, then install dbscrape from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("ptogias/dbscrape")

Overview

dbscrape takes three parameters:

1. std_url

Character. The part of the URL up to and including 'page='.

2. pages

Numeric. Pagination range (e.g. '1:88'). Must start at 1 and end at the last page of the respective query (find this manually).

3. postfix

[Optional] Character. On-page filters applied to the query. These appear in the URL after the "page=" part and its page number. Defaults to an empty character vector. See the second example.
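The three parameters fit together roughly as follows. This is a sketch that assumes each page URL is formed by plain concatenation of std_url, the page number, and postfix; the helper name `build_urls` is hypothetical and the actual dbscrape internals may differ.

```r
# Hypothetical helper (not part of dbscrape): one URL per page, built as
# std_url + page number + postfix.
build_urls <- function(std_url, pages, postfix = "") {
  paste0(std_url, pages, postfix)
}

# Without filters: postfix stays empty.
urls <- build_urls(
  std_url = "https://go.drugbank.com/pharmaco/metabolomics?page=",
  pages = 1:3
)

# With filters: everything after the page number goes into postfix.
filtered <- build_urls(
  std_url = "https://go.drugbank.com/categories?eu=1&page=",
  pages = 1:2,
  postfix = "&us=0&withdrawn=0"
)
```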

Examples

# Example 1: paginated table without filters
std_url <- "https://go.drugbank.com/pharmaco/metabolomics?page="
pages <- 1:124
postfix <- ""
dbscrape::dbscrape(std_url, pages, postfix)

# Example 2: filters applied; the part of the URL after the page number goes in postfix
std_url <- "https://go.drugbank.com/categories?approved=0&ca=0&eu=1&experimental=1&illicit=0&investigational=0&nutraceutical=0&page="
pages <- 1:36
postfix <- "&q[description]=&q[drug_count]=&q[target_count]=&q[title]=&us=0&withdrawn=0"
dbscrape::dbscrape(std_url, pages, postfix)
