Skip to content

Commit

Permalink
docs(connector): adding a process overview via DBLP section
Browse files Browse the repository at this point in the history
  • Loading branch information
peiwangdb authored and dovahcrow committed Mar 20, 2021
1 parent 35fdb58 commit 5794d6c
Showing 1 changed file with 145 additions and 0 deletions.
145 changes: 145 additions & 0 deletions docs/source/user_guide/connector/DBLP.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Process overview via DBLP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is DataPrep.Connector?\n",
"\n",
"Connector is an intuitive, open-source API wrapper that speeds up development by standardizing calls to multiple APIs as a simple workflow.\n",
"\n",
"Connector provides a simple wrapper to collect structured data from different Web APIs (e.g., Twitter, Spotify), making web data collection easy and efficient, without requiring advanced programming skills.\n",
"\n",
"With Connector, you can collect data in two steps: **connect** to a website and **query** the data.\n",
"\n",
"We currently support tens of websites: https://github.com/sfu-db/DataConnectorConfigs/tree/develop/api-connectors\n",
"\n",
"You can also author your own configuration files for new websites.\n",
"We look forward to seeing your contribution to facilitate other users as well."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Collecting data from DBLP\n",
"\n",
"\n",
"#### DBLP Website\n",
"DBLP (https://dblp.org/) is a computer science bibliography website.\n",
"We will use it as an example to illustrate how to collect data easily using DataPrep.Connector.\n",
"\n",
"API: https://dblp.org/faq/13501473.html\n",
"\n",
"And more examples are available here: https://github.com/sfu-db/dataprep/tree/develop/examples.\n",
"\n",
"#### Step 1: Installation\n",
"\n",
"You can install the DataPrep through the single command below if you have not.\n",
"\n",
"```\n",
"!pip install dataprep\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 2: Connecting to the API\n",
"\n",
"Once the library is installed, you can connect to a website that are supported by us or loading from local configuration files for connection.\n",
"The detailed usage and paramsters for connect() can be found in next section.\n",
"Here, we are connecting to DBLP API through the exsiting configuration file available here: https://github.com/sfu-db/DataConnectorConfigs/tree/develop/api-connectors/dblp"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataprep.connector import connect\n",
"conn = connect(\"dblp\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 3: Understand how to use the API\n",
"\n",
"info() function helps you understand what is available from the website.\n",
"Here, the output shows there is one table called \"publication\" available. And to fetch the data, we have to specify the value of the \"q\" (query keyword) parameter.\n",
"The schema block shows the schema of the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"conn.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Step 4: Customize the query to collect data\n",
"\n",
"Once you know how to use the API and the connection is built, you can issue the query through the query function.\n",
"The first parameter specifies which API endpoint you want to query. \n",
"The detailed parameter explanation can be found in later sections.\n",
"In this example, we are collecting 2000 CVPR2020 papers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"await conn.query(\"publication\", q=\"CVPR 2020\", _count=2000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And you have the data ready. It is so simple :)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

0 comments on commit 5794d6c

Please sign in to comment.