From 5794d6c8d4e5b3479f6c0d73de1d4a000f7f6c9f Mon Sep 17 00:00:00 2001 From: peiwangdb Date: Wed, 17 Mar 2021 18:46:32 -0700 Subject: [PATCH] docs(connector): adding a process overview via DBLP section --- docs/source/user_guide/connector/DBLP.ipynb | 145 ++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 docs/source/user_guide/connector/DBLP.ipynb diff --git a/docs/source/user_guide/connector/DBLP.ipynb b/docs/source/user_guide/connector/DBLP.ipynb new file mode 100644 index 000000000..65b415c46 --- /dev/null +++ b/docs/source/user_guide/connector/DBLP.ipynb @@ -0,0 +1,145 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Process overview via DBLP" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What is DataPrep.Connector?\n", + "\n", + "Connector is an intuitive, open-source API wrapper that speeds up development by standardizing calls to multiple APIs as a simple workflow.\n", + "\n", + "Connector provides a simple wrapper to collect structured data from different Web APIs (e.g., Twitter, Spotify), making web data collection easy and efficient, without requiring advanced programming skills.\n", + "\n", + "With Connector, you can collect data in two steps: **connect** to a website and **query** the data.\n", + "\n", + "We currently support tens of websites: https://github.com/sfu-db/DataConnectorConfigs/tree/develop/api-connectors\n", + "\n", + "You can also author your own configuration files for new websites.\n", + "We look forward to seeing your contribution to facilitate other users as well." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Collecting data from DBLP\n", + "\n", + "\n", + "#### DBLP Website\n", + "DBLP (https://dblp.org/) is a computer science bibliography website.\n", + "We will use it as an example to illustrate how to collect data easily using DataPrep.Connector.\n", + "\n", + "API: https://dblp.org/faq/13501473.html\n", + "\n", + "And more examples are available here: https://github.com/sfu-db/dataprep/tree/develop/examples.\n", + "\n", + "#### Step 1: Installation\n", + "\n", + "You can install the DataPrep through the single command below if you have not.\n", + "\n", + "```\n", + "!pip install dataprep\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Step 2: Connecting to the API\n", + "\n", + "Once the library is installed, you can connect to a website that are supported by us or loading from local configuration files for connection.\n", + "The detailed usage and paramsters for connect() can be found in next section.\n", + "Here, we are connecting to DBLP API through the exsiting configuration file available here: https://github.com/sfu-db/DataConnectorConfigs/tree/develop/api-connectors/dblp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from dataprep.connector import connect\n", + "conn = connect(\"dblp\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Step 3: Understand how to use the API\n", + "\n", + "info() function helps you understand what is available from the website.\n", + "Here, the output shows there is one table called \"publication\" available. And to fetch the data, we have to specify the value of the \"q\" (query keyword) parameter.\n", + "The schema block shows the schema of the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "conn.info()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Step 4: Customize the query to collect data\n", + "\n", + "Once you know how to use the API and the connection is built, you can issue the query through the query function.\n", + "The first parameter specifies which API endpoint you want to query. \n", + "The detailed parameter explanation can be found in later sections.\n", + "In this example, we are collecting 2000 CVPR2020 papers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "await conn.query(\"publication\", q=\"CVPR 2020\", _count=2000)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And you have the data ready. It is so simple :)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}