test key warnings #435

Open
wants to merge 8 commits into
base: master
5 changes: 4 additions & 1 deletion _sass/friendly_code.scss
@@ -61,4 +61,7 @@
.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */
.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */
.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */
.highlight .il { color: #40a070 } /* Literal.Number.Integer.Long */
.highlight .il { color: #40a070 } /* Literal.Number.Integer.Long */

.language-json-doc .err { border: none; font-style: italic }
.language-js .nx { font-style: italic; color: #007020; }
35 changes: 13 additions & 22 deletions api/index.md
@@ -4,39 +4,30 @@ title: API
nav_order: 20
has_children: true
---

# Overview

The [Data Commons Graph](https://datacommons.org) aggregates data from many
different [data sources](https://datacommons.org/datasets) into a single
database. Data Commons is based on the data model used by
[schema.org](https://schema.org); for more information, see [the guide to the data model](/data_model.html).

The **Data Commons API** allows developers to programmatically access the data in Data Commons.
Data Commons provides several different ways to access its API's resources:
The Data Commons APIs allow developers to programmatically access the data in Data Commons.
Data Commons provides several different ways to access its resources:

1. A [REST API](/api/rest/v2) that can be used on the command line as well as in any language with an HTTP library.
1. A lightweight [Python](/api/python) wrapper.
1. A heavier [Pandas](/api/pandas) wrapper.
1. A convenient [Google Sheets](/api/sheets) add-on.
* A [REST API](/api/rest/v2) that can be used on the command line as well as in any language with an HTTP library.
* [Python](/api/python) and [Pandas](/api/pandas) wrappers.

The endpoints can be roughly grouped into four categories.
The endpoints can be roughly grouped into four categories:

- **Local Node Exploration**: Given a set of nodes, explore the
graph around those nodes.
- **Statistical data**: Given a set of statistical variables, dates and entities, get observations.

- **Domain specific APIs**: These are groups of APIs, specific to particular
domains.
- **Graph exploration**: Given a set of nodes, explore the
graph around those nodes.

- **Graph Query/SPARQL**: Given a subgraph where some of the nodes are
- **Graph query/SPARQL**: Given a subgraph where some of the nodes are
variables, retrieve possible matches. This corresponds to a subset of the
graph query language [SPARQL](https://www.w3.org/TR/rdf-sparql-query/).

- **Utilities**: These are Python notebook specific APIs for helping with
Pandas DataFrames, etc.

Most of the provided endpoints take references to nodes and properties as arguments. Every
node or property has a `Data Commons ID (DCID)`, which is used
to pass nodes as arguments to API calls.

**Note:** The DCID of schema.org terms used in Data Commons is their schema.org ID.
graph query language [SPARQL](https://www.w3.org/TR/rdf-sparql-query/). This is useful for getting very specific observations that would otherwise require multiple API calls (e.g., "hate crimes motivated by disability status in Californian cities").

- **Utilities**: These are Python notebook-specific APIs for helping with
Pandas DataFrames, etc.
32 changes: 16 additions & 16 deletions api/python/query.md
@@ -12,7 +12,9 @@ Returns the results of running a graph query on the Data Commons knowledge graph
using [SPARQL](https://www.w3.org/TR/rdf-sparql-query/). Note that Data Commons is only
able to support only a limited subset of SPARQL functionality at this time: specifically, the keywords `ORDER BY`, `DISTINCT`, and `LIMIT`.

## General information about this method
Note: The Python SPARQL library currently only supports the [v1](/api/v1/query.html) version of the API.

## General information about the query() method

**Signature**:

@@ -24,7 +26,7 @@ datacommons.query(query_string, select=None)

* `query_string`: A SPARQL query string.

## How to construct a call to the query method
## How to construct a call to the query() method

This method makes it possible to query the Data Commons knowledge graph using SPARQL. SPARQL is a query language for retrieving data stored as RDF graphs. It uses the graph structure inherent in the data it queries to return specific information to an end user. For more information on assembling SPARQL queries, see [the Wikipedia page about SPARQL](https://en.wikipedia.org/wiki/SPARQL) and [the W3C specification](https://www.w3.org/TR/sparql11-query/).
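
As a rough illustration of how such a query string can be assembled, the sketch below builds the same shape of query used in the first example on this page. The helper name `build_place_name_query` is hypothetical and is not part of the `datacommons` library; it only produces a string that could then be passed as `query_string`.

```python
def build_place_name_query(dcids):
    """Build a SPARQL query string asking for the name and DCID of each place.

    Hypothetical helper for illustration only; not part of the datacommons library.
    """
    if not dcids:
        raise ValueError("at least one DCID is required")
    # Each DCID is wrapped in double quotes; the list is space-separated.
    quoted = " ".join('"{}"'.format(dcid) for dcid in dcids)
    return (
        "SELECT ?name ?dcid WHERE { "
        "?a typeOf Place . "
        "?a name ?name . "
        "?a dcid (" + quoted + ") . "
        "?a dcid ?dcid }"
    )


query_string = build_place_name_query(["geoId/06", "geoId/21", "geoId/24"])
print(query_string)
```

Building the string in one place keeps the triple patterns readable and makes it harder to drop a quote or a closing brace by accident.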

@@ -49,51 +51,51 @@ A correct response will always look like this:

The response contains an array of dictionaries, each corresponding to one node matching the conditions of the query. Each dictionary's keys match the variables in the query's SELECT clause, and its values are the values of the query-specified properties for that node.
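
A minimal sketch of working with that response shape follows. It makes no network call: the `response` list is copied from the output of the first example on this page, and the variable names are illustrative only.

```python
# Sample response copied from the first example on this page.
response = [
    {'?name': 'Kentucky', '?dcid': 'geoId/21'},
    {'?name': 'California', '?dcid': 'geoId/06'},
    {'?name': 'Maryland', '?dcid': 'geoId/24'},
]

# Dictionary keys keep the leading "?" of the SELECT variables.
name_by_dcid = {row['?dcid']: row['?name'] for row in response}

for dcid in sorted(name_by_dcid):
    print(dcid, name_by_dcid[dcid])
```

Note that the rows are not guaranteed to come back in the order the DCIDs were listed in the query, so indexing by DCID is safer than relying on position.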

## Examples and error returns
## Examples and error responses

### Examples
The following examples and error responses, along with explanations and fixes for the errors, are available in this [Python notebook](https://colab.research.google.com/drive/1Jd0IDHnMdtxhsmXhL5Ib5tL0zgJud1k5?usp=sharing).

#### Example 1. Retrieve the name of the state associated with DCID geoId/06.
### Example 1: Retrieve the names of the states associated with DCIDs geoId/06, geoId/21, and geoId/24.

```python
>>> geoId06_name_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'
>>> datacommons.query(geoId06_name_query)
[{'?name': 'Kentucky', '?dcid': 'geoId/21'}, {'?name': 'California', '?dcid': 'geoId/06'}, {'?name': 'Maryland', '?dcid': 'geoId/24'}]
```

#### Example 2. Retrieve a list of ten biological specimens in reverse alphabetical order.
### Example 2: Retrieve a list of ten biological specimens in reverse alphabetical order.

```python
>>> bio_specimens_reverse_alphabetical_order_query = 'SELECT ?name WHERE { ?biologicalSpecimen typeOf BiologicalSpecimen . ?biologicalSpecimen name ?name } ORDER BY DESC(?name) LIMIT 10'
>>> datacommons.query(bio_specimens_reverse_alphabetical_order_query)
[{'?name': 'x Triticosecale'}, {'?name': 'x Silene'}, {'?name': 'x Silene'}, {'?name': 'x Silene'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}, {'?name': 'x Pseudelymus saxicola (Scribn. & J.G.Sm.) Barkworth & D.R.Dewey'}]
```

#### Example 3. Retrieve a list of GNI observations by country.
### Example 3: Retrieve a list of GNI observations by country.

```python
>>> gni_by_country_query = 'SELECT ?observation ?place WHERE { ?observation typeOf StatVarObservation . ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
>>> datacommons.query(gni_by_country_query)
[{'?observation': 'dc/o/syrpc3m8q34z7', '?place': 'country/ABW'}, {'?observation': 'dc/o/bqtfmc351v0f2', '?place': 'country/ABW'}, {'?observation': 'dc/o/md36fx6ty4d64', '?place': 'country/ABW'}, {'?observation': 'dc/o/bm28zvchsyf4b', '?place': 'country/ABW'}, {'?observation': 'dc/o/3nleez1feevw6', '?place': 'country/ABW'}, {'?observation': 'dc/o/x2yg38d0xecnf', '?place': 'country/ABW'}, {'?observation': 'dc/o/7swdqf6yjdyw8', '?place': 'country/ABW'}, {'?observation': 'dc/o/yqmsmbx1qskfg', '?place': 'country/ABW'}, {'?observation': 'dc/o/6hlhrz3k8p5wf', '?place': 'country/ABW'}, {'?observation': 'dc/o/txfw505ydg629', '?place': 'country/ABW'}]
```

#### Example 4. Retrieve a sample list of observations with the unit InternationalDollar.
### Example 4: Retrieve a sample list of observations with the unit InternationalDollar.

```python
>>> internationalDollar_obs_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . ?observation unit InternationalDollar } LIMIT 10'
>>> datacommons.query(internationalDollar_obs_query)
[{'?observation': 'dc/o/s3gzszzvj34f1'}, {'?observation': 'dc/o/gd41m7qym86d4'}, {'?observation': 'dc/o/wq62twxx902p4'}, {'?observation': 'dc/o/d93kzvns8sq4c'}, {'?observation': 'dc/o/6s741lstdqrg4'}, {'?observation': 'dc/o/2kcq1xjkmrzmd'}, {'?observation': 'dc/o/ced6jejwv224f'}, {'?observation': 'dc/o/q31my0dmcryzd'}, {'?observation': 'dc/o/96frt9w0yjwxf'}, {'?observation': 'dc/o/rvjz5xn9mlg73'}]
```

#### Example 5. Retrieve a list of ten distinct annual estimates of life expectancy, along with the year of estimation, for forty-seven-year-old Hungarians.
### Example 5: Retrieve a list of ten distinct annual estimates of life expectancy, along with the year of estimation, for forty-seven-year-old Hungarians.

```python
>>> life_expectancy_query = 'SELECT DISTINCT ?LifeExpectancy ?year WHERE { ?o typeOf StatVarObservation . ?o variableMeasured LifeExpectancy_Person_47Years . ?o observationAbout country/HUN . ?o value ?LifeExpectancy . ?o observationDate ?year } ORDER BY ASC(?LifeExpectancy) LIMIT 10'
>>> datacommons.query(life_expectancy_query)
[{'?LifeExpectancy': '26.4', '?year': '1993'}, {'?LifeExpectancy': '26.5', '?year': '1992'}, {'?LifeExpectancy': '26.7', '?year': '1990'}, {'?LifeExpectancy': '26.7', '?year': '1994'}, {'?LifeExpectancy': '26.8', '?year': '1991'}, {'?LifeExpectancy': '26.9', '?year': '1995'}, {'?LifeExpectancy': '27.2', '?year': '1996'}, {'?LifeExpectancy': '27.4', '?year': '1999'}, {'?LifeExpectancy': '27.5', '?year': '1997'}, {'?LifeExpectancy': '27.5', '?year': '1998'}]
```
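
The values in the response above are strings, so numeric work needs a conversion step. The sketch below uses the first three rows of that output as sample data (no network call is made); the variable names are illustrative.

```python
# First three rows copied from the output above; all values arrive as strings.
rows = [
    {'?LifeExpectancy': '26.4', '?year': '1993'},
    {'?LifeExpectancy': '26.5', '?year': '1992'},
    {'?LifeExpectancy': '26.7', '?year': '1990'},
]

# Convert to (year, life expectancy) pairs with numeric types, sorted by year.
by_year = sorted((int(row['?year']), float(row['?LifeExpectancy'])) for row in rows)
print(by_year)
```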

#### Example 6: Use the `select` function to filter returns based on name.
### Example 6: Use the `select` function to filter returns based on name.

```python
>>> names_for_places_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'
@@ -105,9 +107,7 @@ The response contains an array of dictionaries, each corresponding to one node m
{'?name': 'Maryland', '?dcid': 'geoId/24'}
```
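
To make the filtering step concrete, the sketch below mimics it locally. It assumes that `select` is a callable that receives one row dictionary and returns a truthy value for rows to keep; `apply_select` is a hypothetical stand-in written for this illustration, not the library's implementation.

```python
def apply_select(rows, select=None):
    """Keep only rows for which select(row) is truthy (hypothetical stand-in)."""
    if select is None:
        return list(rows)
    return [row for row in rows if select(row)]


def maryland_selector(row):
    # Row keys keep the leading "?" of the SELECT variables.
    return row['?name'] == 'Maryland'


rows = [
    {'?name': 'Kentucky', '?dcid': 'geoId/21'},
    {'?name': 'California', '?dcid': 'geoId/06'},
    {'?name': 'Maryland', '?dcid': 'geoId/24'},
]

filtered = apply_select(rows, select=maryland_selector)
print(filtered)
```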

### Error returns

#### Error return 1: Malformed SPARQL query.
### Error response 1: Malformed SPARQL query

```python
>>> gni_by_country_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
@@ -139,7 +139,7 @@ ValueError: Response error 500:
b'{\n "code": 2,\n "message": "googleapi: Error 400: Unrecognized name: place; Did you mean name? at [1:802], invalidQuery",\n "details": [\n {\n "@type": "type.googleapis.com/google.rpc.DebugInfo",\n "stackEntries": [],\n "detail": "internal"\n }\n ]\n}\n'
```

#### Error return 2: Malformed SPARQL query string.
### Error response 2: Malformed SPARQL query string

```python
>>> gni_by_country_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
@@ -172,7 +172,7 @@ b'{\n "code": 2,\n "message": "googleapi: Error 400: Unrecognized name: place; D
>>> gni_by_country_query = 'SELECT ?observation WHERE { ?observation typeOf StatVarObservation . \\\\\ ?observation variableMeasured Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita . ?observation observationAbout ?place . ?place typeOf Country . } ORDER BY ASC (?place) LIMIT 10'
```

#### Error return 3: Bad selector.
### Error response 3: Bad selector

```python
>>> names_for_places_query = 'SELECT ?name ?dcid WHERE { ?a typeOf Place . ?a name ?name . ?a dcid ("geoId/06" "geoId/21" "geoId/24") . ?a dcid ?dcid }'
@@ -186,4 +186,4 @@ Traceback (most recent call last):
KeyError: '?earthquake'
```
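
The `KeyError` above arises when a selector reads a key that is not among the query's SELECT variables. The sketch below reproduces that failure on a single sample row and shows one possible way to avoid it with `dict.get`; both selector functions and the compared value are illustrative assumptions, not code from the library.

```python
row = {'?name': 'Maryland', '?dcid': 'geoId/24'}


def strict_selector(row):
    # Raises KeyError: the query never selected ?earthquake.
    return row['?earthquake'] == 'Nevada'


def tolerant_selector(row):
    # dict.get returns None for a missing key, so the row is simply rejected.
    return row.get('?earthquake') == 'Nevada'


try:
    strict_selector(row)
    raised = False
    missing_key = None
except KeyError as err:
    raised = True
    missing_key = err.args[0]

print(raised, missing_key, tolerant_selector(row))
```

A tolerant selector hides the mistake rather than fixing it, so checking that the selector's keys match the SELECT variables is usually the better first step.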

These examples and errors, along with explanations and fixes for the errors, are available in this [Python notebook](https://colab.research.google.com/drive/1Jd0IDHnMdtxhsmXhL5Ib5tL0zgJud1k5?usp=sharing).
