adds cheatsheet and etl to documentation #522

Merged
merged 16 commits on Jan 31, 2019
111 changes: 111 additions & 0 deletions docs/cheatsheet.rst
@@ -0,0 +1,111 @@
Cheatsheet
----------

How to get census tracts for a state and specific measures
----------------------------------------------------------

...
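
A minimal sketch of one way to do this, scoped to a bounding box rather than a whole state — the table name `idaho_falls_tracts` is illustrative, and `cc` is assumed to be an authenticated `CartoContext`:

.. code::

    # get census tract boundaries that intersect a bounding box
    # (here, around Idaho Falls, Idaho)
    tracts = cc.data_boundaries(
        boundary='us.census.tiger.census_tract',
        region=[-112.096642, 43.429932, -111.974213, 43.553539])

    # write the boundaries to CARTO
    cc.write(tracts, 'idaho_falls_tracts')

    # gather metadata for a measure of interest (median income)
    median_income_meta = cc.data_discovery(
        'idaho_falls_tracts',
        keywords='median income',
        boundaries='us.census.tiger.census_tract')

    # augment the table with the measure, matching on boundary references
    idaho_falls_income = cc.data('idaho_falls_tracts', median_income_meta,
                                 how='geom_refs')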

Get raw measures from the DO
----------------------------

The key is to use `predenominated` in the metadata and `how='geoid'` (or another geom_ref) when using `CartoContext.data`. Here we're using a dataset with a column called `geoid` that holds the GEOID of census tracts. Note that the geometry ID specified in the measure metadata must match the geometries you wish to enrich.

.. code::

    # get median income for the 2006 - 2010 and 2011 - 2015 five-year estimates
    meta = [{
        'numer_id': 'us.census.acs.B19013001',
        'geom_id': 'us.census.tiger.census_tract',
        'normalization': 'predenominated',
        'numer_timespan': '2006 - 2010'
    }, {
        'numer_id': 'us.census.acs.B19013001',
        'geom_id': 'us.census.tiger.census_tract',
        'normalization': 'predenominated',
        'numer_timespan': '2011 - 2015'
    }]

    boston_data = cc.data('boston_census_tracts', meta, how='geoid')

Engineer your DO metadata if you already have GEOID or another geom_ref
-----------------------------------------------------------------------

Use `how='geom_ref_col'` (substituting the name of the column that holds your geometry reference) and specify the appropriate boundary in the metadata.
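
A sketch under assumed names — the table `my_counties_table` and its `county_fips` column are hypothetical; the measure and boundary IDs follow the same pattern as above:

.. code::

    # total population at the county level, matched on a FIPS code column
    meta = [{
        'numer_id': 'us.census.acs.B01003001',
        'geom_id': 'us.census.tiger.county',
        'normalization': 'predenominated',
        'numer_timespan': '2011 - 2015'
    }]

    county_data = cc.data('my_counties_table', meta, how='county_fips')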

How to get a matplotlib figure with four maps
---------------------------------------------

.. code::

    import matplotlib.pyplot as plt

    from cartoframes import BaseMap, CartoContext, Layer, styling

    cc = CartoContext()

    table = 'brooklyn_poverty'
    cols = [('pop_determined_poverty_status_2011_2015', 'Sunset'),
            ('poverty_per_pop', 'Mint'),
            ('walked_to_work_2011_2015', 'TealRose'),
            ('total_population', 'Peach')]

    fig, axs = plt.subplots(2, 2, figsize=(8, 8))

    # draw each map into its own subplot
    for idx, col in enumerate(cols):
        cc.map(layers=[BaseMap('dark'),
                       Layer(table,
                             color={'column': col[0],
                                    'scheme': styling.scheme(col[1], 7, 'quantiles')})],
               ax=axs[idx // 2][idx % 2],
               zoom=11, lng=-73.9476, lat=40.6437,
               interactive=False,
               size=(288, 288))
        axs[idx // 2][idx % 2].set_title(col[0])
    fig.tight_layout()
    plt.show()

.. image:: https://user-images.githubusercontent.com/1041056/35007309-42e818b6-fac7-11e7-87ab-b5148e011226.png

Get a GeoDataFrame
------------------

.. code::

    from cartoframes import CartoContext
    import geopandas as gpd

    cc = CartoContext()

    # decode_geom=True converts stored geometries into shapely objects
    gdf = gpd.GeoDataFrame(cc.read('tablename', decode_geom=True))

Skip SSL verification
---------------------

.. code::

    from cartoframes import CartoContext
    from requests import Session

    session = Session()
    # warning: disables SSL certificate verification for every request
    # in this session
    session.verify = False

    cc = CartoContext(base_url='...', api_key='...', session=session)

Reading large tables
--------------------

Sometimes tables are too large to read in a single `CartoContext.read` or `CartoContext.query` operation. In that case, read the table in chunks and recombine them, as below:

.. code::

    import pandas as pd

    dfs = []

    # template query
    q = '''
        SELECT * FROM my_big_table
        WHERE cartodb_id >= {lower} AND cartodb_id < {upper}
    '''

    # iterate up to the largest cartodb_id so that gaps left by
    # deleted rows are not skipped
    max_id = cc.sql_client.send(
        'SELECT max(cartodb_id) FROM my_big_table')['rows'][0]['max']

    # read in chunks of 100,000 rows
    for r in range(0, max_id + 1, 100000):
        dfs.append(cc.query(q.format(lower=r, upper=r + 100000)))

    # combine the chunks into a single DataFrame
    all_together = pd.concat(dfs)
    del dfs

When writing large DataFrames to CARTO, cartoframes takes care of the batching, so users generally won't hit errors unless they exceed their database quota.
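
For example, the recombined result from above writes back in a single call (`overwrite=True` is assumed here so the example can be re-run):

.. code::

    # batching is handled internally, regardless of DataFrame size
    cc.write(all_together, 'my_big_table_copy', overwrite=True)
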
34 changes: 34 additions & 0 deletions docs/etl.rst
@@ -0,0 +1,34 @@
ETL with cartoframes
====================

A common use case for cartoframes is as part of an ETL (Extract, Transform, and Load) process. The most common pattern is loading data into CARTO:

.. code::

    from cartoframes import CartoContext
    import pandas as pd

    # create a CartoContext for your CARTO account
    cc = CartoContext(<your credentials>)

    # Extract into a pandas DataFrame (can be replaced by another operation)
    raw_data = pd.read_csv('https://<remote location>.csv')

    # Transform
    processed_data = <some processing pipeline>

    # Load into your CARTO account
    cc.write(processed_data, 'processed_data')
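
As a concrete sketch of the same pattern — the feed URL is the public USGS earthquake feed; the magnitude filter and table name are hypothetical:

.. code::

    from cartoframes import CartoContext
    import pandas as pd

    cc = CartoContext()  # assumes stored or environment credentials

    # Extract: earthquakes from the past week
    quakes = pd.read_csv(
        'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv')

    # Transform: keep only significant events
    significant = quakes[quakes['mag'] >= 4.5]

    # Load, georeferencing rows by their longitude/latitude columns
    cc.write(significant, 'significant_quakes',
             lnglat=('longitude', 'latitude'))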


Use cases:

- Syncing datasets that aren't accessible to the Import API's sync option or that need intermediate processing
- Connecting datasets that reside in data lakes to CARTO
- Subsampling large datasets for preview in CARTO (see the sketch below)
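
A minimal sketch of the subsampling case, assuming a hypothetical table `big_table`:

.. code::

    # pull a ~1% random sample for a lightweight preview
    sample = cc.query('SELECT * FROM big_table WHERE random() < 0.01')
    cc.write(sample, 'big_table_preview')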

Some more examples:

- `Hive -> CARTO connector <https://github.com/andy-esch/hive-carto-connector>`__
- `Accessing and parsing a live data feed <https://city-informatics.com/cartoframes-dashboard-tutorial/>`__
- `Live Power Outage reporting for Massachusetts <https://github.com/jhaddadin/massoutagemap>`__
3 changes: 3 additions & 0 deletions docs/index.rst
@@ -8,6 +8,9 @@
.. toctree::
:maxdepth: 2

cheatsheet
etl

*************************
CARTOframes Functionality
*************************