Skip to content

alejandromerchan/USDAQuickStats.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

USDAQuickStats.jl

USDAQuickStats.jl provides functions to access data from the USDA National Agricultural Statistics Service (NASS) Quick Stats database API in Julia.

Installation

add USDAQuickStats

Index

The package contains following functions:

  • set_api_key
  • get_counts
  • get_param_values
  • get_nass

Tutorial and Workflow

Set up an Environment Variable for the NASS API key

To start using the API, the user first needs to get a personal API key.

The user can request a NASS API key at https://quickstats.nass.usda.gov/api.

The API key can be saved as an environment variable called "USDA_QUICK_SURVEY_KEY" or used during each new Julia session by setting it up using:

using USDAQuickStats
set_api_key("YOUR_KEY"::String)

replacing "YOUR_KEY" with the private API key as a string.

Saving the key into a permanent variable in your environment is dependent on the operating system.

Query the database

The API for the Quick Stats database provides three main functions:

  • get_nass
  • get_counts
  • get_param_values

get_nass

get_nass(args...; format="json") The main function is get_nass, which queries the main USDA Quick Stats database.

args... is a list of the different headers from the database that can be queried. Each argument is a string with the name of the header and the value from that header in uppercase, e.g. "header=VALUE. The description of the different headers (also called columns) for the database is available [here].(https://quickstats.nass.usda.gov/api)

The format keyword can be added to the query after a semicolon ; and defines the format of the response. It is set to JSON as a default, other formats provided by the database are CSV and XML.

The function returns a HTTP.request object and the user can parse it using different packages, some examples below.

In the following example, the survey data for oranges in California (CA) for the year 2019 was queried for information about the headers "ACRES BEARING" and "PRICE RECEIVED". The format keyword isn't specified, so the request will return a JSON file.

Notice that header values that have spaces in them need to be passed with the symbol %20 replacing the space. In general, no spaces are allowed in the query.

query = get_nass("source_desc=SURVEY","commodity_desc=ORANGES","state_alpha=CA", "year=2019","statisticcat_desc=AREA%20BEARING","statisticcat_desc=PRICE%20RECEIVED")

output

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Sat, 26 Dec 2020 19:36:55 GMT
Server: Apache/2.4.23 (Linux/SUSE)
X-Frame-Options: SAMEORIGIN
Content-Length: 274515
Cache-Control: max-age=86400, private
Connection: close
Content-Type: application/json
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

{"data":[{"begin_code":"00","prodn_practice_desc":"ALL PRODUCTION PRACTICES","watershed_desc":"","state_fips_code":"06","commodity_desc":"ORANGES","statisticcat_desc":"AREA BEARING","Value":"147,000","watershed_code":"00000000","source_desc":"SURVEY","util_practice_desc":"ALL UTILIZATION PRACTICES","domaincat_desc":"NOT SPECIFIED","domain_desc":"TOTAL","state_alpha":"CA","week_ending":"","group_desc":"FRUIT & TREE NUTS","reference_period_desc":"YEAR","CV (%)":"","year":2019,"short_desc":"ORANGES - ACRES BEARING","country_code":"9000","load_time":"2019-08-28 15:09:57","country_name":"UNITED STATES","unit_desc":"ACRES","county_code":"","end_code":"00","sector_desc":"CROPS","state_name":"CALIFORNIA","zip_5":"","class_desc":"ALL CLASSES","county_ansi":"","asd_code":"","location_desc":"CALIFORNIA","congr_district_code":"","county_name":"","state_ansi":"06","region_desc":"","asd_desc":"","freq_desc":"ANNUAL","agg_level_desc":"STATE"},{"reference_period_desc":"MARKETING YEAR","CV (%)":"","yea
⋮
274515-byte body
"""

This query object can be post-processed in different ways, depending on the format. JSON is the default format and the object can be displayed using the packages JSON3.jl, JSONTables.jl and DataFrames.jl.

using JSON3
using JSONTables
using DataFrames

jobject = JSON3.read(query.body)
jtable = jsontable(jobject.data)
df = DataFrame(jtable)

The query can also be returned and processed as a CSV file.

using CSV
using DataFrames

query = get_nass("source_desc=SURVEY","commodity_desc=ORANGES","state_alpha=CA", "year=2019","statisticcat_desc=AREA%20BEARING","statisticcat_desc=PRICE%20RECEIVED"; format="csv")

# Display as DataFrame
CSV.read(query.body, DataFrame)

# Or save it to disk
CSV.write("query.csv", CSV.File(query.body))

The query can also return an XML file.

get_param_values

get_param_values(arg) is a helper query that allow user to check the values of a field arg from the database. This is useful when constructing different query strings, as it allows the user to determine which values are available on each field.

db_values = get_param_values("sector_desc")

output

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Sat, 26 Dec 2020 20:40:29 GMT
Server: Apache/2.4.23 (Linux/SUSE)
X-Frame-Options: SAMEORIGIN
Content-Length: 89
Cache-Control: max-age=86400, private
Connection: close
Content-Type: application/json
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

{"sector_desc":["ANIMALS & PRODUCTS","CROPS","DEMOGRAPHICS","ECONOMICS","ENVIRONMENTAL"]}"""

The query object can be post processed using the JSON3 package to obtain a more readable output if needed.

using JSON3

JSON3.read(db_values.body)

get_counts

get_counts(args...) is a helper query that allows user to check the number of records a query using the fields in args... will produce before performing the query. This is important because the USDA Quick Stats API has a limit of 50,000 records per query. Any query requesting a number of records larger than this limit will fail.

As in get_nass, args... is a list of the different headers from the database that can be queried. Each argument is a string with the name of the header and the value from that header in uppercase, e.g. "header=VALUE. The description of the different headers (also called columns) for the database is available [here].(https://quickstats.nass.usda.gov/api)

In the following example, the number of records for survey data for oranges in California (CA) for the year 2019 with information about the headers "ACRES BEARING" and "PRICE RECEIVED" was queried.

Notice that header values that have spaces in them need to be passed with the symbol %20 replacing the space. In general, no spaces are allowed in the query.

count = get_counts("source_desc=SURVEY","commodity_desc=ORANGES","state_alpha=CA", "year=2019","statisticcat_desc=AREA%20BEARING","statisticcat_desc=PRICE%20RECEIVED")

output

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Sat, 26 Dec 2020 20:47:55 GMT
Server: Apache/2.4.23 (Linux/SUSE)
X-Frame-Options: SAMEORIGIN
Content-Length: 13
Cache-Control: max-age=86400, private
Connection: close
Content-Type: application/json
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

{"count":276}"""

Same as before, the object can be processed with the JSON3 package to get a more readable output.

A very large query would be for example:

get_counts("source_desc=SURVEY", "year=2019")

output

HTTP.Messages.Response:
"""
HTTP/1.1 200 OK
Date: Sat, 26 Dec 2020 20:49:14 GMT
Server: Apache/2.4.23 (Linux/SUSE)
X-Frame-Options: SAMEORIGIN
Content-Length: 16
Cache-Control: max-age=86400, private
Connection: close
Content-Type: application/json
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

{"count":448878}"""

This query would fail if ran directly using the get_nass function, because it exceeds the limit of 50000 rows.

I would like to thank @markushhh, because I heavily used his FredApi.jl for inspiration. And sometimes blatant plagiarism.

Each comment, suggestion or pull request is welcome!