Splunk-App-and-TA-development/SA-socrata

Welcome

The socrata command for Splunk imports datasets found on https://opendata.socrata.com and http://www.opendatanetwork.com directly into Splunk for further processing and analysis. Gain instant access to thousands of unique datasets!

This project is hosted on GitHub, see https://github.com/hire-vladimir/SA-socrata

Install

App installation is simple: the app only needs to be present on the search head. Documentation on app installation can be found at http://docs.splunk.com/Documentation/AddOns/released/Overview/Singleserverinstall

Getting Started

socrata offers many open and private datasets; some can be accessed anonymously, while others require an API key. More information on finding datasets and obtaining a socrata API key can be found at https://dev.socrata.com/consumers/getting-started.html#throttling-and-application-tokens

Note: If a search uses a static or historical dataset, consider creating a saved search that runs on a set interval and writes the output of the socrata command to a CSV file, which can then be used as a lookup.

Screenshot

socrata command for splunk example

System requirements

The command was tested on Splunk 6.3+ on CentOS Linux 7.1. It uses the Python bundled with Splunk and has no other dependencies, so it should work on other Splunk-supported platforms.

Command syntax

socrata (<options>)* (<auth_key>)? <socrata_api_endpoint>

The socrata_api_endpoint can be represented in two different ways:

  • Direct API endpoint, such as https://opendata.socrata.com/resource/e2xy-undq.json
  • If the endpoint is hosted at opendata.socrata.com, it can be referred to by the dataset ID as e2xy-undq

Instructions for locating the API endpoint are outlined at https://dev.socrata.com/consumers/getting-started.html#finding-your-api-endpoint
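The two endpoint forms above can be normalized to a single URL before the request is made. The following is a minimal sketch, not the command's actual code: the function name `resolve_endpoint`, the dataset ID pattern, and the error handling are assumptions made for illustration.

```python
import re

# Assumption: dataset IDs follow the "four-by-four" shape seen in e2xy-undq.
DATASET_ID = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")
# Bare dataset IDs are resolved against this host, as described above.
DEFAULT_HOST = "opendata.socrata.com"


def resolve_endpoint(arg):
    """Return a full resource URL for either a direct URL or a bare dataset ID."""
    if arg.startswith(("http://", "https://")):
        # Direct API endpoint: use it unchanged.
        return arg
    if DATASET_ID.match(arg):
        # Bare dataset ID: expand to the JSON resource on the default host.
        return "https://%s/resource/%s.json" % (DEFAULT_HOST, arg)
    raise ValueError("not a URL or dataset ID: %r" % arg)


print(resolve_endpoint("e2xy-undq"))
```

Under these assumptions, the bare ID `e2xy-undq` and the full URL `https://opendata.socrata.com/resource/e2xy-undq.json` resolve to the same endpoint.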

Command arguments (optional)

The command implements the arguments listed below. There are two types of arguments: debug, metadata, and append are unique to this command; the rest are arguments supported by the SODA API. See the full description and usage details at https://dev.socrata.com/consumers/getting-started.html

debug=<bool> | append=<bool> | metadata=<bool> | auth_token=<socrata_auth_token> | limit=<int> | offset=<int> | select=<SoQL_select_clause> | where=<SoQL_where_clause> | order=<SoQL_order_by_clause> | group=<SoQL_group_by_clause> | q=<SoQL_clause> | query=<SoQL_clause>
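To show how the SODA-style arguments might translate into an HTTP request, here is a hedged sketch. It assumes the standard SODA conventions of `$`-prefixed query parameters (`$limit`, `$where`, and so on) and the `X-App-Token` header for the application token. The helper name `build_request` is hypothetical, the code is written for Python 3, and the command itself may construct its requests differently.

```python
from urllib.parse import urlencode

# Arguments from the list above that are assumed to map onto SODA "$" parameters.
SODA_ARGS = ("limit", "offset", "select", "where", "order", "group", "q", "query")


def build_request(endpoint, options, auth_token=None):
    """Return (url, headers) for a SODA request built from command options."""
    # Keep only SODA arguments and prefix each name with "$".
    params = [("$" + name, str(options[name])) for name in SODA_ARGS if name in options]
    url = endpoint
    if params:
        # safe="$" keeps the parameter prefix readable instead of encoding it as %24.
        url += "?" + urlencode(params, safe="$")
    # Assumption: the token is sent as a header rather than as a query parameter.
    headers = {"X-App-Token": auth_token} if auth_token else {}
    return url, headers


url, headers = build_request(
    "https://data.cityofchicago.org/resource/4ijn-s7e5.json",
    {"limit": 50000, "where": "results='Fail'", "debug": True},
    auth_token="XXXXXXXXXXXXXXXXXXXX",
)
print(url)
print(headers)
```

Command-specific options such as debug are ignored by this sketch, since they are not part of the SODA API.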

Examples

  • Will pull down the dataset at the given endpoint, authenticating with a socrata API token
... | socrata auth_token=XXXXXXXXXXXXXXXXXXXX https://opendata.socrata.com/resource/cf4r-dfwe.json
  • Will pull down the "Food Inspections in Chicago" dataset, filtered to failed restaurant inspections with violations containing the keyword "rodent", with a row limit of 50,000
... | socrata limit=50000 https://data.cityofchicago.org/resource/4ijn-s7e5.json where="results='Fail' AND facility_type='Restaurant' AND contains(violations, 'rodent')"
  • Using the metadata option will return the dataset's metadata as described on https://data.cityofchicago.org, in this case for a list of locations in NE Illinois, NW Indiana, and SE Wisconsin where alternative vehicle fuels are available.
... | socrata metadata=true https://data.cityofchicago.org/resource/f7f2-ggz5.json
  • Using the debug option will enable additional logging for the command to help troubleshoot dataset pulls. See the Troubleshooting section.
... | socrata debug=1 https://data.cityofchicago.org/resource/f7f2-ggz5.json
  • It is also possible to pass in Splunk variables from previously executed commands. This example stores the WHERE filter clause in a variable, passes it to the socrata command, then writes the output to a CSV file.
| localop | stats count | eval my_where="material_family = 'Hazardous Material'" | socrata limit=20000 https://data.ny.gov/resource/dzn2-x287.json where=my_where | outputlookup inputlookup_ny_state_spills.csv
  • Graph food inspection failures involving rodents in Chicago-area restaurants, by risk (screenshot example)
| socrata https://data.cityofchicago.org/resource/4ijn-s7e5.json where="results='Fail' AND facility_type='Restaurant' AND contains(violations, 'rodent')"
| makemv delim="|" violations | eval entity=dba_name." (". license_ . ")"
| geostats latfield=location_latitude longfield=location_longitude dc(entity) by risk

Troubleshooting

This command writes log data to $SPLUNK_HOME/var/log/splunk/socrata.log, meaning that data is also ingested into Splunk. Magic, I know. Try searching:

index=_internal sourcetype=socrata

When debug-level logging is required, pass the debug=true or debug=1 argument to the command. This will display enhanced logging in the Splunk UI and in the log file.

... | socrata debug=1 e2xy-undq
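For readers curious how a debug flag like this can drive file logging, the sketch below shows one common pattern using Python's standard logging module. It is an illustration only: the helper names `parse_bool` and `get_logger`, the set of accepted truthy values, and the log format are assumptions, not the command's actual implementation.

```python
import logging
import os


def parse_bool(value):
    """Interpret values such as 1 or true as True (assumed set of spellings)."""
    return str(value).strip().lower() in ("1", "true", "t", "yes")


def get_logger(debug, log_dir):
    """Return a file logger that is verbose only when debug is enabled."""
    logger = logging.getLogger("socrata")
    logger.setLevel(logging.DEBUG if parse_bool(debug) else logging.INFO)
    if not logger.handlers:
        # In a Splunk deployment, log_dir would be $SPLUNK_HOME/var/log/splunk.
        handler = logging.FileHandler(os.path.join(log_dir, "socrata.log"))
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With debug off, calls to `logger.debug(...)` are dropped; with debug=1 or debug=true they are written to socrata.log alongside the regular messages.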

Legal

  • socrata is a registered trademark of socrata.com.
  • Splunk is a registered trademark of Splunk, Inc.
