Setting up Whois Data

Kurt Grutzmacher edited this page Jul 17, 2014 · 2 revisions

Whois Data

The Cisco OpenSOC development team obtains its Whois data from a private third party in CSV format. Your source data may differ; however, our bolt code expects specific field names:

Key name      Use
domainName    The search key
fieldName     Added to source string during enrichment
fieldName     Added to source string during enrichment
fieldName     Added to source string during enrichment
fieldName     Added to source string during enrichment

This behavior can be changed in the bolt source code.

JSON Format

OpenSOC comes with a conversion utility that takes source CSV files with header fields and converts each line to a JSON string. It expects the CSV files to sit in one directory per TLD, and it combines the TLD and filename so that all output is consolidated into one directory for HBASE consumption.
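The per-line conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of Whois_CSV_to_JSON.py: the function name csv_to_json_lines and the sample rows are assumptions made for the example, and the real utility may handle encodings, logging, and malformed rows differently.

```python
import csv
import io
import json

def csv_to_json_lines(csv_file):
    """Yield one JSON string per data row, keyed by the CSV header fields."""
    reader = csv.DictReader(csv_file)  # first row is read as the header
    for row in reader:
        yield json.dumps(row)

# Hypothetical sample input; real Whois CSV files carry many more columns.
sample = io.StringIO(
    'domainName,registrarName,status\n'
    'example.us,"EXAMPLE REGISTRAR, LLC",ok\n'
)
lines = list(csv_to_json_lines(sample))
for line in lines:
    print(line)
```

Empty CSV fields come through as empty strings, which matches the many "" values in a typical Whois record.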

As an example, the directory /whois/csv has the following structure:

-- whois/
  -- csv/
    -- com/1.csv
    -- us/1.csv

These files will be processed and saved as whois/json/com_1.json and whois/json/us_1.json.
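The naming rule shown here (TLD directory plus filename, joined with an underscore) can be expressed as a small helper. The function name json_output_path is hypothetical and the sketch assumes POSIX-style paths with exactly one directory level per TLD; it is not taken from the utility's source.

```python
import os

def json_output_path(csv_path, source_root, output_root):
    """Map <source_root>/<tld>/<name>.csv to <output_root>/<tld>_<name>.json."""
    rel = os.path.relpath(csv_path, source_root)   # e.g. "com/1.csv"
    tld, filename = os.path.split(rel)             # e.g. ("com", "1.csv")
    stem, _ext = os.path.splitext(filename)        # e.g. ("1", ".csv")
    return os.path.join(output_root, "{}_{}.json".format(tld, stem))

print(json_output_path("/whois/csv/com/1.csv", "/whois/csv", "/whois/json"))
print(json_output_path("/whois/csv/us/1.csv", "/whois/csv", "/whois/json"))
```

Folding the TLD into the filename keeps same-named files such as com/1.csv and us/1.csv from overwriting each other in the single output directory.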

$ OpenSOC-PlatformScripts/WhoisEnrichment/Whois_CSV_to_JSON.py -s /whois/csv -o /whois/json 
INFO:root:Processing Whois files from /whois/csv
INFO:root:Starting 8 pool workers
INFO:root:Starting activities on 2 CSV files
DEBUG:root:PoolWorker-1: Converting /whois/csv/com/1.csv to /whois/json/com_1.json
DEBUG:root:PoolWorker-2: Converting /whois/csv/us/1.csv to /whois/json/us_1.json
INFO:root:Completed

An example line of the JSON output:

{"standardRegCreatedDate": "2012-02-20 14:09:17 UTC", "technicalContact_telephoneExt": "", 
"expiresDate": "Tue Feb 19 23:59:59 GMT 2013", "technicalContact_city": "San Nicola La Strada",
"billingContact_fax": "", "whoisServer": "", "administrativeContact_faxExt": "", "registrant_fax": "",
"registrant_postalCode": "00132", "registrant_city": "Roma", "billingContact_postalCode": "81020",
"billingContact_city": "San Nicola La Strada", "technicalContact_street4": "", "technicalContact_street1": "Via Caserta, 5",
"technicalContact_street3": "", "technicalContact_state": "CE", "technicalContact_email": "domainmanager@interferenza.com",
"technicalContact_street2": "", "technicalContact_name": "Giancarlo Russo", "zoneContact_street4": "",
"zoneContact_street3": "", "zoneContact_street2": "", "zoneContact_street1": "",
"administrativeContact_street1": "Via Caserta, 5", "administrativeContact_street3": "",
"administrativeContact_street2": "", "administrativeContact_street4": "", "zoneContact_postalCode": "", 
"administrativeContact_postalCode": "81020", "technicalContact_organization": "Interferenza s.r.l.",
"zoneContact_city": "", "registrant_name": "Roberto Delle Fratte", "standardRegExpiresDate": "2013-02-19 23:59:59 UTC",
"billingContact_email": "domainmanager@interferenza.com", "registrant_email": "domainmanager@interferenza.com",
"billingContact_name": "Giancarlo Russo", "billingContact_organization": "Interferenza s.r.l.",
"administrativeContact_organization": "Interferenza s.r.l.", "administrativeContact_telephone": "39390823454016",
"technicalContact_fax": "", "zoneContact_telephoneExt": "", "updatedDate": "Sat Feb 23 03:42:03 GMT 2013",
"standardRegUpdatedDate": "2013-02-23 03:42:03 UTC", "zoneContact_name": "", "administrativeContact_telephoneExt": "",
"technicalContact_postalCode": "81020", "billingContact_street3": "", "billingContact_street2": "",
"billingContact_street1": "Via Caserta, 5", "registrant_street4": "", "registrant_street3": "",
"registrant_street2": "", "registrant_street1": "Via Fermignano, 90", "billingContact_street4": "",
"zoneContact_email": "", "zoneContact_telephone": "", "registrant_organization": "Delle Fratte Roberto",
"zoneContact_organization": "", "registrant_telephoneExt": "", "administrativeContact_fax": "",
"billingContact_telephoneExt": "", "createdDate": "Mon Feb 20 14:09:17 GMT 2012", "zoneContact_fax": "",
"administrativeContact_city": "San Nicola La Strada", "administrativeContact_state": "CE",
"zoneContact_country": "", "technicalContact_telephone": "39390823454016", "contactEmail": "domainmanager@interferenza.com",
"registrant_state": "RM", "billingContact_state": "CE", "technicalContact_country": "ITALY",
"technicalContact_faxExt": "", "registrarName": "DOMAIN.COM, LLC|DOTSTER",
"administrativeContact_country": "ITALY", "status": "ok", "registrant_telephone": "39393338043201",
"nameServers": "EIG1.RENEWYOURNAME.NET|EIG2.RENEWYOURNAME.NET|", "billingContact_telephone": "39390823454016",
"billingContact_country": "ITALY", "zoneContact_state": "", "registrant_country": "ITALY",
"administrativeContact_email": "domainmanager@interferenza.com", "administrativeContact_name": "Giancarlo Russo",
"registrant_faxExt": "", "billingContact_faxExt": "", "domainName": "antinfortunistica.us", "zoneContact_faxExt": ""}

Creating the HBASE Whois table

Create the whois table, then upload the JSON files to an HDFS directory and load them using the HBASE ImportTsv command:

$ echo "create 'whois', {NAME => 'data', COMPRESSION => 'LZO', VERSIONS => '2'}" | ./bin/hbase shell
$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,data:json whois hdfs://whois/load/
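ImportTsv reads tab-separated lines by default, and the importtsv.columns value above names two columns: the row key and data:json. This page does not say whether the conversion utility already writes its output in that shape. If it does not, a preparation step along the following lines may be needed. The helper name to_importtsv_line is hypothetical, and using domainName as the row key is an assumption based on it being the search key.

```python
import json

def to_importtsv_line(json_line, key_field="domainName"):
    """Return 'rowkey<TAB>json' for one JSON record (assumed layout)."""
    record = json.loads(json_line)
    row_key = record[key_field]
    # A tab inside the key would be read by ImportTsv as a column separator.
    if not row_key or "\t" in row_key:
        raise ValueError("unusable row key: {!r}".format(row_key))
    # json.dumps escapes control characters, so the value holds no raw tabs.
    return row_key + "\t" + json.dumps(record)

tsv_line = to_importtsv_line('{"domainName": "antinfortunistica.us", "status": "ok"}')
print(tsv_line)
```

Each resulting line then carries exactly the two fields that HBASE_ROW_KEY,data:json expects.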