ffcg/AnalyticsBattle

Analytics PUB, 13 April 2016


A warm welcome to Forefront's next competence event!

Date: Wednesday, 13 April

Time: 17:30

Venue: Forefront Consulting Group, Holländargatan 13, 5th floor

We are now also launching our popular competence pubs in the area of Analytics. The topic for the evening is “Opportunities with text analysis”. We will highlight the benefits that text analysis can deliver and present the results of a comparison we have made between the tools from Gavagai, MS, and SAS. Besides interesting presentations, we offer pleasant mingling along with food and drink.

Finally, there will be a “Battle” between these vendors. They will show how text analysis is carried out with the different tools and what results can be obtained. There will of course be room for questions and further discussion.

Don't miss this!

Please RSVP no later than Thursday, 7 April.


Using this repository

Original data

See this repository for various scripts that have been used on the email data from the ground up.

Structure

+-- data
¦   +-- original
¦   ¦   +-- Aliases.csv
¦   ¦   +-- EmailReceivers.csv
¦   ¦   +-- Emails.csv
¦   ¦   +-- Persons.csv
¦   ¦   +-- databases.sqlite
¦   ¦   +-- hashes.txt
¦   +-- generated
¦       +-- Candidates.json
¦       +-- US_states.json
+-- scripts
    +-- databases.sqlite.sql

The data

The original data on Hillary Clinton's emails is available from Kaggle.com/hillary-clinton-emails. The files are also included here under /data.

Aliases.csv

 - **Id** - unique identifier for internal reference
 - **Alias** - text in the From/To email fields that refers to the person
 - **PersonId** - person that the alias refers to

EmailReceivers.csv

 - **Id** - unique identifier for internal reference
 - **EmailId** - Id of the email
 - **PersonId** - Id of the person that received the email

Emails.csv

 - **Id** - unique identifier for internal reference
 - **DocNumber** - FOIA document number
 - **MetadataSubject** - Email SUBJECT field (from the FOIA metadata)
 - **MetadataTo** - Email TO field (from the FOIA metadata)
 - **MetadataFrom** - Email FROM field (from the FOIA metadata)
 - **SenderPersonId** - PersonId of the email sender (linking to Persons table)
 - **MetadataDateSent** - Date the email was sent (from the FOIA metadata)
 - **MetadataDateReleased** - Date the email was released (from the FOIA metadata)
 - **MetadataPdfLink** - Link to the original PDF document (from the FOIA metadata)
 - **MetadataCaseNumber** - Case number (from the FOIA metadata)
 - **MetadataDocumentClass** - Document class (from the FOIA metadata)
 - **ExtractedSubject** - Email SUBJECT field (extracted from the PDF)
 - **ExtractedTo** - Email TO field (extracted from the PDF)
 - **ExtractedFrom** - Email FROM field (extracted from the PDF)
 - **ExtractedCc** - Email CC field (extracted from the PDF)
 - **ExtractedDateSent** - Date the email was sent (extracted from the PDF)
 - **ExtractedCaseNumber** - Case number (extracted from the PDF)
 - **ExtractedDocNumber** - Doc number (extracted from the PDF)
 - **ExtractedDateReleased** - Date the email was released (extracted from the PDF)
 - **ExtractedReleaseInPartOrFull** - Whether the email was partially censored (extracted from the PDF)
 - **ExtractedBodyText** - Attempt to only pull out the text in the body that the email sender wrote (extracted from the PDF)
 - **RawText** - Raw email text (extracted from the PDF)
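
Since ExtractedBodyText is described as an attempt, a text analysis may want to fall back to RawText when no body was extracted. The following is a minimal sketch of that idea using only the Python standard library. The function names and the sample rows are made up for illustration, and the fallback rule itself is an assumption rather than something this repository prescribes.

```python
import csv
import io


def best_body_text(row):
    """Return the extracted body if present, otherwise the raw PDF text."""
    body = (row.get("ExtractedBodyText") or "").strip()
    if body:
        return body
    return (row.get("RawText") or "").strip()


def load_email_texts(csv_file):
    """Map each email Id to the text chosen by best_body_text."""
    reader = csv.DictReader(csv_file)
    return {row["Id"]: best_body_text(row) for row in reader}


# Made-up sample rows with only the columns used here; the real
# Emails.csv has all of the columns listed above.
sample = io.StringIO(
    "Id,ExtractedBodyText,RawText\n"
    "1,Thanks for the update.,UNCLASSIFIED ... Thanks for the update.\n"
    "2,,UNCLASSIFIED ... (no body extracted)\n"
)
texts = load_email_texts(sample)
```

To run this on the real data, pass an open file handle for data/original/Emails.csv instead of the in-memory sample.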

Persons.csv

 - **Id** - unique identifier for internal reference
 - **Name** - person's name
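
The Aliases and Persons tables together make it possible to turn the free text of a From/To field into a person's name. The sketch below shows one way to build such a lookup with the standard library. The helper name, the lower-casing of aliases, and the sample rows are assumptions for illustration; whether the aliases in Aliases.csv are already normalized has not been verified here.

```python
import csv
import io


def build_alias_index(aliases_file, persons_file):
    """Map each lower-cased alias to the name of the person it refers to."""
    names = {row["Id"]: row["Name"] for row in csv.DictReader(persons_file)}
    index = {}
    for row in csv.DictReader(aliases_file):
        # Lower-casing is an assumption made to get case-insensitive lookups.
        index[row["Alias"].strip().lower()] = names.get(row["PersonId"])
    return index


# Made-up rows in the column layout described above.
persons = io.StringIO("Id,Name\n1,Example Person\n")
aliases = io.StringIO("Id,Alias,PersonId\n1,example p,1\n2,person.example,1\n")
alias_index = build_alias_index(aliases, persons)
```

A lookup such as `alias_index.get(field.strip().lower())` then returns the person's name, or `None` when the alias is unknown.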

databases.sqlite

This SQLite database contains all of the above tables (Emails, Persons, Aliases, and EmailReceivers) with their corresponding fields. The schema and ingest code are in scripts/databases.sqlite.sql.
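
As an illustration of how the tables relate, the sketch below counts emails per sender by joining Emails to Persons on SenderPersonId. It runs against an in-memory stand-in that contains only the columns the query needs and a few made-up rows. The column types and the function name are assumptions; the actual schema is the one in scripts/databases.sqlite.sql.

```python
import sqlite3


def emails_per_sender(conn, limit=10):
    """Return (name, email_count) pairs, most active senders first."""
    query = """
        SELECT p.Name, COUNT(*) AS n
        FROM Emails AS e
        JOIN Persons AS p ON p.Id = e.SenderPersonId
        GROUP BY p.Id, p.Name
        ORDER BY n DESC, p.Name
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# In-memory stand-in with assumed column types and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Emails (Id INTEGER PRIMARY KEY, SenderPersonId INTEGER);
    INSERT INTO Persons VALUES (1, 'Person A'), (2, 'Person B');
    INSERT INTO Emails VALUES (1, 1), (2, 1), (3, 2);
""")
top_senders = emails_per_sender(conn)
```

Against the real data, the same function can be called with a connection opened on the SQLite file under data/original.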

Candidates.json

This JSON file lists both the main presidential candidates of the 2016 election and all of the Democrats and Republicans who have been potential candidates.

US_states.json

This JSON file lists all of the US states. It also includes their abbreviations in case you want to match against geodata. The full state names are also included in Candidates.json as the state each candidate belongs to.
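
One possible use of the state list is to tag emails that mention a state. The sketch below assumes a list of objects with `name` and `abbreviation` keys. That layout is a guess for illustration and should be adjusted to the actual structure of US_states.json. The function name is also hypothetical.

```python
import json
import re


def find_states(text, states):
    """Return the names of states whose name or abbreviation occurs in text."""
    found = []
    for state in states:
        # Full names match case-insensitively; abbreviations match only as
        # upper-case whole words, so "OH" does not match the word "oh".
        name_hit = re.search(
            r"\b" + re.escape(state["name"]) + r"\b", text, re.IGNORECASE
        )
        abbr_hit = re.search(r"\b" + re.escape(state["abbreviation"]) + r"\b", text)
        if name_hit or abbr_hit:
            found.append(state["name"])
    return found


# Two entries in the assumed layout; the real file covers all states.
states = json.loads(
    '[{"name": "Iowa", "abbreviation": "IA"},'
    ' {"name": "Ohio", "abbreviation": "OH"}]'
)
hits = find_states("Trip to Iowa next week, then on to OH.", states)
```

Plain word matching like this is deliberately simple: a name that is part of a longer state name, or an abbreviation that doubles as an ordinary upper-case word, will produce false positives.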


Vendors

Gavagai, Microsoft, SAS
