
Introduction

This program processes Google search trend data for a particular search term and analyzes it using k-means clustering. It also visualizes the result by using principal component analysis (PCA) to reduce the data to two dimensions, so that the clusters can be seen forming in 2-space. It includes a method for making calls to the Google Trends API.
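The repository's own kmeans.py is not reproduced here, so the following is only a minimal sketch of the clustering step (Lloyd's algorithm) in plain Python. The function name, the deterministic choice of starting centroids, and the sample rows are all assumptions made for illustration, and the PCA projection is left out.

```python
import math

def kmeans(points, k, iterations=100):
    """Illustrative Lloyd's algorithm; returns (centroids, assignments)."""
    # Assumption: start from the first k points so the result is repeatable.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the run has converged
        assignments = new_assignments
    return centroids, assignments

# Made-up trend rows: three time points per location, two obvious groups.
rows = [
    [2, 100, 44], [90, 10, 5], [5, 90, 44],
    [85, 15, 10], [3, 95, 40], [88, 12, 8],
]
centroids, assignments = kmeans(rows, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

Each row is clustered in its full dimensionality; a 2-D projection such as PCA would only be needed for plotting.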

Making Calls to the Google Trend API

First, change into the google_trends_access directory:

cd google_trends_access

Once you are there, run the following command:

python src/gtrends/Dataset_creator.py keyword

Example:

python src/gtrends/Dataset_creator.py hello

Here, keyword is the Google search term you want to retrieve data for. The region parameter selects the region of interest; its format is based on the UNECE Codes for Trade, and you can look up the region code you need in the countries.rda file. Include the parameter 'graph' to generate a plot of the data. The time frame parameter accepts the following formats:

Timeframe

- Date to start from
- Defaults to the last 5 years: 'today 5-y'
- Everything: 'all'
- Specific dates: 'YYYY-MM-DD YYYY-MM-DD', for example '2016-12-14 2017-01-25'
- Specific datetimes: 'YYYY-MM-DDTHH YYYY-MM-DDTHH', for example '2017-02-06T10 2017-02-12T07'
  - Note: the time component is based on UTC
- Current time minus time pattern:
  - By month: 'today #-m', where # is the number of months back from today to pull data for
    - For example, 'today 3-m' gets data from 3 months ago up to today
    - Note: Google uses the UTC date as 'today'
    - Seems to work only for 1, 2, or 3 months
  - Daily: 'now #-d', where # is the number of days back from now to pull data for
    - For example, 'now 7-d' gets data from the last week
    - Seems to work only for 1 or 7 days
  - Hourly: 'now #-H', where # is the number of hours back from now to pull data for
    - For example, 'now 1-H' gets data from the last hour
    - Seems to work only for 1 or 4 hours
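The formats above can be checked before a request is made. The sketch below is a hypothetical helper (is_valid_timeframe is not part of this repository or of any Google Trends client) that encodes only the patterns listed in this section, including the "seems to work only for" limits; the real service may accept other values.

```python
import re

# Hypothetical validator: encodes only the formats listed in this README.
_DATE = r"\d{4}-\d{2}-\d{2}"
_PATTERNS = [
    r"all",                                # everything
    r"today 5-y",                          # default, last 5 years
    rf"{_DATE} {_DATE}",                   # specific dates
    rf"{_DATE}T\d{{2}} {_DATE}T\d{{2}}",   # specific datetimes (UTC hour)
    r"today [123]-m",                      # last 1, 2, or 3 months
    r"now [17]-d",                         # last 1 or 7 days
    r"now [14]-H",                         # last 1 or 4 hours
]

def is_valid_timeframe(timeframe: str) -> bool:
    """Return True if the string matches one of the listed formats."""
    return any(re.fullmatch(pattern, timeframe) for pattern in _PATTERNS)

print(is_valid_timeframe("today 3-m"))              # True
print(is_valid_timeframe("2016-12-14 2017-01-25"))  # True
print(is_valid_timeframe("today 6-m"))              # False
```

The check is purely syntactic: it does not confirm that a date exists or that the start precedes the end.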

Input Format

The input to this program is a CSV file formatted as follows. Each entry is made up of the Google search trend values for a particular search term in a given location over a set time period. In addition, each entry is labelled with the state it comes from (the numbers 1 to 50 refer to the states in alphabetical order). These labels allow us to compare our clustering to the actual location of the states.

1	2	3	labels
2	100	44	2
13	100	21	10
5	90	44	4
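As an illustration of this layout, the sketch below reads such a file and separates the trend values from the state labels. It assumes comma-separated values and a header column literally named labels; the function name load_trend_csv is hypothetical and is not taken from kmeans.py.

```python
import csv
import io

# The sample rows from the table above, assuming commas as the delimiter.
sample = """1,2,3,labels
2,100,44,2
13,100,21,10
5,90,44,4
"""

def load_trend_csv(handle):
    """Split each row into trend values (floats) and its state label (int)."""
    reader = csv.reader(handle)
    header = next(reader)
    label_col = header.index("labels")
    features, labels = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        labels.append(int(row[label_col]))
        features.append([float(v) for i, v in enumerate(row) if i != label_col])
    return features, labels

features, labels = load_trend_csv(io.StringIO(sample))
print(features[0])  # [2.0, 100.0, 44.0]
print(labels)       # [2, 10, 4]
```

For a file on disk, the same function could be called with open("filename.csv", newline="") in place of the in-memory sample.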

To run the program, run the following command:

cd kmeans
python3 kmeans.py filename.csv

Sidu28/GTrends_API_Kmeans
