
This project extracts tweets at scale from at least 1000 followers of a given twitter_id and queries them for certain statistics.


Darshansol9/TwitterProject


TwitterProject

Given a Twitter ID, get a minimum of 1000 followers and, for each follower, gather up to 200 tweets. Store the tuple (twitterID, followerID, tweetID, tweet) in a table managed by the Azure SQL service. Create a database server with the appropriate tables in that Azure account, set up connectivity between all the Azure resources, and manage the Twitter API account through the developer console.

Project Agenda:

- Given a Twitter ID, gather 1000 follower IDs of that Twitter ID.
- For each follower ID, gather up to 200 original tweets.
- Exclude retweets, messages, image posts, and likes received.
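
The "original tweets only" rule in the agenda can be sketched as a small filter. This is a minimal illustration, not the project's actual code: it assumes each tweet is a dictionary shaped like Twitter API v1.1 JSON, where a retweet carries a `retweeted_status` key and an image post lists its attachments under `entities.media`. The function names `is_original_tweet` and `keep_original` are hypothetical.

```python
from typing import Any, Dict, List


def is_original_tweet(tweet: Dict[str, Any]) -> bool:
    """Return True for a tweet the user wrote themselves, with no media attached."""
    # Assumption: retweets embed the source tweet under "retweeted_status".
    if "retweeted_status" in tweet:
        return False
    # Assumption: image posts list their attachments under entities.media.
    if tweet.get("entities", {}).get("media"):
        return False
    return True


def keep_original(tweets: List[Dict[str, Any]], limit: int = 200) -> List[Dict[str, Any]]:
    """Keep at most `limit` original tweets, preserving timeline order."""
    return [t for t in tweets if is_original_tweet(t)][:limit]
```

The default `limit` of 200 mirrors the per-follower cap stated in the agenda.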

- Store the data in the Azure table.
- Write a client to query that Azure table.
- List all tweets for a given Twitter ID.
- List the follower IDs for a given Twitter ID.
- Make query response time faster by indexing the SQL database.
- Deploy a REST API endpoint so this functionality can be used from the internet to retrieve data for any twitter_id.
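
To illustrate the two lookups and the indexing step, here is a minimal sketch that uses Python's built-in `sqlite3` as a local stand-in for the Azure SQL database. The table name `tweets`, the column names, the index names, and the sample rows are all assumptions made for the example; only the two IDs are taken from this README. The real schema and column types on Azure SQL will differ.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding (twitter_id, follower_id, tweet_id, tweet)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets ("
        " twitter_id TEXT, follower_id TEXT, tweet_id TEXT, tweet TEXT)"
    )
    rows = [  # hypothetical sample data
        ("18839785", "52754460", "1", "first tweet"),
        ("18839785", "52754460", "2", "second tweet"),
        ("18839785", "99", "3", "another follower"),
    ]
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", rows)
    # Index the columns used in WHERE clauses so lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_tweets_follower ON tweets (follower_id)")
    conn.execute("CREATE INDEX idx_tweets_twitter ON tweets (twitter_id)")
    conn.commit()
    return conn


def tweets_for(conn: sqlite3.Connection, follower_id: str) -> List[str]:
    """List all tweets stored for one follower ID."""
    cur = conn.execute(
        "SELECT tweet FROM tweets WHERE follower_id = ? ORDER BY tweet_id",
        (follower_id,),
    )
    return [row[0] for row in cur.fetchall()]


def followers_of(conn: sqlite3.Connection, twitter_id: str) -> List[str]:
    """List the distinct follower IDs stored for one Twitter ID."""
    cur = conn.execute(
        "SELECT DISTINCT follower_id FROM tweets"
        " WHERE twitter_id = ? ORDER BY follower_id",
        (twitter_id,),
    )
    return [row[0] for row in cur.fetchall()]
```

Both queries use bound parameters rather than string formatting, which keeps a user-supplied twitter_id from being interpreted as SQL.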

In this case, the twitter_id used is that of Narendra Modi.

Setup:

Contains 5 custom Python files:

- connect_database.py
- FetchTweetIds.py
- private.py
- Twitter_Followers.py
- Twitter_Timeline.py

Files

Contains 2 Azure Functions (AF) and one Function App (FA):

- ProjP1 (FA)
- GetTweets (AF)
- FetchFollowers (AF)

Data

Files generated with the tweets of each individual follower, plus the list of follower Twitter IDs (Followers_List.csv). More than 1000 files were generated and stored in the Azure database.

WorkFlow:

FetchFollowers is an Azure Function that returns an HTTP response when called from the browser, for example: http://localhost:7071/api/HttpTrigger?name=18839785
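
The Twitter ID reaches the function through the `name` query parameter of the URL above. As a standalone sketch of that step, the snippet below parses the parameter with the standard library and rejects non-numeric values. The function name `extract_twitter_id` and the numeric check are assumptions for illustration; in the deployed function the Azure runtime supplies the parsed request, so the project's own code may look different.

```python
from urllib.parse import parse_qs, urlparse


def extract_twitter_id(url: str) -> str:
    """Return the numeric Twitter ID passed in the 'name' query parameter."""
    query = parse_qs(urlparse(url).query)
    values = query.get("name", [])
    # Assumption: Twitter IDs are plain digit strings, so anything else is rejected.
    if not values or not values[0].isdigit():
        raise ValueError("expected a numeric 'name' query parameter")
    return values[0]
```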

Once the Twitter ID is received, the __init__.py file makes internal calls to the custom Python files:

Twitter_Followers.py is called; it hits the Twitter API and fetches the followers list of the given twitter_id.
This list is stored in the Followers folder as Followers_List.csv.
Twitter_Timeline.py is called; it reads Followers_List.csv and fetches up to 200 tweets from each follower's timeline.
connect_database.py is called; it connects to the database and stores the data generated by Twitter_Timeline.py in the Azure SQL database.
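
The hand-off between the follower step and the timeline step goes through Followers_List.csv. A minimal sketch of writing and reading such a file is shown below. The single `follower_id` column and the helper names `save_followers` and `load_followers` are assumptions, since the actual file layout is not described in this README.

```python
import csv
from pathlib import Path
from typing import List


def save_followers(path: Path, follower_ids: List[str]) -> None:
    """Write follower IDs to a one-column CSV, creating the folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["follower_id"])  # assumed header name
        writer.writerows([fid] for fid in follower_ids)


def load_followers(path: Path) -> List[str]:
    """Read the follower IDs back in the order they were written."""
    with path.open(newline="", encoding="utf-8") as fh:
        return [row["follower_id"] for row in csv.DictReader(fh)]
```

For example, `save_followers(Path("Followers") / "Followers_List.csv", ids)` would produce the file that the timeline step then reads with `load_followers`.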

Once this process is done, the user gets a response containing all the inserted tweets, along with the list of follower twitter_ids for the given twitter_id.

To Fetch the tweets:

The GetTweets Azure Function is used. It connects to the database and, given the twitter_id whose tweets the user wants to view, queries the database and returns the result to the browser. The tweets are treated as a string object, which is then sent back to the browser.
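
One way to turn the query result into the single string that is sent back is to serialize it as JSON. The README only says the tweets are returned as a string, so the JSON shape, the key names, and the function name `build_response_body` below are assumptions. Passing `ensure_ascii=False` keeps non-Latin scripts and emojis readable in the output instead of escaping them.

```python
import json
from typing import Iterable, Tuple


def build_response_body(twitter_id: str, rows: Iterable[Tuple[str, str]]) -> str:
    """Serialize (tweet_id, tweet) rows for one Twitter ID into a JSON string."""
    payload = {
        "twitter_id": twitter_id,
        "tweets": [{"tweet_id": tid, "tweet": text} for tid, text in rows],
    }
    # ensure_ascii=False emits Unicode characters as-is rather than \u escapes.
    return json.dumps(payload, ensure_ascii=False)
```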

Production Environment:

I hosted the app in Azure to fetch the follower IDs for a given Twitter ID in a production environment. Narendra Modi's Twitter ID was used to fetch at least 1000 followers with at least 200 tweets each, so in total 200k tweets were stored and indexed in the SQL Server database to make query response time faster.

https://projp1.azurewebsites.net/api/HttpTrigger?code=ZLc5L0Fp9NyLeGe3walOwrc4ZnwiHYSEubLx2J2av2EfL7ufWPxDqA==&name=18839785

This API fetches the tweets for a given follower's twitter_id:

https://projp1.azurewebsites.net/api/GetTweets?code=9eSeGOlQyRDT29ppSubu7W/zr21cTUjvS9bQCbQbWBurmaBd2VO3ag==&name=52754460

These URLs have been deactivated, as this was part of a class project for CS-GY 6516 Big Data Technologies. The IBM Power Systems Academic Initiative is acknowledged.

Achievement:

- Handles any kind of tweet with no data loss or truncation: it is language compliant (tweets in all languages are processed), and almost all emojis are fetched as they were originally posted.
- Indexes the SQL database, which reduces the query time to 65%.
- Uses parallelism to scrape the tweets from each individual's timeline, reducing the overall processing time.
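
The parallel scraping of timelines could look roughly like the sketch below, which fans the per-follower fetch out over a thread pool. The README does not say whether threads or processes were used, so `ThreadPoolExecutor` is an assumption (a common choice for network-bound work). `fetch_all_timelines` is a hypothetical name, and `fetch_timeline` stands for whatever function calls the Twitter API for one follower.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all_timelines(
    follower_ids: List[str],
    fetch_timeline: Callable[[str], List[str]],
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Fetch every follower's timeline concurrently and map ID -> tweets."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as follower_ids.
        results = pool.map(fetch_timeline, follower_ids)
        return dict(zip(follower_ids, results))
```

Because the fetch function is passed in as an argument, the same helper can be exercised with a stub during testing and with the real API call in production.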
