Given a twitter ID, get a minimum of 1000 followers, and for each follower gather up to 200 tweets. Store the tuple (twitterID, followerID, tweetID, tweet) in a table managed in Azure SQL. Create a database server with appropriate tables in that Azure account, set up connectivity among all resources in Azure, and manage the Twitter API account using the developer console.
Project Agenda:
-- given a twitter ID, gather 1000 follower IDs of that twitter ID
-- for each follower ID, gather up to 200 original tweets
-- exclude retweets, messages, image posts, and the likes a tweet received
-- store the data in the Azure table
-- write a client to query that Azure table: list all tweets for a given twitter ID, and list the follower IDs for a given twitter ID
-- make query response time faster by indexing the SQL database
-- deploy a REST API endpoint so this functionality can be used from the internet for any twitter_id
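As a rough illustration of the table, the two client queries, and the indexing step, the sketch below uses Python's built-in sqlite3 module as a local stand-in for Azure SQL. The table name `tweets`, its column names, and the index names are assumptions mirroring the (twitterID, followerID, tweetID, tweet) tuple, not necessarily the schema created by connect_database.py; on Azure SQL the tweet text would typically be an NVARCHAR column so that all languages and emojis survive.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Azure SQL table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tweets (
            twitter_id  TEXT NOT NULL,   -- account whose followers were collected
            follower_id TEXT NOT NULL,   -- one follower of twitter_id
            tweet_id    TEXT PRIMARY KEY,
            tweet       TEXT NOT NULL
        )
        """
    )
    # Indexes on the columns used in WHERE clauses let lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_tweets_twitter ON tweets (twitter_id, follower_id)")
    conn.execute("CREATE INDEX idx_tweets_follower ON tweets (follower_id)")
    return conn


def follower_ids(conn: sqlite3.Connection, twitter_id: str) -> list:
    """List the distinct follower IDs stored for a given twitter ID."""
    rows = conn.execute(
        "SELECT DISTINCT follower_id FROM tweets WHERE twitter_id = ? ORDER BY follower_id",
        (twitter_id,),
    )
    return [r[0] for r in rows]


def tweets_for(conn: sqlite3.Connection, follower_id: str) -> list:
    """List all stored tweets for a given (follower) twitter ID."""
    rows = conn.execute(
        "SELECT tweet FROM tweets WHERE follower_id = ? ORDER BY tweet_id",
        (follower_id,),
    )
    return [r[0] for r in rows]


def uses_index(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """True if SQLite's query plan mentions an index (plan text is SQLite-specific)."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return any("INDEX" in str(step[-1]) for step in plan)
```

On Azure SQL the equivalent check would be the query's execution plan rather than SQLite's `EXPLAIN QUERY PLAN`.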
In this case, the Twitter_Id used is that of Narendra Modi.
Contains 5 custom-made Python files: connect_database.py, FetchTweetIds.py, private.py, Twitter_Followers.py, Twitter_Timeline.py
Contains 2 Azure Functions (AF) and one Function App (FA): ProjP1 (FA), GetTweets (AF), FetchFollowers (AF)
Generated files: one file of tweets for each individual follower, plus Followers_List.csv with the list of follower twitter IDs. More than 1000 files were generated and stored in the Azure database.
FetchFollowers is an Azure Function that returns an HttpResponse when called through the browser, for example locally at: http://localhost:7071/api/HttpTrigger?name=18839785
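Both endpoints take the twitter ID in the `name` query parameter. The standalone sketch below uses only `urllib.parse` to show how that parameter can be extracted and validated from a URL like the one above; the helper name `twitter_id_from_url` and the digits-only check are assumptions, not code from the project.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def twitter_id_from_url(url: str) -> Optional[str]:
    """Return the numeric twitter ID passed as ?name=..., or None if missing or invalid."""
    values = parse_qs(urlparse(url).query).get("name")
    if not values:
        return None
    candidate = values[0].strip()
    # Twitter user IDs are strings of ASCII digits; reject anything else early.
    return candidate if re.fullmatch(r"[0-9]+", candidate) else None
```

Inside an Azure Function written in Python, the same value is usually read from the request object with `req.params.get('name')` instead of parsing the URL by hand.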
Once the function receives the twitter ID, its __init__.py file makes internal calls to the custom-made Python files.
Twitter_Followers.py is called first; it hits the Twitter API and fetches the follower list of the given Twitter_id.
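The Twitter API v1.1 `followers/ids` endpoint returns follower IDs in pages, each carrying a `next_cursor` value that is 0 when no pages remain; the first page is requested with cursor -1. The sketch below shows the cursor loop needed to collect at least 1000 IDs. `fetch_page` is a hypothetical callable standing in for the actual HTTP call in Twitter_Followers.py, so the loop can run without credentials.

```python
from typing import Callable, Dict, List

# Hypothetical signature: takes a cursor and returns a dict shaped like the
# v1.1 followers/ids response, e.g. {"ids": [...], "next_cursor": 123}.
FetchPage = Callable[[int], Dict]


def collect_follower_ids(fetch_page: FetchPage, minimum: int = 1000) -> List[int]:
    """Follow next_cursor until at least `minimum` IDs are collected or pages run out."""
    ids: List[int] = []
    cursor = -1  # -1 requests the first page
    while len(ids) < minimum:
        page = fetch_page(cursor)
        ids.extend(page.get("ids", []))
        cursor = page.get("next_cursor", 0)
        if cursor == 0:  # 0 means there are no further pages
            break
    return ids
```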
The list is stored in the Followers folder as Followers_List.csv.
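Writing the follower list to Followers/Followers_List.csv, and reading it back for the timeline step, can be done with the standard `csv` module. The single `follower_id` header column and the helper names are assumptions, not necessarily the layout produced by Twitter_Followers.py.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def write_followers(ids: Iterable[int], folder: Path) -> Path:
    """Write one follower ID per row to <folder>/Followers_List.csv."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "Followers_List.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["follower_id"])  # assumed header
        for follower_id in ids:
            writer.writerow([follower_id])
    return path


def read_followers(path: Path) -> List[str]:
    """Read the follower IDs back; csv returns every value as a string."""
    with path.open(newline="", encoding="utf-8") as fh:
        return [row["follower_id"] for row in csv.DictReader(fh)]
```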
Twitter_Timeline.py is called next; it reads Followers_List.csv and fetches at least 200 timeline tweets for each follower.
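The agenda asks for original tweets only. With the Twitter API v1.1 `statuses/user_timeline` endpoint, `count` caps a single response at 200 tweets, `include_rts=false` and `exclude_replies=true` ask the server to drop retweets and replies (the count is applied before that filtering, so fewer than 200 may come back), and `tweet_mode=extended` returns the untruncated text in `full_text`. The sketch below pairs those parameters with a client-side check on the tweet JSON fields `retweeted_status`, `in_reply_to_status_id`, and `entities`/`extended_entities`; treating any tweet with attached media as an "image post" is an assumption about what the project excluded.

```python
from typing import Dict, List

# Request parameters for GET statuses/user_timeline (Twitter API v1.1).
TIMELINE_PARAMS = {
    "count": 200,               # per-request maximum for this endpoint
    "include_rts": "false",     # ask the server to drop retweets
    "exclude_replies": "true",  # ask the server to drop replies
    "tweet_mode": "extended",   # return untruncated text in `full_text`
}


def is_original(tweet: Dict) -> bool:
    """True for a tweet that is not a retweet, not a reply, and has no attached media."""
    if "retweeted_status" in tweet:
        return False
    if tweet.get("in_reply_to_status_id") is not None:
        return False
    entities = tweet.get("extended_entities") or tweet.get("entities") or {}
    if entities.get("media"):
        return False
    return True


def original_texts(timeline: List[Dict]) -> List[str]:
    """Keep only original tweets and return their text (full_text in extended mode)."""
    return [t.get("full_text", t.get("text", "")) for t in timeline if is_original(t)]
```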
connect_database.py is then called; it connects to the database and stores the data generated by Twitter_Timeline.py in the Azure SQL database.
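With around 200 rows per follower across 1000+ followers, row-by-row inserts are slow, so batching with `executemany` and `?` placeholders is a natural fit; placeholders also keep quotes and emojis in tweet text from breaking the SQL. The sketch uses sqlite3 as a stand-in; pyodbc, a common Python driver for Azure SQL, uses the same `?` placeholder style and also provides `executemany`. The table name and helper names are assumptions.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str, str]  # (twitter_id, follower_id, tweet_id, tweet)


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def store_tweets(conn: sqlite3.Connection, rows: Iterable[Row], size: int = 500) -> int:
    """Insert rows in batches inside one transaction; return the number inserted."""
    total = 0
    with conn:  # commits on success, rolls back if any batch fails
        for batch in batched(rows, size):
            conn.executemany(
                "INSERT INTO tweets (twitter_id, follower_id, tweet_id, tweet) VALUES (?, ?, ?, ?)",
                batch,
            )
            total += len(batch)
    return total
```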
Once this process finishes, the user receives a response listing all inserted tweets along with the follower twitter_ids for the given twitter_id.
The GetTweets Azure Function is used to view tweets. Given the twitter_id whose tweets the user wants to see, it connects to the database, queries it, and returns the result to the browser. The tweets are treated as a single string object, which is then sent back to the browser.
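A minimal sketch of the GetTweets lookup, assuming the same hypothetical `tweets` table: a parameterized query (rather than concatenating the user-supplied ID into SQL) followed by joining the rows into one string for the response body. sqlite3 again stands in for the Azure SQL connection, and the "tweet_id: tweet" line format is an assumption.

```python
import sqlite3


def tweets_as_text(conn: sqlite3.Connection, follower_id: str) -> str:
    """Return all stored tweets for one twitter_id as a single newline-separated string."""
    rows = conn.execute(
        "SELECT tweet_id, tweet FROM tweets WHERE follower_id = ? ORDER BY tweet_id",
        (follower_id,),
    ).fetchall()
    if not rows:
        return f"No tweets stored for twitter_id {follower_id}"
    return "\n".join(f"{tweet_id}: {tweet}" for tweet_id, tweet in rows)
```

Inside the Azure Function, the returned string would become the body of the HTTP response.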
I hosted the app in Azure to get the follower IDs for a given twitter ID in a production environment, using Narendra Modi's Twitter ID to fetch at least 1000 followers with at least 200 tweets each. In total, 200k tweets were stored, and the SQL Server database was indexed to make query response time faster.
This API fetches the followers for the given twitter ID: https://projp1.azurewebsites.net/api/HttpTrigger?code=ZLc5L0Fp9NyLeGe3walOwrc4ZnwiHYSEubLx2J2av2EfL7ufWPxDqA==&name=18839785
This API fetches the tweets for a given follower's twitter_id: https://projp1.azurewebsites.net/api/GetTweets?code=9eSeGOlQyRDT29ppSubu7W/zr21cTUjvS9bQCbQbWBurmaBd2VO3ag==&name=52754460
These URLs have been deactivated, as this was part of a class project for CS-GY 6516 Big Data Technologies. The IBM Power Systems Academic Initiative is acknowledged.
- Handles any kind of tweet without data loss or truncation: it is language-compliant (tweets in all languages are processed), and almost all emojis are fetched as originally posted.
- Indexes the SQL database, which reduces query time to 65%.
- Uses parallelism to scrape the tweets from each follower's timeline, reducing overall processing time.
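Timeline fetches are I/O-bound network calls, so a thread pool is one straightforward way to run them concurrently. The sketch below uses `concurrent.futures.ThreadPoolExecutor`; the source does not say which parallelism mechanism the project used. `fetch_timeline` is a hypothetical stand-in for the per-follower request in Twitter_Timeline.py, and the default worker count is an assumption that in practice would be bounded by Twitter's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def fetch_all(
    follower_ids: Iterable[str],
    fetch_timeline: Callable[[str], List[str]],
    workers: int = 8,
) -> Dict[str, List[str]]:
    """Fetch timelines concurrently and map each follower ID to its tweets.

    A failure for one follower is recorded as an empty list so the others still complete.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_timeline, fid): fid for fid in follower_ids}
        for future in as_completed(futures):
            fid = futures[future]
            try:
                results[fid] = future.result()
            except Exception:
                results[fid] = []  # e.g. protected account or transient network error
    return results
```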