we need an application that we run on Kubernetes (we don't have anyone for that)
I was thinking about
https://www.dataquest.io/blog/streaming-data-python/
e.g. storing Twitter data and analysing it in a Kubernetes cluster
however, we do not want to store it in SQL, but in Apache Parquet (https://parquet.apache.org/)
You would start by writing an organized program that just reads the 1% Twitter stream, and work with me on a regular basis.
I am interested in a histogram, in case the tweets allow this.
Let's assume I store all tweets. Does the 1% stream include deletion events? What is the distribution of the time to live (TTL) of deleted tweets?
e.g. tweet x is posted at t_create and deleted at t_delete. The TTL = t_delete - t_create.
We want a histogram of that.
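A back-of-the-envelope sketch of that histogram, using made-up (t_create, t_delete) pairs and hour-wide buckets (in the real program the pairs would come from joining delete events against stored tweets):

```python
from datetime import datetime

# Hypothetical (t_create, t_delete) pairs recovered by matching the
# status ids in delete events against previously stored tweets.
pairs = [
    (datetime(2021, 1, 1, 12, 0), datetime(2021, 1, 1, 12, 30)),
    (datetime(2021, 1, 1, 12, 0), datetime(2021, 1, 1, 14, 0)),
    (datetime(2021, 1, 1, 12, 0), datetime(2021, 1, 2, 12, 0)),
]

# TTL = t_delete - t_create, bucketed into hours since creation.
histogram = {}
for t_create, t_delete in pairs:
    ttl = t_delete - t_create
    bucket = int(ttl.total_seconds() // 3600)
    histogram[bucket] = histogram.get(bucket, 0) + 1

print(histogram)  # {0: 1, 2: 1, 24: 1}
```

The bucket width (and whether to use log-scaled bins, since many deletions may happen within seconds) is an open choice.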
Then we put this in a Kubernetes cluster. The first task can be done without a cluster, e.g. stream data, verify that delete events are there, ...
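For the "verify that delete events are there" step, the v1.1 stream interleaves status deletion notices with normal tweets. A small classifier over the raw JSON lines could look like this; the message shapes follow the documented stream formats, but treat the details as assumptions:

```python
import json

def classify(line: str) -> str:
    """Classify one raw JSON line from the 1% sample stream."""
    msg = json.loads(line)
    if "delete" in msg:
        # Status deletion notice: {"delete": {"status": {"id": ...}}}
        return "delete"
    if "text" in msg and "id" in msg:
        return "tweet"
    return "other"  # limit notices, warnings, etc.

# Shapes modeled on the documented v1.1 stream message types.
assert classify('{"delete": {"status": {"id": 1234}}}') == "delete"
assert classify('{"id": 1, "text": "hi", "user": {}}') == "tweet"
```

Counting `classify(...) == "delete"` over a few minutes of stream output would settle the question quickly.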
So it looks like there are many good examples for the Twitter API:
https://www.storybench.org/how-to-collect-tweets-from-the-twitter-streaming-api-using-python/
Maybe you can do such a simple program; make sure to write a README on how to run your program and set it up. I had lots of students do such a project before, easily in less than 2 hours, so this should not be an issue.
I suggest we do
mkdir cloudmesh-twitter
cms sys command generate twitter .
cms twitter register REGISTER
somehow register what we get from Twitter
cms twitter stream start [--file=FILE]
starts the stream - just prints for now or if a file is presented a file is produced
cms twitter stream start [--file=FILE] [--attributes=ATTRIBUTES]
only stores selected attributes of the tweet and not the entire tweet
cms twitter stream start [--file=FILE] [--filter=FILTER]  # filter not important as we want all tweets
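The --attributes option above could be a simple projection over the tweet dict. How the list is passed (comma-separated here) and the attribute names are assumptions:

```python
def select_attributes(tweet: dict, attributes: str) -> dict:
    """Keep only the attributes named in a comma-separated list,
    e.g. as --attributes=id,text might pass them (an assumption)."""
    wanted = [a.strip() for a in attributes.split(",")]
    return {k: tweet[k] for k in wanted if k in tweet}

tweet = {"id": 1, "text": "hi", "user": {"screen_name": "x"}, "lang": "en"}
print(select_attributes(tweet, "id,text"))  # {'id': 1, 'text': 'hi'}
```

Dropping the bulky user and entity objects early keeps the Parquet files small.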
If you start on things, start not with Parquet, but with the little program.
I will set this up in a repo in a couple of minutes. Start with learning how to get a Twitter API key and how to store it in ~/.cloudmesh.
it does not have to be in cloudmesh.yaml
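One way to keep the key out of cloudmesh.yaml is a separate JSON file under ~/.cloudmesh. The filename, file format, and key names below are all assumptions, just to sketch the idea:

```python
import json
import os

# Hypothetical location; the thread only says it need not be in
# cloudmesh.yaml, so this path is an assumption.
CRED_FILE = os.path.expanduser("~/.cloudmesh/twitter.json")

def load_credentials(path: str = CRED_FILE) -> dict:
    """Load the Twitter API keys stored by `cms twitter register`."""
    with open(path) as f:
        creds = json.load(f)
    # Fail early if the file is incomplete.
    for key in ("api_key", "api_secret", "access_token", "access_secret"):
        if key not in creds:
            raise KeyError(f"missing credential: {key}")
    return creds
```

The file should be chmod 600 and never committed to the repo; the README should explain how to create it.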