"Easy" data dump of your activity on various web services

austil/datapuller

Data Puller

Pull all CLI

This repository is a collection of scripts I've made to conveniently pull my personal data from the internet services I use the most.
The goal is to get everything about me in one place for further analysis (data science with R, full-text search with Elastic, ...).

These scripts pull every piece of interesting data about you that the web services' APIs expose and store it in plain JSON files.

Currently supported:

  • Pocket: unread, archived & favorites
  • Twitter: likes, tweets, retweets
  • YouTube: likes, favorites, history (via manual import & parsing)
  • Reddit: upvoted, saved
  • GitHub: stars

🏥 Have a look at The Data Detox Kit.

Run

# A specific puller (for setup or debug), e.g. twitter
node src/pullers/twitter_pull.js
# All pullers at once
npm run start

# Stats
node src/stats.js
# Specific report
node src/reports/twitter_report.js
node src/reports/pocket_readnext.js

Setup

  • Run npm install
  • Provide your API credentials via environment variables or a ./config.json file (have a look at ./src/config_manager.js)
  • Go through the auth procedure of every configured puller by launching each one separately (with something like node ./src/pullers/pocket_pull.js)
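
To illustrate the "environment variables or ./config.json" idea, here is a minimal sketch. It is not the code from ./src/config_manager.js, which remains the reference. The setting names in KEYS are hypothetical, and giving environment variables precedence over the file is an assumption.

```javascript
// Sketch only: the real logic lives in ./src/config_manager.js and may differ.
const fs = require('fs');

// Hypothetical setting names, used for illustration only.
const KEYS = ['POCKET_CONSUMER_KEY', 'GITHUB_TOKEN'];

function loadConfig(configPath = './config.json', env = process.env) {
  // Read the JSON file when it exists, otherwise start from an empty object.
  let fileConfig = {};
  if (fs.existsSync(configPath)) {
    fileConfig = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  }

  const config = {};
  for (const key of KEYS) {
    // Assumption: an environment variable wins over the same key in the file.
    const value = env[key] ?? fileConfig[key];
    if (value !== undefined) config[key] = value;
  }
  return config;
}
```

Calling loadConfig() with no arguments reads ./config.json from the current directory and process.env. A puller whose keys are missing from the result can then be treated as not configured.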

More on this project

YouTube Restrictions

The watch history and the watch later playlist are not accessible through the YouTube API for privacy reasons.
To get around this, you can obtain a watch-history.html file via the Google Takeout page. Then put this file in the drop_zone folder so the YouTube puller can parse it on the next run.
As for the watch later playlist, the Google Takeout export is already a JSON file.
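
As a rough idea of what parsing the HTML export can look like, the sketch below collects the distinct video IDs from watch links. It assumes the export contains links of the form youtube.com/watch?v=<11-character id>. The function name extractVideoIds is hypothetical and the actual puller may work differently.

```javascript
const fs = require('fs');

// Collect the unique video IDs referenced by watch URLs in an HTML string.
// Assumption: IDs are 11 characters drawn from letters, digits, "_" and "-".
function extractVideoIds(html) {
  const pattern = /youtube\.com\/watch\?v=([\w-]{11})/g;
  const ids = new Set();
  for (const match of html.matchAll(pattern)) {
    ids.add(match[1]);
  }
  return [...ids];
}

// Location taken from the drop_zone convention described above.
const historyPath = './drop_zone/watch-history.html';
if (fs.existsSync(historyPath)) {
  const ids = extractVideoIds(fs.readFileSync(historyPath, 'utf8'));
  console.log(`${ids.length} distinct videos found`);
}
```

A regular expression is enough here because only the link targets matter. A real HTML parser would be a better fit if watch dates or titles were needed as well.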

Late 2019 update: the watch history is now available in JSON, but it still requires pulling video details separately.
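
Pulling details for a long history is easier in batches. The sketch below only shows the batching step. The default batch size of 50 is based on my understanding that the YouTube Data API videos.list endpoint accepts a comma-separated id parameter of up to 50 IDs, so check the API documentation before relying on it. The helper name chunk is hypothetical.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size = 50) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Each batch becomes one comma-separated `id` value for a details request.
const sampleIds = ['a', 'b', 'c', 'd', 'e'];
console.log(chunk(sampleIds, 2).map((batch) => batch.join(',')));
// prints [ 'a,b', 'c,d', 'e' ]
```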

Why this project? Are website data exports not enough?

The websites' own export features have shortcomings (as of late 2019):

  • The Pocket export is in HTML and does not differentiate favorites from other items
  • The GitHub export does not include starred repos
  • The YouTube export does not include any video metadata such as duration and category

As for Facebook, Reddit and Twitter, they're doing a great job so my scripts may be irrelevant.
