Some Scrapy spiders for crawling Instagram posts through public APIs (no token required)
- Python
- Scrapy
- hashtag (crawls all the posts for a given hashtag)
scrapy crawl hashtag
Use the -L INFO option to suppress most of the debug messages.
The scraper writes its output files under the scraped directory.
Files are stored under scraped/hashtag/hashtagname and split by date and hour of the day, so if you run the crawler several times within the same hour, the output is appended to the same file. Each file contains one JSON object per line.
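The hourly layout can be illustrated with a minimal sketch. This is not the spider's actual code: the function names output_path and append_post and the YYYY-MM-DD_HH.json file name are hypothetical, since the exact naming scheme is not documented here. The point is that every run within the same hour maps to the same file, and that file is opened in append mode.

```python
import json
import os
from datetime import datetime


def output_path(base_dir, hashtag, when):
    """Return the hourly output file for a hashtag (hypothetical naming scheme)."""
    # Hypothetical file name: one file per date and hour, e.g. 2018-01-04_13.json
    filename = when.strftime("%Y-%m-%d_%H") + ".json"
    return os.path.join(base_dir, "hashtag", hashtag, filename)


def append_post(base_dir, hashtag, post, when=None):
    """Append one post as a single JSON line to the file for the current hour."""
    when = when or datetime.now()
    path = output_path(base_dir, hashtag, when)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Mode "a" appends, so repeated runs within the same hour extend the same file
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(post) + "\n")
    return path
```

Because the file key only changes when the hour does, a run at 13:05 and another at 13:50 write to the same file, while a run at 14:00 starts a new one.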
Sample file contents (one post per line):
{"id": "1684344684669792291", "shortcode": "Bdf_pERD3Aj", "caption": "\"Non c'\u00e8 amore pi\u00f9 sincero di quello per il cibo\". Panino caldo e croccante con caciocavallo, zucchine grigliate e pesto di pomodori secchi. \ud83d\ude0b #myferrara #labellaferrara #volgoitalia #volgoemiliaromagna #volgoferrara #igersferrara #volgosapori #italia_in_grande #centrostorico #iconsigliati #visitferrara #cibobuono #cosebuone #qualit\u00e0 #passione #genuinit\u00e0 #freschezza #tagsforlikes", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/71a6e1bc5183bbd9b1339f064b2bb1b9/5B231526/t51.2885-15/e35/25021917_402106523561566_9076772742774128640_n.jpg", "loc_id": 0, "loc_name": "", "owner_id": "5655088891", "owner_name": "tipicoh_ferrara", "taken_at_timestamp": 1515009554, "comments": 0, "likes": 27, "hashtags": ["#myferrara", "#labellaferrara", "#volgoitalia", "#volgoemiliaromagna", "#volgoferrara", "#igersferrara", "#volgosapori", "#italia_in_grande", "#centrostorico", "#iconsigliati", "#visitferrara", "#cibobuono", "#cosebuone", "#qualità", "#passione", "#genuinità", "#freschezza", "#tagsforlikes"], "mentions": [] }
{"id": "1684875047875260104", "shortcode": "Bdh4O3fhi7I", "caption": "#roma #piazzanavona #pjmasks #sky #ballons #detail #thehub_lazio #lazio_illife #new_photolazio #yallerslazio #arts_illife #vivolazio #volgolazio #lazio_super_pics #visit_lazio #italiaStyle20 #iconsigliati #volgoarte #shotz_of_lazio", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/20459e8002d4a61284bc7e03f0da5f8a/5B0B896A/t51.2885-15/e35/26065496_174719709946144_8885150857012707328_n.jpg", "loc_id": "336844629", "loc_name": "Piazza Navona", "owner_id": "256276180", "owner_name": "clapanama", "taken_at_timestamp": 1515072779, "comments": 3, "likes": 156, "hashtags": ["#roma", "#piazzanavona", "#pjmasks", "#sky", "#ballons", "#detail", "#thehub_lazio", "#lazio_illife", "#new_photolazio", "#yallerslazio", "#arts_illife", "#vivolazio", "#volgolazio", "#lazio_super_pics", "#visit_lazio", "#italiaStyle20", "#iconsigliati", "#volgoarte", "#shotz_of_lazio"], "mentions": []}
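Since each line is a standalone JSON object, the files can be read with the standard json module alone. The sketch below parses the lines and counts hashtags across posts; the helper names read_posts and top_hashtags are illustrative, not part of this project.

```python
import json
from collections import Counter


def read_posts(lines):
    """Parse JSON Lines input: yield one post dict per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_hashtags(posts, n=5):
    """Return the n most frequent entries of the "hashtags" field across posts."""
    counts = Counter(tag for post in posts for tag in post.get("hashtags", []))
    return counts.most_common(n)


# Usage with a scraped file (path is a placeholder):
# with open("scraped/hashtag/<hashtagname>/<file>", encoding="utf-8") as f:
#     posts = list(read_posts(f))
```

Note that in the sample above loc_id is the integer 0 for a post without a location and a string otherwise, so normalize it before comparing values.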
GNU GENERAL PUBLIC LICENSE Version 3