Instagram Scrapy Scraper

A set of Scrapy spiders for crawling Instagram posts through public APIs (no token required).

Requirements

Python
Scrapy

Spiders

hashtag (crawls all the posts for a given hashtag)

Usage

scrapy crawl hashtag

Add -L INFO to suppress most of the debug messages, for example: scrapy crawl hashtag -L INFO
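If you prefer to start the crawl from a Python script (for example from a scheduler), the command above can be wrapped with the standard library. This is only a minimal sketch: the helper names build_crawl_command and run_crawl are hypothetical and not part of this project, and it assumes the scrapy executable is on your PATH and that the script is run from the directory containing scrapy.cfg.

```python
import subprocess
from typing import List, Optional


def build_crawl_command(spider: str = "hashtag",
                        log_level: Optional[str] = "INFO") -> List[str]:
    """Build the argument list for `scrapy crawl <spider> [-L <level>]`.

    Hypothetical helper for illustration; it only assembles the command
    shown in the Usage section.
    """
    command = ["scrapy", "crawl", spider]
    if log_level:
        # -L sets Scrapy's log level (e.g. INFO hides DEBUG messages).
        command += ["-L", log_level]
    return command


def run_crawl(spider: str = "hashtag",
              log_level: Optional[str] = "INFO") -> int:
    """Run the crawl as a child process and return its exit code.

    Assumes the current working directory is the project root
    (the directory that contains scrapy.cfg).
    """
    completed = subprocess.run(build_crawl_command(spider, log_level))
    return completed.returncode
```

Calling run_crawl() is then equivalent to typing scrapy crawl hashtag -L INFO in the project directory. The child process inherits the terminal, so anything the spider prints or asks for still appears there.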

Output

The scraper writes its files under the scraped directory.

hashtag spider

Files are written under scraped/hashtag/hashtagname, one file per date and hour of the day. If you run the crawler several times within the same hour, the output is appended to that hour's file. Each line of a file contains one JSON object.

For example:

{"id": "1684344684669792291", "shortcode": "Bdf_pERD3Aj", "caption": "\"Non c'\u00e8 amore pi\u00f9 sincero di quello per il cibo\". Panino caldo e croccante con caciocavallo, zucchine grigliate e pesto di pomodori secchi. \ud83d\ude0b #myferrara #labellaferrara #volgoitalia #volgoemiliaromagna #volgoferrara #igersferrara #volgosapori #italia_in_grande #centrostorico #iconsigliati #visitferrara #cibobuono #cosebuone #qualit\u00e0 #passione #genuinit\u00e0 #freschezza #tagsforlikes", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/71a6e1bc5183bbd9b1339f064b2bb1b9/5B231526/t51.2885-15/e35/25021917_402106523561566_9076772742774128640_n.jpg", "loc_id": 0, "loc_name": "", "owner_id": "5655088891", "owner_name": "tipicoh_ferrara", "taken_at_timestamp": 1515009554, "comments": 0, "likes": 27, "hashtags": ["#myferrara", "#labellaferrara", "#volgoitalia", "#volgoemiliaromagna", "#volgoferrara", "#igersferrara", "#volgosapori", "#italia_in_grande", "#centrostorico", "#iconsigliati", "#visitferrara", "#cibobuono", "#cosebuone", "#qualità", "#passione", "#genuinità", "#freschezza", "#tagsforlikes"], "mentions": [] }
{"id": "1684875047875260104", "shortcode": "Bdh4O3fhi7I", "caption": "#roma #piazzanavona #pjmasks #sky #ballons #detail #thehub_lazio #lazio_illife #new_photolazio #yallerslazio  #arts_illife #vivolazio #volgolazio #lazio_super_pics #visit_lazio #italiaStyle20 #iconsigliati  #volgoarte #shotz_of_lazio", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/20459e8002d4a61284bc7e03f0da5f8a/5B0B896A/t51.2885-15/e35/26065496_174719709946144_8885150857012707328_n.jpg", "loc_id": "336844629", "loc_name": "Piazza Navona", "owner_id": "256276180", "owner_name": "clapanama", "taken_at_timestamp": 1515072779, "comments": 3, "likes": 156, "hashtags": ["#roma", "#piazzanavona", "#pjmasks", "#sky", "#ballons", "#detail", "#thehub_lazio", "#lazio_illife", "#new_photolazio", "#yallerslazio", "#arts_illife", "#vivolazio", "#volgolazio", "#lazio_super_pics", "#visit_lazio", "#italiaStyle20", "#iconsigliati", "#volgoarte", "#shotz_of_lazio"], "mentions": []}
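Because every line is a separate JSON object, the output can be read back one line at a time. The sketch below is one possible way to do that, not code from this project: the function name iter_posts and the directory name myhashtag are placeholders, and since the exact file names are not documented here, it simply reads every file found under the given directory.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_posts(directory: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of every file under `directory`.

    Assumption: every file below `directory` is scraper output with one
    JSON object per line, as described in the Output section.
    """
    files = sorted(p for p in directory.rglob("*") if p.is_file())
    for path in files:
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


if __name__ == "__main__":
    # "myhashtag" is a placeholder; use the hashtag you actually crawled.
    for post in iter_posts(Path("scraped/hashtag/myhashtag")):
        print(post["shortcode"], post["likes"], post["owner_name"])
```

Reading line by line keeps memory use low even when a file has been appended to many times. The field names used in the print call (shortcode, likes, owner_name) are taken from the sample records above.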

License

GNU GENERAL PUBLIC LICENSE Version 3
