This project is an extension of the Instagram scraper built by rarcega.
It is designed to organize the scraped Instagram data neatly in AWS S3, according to this structure:

```
S3_BUCKET_NAME/
|-- instagram/
    |-- TARGET_USER/
        |-- full-metadata.json: Contains metadata for the entire operation
        |-- [POST_ID_X]/
        |   |-- [POST_ID_X].jpg: Image of the post
        |   |-- summary.json: Key information associated with the post
        |-- [POST_ID_Y]/
        |   |-- [POST_ID_Y].jpg
        |   |-- summary.json
        |-- ...
```
- Each post by the target Instagram user is stored in its own folder.
- Each folder contains the post's image as well as its associated metadata.
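The layout above can be sketched as S3 key construction in Python. This is a minimal illustration of the naming scheme only; the helper functions are hypothetical and not part of the scraper's actual code:

```python
# Sketch of how S3 object keys map to the layout above.
# These function names are illustrative, not the scraper's real API.

def post_keys(target_user: str, post_id: str) -> dict:
    """Build the S3 keys for a single post's image and summary file."""
    prefix = f"instagram/{target_user}/{post_id}"
    return {
        "image": f"{prefix}/{post_id}.jpg",
        "summary": f"{prefix}/summary.json",
    }

def metadata_key(target_user: str) -> str:
    """Build the S3 key for the operation-wide metadata file."""
    return f"instagram/{target_user}/full-metadata.json"

print(post_keys("some_user", "B12345"))
print(metadata_key("some_user"))
```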
These instructions were designed for Ubuntu 18.04.
You will need to create a `config.py` file with the following contents:
```
AWS_ACCESS_KEY_ID = [YOUR AWS_ACCESS_KEY_ID]
AWS_SECRET_ACCESS_KEY = [YOUR AWS_SECRET_ACCESS_KEY]
AWS_REGION_NAME = [YOUR AWS_REGION_NAME]
S3_BUCKET_NAME = [YOUR AWS_S3_BUCKET_NAME]
INSTAGRAM_USER_ID = [YOUR INSTAGRAM_USER_ID]
INSTAGRAM_USER_PASSWORD = [YOUR INSTAGRAM_USER_PASSWORD]
TARGET_INSTAGRAM_USER = [YOUR TARGET_INSTAGRAM_USER TO SCRAPE DATA FROM]
```
A `config_template.py` file has been provided for your convenience.
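Before running the scraper, you may want to verify that every required name is defined in your `config.py`. The helper below is a hypothetical sanity check, not part of this repository:

```python
# Hypothetical sanity check: confirm every required setting exists in config.py.
REQUIRED_SETTINGS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_NAME",
    "S3_BUCKET_NAME",
    "INSTAGRAM_USER_ID",
    "INSTAGRAM_USER_PASSWORD",
    "TARGET_INSTAGRAM_USER",
]

def missing_settings(config_module) -> list:
    """Return the names of required settings absent or empty in the module."""
    return [name for name in REQUIRED_SETTINGS
            if not getattr(config_module, name, None)]

# Example usage:
# import config
# missing = missing_settings(config)
# if missing:
#     raise SystemExit(f"config.py is missing: {', '.join(missing)}")
```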
Now, follow these instructions to obtain the variables above.

- Lines 1-3 relate to your AWS credentials and region.
- Line 4 is the name of your AWS S3 bucket.
- Lines 5-7 are self-explanatory; TARGET_INSTAGRAM_USER is the username of the account you intend to scrape data from.

NOTE: Your Instagram user ID and password are required only to scrape data from private users that you follow.
- Clone this repository.

```
git clone https://github.com/Jordan396/S3-Compatible-Instagram-Scraper.git
cd S3-Compatible-Instagram-Scraper/
```
- Create a virtual environment and activate it.

```
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies.

```
pip install -r requirements.txt
```
- Add your `config.py` from above to the base directory.
- Start scraping!

```
python scrape.py
```
- Navigate to your S3 bucket to view the scraped data.