Image Scrapper scrapes a given link, collects the images it finds, and stores them in the database, where they can be queried later. Not only is the image itself saved, but its metadata (image format, height, width, size, etc.) is stored as well.
- Django: To create the backend along with the ORM
- Django Rest Framework: To create the RESTful API
- Beautifulsoup: For web scraping and collecting images
- requests: To handle external URLs
- Pillow: Image processing
- Celery: Background async task handling
- drf_yasg: Swagger API docs
- djangorestframework-simplejwt[crypto]: For JWT authentication
To get started with the project, install the dependencies and run the server. Visit /api-docs for the API documentation.
git clone
virtualenv venv
- For Windows 🪟: E:/image-scrapper/venv/scripts/activate
- For MacOS/Linux 🍏: source venv/bin/activate
pip install -r requirements.txt
python manage.py test
python manage.py runserver
If you have Docker and want to use it instead:
docker build --tag scrapper-api .
docker run --publish 8000:8000 scrapper-api
After running the server, you can visit /api-docs for a Swagger interface. From /api-docs you can interact with the API; it automatically provides a clean UI for testing the endpoints. Alternatively, use /redoc for the Redoc documentation.
Takes a URL and returns a list of images scraped from that URL, saving those images along with their metadata in the database.
{
"url": "https://example.com"
}
Status Code: 200
[
{
"id": 0,
"image_url": "https://example.com/api/image/0",
"image_name": "string",
"parent_url": {
"id": 0,
"link": "https://example.com"
},
"original_url": "https://example.com",
"height": 0,
"width": 0,
"mode": "string",
"format": "string",
"created": "2019-08-24T14:15:22Z",
"updated": "2019-08-24T14:15:22Z"
}
]
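For example, calling the scrape endpoint from Python might look like the sketch below. The endpoint path `/api/scrape/` and the JWT header are assumptions for illustration only; check /api-docs for the project's actual routes.

```python
import requests

BASE_URL = "http://localhost:8000"  # local dev server started with `runserver`

# Hypothetical route; confirm the real path in /api-docs.
response = requests.post(
    f"{BASE_URL}/api/scrape/",
    json={"url": "https://example.com"},
    # simplejwt is listed as a dependency, so an access token may be required.
    headers={"Authorization": "Bearer <access-token>"},
)
response.raise_for_status()

for image in response.json():
    print(image["id"], image["image_url"], image["width"], image["height"])
```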
Returns the metadata and image link, if given a valid image id.
{
"url": "https://example.com"
}
Status Code: 200
{
"id": 0,
"image_url": "https://example.com/api/image/0",
"image_name": "string",
"parent_url": {
"id": 0,
"link": "https://example.com"
},
"original_url": "https://example.com",
"height": 0,
"width": 0,
"mode": "string",
"format": "string",
"created": "2019-08-24T14:15:22Z",
"updated": "2019-08-24T14:15:22Z"
}
Returns the metadata and image link, if given a valid image id.
{
"url": "https://example.com"
}
Status Code: 204
Returns a list of saved metadata and image links, if given a valid parent URL (the URL that was used to scrape the images).
{
"url": "https://example.com"
}
Status Code: 200
{
"id": 0,
"image_url": "https://example.com/api/image/0",
"image_name": "string",
"parent_url": {
"id": 0,
"link": "https://example.com"
},
"original_url": "https://example.com",
"height": 0,
"width": 0,
"mode": "string",
"format": "string",
"created": "2019-08-24T14:15:22Z",
"updated": "2019-08-24T14:15:22Z"
}
Returns a list of saved metadata and image links, if given a valid original image URL.
{
"url": "https://example.com/image/image.jpeg" // Original Image URL
}
Status Code: 200
{
"id": 0,
"image_url": "https://example.com/api/image/0",
"image_name": "string",
"parent_url": {
"id": 0,
"link": "https://example.com"
},
"original_url": "https://example.com",
"height": 0,
"width": 0,
"mode": "string",
"format": "string",
"created": "2019-08-24T14:15:22Z",
"updated": "2019-08-24T14:15:22Z"
}
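A lookup against these two endpoints could be sketched as follows. The paths `/api/images/by-parent/` and `/api/images/by-original/` are placeholders, not the project's confirmed routes; only the `{"url": ...}` request body shown above is taken from the documentation.

```python
import requests

BASE_URL = "http://localhost:8000"

def lookup(path, url):
    """POST a URL to a lookup endpoint and return the saved metadata.

    The paths passed in below are hypothetical; use the ones listed in /api-docs.
    """
    response = requests.post(f"{BASE_URL}{path}", json={"url": url})
    response.raise_for_status()
    return response.json()

# All images scraped from a parent page.
by_parent = lookup("/api/images/by-parent/", "https://example.com")

# The record matching one specific original image URL.
by_original = lookup("/api/images/by-original/", "https://example.com/image/image.jpeg")
```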
Deletes all previously saved images for the URL, then re-scrapes and stores them again.
{
"url": "https://example.com" // Original Image URL
}
Status Code: 200
{
"id": 0,
"image_url": "https://example.com/api/image/0",
"image_name": "string",
"parent_url": {
"id": 0,
"link": "https://example.com"
},
"original_url": "https://example.com",
"height": 0,
"width": 0,
"mode": "string",
"format": "string",
"created": "2019-08-24T14:15:22Z",
"updated": "2019-08-24T14:15:22Z"
}
| Parameter | Type | Default | Options |
|---|---|---|---|
| width | integer/string | Image default width | small, medium, large |
| height | integer | Image default height | small, medium, large |
| quality | integer | 100 | Any number between 1 and 100 |
| format | string | Image default | "gif", "png", "jpeg", "jpg", "bmp", "webp" |
Note: If both height and width are given, only width is applied, in order to maintain the aspect ratio.
/image/2?width=small
or
/image/2?width=678
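For instance, fetching a resized copy of image 2 with these query parameters might look like the minimal sketch below. The `/image/{id}` path follows the example URLs above; saving the bytes to disk is purely illustrative.

```python
import requests

BASE_URL = "http://localhost:8000"

# Request image 2 resized to 678px wide, re-encoded as WEBP at 80% quality.
response = requests.get(
    f"{BASE_URL}/image/2",
    params={"width": 678, "quality": 80, "format": "webp"},
)
response.raise_for_status()

with open("image-2.webp", "wb") as fh:
    fh.write(response.content)
```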
Workflow
There are two main endpoints: one scrapes the given link for images and stores them, and the other serves an image queried by id.
- How does the scraping and saving work?
- When the API is called, the requests library fetches the HTTP response content from the link, then the bs4 module parses the response, extracts the `<img/>` tags, and collects the image sources/links. These links are passed to the Pillow Image module to parse each image and extract its metadata; the image is then saved to storage and its metadata is stored in the database. Each image is assigned an id, through which it can be queried later (see the sketch after this list).
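The core of that flow could be sketched roughly as below. This is not the project's actual code, just an illustration of the requests + BeautifulSoup + Pillow pipeline it describes; the `scrape_images` function and the returned dictionary keys are assumptions based on the schema listed further down.

```python
from io import BytesIO
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image

def scrape_images(parent_url: str) -> list:
    """Fetch a page, find its <img> tags, and collect image bytes + metadata."""
    page = requests.get(parent_url, timeout=10)
    page.raise_for_status()

    soup = BeautifulSoup(page.content, "html.parser")
    results = []
    for tag in soup.find_all("img"):
        src = tag.get("src")
        if not src:
            continue
        original_url = urljoin(parent_url, src)  # resolve relative links

        raw = requests.get(original_url, timeout=10).content
        image = Image.open(BytesIO(raw))  # Pillow parses the image and exposes metadata

        results.append({
            "parent_url": parent_url,
            "original_url": original_url,
            "height": image.height,
            "width": image.width,
            "mode": image.mode,
            "format": image.format,
            "bytes": raw,  # would be written to storage and recorded in the database
        })
    return results
```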
The model schema of Image
id: Integer
image: String | Image location in the file system
parent_url: String | The URL that was scraped to get the image
original_url: String | Original URL of the image
height: Integer | Image height
width: Integer | Image width
mode: String | Image color mode
format: String | Image format
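A Django model matching that schema might look like the following sketch. The class names, the separate parent-URL model (implied by the nested "parent_url" object in the API responses), and the field options are assumptions; the project's real models may differ.

```python
from django.db import models

class ParentUrl(models.Model):
    # The page that was scraped; assumed from the nested "parent_url" object in responses.
    link = models.URLField(unique=True)

class ScrapedImage(models.Model):
    image = models.ImageField(upload_to="images/")    # location in the file system
    parent_url = models.ForeignKey(ParentUrl, on_delete=models.CASCADE, related_name="images")
    original_url = models.URLField()                  # original URL of the image
    height = models.IntegerField()
    width = models.IntegerField()
    mode = models.CharField(max_length=16)            # Pillow color mode, e.g. "RGB"
    format = models.CharField(max_length=16)          # Pillow format, e.g. "JPEG"
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)
```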