This program crawls fashion items from the four sites below.
First, fetch each site's item list page.
Second, visit each item page and crawl the title, image, reviews, and price using Celery.
Third, categorize the image style with the fashion recommendation mall's deep learning server.
Finally, save all the data to MongoDB Atlas.
| 키작녀 | 키작남 | 소녀나라 | 고고싱 |
|---|---|---|---|
{
"site" : crawling site,
"category" : item category (top, pants, ...),
"title" : item title,
"image_link" : item image link,
"price" : item price,
"reviews" : item review keywords (list[str]),
"style" : item style (first, second)
}
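For illustration, a single stored document might look like this (all field values below are hypothetical, not taken from a real crawl):

```python
# Hypothetical example of one crawled item document.
# Site name, title, price, keywords, and styles are made up for illustration.
item = {
    "site": "example-site",
    "category": "top",
    "title": "basic cotton t-shirt",
    "image_link": "https://example.com/images/12345.jpg",
    "price": 19900,
    "reviews": ["soft", "fits well", "good value"],  # review keywords, list[str]
    "style": ["casual", "street"],                   # first and second style
}
```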
Docker : 20.10.16
# First, set project_setting.py as below
url_setting = {
"mongo_db_url" : "mongo db atlas url",
"selenium_url" : "driver path",
"deep_learning_server_url" : "deep learning server url"
}
tool_setting = {
"database_driver" : MongodbContextManager,
"web_driver" : SeleniumContextManager
}
celery_broker_url = {
"celery_broker_url" : "celery broker url"
}
# set up the Celery worker
$ sudo git clone {this repo}
$ sudo docker build -t crawler .
$ sudo docker run -d crawler
# run app.py
$ python3 app.py
Celery is an asynchronous task queue that processes a series of queued tasks across multiple workers. It is often used to move heavy work, such as file conversion, storage, or uploads, out of a synchronous web request.
- Asynchronous task queue; supports scheduling but is focused on real-time processing.
- Both synchronous and asynchronous processing are possible.
- The unit of work is called a Task, and the process that executes it is called a Worker.
- Uses a message broker such as RabbitMQ or Redis.
pip3 install celery
celery -A tasks worker -l INFO
"""
This code shows basic usage of Celery
"""
import time
import random
import celery
# Configure Celery as below
app = celery.Celery(
    'tasks',
    broker='pyamqp://broker-url',
    backend='pyamqp://backend-url'
)
# Decorate each task with @app.task
@app.task
def build_server():
    print('wait 10 sec')
    time.sleep(10)
    server_id = random.randint(1, 10)
    return server_id
"""
A Group runs a set of Celery tasks in parallel
"""
@app.task
def build_servers():
    # call celery.group and pass the task signatures
    g = celery.group(
        build_server.s() for _ in range(10))
    return g()
"""
A Chord runs a callback task after all group tasks are done
"""
@app.task
def callback(result):
    for server_id in result:
        print(server_id)
    print("done")
    return "done"
@app.task
def build_server_with_callback():
    c = celery.chord(
        # run the group tasks
        (build_server.s() for _ in range(10)),
        # after the group finishes, run the callback
        callback.s()
    )
    return c()
"""
A Chain connects tasks so that each result feeds the next.
"""
@app.task
def setup_dns(server_id):
    print(server_id)
    return "done"
@app.task
def deploy_customer_server():
    chain = build_server.s() | setup_dns.s()
    return chain()
A Chord connects a Group to a callback, while a Chain connects individual tasks. A Chord must synchronize on every task in the Group before the callback can run, and this synchronization uses a lot of resources.
So prefer Chains over Chords where possible.