wbwlkr/lebonscrap

LeBonScrap

Description

LeBonScrap is a spider that collects data from Leboncoin.fr, a French classifieds portal for selling new and second-hand goods nationwide.

Starting from a single search result in the real-estate category, the spider follows every pagination link and scrapes each ad in the list.

To extract the data, LeBonScrap uses Scrapy, an open-source, collaborative scraping framework.
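The crawl pattern described above (collect the ads on a listing page, then follow the "next page" link until none is left) can be sketched as follows. This is a minimal, framework-free illustration that runs with the standard library alone; it is not the code of lebonscrap.py, which relies on Scrapy. The class names `ad-link` and `next-page`, the `example.test` URLs, and the `crawl` and `ListingParser` helpers are all hypothetical placeholders, not the real Leboncoin.fr markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect ad links and the next-page link from one listing page.

    The "ad-link" and "next-page" class names are assumptions made for
    this sketch, not the markup actually served by Leboncoin.fr.
    """

    def __init__(self):
        super().__init__()
        self.ad_links = []
        self.next_link = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        classes = (attrs.get("class") or "").split()
        if "ad-link" in classes:
            self.ad_links.append(href)
        elif "next-page" in classes:
            self.next_link = href


def crawl(start_url, fetch):
    """Follow pagination from start_url and return every ad URL found.

    `fetch` is any callable mapping a URL to its HTML text, so the
    traversal can be exercised without network access.
    """
    ad_urls, seen, url = [], set(), start_url
    while url and url not in seen:  # `seen` guards against pagination loops
        seen.add(url)
        parser = ListingParser()
        parser.feed(fetch(url))
        ad_urls.extend(urljoin(url, link) for link in parser.ad_links)
        url = urljoin(url, parser.next_link) if parser.next_link else None
    return ad_urls


# Two fake listing pages standing in for a paginated search result.
PAGES = {
    "https://example.test/search?p=1": (
        '<a class="ad-link" href="/ad/1">A</a>'
        '<a class="next-page" href="/search?p=2">Next</a>'
    ),
    "https://example.test/search?p=2": '<a class="ad-link" href="/ad/2">B</a>',
}
ads = crawl("https://example.test/search?p=1", PAGES.__getitem__)
```

In a Scrapy spider the same two steps would typically live in a `parse` callback that yields one request per ad and one request for the next page; the loop above only makes the traversal order explicit.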

Installation

To download the script, run the command below in a shell:

git clone git@github.com:wbwlkr/lebonscrap.git

Getting started

Run the lebonscrap.py spider using the runspider command:

scrapy runspider lebonscrap.py -o data.json

For each ad, the following fields are written to a JSON or CSV file:

'Url'
'Titre'
'Prix'
'Surface'
'GES'
'Classe énergie'
'Auteur'
'Téléphone'
'Remarques'
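Once the spider has run, the exported file can be read back for inspection. The sketch below assumes the JSON export is a single array of objects keyed by the field names listed above; the `load_ads` helper and the placeholder record are hypothetical and exist only so the example runs without a real export.

```python
import json
import os
import tempfile

# Field names taken from the list above.
FIELDS = ["Url", "Titre", "Prix", "Surface", "GES", "Classe énergie",
          "Auteur", "Téléphone", "Remarques"]


def load_ads(path):
    """Load exported ads and return (all ads, ads missing an expected field)."""
    with open(path, encoding="utf-8") as fh:
        ads = json.load(fh)
    incomplete = [ad for ad in ads if not all(field in ad for field in FIELDS)]
    return ads, incomplete


# Placeholder record written to a temporary data.json (not real scraped data).
sample = [dict.fromkeys(FIELDS, "placeholder")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh, ensure_ascii=False)
    ads, incomplete = load_ads(path)
```

With a real export, `load_ads("data.json")` would be called directly. Scrapy picks the feed format from the output file extension, so passing `-o data.csv` instead of `-o data.json` should produce the CSV variant.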

Requirements

  • Python 3
  • Scrapy==1.4.0

Author

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
