Michelin My Maps

Context
Disclaimer
Content
Inspiration
Usage
Development
- Selector
- Testing
- Caching
Contributing

Context

At the beginning of the automobile era, Michelin, a tire company, created a travel guide, including a restaurant guide.

Over the years, Michelin stars have become highly prestigious because of the guide's high standards and strict, anonymous testers. The stars are intensely coveted: gaining just one can change a chef's life, and so can losing one.

The dataset is curated using Go Colly.

Disclaimer

This software is intended for research purposes only. Users must abide by the relevant laws and regulations of their location and must not use it for illegal purposes. Users bear all consequences of any illegal use.

Content

The dataset contains a list of restaurants along with additional details (e.g. address, price range, cuisine type, longitude, latitude, etc.) curated from the MICHELIN Restaurants guide. The culinary distinctions (i.e. the 'Award' column) of the restaurants included are:

3 Stars
2 Stars
1 Star
Bib Gourmand
Selected Restaurants

Content	Link	Description
CSV	CSV	Good ol' comma-separated values
Kaggle	Kaggle	Data science community

Inspiration

Inspired by this Reddit post, I initially created this dataset so that I could map all Michelin Guide restaurants from around the world on Google My Maps (see an example).

Usage

NOTE: Check out the Makefile, or run make help.

To crawl, run:

make crawl # go run cmd/mym/mym.go

Alternatively, you can install this directly via go install:

go install github.com/ngshiheng/michelin-my-maps/v2/cmd/mym@latest
rm michelin.db
mym -log debug

Development

Selector

Because many websites use JavaScript to generate content dynamically, that content may not be present in the initial HTML response. Disabling JavaScript in your browser lets you see the underlying HTML structure of the page and makes it easier to identify the elements you want to scrape.

To extract relevant information from the site's HTML, we use XPath as the selector language. You can make use of this XPath cheat sheet.

Testing

To run all tests locally, run:

make test # go test ./... -v -count=1

Caching

Caching is enabled by default to avoid hammering the targeted site with unnecessary requests during development. After your first run, a cache/ folder (~6 GB) will be created. Subsequent runs are served from the cache and should take less than a minute to finish scraping the entire site.

To clear the cache, simply delete the cache/ folder.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Fork this repository
Create your feature branch (git checkout -b feature/bar)
Commit your changes (git commit -am 'feat: add some bar'); make sure that your commit messages are semantic
Push to the branch (git push origin feature/bar)
Create a new Pull Request

