Crunchy

This is Crunchy my experiment over some fancy technologies for learning purpose. Crunchy is scrapper for crunchbase.com.

Background story

I usually surf a lot on crunchbase for looking into which companies are doing what and follow progress. But it wasn't very convenient experience because of Bloatware ui. Web isn't that good any more.web should be Simple. Just like geo hotz's blog. so I got an idea to create clone frontend for crunchbase. but that comes with chaos. I wrap up my slives. and created a simple scrapper for crunchbase pushed data over kafka and made queue out of rabbitmq. and here it is crunchy a simple scrapper with rabbitmq

Pending Work

tor
proxy servers

I wasn't able to add proxy. since crunchbase uses cloudflare for security. Tried Couple of free proxy list and tor but cloudflare blocks all of these so i am left with isp's dhcp. lol i disconnect and reconnect to get new ip and using multiple isps this way it works and i get lots of new company list on my preference. If someone is interested please give good suggestion to use tor with crunchbase. since tor browser works without a problem with crunchbase. i say it is possible. but needs some digging.

Structure

Component	Tech
Frontend	NextJS
Design System	Tailwind ( All time fav )
Backend	Django
Database	MongoDB
Scrapper	Scrapy with playwright
Data Pipeline	Kafka
Queue	Rabbitmq
Deployment	docker-compose

Configurations

Entire code base is configurable using .env. if you want to port crunchy for other websites. just change spider and it will work out of the box. architecture is super flexible

Caution!

Crunchy doesn't respect robot.txt instead scheduler handles all the blocking mechanism. Please respect website owners !

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
Chat		Chat
CrunchyCrawler		CrunchyCrawler
CrunchyRest		CrunchyRest
Jenkins		Jenkins
Ui		Ui
fluentd		fluentd
.dockerignore		.dockerignore
.env.example		.env.example
.eslintrc.cjs		.eslintrc.cjs
.gitignore		.gitignore
.node-version		.node-version
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
LICENSE		LICENSE
Readme.md		Readme.md
docker-compose.prod.yml		docker-compose.prod.yml
docker-compose.test.yml		docker-compose.test.yml
pull.sh		pull.sh
recreate.sh		recreate.sh
test.py		test.py
up.sh		up.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crunchy

Background story

Pending Work

Structure

Configurations

Caution!

About

Releases

Packages

Languages

License

Robokishan/Crunchy

Folders and files

Latest commit

History

Repository files navigation

Crunchy

Background story

Pending Work

Structure

Configurations

Caution!

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages