🚩🟨🚩 This repository has been moved to everything-web-scraping 🚩🟨🚩
If you found an outdated link to this that I can update, file an issue :)
Welcome to the course! Glad you're here :)
Consider checking out the video for this introduction here. This video just provides the slides with commentary; later lessons are higher quality.
None so far
I'm David Teather, a software engineer specializing in data extraction.
If you'd like a more visual experience, check out the introduction video on YouTube or pull up the introduction slides.
- My research on YikYak (a social media app) that was featured in Vice and The Verge
- Creating various data extraction tools
  - My most popular is TikTokApi
    - 600K+ Downloads
    - 2.3K+ Stars
- Learners will understand the many different ways websites prevent web scraping
- Learners will be able to reverse engineer a real-world website for data extraction
- Real website examples
  - These websites might change over time, which could break a lesson
- Websites I've created for this course
  - These will not change, so the lessons built on them won't break
- Each lesson will have a hands-on activity
- In addition, most modules will have a `submission.py` file where you can write functions related to the lesson concept and run them against a test suite
  - These will primarily focus on extracting data from the websites created for this course
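
As a rough illustration of the `submission.py` idea, here is a minimal sketch of what such a file could look like. The function name `extract_title` and the sample HTML are hypothetical and are not taken from the course's actual test suite; the sketch only uses Python's standard-library `html.parser`.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> tag of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only enter the first <title>; ignore any later ones.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title = (self.title or "") + data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html):
    """Hypothetical lesson function: return the page title, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

A test suite could then call a function like this with HTML fetched from one of the course websites and compare the result against the expected value.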
- Everybody learns differently, so these are guidelines
- Take notes from the slides presented in the videos
- These will revolve around general concepts
- Will be accompanied by programs to write
- Try the activities before watching the solution in the video
- Treat the website folder as a black box, like you would a real website; you can figure out everything through the website itself
- Forging API requests
- Proxies
- Captchas
- Storing data at scale
- Emulating human behavior
- And more
- Feel free to tweet at me or file an issue with the `lesson-request` label with what you'd like to see
Learn how to get started with this course!
- A basic understanding of programming
- Recommended
- Some Python experience
  - We probably won't do much complex Python
- Docker
- And docker-compose (should be bundled)
- Python
- I'll be using 3.10
- A web browser
- I'll be using Brave (Chromium-based)
- Doesn't really matter which as long as you can view network traffic
- And the files in this git repo, so be sure to download it! (and maybe give it a star 😉)
I hope you enjoy the content in this course! You can either get started with lesson 1 or check out the course catalogue.