Create your very own web scraper and crawler using Go and Colly!
📂 makescraper
├── README.md
└── scrape.go
1. Visit github.com/new and create a new repository named `makescraper`.
2. Run each command line-by-line in your terminal to set up the project:

   ```bash
   $ git clone git@github.com:Make-School-Labs/makescraper.git
   $ cd makescraper
   $ git remote rm origin
   $ git remote add origin git@github.com:YOUR_GITHUB_USERNAME/makescraper.git
   $ go mod download
   ```

3. Open `README.md` in your editor and replace all instances of `YOUR_GITHUB_USERNAME` with your GitHub username to enable the Go Report Card badge.
Complete each task in the order it appears. Use GitHub Task List syntax to update the task list as you go.
- [ ] **IMPORTANT**: Complete the Web Scraper Workflow worksheet distributed in class.
- [ ] Create a `struct` to store your data.
- [ ] Refactor the `c.OnHTML` callback on line 16 of `scrape.go` to use the selector(s) you tested while completing the worksheet.
- [ ] Print the data you scraped to `stdout`. A sketch tying these steps together follows this list.
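If you get stuck wiring these steps together, here is a minimal sketch of what `scrape.go` might grow into. Everything specific in it is a placeholder assumption, not part of the starter code: the `Quote` struct and its fields, the `div.quote`/`span.text`/`small.author` selectors, the quotes.toscrape.com target, and the v1 `github.com/gocolly/colly` import path. Swap in the struct fields and selectors from your own worksheet.

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly"
)

// Quote is a placeholder struct; rename it and its fields to match
// the data points you identified on your worksheet.
type Quote struct {
	Text   string
	Author string
}

func main() {
	c := colly.NewCollector()

	// OnHTML fires once per element matching the selector.
	// Replace "div.quote" and the child selectors below with the
	// selector(s) you tested while completing the worksheet.
	c.OnHTML("div.quote", func(e *colly.HTMLElement) {
		q := Quote{
			Text:   e.ChildText("span.text"),
			Author: e.ChildText("small.author"),
		}
		// Print the scraped data to stdout.
		fmt.Printf("%+v\n", q)
	})

	// quotes.toscrape.com is a public scraping sandbox; target
	// whatever site your worksheet covers instead.
	if err := c.Visit("http://quotes.toscrape.com"); err != nil {
		log.Fatal(err)
	}
}
```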
- [ ] Add more fields to your `struct`. Extract multiple data points from the website. Print them to `stdout` in a readable format.
- [ ] Serialize the `struct` you created to JSON. Print the JSON to `stdout` to validate it.
- [ ] Write your scraped data to a file named `output.json`. Both JSON steps are sketched after this list.
- [ ] Add, commit, and push to GitHub.
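Go's standard library covers both JSON stretch challenges with no extra dependencies. Below is a minimal sketch reusing the placeholder `Quote` struct from the earlier example; the `json` tags, the hard-coded sample value, and the `0644` file permissions are all illustrative choices.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Quote reuses the placeholder struct from the earlier sketch.
// The `json` tags control the key names in the serialized output.
type Quote struct {
	Text   string `json:"text"`
	Author string `json:"author"`
}

func main() {
	// In scrape.go you would append to this slice inside your
	// c.OnHTML callback instead of hard-coding a sample value.
	quotes := []Quote{
		{Text: "Talk is cheap. Show me the code.", Author: "Linus Torvalds"},
	}

	// MarshalIndent produces human-readable JSON, which makes it
	// easy to validate on stdout before writing the file.
	data, err := json.MarshalIndent(quotes, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	// Persist the same bytes to output.json. os.WriteFile requires
	// Go 1.16+; on older toolchains use ioutil.WriteFile instead.
	if err := os.WriteFile("output.json", data, 0644); err != nil {
		panic(err)
	}
}
```

Printing the indented JSON before writing the file lets you eyeball the structure (or paste it into a validator) and confirm the keys match your struct tags.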
- BEW 2.5 - Scraping the Web: Concepts and examples covered in class related to web scraping and crawling.
- Colly - Docs: Check out the sidebar for 20+ examples!
- Ali Shalabi - Syntax-Helper: Command line interface to help generate proper code syntax, pulled from the Golang documentation.
- JSON to Struct: Paste any JSON data and convert it into a Go structure that will support storing that data.
- GoByExample - JSON: Covers Go's built-in support for JSON encoding and decoding to and from built-in and custom data types (structs).
- GoByExample - Writing Files: Covers creating new files and writing to them.