Web Parser

This script crawls a web page, extracts the links that point to specific .json files, and parses their filenames into a list of token IDs, which it saves in JSON format. It uses BeautifulSoup to parse the page's HTML and writes its output files to the data folder.
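The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script. It assumes that each linked file is named `<token_id>.json` with a numeric ID, and it silently skips any other .json names. It also assumes the result is written to a file called `data/token_ids.json`. The function names (`extract_json_links`, `token_ids_from_links`, `save_token_ids`, `crawl`) and that output filename are hypothetical.

```python
import json
import os
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_json_links(html, base_url):
    """Return absolute URLs of every <a href> on the page that points to a .json file."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL.
        url = urljoin(base_url, anchor["href"])
        # Test the path only, so query strings like "?v=2" don't hide the extension.
        if urlparse(url).path.endswith(".json"):
            links.append(url)
    return links


def token_ids_from_links(links):
    """Parse each link's filename stem as a token ID (assumes names like '42.json')."""
    token_ids = []
    for url in links:
        filename = os.path.basename(urlparse(url).path)
        stem = filename[: -len(".json")]
        # Assumption: token IDs are numeric; anything else is skipped.
        if stem.isdigit():
            token_ids.append(int(stem))
    return token_ids


def save_token_ids(token_ids, out_dir="data", filename="token_ids.json"):
    """Write the token IDs as a JSON list into out_dir and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(token_ids, f, indent=2)
    return path


def crawl(page_url):
    """Fetch the page, extract .json links, and save the parsed token IDs."""
    import requests  # imported here so the parsing helpers can be used without it

    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    links = extract_json_links(response.text, page_url)
    return save_token_ids(token_ids_from_links(links))
```

For example, `crawl("https://example.com/metadata/")` (a placeholder URL) fetches that page and returns the path of the saved JSON file. Keeping the parsing helpers separate from the network call makes them easy to test against saved HTML.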

Requirements

  • Python 3.7 or higher
  • BeautifulSoup4
  • requests

Installation

  1. Clone the repository.