Heuristic-based boilerplate removal tool
Updated May 9, 2024 - Python
Saves webpages locally to your hard disk with images, CSS, JS, and links as-is.
Dedoc is a library (service) for automated document parsing that converts documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.
An easy way to parse HTML and build XPath expressions.
Fast, indexed Python HTML parser that builds a DOM node tree and provides common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
This project converts your YouTube watch history HTML file from Google Takeout into a CSV file that can be used with universalscrobbler.com to scrobble manually in bulk.
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
✅ Parse your browser's exported HTML bookmark file to Markdown.
A Python library for loading data from various formats into PostgreSQL databases.
A script to parse the saved Humble Bundle library HTML
Lightweight HTML/XML parser for quick and dirty web scraping.
A Work-In-Progress Discord bot based on the largely popular Touhou series by ZUN.
Collects data from the registry of Russian software on the site https://reestr.minsvyaz.ru
Multipage Streamlit app that brings together several HTML data extraction tools.
Python script to extract college names from the UGC (India) website.
Python web-scraping module for NCAA basketball stats
This script analyzes the number of Telegram messages over time.
Python scraper for TotalWine.com data 🍷