
Separate project for HTML cleaning functionalities copied from lxml.html.clean.

fedora-python/lxml_html_clean

lxml_html_clean

Motivation

This project was originally part of lxml. Because the HTML cleaner is designed around a blocklist, many reports of possible security vulnerabilities were filed against lxml, which made the project problematic for security-sensitive environments. We therefore decided to extract the problematic part into a separate project.

Installation

You can install this project directly via pip install lxml_html_clean or, soon, as an extra of lxml via pip install lxml[html_clean]. Both methods install this project together with lxml itself.
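
To confirm that the installation worked, a small check along the following lines may help. This is only a sketch, not part of the project: the helper names is_installed and clean_if_available are hypothetical, and the sample markup is made up for illustration. It assumes the package exposes clean_html at the top level, as lxml.html.clean did.

```python
import importlib.util
from typing import Optional


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


def clean_if_available(html: str) -> Optional[str]:
    """Clean the markup with lxml_html_clean, or return None if it is missing.

    Hypothetical helper for illustration only.
    """
    if not is_installed("lxml_html_clean"):
        return None
    # Imported lazily so this script still runs when the package is absent.
    from lxml_html_clean import clean_html
    return clean_html(html)


if __name__ == "__main__":
    # Report whether each package is visible to this interpreter.
    for name in ("lxml", "lxml_html_clean"):
        status = "installed" if is_installed(name) else "missing"
        print(f"{name}: {status}")

    # Made-up sample input; the cleaned result depends on the cleaner defaults.
    sample = "<p>Hello</p><script>alert(1)</script>"
    print(clean_if_available(sample))
```

If either package is reported as missing, rerun the pip command above in the same environment as the interpreter used to run the script.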

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause