MartinCastroAlvarez/html2vec

html2vec

An algorithm that converts an HTML document into a vectorized representation suitable for neural networks, including sequential models.

[Images: wallpaper, architecture]
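To make the idea of "one vector per HTML node" concrete, here is a toy sketch that uses only the Python standard library. It is not the html2vec implementation: the class `ToyNodeVectorizer`, the function `toy_vectorize`, and the three features (tree depth, attribute count, length of direct text) are all hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser
from typing import List

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ToyNodeVectorizer(HTMLParser):
    """Hypothetical toy: builds one small numeric vector per HTML element."""

    def __init__(self) -> None:
        super().__init__()
        self.open_nodes: List[List[float]] = []  # vectors of elements still open
        self.vectors: List[List[float]] = []     # one vector per element, in document order

    def handle_starttag(self, tag, attrs):
        # Features: [depth in the tree, number of attributes, length of direct text]
        vector = [float(len(self.open_nodes)), float(len(attrs)), 0.0]
        self.vectors.append(vector)
        if tag not in VOID_TAGS:
            self.open_nodes.append(vector)

    def handle_endtag(self, tag):
        # Toy behaviour: assumes well-nested markup and closes the innermost element.
        if tag not in VOID_TAGS and self.open_nodes:
            self.open_nodes.pop()

    def handle_data(self, data):
        # Credit stripped text length to the innermost open element.
        if self.open_nodes:
            self.open_nodes[-1][2] += float(len(data.strip()))


def toy_vectorize(html: str) -> List[List[float]]:
    """Return one feature vector per element, in document order."""
    parser = ToyNodeVectorizer()
    parser.feed(html)
    parser.close()
    return parser.vectors
```

Because the vectors come out in document order, the result is already a sequence, which is the shape a sequential model consumes.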

Instructions

Installing dependencies.

virtualenv -p python3 .env
source .env/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_md

Vectorizing HTML from the CLI.

python3 html2vec.py "https://hippie-inheels.com/3-day-new-orleans-itinerary/"
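The command above takes a URL. When the page is needed as an HTML string instead, for example to pass it to a Python API, it can be downloaded with the standard library. This is a minimal sketch and not necessarily how `html2vec.py` fetches pages; `fetch_html`, `decode_html`, and the `User-Agent` value are hypothetical names introduced here.

```python
from urllib.request import Request, urlopen


def decode_html(raw: bytes, charset: str = "") -> str:
    """Decode response bytes, falling back to UTF-8 when no charset is declared."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as a string."""
    # Some servers reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "html2vec-example"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or ""
        return decode_html(response.read(), charset)
```

Calling `fetch_html("https://hippie-inheels.com/3-day-new-orleans-itinerary/")` would return the markup of the page used in the CLI example, provided the site is reachable.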

Vectorizing HTML inside a Python script.

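from html2vec import Html2Vec

# Placeholder input; replace it with the HTML you want to vectorize.
html: str = "<html><body><h1>Title</h1><p>Some text.</p></body></html>"

model: Html2Vec = Html2Vec()
model.relatives = 5
# Iterate over the nodes returned by fit() and print each node with its vector.
for node in model.fit(html):
    print(node)
    print(node.get_vector())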
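Sequential models usually expect inputs of a fixed length, while a page can contain any number of nodes. The sketch below truncates or zero-pads a list of node vectors to a chosen length. It assumes that `get_vector()` returns a flat sequence of numbers with the same dimension for every node, which the README does not state; `pad_sequence` is a hypothetical helper and not part of html2vec.

```python
from typing import List, Sequence


def pad_sequence(vectors: Sequence[Sequence[float]], length: int) -> List[List[float]]:
    """Truncate or zero-pad equally sized vectors to exactly `length` rows."""
    if not vectors:
        raise ValueError("at least one vector is needed to infer the dimension")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # Keep at most `length` rows, converting every value to float.
    rows = [list(map(float, v)) for v in vectors[:length]]
    # Append all-zero rows until the sequence reaches the requested length.
    rows.extend([0.0] * dim for _ in range(length - len(rows)))
    return rows
```

With the loop above, this could be used as `pad_sequence([node.get_vector() for node in model.fit(html)], 128)`, where 128 is an arbitrary sequence length chosen for illustration.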
