Hello
I'm trying to read data from EPUBs I downloaded from the web.
I'm just interested in the text; I don't care about images or styles.
Would it be possible to add a `media_type_filter` option and only load the specified types from the manifest?
I imagine something along those lines, implemented in `epub.EpubReader._load_manifest`.
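A rough, standalone sketch of the idea, assuming the manifest is available as OPF XML text. It does not use ebooklib at all: `load_manifest`, `media_type_filter` and the sample OPF are hypothetical, chosen only to show filtering manifest entries by media type before anything is read from the archive.

```python
import xml.etree.ElementTree as ET

# XML namespace of OPF package documents (used by both EPUB 2 and EPUB 3).
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def load_manifest(opf_xml, media_type_filter=None):
    """Return manifest entries as (id, href, media_type) tuples.

    media_type_filter (hypothetical option): an iterable of media types to keep.
    None keeps every entry.
    """
    root = ET.fromstring(opf_xml)
    allowed = None if media_type_filter is None else set(media_type_filter)
    entries = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        media_type = item.get("media-type")
        if allowed is not None and media_type not in allowed:
            continue  # e.g. skip fonts, images and CSS when only text is wanted
        entries.append((item.get("id"), item.get("href"), media_type))
    return entries


# Made-up manifest mirroring this issue: the font entry points at a bad path.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="text/ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="styles/style.css" media-type="text/css"/>
    <item id="font3" href="styles/3.ttf" media-type="font/ttf"/>
  </manifest>
</package>"""

# Text-only load: the broken styles/3.ttf entry is never looked at.
print(load_manifest(SAMPLE_OPF, media_type_filter=["application/xhtml+xml"]))
# [('ch1', 'text/ch1.xhtml', 'application/xhtml+xml')]
```

Because the filter is applied to the manifest entries, a text-only load would never try to open the missing font, and it would also read fewer files.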
Just to be transparent: this idea comes from an error I keep getting when reading some EPUBs:
`KeyError: "There is no item named 'styles/3.ttf' in the archive"`
This error originates from the EPUB itself rather than from ebooklib: inspecting the file in Atom shows that there is indeed no `styles/3.ttf` (there is a `fonts/3.ttf`).
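The message is the one Python's standard `zipfile` module raises when asked for a member that is not in the archive, which fits the diagnosis above (ebooklib opens EPUBs with `zipfile`, as far as I know). A small reproduction with a made-up in-memory archive; `make_archive` and `read_error` are illustrative helpers only:

```python
import io
import zipfile


def make_archive():
    """Build an in-memory zip that, like the broken EPUB, has fonts/3.ttf only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("fonts/3.ttf", b"fake font bytes")
    return buf


def read_error(buf, name):
    """Return the KeyError message zipfile raises for a missing member, else None."""
    with zipfile.ZipFile(buf) as zf:
        try:
            zf.read(name)
        except KeyError as exc:
            return exc.args[0]
    return None


print(read_error(make_archive(), "styles/3.ttf"))
# There is no item named 'styles/3.ttf' in the archive
```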
I don't want to throw away the whole EPUB just because its styles can't be read, so ideally I could just skip reading them.
This should also make the process quicker.
But I'm no expert in EPUB, so maybe this is not a good idea 😓
Good point. Right now everything fails if an EPUB's manifest lists something that is actually missing from the archive. One option would be an `EpubReader` option to fail silently, i.e. skip the missing items. The other would be what you suggested: a list of things to ignore/allow.
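Both options can live side by side. Here is a hedged sketch outside ebooklib: `read_manifest_items`, `media_type_filter` and `ignore_missing` are hypothetical names, and it assumes the manifest hrefs have already been resolved to archive paths (real OPF hrefs are relative to the OPF file's directory).

```python
import io
import zipfile


def read_manifest_items(zf, items, media_type_filter=None, ignore_missing=False):
    """Read (href, media_type) manifest entries from an open ZipFile.

    Hypothetical options mirroring the two ideas above:
    - media_type_filter: only read items whose media type is in this list;
    - ignore_missing: skip entries absent from the archive instead of failing.
    Returns (contents, skipped): contents maps href -> bytes.
    """
    present = set(zf.namelist())
    contents, skipped = {}, []
    for href, media_type in items:
        if media_type_filter is not None and media_type not in media_type_filter:
            continue  # allow-list: not requested, never opened
        if ignore_missing and href not in present:
            skipped.append(href)  # "fail silently", but keep a record
            continue
        contents[href] = zf.read(href)  # raises KeyError if href is missing
    return contents, skipped


# Made-up archive and manifest entries mirroring the issue.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("text/ch1.xhtml", b"<html/>")
    zf.writestr("fonts/3.ttf", b"font")
ITEMS = [("text/ch1.xhtml", "application/xhtml+xml"), ("styles/3.ttf", "font/ttf")]

with zipfile.ZipFile(buf) as zf:
    contents, skipped = read_manifest_items(zf, ITEMS, ignore_missing=True)
    print(sorted(contents), skipped)  # ['text/ch1.xhtml'] ['styles/3.ttf']
```

Returning the skipped paths rather than dropping them quietly keeps the "fail silently" mode debuggable; a caller could log them as warnings.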
And the `media_type_filter` would just be a list I pass in as options.