html2text >=2024.2.25 regression breaks rss2email 3.14 tests? #412

Closed
hartwork opened this issue Mar 8, 2024 · 7 comments

@hartwork

hartwork commented Mar 8, 2024

  • Version (html2text --version): 2024.2.25 or 2024.2.26
  • Test script: see details below
  • Python version (python --version): 3.10 (but doesn't seem to matter)

Hi!

It has come to my attention that the tests of rss2email 3.14 started failing with html2text >=2024.2.25. From a quick look, the breakage seems to be a regression in html2text where a single space is now rendered as a (mistaken?) double space. Here's the full reproducer:

# cd "$(mktemp -d)"

# git clone https://github.com/rss2email/rss2email/

# cd rss2email/

# git checkout v3.14

# python3 --version
Python 3.10.13

# poetry --version 2>/dev/null
Poetry (version 1.8.2)

# poetry install

# sed 's,python = .*,python = "^3.8",' -i pyproject.toml

# poetry add html2text==2024.2.25

# poetry show 2>/dev/null
feedparser       6.0.5     Universal feed parser, handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds
html2text        2024.2.25 Turn HTML into equivalent Markdown-structured text.
sgmllib3k        1.0.0     Py3k port of sgmllib.
update-copyright 0.6.2     Automatically update copyright blurbs in versioned source.

# ( cd test && poetry run python -m unittest )
EEEEEEE...................................
======================================================================
ERROR: test_email_data/allthingsrss/1.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/1.config
--- expected
+++ generated
@@ -74,7 +74,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can temporarily
 suspend checking that feed for new content. To start checking it again, simply
-run `r2e unpause _n_`. When you `r2e list`, an asterisk indicates that the
+run `r2e unpause  _n_`. When you `r2e list`, an asterisk indicates that the
 feed is currently unpaused and active.
 
 [![](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/di)](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/da)  

======================================================================
ERROR: test_email_data/allthingsrss/2.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/2.config
--- expected
+++ generated
@@ -74,7 +74,7 @@
 
 through `r2e pause _n_` where _n_ is a feed number, you can temporarily
 suspend checking that feed for new content. to start checking it again, simply
-run `r2e unpause _n_`. when you `r2e list`, an asterisk indicates that the
+run `r2e unpause  _n_`. when you `r2e list`, an asterisk indicates that the
 feed is currently unpaused and active.
 
 [![](http://feedads.g.doubleclick.net/~a/nygtsiuss9pmvrz6092xgghnnkg/0/di)](http://feedads.g.doubleclick.net/~a/nygtsiuss9pmvrz6092xgghnnkg/0/da)  

======================================================================
ERROR: test_email_data/allthingsrss/3.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/3.config
--- expected
+++ generated
@@ -76,7 +76,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can
 temporarily suspend checking that feed for new content. To start
-checking it again, simply run `r2e unpause _n_`. When you `r2e list`,
+checking it again, simply run `r2e unpause  _n_`. When you `r2e list`,
 an asterisk indicates that the feed is currently unpaused and active.
 
 [![](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/di)](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/da)  

======================================================================
ERROR: test_email_data/allthingsrss/4.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/4.config
--- expected
+++ generated
@@ -76,7 +76,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can
 temporarily suspend checking that feed for new content. To start
-checking it again, simply run `r2e unpause _n_`. When you `r2e list`,
+checking it again, simply run `r2e unpause  _n_`. When you `r2e list`,
 an asterisk indicates that the feed is currently unpaused and active.
 
 [![](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/di)](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/da)  

======================================================================
ERROR: test_email_data/allthingsrss/5.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/5.config
--- expected
+++ generated
@@ -96,7 +96,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can
 temporarily suspend checking that feed for new content. To start
-checking it again, simply run `r2e unpause _n_`. When you `r2e list`,
+checking it again, simply run `r2e unpause  _n_`. When you `r2e list`,
 an asterisk indicates that the feed is currently unpaused and active.
 
 [![][4]][5]  

======================================================================
ERROR: test_email_data/allthingsrss/6.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/6.config
--- expected
+++ generated
@@ -86,7 +86,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can
 temporarily suspend checking that feed for new content. To start
-checking it again, simply run `r2e unpause _n_`. When you `r2e list`,
+checking it again, simply run `r2e unpause  _n_`. When you `r2e list`,
 an asterisk indicates that the feed is currently unpaused and active.
 
 [![][4]][5]  

======================================================================
ERROR: test_email_data/allthingsrss/7.config (test.TestEmails)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 62, in fn
    self.run_single_test(config_path=test_path)
  File "/tmp/tmp.ax92e0Em2M/rss2email/test/test.py", line 154, in run_single_test
    raise ValueError(
ValueError: error processing data/allthingsrss/7.config
--- expected
+++ generated
@@ -74,7 +74,7 @@
 
 Through `r2e pause _n_` where _n_ is a feed number, you can temporarily
 suspend checking that feed for new content. To start checking it again, simply
-run `r2e unpause _n_`. When you `r2e list`, an asterisk indicates that the
+run `r2e unpause  _n_`. When you `r2e list`, an asterisk indicates that the
 feed is currently unpaused and active.
 
 [![](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/di)](http://feedads.g.doubleclick.net/~a/nYgTsIUsS9pmvRZ6092XGGHnNKg/0/da)  

----------------------------------------------------------------------
Ran 42 tests in 24.403s

FAILED (errors=7)
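A whitespace-only diff like the one above cannot show which character was actually inserted: an extra ASCII space and an invisible non-ASCII space look the same in a terminal. The following is a minimal Python sketch for naming the first differing character of two lines. The helper names `describe_char` and `first_difference` are hypothetical and not part of rss2email or html2text, and the sample strings retype the `unpause` line from the diff using a plain ASCII space, so paste the real expected and generated lines to see what they actually contain.

```python
import unicodedata
from typing import Optional


def describe_char(c: str) -> str:
    """Name a single character, e.g. 'SPACE (U+0020)' or 'NO-BREAK SPACE (U+00A0)'."""
    return f"{unicodedata.name(c, 'UNNAMED')} (U+{ord(c):04X})"


def first_difference(expected: str, generated: str) -> Optional[str]:
    """Describe the first column where two lines differ, or return None if they are equal."""
    for i, (e, g) in enumerate(zip(expected, generated)):
        if e != g:
            return f"column {i}: expected {describe_char(e)}, got {describe_char(g)}"
    if len(expected) != len(generated):
        # One line is a prefix of the other: report the first extra character.
        i = min(len(expected), len(generated))
        if len(generated) > len(expected):
            longer, which = generated, "generated"
        else:
            longer, which = expected, "expected"
        return f"column {i}: only {which} has {describe_char(longer[i])}"
    return None


expected = "run `r2e unpause _n_`."
generated = "run `r2e unpause  _n_`."  # retyped with an ASCII space (assumption)
print(first_difference(expected, generated))
# column 17: expected LOW LINE (U+005F), got SPACE (U+0020)
```

If the generated line instead contained a no-break space, the same call would report `NO-BREAK SPACE (U+00A0)` at that column.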

Related:

Best, Sebastian

hartwork changed the title from "html2text >=2024.2.25 breaks rss2email 3.14 tests?" to "html2text >=2024.2.25 regression breaks rss2email 3.14 tests?" on Mar 8, 2024
@auouymous

echo 'a <em>b</em>' | pyhtml2text produces "a  _b_" (with a double space) with html2text >=2024.2.25.

@hartwork

hartwork commented Mar 8, 2024

@auouymous thanks for the minimal reproducer 👍

@hartwork

hartwork commented Mar 8, 2024

@auouymous I'm having trouble reproducing it like that in isolation. Any idea what I'm doing wrong below?

# cd "$(mktemp -d)"
# git clone https://github.com/Alir3z4/html2text/
# cd html2text/
# git checkout 2024.2.25 
# python3.10 -m venv venv
# source venv/bin/activate
# pip install -e .
# venv/bin/html2text <<<'a <em>b</em>'
a _b_
# ^^ without double spaces

@auouymous

@hartwork I just emerged 2024.2.25 and 2024.2.26 on Gentoo.

@hartwork

hartwork commented Mar 8, 2024

> @hartwork I just emerged 2024.2.25 and 2024.2.26 on Gentoo.

@auouymous that explains the py in pyhtml2text, I see. Unfortunately, I can't reproduce it this way either (host Gentoo, no virtualenvs active):

# pyhtml2text --version
2024.2.26

# echo 'a <em>b</em>' |pyhtml2text
a _b_
# ^^ without double spaces

Any ideas?

@auouymous

I had copied a hidden "c2 a0" byte sequence from the test feed into the shell, but it was lost when I copied from the shell into the browser to post here. It was a problem in rss2email, so this issue can be closed.
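The bytes c2 a0 are the UTF-8 encoding of U+00A0 NO-BREAK SPACE, which a terminal renders like an ordinary space, so it is easy to paste without noticing and easy to lose when copying again. Below is a minimal Python sketch for checking a pasted reproducer or a test feed for such characters. The function name `find_suspicious_chars` and the rule for what counts as suspicious (anything `str.isspace()` accepts or anything in Unicode categories Zs or Cf, minus ASCII space, tab, CR and LF) are assumptions for illustration, not part of html2text or rss2email.

```python
import unicodedata
from typing import List, Tuple

# Ordinary ASCII whitespace that should not be flagged.
ORDINARY_WHITESPACE = " \t\r\n"


def find_suspicious_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for each whitespace or invisible format
    character (category Zs or Cf) other than ordinary ASCII whitespace."""
    hits = []
    for i, ch in enumerate(text):
        if ch in ORDINARY_WHITESPACE:
            continue
        if ch.isspace() or unicodedata.category(ch) in ("Zs", "Cf"):
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


# The input as typed, but with the hidden c2 a0 sequence in place of the space.
raw = b"a\xc2\xa0<em>b</em>"
text = raw.decode("utf-8")

print(raw.hex(" "))                          # 61 c2 a0 3c 65 6d 3e 62 3c 2f 65 6d 3e
print(find_suspicious_chars(text))           # [(1, 'NO-BREAK SPACE')]
print(find_suspicious_chars("a <em>b</em>")) # []
```

In the shell, piping the input through xxd or od -c gives a similar byte-level view.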

@hartwork

hartwork commented Mar 8, 2024

@auouymous thanks for the fix and the update! Closing…

hartwork closed this as completed on Mar 8, 2024