utf-8 characters in bookmark titles cause fatal error #456

Closed
throwawayaccount0153 opened this issue Jun 29, 2020 · 11 comments
Comments

@throwawayaccount0153

Buku does not seem to handle UTF-8 characters: unless I change my system locale, it fails with a fatal error when a bookmark title contains one. Out of the box, I should not have to change my locale for buku to work with UTF-8 symbols in bookmark titles. Bukubrow has no problem with these characters, but buku errors out on the command line; buku should be able to do what bukubrow does without errors. Switching my locale manually works, but no one should have to switch their system's default locale, because doing so breaks other programs; buku should handle this internally. Because buku cannot handle UTF-8 by itself, I cannot import some of my bookmarks and must keep using the Firefox bookmark system for those, which I would ideally abandon completely in favor of full-time buku usage.

tl;dr: Buku should handle UTF-8 encoding internally; this should not be offloaded to the system locale.

@rachmadaniHaryono
Collaborator

Can you share an example of the error?

@throwawayaccount0153
Author

Traceback (most recent call last):
  File "/home/debian/python/bin/buku", line 11, in <module>
    load_entry_point('buku==4.4', 'console_scripts', 'buku')()
  File "/home/debian/python/lib/python3.7/site-packages/buku.py", line 5471, in main
    bdb.print_rec(0)
  File "/home/debian/python/lib/python3.7/site-packages/buku.py", line 1728, in print_rec
    print_rec_with_filter(resultset, self.field_filter)
  File "/home/debian/python/lib/python3.7/site-packages/buku.py", line 4140, in print_rec_with_filter
    print_single_rec(row)
  File "/home/debian/python/lib/python3.7/site-packages/buku.py", line 4211, in print_single_rec
    print(''.join(str_list))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 60: ordinal not in range(256)

@rachmadaniHaryono
Collaborator

When you run this, do you get the same error?

>>> import buku
>>> buku.print_single_rec((1, 'http://example.com', u'\u2013', ',tags1,', 'randomdesc', 0))

@jarun
Owner

jarun commented Jun 30, 2020

Please refer to this issue: jarun/googler#131

Confirm that your locale is UTF-8, or that you are setting it to UTF-8 when you run buku.

@throwawayaccount0153
Author

throwawayaccount0153 commented Jun 30, 2020

Python 3.7.3 (default, Dec 20 2019, 18:57:59) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import buku
>>> buku.print_single_rec((1, 'http://example.com', u'\u2013', ',tags1,', 'randomdesc', 0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/debian/python/lib/python3.7/site-packages/buku.py", line 4211, in print_single_rec
    print(''.join(str_list))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 3: ordinal not in range(256)
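
The failure can be reproduced without buku. The following is a minimal sketch, assuming only that stdout uses a Latin-1 codec as the traceback above reports; the in-memory stream is a stand-in for the real terminal and is not part of buku. U+2013 (en dash) has no Latin-1 encoding, so writing it through a Latin-1 text stream raises `UnicodeEncodeError`, while the same text encodes cleanly as UTF-8.

```python
import io

# Stand-in for a stdout whose locale-derived encoding is Latin-1 (assumption
# taken from the traceback; a real terminal would be sys.stdout).
latin1_stdout = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")

title = "\u2013"  # EN DASH, outside the Latin-1 range (0-255)

error = None
try:
    print(title, file=latin1_stdout)
    latin1_stdout.flush()
except UnicodeEncodeError as exc:
    # Same exception type as in the buku traceback.
    error = exc

# The same text has a valid three-byte UTF-8 encoding.
utf8_bytes = title.encode("utf-8")
```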

@throwawayaccount0153
Author

> Please refer to this issue: jarun/googler#131
>
> Confirm that your locale is UTF-8, or that you are setting it to UTF-8 when you run buku.

I'm still not clear: why are we talking about changing the locale on my machine? Why can't buku handle UTF-8 internally? This should work out of the box.

@jarun
Owner

jarun commented Jun 30, 2020

> I'm still not clear, why are we talking about changing the locale on my machine?

Did you try:

alias buku='LC_ALL="en_GB.UTF-8" buku'

You can also set PYTHONIOENCODING=utf-8

@zmwangx is there a way to detect and do this automatically? Would it even work?
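
One possible way to detect and handle this automatically, sketched under assumptions: check the output stream's encoding at startup and switch it to UTF-8 if it is anything else. `TextIOWrapper.reconfigure()` exists from Python 3.7 onwards. The helper name `ensure_utf8` is hypothetical and this is not necessarily what buku does; the demo uses an in-memory Latin-1 stream in place of `sys.stdout`.

```python
import io


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it currently uses another encoding.

    Hypothetical helper. Returns True if the stream was reconfigured.
    """
    current = (getattr(stream, "encoding", None) or "").lower().replace("-", "")
    if current == "utf8":
        return False
    # reconfigure() is provided by io.TextIOWrapper on Python 3.7+.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
        return True
    return False


# Demo with a stand-in for a Latin-1 stdout; real use would pass sys.stdout.
demo = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
changed = ensure_utf8(demo)
print("\u2013", file=demo)  # no longer raises UnicodeEncodeError
demo.flush()
written = demo.buffer.getvalue()
```

Whether forcing UTF-8 is appropriate when the terminal itself cannot display it is a separate design question; this only avoids the exception.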

@throwawayaccount0153
Author

Yes, as I already mentioned, changing my locale works, but this is not a real solution. No one who uses buku should have to change their machine's locale, even temporarily. Buku should be able to detect this automatically or set it internally when run. That is what this issue is about; it's not about setting my machine's locale.

@jarun
Owner

jarun commented Jun 30, 2020

Can you confirm if the following patch works?

diff --git a/buku b/buku
index cb31612..17654a7 100755
--- a/buku
+++ b/buku
@@ -4211,6 +4211,8 @@ def print_single_rec(row, idx=0):  # NOQA
 
     try:
         print(''.join(str_list))
+    except UnicodeEncodeError:
+        sys.stdout.buffer.write((''.join(str_list) + '\n').encode('utf-8'))
     except BrokenPipeError:
         sys.stdout = os.fdopen(1)
         sys.exit(1)
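
The idea behind the patch can be tried in isolation. This is a standalone sketch, not buku's code: the function name `print_with_utf8_fallback` and the in-memory stream are assumptions made for illustration. It prints normally and, only if the stream's codec rejects the text, writes the UTF-8 bytes directly to the underlying binary buffer.

```python
import io
import sys


def print_with_utf8_fallback(text, stream=None):
    """Print text; on UnicodeEncodeError, write raw UTF-8 bytes instead.

    Hypothetical helper mirroring the approach in the patch above.
    """
    stream = sys.stdout if stream is None else stream
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        # Flush pending text first so the output order is preserved.
        stream.flush()
        stream.buffer.write((text + "\n").encode("utf-8"))
        stream.buffer.flush()


# Stand-in for a Latin-1 stdout (assumption taken from the traceback).
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
print_with_utf8_fallback("caf\u00e9 \u2013 bar", stream=fake_stdout)
output = fake_stdout.buffer.getvalue()
```

One consequence of this approach: lines that do encode in the stream's codec are still written in that codec, so the output can mix Latin-1 and UTF-8 bytes across lines.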

@throwawayaccount0153
Author

I've tested this and it works fine!

@jarun jarun closed this as completed in 9609bc0 Jun 30, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Jul 31, 2020
@jarun
Owner

jarun commented May 14, 2021

@throwawayaccount0153 can you please confirm that you no longer see this issue with commit a4ee497?

Please respond in the ToDo list as this thread is auto-locked.
