[BUG] Unicode characters render improperly on Windows #52

Closed
aakatz3 opened this issue Jul 1, 2020 · 10 comments

@aakatz3
Contributor

aakatz3 commented Jul 1, 2020

Windows renders unicode characters improperly, and tends to render   improperly as well. There may be a proper solution for this, but I don't know the root cause, or how to force unicode encoding.

@kvid
Collaborator

kvid commented Jul 1, 2020

This might be a duplicate of #22

@aakatz3
Contributor Author

aakatz3 commented Jul 4, 2020

It may have the same or a similar root cause. It is likely in GraphViz or in Python, but it could be due to the filetypes or encoding of the input files being different. It's related, but larger. There is a character in there git didn't print (in my original post), it should also say that   renders wrong as well. This is very likely a unicode to ascii, or vice versa issue. It could be indicative of a problem with character handling in general, possibly in a dependency. I haven't had to deal with this before, other than when migrating databases from Windows 9X to Windows 10, so I don't know the root cause.

@formatc1702 added the "help wanted" label Jul 5, 2020
@formatc1702
Collaborator

See #22 for (most likely) a specific instance of this problem. Needs help from someone using Windows.

@aakatz3
Contributor Author

aakatz3 commented Jul 5, 2020

You can also see it in some of my commits:
image

While working on #17 I will check whether the dev version of Graphviz fixes it. If not, I may have to break out a debugger and trace how Python reads and writes the files, so I can look for the root cause.

kvid referenced this issue in aakatz3/WireViz Jul 6, 2020
@argabor

argabor commented Jul 15, 2020

On Windows I get this error message if I use mm2 instead of AWG gauge:
Traceback (most recent call last):
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\arvai.gabor\AppData\Local\Programs\Python\Python38-32\Scripts\wireviz.exe\__main__.py", line 7, in <module>
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\wireviz.py", line 235, in main
    parse(yaml_input, file_out=file_out, generate_bom=args.generate_bom)
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\wireviz.py", line 182, in parse
    harness.output(filename=file_out, fmt=('png', 'svg'), gen_bom=generate_bom, view=False)
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\Harness.py", line 224, in output
    file.write(tuplelist2tsv(bom_list))
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xb2' in position 124: character maps to <undefined>

Workaround: replace all 'mm\u00B2' with 'mm2' in the code.
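
The failure in the traceback can be reproduced without WireViz. The snippet below is a minimal sketch, not code from the project: the sample string and the `can_encode` helper are made up for illustration, and the code pages are the one named in the traceback (cp1250) plus two others for comparison.

```python
# Hypothetical sample value; WireViz builds its own BOM strings.
gauge = "0.25 mm\u00b2"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # cp1250 is the code page that appears in the traceback above.
    for name in ("cp1250", "cp1252", "utf-8"):
        print(name, can_encode(gauge, name))
```

The superscript two has no slot in cp1250, which is why replacing it with a plain `2` avoids the error on that machine, while writing the file as UTF-8 avoids it everywhere.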

@aakatz3
Contributor Author

aakatz3 commented Jul 16, 2020

Part of the workaround for this is to force all encodings to UTF-8, and to attempt to detect the encoding of the input files.

@kvid
Collaborator

kvid commented Jul 16, 2020

I wonder, what is needed to "force all encodings to UTF-8" and why is this needed in Windows?

@aakatz3
Contributor Author

aakatz3 commented Jul 16, 2020

@kvid Python on Windows (as of the 3.8 release used here) writes text files in the system's legacy ANSI code page by default, typically CP-1252, or CP-1250 on Central European systems as in the traceback above. Since the SVG is supposed to be UTF-8 (that is the encoding Graphviz generates), we need to force Python to actually write UTF-8. UTF-8 and CP-1252 agree on the ASCII range, so a mismatch goes unnoticed until a non-ASCII character shows up and is either garbled or rejected. By standardizing on UTF-8, which is what Linux and macOS use and which all modern versions of Windows support, and forcing that encoding explicitly, we should be able to get around the issue. A possibly more elegant solution would be to always read and write in bytes, or in base64 encoding, but that seems more trouble than it is worth.
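
As a rough illustration of what forcing UTF-8 means in practice, here is a minimal sketch using only the standard library. The helper names (`write_utf8`, `read_utf8`, `default_text_encoding`) and the file name are assumptions for the example and are not taken from the WireViz source.

```python
import locale
import os
import tempfile


def default_text_encoding() -> str:
    """Encoding that open() falls back to when none is passed (platform dependent)."""
    return locale.getpreferredencoding(False)


def write_utf8(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the platform default."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_utf8(path: str) -> str:
    """Read text that is known to be UTF-8."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    print("platform default:", default_text_encoding())
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "example.bom.tsv")  # hypothetical file name
        write_utf8(target, "wire\t0.25 mm\u00b2")
        print(read_utf8(target))
```

Passing `encoding="utf-8"` on every `open()` call makes the output identical on Windows, Linux, and macOS, at the cost of having to remember it at each call site.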

Ideally, some smart encoding detection should be implemented on the YAML side to work out which encoding the input uses, since Notepad on Windows sometimes defaults to CP-1252 instead of UTF-8, depending on the build of Windows 10.
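
One simple way to approximate that detection is to try UTF-8 first and fall back to a legacy code page only when decoding fails. This is a sketch under assumptions: `decode_yaml_bytes` is a hypothetical helper, CP-1252 is just one plausible fallback, and a real implementation might prefer a dedicated detection library.

```python
def decode_yaml_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw file contents, preferring UTF-8 over a legacy code page.

    "utf-8-sig" also accepts plain UTF-8 and strips the byte order mark
    that some Windows editors prepend.
    """
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Non-ASCII text in a legacy code page is rarely valid UTF-8,
        # so a failure here is a reasonable hint that the file is legacy.
        # errors="replace" keeps undefined bytes from raising a second time.
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_yaml_bytes("gauge: 0.25 mm\u00b2".encode("utf-8")))
    print(decode_yaml_bytes("gauge: 0.25 mm\u00b2".encode("cp1252")))
```

The fallback is a guess: a CP-1250 file decoded as CP-1252 will not raise, it will just show the wrong characters, so an explicit encoding option is still the more reliable route.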

Why does all of this happen? Because of IBM/DOS and backward compatibility.

@formatc1702
Collaborator

Has this been solved by adding encoding='UTF-8' to the file reading and writing functions?
Please confirm, and close the issue if it is fixed. Thanks!

@aakatz3
Contributor Author

aakatz3 commented Jul 20, 2020

This appears to be fixed by forcing the encoding to UTF-8, and including the meta encoding tag inside of the generated HTML output. Closing issue.
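
For readers who want to see the two parts of that fix together, the following is a minimal sketch, not the actual WireViz code: `build_html_page` and `save_html_page` are hypothetical names, and the page layout is deliberately bare.

```python
import html


def build_html_page(title: str, svg_markup: str) -> str:
    """Return an HTML page that declares UTF-8 before any non-ASCII text."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{svg_markup}\n"
        "</body>\n</html>\n"
    )


def save_html_page(path: str, title: str, svg_markup: str) -> None:
    """Write the page as UTF-8 so the bytes match the declared charset."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_html_page(title, svg_markup))


if __name__ == "__main__":
    print(build_html_page("Harness 0.25 mm\u00b2", "<svg></svg>"))
```

The meta tag tells the browser how to decode the file, and the explicit `encoding="utf-8"` makes sure the bytes on disk actually match that declaration.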

@aakatz3 closed this as completed Jul 20, 2020
@formatc1702 removed the "help wanted" label Nov 15, 2020

4 participants