[BUG] Unicode characters render improperly on Windows #52

aakatz3 · 2020-07-01T18:09:40Z

Windows renders Unicode characters improperly, and tends to render   improperly as well. There may be a proper solution for this, but I don't know the root cause or how to force Unicode encoding.


kvid · 2020-07-01T18:29:42Z

This might be a duplicate of #22

aakatz3 · 2020-07-04T15:17:25Z

It may have the same or a similar root cause. The problem is likely in GraphViz or in Python, but it could also be caused by the input files having different file types or encodings. It's related to #22, but larger. There is a character in my original post that git didn't print; it should also say that   renders wrong as well. This is very likely a Unicode-to-ASCII (or vice versa) issue. It could be indicative of a problem with character handling in general, possibly in a dependency. I haven't had to deal with this before, other than when migrating databases from Windows 9X to Windows 10, so I don't know the root cause.

formatc1702 · 2020-07-05T15:05:23Z

See #22 for (most likely) a specific instance of this problem. Needs help from someone using Windows.

aakatz3 · 2020-07-05T15:58:52Z

You can also see it in some of my commits:

I will check while working on #17 whether the dev version of Graphviz fixes it. If not, I may have to break out a debugger and trace how Python accesses the disk, so I can look for the root cause.

argabor · 2020-07-15T09:55:21Z

On Windows I get this error message if I use mm2 instead of AWG gauge:
Traceback (most recent call last):
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\arvai.gabor\AppData\Local\Programs\Python\Python38-32\Scripts\wireviz.exe\__main__.py", line 7, in <module>
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\wireviz.py", line 235, in main
    parse(yaml_input, file_out=file_out, generate_bom=args.generate_bom)
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\wireviz.py", line 182, in parse
    harness.output(filename=file_out, fmt=('png', 'svg'), gen_bom=generate_bom, view=False)
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\site-packages\wireviz\Harness.py", line 224, in output
    file.write(tuplelist2tsv(bom_list))
  File "c:\users\arvai.gabor\appdata\local\programs\python\python38-32\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xb2' in position 124: character maps to <undefined>

Workaround: replace all 'mm\u00B2' with 'mm2' in the code.
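
For anyone trying to reproduce this, the failure in the traceback can be checked without WireViz. The sketch below only assumes that the text being written contains a string like 'mm²' (U+00B2); the sample string and the helper name `can_encode` are made up for illustration. It tests whether that text can be encoded with the code page named in the traceback (cp1250), with cp1252, and with UTF-8.

```python
# Sample text containing SUPERSCRIPT TWO (U+00B2), the character named in the traceback.
# The exact string is an assumption for illustration.
text = "0.25 mm\u00b2"


def can_encode(s: str, encoding: str) -> bool:
    """Return True if `s` can be encoded with `encoding` without an error."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name in ("cp1250", "cp1252", "utf-8"):
    print(name, can_encode(text, name))
```

If cp1250 is the only one that fails, the problem is the default encoding picked up from the Windows locale rather than the character itself, which suggests writing the file with an explicit encoding instead of editing the unit strings.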

aakatz3 · 2020-07-16T12:37:22Z

Part of the workaround for this is to force all encodings to UTF-8, and to attempt to detect the encodings of the input files.

kvid · 2020-07-16T16:31:21Z

I wonder, what is needed to "force all encodings to UTF-8", and why is this needed on Windows?

aakatz3 · 2020-07-16T23:54:39Z

@kvid Python on Windows writes files in the system's default ANSI code page, typically CP-1252 (the traceback above shows cp1250), as that is the default extended encoding. Since the SVG is supposed to be UTF-8 (that is the encoding Graphviz generates), we need to force Python to actually write UTF-8. Sometimes, since UTF-8 and CP-1252 are somewhat compatible, Python misidentifies one as the other. By standardizing on UTF-8, which is what Linux and macOS use, and forcing the encoding to UTF-8 (all modern versions of Windows support it easily), we should be able to get around the issue. A possibly more elegant solution would be to always read and write in bytes, or in base64 encoding, but that seems more trouble than it is worth.
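
A minimal sketch of what "forcing the encoding" could look like in Python, assuming the files are opened with the built-in `open()`. The helper names `write_text_utf8` and `read_text_utf8` and the file name `bom.tsv` are hypothetical and are not taken from the WireViz code.

```python
import os
import tempfile


def write_text_utf8(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text_utf8(path: str) -> str:
    """Read text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round trip through a temporary file (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "bom.tsv")
write_text_utf8(path, "W1\t0.25 mm\u00b2\n")
round_trip = read_text_utf8(path)

# Inspect the raw bytes to confirm the UTF-8 form of U+00B2 is on disk.
with open(path, "rb") as f:
    raw = f.read()
```

Without the `encoding` argument, `open()` in text mode falls back to the locale's preferred encoding, which is why the same script can behave differently on Windows than on Linux or macOS.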

Ideally, some smart encoding detection should be implemented on the YAML side to work out which encoding the input file uses, since Notepad on Windows will sometimes default to CP-1252 instead of UTF-8, depending on the build of Windows 10.
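
One simple way such detection might work is to try strict UTF-8 first and fall back to a legacy code page only if that fails. This is a heuristic sketch, not a robust detector: the function name `decode_with_fallback` is hypothetical, and the fallback to Windows-1252 (Python codec name `cp1252`) is an assumption that would give wrong characters for files saved in other code pages such as cp1250.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode bytes as UTF-8 if valid, else assume Windows-1252.

    "utf-8-sig" accepts input with or without a leading byte order mark,
    which some Windows editors add when saving as UTF-8.
    """
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback; a few byte values are undefined in cp1252,
        # so they are replaced rather than raising a second error.
        return data.decode("cp1252", errors="replace")


# The same text saved three different ways decodes to the same string.
sample = "10 \u00b0C"
from_utf8 = decode_with_fallback(sample.encode("utf-8"))
from_bom = decode_with_fallback(b"\xef\xbb\xbf" + sample.encode("utf-8"))
from_cp1252 = decode_with_fallback(sample.encode("cp1252"))
```

The order matters: almost any byte sequence decodes "successfully" as a single-byte code page, so the strict UTF-8 attempt has to come first.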

Why does all of this happen? Because of IBM/DOS, and compatibility.

formatc1702 · 2020-07-19T20:56:29Z

Has this been solved by adding encoding='UTF-8' to the file reading and writing functions?
Please confirm, and close the issue if it is fixed. Thanks!

aakatz3 · 2020-07-20T00:04:19Z

This appears to be fixed by forcing the encoding to UTF-8, and including the meta encoding tag inside of the generated HTML output. Closing issue.
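
For reference, a rough sketch of the second half of that fix, declaring the encoding inside the HTML itself. The function `build_html` and the page layout are hypothetical and are not WireViz's actual template; the only point is that the `<meta charset>` declaration matches the encoding used to write the file.

```python
import html


def build_html(title: str, body_text: str) -> str:
    """Build a small HTML page that declares UTF-8 as its encoding."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head>"
        '<meta charset="UTF-8">'
        f"<title>{html.escape(title)}</title>"
        "</head><body>"
        f"<p>{html.escape(body_text)}</p>"
        "</body></html>\n"
    )


page = build_html("Harness", "0.25 mm\u00b2 at 85 \u2103")
# The bytes written to disk must use the encoding the meta tag declares.
page_bytes = page.encode("utf-8")
```

If the declaration and the bytes on disk disagree, a browser may decode the file with the wrong code page, which would produce the kind of garbled characters described at the top of this issue.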

formatc1702 mentioned this issue Jul 5, 2020

[bug] ℃ cannot be printed #22

Closed

formatc1702 added the "help wanted" label Jul 5, 2020

kvid referenced this issue in aakatz3/WireViz Jul 6, 2020

Update all examples

61a5545

aakatz3 mentioned this issue Jul 16, 2020

Implement Feature/multicolor wires on refactored code #96

Merged

kvid mentioned this issue Jul 17, 2020

[bug] Improper XML and DOCTYPE Declarations #97

Closed

formatc1702 added this to the v0.2 milestone Jul 19, 2020

aakatz3 closed this as completed Jul 20, 2020

formatc1702 removed the "help wanted" label Nov 15, 2020
