Specify encoding for files #62

Closed
kouralex opened this issue Sep 13, 2018 · 8 comments · Fixed by #68

@kouralex
Contributor

Whilst trying to run Skosify on Windows, I encountered issues similar to this.

The answer there was to simply declare the encoding when opening a file. Supporting this use case would require some small changes in the code as well as a new parameter to the program, I suppose.
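
As a minimal sketch of what "declaring the encoding" means in Python: the helper below passes `encoding` to `open()` explicitly, so the result no longer depends on the platform default. The names `read_text` and `demo` are hypothetical and are not part of Skosify; UTF-8 as the default is an assumption about the input files.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole text file, decoding it with an explicit encoding.

    Without the `encoding` argument, open() in Python 3.8 falls back to the
    platform's preferred encoding, which on Windows is often a legacy code
    page rather than UTF-8.
    """
    with open(path, encoding=encoding) as handle:
        return handle.read()


def demo():
    """Write a small UTF-8 file and read it back with read_text()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as handle:
            # Non-ASCII text, written with escapes to keep the source ASCII.
            handle.write("\u00c4\u00e4nekoski")
        return read_text(path)
```

A command-line parameter for the encoding, as suggested above, could simply be forwarded to the `encoding` argument of such a helper.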

@osma
Member

osma commented Sep 13, 2018

Can you please show the error and traceback?

@kouralex
Contributor Author

kouralex commented Sep 14, 2018

There is no traceback provided by the program, only the message for the raised exception:

CRITICAL: Parsing failed. Exception: 'charmap' codec can't decode byte 0x8f in position 1132: character maps to <undefined>

As expected, I managed to run the program on WSL, though.
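
The message can be reproduced in isolation, which may explain why the same file works on WSL. This is only a sketch: it assumes the Windows default is the cp1252 code page (the thread does not state which code page was active) and uses a hypothetical helper `try_decode`. Byte 0x8f has no mapping in Python's cp1252 codec, but it occurs routinely as the second byte of UTF-8 sequences.

```python
import locale

# U+00CF is encoded in UTF-8 as the two bytes C3 8F.
utf8_bytes = "\u00cf".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


# Decoding UTF-8 bytes as cp1252 (assumed Windows default) fails on 0x8f.
message = try_decode(utf8_bytes, "cp1252")

# The encoding open() uses in Python 3.8 when none is given.
default_encoding = locale.getpreferredencoding(False)
```

Printing `default_encoding` on both systems shows whether they differ; on typical Linux setups, including WSL, it is usually UTF-8.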

@libraryjeans

Having the same issue. Thanks for the heads-up.

@osma osma added the bug label Jun 17, 2019
@osma
Member

osma commented Jun 17, 2019

There was a similar issue in Annif and it was fixed by this PR. But in Annif the input files are in text-based formats, not RDF (mostly).

I wonder what file caused this. Normally Skosify only parses the configuration file (optional) and RDF files. Can you try to work out which file caused this and perhaps post a minimal example file that causes the error? The error message seems to come from this line and is shown when the input RDF file cannot be parsed. But the parsing itself is handled by rdflib and in my understanding it should do the right thing w.r.t. character encodings - but apparently this isn't always the case.

My wishlist (for @kouralex or @libraryjeans):

  • an example input file that causes this problem
  • the command line used to run skosify
  • debug output of skosify (use -D option on command line to enable this)
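
To help work out which file triggers the error, a small diagnostic along these lines could be run against each candidate input. It is a sketch only: `first_undecodable_byte` and `check_file` are hypothetical helpers, not Skosify or rdflib functions, and the encoding to test against (for example cp1252) has to be guessed from the platform.

```python
def first_undecodable_byte(data, encoding):
    """Return (offset, byte value) of the first byte that `encoding`
    cannot decode, or None if all of `data` decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def check_file(path, encoding):
    """Apply first_undecodable_byte() to the raw contents of a file."""
    with open(path, "rb") as handle:
        return first_undecodable_byte(handle.read(), encoding)
```

The offset is counted from the start of the file, so it will not necessarily match the position in the error message if the program decodes the file in chunks.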

@TommiRTVA

Hello, I have the same problem. Python 3.8 is installed via PowerShell, following these instructions:
https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-windows-10
Command:
skosify -i .\yso-paikat-skos.rdf -o test.rdf
CRITICAL: Parsing failed. Exception: 'charmap' codec can't decode byte 0x81 in position 35428: character maps to <undefined>
I used this file:
https://finto.fi/rest/v1/yso-paikat/data?format=application/rdf%2Bxml
The goal is to create custom SKOS files based on these Finto files.

@osma
Member

osma commented Nov 18, 2019

Possible fix on this branch: https://github.com/NatLibFi/Skosify/tree/issue62-utf8-encoding
Can you test it, @kouralex, @TommiRTVA or @libraryjeans?

@TommiRTVA

Looks like it worked; at least the file has been analyzed. Good!

skosify -i .\yso-paikat-skos.rdf -o test.rdf
INFO: Don't know what to do with literal http://purl.org/dc/terms/modified
INFO: Don't know what to do with literal http://purl.org/dc/terms/created
INFO: Don't know what to do with relation http://rdaregistry.info/Elements/u/P60683
INFO: Don't know what to do with relation http://purl.org/iso25964/skos-thes#broaderPartitive
INFO: Don't know what to do with relation http://rdaregistry.info/Elements/u/P60686
INFO: Don't know what to do with relation http://purl.org/iso25964/skos-thes#narrowerPartitive
WARNING: Concept scheme has no label(s). Use --label option to set the concept scheme label.
INFO: Some concepts not reached in initial cycle detection. Re-checking for loose concepts.
WARNING: Redundant hierarchical relationship http://www.yso.fi/onto/yso/p507242 skos:broader http://www.yso.fi/onto/yso/p109172 found, but not eliminated because eliminate_redundancy is not set
WARNING: Redundant hierarchical relationship http://www.yso.fi/onto/yso/p105341 skos:broader http://www.yso.fi/onto/yso/p94399 found, but not eliminated because eliminate_redundancy is not set
WARNING: Redundant hierarchical relationship http://www.yso.fi/onto/yso/p94168 skos:broader http://www.yso.fi/onto/yso/p94347 found, but not eliminated because eliminate_redundancy is not set
WARNING: Redundant hierarchical relationship http://www.yso.fi/onto/yso/p507241 skos:broader http://www.yso.fi/onto/yso/p109172 found, but not eliminated because eliminate_redundancy is not set
WARNING: Redundant hierarchical relationship http://www.yso.fi/onto/yso/p226189 skos:broader http://www.yso.fi/onto/yso/p94402 found, but not eliminated because eliminate_redundancy is not set

@brunoalmeida81

brunoalmeida81 commented Sep 25, 2020

Hello, I'm having this problem on Windows 10 Pro with Python 3.8.5. On cmd, when I try to run the script on yso-skos.rdf (obtained from Finto.fi), I get the following error: CRITICAL: Parsing failed. Exception: 'charmap' codec can't decode byte 0x9d in position 14043: character maps to <undefined>. The script works as intended on an Ubuntu installation through WSL.
