I tend to run my dev code locally on my powerful Windows-based home set-up before pushing to the cluster. In this instance, I am trying to run spring, which relies on penman and makes some calls to penman's load method. While running their preprocessing to parse AMR data, I found that calls to penman._load result in encoding issues.
I am aware I can just run on WSL and be done with it, but I'd rather see this useful tool be available cross-platform. Is there any way I can contribute? Which methods rely on encoding? My approach would be to allow an optional encoding argument, as is common in enc/dec methods, and pass it through to the relevant IO functions like open.
For starters I can begin with codec, if that is okay, e.g. by changing the _load function to:
def _load(source: FileOrFilename,
          model: Model = None,
          encoding: str = None) -> List[Graph]:  # EDITED
    """
    Deserialize a list of PENMAN-encoded graphs from *source*.

    Args:
        source: a filename or file-like object to read from
        model: the model used for interpreting the graph
    Returns:
        a list of Graph objects
    """
    codec = PENMANCodec(model=model)
    if isinstance(source, (str, Path)):
        with open(source, encoding=encoding) as fh:  # EDITED
            return list(codec.iterdecode(fh))
    else:
        assert hasattr(source, 'read')
        return list(codec.iterdecode(source))
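To illustrate the pattern without depending on penman itself, here is a minimal, self-contained sketch of the same idea: an optional encoding argument that is only used when the source is a path and is forwarded to open. The helper name read_lines and the sample file are hypothetical and only stand in for _load and real AMR data.

```python
import io
import tempfile
from pathlib import Path
from typing import IO, List, Optional, Union


def read_lines(source: Union[str, Path, IO[str]],
               encoding: Optional[str] = None) -> List[str]:
    """Hypothetical stand-in for _load: read text lines from a path or file object."""
    if isinstance(source, (str, Path)):
        # encoding=None keeps the old behaviour (platform default);
        # an explicit value such as "utf-8" makes the read portable.
        with open(source, encoding=encoding) as fh:
            return fh.read().splitlines()
    # File-like objects are already decoded by whoever opened them.
    assert hasattr(source, 'read')
    return source.read().splitlines()


# Write UTF-8 bytes to a temporary file, then read them back explicitly as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_bytes("(c / café)\n".encode("utf-8"))
    from_path = read_lines(path, encoding="utf-8")

# An already-open text stream ignores the encoding argument entirely.
from_stream = read_lines(io.StringIO("(d / dog)\n"))
```

Defaulting the argument to None means existing callers see no change in behaviour, while Windows users can pass encoding="utf-8" explicitly.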
Thanks for bringing that up. I don't have a Windows machine to test on, but I recall that Windows has strange, sometimes non-standard default encodings, such as UTF-8 with a byte-order-mark. I agree that allowing an encoding option would be helpful. I think you'd only need to target where text is read from the filesystem, as everything should be native Python (unicode) strings internally, so look for calls to open(). I see the following:
Thank you for the quick response! Indeed, Windows unfortunately still relies on cp1252 encoding (not utf-8-bom), which always leads to small issues like this. I'll look for calls to the built-in open and add an optional custom encoding there. I'll send in a PR soon.
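For anyone who wants to check what their own machine does, the following sketch prints the locale-dependent encoding that open falls back to when no encoding is given, and shows how UTF-8 bytes look when misread as cp1252. The variable names are illustrative, and the reported default will differ between systems.

```python
import locale

# The encoding open() uses when none is passed is locale-dependent;
# it is often cp1252 on Western-European Windows installs and UTF-8 on Linux.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Decoding UTF-8 bytes as cp1252 does not fail for this input,
# it silently yields mojibake instead.
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'
garbled = utf8_bytes.decode("cp1252")   # two characters instead of one
correct = utf8_bytes.decode("utf-8")
```

Other byte sequences cannot be decoded as cp1252 at all and raise UnicodeDecodeError, which is one way such an encoding mismatch can surface.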