Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse zero padded integers as strings #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gertjanol
Copy link

Thanks for your work in this library! We're considering using it for a project, but one of the things that we run in to is that our users have values in their XML that look (and parse) like integers, but are actually strings. These values look like <code>000123</code>.
The parser incorrectly assumes these are integers, and in the process throws away information (the leading zeroes). The renderer works correctly though: the Python-structure {'code': '000123'} is rendered correctly to <code>000123</code>. So for one, the symmetry between parser and renderer is missing here.

This PR checks if values have a leading zero, and then does no processing at all. In the process I added some more tests to clarify existing behavior.

@gertjanol
Copy link
Author

Checks have failed due to ERROR: InterpreterNotFound: python3.3. I don't think there's anything wrong with my own tests.

@gertjanol
Copy link
Author

ping @jpadilla

@jpadilla
Copy link
Owner

jpadilla commented Dec 4, 2017

Hey @gertjanol thanks for putting this together. I'm kind of torn and not sure what the right solution would be. This as is would be a breaking change right now. Also, it would now parse "0" as a string instead of an int, right?

At this point I'm inclining towards perhaps making a note of the existing behavior in the docs and just leaving it to users to implement their expected type conversions.

@gertjanol
Copy link
Author

Myeah, you're right about this being a breaking change. I'm not sure how big of an issue that is though :).
The issue with '0 being parsed as a string is a valid one indeed. We could fix that by checking if the entire string is made up of 0's, for instance. And in that case, cast it to an int.

How do you envision users implementing their own type conversion? Subclass this Parser? def _type_convert is private/protected, so shouldn't be overriden (by convention).

@jpadilla
Copy link
Owner

jpadilla commented Dec 7, 2017

@gertjanol I'd rather not make this type of assumption going on forward and instead enable users to override the parser however they see fit.

I'm actually inclined on just adding/documenting the necessary public API methods, making zero assumptions the current _type_convert() method. Similar to how I think FormParser must behave, and release new major version.

@gertjanol
Copy link
Author

Hey @jpadilla. Alright, sounds like a plan! So then the forced cast to int would also go, or do you want to keep that in there to retain backwards compatibility?

Another solution I thought of, that might be the most simple, is adding a setting for this specific use case. Something like this

# Should data like <code>000123</code> be treated as strings (thereby keeping the leading zeroes), or be cast to an integer (thereby discarding the zeroes).
XML_KEEP_LEADING_ZEROES = False

Default would be False so current behavior doesn't change.

Drawback is that this might lead to other settings for every quirk that you can think of :).

@jpadilla
Copy link
Owner

jpadilla commented Dec 8, 2017

@gertjanol that's right, by default this new method will just return whatever the original value was. In the release note we would have to document what the original behavior was so people depending on it can just copy/paste a parser with that behavior.

@gertjanol
Copy link
Author

See #26

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants