-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for spec compliant URL parsing #3056
Conversation
Thank you @Aenigma! If you have a LBRY address or channel, please email it to me at jeremy@lbry.com. @eukreign this was missed on the stand up today. Should you review this or @lyoshenka? |
@Aenigma thank you for this PR, sorry it took so long to respond. regarding this comment "I didn't think it would have been wise to modify the regex to accept both as that would make : ambiguous in terms of whether or not it was a sequence or a claim.", can you elaborate on what you mean? For example, if someone passes "foo:1" how would you disambiguate if this is Thanks! |
No problem. It is impossible to distinguish between the two semantically. You would effectively need to try one after the other in some way. Making a regex do that would have made the code much harder to read is all I meant. I was (poorly) suggesting that by hinting at the nature of the problem here. That's why in my approach I use two patterns: one for the old regex and one for the new and then try them until I get a hit. |
@Aenigma would you be willing to change it to just use one regex? just |
Yeah, but that change will have certain consequences. For example: We can tell easily that this would have been parsed before as: channel_sequence: 1
stream_claim_id: 2 And we know it's in the legacy format because it has a channel_claim_id: 1
stream_claim_id: 2 The consequence of misinterpreting this is that when it gets converted back to a string via At least if you convert Here's another example: Here, it'll be parsed as: channel_sequence: 1
stream_claim_id: 2 But actually, this is a mishmash of the two formats and should not be allowed in either. If the potential problem of situations like this are acceptable, that's fine -- I don't really grasp the reach of this utility so I genuinely don't know, but I thought that it likely wouldn't be OK. The alternative is to make the regex be aware of the whole URL. This would involve abusing regexes to use lookaheads (which I'm not really comfortable doing) or doing something like |
@Aenigma I think we would want |
Sorry, was a bit low on bandwidth the past week. I made the change and the changeset seems quite minimal now. However, I'm noticing that the CI faiilure seems to be related to my change:
Unfortunately, I can't seem to run the integration tests to run locally so I'm at a loss to figure out what the problem is here. |
Legacy URLs are preserved by attempting to parse the new URL format and, on failing that, it'll attempt the legacy one. Tests had to be updated such that the correct things are asserted against each other.
This removes the code for trying multiple patterns and the setup for it Added a few unit tests to check that the parsed URL is as expected
534b3c4
to
657d690
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR @Aenigma !
Hello! I was looking at another project while doing hacktoberfest wondering how URL parsing works for lbry and stumbled upon this file. After studying it, I noticed some inconsistencies with the spec so I decided I may have enough context to be able to fix it.
I believe this would close #2832
I saw that in the issue that you guys wanted to preserve support for the legacy URL scheme so I made it so that the new scheme is attempted before attempting the legacy.
I didn't think it would have been wise to modify the regex to accept both as that would make
:
ambiguous in terms of whether or not it was a sequence or a claim.I also updated the tests so that the correct things are asserted against each other but didn't really try to add any more.
Running the whole test suite, I was able to get a consistent number of failures, so I don't think I added a regression. Though, it did seem like
test_database.TestSQLiteRace.test_unhandled_sqlite_misuse
seemed to toggle a fail or pass inconsistently, I didn't think that had anything to do with my changes.Also, I know this repo isn't participating in hacktoberfest, but it'd be super cool if I could get t-shirt points if someone added the
hacktoberfest-accepted
label to this PR...