Use heuristic_parse for untrusted URLs. #976
Merged
Hello,
Thanks for this gem!
`Addressable::URI.parse` will throw exceptions for URLs it thinks are invalid. The issue with this is that Addressable and Twitter do not agree on what qualifies as a valid URL, so a tweet can contain URL entities that Addressable believes are invalid.

`Addressable::URI.heuristic_parse` is Addressable's more lenient parser. This change makes any tainted URLs get parsed with `heuristic_parse`, so there is less of a chance of encountering an `Addressable::URI::InvalidURIError` exception in the wild.

For example, one tweet contains the URL `http://suspicio\\.us/URL"`, which Twitter recognizes as a URL, so it shows up as an entity. This throws an exception when calling `tweet.urls.first.expanded_url`, which the switch to `heuristic_parse` is meant to avoid.
Incidentally, it looks like `Addressable` was first used to help with this same type of issue: #487.

This should also fix #742 and #891.
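
To illustrate the idea of routing tainted URLs through the lenient parser, here is a minimal sketch. It is not the code from this pull request: `parse_entity_url` and its `tainted:` and `parser:` parameters are hypothetical names made up for illustration. The only Addressable calls it assumes are `Addressable::URI.parse` and `Addressable::URI.heuristic_parse`.

```ruby
# Hypothetical helper (not the gem's actual implementation): pick the parsing
# method based on whether the URL comes from an untrusted source.
#
# `parser` defaults to Addressable::URI, loaded lazily. Any object that
# responds to `parse` and `heuristic_parse` can be passed in instead,
# which keeps the helper easy to exercise in isolation.
def parse_entity_url(url, tainted: true, parser: nil)
  return nil if url.nil?

  parser ||= begin
    require 'addressable/uri' # assumes the addressable gem is installed
    Addressable::URI
  end

  # heuristic_parse is the lenient parser; parse is the strict one.
  tainted ? parser.heuristic_parse(url) : parser.parse(url)
end
```

With the gem installed, `parse_entity_url(some_url)` would take the lenient path by default, and `parse_entity_url(some_url, tainted: false)` would keep the strict behavior for URLs that are known to be well formed.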