Parse URL properly before cleaning #25

Closed
Cimbali opened this issue Jun 12, 2018 · 2 comments
Labels: enhancement (New feature or request)

Comments


Cimbali commented Jun 12, 2018

Cleaning is really only applied to part of the link, namely URL.path, URL.search (and maybe URL.hash). Since some rules (e.g. whitelisting) apply per domain, it is useful to start from a properly parsed URL object.

Identifying the link is only (somewhat) tricky in the injected script, since when checking the header we already have the full URI. The injected script is useful for visual feedback on cleaned links, but anything missed there will be caught later on if request cleaning is enabled.

Once that is done, it will be easy to allow per-domain rules, such as which parameters to clean (see discussion in #20) or rules that are currently hardcoded (e.g. on google.com/search, don't clean if we find the URL in the q= parameter).
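
For illustration only, a minimal sketch of that idea (not CleanLinks' actual code; `domainRules` and `cleanLink` are hypothetical names): parse the link with the standard `URL` constructor first, then look up per-domain rules by hostname and apply them to the parsed components.

```javascript
// Hypothetical per-domain rules, keyed by hostname (illustration only).
const domainRules = {
	'example.com': { stripParams: ['utm_source', 'utm_medium'] },
};

function cleanLink(link) {
	const url = new URL(link);                      // parsed, canonical URL object
	const rules = domainRules[url.hostname.replace(/^www\./, '')] || {};

	// Remove only the parameters blacklisted by this domain's rules.
	for (const key of [...url.searchParams.keys()])
		if ((rules.stripParams || []).includes(key))
			url.searchParams.delete(key);

	return url.href;
}

console.log(cleanLink('https://example.com/page?utm_source=spam&id=42'));
// -> https://example.com/page?id=42
```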

Cimbali added the enhancement label on Jun 12, 2018

Cimbali commented Jul 26, 2018

URL objects make URLs canonical, so comparing them for "equality" sometimes wrongly fails and we add non-cleaned links to the history (e.g. a comma "," becomes %2C). This happened with standalone parameter cleaning in v3.1.2.
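
A minimal reproduction of that pitfall, assuming the query was rewritten through URLSearchParams (which re-encodes such characters); this is a sketch, not the extension's actual code:

```javascript
const raw = 'https://example.com/search?tags=a,b,c';
const url = new URL(raw);
url.search = url.searchParams.toString();   // query is re-serialised as tags=a%2Cb%2Cc

console.log(url.href === raw);              // false, even though nothing was removed
// Comparing decoded components instead avoids this false positive
// (simplistic: assumes no encoded '&' or '=' inside values).
console.log(decodeURIComponent(url.search) === decodeURIComponent(new URL(raw).search)); // true
```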


Cimbali commented Aug 19, 2018

We should also clean paths/parameters in the URL fragment, detected as fragments starting with #!/... or ?key=value, e.g. remove refid in https://m.facebook.com/home.php#!/photo.php?fbid=1234567890&refid=1234567890
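
A rough sketch of how such a fragment could be handled (`cleanFragment` and its `refid` default are hypothetical names, not the extension's code): treat the `#!/...?key=value` fragment as a nested path + query and drop the unwanted parameter.

```javascript
function cleanFragment(link, paramsToStrip = ['refid']) {
	const url = new URL(link);
	// Split the fragment into a pseudo-path and a pseudo-query, if it has one.
	const match = url.hash.match(/^#(!?[^?]*)\?(.*)$/);
	if (!match)
		return link;                              // no query-like fragment, nothing to do

	const [, fragPath, fragQuery] = match;
	const params = new URLSearchParams(fragQuery);
	for (const p of paramsToStrip)
		params.delete(p);

	url.hash = `#${fragPath}?${params.toString()}`;
	return url.href;
}

console.log(cleanFragment(
	'https://m.facebook.com/home.php#!/photo.php?fbid=1234567890&refid=1234567890'));
// -> https://m.facebook.com/home.php#!/photo.php?fbid=1234567890
```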

Cimbali added a commit that referenced this issue Aug 19, 2018
This will prevent erroneous cleaning of sections of a legitimate URL, as
in the new test case (where aff0b550d3fe338b645a4deebdcb1b got removed).

This is (likely) a temporary fix while waiting for #25, when we'll match
parameters in a more robust way.