Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace normalizeURL with urlx.Parse #1217

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jayme-github
Copy link

normalizeURL used to break URLs with path by rewriting their host part and it would also panic on "IP:Port" strings.

I've refactored that to utilize urlx.Parse and also added a couple of test cases.

normalizeURL used to break URLs with path by rewriting their host part
and it would also panic on "IP:Port" strings.
@thaJeztah
Copy link
Contributor

normalizeURL used to break URLs with path by rewriting their host part

Do you have an example / test-case that would show the URLs that would break? This is adding 8K lines of code (as dependency) to parse an URL, which seems a lot for just this, so wondering if the problem could be solved in another way.

Comment on lines 333 to 336
func normalizeURL(addr string) (*url.URL, error) {
addr = strings.TrimSpace(addr)

u, err := url.Parse(addr)
if err != nil {
return nil, err
}

if u.Opaque != "" {
u.Host = net.JoinHostPort(u.Scheme, u.Opaque)
u.Opaque = ""
} else if u.Path != "" && !strings.Contains(u.Path, ":") {
u.Host = net.JoinHostPort(u.Path, "8888")
u.Path = ""
} else if u.Scheme == "" {
u.Host = u.Path
u.Path = ""
}

if u.Scheme != "https" {
u.Scheme = "http"
}

_, port, err := net.SplitHostPort(u.Host)
if err != nil {
_, port, err = net.SplitHostPort(u.Host + ":8888")
if err != nil {
return nil, err
}
}

if port != "" {
_, err = strconv.Atoi(port)
if err != nil {
return nil, err
}
}
return u, nil
return urlx.Parse(addr)
}
Copy link
Contributor

@thaJeztah thaJeztah Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a quick glance, this would make the tests pass (without needing the new dependency);

Suggested change
func normalizeURL(addr string) (*url.URL, error) {
addr = strings.TrimSpace(addr)
u, err := url.Parse(addr)
if err != nil {
return nil, err
}
if u.Opaque != "" {
u.Host = net.JoinHostPort(u.Scheme, u.Opaque)
u.Opaque = ""
} else if u.Path != "" && !strings.Contains(u.Path, ":") {
u.Host = net.JoinHostPort(u.Path, "8888")
u.Path = ""
} else if u.Scheme == "" {
u.Host = u.Path
u.Path = ""
}
if u.Scheme != "https" {
u.Scheme = "http"
}
_, port, err := net.SplitHostPort(u.Host)
if err != nil {
_, port, err = net.SplitHostPort(u.Host + ":8888")
if err != nil {
return nil, err
}
}
if port != "" {
_, err = strconv.Atoi(port)
if err != nil {
return nil, err
}
}
return u, nil
return urlx.Parse(addr)
}
// normalizeURL trims spaces and appends "http" as default protocol if not defined in url
func normalizeURL(addr string) (*url.URL, error) {
addr = strings.TrimSpace(addr)
if strings.Index(addr, "://") == -1 {
// use http:// if no scheme is provided
addr = "http://" + addr
}
u, err := url.Parse(addr)
if err != nil {
return nil, err
}
_, port, err := net.SplitHostPort(u.Host)
if err != nil {
var err2 error
_, port, err2 = net.SplitHostPort(u.Host + ":8888")
if err2 != nil {
return nil, err
}
}
if port != "" {
v, err := strconv.Atoi(port)
if err != nil {
return nil, fmt.Errorf("invalid port: %w", err)
}
if v < 0 || v > 65535 {
return nil, fmt.Errorf("invalid port %q", port)
}
}
return u, nil
}

@jayme-github
Copy link
Author

normalizeURL used to break URLs with path by rewriting their host part

Do you have an example / test-case that would show the URLs that would break? This is adding 8K lines of code (as dependency) to parse an URL, which seems a lot for just this, so wondering if the problem could be solved in another way.

Given this was some time ago I'm no longer sure but I strongly suppose I put them in the tests.

@thaJeztah
Copy link
Contributor

Thanks! I think the reason url.Parse() may fail for some of those is that, if no scheme is provided, some results may be ambiguous.

The code I tried above avoids that situation by checking if a scheme is present and if not, to prepend the default scheme;

if strings.Index(addr, "://") == -1 {
	// use http:// if no scheme is provided
	addr = "http://" + addr
}

Of course it's possible that urlx handles more esoteric cases besides those, but (and I'm not a maintainer, just a passer-by! 😅) I wonder if that's enough to warrant the larger number of lines imported from the additional dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants