-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider adding a constructor to make a URL from parts #354
Comments
How exactly would this work? What if |
There could be a TypeError, exactly like when you write |
It's worth considering, but it would be nice if someone wrote a library first and demonstrated it's somewhat popular (or could be adopted in many places). That would also show the desired processing model for the arguments. |
I've been thinking a bit, and I think it would make sense to go from broad (href), to more specific (hostname). That way every time you add a more specific key, it will override what you specified earlier. A naive implementation would be this (this is for node, but very similar for browser): const { URL, URLSearchParams } = require('url');
function createURL(
{
hash,
host,
hostname,
href,
origin,
password,
pathname,
port,
protocol,
search,
searchParams,
username,
} = {}
) {
const defaultHref =
href !== undefined
? href
: protocol !== undefined &&
(host !== undefined || origin !== undefined || hostname !== undefined)
? `${protocol}//${host || origin || hostname}`
: undefined;
const url = new URL(defaultHref);
// everything
if (href !== undefined) {
url.href = href;
}
// hostname : port
if (host !== undefined) {
url.host = host;
}
// protocol // hostname : port
if (origin !== undefined) {
url.origin = origin;
}
if (hostname !== undefined) {
url.hostname = hostname;
}
// only visible in href
if (password !== undefined) {
url.password = password;
}
if (username !== undefined) {
url.username = username;
}
if (port !== undefined) {
url.port = port;
}
if (pathname !== undefined) {
url.pathname = pathname;
}
// includes :
if (protocol !== undefined) {
url.protocol = protocol;
}
// includes #
if (hash !== undefined) {
url.hash = hash;
}
// let's say searchParams overrides search
// includes ?
if (search !== undefined) {
url.search = search;
}
if (searchParams !== undefined) {
// this overrides
url.search = searchParams;
}
return url;
} You can then use it like this:
This would return
It will throw when none of the minimal
This will (now) throw an error with Am I on the right track? |
It doesn't look unreasonable per se, but it does seem fairly complicated and would require a lot of tests. It also does some things that are not really possible (setting |
If we did this we would need to define a lot of edge cases. It might look nice when used properly, but each value can contain any string, most of which would lead to invalid URLs or need encoding of some sort. |
That’s exactly why it should be in the standards and not in user space imo @achristensen07 |
I don't think that's a valid argument since if you use the public API to implement it you'll always be safe. The main rationale for this feature would be ergonomics and the main drawback is that it would be quite complicated to specify (mostly due to refactoring that would be needed) and test (due to all the various scenarios it enables). I suspect implementation would not be as hard, but we need the other two first. And so the main question I have is how often this would be used and I think a library would be a good proxy for that. |
It would also be good to survey usage of Node.js's I should have pointed out these earlier: https://whatwg.org/faq#adding-new-features and https://whatwg.org/working-mode have useful information on how we go about making changes. |
Thanks for that information Anne! Those are valid points. |
I'll note that Node.js itself has such an API, and it can't realistically be properly replaced without this (or a library equivalent). |
Hello! Thank you for your hard work. However, please also consider a classic builder pattern. The use case is pretty straightforward: what if you want to create a URL string from individual parts and you don't want to concatenate strings yourself, because Right now, you can't even create an instance of URL class without specifying at least a minimal correct URL string. So in order to build URL progressively from scratch you have to create an instance using some placeholder URL like I think it would be practical to allow creation of an empty URL instance and then to allow adding individual parts to it. The error (in case of missing important fields, like protocol) could be thrown when URL instance is actually being cast to string. However, such approach would require us to allow URL instance to represent invalid URL at times and I'm not sure if it's viable. |
It's pretty important not to allow URL instances that represent non-URLs (e.g. URLs with no protocol). |
What about relative URLs (href="../images/larry.jpg")? No protocol there. |
Relative URLs might make sense as a separate API, but again, a single API representing 2-3 distinct concepts with distinct invariants would be problematic. Some previous discussion on relative URL APIs in nodejs/node#12682. |
Hello, everyone! I've implemented However, it feels like it would be convenient to have an option of creating a url object not only from string that should represent a url, but also from an object (or any other alternative according to any language) allowing builder pattern.Also it can be much faster than parsing the string. I guess it would be a nice option in the browsers as well since making a url from an object-config is as common as making a url from string. @annevk input validation is an open question, but please note that input is not as complicated as I see in this thread. I would like to get focus on performance leaving most of the validation for users. You should know what you put there (also make writing 3rd party libraries easier). I'd like to do things like: const input = {}
// ...
input.protocol = 'https'
// ...
input.host = 'localhost'
// ...
input.query = querystring.stringify({ a: 1, b: 2 })
// ...
input.path = ['a', 'b']
// ...
input.path.pop()
// ...
const myURL = URL.from(input) If you did mistake you should get punished - sure. But what to do with users that are using typescript and checking things before using them in libraries? Punish them with double checks hurting performance? Anyway, if we can agree on the matter I am fine with strict checks (maybe with an option for I'd love to see it in Node.js. |
@viktor-ku The last comment there doesn't read as a "no" as much as a possible "not yet". There is already things to look at for ideas, like with Node's |
@isiahmeadows well, |
@viktor-ku I know it's deprecated. But I was referring to that as my precedent - it was primarily deprecated in favor of implementing the WHATWG spec IIRC. Of course, you're a little more involved with Node than I am, so you may know more about the background of that than I do. |
I think the URL standard is dearly lacking the inverse operation of the There are many user cases where one wants to build a URL step-by-step, on both client and server-side. Take for example this module. |
The problem is you can't build a URL step-by-step. Each part affects the whole. It's not really a sensible operation in the URL data model. I suggest people just use the following: Object.assign(new URL("https://example.com/"), { ... pieces go here ... }); |
What is meant by this? Lots of urls need to assembled from parts, If something passed to |
@Jamesernator in particular it would be hard to ensure that the individual parts cannot affect the other individual parts, without essentially designing a second kind of URL parser. Otherwise you would essentially be doing string concatenation and there's no need for an API to do that. If someone can demonstrate that this is a common need I'd be happy to reconsider, but at this point this feels like a stretch. |
I am trying to write an immutable wrapper for Instead of overriding a property on a class instance (i.e. mutation), the idea is we can construct a new URL object immutably: const urlClass = new URL('https://foo.bar/a?b');
const fromClass = (klass) => ({
hash: klass.hash,
host: klass.host,
hostname: klass.hostname,
href: klass.href,
password: klass.password,
pathname: klass.pathname,
port: klass.port,
protocol: klass.protocol,
origin: klass.origin,
searchParams: klass.searchParams,
search: klass.search,
username: klass.username,
});
const urlObject = fromClass(urlClass);
// Now we have an object, we can create new objects immutably using spread,
// and then override properties as we wish in order to manipulate the URL
const newUrlObject = {
...urlObject,
pathname: '/c',
searchParams: new URLSearchParams('?d'),
}; Now, I want to convert const result = toClass(newUrlObject).toString(); // "https://foo.bar/c?d" I imagined I could write a function For now, I'm having to use Node's legacy @domenic In response to the following suggestion of yours:
I believe this won't work for some properties which are read only ( Update: actually maybe this could work: const toClass = ({
// Omit from rest
origin,
searchParams,
...restProps
}: URLProps): URL => {
const klass = new URL('https://foo.bar/');
Object.assign(klass, restProps);
// We can't do this (override the property) as it's read only
// klass.searchParams = props.searchParams
searchParams.forEach((value, key) => {
klass.searchParams.set(key, value);
});
return klass;
}; |
I understand the desire to encourage developers to not create invalid URL's. I think that given this functionality was already in Node, and that the WHATWG URL specification supersedes the legacy Node.js API it would be prudent to offer a solution. I think I have a solution, but first I want to demonstrate a use case to provide some context. It would be great to be able to extract all the properties of a URL using object spread, and then pass immutable objects into a standard utility to generate URL's. Enabling patterns like e.g. interacting directly with a database connection string at a higher level and not rely on 3rd party connection string parsers (as its just a URL after all): const database_url = new URL('postgres://user:password@host:1234/database')
const cluster_url = URL.from({ ...database_url, pathname: '' })
const connection_pool = URL.from({ ...database_url, port: 12345 }) Currently that's possible via Object.assign + stringification, but that's a bit clumsy and not unintuitive. const database_url = new URL('postgres://user:password@host:1234/database')
const cluster_url = Object.assign(new URL(database_url+''), { pathname: '' })
const connection_pool = Object.assign(new URL(database_url+''), { port: 12345 }) I think these sorts of situations occur often and having a first party solution is important.
I completely agree with this. And at the same time, I think the existing specific URL object provides the answer. If a particular assignment or invocation is invalid the
A reasonable counter argument (as mentioned by @isiahmeadows) is that given that the My issue with relying on the module ecosystem to prove out the design is that this is exactly the sort of functionality people expect from a standard library, and would probably not install a module for this utility alone. Micro-dependencies are a bit of an anti-pattern both from a security and stability perspective. Installing a new module just for this functionality would be inadvisable. Well formed URL's are extremely important for security, if we can help developers rely on core functionality for anything to do with domains, we should. The position of WHATWG is well stated and so is the developer use case. This brings me to the following conclusion: it is probably safest to implement a standard using composition of an existing standard where all prior predicates and validations apply. Additional tests are reduced as the specification is deferring to the regular URL string parsing and setter implementations and specifications. Below is an example implementation demonstrating a logical implementation, completely ignoring performance of an actual implementation. URL.from = function from(object={}){
// so we're free to mutate
object = { ...object }
searchParams = { ...(object.searchParams || {}) }
// Allow receipt of a URLSearchParams object
if ( searchParams instanceof URLSearchParams ) {
searchParams = Object.fromEntries([ ...searchParams.entries() ])
}
delete object.searchParams
// try to establish a baseline URL object
// you could add to the list of minimal constructions if required
let u, err;
for( let f of [ () => new URL(object.protocol+object.host), () => new URL(object.href) ] ) {
try {
u = f()
break;
} catch (e) { err = e }
}
// if we couldn't establish a baseline URL object, we throw an encountered URL validation error
if (!u) {
throw new TypeError(err)
}
// Copy on other properties as Domenic suggested
// the benefit being, each property will pass through the URL object setter
// which relies on existing specified/documented/tested behaviour.
// Another benefit is, any new changes in functionality to those setters
// will automatically be adopted and consistent.
Object.assign(u, object)
// Do the same for search params
// Note .search has already been assigned which
// will automatically use the setter processing on `url.search`
for( let [k,v] of Object.entries(searchParams) ) {
u.searchParams.set(k,v)
}
// Return a complete url object
return u
}
Any validations built into the existing setters and constructors for |
I’m not really sure why you think it’s not intuitive, but I wanted to mention that you don’t need to make a coercion explicitly, as the - let url2 = Object.assign(new URL(url + ""), {pathname: "/about"})
+ let url2 = Object.assign(new URL(url), {pathname: "/about"}) In addition, URL objects are mutable, and you only need to actually create a new one if you need a copy of the original. - let url2 = Object.assign(new URL(url), {pathname: "/about"})
+ url.pathname = "/about" If you don’t want to mutate the original record, but don’t need to use it anymore, at least in that function (e.g. it’s a parameter that you don’t want to mutate), you can reassign it instead. - let url2 = Object.assign(new URL(url), {pathname: "/about"})
+ url = new URL(url)
+ url.pathname = "/about" Personally, I think the approach of using object spreading is a bit clumsy, as new properties could be added to URL records in the future, maybe just to access and assign existing URL data in a different way (e.g. #491). Also, I think it’s interesting to note that some databases (and/or their drivers) extend the grammar of URLs beyond what this specification recognizes and allows, see e.g. #398. I’m also not sure whether it’s actually feasible to turn URL record properties into own properties anymore, and even if it is, I don’t think it would be a good idea, as I think it would be kinda inconsistent. |
Another use-case would be to convert URLs from older standards. This is something I'm going to have to work out soon, and I don't expect it to be entirely trivial - each component would need to be validated and percent-encoded, the host would need to get parsed, the path simplified, etc. Basically, another URL parser. It would be nice if there was some common algorithm for it - or if the URL parser in the spec was refactored to split the logic which detects component boundaries from the logic which parses and encodes the component's contents. |
I much appreciate such an effort! Thank you. Have you seen my work and research on this? I have done the refactoring that you describe. |
The suggested approach does fail for certain properties like String(Object.assign(new URL("https://example.com/path"), {origin: "https://replace.com"}))
// Uncaught TypeError: Cannot set property origin of [object URL] which has only a getter So back to clumsy string replacement it is, at least for replacing |
Additionally, the spec forbids changing between “special” and “non-special” protocols, which makes “mutating an instance” particularly problematic. |
I'd love to have an api to make an URL from parts of an url, like you can do when calling a node
http.request
. Something like:Would be very helpful to allow people getting for example just the
hash
from there, or making conditional URLs without needing to concatenate strings (and accidentally adding a/
too little or too many).Hope that makes sense!
The text was updated successfully, but these errors were encountered: