Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI comparisons and resources.ts #93368

Closed
1 of 4 tasks
jrieken opened this issue Mar 25, 2020 · 35 comments
Closed
1 of 4 tasks

URI comparisons and resources.ts #93368

jrieken opened this issue Mar 25, 2020 · 35 comments
Assignees
Labels
debt Code quality issues uri
Milestone

Comments

@jrieken
Copy link
Member

jrieken commented Mar 25, 2020

This is the continuation of: 7d4cd4a#r37940098.

Background: We can classify uris into two categories

  1. Those that have posix-paths, like file:///foo/bar.bazz or vscode-remote://bing+bong/boo/far/fazz, and
  2. those with paths without special/known semantics, like untitled:unitiled-1 or command:deleteLeft

Uris of both categories exist in our system, generally uris that represent files are posix style, e.g all file-uris and all uris from contributed file systems. However, uris for virtual documents, command uris, and others don't fall into this category.

The URI class knows a little about this and for instance enforces that posix style uri paths start with a slash, e.g URI.parse('file:foo').toString() === 'file:///foo' whereas that "fixing" doesn't happen for arbitrary schemes (only file, http, https and arguably we should add vscode-remote here)

Now, the following problems exist:

  1. We have the resources.ts-utility which offers many convenience functions when working with uris, e.g ways to compare them. But these utilities are all written with the assumption that it works on uris with posix-paths, e.g it states that command:/deleteLeft equals command:deleteLeft (which breaks compatibility with uri.toString-usages). To make the confusion perfect it doesn't check, enforce, nor document that it work only works on a subset of uris, e.g. blind adoptions like these dc0ab50 made a perfect mess
  2. We have alternative, also fishy, ways to compare uris of which I know ResourceMap<T> and the widespread use of map.set(uri.toString(), 42)

I think we need to first decide if

  1. the resources.ts utility should only work on uris with posix-paths. That means we need at least documentation, better enforcement, and we need to review or revert @mjbvz's "adoption".
  2. alternatively we make the resources.ts utility be fit for all uris, esp with respect to uri comparisons. That means we need to know when posix rules apply and when not.

I am strongly suggesting to use variant 2 because anything else will be confusing.

As a next step we need to use the same URI-equivalence checks, e.g it's bad that the renderer uses a different identity for documents then the extension host. The ResourcesMap<T> also uses a different identity than for instance users of getComparisonKey. We should align that and ideally have them all root getComparsionKey and/or isEqual. So, this is what I think needs to be done

  • use getComparisonKey in ResourceMap<T> @bpasero
  • use ResourceMap whenever storing something by URI
  • use the same logic in TernarySearchTree@forPaths which actually is forUris @jrieken
  • define what makes uris equal, e.g. scheme, authority are case insensitive, path is case sensitive or not (depends on the OS, scheme), query is case sensitive, fragment is ignored. Also don't massage paths during compare but during creation of uris. @aeschli
@jrieken jrieken added debt Code quality issues uri labels Mar 25, 2020
@bpasero
Copy link
Member

bpasero commented Mar 25, 2020

I guess to really see the extend of usage we need to hunt down every single call to URI.toString(), possibly even path and fsPath to catch cases where ResourceMap should be used instead.

Changing to use the getComparisonKey seems easy, but we should maybe do this in debt week?

Q: the JSDoc of getComparisonKey says: URI queries are included, fragments are ignored, but it seems to return a URI without query (resource.with({ authority: resource.authority.toLowerCase(), path: path, fragment: null }))?

@jrieken
Copy link
Member Author

jrieken commented Mar 25, 2020

Yes - debt week. And careful changes

No - with keeps everything as-is when omitted, e.g URI.parse('http://google.com/?q=foo').with({ path: '/bar', authority: 'bing'}).toString() === 'http://bing/bar?q=foo'

@bpasero
Copy link
Member

bpasero commented Mar 25, 2020

Oh yeah, got it.

jrieken added a commit that referenced this issue Apr 8, 2020
… tree so that it will be fit for case-in-sensitive compare, #93368
jrieken added a commit that referenced this issue May 5, 2020
jrieken added a commit that referenced this issue May 5, 2020
@jrieken
Copy link
Member Author

jrieken commented May 5, 2020

@aeschli and I were wondering why all those comparisons and on-the-fly normalisations are done at all. My assumption has always been that we normalise when "uris enter the system", e.g when you call open editor for file:///c:/LOWER.txt and there is already file:///c:/lower.txt we return that instead. @bpasero you might know more about this?

@bpasero
Copy link
Member

bpasero commented May 5, 2020

@jrieken @aeschli I remember I suggested that as a possible cheap solution in 2017 to #12448 by simply letting all my file editor world use case-insensitive comparators and got push back from a couple of people in the team so I reverted this again (see #12448 (comment)). See also #12448 (comment) for a more recent reply I did on that matter.

One of the scenarios that break once we go down this path is:

  • you open a file on the "correct" path
  • an extension produces markers for the same file but "incorrect" path
  • you click on a marker from the problems view
  • we show the file as opened on the "correct" path
  • the editor does NOT show any markers because they will only appear if the resource matches (assuming that the editor is comparing URIs by strict equalness)

So, we either consistently compare URIs everywhere and ignore casing on certain OS or we do not. But cannot have either or.

@jrieken
Copy link
Member Author

jrieken commented May 5, 2020

we show the file as opened on the "correct" path

Is that "make correct" logic re-usable? We could just apply it to markers as well

@bpasero
Copy link
Member

bpasero commented May 5, 2020

Well that was basically using ResourceMap or similar helpers that will lowercase the key on Windows and macOS. It boils down to using these helpers in all places for comparing URIs I guess. But we will not fix issues such as links or aliases for paths.

@jrieken
Copy link
Member Author

jrieken commented May 5, 2020

Oh, so the "make correct" logic doesn't exist and today two editors would open, right?

@bpasero
Copy link
Member

bpasero commented May 6, 2020

@jrieken that's right, and one of the reasons for that is this line where we do URI.toString() === URI.toString():

return otherInput instanceof FileEditorInput && otherInput.resource.toString() === this.resource.toString();

On top of that, the place where all text file models are stored is using ResourceMap (here) and would also need to be aware of path casing (as any other place that uses ResourceMap probably).

As I said before, any place where we compare URIs (using toString or fsPath) needs to go through some utility for comparison that would take the case sensitivity into account. Best would be some method in IFileService or a service on top that would know exactly what the file system provider supports. Unfortunately in many cases we may not have access to services in some places where we compare today.

@aeschli
Copy link
Contributor

aeschli commented May 6, 2020

Best would be some method in IFileService or a service on top that would know exactly what the file system provider supports.

I think that's the right direction. IFileSystemProviders should offer resource APIs related to comparison and resolving (resolve, join, normalize) resource URIs. The IFileService would offer the same APIs and forward to the right IFileSystemProvider.
We'll have to migrate resources-API users towards it, which will be a lot of work and cause a lot of ripples, but we should prioritize the places where it matters the most (explorer, models, editors).

@jrieken
Copy link
Member Author

jrieken commented May 6, 2020

or a service on top that would know exactly what the file system provider supports.

I like that a lot. I would opt for a dedicated service, like the IUriTruthService or so. The good news wrt IFileSystemProvider is that we spec'd that they must use posix logic and that they already declare if they are case-sensitive or not. So, we can just use the their information but don't need to delegate to them.

I think an important piece is that such a service is non-destructive. I mean the first occurrence of a path, e.g. file:///c:/LOWER.txt, defines it and subsequent case-variants should use the first occurrence. Sample: the editor is already called LOWER.txt and therefore the markers panel should also use that name.

which will be a lot of work and cause a lot of ripples, but we should prioritize the places where it matters the most (explorer, models, editors).

I am more optimistic and I'd say the opening editors, text models, and markers are 90% here

@aeschli
Copy link
Contributor

aeschli commented May 6, 2020

IMO we should not just tackle the resource equality, but also the operations (join/normalize/..). There we need to know about fs roots, rules for normalizing separators (ok to remove trailing/duplicates, which ones are required), what to do with queries and fragments.
Give that IFileSystemProvider can come from remote, always going over the wire to ask a IFileSystemProvider to compare/manipulate a URI might not be a good idea. So maybe we need to simplify this and come up with some classes of file systems: case insensitive: yes/now, drive letter/UNC support: yes/no, posix path normalization: yes /no, keep queries, keep fragments...

@bpasero
Copy link
Member

bpasero commented May 6, 2020

I think that's the right direction. IFileSystemProviders should offer resource APIs related to comparison and resolving (resolve, join, normalize) resource URIs. The IFileService would offer the same APIs and forward to the right IFileSystemProvider.

I am not sure that is needed on the level of the provider, I think the provider should declare its behaviour in some format (like we declare the label format) and then some service can take that and work with that information. I think I would prefer a IURIService over adding methods to IFileService to keep things separate.

So, we can just use the their information but don't need to delegate to them.

Yes

I think an important piece is that such a service is non-destructive. I mean the first occurrence of a path, e.g. file:///c:/LOWER.txt, defines it and subsequent case-variants should use the first occurrence. Sample: the editor is already called LOWER.txt and therefore the markers panel should also use that name.

Not sure we can get that to work well across processes. We could end up with different forms in different processes unless we had a central authority for URIs, which of course would mean things have to be async.

Give that IFileSystemProvider can come from remote, always going over the wire to ask a IFileSystemProvider to compare/manipulate a URI might not be a good idea. So maybe we need to simplify this and come up with some classes of file systems: case insensitive: yes/now, drive letter/UNC support: yes/no, posix path normalization: yes /no, keep queries, keep fragments...

Yes, definitely needs to be sync. Challenge though is that resolving the remote is async and as such you could not properly construct or compare URIs until the remote is resolved.

Btw we do have a IPathService (formerly IRemotePathService) that is already there to deal with some aspects of remote file systems and environments. We could probably take some inspiration from that service.

@bpasero
Copy link
Member

bpasero commented May 29, 2020

The editor service change is in #98795 but as I said, I would think we need to merge the adoption for text model resolvers as well.

@bpasero
Copy link
Member

bpasero commented May 30, 2020

@jrieken please take a look at #98861 for testing, I tried to be verbose so that people understand the impact. Complexity 4 because of having to write extension code for some of the testing.

@bpasero
Copy link
Member

bpasero commented May 30, 2020

Already found a first somewhat regression: renaming a file to same name but different case no longer reflects in the editor:

recording

Arguably this is the same issue you will get if an extension introduces a "wrong" case canonical URI and you click on the file in the explorer.

@jrieken
Copy link
Member Author

jrieken commented Jun 2, 2020

renaming a file to same name but different case no longer reflects in the editor:

That's a good find. We should maybe not use the canonical uri for display purposes or find a way to re-set a canonical uris, tho that would mean we change the uri of a text model, right?

@bpasero
Copy link
Member

bpasero commented Jun 2, 2020

@jrieken ideally we go with a solution that does not involve touching the model. I actually prefer the way it is now because reusing the same model has a lot of advantages (undo/redo, view state, etc).

I think it would be relatively easy to do this for editors, but we may have many places in our system where we derive a label simply from its URI and nothing else. In those cases we have lost the information about the original URI potentially.

@jrieken jrieken modified the milestones: May 2020, June 2020 Jun 3, 2020
@bpasero
Copy link
Member

bpasero commented Jun 4, 2020

Listing a few things that I think should happen in June that I noticed while adopting the URI identity. I can open separate issues but we can also leave it here.

ITextModelResolverService using canonical resources

I think we just did not do this yet, but should do it early in June. I had to add some code to the editor service that makes it NOT use canonical URIs in case it gets a request to open a URI for which a model exists to ensure you can always open a model in the editor:

let canonicalResource: URI;
if (this.modelService?.getModel(resource)) {
// TODO@Ben remove this check once canonical URIs are adopted in ITextModelResolerService
canonicalResource = resource;
} else {
canonicalResource = this.uriIdentityService.asCanonicalUri(resource);
}

There is just a lot of paths where a URI might end up to create a model, even beyond the openTextDocument API. E.g. if you do "Peek Definition" we open an embedded editor with a URI that I think we get from extensions. If you start to edit that document inside, we would not be able to open it if it is opened with another casing.

Drop our biased checks or replace with IURIIdentityService

I think that our biased checks for isEqual and friends are very dangerous because I think the default we assume is not good for remote scenarios. The vscode-remote scheme will in 99% of the cases be Linux (e.g. WSL), but we ignore casing for any non-file-scheme. At the very minimum we should add a check for vscode-remote and maybe handle that specially. For example, after we resolved the remote agent, we know exactly what target OS we are dealing with and could then do the correct thing.

Nevertheless, adopting IURIIdentityService would solve this better.

ResourceMap aware of file system capabilities

I am having at least one case (write queue in file service) where I would benefit from a ResourceMap that is aware of a filesystems case sensitivity. It would be great to have a utility that would provide this.

@jrieken
Copy link
Member Author

jrieken commented Jun 4, 2020

ITextModelResolverService using canonical resources

Makes sense, but lets do it "careful"

Drop our biased checks or replace with IURIIdentityService

Yeah, my plan is to drop exturiBiasedIgnorePathCase and use extUri for the exports, like isEqual etc. Once we have adopted canonical uris in enough places we should be safe to make this change

ResourceMap aware of file system capabilities

Agreed - we should have an overload constructor that allows to pass in a "get-key" function or a comparator (assuming we trust the SkipList). Like new ResourceMap(uriIdentService.extUri.getComparaitionKey.bind(...))

@bpasero
Copy link
Member

bpasero commented Jun 4, 2020

One thing I should maybe look into is to decouple the URI usage in editors for label purpose and add explicit labels that file editors use.

@bpasero
Copy link
Member

bpasero commented Jun 8, 2020

6a0b359 adds support for passing in a URI as label for file inputs that should be used. This allows to rename files in the explorer and seeing the label get updated even though the underlying resource is still the same.

jrieken added a commit that referenced this issue Jun 9, 2020
This allows to use the ResourceMap with case insensitive path logic etc
@jrieken
Copy link
Member Author

jrieken commented Jun 9, 2020

@bpasero fyi - I have pushed a change that allows to use the ResourceMap with a custom toKey-function. The correct one to use can be retrieved from IUriIdentityService#extUri.

@jrieken
Copy link
Member Author

jrieken commented Jun 11, 2020

Done!

The default uri identity utils are now unbiased and compatible with toString-usage. There is still two biased utils which 1 or 2 users only and (hopefully) very clear comments. Last, there is IUriIdentityService for the "truth" and that's adopted in the critical places. Closing

@jrieken jrieken closed this as completed Jun 11, 2020
sandy081 added a commit that referenced this issue Jun 11, 2020
@sandy081
Copy link
Member

Removed one usage of biased util in fileUserDataProvider with extUri created using the provider's capabilities.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
debt Code quality issues uri
Projects
None yet
Development

No branches or pull requests

5 participants