Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite script/files loading to be be url based #1059

Merged

Conversation

mstoykov
Copy link
Collaborator

@mstoykov mstoykov commented Jun 26, 2019

This includes also change to the archive file structure. Now instead of
separating files by whether they are scripts or no, they are separated
based on whether their URI scheme.

Through the (majority) of k6 now instead of simple strings a url.URL is
used to identify files(scripts or otherwise).

This also means that all imports or open can have schemes.
Previously remote modules were supported specifically without scheme.
With this change we specifically prefer if they are with a scheme. The
old variant is supported but logs a warning.
Additionally if remote module requires a relative/absolute path that doesn't have a
scheme it is relative/absolute given the remote module url.

Because of some of the changes, now caching is done through a afero.Fs
instead of additional map. This also fixes not loаding remotely imported
files from an archive, but instead requesting them again.

fixes #1037, closes #838, fixes #887 and fixes #1051

This includes also change to the archive file structure. Now instead of
separating files by whether they are scripts or no, they are separated
based on whether their URI scheme.

Through the (majority) of k6 now instead of simple strings a url.URL is
used to idenitify files(scripts or otherwise).

This also means that all imports or `open` can have schemes.
Previously remote modules were supported specifically without scheme.
With this change we specifically prefer if they are with a scheme. The
old variant is supported but logs a warning.
Additionally if remote module requires a relative/absolute path that doesn't have a
scheme it is relative/absolute given the remote module url.

Because of some of the changes, now caching is done through a afero.Fs
instead of additional map. This also fixes not laoding remotely imported
files from an archive, but instead requesting them again.

fixes #1037, closes #838, fixes #887 and fixes #1051
@mstoykov mstoykov requested a review from na-- June 26, 2019 10:34
@mstoykov
Copy link
Collaborator Author

mstoykov commented Jun 26, 2019

Things that need discussion/more work, maybe:

  • Test reading old archives ( I have tested this manually at some point ...)
  • Remove _k6=1 get argument when requesitng remote urls ? Use a specific http.Client as well in the loader.fetch. Also just pass the url.URL instead of it's string representation
  • Maybe write an httpsfs that requests files from urls and combine it with UnionFs and MemMapFs ? This will be more magical than https://github.com/loadimpact/k6/blob/aa4b0907c44878f923d3ea77e29f13ffba5730a3/loader/loader.go#L124-L134 so maybe don' t do it ?
  • Whatever @na-- says
  • Possibly some hacks for cdnjs and archives ? I was thinking of recording each originalModuleSpecfier -> url.URL resolve in a map and writing this map to the archive. This way we would know what was resolved to what and in the case of cdnjs we won't make network calls to see what cdnjs.com/libraries/Faker is and get 2 major version upgrade without wanting one. This still means that we will have to do some cdnjs magic for old archives.
  • Old archives and cdnjs/github loading currently won't work but atleast for github we can just throw it through loader.Resolve. This will also work with cdnjs actually 🤔 but we might lie on what url we are loading as now we report the fully resolved url as well. So we will load the archive version but possibly report the latest version in cdnjs ...
  • While writing this last too I released that we might ... wrongly report that we load from urls when we actually load from the archive ... I don't know how this should work and whether it is actually a problem

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #1059 into master will decrease coverage by 0.19%.
The diff coverage is 76.31%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #1059     +/-   ##
=========================================
- Coverage   72.78%   72.58%   -0.2%     
=========================================
  Files         133      133             
  Lines        9884     9911     +27     
=========================================
  Hits         7194     7194             
- Misses       2272     2291     +19     
- Partials      418      426      +8
Impacted Files Coverage Δ
lib/models.go 94.52% <ø> (ø) ⬆️
cmd/run.go 9.18% <0%> (-0.04%) ⬇️
stats/cloud/collector.go 70.38% <100%> (ø) ⬆️
lib/archive.go 76.87% <70.51%> (-10.38%) ⬇️
js/initcontext.go 95.55% <80%> (-2.37%) ⬇️
js/bundle.go 81.56% <84%> (-0.53%) ⬇️
loader/loader.go 87.09% <86.66%> (-2.78%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a2fe2c...4b91966. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #1059 into master will increase coverage by 0.18%.
The diff coverage is 79.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1059      +/-   ##
==========================================
+ Coverage   72.79%   72.98%   +0.18%     
==========================================
  Files         133      138       +5     
  Lines        9890    10152     +262     
==========================================
+ Hits         7199     7409     +210     
- Misses       2276     2305      +29     
- Partials      415      438      +23
Impacted Files Coverage Δ
lib/models.go 94.52% <ø> (ø) ⬆️
cmd/archive.go 26.19% <0%> (ø) ⬆️
loader/filesystems.go 0% <0%> (ø)
cmd/inspect.go 11.62% <0%> (-1.2%) ⬇️
cmd/collectors.go 0% <0%> (ø) ⬆️
cmd/cloud.go 9.52% <0%> (ø) ⬆️
lib/fsext/cacheonread.go 100% <100%> (ø)
js/bundle.go 82.4% <100%> (+0.17%) ⬆️
stats/cloud/collector.go 70.38% <100%> (ø) ⬆️
js/runner.go 84.73% <100%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c854389...a9498eb. Read the comment docs.

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach for using url.URL structs to record the paths, but I'm starting to think that this may be an implementation detail that has been unnecessarily exposed too "high" in the k6 stack in this pull request... 😕

It seems to me that it makes more sense to treat the paths in the user-facing parts of k6 (k6 run <whatever>, import X from "<whatever>", require("whatever"), open("whatever"), etc.) as simple strings - they will just use exactly what the user passed. This would make those parts of k6 dumb to the actual mechanics of what happens under the hood, but will allow us to have sensible error messages to the user, displaying exactly the whatever string path they specified. All the while, the actual file loading code underneath can deal with the complexities in all of the many different execution scenarios.

Basically, if we have a Loader interface (for lack of a better term, since it would mostly replace the current one) with the methods LoadModule(string) (Something?, error) (for import / require()) and LoadFile(string) (io.ReadCloser, error) (or something else, like GetFS() - for open() and any future file APIs), the user-facing k6 parts should be happy. But underneath, two or more implementations of that interface (one for archives and one for the rest?) deal with the complexities of actually resolving and loading whatever was requested:

  • using url.URL everywhere, or only for some things?
  • using different layer cakes of afero.Fs (or whatever) implementations that deal with whatever is needed - caching, slash conversion for windows archives, etc.
  • resolving things properly - either via the afero FSes, or via the github/cdnjs shortcuts, etc.
  • ?? (I'm definitely missing something...)

lib/archive.go Outdated Show resolved Hide resolved
cmd/run.go Outdated Show resolved Hide resolved
js/bundle.go Outdated
// NewBundle creates a new bundle from a source file and a filesystem.
func NewBundle(src *lib.SourceData, fs afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) {
func NewBundle(src *lib.SourceData, fileFS afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this rename? Are we sure that this file system will always be a fileFS, not something else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my current reworking due to windows ... I now provide the whole map[string]afero.Fs, for reasons that will become apparent :)
But yeah the idea here was that this would be the fileFs that was used to load the original script from ... Obviously this doesn't work very well as

  1. We won't cache that one when we first load it in the cmd/run.go or elsewhere
  2. Won't work if we load (and again not cache it) if it was from a remote script ... aka https Fs :)

@@ -58,11 +62,11 @@ func TestNewBundle(t *testing.T) {
})
t.Run("Invalid", func(t *testing.T) {
_, err := getSimpleBundle("/script.js", "\x00")
assert.Contains(t, err.Error(), "SyntaxError: /script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n")
assert.Contains(t, err.Error(), "SyntaxError: file:///script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange and slightly wrong to me - why should the user see the file schema in the error messages if they didn't explicitly specify the schema themselves?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should print both the original thing that the user wrote and what we resolved it to as in loader.Load
I find the resolved final url to be even more useful as otherwise you don't actually know what we resolved whatever you provided us with

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the resolved final url to be even more useful as otherwise you don't actually know what we resolved whatever you provided us with

I don't understand what you mean by this, sorry

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have two scripts which have require("./scripts.js") but are in different directories and our message only tells you ./script.js ... had an error ... you will have trouble to find out which ./script.js it is. This gets even worse if you have a https import which imports ./script.js or the more likely ./utils.js .
But yeah I agree that having the actual name of what the user provided is a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have two scripts which have require("./scripts.js") but are in different directories and our message only tells you ./script.js ... had an error ... you will have trouble to find out which ./script.js it is.

In my head, the clear way to solve this is that both the originating file and the problem should be part of the log message, something like this:

/dir1/script1:line1234: cannot open "whatever the user wanted to open"
/dir2/script2:line3456: cannot open "whatever the user wanted to open"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, wrong resolve... in both cases, whatever the user wanted to open should be exactly the string the user specified though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... is this fixable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider the new one an improvement on the previous error. And what you propose to be a separate PR.

loader/loader.go Outdated Show resolved Hide resolved
loader/loader.go Outdated Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this improves and fixes a ton of things, but I'm honestly starting to get scared by the scope of it and by merging this before what is supposed to be a bugfix release...

cmd/archive.go Outdated Show resolved Hide resolved
cmd/cloud.go Outdated Show resolved Hide resolved
cmd/run.go Outdated Show resolved Hide resolved
cmd/cloud.go Outdated Show resolved Hide resolved
cmd/archive.go Outdated
fs := afero.NewOsFs()
src, err := readSource(filename, pwd, fs, os.Stdin)
filesystems := createFilesystems()
src, err := readSource(filename, pwd, filesystems, os.Stdin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as cmd/run.go... I think you can currently re-bundle archives, which is a somewhat nice feature, for example as a way to change their options: k6 archive -u 20 -d 2m oldArchiveWithOtherOptions.tar. So, passing the caching filesystem probably isn't for the best.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome .. I was intending on having this as feature that upgrades the archive to a new version :). This is not actually a problem because after the archive is read it's filesystems are used not the ones that are provided here. Those are used if the file is not an archive but a script

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm but then it means that we have a potentially multi-megabyte tar archive saved in memory in that cachedFS that won't be garbage-collected. Not a huge issue for k6 archive, unless the original tar is huge, but a minor issue for k6 run and k6 cloud

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it not be garbage-collected ? We don't save any reference to anything of the fs in ReadSource and newRunner completely ignores it if it's an archive. I don't see why it won't be collected

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the root object of that memory tree would be filesystems here (and in the other commands), which persists for the whole duration of the command execution. So, the tar archive will continue to be present there in its files, due to the fsext.CacheOnReadFs code, I think

lib/archive.go Show resolved Hide resolved
lib/archive.go Show resolved Hide resolved
lib/archive.go Outdated Show resolved Hide resolved
lib/fsext/unprependfs.go Outdated Show resolved Hide resolved
cmd/run.go Outdated Show resolved Hide resolved
@na-- na-- mentioned this pull request Jul 9, 2019
39 tasks
The error message is not perfect, but this will be a very strange case
either way. Mostly for coverage ;)
cmd/run.go Outdated Show resolved Hide resolved
cmd/run.go Outdated Show resolved Hide resolved
cmd/run.go Outdated Show resolved Hide resolved
loader/loader.go Outdated Show resolved Hide resolved
loader/loader.go Outdated Show resolved Hide resolved
cmd/run.go Outdated
} else {
srcLocalPath = filepath.Join(pwd, src)
}
srcLocalPath = filepath.Clean(afero.FilePathSeparator + srcLocalPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this seems strange if we're running on Windows - why is it ok to prepend the FilePathSeparator to an absolute path?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are stripping it in the TrimFilePathSeparatorFs ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right... This whole thing feels like it's way, way more complicated than it needs to be... Unfortunately, I don't have a very clear idea how to simplify it... 😞

Case in point, I very strongly think that code in cmd should have absolutely no idea that TrimFilePathSeparatorFs exists, much less trying to accommodate it in such a way... 😕 To have such a seemingly low-level implementation detail exposed here is another sign that we need to rethink the abstractions... Can you at least add a comment here, so other people (or us, in the future) aren't confused by this like I was?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why is this function and createFilesystems() here, instead of in loader/?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No better reason than: nobody moved it and this is the only place where it used ... so ...
IMO we(I) am going to refactor this at some point and possibly have an "object" that is basically the filesystems with methods such as Resolve and Load.
But given that it is only used here I am somewhat against moving it somewhere else just because this package should probably not have it ... at least for now.
I am adding some readSource tests though as I am sick of fixing and breaking things between windows and linux

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 for tests of this code... it would probably be easier to test if it's not in cmd though 😉 But even if you're leaving it here, please at least add a comment that explains this seemingly strange line...

Regarding any future refactoring, I think I sent you a proto-proposal on Slack? Basically, I think we should scrap the loader and just have an interface called Environment or an ExecutionEnvironment or something like that, with methods for import/require() and open(). And have 2-3 different implementations of that interface:

  1. Native / System / OS / whatever - initialized once with the current working directory (so we don't pass it around everywhere, append slashes, etc. strange things we currently do 😑 ), and then used to read (and cache) files from the local file system and actual network.
  2. Archive - initialized from archives, never does any actual network connections or touches the real FS.
  3. Test / Dummy - for simple tests, maybe just a Native environment with the OS FS swapped out with a in-memory one.

The rest of the code, especially in the higher-level parts in cmd/ and js/, should have absolutely no idea about any of the implementation details in these environments - Windows/Linux file slashes, remote HTTPS scripts or imports, etc. But all of that, or something better (since this probably isn't enough...) could wait until a future refactoring of the code...

Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any other immediate issues with this, besides the minor ones I noted above, but I'm not super happy with it... It fixes a ton of bugs and issues, hopefully without adding others, so we should merge it after a bit more testing, but there definitely seems to be enough work for a second more abstraction-oriented refactoring in the future...

The current architecture definitely feels off, the handling of URLs for sure seems painful in a lot of places, especially when it comes to the use of Opaque, which I didn't even know was a thing before this PR, and I doubt we use for its intended purpose... 😕 It seems like this PR adds a lot of complexity, and I'm hopeful that we could reduce it, or at get a lower ratio between the accidental/essential parts of it with another pass in the future.

lib/archive.go Show resolved Hide resolved
loader/loader.go Outdated Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Jul 17, 2019

CLA assistant check
All committers have signed the CLA.

@codecov-io
Copy link

codecov-io commented Jul 23, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@f8efe4f). Click here to learn what that means.
The diff coverage is 79.91%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1059   +/-   ##
=========================================
  Coverage          ?   72.98%           
=========================================
  Files             ?      138           
  Lines             ?    10153           
  Branches          ?        0           
=========================================
  Hits              ?     7410           
  Misses            ?     2305           
  Partials          ?      438
Impacted Files Coverage Δ
lib/models.go 94.52% <ø> (ø)
cmd/archive.go 26.19% <0%> (ø)
loader/filesystems.go 0% <0%> (ø)
cmd/inspect.go 11.62% <0%> (ø)
cmd/collectors.go 0% <0%> (ø)
cmd/cloud.go 9.52% <0%> (ø)
lib/fsext/cacheonread.go 100% <100%> (ø)
js/bundle.go 82.53% <100%> (ø)
stats/cloud/collector.go 70.38% <100%> (ø)
js/runner.go 84.73% <100%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8efe4f...20152f1. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants