Rewrite script/files loading to be be url based #1059

mstoykov · 2019-06-26T10:34:39Z

This also includes a change to the archive file structure. Instead of
separating files by whether they are scripts or not, they are now separated
by their URI scheme.

Throughout most of k6, a url.URL is now used instead of a simple string
to identify files (scripts or otherwise).

This also means that all imports and open() calls can have schemes.
Previously, remote modules were supported specifically without a scheme.
With this change we prefer that they have one. The
old variant is still supported but logs a warning.
Additionally, if a remote module requires a relative/absolute path that doesn't have a
scheme, the path is resolved relative to the remote module's URL.

Because of some of these changes, caching is now done through an afero.Fs
instead of an additional map. This also fixes remotely imported
files not being loaded from an archive but being requested again instead.

fixes #1037, closes #838, fixes #887 and fixes #1051
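
The rule for scheme-less paths inside remote modules can be illustrated with the standard library alone. This is a minimal sketch, not the code from this PR: `resolveImport` is a hypothetical helper and the example.com URLs are made up. It relies only on `url.Parse` and `URL.ResolveReference` from `net/url`.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveImport resolves a specifier against the URL of the module that
// imports it. Specifiers that carry their own scheme are returned as-is;
// relative and absolute paths inherit the importer's scheme and host.
// (Hypothetical helper, for illustration only.)
func resolveImport(importer, specifier string) (string, error) {
	base, err := url.Parse(importer)
	if err != nil {
		return "", fmt.Errorf("invalid importer URL %q: %w", importer, err)
	}
	ref, err := url.Parse(specifier)
	if err != nil {
		return "", fmt.Errorf("invalid specifier %q: %w", specifier, err)
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	importer := "https://example.com/lib/module.js"
	for _, specifier := range []string{
		"./utils.js",                  // relative to the remote module
		"/vendor/helper.js",           // absolute on the same remote host
		"file:///home/user/data.json", // has its own scheme
	} {
		resolved, err := resolveImport(importer, specifier)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", specifier, resolved)
	}
}
```

A relative specifier stays on the remote host, while a specifier with an explicit scheme is left untouched.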


mstoykov · 2019-06-26T10:47:10Z

Things that need discussion/more work, maybe:

Test reading old archives (I have tested this manually at some point ...)
Remove the _k6=1 GET argument when requesting remote urls? Use a specific http.Client in loader.fetch as well. Also just pass the url.URL instead of its string representation
Maybe write an httpsfs that requests files from urls and combine it with UnionFs and MemMapFs? This will be more magical than https://github.com/loadimpact/k6/blob/aa4b0907c44878f923d3ea77e29f13ffba5730a3/loader/loader.go#L124-L134 so maybe don't do it?
Whatever @na-- says
Possibly some hacks for cdnjs and archives? I was thinking of recording each originalModuleSpecifier -> url.URL resolution in a map and writing this map to the archive. This way we would know what was resolved to what, and in the case of cdnjs we won't make network calls to see what cdnjs.com/libraries/Faker is and get a two-major-version upgrade without wanting one. This still means that we will have to do some cdnjs magic for old archives.
Old archives and cdnjs/github loading currently won't work, but at least for github we can just throw it through loader.Resolve. This will also work with cdnjs actually 🤔 but we might lie about what url we are loading, as we now report the fully resolved url as well. So we will load the archive version but possibly report the latest version in cdnjs ...
While writing these last two I realized that we might ... wrongly report that we load from urls when we actually load from the archive ... I don't know how this should work and whether it is actually a problem
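
The idea of recording each specifier-to-URL resolution and writing it to the archive could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `resolutions` type, the JSON encoding, and the cdn.example.com URL are not taken from k6 or from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resolutions maps a module specifier, exactly as written in the script,
// to the URL it resolved to when the archive was created.
// (Hypothetical type; the real archive format may differ.)
type resolutions map[string]string

// encode serializes the map so it can be stored next to the archived files.
func encode(r resolutions) ([]byte, error) {
	return json.Marshal(r)
}

// decode restores the map when the archive is read back.
func decode(data []byte) (resolutions, error) {
	var r resolutions
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return r, nil
}

// lookup returns the recorded URL, so no network call is needed to find
// out what a shortcut specifier pointed to at archive time.
func (r resolutions) lookup(specifier string) (string, bool) {
	resolved, ok := r[specifier]
	return resolved, ok
}

func main() {
	recorded := resolutions{
		// Made-up target URL, standing in for whatever the shortcut resolved to.
		"cdnjs.com/libraries/Faker": "https://cdn.example.com/Faker/3.1.0/faker.min.js",
	}
	data, err := encode(recorded)
	if err != nil {
		panic(err)
	}
	restored, err := decode(data)
	if err != nil {
		panic(err)
	}
	if resolved, ok := restored.lookup("cdnjs.com/libraries/Faker"); ok {
		fmt.Println("archived resolution:", resolved)
	}
}
```

With such a map, running an archive would reuse the version that was resolved when the archive was made instead of silently picking up a newer one.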

codecov · 2019-06-27T08:36:46Z

Codecov Report

Merging #1059 into master will decrease coverage by 0.19%.
The diff coverage is 76.31%.

@@            Coverage Diff            @@
##           master    #1059     +/-   ##
=========================================
- Coverage   72.78%   72.58%   -0.2%     
=========================================
  Files         133      133             
  Lines        9884     9911     +27     
=========================================
  Hits         7194     7194             
- Misses       2272     2291     +19     
- Partials      418      426      +8

Impacted Files	Coverage Δ
lib/models.go	`94.52% <ø> (ø)`	⬆️
cmd/run.go	`9.18% <0%> (-0.04%)`	⬇️
stats/cloud/collector.go	`70.38% <100%> (ø)`	⬆️
lib/archive.go	`76.87% <70.51%> (-10.38%)`	⬇️
js/initcontext.go	`95.55% <80%> (-2.37%)`	⬇️
js/bundle.go	`81.56% <84%> (-0.53%)`	⬇️
loader/loader.go	`87.09% <86.66%> (-2.78%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a2fe2c...4b91966. Read the comment docs.

codecov · 2019-06-27T08:36:46Z

Codecov Report

Merging #1059 into master will increase coverage by 0.18%.
The diff coverage is 79.87%.

@@            Coverage Diff             @@
##           master    #1059      +/-   ##
==========================================
+ Coverage   72.79%   72.98%   +0.18%     
==========================================
  Files         133      138       +5     
  Lines        9890    10152     +262     
==========================================
+ Hits         7199     7409     +210     
- Misses       2276     2305      +29     
- Partials      415      438      +23

Impacted Files	Coverage Δ
lib/models.go	`94.52% <ø> (ø)`	⬆️
cmd/archive.go	`26.19% <0%> (ø)`	⬆️
loader/filesystems.go	`0% <0%> (ø)`
cmd/inspect.go	`11.62% <0%> (-1.2%)`	⬇️
cmd/collectors.go	`0% <0%> (ø)`	⬆️
cmd/cloud.go	`9.52% <0%> (ø)`	⬆️
lib/fsext/cacheonread.go	`100% <100%> (ø)`
js/bundle.go	`82.4% <100%> (+0.17%)`	⬆️
stats/cloud/collector.go	`70.38% <100%> (ø)`	⬆️
js/runner.go	`84.73% <100%> (ø)`	⬆️
... and 14 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c854389...a9498eb. Read the comment docs.

na--

I like the approach for using url.URL structs to record the paths, but I'm starting to think that this may be an implementation detail that has been unnecessarily exposed too "high" in the k6 stack in this pull request... 😕

It seems to me that it makes more sense to treat the paths in the user-facing parts of k6 (k6 run <whatever>, import X from "<whatever>", require("whatever"), open("whatever"), etc.) as simple strings - they will just use exactly what the user passed. This would make those parts of k6 dumb to the actual mechanics of what happens under the hood, but will allow us to have sensible error messages to the user, displaying exactly the whatever string path they specified. All the while, the actual file loading code underneath can deal with the complexities in all of the many different execution scenarios.

Basically, if we have a Loader interface (for lack of a better term, since it would mostly replace the current one) with the methods LoadModule(string) (Something?, error) (for import / require()) and LoadFile(string) (io.ReadCloser, error) (or something else, like GetFS() - for open() and any future file APIs), the user-facing k6 parts should be happy. But underneath, two or more implementations of that interface (one for archives and one for the rest?) deal with the complexities of actually resolving and loading whatever was requested:

using url.URL everywhere, or only for some things?
using different layer cakes of afero.Fs (or whatever) implementations that deal with whatever is needed - caching, slash conversion for windows archives, etc.
resolving things properly - either via the afero FSes, or via the github/cdnjs shortcuts, etc.
?? (I'm definitely missing something...)
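
A rough sketch of the Loader interface described in this comment follows. The comment leaves the return type of LoadModule open ("Something?"), so a plain string of source code is used here as a placeholder. `mapLoader` and `readFile` are hypothetical names for an in-memory implementation, meant only to show that callers pass through exactly the string the user wrote.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Loader mirrors the two methods named in the review comment. The string
// return type of LoadModule is a placeholder, not a decided design.
type Loader interface {
	LoadModule(specifier string) (string, error)
	LoadFile(path string) (io.ReadCloser, error)
}

// mapLoader is a hypothetical in-memory implementation keyed by the exact
// string the user wrote, so error messages can repeat that string verbatim.
type mapLoader struct {
	files map[string]string
}

func (l mapLoader) LoadModule(specifier string) (string, error) {
	src, ok := l.files[specifier]
	if !ok {
		return "", fmt.Errorf("cannot load module %q", specifier)
	}
	return src, nil
}

func (l mapLoader) LoadFile(path string) (io.ReadCloser, error) {
	data, ok := l.files[path]
	if !ok {
		return nil, fmt.Errorf("cannot open %q", path)
	}
	return io.NopCloser(strings.NewReader(data)), nil
}

// readFile shows how user-facing code would consume the interface without
// knowing which implementation sits underneath.
func readFile(l Loader, path string) (string, error) {
	rc, err := l.LoadFile(path)
	if err != nil {
		return "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	return string(data), err
}

func main() {
	var l Loader = mapLoader{files: map[string]string{
		"./utils.js": "export default 1;",
		"data.csv":   "a,b\n1,2\n",
	}}
	src, err := l.LoadModule("./utils.js")
	fmt.Println(src, err)
	_, err = l.LoadModule("./missing.js")
	fmt.Println(err)
	content, err := readFile(l, "data.csv")
	fmt.Printf("%q %v\n", content, err)
}
```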

na-- · 2019-07-02T07:57:57Z

js/bundle.go

 // NewBundle creates a new bundle from a source file and a filesystem.
-func NewBundle(src *lib.SourceData, fs afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) {
+func NewBundle(src *lib.SourceData, fileFS afero.Fs, rtOpts lib.RuntimeOptions) (*Bundle, error) {


Why this rename? Are we sure that this file system will always be a fileFS, not something else?

In my current reworking due to windows ... I now provide the whole map[string]afero.Fs, for reasons that will become apparent :)
But yeah, the idea here was that this would be the fileFs that was used to load the original script from ... Obviously this doesn't work very well, as

We won't cache that one when we first load it in cmd/run.go or elsewhere

It won't work (and again won't be cached) if the script was loaded from a remote location ... aka the https Fs :)

na-- · 2019-07-02T08:10:56Z

js/bundle_test.go

@@ -58,11 +62,11 @@ func TestNewBundle(t *testing.T) {
 	})
 	t.Run("Invalid", func(t *testing.T) {
 		_, err := getSimpleBundle("/script.js", "\x00")
-		assert.Contains(t, err.Error(), "SyntaxError: /script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n")
+		assert.Contains(t, err.Error(), "SyntaxError: file:///script.js: Unexpected character '\x00' (1:0)\n> 1 | \x00\n")


This seems strange and slightly wrong to me - why should the user see the file scheme in the error messages if they didn't explicitly specify the scheme themselves?

Maybe we should print both the original thing that the user wrote and what we resolved it to, as in loader.Load.
I find the resolved final url to be even more useful, as otherwise you don't actually know what we resolved whatever you provided us with to.

> I find the resolved final url to be even more useful as otherwise you don't actually know what we resolved whatever you provided us with

I don't understand what you mean by this, sorry

if you have two scripts which both have require("./script.js") but are in different directories, and our message only tells you that ./script.js ... had an error ... you will have trouble finding out which ./script.js it is. This gets even worse if you have an https import which imports ./script.js or, more likely, ./utils.js.
But yeah, I agree that having the actual name of what the user provided is a good idea.

> if you have two scripts which both have require("./script.js") but are in different directories, and our message only tells you that ./script.js ... had an error ... you will have trouble finding out which ./script.js it is.

In my head, the clear way to solve this is that both the originating file and the problem should be part of the log message, something like this:

/dir1/script1:line1234: cannot open "whatever the user wanted to open"
/dir2/script2:line3456: cannot open "whatever the user wanted to open"

oops, wrong resolve... in both cases, whatever the user wanted to open should be exactly the string the user specified though

so... is this fixable?

I would consider the new one an improvement on the previous error. And what you propose to be a separate PR.
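
One way to reconcile the two positions in this thread is an error value that carries the originating file, the string exactly as the user wrote it, and the resolved URL. The sketch below is hypothetical: `loadError` and `newLoadError` are not k6 types, and the message format is only one possible choice.

```go
package main

import (
	"errors"
	"fmt"
)

// loadError keeps all three pieces of information discussed above.
// (Hypothetical type, for illustration only.)
type loadError struct {
	Importer  string // file and line that issued the import/require/open
	Specifier string // exactly what the user wrote
	Resolved  string // what the specifier was resolved to
	Err       error  // underlying cause
}

func (e *loadError) Error() string {
	return fmt.Sprintf("%s: cannot open %q (resolved to %s): %v",
		e.Importer, e.Specifier, e.Resolved, e.Err)
}

// Unwrap lets callers inspect the underlying cause with errors.Is/As.
func (e *loadError) Unwrap() error { return e.Err }

func newLoadError(importer, specifier, resolved, cause string) error {
	return &loadError{
		Importer:  importer,
		Specifier: specifier,
		Resolved:  resolved,
		Err:       errors.New(cause),
	}
}

func main() {
	// Two scripts in different directories, both requiring "./script.js".
	fmt.Println(newLoadError("/dir1/script1.js:12", "./script.js",
		"file:///dir1/script.js", "file does not exist"))
	fmt.Println(newLoadError("/dir2/script2.js:34", "./script.js",
		"file:///dir2/script.js", "file does not exist"))
}
```

The user still sees the string they typed, and the resolved URL distinguishes the two otherwise identical messages.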

na--

I like how this improves and fixes a ton of things, but I'm honestly starting to get scared by the scope of it and by merging this before what is supposed to be a bugfix release...

na-- · 2019-07-05T05:17:13Z

cmd/archive.go

-		fs := afero.NewOsFs()
-		src, err := readSource(filename, pwd, fs, os.Stdin)
+		filesystems := createFilesystems()
+		src, err := readSource(filename, pwd, filesystems, os.Stdin)


Same as cmd/run.go... I think you can currently re-bundle archives, which is a somewhat nice feature, for example as a way to change their options: k6 archive -u 20 -d 2m oldArchiveWithOtherOptions.tar. So, passing the caching filesystem probably isn't for the best.

Awesome ... I was intending to have this as a feature that upgrades the archive to a new version :). This is not actually a problem, because after the archive is read its own filesystems are used, not the ones that are provided here. Those are only used if the file is not an archive but a script

Hmm but then it means that we have a potentially multi-megabyte tar archive saved in memory in that cachedFS that won't be garbage-collected. Not a huge issue for k6 archive, unless the original tar is huge, but a minor issue for k6 run and k6 cloud

Why would it not be garbage-collected ? We don't save any reference to anything of the fs in ReadSource and newRunner completely ignores it if it's an archive. I don't see why it won't be collected

Because the root object of that memory tree would be filesystems here (and in the other commands), which persists for the whole duration of the command execution. So, the tar archive will continue to be present there in its files, due to the fsext.CacheOnReadFs code, I think
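
The retention concern can be shown with a small standard-library sketch. This is not the fsext.CacheOnReadFs code from the PR: `cacheOnRead` is a hypothetical stand-in, built on `io/fs` and `testing/fstest`, that keeps every file it has read in a map.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// cacheOnRead is a hypothetical stand-in for a cache-on-read filesystem:
// the first read of a file goes to the base filesystem, later reads are
// served from the in-memory cache.
type cacheOnRead struct {
	base      fs.FS
	cache     map[string][]byte
	baseReads int // number of reads attempted against the base filesystem
}

// newCacheOnRead builds the wrapper over an in-memory base filesystem.
func newCacheOnRead(files map[string]string) *cacheOnRead {
	base := fstest.MapFS{}
	for name, content := range files {
		base[name] = &fstest.MapFile{Data: []byte(content)}
	}
	return &cacheOnRead{base: base, cache: map[string][]byte{}}
}

func (c *cacheOnRead) ReadFile(name string) ([]byte, error) {
	if data, ok := c.cache[name]; ok {
		return data, nil
	}
	c.baseReads++
	data, err := fs.ReadFile(c.base, name)
	if err != nil {
		return nil, err
	}
	// The cached bytes stay reachable for as long as c itself is reachable.
	c.cache[name] = data
	return data, nil
}

// cachedBytes reports how much file content the cache is holding on to.
func (c *cacheOnRead) cachedBytes() int {
	total := 0
	for _, data := range c.cache {
		total += len(data)
	}
	return total
}

func main() {
	filesystem := newCacheOnRead(map[string]string{"archive.tar": "0123456789"})
	for i := 0; i < 2; i++ {
		if _, err := filesystem.ReadFile("archive.tar"); err != nil {
			panic(err)
		}
	}
	fmt.Println("reads against base:", filesystem.baseReads)
	fmt.Println("bytes held by cache:", filesystem.cachedBytes())
}
```

As long as the wrapper is referenced for the whole command, the garbage collector cannot free the cached file contents, even if nothing reads them again.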

na-- · 2019-07-16T09:16:07Z

cmd/run.go

+	} else {
+		srcLocalPath = filepath.Join(pwd, src)
+	}
+	srcLocalPath = filepath.Clean(afero.FilePathSeparator + srcLocalPath)


hmm this seems strange if we're running on Windows - why is it ok to prepend the FilePathSeparator to an absolute path?

Because we are stripping it in the TrimFilePathSeparatorFs ;)
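
The interplay between the prepended separator and the stripping wrapper can be sketched as follows. This assumes the wrapper does nothing more than remove one leading separator, which the thread does not spell out. `addLeadingSeparator` and `trimLeadingSeparator` are hypothetical helpers, the separator is passed explicitly so the Windows case can be shown on any OS, and the filepath.Clean call from the diff is left out because its result depends on the OS.

```go
package main

import (
	"fmt"
	"strings"
)

// addLeadingSeparator mimics the prefixing done in the diff above, so that
// every path handed to the filesystem layer starts with a separator.
// (Hypothetical helper.)
func addLeadingSeparator(p, sep string) string {
	return sep + p
}

// trimLeadingSeparator mimics what a separator-trimming filesystem wrapper
// is assumed to do before calling the real OS filesystem.
// (Hypothetical helper.)
func trimLeadingSeparator(p, sep string) (string, error) {
	if !strings.HasPrefix(p, sep) {
		return "", fmt.Errorf("path %q does not start with %q", p, sep)
	}
	return strings.TrimPrefix(p, sep), nil
}

func main() {
	const sep = `\` // Windows separator, spelled out for the example
	original := `C:\Users\me\script.js`

	prefixed := addLeadingSeparator(original, sep)
	trimmed, err := trimLeadingSeparator(prefixed, sep)
	if err != nil {
		panic(err)
	}
	fmt.Println("stored as:  ", prefixed)
	fmt.Println("opened as:  ", trimmed)
}
```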

Ah, right... This whole thing feels like it's way, way more complicated than it needs to be... Unfortunately, I don't have a very clear idea how to simplify it... 😞

Case in point, I very strongly think that code in cmd should have absolutely no idea that TrimFilePathSeparatorFs exists, much less trying to accommodate it in such a way... 😕 To have such a seemingly low-level implementation detail exposed here is another sign that we need to rethink the abstractions... Can you at least add a comment here, so other people (or us, in the future) aren't confused by this like I was?

Actually, why is this function and createFilesystems() here, instead of in loader/?

No better reason than: nobody moved it, and this is the only place where it is used ... so ...
IMO we (I) are going to refactor this at some point and possibly have an "object" that is basically the filesystems, with methods such as Resolve and Load.
But given that it is only used here, I am somewhat against moving it somewhere else just because this package should probably not have it ... at least for now.
I am adding some readSource tests though, as I am sick of fixing and breaking things between windows and linux

👍 👍 for tests of this code... it would probably be easier to test if it's not in cmd though 😉 But even if you're leaving it here, please at least add a comment that explains this seemingly strange line...

Regarding any future refactoring, I think I sent you a proto-proposal on Slack? Basically, I think we should scrap the loader and just have an interface called Environment or an ExecutionEnvironment or something like that, with methods for import/require() and open(). And have 2-3 different implementations of that interface:

Native / System / OS / whatever - initialized once with the current working directory (so we don't pass it around everywhere, append slashes, etc. strange things we currently do 😑 ), and then used to read (and cache) files from the local file system and actual network.

Archive - initialized from archives, never does any actual network connections or touches the real FS.

Test / Dummy - for simple tests, maybe just a Native environment with the OS FS swapped out with an in-memory one.

The rest of the code, especially in the higher-level parts in cmd/ and js/, should have absolutely no idea about any of the implementation details in these environments - Windows/Linux file slashes, remote HTTPS scripts or imports, etc. But all of that, or something better (since this probably isn't enough...) could wait until a future refactoring of the code...
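
The proposal above might be sketched as follows, using only the standard library. All names here (`Environment`, `fsEnvironment`, `NewNative`, `NewArchive`, `NewTest`) are hypothetical, and the sketch ignores remote imports and caching entirely. It only shows how the three variants could share one interface while differing in the filesystem they are initialized with.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"strings"
	"testing/fstest"
)

// Environment is the interface the higher-level code would depend on.
type Environment interface {
	Import(specifier string) (string, error) // for import / require()
	Open(name string) ([]byte, error)        // for open()
}

// fsEnvironment backs an Environment with any fs.FS.
type fsEnvironment struct {
	kind  string
	files fs.FS
}

func (e fsEnvironment) read(name string) ([]byte, error) {
	// fs.FS expects unrooted, slash-separated paths; cleaning against "/"
	// also keeps ".." from escaping the root.
	clean := strings.TrimPrefix(path.Clean("/"+name), "/")
	data, err := fs.ReadFile(e.files, clean)
	if err != nil {
		return nil, fmt.Errorf("%s environment: cannot read %q: %w", e.kind, name, err)
	}
	return data, nil
}

func (e fsEnvironment) Import(specifier string) (string, error) {
	data, err := e.read(specifier)
	return string(data), err
}

func (e fsEnvironment) Open(name string) ([]byte, error) { return e.read(name) }

// NewNative is initialized once with the working directory.
func NewNative(pwd string) Environment {
	return fsEnvironment{kind: "native", files: os.DirFS(pwd)}
}

// NewArchive only ever reads what was stored in the archive.
func NewArchive(files fs.FS) Environment {
	return fsEnvironment{kind: "archive", files: files}
}

// NewTest swaps the OS filesystem for an in-memory one.
func NewTest(files map[string]string) Environment {
	mem := fstest.MapFS{}
	for name, content := range files {
		mem[name] = &fstest.MapFile{Data: []byte(content)}
	}
	return fsEnvironment{kind: "test", files: mem}
}

func main() {
	env := NewTest(map[string]string{"utils.js": "export const x = 1;"})
	src, err := env.Import("./utils.js")
	fmt.Println(src, err)
	_, err = env.Open("missing.csv")
	fmt.Println(err)
}
```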

na--

I can't find any other immediate issues with this, besides the minor ones I noted above, but I'm not super happy with it... It fixes a ton of bugs and issues, hopefully without adding others, so we should merge it after a bit more testing, but there definitely seems to be enough work for a second more abstraction-oriented refactoring in the future...

The current architecture definitely feels off, the handling of URLs for sure seems painful in a lot of places, especially when it comes to the use of Opaque, which I didn't even know was a thing before this PR, and I doubt we use for its intended purpose... 😕 It seems like this PR adds a lot of complexity, and I'm hopeful that we could reduce it, or at least get a lower ratio between the accidental/essential parts of it with another pass in the future.

CLAassistant · 2019-07-17T12:59:52Z

All committers have signed the CLA.

codecov-io · 2019-07-23T08:04:03Z

Codecov Report

❗ No coverage uploaded for pull request base (master@f8efe4f). Click here to learn what that means.
The diff coverage is 79.91%.

@@            Coverage Diff            @@
##             master    #1059   +/-   ##
=========================================
  Coverage          ?   72.98%           
=========================================
  Files             ?      138           
  Lines             ?    10153           
  Branches          ?        0           
=========================================
  Hits              ?     7410           
  Misses            ?     2305           
  Partials          ?      438

Impacted Files	Coverage Δ
lib/models.go	`94.52% <ø> (ø)`
cmd/archive.go	`26.19% <0%> (ø)`
loader/filesystems.go	`0% <0%> (ø)`
cmd/inspect.go	`11.62% <0%> (ø)`
cmd/collectors.go	`0% <0%> (ø)`
cmd/cloud.go	`9.52% <0%> (ø)`
lib/fsext/cacheonread.go	`100% <100%> (ø)`
js/bundle.go	`82.53% <100%> (ø)`
stats/cloud/collector.go	`70.38% <100%> (ø)`
js/runner.go	`84.73% <100%> (ø)`
... and 7 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8efe4f...20152f1. Read the comment docs.

mstoykov requested a review from na-- June 26, 2019 10:34

fixup! Rewrite script/files loading to be be url based

4b91966

mstoykov mentioned this pull request Jun 27, 2019

Refactor loading of modules - support remote modules with scheme #1046

Closed

na-- reviewed Jul 2, 2019

mstoykov added 4 commits July 4, 2019 15:03

fixup! Rewrite script/files loading to be be url based

a894e80

fixup! Rewrite script/files loading to be be url based

3843bc9

Merge remote-tracking branch 'origin/master' into refactor/RewriteaAr…

d27a5f0

…chiveLoadingAndSupportSchemesInFiles

Rename FSes to Filesystems

14e49b1

golangcibot reviewed Jul 4, 2019

mstoykov added 3 commits July 4, 2019 15:50

Fix lll error

2222235

Add tests for old archive support

53f10f5

Fix saving and loading archive filename and pwd

ef49dfd

na-- reviewed Jul 5, 2019

mstoykov added 10 commits July 5, 2019 10:04

Fix pwd sometimes missing it's slash at the end

3047dec

Fix not recognizing local files which are not starting with . or /

8add590

Fix lifting config files from the filesystems in the memmapfs and tha…

a484a31

…n archives

Add comment about UnprependPathFs

3e65e7e

Better names and abstration for UnprependPathFs

0832273

Refactor normalizedFs to be a ChangePathFs

56a4c37

rename changepathfs file

2fe2d63

Implement caching and reading of cache for github/cdnjs urls

622409c

Refactor the walk function in lib.archive a bit

ac764f3

Fix name of dumpMemMapFsToBuf

2e6bce1

na-- mentioned this pull request Jul 9, 2019

New executors #1007

Merged

39 tasks

mstoykov added 2 commits July 9, 2019 17:52

Add test for ChangePathFs and some fixes

edf5d79

Merge remote-tracking branch 'origin/master' into refactor/RewriteaAr…

c2e4417

…chiveLoadingAndSupportSchemesInFiles

mstoykov added 2 commits July 15, 2019 11:23

Add tests for bad filename/pwd in archives

e6b6ee2

The error message is not perfect, but this will be a very strange case either way. Mostly for coverage ;)

Add test for malformed metadata in archive

1ee2b16

na-- reviewed Jul 15, 2019

na-- reviewed Jul 15, 2019

s/travers/traverse

ab0bc8a

na-- reviewed Jul 15, 2019

mstoykov added 4 commits July 15, 2019 18:34

Fix relative and absolute paths on windows

55b90d0

loader.Resolve: Don't change pwd when it's missing slash on the end

97a658e

If a command gets an absolute path don't try it as relative

723c49e

Add test with funky paths to the archive

3b3b905

na-- reviewed Jul 16, 2019

mstoykov added 4 commits July 16, 2019 16:04

100% test coverage for cmd#readSource

67c6b13

Support running and archiving when giving scripts with stdin

fd5d380

Move cmd#readSource to loader#ReadSource

60f2295

Move cmd#createFilesystems to loader#CreateFilesystems

305bf81

na-- approved these changes Jul 17, 2019

typo

f57c992

mstoykov added 2 commits July 18, 2019 15:54

case insensitive anonymizaiton for windows paths

a9498eb

Add the os under which the archive was made in the archive

9249ba4

Merge remote-tracking branch 'origin/master' into refactor/RewriteaAr…

20152f1

…chiveLoadingAndSupportSchemesInFiles

mstoykov mentioned this pull request Jul 24, 2019

Refactor loading and archiving or scripts post #1059 #1089

Closed

na-- mentioned this pull request Jul 31, 2019

Release k6 v0.25.0 #1095

Merged

mstoykov merged commit ea8384f into master Jul 31, 2019

na-- deleted the refactor/RewriteaArchiveLoadingAndSupportSchemesInFiles branch November 12, 2019 11:34

na-- mentioned this pull request Nov 12, 2019

k6 run -o cloud does not work with scripts from shortcut URLs #1236

Closed

mstoykov mentioned this pull request Jan 21, 2021

open() seems to not always return the full data if that wasn't loaded in __VU =0 #1771

Closed

mstoykov mentioned this pull request Feb 17, 2021

Drop support for schemeless urls #1862

Closed

mstoykov mentioned this pull request Jun 4, 2021

Fix panic on missing files in cdnjs links #2047

Merged
