Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open() seems to not always return the full data if that wasn't loaded in __VU =0 #1771

Closed
mstoykov opened this issue Dec 15, 2020 · 10 comments · Fixed by #2314
Closed

open() seems to not always return the full data if that wasn't loaded in __VU =0 #1771

mstoykov opened this issue Dec 15, 2020 · 10 comments · Fixed by #2314
Assignees
Labels
bug evaluation needed proposal needs to be validated or tested before fully implementing it in k6

Comments

@mstoykov
Copy link
Contributor

The following code:

if (__VU >0) {
JSON.parse(open("./arr.json"));
}

Where arr.json is sufficiently big JSON seems to fail sometimes because not the whole JSON gets loaded.

This is, technically, a bad script, but maybe it should fail in another way that is more explanatory of what is happening or not at all

@mstoykov mstoykov added bug evaluation needed proposal needs to be validated or tested before fully implementing it in k6 labels Dec 15, 2020
@na--
Copy link
Member

na-- commented Jan 4, 2021

Can you share some more details - open() doesn't throw an exception, but instead truncates the result?

@mstoykov
Copy link
Contributor Author

mstoykov commented Jan 4, 2021

Yes it truncates it which made JSON.parse fail in my particular case which then threw an exception. It seemed like some kind of a race in the aferoFS, but I decided against looking into it too deeply at the time in favor of actually finishing some stuff before the holidays :)

@marnikvde
Copy link

marnikvde commented Jan 20, 2021

I have seen this as well on 0.29.0, running on Ubuntu WSL on Windows 10. I made a wrapper around open() that always prints the length of the string it read from the file, and every now and then, the size is 0. I've been trying to reproduce for hours, but I can't seem to find it.

This is what my wrapper looks like:

function TextFileToString(filename) {
  console.log(`Running TextFileToString() for file ${filename} by using K6's open() method for vu ${__VU}`);
  const fileContents = open(filename);
  const script = __ENV.scriptFilename ? __ENV.scriptFilename : "<unknown>";
  console.log(`script ${script} reads ${fileContents.length} chars from ${filename} for vu ${__VU}`);
  return fileContents;
}

I do a JSON.parse() on the result of that wrapper, and that fails of course on an empty string. This is what I have seen multiple times:

time="2021-01-20T17:15:32+01:00" level=info msg="script foo.js reads 0 chars from foo.json for vu 5" source=console
time="2021-01-20T17:15:32+01:00" level=info msg="script foo.js reads 99092 chars from foo.json for vu 8" source=console
time="2021-01-20T17:15:32+01:00" level=info msg="script foo.js reads 99092 chars from foo.json for vu 1" source=console
time="2021-01-20T17:15:32+01:00" level=info msg="script foo.js reads 99090 chars from foo.json for vu 7" source=console
time="2021-01-20T17:15:32+01:00" level=error msg="Engine error" hint="error while initializing VU #5: 'SyntaxError: EOF at loadClients (file:///home/marnik/bar.js:68:197(181))'"

So VU 5 seems to haven open() returning an empty string, while the other VU's are reading it just fine.

This is my code that I'm using to try and reproduce (so far, no success), which is basicly an example from k6.io with even less complexity:

import http from 'k6/http';
import { sleep, check } from 'k6';
import { Counter } from 'k6/metrics';

const filename = 'foo.json';
const data = open(filename);

if (data.length != 99090) {
  console.error(`VU ${__VU} read a length of ${data.length} for ${filename}`);
  throw new Error("findme");
}

export const requests = new Counter('http_reqs');

export const options = {
  stages: [
    { target: 1, duration: '5s' },
  ],
  thresholds: {
    requests: ['count < 100'],
  },
};

export default function () {
  const res = http.get('http://test.k6.io');

  sleep(1);
  
  const checkRes = check(res, {
    'status is 200': (r) => r.status === 200,
    'response body': (r) => r.body.indexOf('Feel free to browse') !== -1,
  });
}

And I'm trying to reproduce it like this:

while true; do timeout 2s k6 run --paused issue-open-test.js --vus 10 >> test.log 2>&1; done

But so far I have not cought the exception. I'm using --paused because the error happens in the init phase.

Any hints on how to reproduce?

@marnikvde
Copy link

Another interesting find (that is not documented): I implemented a retry for open() with an exponential backoff, so if open() returns a 0-length string, I print a warning and retry a few times untill I reach maxRetries or maxRetryTime. I can't reproduce the 0-length string on a non-empty file, so I created an empty file, started the script, waited for the warning, and within the +-10 retries in 1 minute, filled the file with valid JSON and saved it. My retry implementation never recovered, so I can only conclude that open() doesn't take file system changes into account. Could it be that k6 works on some kind of snapshot of the filesystem, and ignores changes after it has started? Or maybe there is some caching on open(), so that I can never make it recover by retrying?

@marnikvde
Copy link

It looks like my retry is working, so I'm suspecting it's indeed a k6 issue. This is the redacted output of my script:

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file1.json for vu 0" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 83259 chars from file2.json for vu 0" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 50558 chars from file3.json for vu 0" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 37538 chars from file4.json for vu 0" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 0" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 0" source=console
  execution: local
     script: script_example.js
     output: -

  scenarios: (100.00%) 10 scenarios, 10 max VUs, 5m30s max duration (incl. graceful stop):
           * BT101: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT101, gracefulStop: 30s)
           * BT102: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT102, gracefulStop: 30s)
           * BT111: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT111, gracefulStop: 30s)
           * BT113: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT113, gracefulStop: 30s)
           * BT115: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT115, gracefulStop: 30s)
           * BT116: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT116, gracefulStop: 30s)
           * BT118: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT118, gracefulStop: 30s)
           * BT119: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT119, gracefulStop: 30s)
           * BT120: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT120, gracefulStop: 30s)
           * BT121: 1 iterations shared among 1 VUs (maxDuration: 5m0s, exec: BT121, gracefulStop: 30s)

time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file7.json for vu 3" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file8.json for vu 4" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file9.json for vu 1" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file10.json for vu 7" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file11.json for vu 8" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 95992 chars from file12.json for vu 2" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 83684 chars from file13.json for vu 3" source=console
time="2021-01-20T22:08:22+01:00" level=info msg="script script_example.js reads 84520 chars from file14.json for vu 1" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 83844 chars from file15.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 83684 chars from file13.json for vu 4" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 84482 chars from file16.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file17.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file18.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 82909 chars from file19.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 83844 chars from file15.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50526 chars from file20.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50315 chars from file21.json for vu 3" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50524 chars from file22.json for vu 1" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50315 chars from file21.json for vu 4" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50277 chars from file23.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 85171 chars from file24.json for vu 8" source=console
time="2021-01-20T22:08:23+01:00" level=warning msg="(Re)try 1 read a string length of 0 from file file25.json, we suspect this is a bug in k6, let's try again after sleeping for 0.1s" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38052 chars from file26.json for vu 1" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50817 chars from file27.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 37528 chars from file28.json for vu 4" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 1" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 37670 chars from file29.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 1" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 4" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 4" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 6" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50558 chars from file3.json for vu 8" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50232 chars from file270.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 37575 chars from file30.json for vu 8" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 8" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38277 chars from file31.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 8" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 5" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file32.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file33.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 85076 chars from file34.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 84520 chars from file14.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50771 chars from file35.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50498 chars from file36.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 37575 chars from file30.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38052 chars from file26.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 9" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 10" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 3" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 3" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 3" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file37.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 85171 chars from file24.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50817 chars from file27.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 37888 chars from file38.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Request:\nGET <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug
time="2021-01-20T22:08:23+01:00" level=info msg="Response:\nHTTP/1.1 200 <redacted>" source=http-debug

running (0m00.8s), 00/10 VUs, 10 complete and 0 interrupted iterations
BT101 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT102 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT111 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT113 ✓ [ 100% ] 1 VUs  0m00.8s/5m0s  1/1 shared iters
BT115 ✓ [ 100% ] 1 VUs  0m00.3s/5m0s  1/1 shared iters
BT116 ✓ [ 100% ] 1 VUs  0m00.4s/5m0s  1/1 shared iters
BT118 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT119 ✓ [ 100% ] 1 VUs  0m00.3s/5m0s  1/1 shared iters
BT120 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT121 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 95992 chars from file39.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 84482 chars from file16.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 50315 chars from file21.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 16447 chars from file5.json for vu 0" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 2529 chars from file6.json for vu 0" source=console

running (0m00.9s), 00/10 VUs, 10 complete and 0 interrupted iterations
BT101 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT102 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT111 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT113 ✓ [ 100% ] 1 VUs  0m00.8s/5m0s  1/1 shared iters
BT115 ✓ [ 100% ] 1 VUs  0m00.3s/5m0s  1/1 shared iters
BT116 ✓ [ 100% ] 1 VUs  0m00.4s/5m0s  1/1 shared iters
BT118 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT119 ✓ [ 100% ] 1 VUs  0m00.3s/5m0s  1/1 shared iters
BT120 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters
BT121 ✓ [ 100% ] 1 VUs  0m00.2s/5m0s  1/1 shared iters

    BT101 <redacted>..........: 1     1.136029/s
    BT101 <redacted>..........: avg=211.287871 min=211.287871 med=211.287871 max=211.287871 p(90)=211.287871 p(95)=211.287871
    BT102 <redacted>..........: 1     1.136029/s
    BT102 <redacted>..........: avg=202.931066 min=202.931066 med=202.931066 max=202.931066 p(90)=202.931066 p(95)=202.931066
    BT111 <redacted>..........: 1     1.136029/s
    BT111 <redacted>..........: avg=227.336049 min=227.336049 med=227.336049 max=227.336049 p(90)=227.336049 p(95)=227.336049
    BT113 <redacted>..........: 1     1.136029/s
    BT113 <redacted>..........: avg=781.374261 min=781.374261 med=781.374261 max=781.374261 p(90)=781.374261 p(95)=781.374261
    BT115 <redacted>..........: 1     1.136029/s
    BT115 <redacted>..........: avg=327.251633 min=327.251633 med=327.251633 max=327.251633 p(90)=327.251633 p(95)=327.251633
    BT116 <redacted>..........: 1     1.136029/s
    BT116 <redacted>..........: avg=439.282352 min=439.282352 med=439.282352 max=439.282352 p(90)=439.282352 p(95)=439.282352
    BT118 <redacted>..........: 1     1.136029/s
    BT118 <redacted>..........: avg=234.718007 min=234.718007 med=234.718007 max=234.718007 p(90)=234.718007 p(95)=234.718007
    BT119 <redacted>..........: 1     1.136029/s
    BT119 <redacted>..........: avg=249.78494  min=249.78494  med=249.78494  max=249.78494  p(90)=249.78494  p(95)=249.78494 
    BT120 <redacted>..........: 1     1.136029/s
    BT120 <redacted>..........: avg=183.542755 min=183.542755 med=183.542755 max=183.542755 p(90)=183.542755 p(95)=183.542755
    BT121 <redacted>..........: 1     1.136029/s
    BT121 <redacted>..........: avg=225.698378 min=225.698378 med=225.698378 max=225.698378 p(90)=225.698378 p(95)=225.698378
    data_received.............: 95 kB 108 kB/s
    data_sent.................: 17 kB 20 kB/s
    http_req_blocked..........: avg=7.43ms     min=5.76ms     med=7.92ms     max=8.1ms      p(90)=8.07ms     p(95)=8.09ms    
    http_req_connecting.......: avg=410.45µs   min=229.68µs   med=395.25µs   max=658.79µs   p(90)=533.33µs   p(95)=596.06µs  
    http_req_duration.........: avg=308.32ms   min=183.54ms   med=231.02ms   max=781.37ms   p(90)=473.49ms   p(95)=627.43ms  
    http_req_receiving........: avg=373.86µs   min=176.08µs   med=293.52µs   max=1.14ms     p(90)=506.81µs   p(95)=824.5µs   
    http_req_sending..........: avg=54.1µs     min=24.89µs    med=53.84µs    max=97.05µs    p(90)=74.98µs    p(95)=86.02µs   
    http_req_tls_handshaking..: avg=6.44ms     min=5.33ms     med=6.4ms      max=7.77ms     p(90)=7.21ms     p(95)=7.49ms    
    http_req_waiting..........: avg=307.89ms   min=183.06ms   med=230.75ms   max=780.91ms   p(90)=472.38ms   p(95)=626.64ms  
    http_reqs.................: 10    11.360294/s
    iteration_duration........: avg=317.38ms   min=193.43ms   med=239.33ms   max=790.14ms   p(90)=482.5ms    p(95)=636.32ms  
    iterations................: 10    11.360294/s

The interesting part is what happened when reading file25.json, which is this (in order of log appearance):

time="2021-01-20T22:08:23+01:00" level=warning msg="(Re)try 1 read a string length of 0 from file file25.json, we suspect this is a bug in k6, let's try again after sleeping for 0.1s" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 7" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 2" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 3" source=console
time="2021-01-20T22:08:23+01:00" level=info msg="script script_example.js reads 38174 chars from file25.json for vu 0" source=console

The script chooses random files to read, so not all VU's read the same files. What I see here is that an open() of file25.json returned a 0-length string, but a retry "fixed" it. I'll check if I can add better logging to see in which VU these things are happening.

@marnikvde
Copy link

marnikvde commented Jan 20, 2021

Better logging says this:

time="2021-01-20T23:11:10+01:00" level=info msg="VU 7: Running TextFileToString() for file file1.json by using K6's open() method" source=console
time="2021-01-20T23:11:10+01:00" level=info msg="VU 3: Running TextFileToString() for file file1.json by using K6's open() method" source=console
time="2021-01-20T23:11:10+01:00" level=warning msg="VU 3: (Re)try 1 on file file1.json got a string length of 0, we suspect this is a bug in k6, let's try again after sleeping for 0.1s" source=console
time="2021-01-20T23:11:10+01:00" level=info msg="VU 7: script script1.js reads 50524 chars from file1.json (tries: 1)" source=console
time="2021-01-20T23:11:10+01:00" level=info msg="VU 3: script script1.js reads 50524 chars from file1.json (tries: 2)" source=console
time="2021-01-20T23:11:10+01:00" level=info msg="VU 10: Running TextFileToString() for file file1.json by using K6's open() method" source=console
time="2021-01-20T23:11:10+01:00" level=info msg="VU 10: script script1.js reads 50524 chars from file1.json (tries: 1)" source=console

So there are 3 VU's trying to read the same file:

  • VU 7 and 10 need 1 try to read 50524 characters from the file
  • VU 3 needs 2 tries, the first open() returns a 0-length string, the second try (after a 0.1s sleep()) works just fine

Assuming that the order of the log lines has meaning (?), could it be that VU 7 has not returned from open() yet when VU 3 tries an open() on the same file (that is locked by VU 7)? And that when VU 3 retries, it DOES work because VU 7 has completed the open() and has released the file?

@mstoykov
Copy link
Contributor Author

Hi @marnikvde ,

From my experiments, there is only a problem if I haven't opened the file when the __VU == 0 which is the first run that is done more or less specifically so we can find out what files will be needed. If you can reproduce this without such logic, please post a minimal example, so I can try to reproduce it, as I've understood the code you've posted does not reproduce it?!?

Could it be that k6 works on some kind of snapshot of the filesystem, and ignores changes after it has started? Or maybe there is some caching on open(), so that I can never make it recover by retrying?

Yes, k6 does cache opened/loaded files.

I would argue that consecutive VUs should not be able to open ones that weren't open in the 0-th VU. We have had problems with that for cloud users who have some logic that only loads files in some cases or randomly(as you've tried), in their case as we create the archive(tar) we only include the files loaded by the running as __VU==0 and then that archive is being executed on a separate machine(s). And as the idea of k6 archive (and k6 cloud) are to make it possible for a script to be executed with no(or with a minimal amount of) changes on another machine(s), I would argue that it is best if we just make it so only __VU==0 has access to the fs. This will also be required for the future distributed execution, and will likely be needed for anyone who is doing now using the execution segments.

The issue most definitely is because we open and save(cache) to a shared resource (both a map string to afero.fs and an afero.Fs implementation itself(which has an additional problem and we should probably move to io/fs when it gets stabilized). So there are possible race conditions for both the map and the afero.Fs (we are hitting the second one with this issue). Adding locking will not make the k6 archive/cloud experience closer to k6 run and will also ... add locking to opening and loading files which will likely not be noticeable in most cases(we don't let people open files in the scripts themselves so this is only during init time), but is still probably something we don't want.

For the record this has probably been the case forever, as even before my last refactoring practically everything I said above was the same, I just moved it around. I was under the (obviously) wrong impression that just nobody will even be able to load a file that wasn't loaded in __VU==0, which ... while "true" when running from archive (although it can still try, but will get an error/panic) is not true when just running the script.

@mstoykov
Copy link
Contributor Author

More specifically my proposal to fix this is:

  1. Run __VU==0, the one that runs first(we should have a term for this) as it is run now.
  2. remove the afero.OsFs from https://github.com/loadimpact/k6/blob/master/loader/filesystems.go#L37 for __VU!=0 (and for setup and teardown obviously)
  3. figure out how to return a very nice error that tells users not that the file "couldn't be found" but that it wasn't open()ed in the initialiazing VU + some link to the documentation about what that means and why it is forbidden and how to fix it.
  4. Optional: figure out how to return "couldn't be found" if the initializing VU did try to open it, but it wasn't found, but still handled, as that seems like a possible use case where someone will do something dependant on if a file is present. This might already work, I am not certain how MemMapFs works in this case

@marnikvde
Copy link

Thanks for that extensive comment, that's a lot of (new) information! :)

I can't seem to make a simple example to catch this race condition, I have tried small and big files, lots and few samples, a few VU's and thousands of VU's, to no avail :/

I understand why (by design) you only want files known to VU=0 to be used by other VU's, but the reason we implemented it like this was resource usage. When increasing the amount of VU's, we saw that loading "big" JSON files quickly gave us OOM exceptions. These big files typically contain random data, so our workaround for this is that just before running the tests, we split big.json into 10 chunks/samples like big-sample1.json to big-sample10.json. In the init code, we pick 1 out of the 10 sample files, which drastically decreases the resources we need per VU, and each VU just gets a random subset, which is 10% of big.json. For other files, we have even split them in 100 smaller files.

Our current workaround is to retry a few times (with exponential backoff) if the string length of open() is 0, which seems to work on the first retry after 0.1s, but I understand it goes against some design goals if we want to run this in the cloud.

We have not looked into 0.30.0 yet, but it seems to include the feature to share data between VU's, which might be suited to load big.json, making our sampling-strategy useless. We'll have to play around to see how it affects resource usage (mostly RAM was the issue I think).

I think your proposal makes sense, documenting this looks like the first step. Maybe start by documenting some of this in the open() docs, that loading different files per VU is not supported/recommended.

@mstoykov
Copy link
Contributor Author

Hi @marnikvde,

So you are basically doing the csv(it works with other types) example from https://k6.io/docs/examples/data-parameterization#handling-bigger-data-files ? As shown there you can still load them in __VU==0, although I guess the comment should also add that way it won't break randomly.

Yes the SharedArray (documentation pending ... by me 😭 ) practically removes the need for those hacks so I will recommend using it ;)

I think your proposal makes sense, documenting this looks like the first step. Maybe start by documenting some of this in the open() docs, that loading different files per VU is not supported/recommended.

This depends on the other k6 developers actually agreeing that we will do what I proposed and that this is ... the "correct" behavior :). Obviously, we can just fix it for k6 run locally, but I am against it given the arguments I've listed above.

olegbespalov added a commit that referenced this issue Jan 4, 2022
Implementing a limitation for the open() that limits opening files
by the list of the files that were opened during the initialization
step (__VU == 0).

For example, a code like:

```js
if (__VU >0) {
   JSON.parse(open("./arr.json"));
}
```

Should return an error.

Closes #1771
olegbespalov added a commit that referenced this issue Jan 11, 2022
Implementing a limitation for the open() that limits opening files
by the list of the files that were opened during the initialization
step (__VU == 0).

For example, a code like:

```js
if (__VU >0) {
   JSON.parse(open("./arr.json"));
}
```

Should return an error.

Closes #1771
olegbespalov added a commit that referenced this issue Jan 11, 2022
Implementing a limitation for the open() that limits opening files
by the list of the files that were opened during the initialization
step (__VU == 0).

For example, a code like:

```js
if (__VU >0) {
   JSON.parse(open("./arr.json"));
}
```

Should return an error.

Closes #1771
olegbespalov added a commit that referenced this issue Jan 12, 2022
Implementing a limitation for the open() that limits opening files
by the list of the files that were opened during the initialization
step (__VU == 0).

For example, a code like:

```js
if (__VU >0) {
   JSON.parse(open("./arr.json"));
}
```

Should return an error.

Closes #1771

Co-authored-by: Mihail Stoykov <MStoykov@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug evaluation needed proposal needs to be validated or tested before fully implementing it in k6
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants