Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: merge staging to master #379

Merged
merged 138 commits into from
Jun 28, 2022
Merged

ci: merge staging to master #379

merged 138 commits into from
Jun 28, 2022

Conversation

MatrixAI-Bot
Copy link
Member

@MatrixAI-Bot MatrixAI-Bot commented Jun 6, 2022

This is an automatic PR generated by the pipeline CI/CD. This will be automatically fast-forward merged if successful.

@tegefaulkes @emmacasolin some todos left here:

  1. switch to staging as default branch - @CMCDragonkai
  2. fix up the build:mac job to evaluate the homebrew shell environment - @tegefaulkes
  3. use the new logger features to help debugging - @emmacasolin
  4. merge the nat testing PR (without testnet.polykey.io testing) - @emmacasolin
  5. [ ] test load balancing and fixing up coverage reports again in the CI/CD - @emmacasolin has already made some progress on this - to be done in Test load balancing - parallelize jest tests TypeScript-Demo-Lib#58 and Test load balancing using jest's --shard option TypeScript-Demo-Lib#65 (new PR to be created by @emmacasolin)
  6. fixing up the integration build job involving pkg.js, use the debug log and find out why pkg is failing to package PK - @tegefaulkes
  7. [ ] add in integration tests for windows/macos/ubuntu - to be done in Integration tests for deploy targets on all deployment platforms (docker, linux/ubuntu, windows, macos) Polykey-CLI#10
  8. benching and metrics (first get TS-Demo-Lib's benching to produce bench metrics that gitlab CI/CD understands and can visualise) - @emmacasolin
  9. ensure that integration:merge job works, so the bot is actually merging this staging into master - @tegefaulkes
    • fix integration:nix job. The shebangs for bin/polykey.js was not getting patched during the nix-build process. The fix was to make bin/polykey.ts and bin/polykey-agent.ts executable.
    • fix integration:docker. This was the same problem as above.
  10. [ ] deployment jobs - AWS - testnet deployment needs me involved - @tegefaulkes - to be done in Testnet Node Deployment (testnet.polykey.io) #194
  11. [ ] release jobs - check prerelease and release jobs are working by pushing up a tag with git push --follow-tags - @tegefaulkes - to be done in Verify that pre-release and release jobs work #390
  12. priority switches to concurrency testing once I finish up DBTransaction for SI - @CMCDragonkai

As long as we can deploy to testnet, then we want to merge this into master branch, so we can start new feature branches from the new features.

Once merged to master branch, it's time to do rename this to Polykey to match #5 (comment). Then to update the CI/CD mirror repository links and integrate codesee.

Some side-issues to be addresses if they help us achieve the primary goals above:

  1. Tracing context to help asynchronous and distributed debugging - Integrate Structured Logging js-logger#3 (we must have an adequate way of visualising the trace though for it to be useful)
  2. Using ts-node with swc to speed up our ts-node executions and remove the typescript node cache
  3. Using esbuild to workaround issues with using pkg as a bundler
  4. Using npm link on our native addons to enable debug builds when debugging
  5. And when will the SI be ready!?!?!?

@MatrixAI-Bot MatrixAI-Bot self-assigned this Jun 6, 2022
@MatrixAI-Bot
Copy link
Member Author

@CMCDragonkai
Copy link
Member

@emmacasolin You will need to merge #374 into this first. Then afterwards rebase your CI/CD changes on top.

There's more changes that depend on #374, so you'll need to sync up the utils.nix and shell.nix as well.

@CMCDragonkai
Copy link
Member

Also I haven't added all the branch and tag protection for this, but we can do this afterwards, this allows you force push to the staging branch while making it work. Afterwards staging will be protected so no force pushing allowed.

Tag releases can be tested later. I'd suggest disabling the auto merge into master job: integration:merge while you're still iterating.

The codesee bot won't work until it's been merged to master branch and I configure codesee service to use this repo.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 7, 2022

@emmacasolin please change the check:test job to continue using our child pipelines as we had it before. PK has lot of tests, so we still need to do a way of splitting our test runs. Until we have solved:

We will have to continue using child pipelines.

As for macos and windows runners, those ones will not be able to use child pipelines atm, they have to use 58 and 55. Since running the tests may run out of memory or take too long, you should just disable those tests for now as in comment out npm test -- --ci --coverage.

So merge #374 first, then rebase CI/CD integration work as a single commit on top, so that #378 can be merged on top too.

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@emmacasolin
Copy link
Contributor

Our postbuild script in package.json uses cp, however this doesn't work on windows so it needs to be replaced with something else.

"postbuild": "cp -fR src/proto dist; cp src/notifications/*.json dist/notifications/; cp src/claims/*.json dist/claims/; cp src/status/*.json dist/status/;",

@CMCDragonkai
Copy link
Member

You might need to replace the postbuild script with a script in scripts/postbuild.js. That way inside you can write a nodejs script that does the copy. This is the same reason why I switched to rimraf.

Actually I just found https://github.com/shelljs/shx. Perhaps we can swap over our commands to using shx instead?

@CMCDragonkai
Copy link
Member

It only needs to be a dev dependency, and then you should be able to do shx cp -fR. Note that the options might a bit different, see the shelljs project for more information.

Furthermore we would have to keep using ; as command separators.

The shx can then also potentially replace rimraf too.

@CMCDragonkai
Copy link
Member

Are child pipelines working again? It should be the same as before in master.

@emmacasolin
Copy link
Contributor

Are child pipelines working again? It should be the same as before in master.

Yeah they look like they are. I've taken tests out of all of the build stages for now and made it so that check:test runs on both feature and staging commits.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 7, 2022

The build:linux should still run the child pipeline again, it overrides check:test.

It's just we want the 2 incremental testing and test load balancing resolved so that we can apply the full tests for all platforms.

@CMCDragonkai
Copy link
Member

The build:linux does testing and building in one job. To avoid having to repeat the testing, this is why it runs and check:test doesn't run while in staging branch.

@emmacasolin
Copy link
Contributor

emmacasolin commented Jun 7, 2022

I know, but I don't think there's a nice way to have build:linux also run a child pipeline without a lot of repetition as both repeated code and repeated builds. Child pipelines also don't produce artifacts I'm pretty sure, so it wouldn't be able to build the dist.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 7, 2022

Oh yea building a distribution for each child test job would be unnecessary. In that case, once we have load balancing, we'd just have a separate job just to do building vs the load balanced test jobs.

Ok once you're think it can proceed, you should squash, then allow @tegefaulkes to rebase again, and then you can attempt the conditional testing or test load balancing. The issue has more details: MatrixAI/TypeScript-Demo-Lib#58

They both would involve getting into the guts of jest.

@MatrixAI-Bot
Copy link
Member Author

@CMCDragonkai
Copy link
Member

Do note that gitlab has a CI/CD linter. You can always check if your syntax is correct by going to https://gitlab.com/MatrixAI/open-source/js-polykey/-/ci/editor?branch_name=master.

Also instead of letting bad jobs run to completion, just force cancel them on the CI/CD control panel.

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@emmacasolin
Copy link
Contributor

I wasn't able to replace rimraf in the end due to true not working correctly on windows, however shx works for cp. Everything up until integration:builds is workig now, but integration:builds fails at the very end of nix-build with

> @matrixai/polykey@1.0.0 pkg
> ./scripts/pkg.js --no-dict=leveldown.js "--output=out" "--executable=polykey" "--node-version=16" "--platform=linux" "--arch=x64"
sh: line 1: ./scripts/pkg.js: Permission denied

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 7, 2022

I believe rimraf can just be replaced with shx rm -rf. No need to use || true. The -f means that || true is not needed.

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

@@ -50,6 +50,7 @@
"types": "dist/index.d.ts",
"pkg": {
"assets": [
"node_modules/jose/**/*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know why jose is needed. But you won't be able to tell until you actually run the final program. I recommend leaving it here until we verify why jose as an asset is required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any native addons or JSON files except for package.json in jose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will require a manual check until we have integration testing for the built binaries.

emmacasolin and others added 10 commits June 24, 2022 16:28
Now that nodes add the details of a node that contacts them, we no longer need the `edmSimple` configuration to do this manually.
General linting, using capitals for constants in NAT utils, and lowercase test descriptions
A relay node for a hole punch message was previously not modifying the proxy address in the message (which is the "return address" used to contact the source node). For nodes behind a NAT, who do not know their own public address, they rely on this overwriting so that nodes do not try to contact them on their private, inaccessible address.
This will allow us to disable the queue for testing.
the composed flag was set at the beginning of the compose function causing another function to throw an error due to an undefined property if it was called at the same time.
Tests for NAT-Traversal and Hole-Punching
@MatrixAI-Bot
Copy link
Member Author

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 24, 2022

Now that we have NAT testing merged, we focus on the top priority tasks in the PR description. I'll reference the long term tasks #381 (comment) to figure out after testnet is deployed.

Based on some discussion, there are some architectural issues we need to review from ground up. However in order to merge this there are some quick fixes that we need to apply in this PR.

  1. Private addresses should never be contacted. Agent 2 should not be contacting 10.0.0.2. You need to verify what the message is being sent from A2 to Seed, and then Seed to A1. A1 should never even know what the private IP of 10.0.0.2, and you should verify that the Seed should be rewriting the from address, or completely replace the from address.
  2. Punch handlers should be checking whether a proxy connection already exist. If they exist, then there shouldn't be an attempting to create a new connection.
  3. Refresh queue should be exhausted quickly in such a small network. Something is causing the refresh queue to loop infinitely. This should be prevented.

Long term issues (non-exhaustive) to be addressed in staging:

  1. Use wait/poll to define the state transition for these tests. We need to wait for the network to settle first before doing the next step. This is just for testing.
  2. Queue & cancellable promises need to be specced out to reduce the complexity of each step of this architecture.
  3. Signalling requests should be "fire and forget". Similar to 204 Accept code in HTTP. We shouldn't be waiting for the response when signalling.
  4. Promise.all and Promise.any Asynchronous Promise Cancellation with Cancellable Promises, AbortController and Generic Timer #297.

Either way we need to merge this to staging, and then work on the staging directly and divide and conquer each of these issues.

@tegefaulkes
Copy link
Contributor

Ok, so the problem with the application build is that the shebang for the polykey.js script is not getting patched. According to this blog post https://discourse.nixos.org/t/what-is-the-patchshebangs-command-in-nix-build-expressions/12656 This should happen automatically during the fixupPhase towards the end of the build. Notably this should only happen if the scripts are executable.

So the question is, why is this not getting patched? I've seen that the scripts in the scripts directory are getting patched. but this is happening before the main script within default.nix. In fact, I've confirmed that the final polykey.js script is being made executable that that seems to be happening at the final stage. no further output is being done after that step.

The application build is slightly different from the other builds. Where the other builds are using stdenv.mkDerivation to create the derevation. The application build is using runCommandNoCC. I don't know too much about the nix-build process But my running theory is that runCommandNoCC doesn't run patchShebangs during the fixupPhase.

I'm digging into it further.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 28, 2022

The solution is to make src/bin/polykey.ts and src/bin/polykey-agent.ts executable:

chmod u+x,g+x ./src/bin/polykey.ts
chmod u+x,g+x ./src/bin/polykey-agent.ts

Any files that have the shebang, would need to be made executable at the source-level in order to ensure that they get rewritten.

It's a bit roundabout, since one would expect the dist/bin files themselves to be rewritten, but it appears that doesn't happen anymore.

This is because due to nodev16 and the new npmv8, the node2nix doesn't understand how to generate the executables anymore.

So I worked around it, and generated the bin symlinks myself in the default.nix. It was a workaround talked about here: svanderburg/node2nix#293.

And I must have missed this, because TS demo lib already had their executable files made executable at source-level.

Remember tsc doesn't preserve executable permissions. So the reason why this is complicated is because there are several stages in which this rewrite could occur:

  1. At the .ts source level if the .ts file already has executable permissions - this is our solution right now
  2. If node2nix were working with npmv8, then it should generate executables properly and not require our hack in default.nix that I added in to deal with Node2nix 1.11.0 no longer builds the result/bin symlink, nor result/lib/node_modules/.bin symlinks svanderburg/node2nix#293.
  3. If npm when referencing the bin were to generate executable symlinks or something and nix could understand it, then it would work too. But this is not the case.

There's a related issue which is: microsoft/TypeScript#26060, and that's why we cannot just directly execute the dist/bin but have to do node dist/bin... etc.

@tegefaulkes
Copy link
Contributor

That fixed it. And I was thinking it was a problem with the nix build files... More fuel for disliking side effects I guess. I'll push up the fix in a sec.

@CMCDragonkai
Copy link
Member

Can you also expand the todo list above with these fixes.

Problem was to do with the `nix-build` process not patching the shebangs of the `polykey.js` script. this resulted in it failing to run due to not finding node in it's environment. The fix was to make `src/bin/polykey.js` and  `src/bin/polykey-agent.ts` executable.
@MatrixAI-Bot
Copy link
Member Author

@MatrixAI-Bot
Copy link
Member Author

Pipeline Succeeded on 574480988 for 0bc3af5

https://gitlab.com/MatrixAI/open-source/js-polykey/-/pipelines/574480988

@MatrixAI-Bot MatrixAI-Bot merged commit 0bc3af5 into master Jun 28, 2022
@tegefaulkes
Copy link
Contributor

So the integration:merge job works. Looks like all CI/CD jobs are passing now.

@tegefaulkes
Copy link
Contributor

I feel that fixing up the integration build job involving pkg.js is done but since it doesn't properly run under linux I'm not sure we can tick it off. If we require all packaged executables to properly run then it's blocked by the DB changes and the pkg integration tests. Otherwise we can just tick it off.

@tegefaulkes
Copy link
Contributor

deployment jobs - AWS - testnet deployment needs me involved would this be a part of #194 or just related? I suppose since this is a CI/CD job it would be a separate but related issue.

@CMCDragonkai
Copy link
Member

Yea that would be part of #194.

As for fixing up integration build job involving pkg.js, let's bundle that up with your integration testing work MatrixAI/Polykey-CLI#10. Ultimately it's part of a single pipeline of work. A separate part of it would be in MatrixAI/Polykey-CLI#23. So I'll tick that off.

benching and metrics (first get TS-Demo-Lib's benching to produce bench metrics that gitlab CI/CD understands and can visualise) - @emmacasolin

This does require a separate issue.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Jun 28, 2022

Now that this is merged, I'm going to do #5 (comment) tonight.

@emmacasolin @tegefaulkes beware of repo URL and repo name change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants