Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce test run time portion of build process (8:31 reduced to 3:51) #4708

Merged
merged 3 commits into from
Apr 21, 2024

Conversation

josh-works
Copy link
Contributor

@josh-works josh-works commented Apr 16, 2024

update: This is now ready to merge!

Doing a quick audit of the test harness, to see where time is being spent, and to try side-stepping some of it.

I'm doing lots of this work locally, but defer to using Github Actions timing data as the gold standard for if I'm successful or not.

I have some live, detailed, time-stamped stream-of-consciousness notes here, might be helpful to future me or others.

I am primarily investigating 'just' the tests, rather than tweaking the build pipeline at all. Faster/easier tests are so low-risk, high-reward.

Here's the test suite on main, right now, as I open this PR:

2024-04-16 at 10 24 AM

here's the detailed results on tests as being run on another PR: https://github.com/openstreetmap/openstreetmap-website/actions/runs/8708315212/job/23885377955?pr=4706

You can drill into the details, and see the tests take times like:

  • 7m 0s
  • 8m 0s
  • 7m 26s
  • 8m 13s
  • 8m 7s
  • 8m 31s

the finished results

the current PR, with logging turned off, stubbing out the password hash process in the user factory returns, in the same order, much improved results:

  • 3:50 (improvement of 4:10, 58% faster)
  • 3:42 (improvement of 4:20 , 73% faster)
  • 3:49 (improvement of 3:20 64% faster)
  • 4:12 (improvement of 4m, 52% faster)
  • 3:58 (improvement of 4:10, 56% faster)
  • 3:55 (improvement of 4:30, 62% faster)

change 1, turn off logging, minor

I expect 'turning off logging' to lead to a 6% decrease in test time. That would be each of these figures going down by about 30 seconds. (this is something Vladimir Dementyev said on a codewithjason podcast episode

change 2, eliminate 4000 calls to PasswordHash.create, meaningful

the next iteration of the PR, in addition to turning off logging, eliminates the PasswordHash.create call, on each the ~1000 users set up across each thread.

There's 4 threads, usually, 1000 users each, or 4000 total users. Doesn't seem like calling a single method 4000 times would be super expensive, but it seems to account for three or four total minutes of the 8 minute test run time. (Makes sense, I suppose, as that function is intended to be arbitrarily expensive). I just hard-coded the value to a valid string.

I ended up with this fix after noticing a LOT of time being spent in the user factory. detailed notes here

Current test run time is: 4:52 total (sometimes lower), down from 9:30, on my machine.

Questions? Any reasons to not merge this?

I cannot see the exact credit 'cost' of each build, I'd love to see that figure, if it's available, to also know how this change lowers the 'credits per build'.

@josh-works josh-works changed the title Improving test harness run time Reduce test harness run time Apr 16, 2024
@josh-works josh-works changed the title Reduce test harness run time Reduce test run time Apr 16, 2024
@josh-works
Copy link
Contributor Author

Sorta stream-of-consiousness style, I made another change, after following some idea-threads for a few minutes, possibly got a good win, the tests now look down by three minutes, from something like 7:50 to 4:50.

I'm interested in seeing it in the CI pipeline, of course. 👀

Speeds up tests rather nicely, derived via trial and error
reasonably confident that this is nice. overheard it mentioned by Vladimir
Dementyev in a Codewithjason podcast, so when my intuition already led me
to the user factory, I figured 'on principle/correctness, I should at least
look at what happens if we sidestep the native hashing library, because it's
something i was already curious about.
@josh-works josh-works changed the title Reduce test run time Reduce test run time, currently improvement: 8m20s to 5m0s Apr 17, 2024
@josh-works josh-works changed the title Reduce test run time, currently improvement: 8m20s to 5m0s Reduce test run time, current improvement: 8m20s to 5m0s Apr 17, 2024
@josh-works josh-works changed the title Reduce test run time, current improvement: 8m20s to 5m0s Reduce test run time, current improvement: 9.5m to 4.9m Apr 17, 2024
@gravitystorm
Copy link
Collaborator

am eager for another admin approval

Great work. Just as a heads-up, I don't get notifications when you edit comments, so I missed that you had pushed new changes and wanted CI to be re-run. Simply add a new comment (as you later did) and everything will be notified properly.

@tomhughes
Copy link
Member

Password hash creation is deliberately slow, to make brute forcing hard - it's a design goal of any modern password hashing algorithm.

@josh-works
Copy link
Contributor Author

Password hash creation is deliberately slow, to make brute forcing hard

absolutely. In a production environment, we'd want to do the slow and laborious hash creation, but for testing, we don't.

So, I hard-coded a generated hash (for the tests/factory data), so when setting up 4000 test users, the deliberately slow function is never called, and it saved a ton of time:

https://github.com/openstreetmap/openstreetmap-website/pull/4708/files#diff-3cb894babbd387f02bde8eb4daf7ddb6f0a5207d0562af589dc59a5653092c04R5

I'm still optimistic about a 90% improvement in test run time - I'm aiming for ~1.5 minutes, total. I don't know if I'll get there, but we're trying! :)

@josh-works
Copy link
Contributor Author

josh-works commented Apr 18, 2024

Okay this looks good to me. I'm not quite done with the profiling, but this change is good enough to roll out, it's small, and a delivers immediate improvements to everybody. Compare the before and after, added as a table here:

ubuntu/ruby before after improvement/time savings
20.0/3.0 9m38s 5m19s ~4m20s
20.0/3.1 9m:10s 5m3s 4m15s
20.0/3.2 9m19s 5m29s 4m40s
22.04/3.0 10m2s 6m2s 4m30s
22.04/3.1 9m 35s 5m38s 3m57s
22.04/3.2 9m39s 5m36s 4m3s

before:

after:

@josh-works
Copy link
Contributor Author

josh-works commented Apr 18, 2024

The PR is ready to go!

I've got some other ideas that I'm exploring, but these two changes seem appropriately low-risk and high-reward.

The test improvement speed is good enough. The total CI build time has now changed, too, but I'm not quite sure in exactly what ways.

For what its worth, I wonder if at some times of day the tests run faster than at other times of day. I found one PR where the tests 'ran fast' on first blush, with a successful in 5 m message visible - it's this one 8m docker/docker, 5m tests! 4533 assertions/second! That was way back on Feb 26.

Generally, the docker/docker (pull request) task has been posting as 'successful' in 12-14 minutes, with the attached tests taking ~9 minutes.

When my tests are taking 5 minutes, that same task shows completing in 10 minutes, so this might represent a total CI run time savings of 20-40% (12->10 or 14-> 10)

For each of the six server/ruby version combos, here's the direct speed improvements:

before after total improvement percent improvement
7m 0s 3:50 4:10 58% faster
8m 0s 3:42 4:18 73% faster
7m 26s 3:49 3:20 64% faster
8m 13s 4:12 4:01 52% faster
8m 7s 3:58 4:09 56% faster
8m 31s 3:55 4:36 62% faster

I'm not looking as directly at the github actions/build time changes, but it's looking like this might be a total time improvement of between 20 and 40% faster.

@josh-works josh-works changed the title Reduce test run time, current improvement: 9.5m to 4.9m Reduce test run time portion of build process (8:31 reduced to 3:51) Apr 18, 2024
@@ -2,7 +2,7 @@
factory :user do
sequence(:email) { |n| "user#{n}@example.com" }
sequence(:display_name) { |n| "User #{n}" }
pass_crypt { PasswordHash.create("test").first }
pass_crypt { "$argon2id$v=19$m=65536,t=3,p=4$M+oP6KQOdFLDeiYa/gbzAg$pULNwXgKt2sEmTxMXxt298skEqc8MjnPwGsk075jLrk" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way we could run PasswordHash.create once and then reuse the result, rather than actually hardcoding a single value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomhughes perhaps we were already on the exact same page. The hardcoded value I got was the result of running PasswordHash.create - I printed all the used values across 4000 users to the terminal, was seeing things like:

"$argon2id$v=19$m=65536,t=3,p=4$M+oP6KQOdFLDeiYa/gbzAg$pULNwXgKt2sEmTxMXxt298skEqc8MjnPwGsk075jLrk"
"$argon2id$v=19$m=65536,t=3,p=4$MWk3F1jq8OGs8m1KADSKzQ$97iChQ0pI7ZhdKWvK2zuALdl6vr1cko9YneceBZ0SRY"
"$argon2id$v=19$m=65536,t=3,p=4$I6joRKGjBBDFUkDVTGl3Hw$Mm7HBWpQiA8zw0y9DlGScJsAZWLEpEYP/vwyYOCQgXo"
"$argon2id$v=19$m=65536,t=3,p=4$3W1VtIzJn00BkJjIbi7L9Q$jSrBc7co66BsfTLINrDjPMwHfmdH2WtXRYvC8qw8b5A"
"$argon2id$v=19$m=65536,t=3,p=4$1QCD2+t4UuM1awxraePirQ$iQp7Rf1NskAbIAWzTv5NBdtiSR+JzuGwk2sJUQWbLc0"
"$argon2id$v=19$m=65536,t=3,p=4$Sm2M0daauln/R67Dc2NNCQ$zo5gMtLEGSUZy1SPsGc2rnlPsUPtW11w4SZHNuKloH4"
"$argon2id$v=19$m=65536,t=3,p=4$gHbAIYV/n1rLO+ICu61fAQ$n++ouqaI35kxerp6woJMjb3xJS+7ALUa6h7zM5eNNeM"

so, the given hardcoded value is the result of this function call, I'm sort of still inclined to leave it as is.

It feels a smidge safer/safe enough to put in the User factory a magic string, rather than hiding that there is the same password in use across all of the temporary users.

Perhaps there's a threat vector I'm not perceiving, happy to hear more, if something still feels off.

Copy link
Contributor Author

@josh-works josh-works Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yoooo I just found a clean way of doing exactly what you suggested:

https://github.com/josh-works/mastodon/blob/babbde5d6f6d3b3724ae71550ae7814d47f16ed0/spec/fabricators/account_fabricator.rb#L3-L11

I'll implement this here. It looks better than what I wrote, and better than what I was imagining based on what you said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, pushed the refactor. Hilariously I'd not seen this pattern in factories before, so it sort of slipped my mind as an option available to me. Thanks for the nudge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It wasn't really a question of threat vectors or security, just of trying to ensure test users had passwords that matched whatever the current hashing scheme was.

@josh-works
Copy link
Contributor Author

This is ready for re-review/workflows to run, I perceive. Thanks @tomhughes.

@tomhughes
Copy link
Member

Times on my dev box, before:

Finished in 96.928949s, 18.0441 runs/s, 10375.3730 assertions/s.

and after:

Finished in 41.567882s, 42.0758 runs/s, 24193.5346 assertions/s.

@tomhughes
Copy link
Member

Looks great to me, thanks a lot!

@tomhughes tomhughes merged commit 43db18a into openstreetmap:master Apr 21, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants