Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrease installation size #2455

Closed
1 of 2 tasks
adamchainz opened this issue Oct 28, 2022 · 11 comments
Closed
1 of 2 tasks

Decrease installation size #2455

adamchainz opened this issue Oct 28, 2022 · 11 comments

Comments

@adamchainz
Copy link
Contributor

Is this feature request related to a new rule or cfn-lint capabilities?

No response

Describe the feature you'd like to request

I tried running cfn-lint with pre-commit on its CI service, pre-commit.ci. I got this error:

clone of https://github.com/aws-cloudformation/cfn-lint@v0.66.1 exceeds tier max size 100MiB: 142.1MiB

Oh no.

Describe the solution you'd like

I’d like the repo to be smaller, similar to the motivations in #2368.

The src/cfnlint/data/CloudSpecs directory is huge, as I think you're aware. Gzipping it takes it from 141MB to 8.5MB. The decompress time on read is probably worth it. Also most users will only need the files from a couple of regions.

This will also save developer machines' disk space, where there may be several cfn-lint versions installed, between several projects.

Additional context

I suggested similar to botocore: boto/botocore#2365

Is this something that you'd be interested in working on?

  • 👋 I may be able to implement this feature request

Would this feature include a breaking change?

  • ⚠️ This feature might incur a breaking change
@kddejong
Copy link
Contributor

kddejong commented Oct 28, 2022

I agree completely on the decreasing the size have a little concern with my troubleshooting from making this change but the advantage to the masses outweighs my concerns here.

This change is related to the provider schemas. Since each resource is a file we only store unique files. The specs aren't loaded into that PR because it would make too big of change but the result is dramatic.
#2325

I should add there is so much more to change that this PR is WIP and probably will be for a little while longer. If there is any good wins we can have from this proposal we should explore them.

adamchainz added a commit to adamchainz/cfn-lint that referenced this issue Oct 28, 2022
adamchainz added a commit to ev-energy/cfn-lint that referenced this issue Oct 28, 2022
@adamchainz
Copy link
Contributor Author

Okay, thanks for the update. To workaround the issue, I made a fork with many of the specs that my project doesn't need removed: ev-energy@aa45e3a

@PatMyron
Copy link
Contributor

PatMyron commented Oct 28, 2022

https://www.debugbear.com/json-size-analyzer

us-east-1 (largest):
3.6MB PropertyTypes 63.38%
1.7MB ResourceTypes 30.00%
381KB ValueTypes 6.57%

me-central-1 (smallest):
799KB PropertyTypes 61.13%
438KB ResourceTypes 33.52%
67KB ValueTypes 5.10%


Was thinking the extra sections cfn-lint added like ValueTypes/IntrinsicTypes/ParameterTypes aren't usually regional and don't all need to be duplicated (instance types are currently the only difference), but also looks like they're only ~6% of size, so probably not worth prioritizing over just trying compression

@kddejong
Copy link
Contributor

The one concern I ran into is the availability of zlib which is used for doing compression otherwise the zip file will just be ~144MBs. It looks like depending on how you install Python this package may or may not be available. I'm not sure how widespread this would be but wanted to call it out.

@kddejong
Copy link
Contributor

Tried an approach where we store us-east-1 and only patches for the other regions. The slowdown was too high... which is what I was expecting but wanted to validate. This slowdown could be mitigated by a more intelligent process to load specs.

@adamchainz
Copy link
Contributor Author

It looks like depending on how you install Python this package may or may not be available.

I don't think that's a concern. Pip unconditionally imports zlib in several places: https://github.com/pypa/pip/search?q=zlib .

I also found this bug where it's discussed whether to try not using zlib in 'ensurepip', the module that installs pip, and the possibility was ignored there too.

@kddejong
Copy link
Contributor

By breaking a part of the spec into individual files and storing only differences from us-east-1 I've been able to reduce the size on disk ~27MBs. The part I need to investigate more is that the I've also been able to reduce the time for integration tests by ~20s (currently on my mac they take about ~40s). Almost sounds good to be true.

@kddejong
Copy link
Contributor

Good find @adamchainz. I'm between the zlib approach and the other approach I just mentioned or maybe a hybrid of two. In comparison my zipfile/zlib tests from above increased the time only marginally +2s over all the integration tests.

The other thing that we would benefit from is a smarter loading of resource specs as we load them all even though they aren't all used.

@kddejong
Copy link
Contributor

I think right now I like the change in #2457. 18MBs on disk for CloudSpecs and reduced my pylint total runs from ~76s to ~36s. To compare integration test finished in ~15s

@kddejong
Copy link
Contributor

kddejong commented Nov 7, 2022

I'm going to mark this one as complete for now. There are probably more efficiencies to be had but v0.71.0 reduced install package size from 144MBs to about 25MBs

@kddejong kddejong closed this as completed Nov 7, 2022
@adamchainz
Copy link
Contributor Author

Nice work, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants