Support SDK modularization on a service basis #1543

Open
ciarancourtney opened this issue Aug 31, 2018 · 13 comments
Labels
feature-request This issue requests a feature. needs-major-version Can only be considered for the next major release p3 This is a minor priority issue

Comments

@ciarancourtney

A common use case may be to use some boto functions in a client app frozen with PyInstaller; in this case botocore adds about 30MB to the install size.

Is there any plan to document how to trim the install size? For example:

  • Specify which services you are going to use via setuptools extras (extras_require)
  • Archive old service definitions outside the PyPI package
@joguSD
Contributor

joguSD commented Aug 31, 2018

As of right now we don't have any official guidance on slimming down the package size to remove unused services.

We wouldn't be able to do the second one as it's possible to specify that you want to use older API versions.

This is effectively a feature request to modularize the Python SDK in a similar fashion to some of the other SDKs (Java, Ruby, etc.). Marking as a feature request.
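
To illustrate why older API versions cannot simply be archived, here is a minimal Python sketch of version selection over the `botocore/data/<service>/<version>/` layout used by the scripts later in this thread. The function name `resolve_api_version` is hypothetical, and this is a simplified illustration, not botocore's actual loader code.

```python
from pathlib import Path
from typing import Optional


def resolve_api_version(data_dir: Path, service: str,
                        api_version: Optional[str] = None) -> str:
    """Pick which API version directory to load for a service.

    Simplified illustration only; not botocore's actual loader.
    """
    versions = sorted(p.name for p in (data_dir / service).iterdir() if p.is_dir())
    if not versions:
        raise LookupError(f"no API versions found for {service!r}")
    if api_version is None:
        # Version directories are date-named, so the last one sorted is the newest.
        return versions[-1]
    if api_version not in versions:
        # A caller pinned a version whose data is no longer on disk.
        raise LookupError(f"{service} {api_version} is not installed")
    return api_version
```

With no pinned version the newest directory wins, but a caller that pins an older version fails as soon as that directory has been removed from the package.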

@joguSD joguSD added the feature-request This issue requests a feature. label Aug 31, 2018
@joguSD joguSD changed the title Install size is quite large (~32MB) Support SDK modularization on a service basis Aug 31, 2018
@AntonOellerer

Any updates?
With the AWS Lambda unpacked size limit being 250 MB, botocore's 42 MB is really a lot.

@michaelbrewer

The AWS JavaScript SDK v3 has pulled this off. It would be great to have the same for Boto3, which will soon be unusable for AWS Lambda.

@heitorlessa

For those looking for a short-term solution: a customer wrote an article demonstrating how to selectively discard services you are sure you don't use.

https://blog.cubieserver.de/2020/building-a-minimal-boto3-lambda-layer/

@michaelbrewer

michaelbrewer commented Apr 21, 2021

Any hint or indication of when this will be worked on?

1.3M	boto3
 55M	botocore
696K	dateutil
148K	jmespath
552K	s3transfer
 36K	six.py
824K	urllib3

@heitorlessa

heitorlessa commented Jun 18, 2021

Following up as we're now at 64M - I understand this would be a huge undertaking considering how these are created today, so I'm primarily interested in hearing whether the team is considering a modularization in the future.

@mccauleyp

+1

@thejcannon

Copying my comment from #2842 (comment)

On further inspection it looks like about 70MB out of that 72.5MB is just the data/ directory. 🤯
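
To see which services account for most of the data/ directory, a small Python sketch like the following can help. The function names `size_by_service` and `largest_services` are made up for illustration, and the sketch assumes the `data/<service>/...` layout described in this thread.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def size_by_service(data_dir: Path) -> Dict[str, int]:
    """Return the total file size in bytes of each service directory."""
    sizes = {}
    for service_dir in data_dir.iterdir():
        if service_dir.is_dir():  # skip top-level files such as *.json
            sizes[service_dir.name] = sum(
                f.stat().st_size for f in service_dir.rglob("*") if f.is_file()
            )
    return sizes


def largest_services(data_dir: Path, top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` largest services as (name, bytes), biggest first."""
    ranked = sorted(size_by_service(data_dir).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Against a real install this could be called as `largest_services(Path(botocore.__file__).parent / "data")` after `import botocore`.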

I'm sure there are options here. One could be to split each service into an individual package (e.g. botocore-a-la-carte-s3, etc.). botocore-a-la-carte would contain the core code with an extra per service (e.g. botocore-a-la-carte[s3, cloudfront]), and to maintain backwards compatibility botocore would simply be botocore-a-la-carte[all].

@thejcannon

thejcannon commented Jan 4, 2023

From https://github.com/thejcannon/botocore-a-la-carte I've started publishing botocore-a-la-carte with an additional package per service provided as an extra on the main package.

E.g. botocore-a-la-carte just has the Python code and core resources. botocore-a-la-carte[s3] also installs the S3 data, etc.
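
As a rough sketch of the extras layout described above, the following Python builds an extras mapping with one per-service package per extra, plus an `all` extra. The helper `build_extras` is hypothetical, and the per-service naming follows the botocore-a-la-carte-s3 example earlier in this thread. How the actual project generates its metadata is not shown here.

```python
from typing import Dict, Iterable, List

# Distribution name taken from the comments in this thread.
BASE = "botocore-a-la-carte"


def build_extras(services: Iterable[str]) -> Dict[str, List[str]]:
    """Map each service to a one-package extra, plus an 'all' extra."""
    extras = {name: [f"{BASE}-{name}"] for name in sorted(set(services))}
    # 'all' pulls in every per-service package, mirroring the full botocore.
    extras["all"] = [pkg for pkgs in extras.values() for pkg in pkgs]
    return extras
```

A mapping of this shape is what setuptools accepts as `extras_require`, so `pip install "botocore-a-la-carte[s3]"` would pull in only the S3 data package.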

@takeda

takeda commented Aug 23, 2023

What about different versions of the API: are the old versions still needed?

@rafsaf

rafsaf commented Sep 11, 2023

$ du -hs *

1,2M    boto3
48K     boto3-1.28.44.dist-info
83M     botocore
220K    botocore-1.31.44.dist-info

83M! In a (not so unusual) venv, all ~30 libraries take 133M, so about 63% is botocore; only cryptography 41.0.3 comes even close, at 14M.

This is so, so much; Alpine Linux is a whole system with a required disk size of 130M.

And looking at how the size is evolving, we will probably see 100M in 2024 😞 😢

@takeda

takeda commented Sep 11, 2023

I used this myself:

    # Run from the site-packages directory that contains botocore/.

    # Build a find expression of the form: -path */a -o -path */b ...
    function join_elements {
      local prefix="-path */" separator=" -o "
      local prefixed=("${@/#/$prefix}")
      local rest=("${prefixed[@]:1}")
      local separated=$(printf "%s" "${prefixed[0]}${rest[@]/#/$separator}")
      echo "${separated[@]}"
    }

    # Within one service, delete every API version directory except the newest.
    function remove_but_latest {
      local latest=$(ls -1 "botocore/data/$1" | sort -r | head -1)
      find "botocore/data/$1" -mindepth 1 -maxdepth 1 -type d -not -path "*/${latest}" -exec rm -vr '{}' +
    }

    # Delete every service directory not listed, then trim the listed ones.
    function keep_components {
      local component find_params

      find_params=$(join_elements "$@")
      # noglob keeps the unquoted */name patterns from being expanded by the shell.
      set -o noglob
      find botocore/data -mindepth 1 -maxdepth 1 -type d -not \( $find_params \) -prune -exec rm -vr '{}' +
      set +o noglob
      for component in "$@"; do remove_but_latest "$component"; done
    }

    keep_components cloudformation dynamodb ec2 elbv2 ssm sso sts

It seems to work, and helped me reduce the Docker image of my application to 83MB. I'm not sure whether it can cause issues.
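
For anyone who prefers not to rely on shell quoting, roughly the same pruning can be sketched in Python. The function name `keep_services` and its `latest_only` flag are hypothetical, and, as with the shell version, this is an unofficial workaround that assumes the `botocore/data/<service>/<version>/` layout.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def keep_services(data_dir: Path, keep: Iterable[str],
                  latest_only: bool = True) -> List[Path]:
    """Delete service directories not named in `keep`; return removed paths.

    Top-level files in data_dir are left untouched.
    """
    keep_set = set(keep)
    removed = []
    for service_dir in sorted(data_dir.iterdir()):
        if not service_dir.is_dir():
            continue
        if service_dir.name not in keep_set:
            shutil.rmtree(service_dir)
            removed.append(service_dir)
            continue
        if latest_only:
            # Date-named version directories sort chronologically.
            versions = sorted(p for p in service_dir.iterdir() if p.is_dir())
            for old in versions[:-1]:
                shutil.rmtree(old)
                removed.append(old)
    return removed
```

Passing `latest_only=False` keeps every API version of the listed services, which matters if any client pins an older `api_version`.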

@rafsaf

rafsaf commented Sep 11, 2023

Thanks! That's a useful overview.

I ended up with a dead simple cp + rm only solution. I'm only using s3, and it seems that everything (or almost everything) ending in *.json is also used by the common code.

RUN mkdir /tmp/data \
    && cp -r /usr/local/lib/python3.11/site-packages/botocore/data/s3 /tmp/data/ \
    && cp -f /usr/local/lib/python3.11/site-packages/botocore/data/*.json /tmp/data \
    && rm -rf /usr/local/lib/python3.11/site-packages/botocore/data \
    && cp -r /tmp/data /usr/local/lib/python3.11/site-packages/botocore/ \
    && rm -rf /tmp/data

Seems to work just fine, probably until it doesn't.
So having official support would be a nice thing.

PS. The above code gives an 83M -> 5M drop.


10 participants