Add an example using Mountpoint with PyTorch #440
I'd like to start collecting a few examples of how to use Mountpoint in real workloads. This is the first one: using Mountpoint as a PyTorch data loader. The goal is mainly to show how to do it, and maybe say a little about how well it works. For now, this doesn't run in CI because it needs a GPU instance; I'll work on that later.
Signed-off-by: James Bornholt <bornholt@amazon.com>
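Because Mountpoint presents a bucket as a local file system, a data loader can read objects with ordinary file I/O and no S3 client code. The sketch below is a hypothetical map-style dataset, not this PR's code: the class name `MountpointImageFolder` and its `transform` parameter are illustrative. It lists every file under a mount directory once and reads one whole file per item. It deliberately avoids importing PyTorch so it runs standalone; PyTorch's `DataLoader` accepts any object implementing `__len__` and `__getitem__` as a map-style dataset.

```python
import os
from typing import Callable, List, Optional, Tuple


class MountpointImageFolder:
    """Hypothetical map-style dataset over all files under `root`.

    `root` is assumed to be a directory inside a Mountpoint mount, so plain
    open()/read() calls are turned into S3 requests by Mountpoint itself.
    """

    def __init__(self, root: str, transform: Optional[Callable[[bytes], object]] = None):
        self.root = root
        self.transform = transform
        # List once up front: on a mount, directory listing may be served by
        # S3 list requests, so repeating it per item would be wasteful.
        self.paths: List[str] = sorted(
            os.path.join(dirpath, name)
            for dirpath, _, names in os.walk(root)
            for name in names
        )

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[str, object]:
        path = self.paths[index]
        # One sequential read of the whole object per sample.
        with open(path, "rb") as f:
            data = f.read()
        sample = self.transform(data) if self.transform is not None else data
        return os.path.relpath(path, self.root), sample
```

Assuming a bucket mounted at a hypothetical `/mnt/s3` (for example with `mount-s3 <bucket> /mnt/s3`), something like `DataLoader(MountpointImageFolder("/mnt/s3/images", transform=decode), batch_size=32, num_workers=8)` would spread reads across worker processes; `decode` stands in for whatever turns raw bytes into a tensor.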
Thanks for adding this. It's really nice to see how we can use Mountpoint in real-world applications.
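```python
import tempfile
import boto3
import torchvision

s3 = boto3.client("s3")
ds = torchvision.datasets.FakeData(size=num_images, image_size=(3, 224, 224), num_classes=100)

with tempfile.TemporaryDirectory() as tempdir:
    ...  # excerpt: `num_images` and the rest of the example are defined elsewhere
```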
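The excerpt above builds a synthetic dataset with `FakeData` alongside a boto3 client and a temporary directory, which suggests samples are written locally before being uploaded. One common layout for this is tar shards (the format popularized by WebDataset). The helper below is a hypothetical sketch of such a packing step under that assumption, not the PR's code: `write_shards`, the `.img`/`.cls` member suffixes, and the `shard-NNNNN.tar` naming are all illustrative. It takes already-encoded image bytes so it runs without PyTorch.

```python
import io
import os
import tarfile
from typing import Iterable, List, Optional, Tuple


def write_shards(
    samples: Iterable[Tuple[bytes, int]],
    out_dir: str,
    samples_per_shard: int,
) -> List[str]:
    """Pack (encoded_image, label) pairs into tar shards and return their paths.

    Sample i becomes two members sharing a key: "<i>.img" (the image bytes)
    and "<i>.cls" (the label as ASCII text).
    """
    if samples_per_shard < 1:
        raise ValueError("samples_per_shard must be at least 1")
    paths: List[str] = []
    tar: Optional[tarfile.TarFile] = None
    try:
        for i, (image, label) in enumerate(samples):
            if i % samples_per_shard == 0:
                # Close the previous shard (if any) and start a new one.
                if tar is not None:
                    tar.close()
                path = os.path.join(out_dir, f"shard-{len(paths):05d}.tar")
                paths.append(path)
                tar = tarfile.open(path, "w")
            key = f"{i:08d}"
            for suffix, payload in ((".img", image), (".cls", str(label).encode())):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    finally:
        # Closing an already-closed TarFile is a no-op, so this is always safe.
        if tar is not None:
            tar.close()
    return paths
```

To feed it `FakeData`, each PIL image would first need encoding to bytes (for example `img.save(buf, format="PNG")` into an `io.BytesIO`), and each returned shard could then be uploaded with `s3.upload_file(path, bucket, key)` and read back through the mount.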
It would be nice to have benchmarks both with and without sharding. Sharding requires code changes, and it isn't always safe to assume a user's dataset is sharded.
Agreed! This is just a first example, but we're looking at non-sharded versions too.
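On the sharding question: with shards, each data-loader worker can stream whole tar files front to back, turning many small per-object reads into a few large sequential ones, which is the access pattern Mountpoint is generally tuned for. Without shards, every sample is its own object and its own open and read. The generator below is a hypothetical sketch of the sharded read path (`iter_shard_samples` and the `.img`/`.cls` pairing convention are assumptions, not this PR's code). It takes `worker_id` and `num_workers` explicitly; inside a PyTorch `IterableDataset`, those would typically come from `torch.utils.data.get_worker_info()`.

```python
import os
import tarfile
from typing import Dict, Iterator, List, Tuple


def iter_shard_samples(
    shard_paths: List[str], worker_id: int = 0, num_workers: int = 1
) -> Iterator[Tuple[bytes, int]]:
    """Yield (image_bytes, label) pairs from the shards assigned to one worker.

    Shards are split round-robin across workers, so each worker reads whole
    files sequentially. Samples missing either their .img or .cls member are skipped.
    """
    for shard in shard_paths[worker_id::num_workers]:
        pending: Dict[str, Dict[str, bytes]] = {}
        # Mode "r|" treats the tar as a forward-only stream (no seeking back).
        with tarfile.open(shard, mode="r|") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                key, _, suffix = member.name.rpartition(".")
                parts = pending.setdefault(key, {})
                parts[suffix] = tar.extractfile(member).read()
                # Emit a sample as soon as both of its members have been seen.
                if "img" in parts and "cls" in parts:
                    del pending[key]
                    yield parts["img"], int(parts["cls"].decode())
```

A non-sharded variant would instead open one object per sample, so comparing the two on the same data is one way to produce the with/without-sharding benchmark suggested above.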
Thanks, James. It's super nice to have this example be accessible with zero ML experience required.
I can imagine other examples we'd like to build and stick in this new `examples` directory:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).