Releases: NVIDIA/aistore

3.4

28 Mar 22:34

Highlights

  • Kubernetes Operator: a separate repository;
  • Cluster life-cycle management: maintenance (node), decommission and shutdown (node and cluster);
  • CLI: implement (category, verb, subject) auto-completions; numerous improvements;
  • Global Cluster Configuration - versioned, replicated, protected; per-node capability to override global defaults;
  • System Metadata (5 types): unified CCS formatting, instance and meta-versioning, backward compatibility;
  • Authentication and ACLs: add users and roles, docs, tests, bucket and cluster management, CLI;
  • ETL: AIS transport streams, GCP and minikube, CI and Jenkins, stress tests (*);
  • PDU-based intra-cluster transport, streaming objects of unknown size;
  • Performance: memory pooling, in-memory metadata;
  • HDFS backend - in total, supporting 6 different backends;
  • DNS names - supporting hostnames and/or IPv4 in public and intra-cluster networks;
  • Resilver: resume upon reboot, run on a selected node;
  • Cloud: improve AWS versioning, improve error handling;
  • Erasure Coding: improved stability for Cloud buckets; handle low memory and OOM;
  • Distributed shuffle (dSort): improve stability, optimize memory and CPU usage;
  • All subsystems and core: productization, stabilization, bug fixing, refactoring across the board
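
For readers unfamiliar with the idea behind PDU-based transport for objects of unknown size: when the sender cannot announce a total length up front, the payload can be cut into length-prefixed units and closed with an end marker. The sketch below is a generic illustration only. The 4-byte big-endian header, the zero-length end marker, and the names write_pdus and read_pdus are all assumptions, not the AIS wire format.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator

# Assumed layout: each PDU is a 4-byte big-endian length followed by the payload.
_HDR = struct.Struct(">I")


def write_pdus(chunks: Iterable[bytes], out: BinaryIO) -> None:
    """Frame chunks of an object whose total size is not known in advance."""
    for chunk in chunks:
        if chunk:  # a zero length is reserved for the end-of-object marker
            out.write(_HDR.pack(len(chunk)))
            out.write(chunk)
    out.write(_HDR.pack(0))  # end-of-object marker


def read_pdus(src: BinaryIO) -> Iterator[bytes]:
    """Yield payloads until the end-of-object marker is seen."""
    while True:
        hdr = src.read(_HDR.size)
        if len(hdr) != _HDR.size:
            raise EOFError("stream ended before the end-of-object marker")
        (length,) = _HDR.unpack(hdr)
        if length == 0:
            return
        payload = src.read(length)
        if len(payload) != length:
            raise EOFError("truncated PDU payload")
        yield payload
```

The receiver never needs the object size: it simply reads units until the marker arrives, and a missing marker signals a broken transfer.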

Core

  • unify bucket(name, provider, namespace) transfer - part one - !3340
  • general: refactor and simplify cold Get - !3345
  • keepalive: do not decrease timeout on connection error - !3444
  • (new) metadata write policy - !3513, !3596
  • LOM minification - !3521
  • fs/cluster: generalize fs.GetContentFQN with interface - !3548
  • cluster: remove some redundancy in CT and LOM structures - !3549
  • cmn: return ReadOpenCloser on ReadOpenCloser.Open - !3562
  • memsys: remove unused SliceReader - !3563
  • LOM: remove one field - !3570
  • LOM: enforce rlock - !3615
  • support (DNS) hostnames for the public and intra-cluster networks - !3616
  • fix passing origURLBck in bucket init - !3720
  • LOM In Flight (LIF) - !3765
  • mem-pool LOM - !3722
  • mem-pool GET, PUT, and COPY runtime - !3825
  • mem-pool LOM (part two) - !3839
  • mem-pool LOM (part three - rebalance) - !3878
  • copy & transform flows: more mem-pooling and refactoring - !3879
  • extend core structs to support possible within-meta-version extensions - !4239
  • on-disk meta-versioning and backward compat (part one) - !4023
  • meta-versioning and backward compatibility (part two) - !4037
  • meta-versioning and backward compatibility (part three) - !4061
  • meta-versioning and backward compatibility (part four) - !4075
  • refactor http-common - !3639, !3641, !3649, !3656, !3660, !3788, !3793, !3816
  • transport: remove roff - !3661
  • transport: pool obj-reader struct - !3662
  • transport: always execute send-completion logic - !3936
  • transport: use interface{} instead of unsafe.Pointer as callback argument - !3953
  • minor refactor of proxy election method - !3582
  • ignore TLS handshake errors - !3586
  • refactor bucket initialization and permission check - !3590
  • add missing unlock in GET object - !3681
  • fix rename to accept bucket with backend buckets - !3684
  • send props on create bucket in message instead of query - !3692
  • pre-decide which type of lock should be taken on CopyObject - !3738
  • ensure that remote object was correctly fetched after UpgradeLock - !3761
  • generalize creating default bucket props - !3775
  • fail early when trying to explicitly create a cloud or HTTP bucket - !3778
  • correctly handle creation of a bucket with backend bucket - !3782
  • fix sending error when part of the object was already sent - !3799
  • determine internally if we should skip validate in defaultBckProps - !3807
  • correctly skip bmd modify when terminate flag is set - !3889
  • revise/unify error handling - !3929
  • fix unlock panic when copying an object - !3935
  • unmarshal HTTPError on call - !3608
  • do not append redundant call frame to error - !3821
  • correctly send error on bucket summary - !4100
  • add new function for extracting QueryBcks from request - !4105
  • unify listing and getting summary - !4106
  • remove err from bckInitArgs struct and rename queryBck - !4123
  • add stronger validation for bucket in bckInitArgs - !4124
  • return error on missing required query parameter - !4126
  • ensure latest bmd when doing list and summary - !4135
  • fix returning error when begin phase in bucket creation fails - !4138
  • correctly handle init remote ais cluster bucket - !4139
  • remove vmd creation on user register - !4152
  • pass error when querying daemon info - !4154
  • initialize bucket on bucket summary - !4164
  • start resilver if the new mountpath was added - !4180
  • revise startup sequence - !4191
  • earlystart: fix possible disconnect between rmd and smap - !3984
  • earlystart: resume global rebalance (part two) - !4000
  • synchronize remote AIS attachments when target joins (part two) - !4202
  • general: fix head object on remote AIS bucket - !4204
  • revise metasync receive - !4206
  • Sync RMD when a node joins - !4193
  • wait for metasync when decommissioning a target - !4087

CLI

  • revamp 'show rebalance' - !3551
  • reorganize proxy and targets templates - !3552
  • do not show nodes status when all online - !3554
  • fix put progress bar report for large uploads - !3556
  • show deployment type of nodes in ais show cluster - !3557
  • ais rename issues - !3560
  • introduce templates table - !3566
  • use colors in messages - !3578
  • add wait flag for copy and etl bucket - !3583
  • correctly handle errors from HTTP server - !3584
  • fix error messages - !3658
  • change the way ais displays daemon configs - !3666
  • fix AuthN errors and a little refactoring - !3677
  • Change the rename command to mv (part two) - !3675
  • panic when setting an invalid bucket property - !3695
  • Improve 'ais show config nonexistent' error - !3711
  • fix matching fail checks in tests - !3715
  • completion for bucket permissions - !3733
  • Update 'show object' to display properties vertically - !3743
  • fix makePairs to accept values with = characters - !3747
  • Remove cp objects from help message - !3764
  • Rebalances cannot be complete if none exist - !3766
  • Add a wait option for mv bucket - !3767
  • fix printing download ID on the start - !3599
  • small ETL improvements - !3603
  • Imply obj name as out_file in get command - !3640
  • change the rename command to mv - !3668
  • highlight bucket properties that differ from default ones - !3686
  • complete option values - !3699
  • Add support for graceful shutdown - !3805
  • improve AuthN UX - !3842
  • revoke token command - !3880
  • AuthN user permissions - !3887
  • TAB-TAB completion and parsing for user-friendly permission when adding role - !3888
  • Introduce object and bucket top level commands - !4029
  • Move rm download/dsort inside job - !4043
  • improve gen-shards input and allow specifying provider - !4048
  • Add ais show auth and ais show bucket - !4062
  • use standard name for no-color flag - !4068
  • Various grammar/wording changes - !3998
  • Introduce cluster and disk top level command - !4006
  • force flag for node maintenance - !4007
  • Introduce job top level command - !4017
  • Introduce advanced top level command - !4018
  • fix setting cluster/daemon config - !4076
  • Fix panic when aliases have subcommands - !4077
  • fix duplicated ais auth show command - !4082
  • refactor parsing bucket or bucket + objName URIs - !4085
  • Get daemon ID from daemon itself instead of API call - !4091
  • add parse functions tests - !4095
  • require bucket only for bucket xactions - !4104
  • show targets in maintenance - !4133
  • allow viewing cluster config - !4147
  • scope based props validation in configure command - !4148
  • Make verbose more consistent for show cluster - !4156
  • Fix ordering in show cluster - !4182
  • add config option to set default provider - !4189
  • Update version number - !4192
  • make object name optional for evict command - !4203
  • local and cluster config displaying - !4230
  • docs improvements and more polish - !4240
  • add 'ais cluster show bmd' command - !4241

Distributed Shuffle (dSort)

  • general: use new optimized msgp method - !3553
  • Use job instead of manager in dSort errors - !3667
  • access lom size only when put has been successful - !3843
  • re-load lom under lock when sending to another target - !3844
  • use cmn.Bck instead of bare name and provider - !4066
  • improve and add more logging - !4093
  • correctly cleanup resources when ignoring duplicated records - !4109
  • ignore targets in maintenance - !4166

Authentication & User management

  • Allow changing bucket permissions for read-only buckets - !3561
  • fix panic when checking restricted user's permissions - !3841
  • correct checking user's role permissions - !3857
  • do not cache generated tokens - !3858
  • AuthN docs - !3881
  • Authn docs and code review - !3892
  • API to change configuration on the fly (1 of 2) - !3947
  • fix returning info for a single cluster, update tests - !3987
  • AuthN refactoring: move to its own package - !4171
  • revise access permissions (ACL) - !3744
  • Improve bucket ACL check - !3757

ETL

  • use user defined ID as name - !3571
  • fix minikube playground etl + cloud providers - !3565
  • minor improvements and refactoring - !3671
  • add tests for target down - !3674
  • use a separate GCP bucket for cloud tests - !3701
  • add better cloud warm/cold ETL object tests - !3703
  • add large bucket test - !3679
  • health observability (part one) - !3683
  • run health tests - !3763
  • wait longer for pods metrics - !3770
  • add tests for occasionally failing transformation - !3785
  • broadcast list request to all targets - !3787
  • make failing ETL test less flac...

3.3

10 Dec 04:03

Highlights

  • ETL - inline and offline dataset transformations, custom user-defined transformations via both user-provided containers and Python scripts, simplified ETL initialization, ETL directly to and from Cloud buckets;

  • Multi-Cloud capability supporting co-existence and management of datasets originating from (or hosted by) different Cloud storages - !2736, !2737, !2748, !2792, !2793;
  • Maintenance and decommission - the capability to put a clustered node in maintenance mode and/or safely and permanently remove it from the cluster - #947, !2935, !2957, !2983, !2990, !3094;
  • Volume metadata (VMD) - persistent information that describes each clustered node's storage configuration (including data drives, local filesystems, mountpaths) further used to reinforce data integrity and protection - #939, #941, !3118, !3198;
  • New protocol prefix ht:// - uniform access to "vanilla" HTTP(S) based datasets - #882, #889;
  • Terraform integration - easy and automated deployment via Terraform - there's a separate repository (of scripts, charts, and documentation) that we use for production deployments;
  • Intra-cluster communications - the transport we use to rebalance user data, transfer erasure-coded slices, copy and transform datasets - a major upgrade !2860, !2895, !2984, !3053, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3286, !3303, !3356, !3357, !3396, !3403, !3409, !3415, !3417.

And also:

  • performance optimizations, CLI usability improvements, refactoring, cleanup, and stability fixes across the board.

Multi-Cloud

A new protocol prefix ht:// (in addition to s3://, gs://, and azure://) for seamless integration and uniform access to "vanilla" HTTP(S) based datasets.

Multi-Cloud via a single deployed runtime. Improved access to public Cloud buckets (from different Cloud providers). Bucket copying and transformations (see ETL below) extended to support Cloud buckets.

  • New HTTP provider (ht://) - #882, #889
  • Multi-Cloud - added runtime support for bucket management of multiple Cloud providers - !2736, !2737
  • Support multiple regions for AWS buckets - #778, !2804
  • Improve Google provider error handling - !2792, !2793
  • Public GCP buckets can be used without setting PROJECT_ID - !2723
  • Remove default Cloud provider option (provider now must be set explicitly) - !2748
  • Support Cloud-based source/destination in a bucket copy operation - !2975
  • Prefetch performance improvement: keep cached object properties longer - #969
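
The provider prefixes named above (s3://, gs://, azure://, ht://) share a provider://bucket/object shape, and the provider must now be given explicitly. A minimal sketch of splitting such a URI follows. The helper parse_bucket_uri and the BucketURI type are hypothetical and are not part of the AIS API or CLI; the set of accepted prefixes is taken only from these notes.

```python
from typing import NamedTuple

# Prefixes named in these release notes; the set AIS actually accepts may differ.
KNOWN_PROVIDERS = {"s3", "gs", "azure", "ht"}


class BucketURI(NamedTuple):
    provider: str
    bucket: str
    obj: str  # empty when the URI names only a bucket


def parse_bucket_uri(uri: str) -> BucketURI:
    """Split 'provider://bucket/object' into its parts (hypothetical helper)."""
    provider, sep, rest = uri.partition("://")
    if not sep or provider not in KNOWN_PROVIDERS:
        # No default provider: the prefix has to be spelled out.
        raise ValueError(f"missing or unknown provider in {uri!r}")
    bucket, _, obj = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return BucketURI(provider, bucket, obj)
```

For example, parse_bucket_uri("s3://data/train/a.tar") separates the provider, the bucket name, and the object path, while a bare "data/train/a.tar" is rejected.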

Core

Improve cluster stability in the presence of exceptional events, optimize cluster operation under heavy workloads, introduce maintenance mode, support permanent decommissioning of nodes from the cluster, improve the reliability of bucket destroy operation, optimize and further stabilize cluster rebalancing logic.

  • Node maintenance feature - #947, !2935, !2990, !3094
  • Improved out-of-space (out of capacity) handling - #822
  • Backend buckets vs bucket initialization - !2841
  • Improve cluster stability while it is in transition (when the primary changes) - #945, #968, #960
  • If cluster restarts during rebalancing we will now resume the rebalance - #913
  • Optimize copy-bucket and other bucket-traversing workloads - #917
  • Make promote consistent with other object operations - !2763, !2765
  • Add transfer statistics for resilvering - !2926
  • Configuration option Rebalance.Enabled now affects only automatic rebalance (a manual one can always be started) - !2915
  • Reduce resource usage by StatsD (Grafana, Graphite) client - !3240
  • New CLI option --daemon-id to join a node with user-predefined ID - !3255
  • Fix object rename operation to work across different mountpaths - !3329
  • Make destroy bucket operation transactional - !3315
  • Volume metadata (VMD) - persistent information about a node and its storage configuration, used on startup when running node integrity checks - #939, #941, !3118, !3198
  • No metasync when shutting down - !2844
  • Not ignoring errors when listing multiple Cloud providers - !2845
  • Refactor reb (rebalance) package - !2857
  • Refactor target handlers and fix transactions' housekeeping logic - !2869
  • Refactor copy-object interface - !2879
  • Revise and refactor PROMOTE (command and API) - !2880
  • Refactor target copy-object and put-remote interfaces - !2881
  • Use data mover to copy buckets - !2893
  • LOM: fix CopyObject - !2908
  • cmn.JoinWords and friends - !2913
  • Always allow manual rebalance (even if automatic one is disabled) - !2915
  • Mountpath resilvering now counts moved objects and their total size - !2926
  • Copy buckets to return correct total size of copied content - !2919
  • Revise and optimize intra-cluster broadcasting - !2943
  • Improve HrwTargetList performance - !2945
  • Fix zero-size objects scenario - !3531

ETL

Multiple improvements and enhancements to the capability (introduced first with v3.2) to easily run user-defined custom dataset transformations - and scale the performance linearly with each added storage server. This release adds offline (dataset-to-dataset) transformation.

For ETL documentation (that now also includes animated presentations), please refer to docs/etl.md and etl/README.md

  • Add offline, local and cloud, bucket transformation - !2827, !2854, !2898, !3445
  • ETL for objects in the Cloud - !3399
  • ETL build operation - easy initialization based on the function definition - !2873, !2884, !2918, !3369
  • Remove kubectl (shell) calls, use K8s client-go instead - !2896, !2907
  • Support retrieving ETL logs - !2947
  • Stability and performance improvements, bug fixes - !2955, !2977, !3330, !3369, !3374, !3411
  • Add and improve labels in Pods and Services - !3445
  • Improve waiting for the Pod/Service to be ready - !3332, !3397
  • Add extension, prefix, and suffix flags for offline ETL - !2846
  • Support aborting offline ETL - !2850
  • Add dry run option for offline ETL - !2854
  • Simplify flow to initialize ETL - !2853
  • Consistent naming of API constants - !2861
  • ETL build: remove unnecessary annotations - !2871
  • Update skeleton docker images used to run custom Python-based transforms - !2870
  • Install dependencies in initContainer - !2873
  • POD spec: add volume mount - !2883
  • Unify offline ETL with copy-bucket - !2898, !2933
  • Improve waiting for POD-ready - !2912
  • Add dry-run capability - !2939
  • K8s client: pod namespace & refactoring - !2948
  • The capability to throttle ETL (transforms) depending on disk utilizations - !2998

Terraform integration

Dramatically simplified deployment of AIStore cluster on the Cloud via Terraform. This release delivers GKE but can be easily extended to support any Cloud that provides Kubernetes (service). It is now possible to start a fully functional AIStore cluster with a single command - for details, please refer to AIStore Kubernetes repository.

  • Add scripts for easy deployment and shutdown of the AIStore cluster on the cloud - !16, !56-!68, #14, #17
  • Add admin container image - !3079, !3195, !3359
  • Remove requirement for K8S_HOST_NAME environment variable - !3451

Information Center (IC)

More reliable extended action (xaction) status management and reporting, automatic cluster-wide xaction abort, xaction progress notifications (new). In AIS, xaction is a long-lived asynchronous operation, a job.

  • Notify all participating nodes when any one of them aborts xaction - !2928
  • Improve IC status reporting by polling xaction status from targets that have not reported xaction status yet - !2953
  • Fix xaction registration for newly added targets - !2924
  • Support both transactional and non-transactional xactions - !2734
  • Replace target polling with notifications when waiting for xaction to complete - !2868
  • xactions to return user-friendly status - !2865

Downloader

Integration with IC, more robust downloader job handling.

  • Downloader naming; fix mountpath register/unregister - !2842
  • Better job aborting; improved completion mechanisms - #902, !2960
  • Progress Bar: report periodic status and stats to IC (see above) - !2911

Distributed Shuffle (dSort)

Performance improvements, resource usage optimizations.

  • Performance: decrease resource usage - #938
  • Better data transport streams handling - #936, !3307

Erasure Coding (EC)

Resource usage optimizations, better slice checksum handling.

  • Fix checksum when sending constructed slices to other targets - !3073, !3132
  • Improve operation over data transport streams - #916, !3311
  • Fix receiving object slices when the bucket is being destroyed - #887
  • Add support for nodes in maintenance mode - !3404

Intra-cluster communications

The transport that we use to rebalance user data (e.g., when adding/removing nodes), transfer erasure-coded slices, copy and transform datasets has undergone a major upgrade:

  • Add data mover layer - !2860, !2895, !2899
  • Support for short messages and message streams - !2984, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3303
  • Revise and optimize transport stream multiplexing - !3141
  • When done transmitting, wait for data mover quiescence - !2903
  • Support streaming unsized objects - objects of unknown size - the functionality in particular useful when ETL-transforming objects on the fly (that is, inline) - !3356, !3357, !3396, !3403, !3409, !3415, !3417
    *...

3.2.1

08 Sep 19:02
  • bug fixes and improvements:
    • multi-cloud
    • Cloud backend
    • HTTP(s) datasets
    • CLI usability
    • AWS regions
    • docs
    • development playground (docker and Kubernetes)
  • add offline ETL (in-dataset => out-dataset, experimental)

3.2

15 Aug 20:43

Highlights

  • (new) ETL offload: support for running custom extract-transform-load workloads on (and by) storage cluster;
  • (new) TensorFlow integration to support existing training clients that use S3 API - done via tar2tf ETL offload that handles on-the-fly TFRecord/tf.Example conversion;

  • List objects v2: optimized list-objects to greatly reduce response times;
  • (new) Query objects: extends list-objects with advanced filtering capabilities;
  • (new) Downloader: an option to keep AIS bucket in-sync with a (downloaded) destination;
  • (new) Information Center (IC), to improve visibility and manageability of the asynchronous batch operations (such as global rebalance, n-way mirroring, erasure coding, ETL, and more);
  • (new) role-based authentication;
  • Distributed Shuffle (dSort) - performance improvements;
  • multi-checksumming, with per-dataset configurable checksum and (new) support for cryptographic checksums.

And also:

  • performance optimizations, CLI usability improvements, erasure coding optimizations, automated no-downtime rebalancing for erasure-coded buckets, refactoring, cleanup, and stability fixes across the board.

Downloader

Skip already downloaded/existing objects, limit download speed, support Azure Cloud, option to synchronize Cloud into AIS bucket, numerous CLI improvements.

  • New API (and CLI) option to keep Cloud bucket and AIS bucket in-sync - #760, !2322
  • Throttle download - #726
  • Download an entire bucket (an option that specifies a range or list of objects to download can now be omitted) - #759
  • Store 3rd party Cloud metadata (version, md5) as part of the AIS object's own metadata; use Cloud metadata for multi-versioning (latest version) and data protection - #701
  • Progress Bar when downloading from Cloud - #773
  • Downloader to support Azure Cloud - #763
  • CLI: download prefix-ed objects - !2204
  • Fix re-downloading a cloud bucket (skip downloading when have identical local replica) - !2221, !2236
  • Downloading a Cloud bucket can now be done only to an AIS bucket that has an associated cloud backend - !2241

Distributed Shuffle (dSort)

Reduce/optimize CPU and memory usage. Refactor and stabilize.

  • CLI usability and improvements - #768
  • Reduce memory usage - !2197
  • Number of workers per mountpath to optimize disk utilization - !2263
  • CLI: Add support for alternative output shard name formats - !2205
  • Use MessagePack instead of JSON for intra-cluster communications - !2262

Authentication server (AuthN)

Replace old basic authentication with a role-based one. Allow a single AuthN server to manage any number of AIS clusters. Add support for both HTTP and HTTPS AIS clusters. More API endpoints now require a token issued by AuthN when AuthN is enabled (previously, all GET requests worked without any authentication).

  • Use BuntDB to persist all authentication data (instead of previously used separate JSON files) - !2146, !2178
  • Remove (obsolete) user Cloud credentials management - !2146
  • Support multiple AIS clusters with automatic HTTP/HTTPS selection - !2153
  • CLI: new AuthN management commands: add/remove/show user/show cluster - !2153
  • Introduce user roles (admin/cluster owner/bucket owner/read-only) - !2213
  • When AuthN is deployed, the majority of requests to an AIS cluster must carry a valid AuthN token (previously only PUT operations) - !2284

List and Query objects

Revised and fast list-objects. Reduce memory usage. Use MessagePack. Employ bigger pages to speed up listing operations.

Experimental support for caching: list-objects results can now be reused across multiple users/requests.

  • Massive speed-up via streamable listing - #850, #856, #862, #851, !2494
  • list-objects API is now always paged; remove -fast option as obsolete - !2539
  • Use MessagePack for intra-cluster communications; optionally, employ MessagePack for client <=> cluster requests as well - !2568
  • Additional options to control list-objects content: only-cached, include-misplaced - !2613
  • Rename page marker as continuation token and fix the paging semantics accordingly - !2592
  • Use bigger pages (10,000 by default) for AIS buckets; use 10K-size pages for Cloud buckets for only-cached option - !2645
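
To illustrate what always-paged listing with a continuation token means for a client, here is a minimal in-memory sketch. It assumes that the token is simply the last name returned on the previous page and that an empty token means "no more pages"; list_page and list_all are hypothetical names, not AIS API calls, and the real token format may differ.

```python
import bisect
from typing import Iterator, List, Tuple


def list_page(names: List[str], token: str, page_size: int) -> Tuple[List[str], str]:
    """Return one page of names strictly after `token`, plus the next token.

    `names` must be sorted and non-empty strings; an empty next token
    means the listing is complete.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    start = bisect.bisect_right(names, token) if token else 0
    page = names[start:start + page_size]
    more = start + page_size < len(names)
    return page, (page[-1] if more else "")


def list_all(names: List[str], page_size: int = 10_000) -> Iterator[str]:
    """Walk every page by feeding each returned token into the next request."""
    token = ""
    while True:
        page, token = list_page(names, token, page_size)
        yield from page
        if not token:
            break
```

Because the token encodes a position in the sorted namespace rather than a page number, a listing can resume correctly even if objects were added or removed between requests.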

Query objects

New API that extends list-objects with added support for filtering and selection (a so-called inner and outer SELECT).

  • Add init and next API - #754, !2399
  • Use MessagePack instead of JSON (client side) - !2672
  • Add support for querying Cloud buckets - !2521

Data protection

No more hardcoded xxhash as AIS checksum for objects: any checksum can be selected from a list that currently also includes MD5, SHA, CRC, and can be easily extended.

  • Multiple per-bucket configurable checksums - #722, !2154, !2187
  • SHA-256 and SHA-512 - !2190
  • Self-healing: automatic restore of a corrupted object from EC slices and/or mirrored replicas - !2196
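
A per-bucket configurable checksum amounts to looking up a hash function by name instead of hardcoding one. The sketch below shows that pattern with Python's standard library only. The registry CHECKSUMS and the functions checksum and verify are hypothetical; xxhash is left out because it is not in the standard library, and zlib.crc32 is used as a stand-in that may not match the CRC variant AIS uses.

```python
import hashlib
import zlib
from typing import Callable, Dict


def _crc32(data: bytes) -> str:
    # Stand-in CRC; the exact CRC variant used by AIS is not stated in these notes.
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


# Name -> function returning a hex digest. Adding a new type is one more entry.
CHECKSUMS: Dict[str, Callable[[bytes], str]] = {
    "md5": lambda d: hashlib.md5(d).hexdigest(),
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
    "sha512": lambda d: hashlib.sha512(d).hexdigest(),
    "crc32": _crc32,
}


def checksum(data: bytes, kind: str) -> str:
    """Compute the checksum selected by name (e.g. from bucket properties)."""
    try:
        return CHECKSUMS[kind](data)
    except KeyError:
        raise ValueError(f"unsupported checksum type: {kind!r}") from None


def verify(data: bytes, kind: str, expected: str) -> bool:
    """Return True when the stored checksum matches the data."""
    return checksum(data, kind) == expected.lower()
```

A failed verify is the kind of signal that a self-healing path could act on, for instance by restoring the object from a replica.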

CLI

Numerous improvements and bug fixes. In particular, new command-line options, shorter commands, better readable output, improved TAB-TAB support.

  • Show target uptime in show cluster - #744
  • PUT object from stdin ais put object bck/obj - - #748
  • s3:// and gs:// are aliases for aws:// and gcp:// - !1789
  • Rename register as join (as in: join new cluster node) - !1988
  • TAB-TAB and output improvements - #649, #772, !1888, !1857
  • User-provided checksum and end-to-end data protection - #779
  • Improve show cluster to display a single JSON output - #810
  • Add --chunk-size option for PUT object - !2164
  • Improve show object command - !2185
  • Add search command - !2400
  • All ais start xaction <name> are now ais start <name> - !2448
  • Run LRU on a list of specified buckets - allow user to temporarily override bucket's own LRU configuration - !2493
  • Improve set props command to show what's actually changed - !2479

Erasure Coding (EC)

  • Fix sending calculated slices on PUT objects - !2419
  • CLI: improve EC stats output - #823
  • Improve user experience on PUT - !2366
  • CLI: added options --parity-slices and --data-slices for ais ec-encode command - !2387
  • Automatically enable EC when user starts erasure-coding of a given bucket (via start xaction or set props CLI, for instance) - !2377

Information Center (IC)

To efficiently and optimally monitor asynchronous operations (jobs), AIStore employs what we call Information Center (IC) - a group of gateways that “own” all the currently running (as well as already finished) jobs in the cluster. Those jobs, codenamed eXtended actions, or xactions, include global rebalance, n-way mirroring, erasure coding, ETL-type distributed workload, and more. IC continuously monitors all async operations by coordinating with other clustered nodes.

  • Cluster-wide ID for cluster-wide xactions - !2294, !2551
  • Intra-cluster notifications for xactions - !2304, !2326, !2321, !2334, !2378, !2355, !2346
  • 3 (three) IC members by default - !2561
  • Support list- and query-objects caching - !2570
  • Always keep IC members in-sync as far as currently-running and finished async ops - !2639, !2648

Extract-Transform-Load (ETL) locally

  • In-cluster ETL v1.0 - #842, !2659, !2660, !2651
  • Target and ETL affinity - !2451
  • CLI: add support for ETL - !2453
  • List all transformations - !2498
  • aisloader: add support for ETL (for benchmarking) - !2573

AIS loader (aisloader)

Support TAR generating and reading. Support ETL benchmarking via included echo (at https://hub.docker.com/repository/docker/aistore/transformer_echo), md5, and tar2tf ETL containers.

  • Add TAR reader - !2585
  • Add support for standard AIS_ENDPOINT environment variable (options --port and --ip are still supported) - !2642

Local Playground + Kubernetes (for developers)

  • Add minikube based Kubernetes development environment - !2456, !2558, !2508
  • Enable Kubernetes-based testing on GitLab CI - !2510, !2562
  • Enable Kubernetes based tests on Jenkins - !2609, !2685

Build & Release

  • Scripts for automating release management; in particular, scripts to upload released AIS binaries - !2597
  • An option to build aisnode (AIS target and AIS proxy) Alpine Linux-based minimal-footprint docker image - !2709

Miscellaneous

Make names of used environment variables consistent. Introduce $trash directory to keep deleted buckets for a while. Safer and better node startup: assorted APIs are now accessible only after the node is up and running.

Extend Local Playground for developers: add K8s minikube.

  • Rename a bunch of environment variables used by ais/aisloader/cli for consistency - !2133
  • Extend create bucket API (allow setting props) - #782, !2266
  • Added special $trash directory to put deleted buckets to it - !2351
  • Add minikube dev deployment - !2456
  • Node startup vs availability of assorted APIs - !2601, !2624

3.1.1

01 Jun 21:01
update docs for CLI about AuthN commands

3.1

13 May 01:21

Highlights

AIStore v3.1 is a significant upgrade with new capabilities that include:

  • remote AIS clustering and unified global namespace
  • Azure Cloud as the 3rd supported Cloud provider (in addition to S3 and Google)
  • Amazon S3 API

And also:

  • TensorFlow integration (to transparently handle TFRecord and tf.Example formats)
  • performance optimizations
  • CLI usability improvements
  • erasure coding optimizations
  • automated no-downtime rebalancing for erasure-coded buckets
  • refactoring, cleanup, and stability fixes across the board

Core

  • remote AIS clustering, unified global namespace: #602, #667, !1937, !1954, !1958, !1959, !1963, !1964, !1965, !1966
  • Azure Cloud: !1856
  • Amazon S3 API: #690, #691
  • TensorFlow integration: #642, !2099
  • evict range, delete range, and prefetch range operations are now asynchronous: #641, !1778, !1785
  • cluster startup stability fixes and improvements: #707, !2084, !2047
  • new environment variable AIS_PRIMARY_ID: #706, !2033
  • EC rebalance speedup and improvements: #558, #670, !1765
  • return 503 (Service Unavailable) when a node is starting up but not ready yet: !2020
  • return 403 (Forbidden) when operation on object, bucket, or cluster is not permitted: !2121
  • new bucket property creation_date: !2010
  • new bucket property backend_bck for AIS bucket connected to a Cloud one - it contains a name of a parent cloud bucket: !2096
  • control-plane cluster-wide 2PC transactions to create, rename, destroy buckets, change bucket properties, etc.: !1852, !1862, !1876, !1844, !1825
  • new config option to avoid starting global rebalance at cluster startup (rebalance.dont_run_time): !2048
  • improved HTTPS support by all AIS built-in clients and components: !2106
  • new and extended bucket access permissions: !2121
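
The two-phase (2PC) pattern mentioned above for bucket operations can be sketched generically: every node first accepts or rejects the change in a begin phase, and the change is committed only if all of them accept. This is a textbook illustration under stated assumptions, not the AIS implementation. Participant and create_bucket are hypothetical in-memory stand-ins with no networking, timeouts, or persistence.

```python
from typing import Dict, List, Set


class Participant:
    """In-memory stand-in for a storage target (hypothetical)."""

    def __init__(self, name: str, can_begin: bool = True) -> None:
        self.name = name
        self.can_begin = can_begin
        self.pending: Dict[str, str] = {}  # txn_id -> bucket awaiting commit
        self.buckets: Set[str] = set()

    def begin(self, txn_id: str, bucket: str) -> bool:
        """Phase 1: validate and stage the change without applying it."""
        if not self.can_begin or bucket in self.buckets:
            return False
        self.pending[txn_id] = bucket
        return True

    def commit(self, txn_id: str) -> None:
        """Phase 2: apply the staged change."""
        self.buckets.add(self.pending.pop(txn_id))

    def abort(self, txn_id: str) -> None:
        """Drop the staged change, if any."""
        self.pending.pop(txn_id, None)


def create_bucket(participants: List[Participant], txn_id: str, bucket: str) -> bool:
    """Create `bucket` on all participants or on none of them."""
    begun: List[Participant] = []
    for p in participants:
        if p.begin(txn_id, bucket):
            begun.append(p)
        else:
            for q in begun:  # roll back everyone who already staged the change
                q.abort(txn_id)
            return False
    for p in participants:
        p.commit(txn_id)
    return True
```

The point of the begin phase is that a rejection by any single node leaves no node with a half-created bucket.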

Config

  • new EC rebalance tunable batch_size: !1922
  • move client-related timeouts to a separate config section (client): !1901

Downloader

  • improve object downloading (retrying and checking for existence): !2024, !2026
  • fix downloading timeout issue for big objects: !2057
  • improve/extend CLI job info (error list, ETA, progress): #725, !2069, !2061, !2062
  • new CLI option to limit concurrency while downloading: !2088
  • download list of objects from GCP: !2114
  • support HTTPS links on the clients' side: !2119

CLI

  • remote AIS cluster support: #683
  • remove --provider flag in favor of provider://bucket_name syntax: !1763
  • simplify ls command by moving subcommands to show command: !1786
  • new command wait to wait until an xaction, dSort job, or download job finishes: #645
  • new command cat to show object's content: #646
  • new commands attach remote and detach remote (cluster): !1968
  • new commands attach mountpath and detach mountpath: !1986
  • new command set primary: !2053
  • rename compose command as concat: !1745
  • add --dry-run flag for put, evict, delete, and prefetch commands: #636, !1828
  • make ais put more intuitive when generating object names from file paths: #640
  • ranged prefetch/evict/delete operation uses the same pattern rules as dSort and downloader: !1793
  • add bucket namespaces: #602, !1943
  • command and flag renaming and regrouping, TAB-TAB completion improvements: #649, !1745, !1786, !1763, !1818, !1988, !2006
  • fix various panics when processing TAB completions: !1923
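
The provider://bucket_name syntax mentioned above folds the former --provider flag into the bucket argument. As a rough illustration of how such an argument splits into its two parts, here is a minimal sketch. It is not the CLI's parser: BucketRef and parse_bucket_uri are hypothetical names, the provider string in the usage line is illustrative, and namespaces and object paths are deliberately not handled because the notes above do not spell out their grammar.

```python
from typing import NamedTuple


class BucketRef(NamedTuple):
    provider: str  # empty string means no provider was given
    name: str


def parse_bucket_uri(uri: str) -> BucketRef:
    """Split 'provider://bucket_name' into its parts (illustrative only)."""
    provider, sep, rest = uri.partition("://")
    if not sep:
        # No '://' separator: treat the whole argument as the bucket name.
        provider, rest = "", uri
    if not rest:
        raise ValueError(f"missing bucket name in {uri!r}")
    return BucketRef(provider, rest)


print(parse_bucket_uri("aws://my-bucket"))
```

Leaving the provider empty when no separator is present lets the caller decide on a default, which is an assumption of this sketch rather than documented CLI behavior.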

AIS FS

  • fix object listing (ls) for large buckets: #644

Rebalance

  • multiple fixes, improvements

Documentation

  • revise/extend AIStore Authentication Server (AuthN)
  • add numerous CLI usage examples
  • extend and revise Downloader sections
  • document CLI to attach, detach and show remote clusters
  • revise sections describing cloud providers; add Azure
  • rewrite AIStore overview
  • cluster rebalance: update docs and CLI

AuthN

  • to support Kubernetes secrets, read security settings from an environment variable: !2130

3.0

25 Feb 06:15

Highlights

  • new on-disk layout optimized for per-bucket management policies, namespace partitioning, and cloud provider isolation
    • in addition to being checksummed, all metadata is now versioned to support backward compatibility if the format changes in the future
    • global (cluster-wide) control structures - cluster map and bucket metadata - are now uniformly GUID-protected and compressed
    • bucket metadata, in particular, exists in multiple protected copies on data drives of all storage targets
  • added AIS as the 3rd fully supported Cloud Provider (in addition to Amazon S3 and Google Cloud)
  • global (cluster-wide) rebalancing:
    • improved, optimized, and enhanced rebalancing logic
    • revised to run stage by (enumerated) stage whereby the stages get synchronized across all targets
    • added support for erasure-coded buckets
    • stabilized long-running operation in the presence of network failures, drive faults, cluster partitioning, administrative restarts
    • will retransmit any migrating object (or EC slice of an object) that didn't get acknowledged
  • resilvering: support erasure-coded buckets
  • CLI: usability improvements, APPEND, dSort configuration
  • AIS FS: namespace caching, config reload/refresh at runtime

Core

  • new on-disk layout (#580, #578, #594)
  • LOM on-disk (#604)
  • bucket groups and namespaces (!1616, !1608, !1607, !1598, !1597, !1593)
  • AIS cluster to cluster connectivity, AIS as a new Cloud Provider (#584)
  • Smap and BMD cluster-wide consistency (#542, !1159, !1154, !1549)
  • rebalance erasure coded buckets (#577, !1651)
  • erasure coding: improve and optimize on-disk metadata representation (!1468)
  • configuration changes: versioning (!1461), Cloud Provider (!1572, !1594)
  • reliable register (join)/unregister node (!1648)
  • improve AWS versioning support (!1471)
  • better and more reliable out-of-space handling (!1696)
  • memory management and Slab allocation; small-size allocator and its usage for LOM (!1685)
  • rebalance/intra-cluster transport: optimize-out heap allocations (!1650)

CLI

  • usability improvements (!1163)
  • new bucket summary (!1505)
  • APPEND API (#612, !1701)
  • allow overriding dSort configuration (!1692)
  • revise show xaction (!1704)

AIS FS: FUSE-based mountable filesystem to access objects as files

  • directory caching to optimize POSIX lookups (#563, #566, !1469)
  • config reload/refresh without unmounting (#568)

Rebalance

  • ACK and retransmit (#583)
  • support containerized deployments (#570)
  • recommence interrupted rebalance upon startup (!1661)
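
The "ACK and retransmit" item above, together with the highlight about retransmitting any migrating object that did not get acknowledged, describes a familiar pattern: the sender remembers what it has sent and resends whatever has not been acknowledged. The sketch below shows that bookkeeping in its simplest form. It is an assumption-laden illustration, not the rebalance implementation: AckTracker and its methods are hypothetical, and timeouts, ordering, and persistence are left out.

```python
from typing import Callable, Iterable, Set


class AckTracker:
    """Track sent-but-unacknowledged object names (illustrative only)."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # hypothetical transport callback
        self._pending: Set[str] = set()

    def transmit(self, names: Iterable[str]) -> None:
        # Record each object as pending before sending it.
        for name in names:
            self._pending.add(name)
            self._send(name)

    def ack(self, name: str) -> None:
        # The receiver confirmed this object; stop tracking it.
        self._pending.discard(name)

    def retransmit_unacked(self) -> int:
        # Resend everything still pending; return how many were resent.
        unacked = sorted(self._pending)
        for name in unacked:
            self._send(name)
        return len(unacked)
```

A caller would invoke retransmit_unacked() after some waiting period and repeat until it returns 0.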

API/SDK

  • HEAD object request will now return erasure coding info as well (!1550)
  • fast bucket list (ls) now supports paging (!1475)
  • (bucket, provider, namespace) triplet structure used across numerous API calls (!1598, !1608, !1616)

Development

  • make and build: enhancements that consolidate most (and the most often used) build, run, and test operations (#564, !1466, !1483, !1498, !1512)
  • add support for Darwin (OSX/Mac) (!1526)

Kubernetes; containerized deployments

  • revise node labeling; fix aisnode container start script (!1725)
  • demo infrastructure for GTC; assorted fixes (!1699)
  • single-node-aistore: docker image for easy and fast turn-key single-host deployments

Documentation

  • on-disk layout
  • multiple corrections and additions