Releases: NVIDIA/aistore
3.4
Highlights
- Kubernetes Operator: a separate repository;
- Cluster life-cycle management: maintenance (node), decommission and shutdown (node and cluster);
- CLI: implement (category, verb, subject) auto-completions; numerous improvements;
- Global Cluster Configuration - versioned, replicated, protected; per-node capability to override global defaults;
- System Metadata (5 types): unified CCS formatting, instance and meta-versioning, backward compatibility;
- Authentication and ACLs: add users and roles, docs, tests, bucket and cluster management, CLI;
- ETL: AIS transport streams, GCP and minikube, CI and Jenkins, stress tests (*);
- PDU-based intra-cluster transport, streaming objects of unknown size;
- Performance: memory pooling, in-memory metadata;
- HDFS as a backend - in total, 6 different backends are now supported;
- DNS names - supporting hostnames and/or IPv4 in public and intra-cluster networks;
- Resilver: resume upon reboot, run on a selected node;
- Cloud: improve AWS versioning, improve error handling;
- Erasure Coding: improved stability for Cloud buckets; handle low memory and OOM;
- Distributed shuffle (dSort): improve stability, optimize memory and CPU usage;
- All subsystems and core: productization, stabilization, bug fixing, refactoring across the board
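One way to picture the per-node capability to override global defaults is to layer optional local values over the replicated cluster-wide configuration. The following is a minimal Go sketch of that layering. Config, Override, Effective, and both fields are hypothetical names chosen for illustration, not AIStore's configuration API.

```go
package main

import "fmt"

// Config is a hypothetical, heavily trimmed stand-in for the global cluster configuration.
type Config struct {
	LogLevel     string
	DiskUtilHigh int // percent
}

// Override holds hypothetical per-node values; a nil field means "use the global default".
type Override struct {
	LogLevel     *string
	DiskUtilHigh *int
}

// Effective returns the configuration a node would run with:
// global defaults first, then any per-node overrides on top.
func Effective(global Config, local Override) Config {
	eff := global // copy by value; the replicated global config stays untouched
	if local.LogLevel != nil {
		eff.LogLevel = *local.LogLevel
	}
	if local.DiskUtilHigh != nil {
		eff.DiskUtilHigh = *local.DiskUtilHigh
	}
	return eff
}

func main() {
	global := Config{LogLevel: "3", DiskUtilHigh: 80}
	high := 90
	node := Effective(global, Override{DiskUtilHigh: &high})
	fmt.Println(node.LogLevel, node.DiskUtilHigh) // 3 90
}
```

Using pointers for override fields keeps "not set" distinct from a zero value, so a node can override a setting to 0 or "" if needed.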
Core
- unify bucket(name, provider, namespace) transfer - part one - !3340
- general: refactor and simplify cold Get - !3345
- keepalive: do not decrease timeout on connection error - !3444
- (new) metadata write policy - !3513, !3596
- LOM minification - !3521
- fs/cluster: generalize fs.GetContentFQN with interface - !3548
- cluster: remove some redundancy in CT and LOM structures - !3549
- cmn: return ReadOpenCloser on ReadOpenCloser.Open - !3562
- memsys: remove unused SliceReader - !3563
- LOM: remove one field - !3570
- LOM: enforce rlock - !3615
- support (DNS) hostnames for the public and intra-cluster networks - !3616
- fix passing origURLBck in bucket init - !3720
- LOM In Flight (LIF) - !3765
- mem-pool LOM - !3722
- mem-pool GET, PUT, and COPY runtime - !3825
- mem-pool LOM (part two) - !3839
- mem-pool LOM (part three - rebalance) - !3878
- copy & transform flows: more mem-pooling and refactoring - !3879
- extend core structs to support possible within-meta-version extensions - !4239
- on-disk meta-versioning and backward compat (part one) - !4023
- meta-versioning and backward compatibility (part two) - !4037
- meta-versioning and backward compatibility (part three) - !4061
- meta-versioning and backward compatibility (part four) - !4075
- refactor http-common - !3639, !3641, !3649, !3656, !3660, !3788, !3793, !3816
- transport: remove roff - !3661
- transport: pool obj-reader struct - !3662
- transport: always execute send-completion logic - !3936
- transport: use interface{} instead of unsafe.Pointer as callback argument - !3953
- minor refactor of proxy election method - !3582
- ignore TLS handshake errors - !3586
- refactor bucket initialization and permission check - !3590
- add missing unlock in GET object - !3681
- fix rename to accept bucket with backend buckets - !3684
- send props on create bucket in message instead of query - !3692
- pre-decide which type of lock should be taken on CopyObject - !3738
- ensure that remote object was correctly fetched after UpgradeLock - !3761
- generalize creating default bucket props - !3775
- fail early when trying to explicitly create a cloud or HTTP bucket - !3778
- correctly handle creation of a bucket with backend bucket - !3782
- fix sending error when part of the object was already sent - !3799
- determine internally if we should skip validate in defaultBckProps - !3807
- correctly skip bmd modify when terminate flag is set - !3889
- revise/unify error handling - !3929
- fix unlock panic when copying an object - !3935
- unmarshal HTTPError on call - !3608
- do not append redundant call frame to error - !3821
- correctly send error on bucket summary - !4100
- add new function for extracting QueryBcks from request - !4105
- unify listing and getting summary - !4106
- remove err from bckInitArgs struct and rename queryBck - !4123
- add stronger validation for bucket in bckInitArgs - !4124
- return error on missing required query parameter - !4126
- ensure latest bmd when doing list and summary - !4135
- fix returning error when begin phase in bucket creation fails - !4138
- correctly handle init remote ais cluster bucket - !4139
- remove vmd creation on user register - !4152
- pass error when querying daemon info - !4154
- initialize bucket on bucket summary - !4164
- start resilver if the new mountpath was added - !4180
- revise startup sequence - !4191
- earlystart: fix possible disconnect between rmd and smap - !3984
- earlystart: resume global rebalance (part two) - !4000
- synchronize remote AIS attachments when target joins (part two) - !4202
- general: fix head object on remote AIS bucket - !4204
- revise metasync receive - !4206
- Sync RMD when a node joins - !4193
- wait for metasync when decommissioning a target - !4087
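The "mem-pool LOM" items above refer to reusing object-metadata structures instead of allocating a new one per request. A common Go idiom for this is sync.Pool. The sketch below assumes that idiom; the lom struct, its fields, and allocLOM/freeLOM are hypothetical stand-ins, not AIStore's actual LOM type or allocator.

```go
package main

import (
	"fmt"
	"sync"
)

// lom is a hypothetical, minimal stand-in for local object metadata (LOM).
type lom struct {
	bucket, objName string
	size            int64
}

// lomPool hands out *lom values, allocating only when the pool is empty.
var lomPool = sync.Pool{New: func() interface{} { return new(lom) }}

// allocLOM takes a LOM from the pool (or allocates a new one) and initializes it.
func allocLOM(bucket, objName string) *lom {
	l := lomPool.Get().(*lom)
	l.bucket, l.objName = bucket, objName
	return l
}

// freeLOM zeroes the struct so no stale state leaks to the next user, then returns it to the pool.
func freeLOM(l *lom) {
	*l = lom{}
	lomPool.Put(l)
}

func main() {
	l := allocLOM("bck", "shard-0001.tar")
	l.size = 1 << 20
	fmt.Println(l.bucket, l.objName, l.size)
	freeLOM(l) // l must not be used after this point
}
```

Zeroing before Put is the important design choice: a pooled object may be handed to an unrelated request, so any leftover field would be a correctness bug rather than just wasted memory.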
CLI
- revamp 'show rebalance' - !3551
- reorganize proxy and targets templates - !3552
- do not show nodes status when all online - !3554
- fix put progress bar report for large uploads - !3556
- show deployment type of nodes in ais show cluster - !3557
- ais rename issues - !3560
- introduce templates table - !3566
- use colors in messages - !3578
- add wait flag for copy and etl bucket - !3583
- correctly handle errors from HTTP server - !3584
- fix error messages - !3658
- change the way ais displays daemon configs - !3666
- fix AuthN errors and a little refactoring - !3677
- Change the rename command to mv (part two) - !3675
- panic when setting an invalid bucket property - !3695
- Improve 'ais show config nonexistent' error - !3711
- fix matching fail checks in tests - !3715
- completion for bucket permissions - !3733
- Update 'show object' to display properties vertically - !3743
- fix makePairs to accept values with = characters - !3747
- Remove cp objects from help message - !3764
- Rebalances cannot be complete if none exist - !3766
- Add a wait option for mv bucket - !3767
- fix printing download ID on the start - !3599
- small ETL improvements - !3603
- Imply obj name as out_file in get command - !3640
- change the rename command to mv - !3668
- highlight bucket properties that differ from default ones - !3686
- complete option values - !3699
- Add support for graceful shutdown - !3805
- improve AuthN UX - !3842
- revoke token command - !3880
- AuthN user permissions - !3887
- TAB-TAB completion and parsing for user-friendly permission when adding role - !3888
- Introduce object and bucket top level commands - !4029
- Move rm download/dsort inside job - !4043
- improve gen-shards input and allow specifying provider - !4048
- Add ais show auth and ais show bucket - !4062
- use standard name for no-color flag - !4068
- Various grammar/wording changes - !3998
- Introduce cluster and disk top level commands - !4006
- force flag for node maintenance - !4007
- Introduce job top level command - !4017
- Introduce advanced top level command - !4018
- fix setting cluster/daemon config - !4076
- Fix panic when aliases have subcommands - !4077
- fix duplicated ais auth show command - !4082
- refactor parsing bucket or bucket + objName URIs - !4085
- Get daemon ID from daemon itself instead of API call - !4091
- add parse functions tests - !4095
- require bucket only for bucket xactions - !4104
- show targets in maintenance - !4133
- allow viewing cluster config - !4147
- scope based props validation in configure command - !4148
- Make verbose more consistent for show cluster - !4156
- Fix ordering in show cluster - !4182
- add config option to set default provider - !4189
- Update version number - !4192
- make object name optional for evict command - !4203
- local and cluster config displaying - !4230
- docs improvements and more polish - !4240
- add 'ais cluster show bmd' command - !4241
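The makePairs fix above (accepting values that contain "=") comes down to splitting each argument on the first "=" only. The following Go sketch shows that rule; parsePairs is a hypothetical helper written for illustration and is not the CLI's actual makePairs.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePairs is a hypothetical helper (not AIStore's makePairs) that turns
// "key=value" arguments into a map. Splitting on the FIRST "=" only means a
// value may itself contain "=" characters.
func parsePairs(args []string) (map[string]string, error) {
	pairs := make(map[string]string, len(args))
	for _, arg := range args {
		key, value, ok := strings.Cut(arg, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid key=value pair: %q", arg)
		}
		pairs[key] = value
	}
	return pairs, nil
}

func main() {
	pairs, err := parsePairs([]string{"name=value", "expr=a=b=c"})
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs["name"], pairs["expr"]) // value a=b=c
}
```

strings.Cut (Go 1.18+) splits around the first separator, which is exactly the behavior needed; strings.Split would break "a=b=c" into three parts.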
Distributed Shuffle (dSort)
- general: use new optimized msgp method - !3553
- Use job instead of manager in dSort errors - !3667
- access lom size only when put has been successful - !3843
- re-load lom under lock when sending to another target - !3844
- use cmn.Bck instead of bare name and provider - !4066
- improve and add more logging - !4093
- correctly cleanup resources when ignoring duplicated records - !4109
- ignore targets in maintenance - !4166
Authentication & User management
- Allow changing bucket permissions for read-only buckets - !3561
- fix panic when checking restricted user's permissions - !3841
- correct checking user's role permissions - !3857
- do not cache generated tokens - !3858
- AuthN docs - !3881
- Authn docs and code review - !3892
- API to change configuration on the fly (1 of 2) - !3947
- fix returning info for a single cluster, update tests - !3987
- AuthN refactoring: move to its own package - !4171
- revise access permissions (ACL) - !3744
- Improve bucket ACL check - !3757
ETL
- use user defined ID as name - !3571
- fix minikube playground etl + cloud providers - !3565
- minor improvements and refactoring - !3671
- add tests for target down - !3674
- use a separate GCP bucket for cloud tests - !3701
- add better cloud warm/cold ETL object tests - !3703
- add large bucket test - !3679
- health observability (part one) - !3683
- run health tests - !3763
- wait longer for pods metrics - !3770
- add tests for occasionally failing transformation - !3785
- broadcast list request to all targets - !3787
- make failing ETL test less flac...
3.3
Highlights
- ETL - inline and offline dataset transformations, custom user-defined transformations via both user-provided containers and Python scripts, simplified ETL initialization, ETL directly to and from Cloud buckets;
- Multi-Cloud capability supporting co-existence and management of datasets originating from (or hosted by) different Cloud storages - !2736, !2737, !2748, !2792, !2793;
- Maintenance and decommission - the capability to put a clustered node in maintenance mode and/or safely and permanently remove it from the cluster - #947, !2935, !2957, !2983, !2990, !3094;
- Volume metadata (VMD) - persistent information that describes each clustered node's storage configuration (including data drives, local filesystems, mountpaths) further used to reinforce data integrity and protection - #939, #941, !3118, !3198;
- New protocol prefix ht:// - uniform access to "vanilla" HTTP(S) based datasets - #882, #889;
- Terraform integration - easy and automated deployment via Terraform - there's a separate repository (of scripts, charts, and documentation) that we use for production deployments;
- Intra-cluster communications - the transport we use to rebalance user data, transfer erasure-coded slices, copy and transform datasets - a major upgrade !2860, !2895, !2984, !3053, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3286, !3303, !3356, !3357, !3396, !3403, !3409, !3415, !3417.
And also:
- performance optimizations, CLI usability improvements, refactoring, cleanup, and stability fixes across the board.
Multi-Cloud
A new protocol prefix ht:// (in addition to s3://, gs://, and azure://) for seamless integration and uniform access to "vanilla" HTTP(S) based datasets.
Multi-Cloud via a single deployed runtime. Improved access to public Cloud buckets (from different Cloud providers). Bucket copying and transformations (see ETL below) extended to support Cloud buckets.
- New HTTP provider (ht://) - #882, #889
- Multi-Cloud - added runtime support for bucket management of multiple Cloud providers - !2736, !2737
- Support multiple regions for AWS buckets - #778, !2804
- Improve Google provider error handling - !2792, !2793
- Public GCP buckets can be used without setting PROJECT_ID - !2723
- Remove default Cloud provider option (provider now must be set explicitly) - !2748
- Support Cloud-based source/destination in a bucket copy operation - !2975
- Prefetch performance improvement: keep cached object properties longer - #969
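With several Cloud providers addressed through protocol prefixes (and, per the 3.2 notes below, s3:// and gs:// acting as aliases for aws:// and gcp://), a client has to split a "provider://bucket/object" string and normalize the alias. The Go sketch below shows one way to do that. parseBucketURI and the canonical table are assumptions for illustration, not the CLI's actual parser; ht:// is deliberately left out because its bucket naming differs.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical maps protocol prefixes to provider names. The table is an
// assumption for illustration; ht:// is omitted on purpose.
var canonical = map[string]string{
	"s3":    "aws",
	"aws":   "aws",
	"gs":    "gcp",
	"gcp":   "gcp",
	"azure": "azure",
	"ais":   "ais",
}

// parseBucketURI splits "provider://bucket[/object]" into its parts,
// normalizing aliases such as s3:// -> aws and gs:// -> gcp.
func parseBucketURI(uri string) (string, string, string, error) {
	scheme, rest, ok := strings.Cut(uri, "://")
	if !ok {
		return "", "", "", fmt.Errorf("missing provider prefix in %q", uri)
	}
	provider, known := canonical[scheme]
	if !known {
		return "", "", "", fmt.Errorf("unknown provider %q", scheme)
	}
	bucket, object, _ := strings.Cut(rest, "/") // object may be empty
	if bucket == "" {
		return "", "", "", fmt.Errorf("missing bucket name in %q", uri)
	}
	return provider, bucket, object, nil
}

func main() {
	p, b, o, err := parseBucketURI("s3://my-bucket/train/shard-0001.tar")
	fmt.Println(p, b, o, err) // aws my-bucket train/shard-0001.tar <nil>
}
```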
Core
Improve cluster stability in the presence of exceptional events, optimize cluster operation under heavy workloads, introduce maintenance mode, support permanent decommissioning of nodes from the cluster, improve the reliability of bucket destroy operation, optimize and further stabilize cluster rebalancing logic.
- Node maintenance feature - #947, !2935, !2990, !3094
- Improved out-of-space (out of capacity) handling - #822
- Backend buckets vs bucket initialization - !2841
- Improve cluster stability while it is in transition (when the primary changes) - #945, #968, #960
- If cluster restarts during rebalancing we will now resume the rebalance - #913
- Optimize copy-bucket and other bucket-traversing workloads - #917
- Make promote consistent with other object operations - !2763, !2765
- Add transfer statistics for resilvering - !2926
- Configuration option Rebalance.Enabled now affects only automatic rebalance (a manual one can always be started) - !2915
- Reduce resource usage by StatsD (Grafana, Graphite) client - !3240
- New CLI option --daemon-id to join a node with a user-predefined ID - !3255
- Fix object rename operation to work across different mountpaths - !3329
- Make destroy bucket operation transactional - !3315
- Volume metadata (VMD) - persistent information about a node and its storage configuration, used on startup when running node integrity checks - #939, #941, !3118, !3198
- No metasync when shutting down - !2844
- Do not ignore errors when listing multiple Cloud providers - !2845
- Refactor reb (rebalance) package - !2857
- Refactor target handlers and fix transactions' housekeeping logic - !2869
- Refactor copy-object interface - !2879
- Revise and refactor PROMOTE (command and API) - !2880
- Refactor target copy-object and put-remote interfaces - !2881
- Use data mover to copy buckets - !2893
- LOM: fix CopyObject - !2908
- cmn.JoinWords and friends - !2913
- Always allow manual rebalance (even if automatic one is disabled) - !2915
- Mountpath resilvering now counts moved objects and their total size - !2926
- Copy buckets to return correct total size of copied content - !2919
- Revise and optimize intra-cluster broadcasting - !2943
- Improve HrwTargetList performance - !2945
- Fix zero-size objects scenario - !3531
ETL
Multiple improvements and enhancements to the capability (introduced first with v3.2) to easily run user-defined custom dataset transformations - and scale the performance linearly with each added storage server. This release adds offline (dataset-to-dataset) transformation.
For ETL documentation (that now also includes animated presentations), please refer to docs/etl.md and etl/README.md
- Add offline, local and cloud, bucket transformation - !2827, !2854, !2898, !3445
- ETL for objects in the Cloud - !3399
- ETL build operation - easy initialization based on the function definition - !2873, !2884, !2918, !3369
- Remove kubectl (shell) calls, use K8s client-go instead - !2896, !2907
- Support retrieving ETL logs - !2947
- Stability and performance improvements, bug fixes - !2955, !2977, !3330, !3369, !3374, !3411
- Add and improve labels in Pods and Services - !3445
- Improve waiting for the Pod/Service to be ready - !3332, !3397
- Add extension, prefix, and suffix flags for offline ETL - !2846
- Support aborting offline ETL - !2850
- Add dry run option for offline ETL - !2854
- Simplify flow to initialize ETL - !2853
- Consistent naming of API constants - !2861
- ETL build: remove unnecessary annotations - !2871
- Update skeleton docker images used to run custom Python-based transforms - !2870
- Install dependencies in initContainer - !2873
- POD spec: add volume mount - !2883
- Unify offline ETL with copy-bucket - !2898, !2933
- Improve waiting for POD-ready - !2912
- Add dry-run capability - !2939
- K8s client: pod namespace & refactoring - !2948
- The capability to throttle ETL (transforms) depending on disk utilization - !2998
Terraform integration
Dramatically simplified deployment of AIStore cluster on the Cloud via Terraform. This release delivers GKE but can be easily extended to support any Cloud that provides Kubernetes (service). It is now possible to start a fully functional AIStore cluster with a single command - for details, please refer to AIStore Kubernetes repository.
- Add scripts for easy deployment and shutdown of the AIStore cluster on the cloud - !16, !56-!68, #14, #17
- Add admin container image - !3079, !3195, !3359
- Remove requirement for K8S_HOST_NAME environment variable - !3451
Information Center (IC)
More reliable extended action (xaction) status management and reporting, automatic cluster-wide xaction abort, xaction progress notifications (new). In AIS, an xaction is a long-lived asynchronous operation, a job.
- Notify all participating nodes when any one of them aborts an xaction - !2928
- Improve IC status reporting by polling xaction status from targets that have not reported xaction status yet - !2953
- Fix xaction registration for newly added targets - !2924
- Support both transactional and non-transactional xactions - !2734
- Replace target polling with notifications when waiting for xaction to complete - !2868
- xactions to return user-friendly status - !2865
Downloader
Integration with IC, more robust downloader job handling.
- Downloader naming; fix mountpath register/unregister - !2842
- Better job aborting; improved completion mechanisms - #902, !2960
- Progress Bar: report periodic status and stats to IC (see above) - !2911
Distributed Shuffle (dSort)
Performance improvements, resource usage optimizations.
- Performance: decrease resource usage - #938
- Better data transport streams handling - #936, !3307
Erasure Coding (EC)
Resource usage optimizations, better slice checksum handling.
- Fix checksum when sending constructed slices to other targets - !3073, !3132
- Improve operation over data transport streams - #916, !3311
- Fix receiving object slices when the bucket is being destroyed - #887
- Add support for nodes in maintenance mode - !3404
Intra-cluster communications
The transport that we use to rebalance user data (e.g., when adding/removing nodes), transfer erasure-coded slices, copy and transform datasets has undergone a major upgrade:
- Add data mover layer - !2860, !2895, !2899
- Support for short messages and message streams - !2984, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3303
- Revise and optimize transport stream multiplexing - !3141
- When done transmitting, wait for data mover quiescence - !2903
- Support streaming unsized objects - objects of unknown size - the functionality in particular useful when ETL-transforming objects on the fly (that is, inline) - !3356, !3357, !3396, !3403, !3409, !3415, !3417
*...
3.2.1
- bug fixes and improvements:
- multi-cloud
- Cloud backend
- HTTP(s) datasets
- CLI usability
- AWS regions
- docs
- development playground (docker and Kubernetes)
- add offline ETL (in-dataset => out-dataset, experimental)
3.2
Highlights
- (new) ETL offload: support for running custom extract-transform-load workloads on (and by) the storage cluster;
- (new) TensorFlow integration to support existing training clients that use S3 API - done via tar2tf ETL offload that handles on-the-fly TFRecord/tf.Example conversion;
- List objects v2: optimized list-objects to greatly reduce response times;
- (new) Query objects: extends list-objects with advanced filtering capabilities;
- (new) Downloader: an option to keep AIS bucket in-sync with a (downloaded) destination;
- (new) Information Center (IC), to improve visibility and manageability of the asynchronous batch operations (such as global rebalance, n-way mirroring, erasure coding, ETL, and more);
- (new) role-based authentication;
- Distributed Shuffle (dSort) - performance improvements;
- multi-checksumming, with per-dataset configurable checksum and (new) support for cryptographic checksums.
And also:
- performance optimizations, CLI usability improvements, erasure coding optimizations, automated no-downtime rebalancing for erasure-coded buckets, refactoring, cleanup, and stability fixes across the board.
Downloader
Skip already downloaded/existing objects, limit download speed, support Azure Cloud, option to synchronize Cloud into AIS bucket, numerous CLI improvements.
- New API (and CLI) option to keep Cloud bucket and AIS bucket in-sync - #760, !2322
- Throttle download - #726
- Download an entire bucket (an option that specifies a range or list of objects to download can now be omitted) - #759
- Store 3rd party Cloud metadata (version, md5) as part of the AIS object's own metadata; use Cloud metadata for multi-versioning (latest version) and data protection - #701
- Progress Bar when downloading from Cloud - #773
- Downloader to support Azure Cloud - #763
- CLI: download objects by prefix - !2204
- Fix re-downloading a cloud bucket (skip downloading when there is an identical local replica) - !2221, !2236
- Downloading a Cloud bucket can now be done only to an AIS bucket that has an associated cloud backend - !2241
Distributed Shuffle (dSort)
Reduce/optimize CPU and memory usage. Refactor and stabilize.
- CLI usability and improvements - #768
- Reduce memory usage - !2197
- Number of workers per mountpath to optimize disk utilization - !2263
- CLI: Add support for alternative output shard name formats - !2205
- Use MessagePack instead of JSON for intra-cluster communications - !2262
Authentication server (AuthN)
Replace old basic authentication with a role-based one. Allow a single AuthN server to manage any number of AIS clusters. Add support for both HTTP and HTTPS AIS clusters. More API endpoints require a token issued by AuthN when AuthN is enabled (before this, all GET requests worked without any authentication).
- Use BuntDB to persist all authentication data (instead of previously used separate JSON files) - !2146, !2178
- Remove (obsolete) user Cloud credentials management - !2146
- Support multiple AIS clusters with automatic HTTP/HTTPS selection - !2153
- CLI: new AuthN management commands: add/remove/show user, show cluster - !2153
- Introduce user roles (admin/cluster owner/bucket owner/read-only) - !2213
- When AuthN is deployed, the majority of requests to the AIS cluster must carry a valid AuthN token (previously, only PUT operations did) - !2284
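The four user roles can be thought of as named bundles of permissions that a request is checked against. The Go sketch below models that with a bitmask. The Perm bits and the role-to-permission table are hypothetical and chosen only to illustrate the idea; AIStore's actual ACL bits and role definitions may differ.

```go
package main

import "fmt"

// Perm is a hypothetical permission bitmask; AIStore's actual ACL bits may differ.
type Perm uint32

const (
	PermRead Perm = 1 << iota
	PermWrite
	PermBucketAdmin
	PermClusterAdmin
	PermManageUsers
)

// roles maps the four roles named in these notes to a hypothetical set of permissions.
var roles = map[string]Perm{
	"read-only":     PermRead,
	"bucket-owner":  PermRead | PermWrite | PermBucketAdmin,
	"cluster-owner": PermRead | PermWrite | PermBucketAdmin | PermClusterAdmin,
	"admin":         PermRead | PermWrite | PermBucketAdmin | PermClusterAdmin | PermManageUsers,
}

// allowed reports whether role grants every bit in need; unknown roles get nothing.
func allowed(role string, need Perm) bool {
	have, ok := roles[role]
	return ok && have&need == need
}

func main() {
	fmt.Println(allowed("read-only", PermRead), allowed("read-only", PermWrite)) // true false
}
```

Checking have&need == need (rather than have&need != 0) requires all requested bits, which is the safer default when an operation needs several permissions at once.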
List and Query objects
Revised and fast list-objects. Reduce memory usage. Use MessagePack. Employ bigger pages to speed up listing operations.
Experimental support for caching - list-objects result can now be used across multiple users/requests.
- Massive speed-up via streamable listing - #850, #856, #862, #851, !2494
- list-objects API is now always paged; remove -fast option as obsolete - !2539
- Use MessagePack for intra-cluster communications; optionally, employ MessagePack for client <=> cluster requests as well - !2568
- Additional options to control list-objects content: only-cached, include-misplaced - !2613
- Rename page marker as continuation token and fix the paging semantics accordingly - !2592
- Use bigger pages (10,000 by default) for AIS buckets; use 10K-size pages for Cloud buckets for only-cached option - !2645
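Since list-objects is now always paged, a client keeps requesting pages, passing back the continuation token from the previous response, until the server returns an empty token. The Go sketch below shows that loop against an in-memory fake server. The page type, fakeLister, and listAll are hypothetical, and a real token is opaque rather than an index.

```go
package main

import (
	"fmt"
	"strconv"
)

// page is a hypothetical list-objects response: a batch of names plus the
// continuation token to pass back for the next page ("" means done).
type page struct {
	names []string
	token string
}

// fakeLister simulates a server returning at most pageSize names per call.
// Its token is simply the next start index; real tokens are opaque.
func fakeLister(all []string, pageSize int) func(token string) page {
	return func(token string) page {
		start, _ := strconv.Atoi(token) // "" (first request) fails to parse and yields 0
		end := start + pageSize
		if end >= len(all) {
			return page{names: all[start:]} // last page: empty token
		}
		return page{names: all[start:end], token: strconv.Itoa(end)}
	}
}

// listAll keeps requesting pages until the server returns an empty token.
func listAll(next func(token string) page) []string {
	var out []string
	token := ""
	for {
		p := next(token)
		out = append(out, p.names...)
		if p.token == "" {
			return out
		}
		token = p.token
	}
}

func main() {
	objs := []string{"a", "b", "c", "d", "e"}
	fmt.Println(listAll(fakeLister(objs, 2))) // [a b c d e]
}
```

With the 10,000-entry default page size mentioned above, a million-object bucket would take about a hundred such round trips.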
Query objects
New API that extends list-objects with added support for filtering and selection (a so-called inner and outer SELECT).
- Add init and next API - #754, !2399
- Use MessagePack instead of JSON (client side) - !2672
- Add support for querying Cloud buckets - !2521
Data protection
No more hardcoded xxhash as AIS checksum for objects: any checksum can be selected from a list that currently also includes MD5, SHA, CRC, and can be easily extended.
- Multiple per-bucket configurable checksums - #722, !2154, !2187
- SHA-256 and SHA-512 - !2190
- Self-healing: automatic restore of a corrupted object from EC slices and/or mirrored replicas - !2196
CLI
Numerous improvements and bug fixes. In particular, new command-line options, shorter commands, more readable output, improved TAB-TAB support.
- Show target uptime in show cluster - #744
- PUT object from stdin ('ais put object bck/obj -') - #748
- s3:// and gs:// are aliases for aws:// and gcp:// - !1789
- Rename register as join (as in: join new cluster node) - !1988
- TAB-TAB and output improvements - #649, #772, !1888, !1857
- User-provided checksum and end-to-end data protection - #779
- Improve show cluster to display a single JSON output - #810
- Add --chunk-size option for PUT object - !2164
- Improve show object command - !2185
- Add search command - !2400
- All 'ais start xaction <name>' commands are now 'ais start <name>' - !2448
- Run LRU on a list of specified buckets - allow user to temporarily override bucket's own LRU configuration - !2493
- Improve set props command to show what's actually changed - !2479
Erasure Coding (EC)
- Fix sending calculated slices on PUT objects - !2419
- CLI: improve EC stats output - #823
- Improve user experience on PUT - !2366
- CLI: added options --parity-slices and --data-slices for the ais ec-encode command - !2387
- Automatically enable EC when user starts erasure-coding of a given bucket (via start xaction or set props CLI, for instance) - !2377
Information Center (IC)
To efficiently and optimally monitor asynchronous operations (jobs), AIStore employs what we call Information Center (IC) - a group of gateways that “own” all the currently running (as well as already finished) jobs in the cluster. Those jobs, codenamed eXtended actions, or xactions, include global rebalance, n-way mirroring, erasure coding, ETL-type distributed workload, and more. IC continuously monitors all asynchronous operations by coordinating with other clustered nodes.
- Cluster-wide ID for cluster-wide xactions - !2294, !2551
- Intra-cluster notifications for xactions - !2304, !2326, !2321, !2334, !2378, !2355, !2346
- 3 (three) IC members by default - !2561
- Support list- and query-objects caching - !2570
- Always keep IC members in-sync as far as currently-running and finished async ops - !2639, !2648
Extract-Transform-Load (ETL) locally
- In-cluster ETL v1.0 - #842, !2659, !2660, !2651
- Target and ETL affinity - !2451
- CLI: add support for ETL - !2453
- List all transformations - !2498
- aisloader: add support for ETL (for benchmarking) - !2573
AIS loader (aisloader)
Support TAR generating and reading. Support ETL benchmarking via included echo (at https://hub.docker.com/repository/docker/aistore/transformer_echo), md5, and tar2tf ETL containers.
- Add TAR reader - !2585
- Add support for standard AIS_ENDPOINT environment variable (options --port and --ip are still supported) - !2642
Local Playground + Kubernetes (for developers)
- Add minikube based Kubernetes development environment - !2456, !2558, !2508
- Enable Kubernetes-based testing on GitLab CI - !2510, !2562
- Enable Kubernetes based tests on Jenkins - !2609, !2685
Build & Release
- Scripts for automating release management; in particular, scripts to upload released AIS binaries - !2597
- An option to build aisnode (AIS target and AIS proxy) Alpine Linux-based minimal-footprint docker image - !2709
Miscellaneous
Make names of used environment variables consistent. Introduce $trash directory to keep deleted buckets for a while. Safer and better node startup: assorted APIs are now accessible only after the node is up and running.
Extend Local Playground for developers: add K8s minikube.
- Rename a bunch of environment variables used by ais/aisloader/cli for consistency - !2133
- Extend create bucket API (allow setting props) - #782, !2266
- Add special $trash directory to keep deleted buckets - !2351
- Add minikube dev deployment - !2456
- Node startup vs availability of assorted APIs - !2601, !2624
3.1.1
update docs for CLI about AuthN commands
3.1
Highlights
AIStore v3.1 is a significant upgrade with new capabilities that include:
- remote AIS clustering and unified global namespace
- Azure Cloud as the 3rd supported Cloud provider (in addition to S3 and Google)
- Amazon S3 API
And also:
- TensorFlow integration (to transparently handle TFRecord and tf.Example formats)
- performance optimizations
- CLI usability improvements
- erasure coding optimizations
- automated no-downtime rebalancing for erasure-coded buckets
- refactoring, cleanup, and stability fixes across the board
Core
- remote AIS clustering, unified global namespace: #602, #667, !1937, !1954, !1958, !1959, !1963, !1964, !1965, !1966
- Azure Cloud: !1856
- Amazon S3 API: #690, #691
- TensorFlow integration: #642, !2099
- evict range, delete range, and prefetch range operations are now asynchronous: #641, !1778, !1785
- cluster startup stability fixes and improvements: #707, !2084, !2047
- new environment variable AIS_PRIMARY_ID: #706, !2033
- EC rebalance speedup and improvements: #558, #670, !1765
- return 503 (Service Unavailable) when a node is starting up but not ready yet: !2020
- return 403 (Forbidden) when operation on object, bucket, or cluster is not permitted: !2121
- new bucket property creation_date: !2010
- new bucket property backend_bck for an AIS bucket connected to a Cloud one - it contains the name of the parent Cloud bucket: !2096
- control-plane cluster-wide 2PC transactions to create, rename, destroy buckets, change bucket properties, etc.: !1852, !1862, !1876, !1844, !1825
- new config option to avoid starting global rebalance at cluster startup (rebalance.dont_run_time): !2048
- improved HTTPS support by all AIS built-in clients and components: !2106
- new and extended bucket access permissions: !2121
Config
- new EC rebalance tunable batch_size: !1922
- move client-related timeouts to a separate config section (client): !1901
Downloader
- improve object downloading (retrying and checking for existence): !2024, !2026
- fix downloading timeout issue for big objects: !2057
- improve/extend CLI job info (error list, ETA, progress): #725, !2069, !2061, !2062
- new CLI option to limit concurrency while downloading: !2088
- download list of objects from GCP: !2114
- support HTTPS links on the clients' side: !2119
CLI
- remote AIS cluster support: #683
- remove --provider flag in favor of provider://bucket_name syntax: !1763
- simplify ls command by moving subcommands to show command: !1786
- new command wait to wait until an xaction, dSort job, or download job finishes: #645
- new command cat to show object's content: #646
- new commands attach remote and detach remote (cluster): !1968
- new commands attach mountpath and detach mountpath: !1986
- new command set primary: !2053
- rename compose command as concat: !1745
- add --dry-run flag for put, evict, delete, and prefetch commands: #636, !1828
- make ais put more intuitive when generating object names from file paths: #640
- ranged prefetch/evict/delete operation uses the same pattern rules as dSort and downloader: !1793
- add bucket namespaces: #602, !1943
- command and flags renamings and regrouping, TAB-TAB completion improvements: #649, !1745, !1786, !1763, !1818, !1988, !2006
- fix various panics when processing TAB completions: !1923
AIS FS
- fix object listing (ls) for large buckets: #644
Rebalance
- multiple fixes, improvements
Documentation
- revise/extend AIStore Authentication Server (AuthN)
- add numerous CLI usage examples
- extend and revise Downloader sections
- document CLI to attach, detach and show remote clusters
- revise sections describing cloud providers; add Azure
- rewrite AIStore overview
- cluster rebalance: update docs and CLI
AuthN
- to support Kubernetes secrets, read security settings from an environment variable: !2130
3.0
Highlights
- new on-disk layout optimized for per-bucket management policies, namespace partitioning, and cloud provider isolation
- in addition to checksum, all metadata is now versioned to support backward compatibility when (and if) there are any future changes
- global (cluster-wide) control structures - cluster map and bucket metadata - are now uniformly GUID-protected and compressed
- bucket metadata, in particular, exists in multiple protected copies on data drives of all storage targets
- added AIS as the 3rd fully supported Cloud Provider (in addition to Amazon S3 and Google Cloud)
- global (cluster-wide) rebalancing:
- improved, optimized, and enhanced rebalancing logic
- revised to run stage by (enumerated) stage whereby the stages get synchronized across all targets
- added support for erasure-coded buckets
- stabilized long-running operation in the presence of network failures, drive faults, cluster partitioning, administrative restarts
- will retransmit any migrating object (or EC slice of an object) that didn't get acknowledged
- resilvering: support erasure-coded buckets
- CLI: usability improvements, APPEND, dSort configuration
- AIS FS: namespace caching, config reload/refresh at runtime
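Versioning all metadata for backward compatibility usually means every persisted record carries its format version, and the loader accepts older versions by upgrading them in memory while rejecting versions it does not know. The Go sketch below illustrates that with a made-up layout (4-byte big-endian version followed by JSON). The layout, metaV1/metaV2, encode, and decode are hypothetical and not AIStore's on-disk format.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

// currentVersion is the hypothetical meta-version written by this build.
const currentVersion = 2

type metaV1 struct {
	Name string
}

// metaV2 adds a field; older records must still load.
type metaV2 struct {
	Name string
	Tags []string
}

// Meta is the in-memory form used by the rest of the (hypothetical) program.
type Meta = metaV2

// encode writes a 4-byte big-endian version followed by a JSON payload.
func encode(m Meta) ([]byte, error) {
	payload, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	buf := make([]byte, 4, 4+len(payload))
	binary.BigEndian.PutUint32(buf, currentVersion)
	return append(buf, payload...), nil
}

// decode accepts the current and the previous version, upgrading v1 records
// in memory, and refuses versions it does not know.
func decode(b []byte) (Meta, error) {
	if len(b) < 4 {
		return Meta{}, errors.New("short metadata")
	}
	payload := b[4:]
	switch v := binary.BigEndian.Uint32(b); v {
	case 1:
		var old metaV1
		if err := json.Unmarshal(payload, &old); err != nil {
			return Meta{}, err
		}
		return Meta{Name: old.Name}, nil // new fields take zero values
	case currentVersion:
		var m Meta
		err := json.Unmarshal(payload, &m)
		return m, err
	default:
		return Meta{}, fmt.Errorf("unsupported meta-version %d", v)
	}
}

func main() {
	b, _ := encode(Meta{Name: "bmd", Tags: []string{"x"}})
	m, err := decode(b)
	fmt.Println(m.Name, m.Tags, err) // bmd [x] <nil>

	old := append([]byte{0, 0, 0, 1}, []byte(`{"Name":"legacy"}`)...)
	m, err = decode(old)
	fmt.Println(m.Name, len(m.Tags), err) // legacy 0 <nil>
}
```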
Core
- new on-disk layout (#580, #578, #594)
- LOM on-disk (#604)
- bucket groups and namespaces (!1616, !1608, !1607, !1598, !1597, !1593)
- AIS cluster to cluster connectivity, AIS as a new Cloud Provider (#584)
- Smap and BMD cluster-wide consistency (#542, !1159, !1154, !1549)
- rebalance erasure coded buckets (#577, !1651)
- erasure coding: improve and optimize on-disk metadata representation (!1468)
- configuration changes: versioning (!1461), Cloud Provider (!1572, !1594)
- reliable register (join)/unregister node (!1648)
- improve AWS versioning support (!1471)
- better and more reliable out-of-space handling (!1696)
- memory management and Slab allocation; small-size allocator and its usage for LOM (!1685)
- rebalance/intra-cluster transport: optimize-out heap allocations (!1650)
CLI
- usability improvements (!1163)
- new bucket summary (!1505)
- APPEND API (#612, !1701)
- allow to override dSort configuration (!1692)
- revise show xaction (!1704)
AIS FS: FUSE-based mountable filesystem to access objects as files
- directory caching to optimize POSIX lookups (#563, #566, !1469)
- config reload/refresh without unmounting (#568)
Rebalance
- ACK and retransmit (#583)
- support containerized deployments (#570)
- recommence interrupted rebalance upon startup (!1661)
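"ACK and retransmit" means the sender remembers every migrating object until the receiving target acknowledges it, and periodically resends whatever is still pending. The Go sketch below models only that bookkeeping. ackTracker, maxAttempts, and the send callback are hypothetical, and the real rebalance protocol involves much more (stages, timeouts, network streams).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ackTracker is a hypothetical record of objects sent during rebalance that
// the receiving target has not acknowledged yet.
type ackTracker struct {
	mu      sync.Mutex
	pending map[string]int // object name -> number of send attempts
}

func newAckTracker() *ackTracker { return &ackTracker{pending: make(map[string]int)} }

// sent records one send attempt for obj.
func (t *ackTracker) sent(obj string) {
	t.mu.Lock()
	t.pending[obj]++
	t.mu.Unlock()
}

// acked is called when the receiver confirms obj; it is no longer retransmitted.
func (t *ackTracker) acked(obj string) {
	t.mu.Lock()
	delete(t.pending, obj)
	t.mu.Unlock()
}

// retransmit resends every unacknowledged object (in sorted order for determinism)
// and returns those that already reached maxAttempts. send is called under the
// lock, so in this sketch it must not call back into the tracker.
func (t *ackTracker) retransmit(send func(string), maxAttempts int) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	names := make([]string, 0, len(t.pending))
	for name := range t.pending {
		names = append(names, name)
	}
	sort.Strings(names)
	var failed []string
	for _, name := range names {
		if t.pending[name] >= maxAttempts {
			failed = append(failed, name)
			continue
		}
		send(name)
		t.pending[name]++
	}
	return failed
}

func main() {
	t := newAckTracker()
	for _, obj := range []string{"a", "b", "c"} {
		t.sent(obj)
	}
	t.acked("b") // only "b" was acknowledged
	var resent []string
	failed := t.retransmit(func(obj string) { resent = append(resent, obj) }, 3)
	fmt.Println(resent, failed) // [a c] []
}
```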
API/SDK
- HEAD object request will now return erasure coding info as well (!1550)
- fast bucket list (ls) now supports paging (!1475)
- (bucket, provider, namespace) triplet structure used across numerous API calls (!1598, !1608, !1616)
Development
- make and build: enhancements and improvements to consolidate most of (and most often used) build, run, and test operations (#564, !1466, !1483, !1498, !1512)
- add support for Darwin (OSX/Mac) (!1526)
Kubernetes; containerized deployments
- revise node labeling; fix aisnode container start script (!1725)
- demo infrastructure for GTC; assorted fixes (!1699)
- single-node-aistore: docker image for easy and fast turn-key single-host deployments
Documentation
- on-disk layout
- multiple corrections and additions