All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.0.0 - 2024-10-14
No significant changes.
0.43.0 - 2024-09-12
- A new service called `ProfilingService`, in charge of propagating profiling. For now, it only registers and propagates profiling steps
- Introduce a new asset type, `ASSET_PROFILING_STEP` (#407)
- `DisableOutput` does not panic if there are no output assets associated with an output (#438)
0.42.0 - 2024-06-13
- Rename `ComputePlanDBAL().IsPlanRunning(key)` to `ComputePlanDBAL().ArePlanTaksRunning(key)` (#432)
- Compute plans are identified as not running if a task failed or was canceled, even if some tasks are still being executed. (#427)
- `IsPlanRunning` checks whether a cancelation or failure date is set before checking the task statuses. (#432)
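The running check described in this release can be sketched as follows. This is an illustrative Python model, not the project's Go code: the `Plan` dataclass, its field names, and the `is_plan_running` helper are hypothetical, and treating only `EXECUTING` as a running status is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Assumption: only this status counts as "still running" in the sketch.
RUNNING_STATUSES = {"EXECUTING"}


@dataclass
class Plan:
    """Hypothetical, simplified view of a compute plan."""
    cancelation_date: Optional[datetime] = None
    failure_date: Optional[datetime] = None
    task_statuses: List[str] = field(default_factory=list)


def is_plan_running(plan: Plan) -> bool:
    # A cancelation or failure date settles the question first:
    # the plan is not running, whatever its tasks are doing.
    if plan.cancelation_date is not None or plan.failure_date is not None:
        return False
    # Otherwise the plan is running if at least one task is running.
    return any(status in RUNNING_STATUSES for status in plan.task_statuses)
```

The date check comes first so that a plan with a failed or canceled task is reported as not running even while sibling tasks are still executing.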
0.41.0 - 2024-06-03
No significant changes.
0.40.0 - 2024-03-27
- BREAKING: remove `type` from `datamanager` table (#394)
- [chore] `towncrier` is now used for changelog management (#395)
0.39.0 - 2024-03-07
- Compute task status `DOING` is renamed `EXECUTING` (#371)
0.38.0 - 2024-02-26
- BREAKING: remove all code related to the `distributed` mode, and mentions in schemas and documentation (#341)
- BREAKING: `distributed` Skaffold profile and mentions in doc (#319)
- BREAKING: `chaincode-init` and `chaincode` Dockerfiles (#319)
- Flag & environment variables to choose between `standalone` and `distributed` mode (#347)
- Add `Image` addressable to `Function` object. By default, `Image` is set to an empty string addressable. This addressable is updated when the `checksum` and `storageAddress` are available (#288)
- Enum `FailedAssetKind` (#277)
- BREAKING: Field `asset_type` of type `FailedAssetKind` in `FailureReport` (#277)
- BREAKING: Add `FunctionStatus` (#263)
- Add Function status event machine (#263)
- BREAKING: Add statuses `WAITING_FOR_BUILDER_SLOT` and `BUILDING` on tasks to reflect associated function status (#366)
- Add task actions `BUILD_STARTED` and `BUILD_FINISHED` to propagate status change from function to compute task (#366)
- Rename `Function` addressable to `Archive` (#288)
- Renamed `compute_task_key` to `asset_key` in `FailureReport` (#277)
- `FailureReport` can now reference a `ComputeTask` or a `Function` through `asset_key` + `asset_type` (#277)
- Logic to determine new compute task status takes into account the status of the function. A new task can now be created with the status `FAILED` or `CANCELLED` (if the function reached the corresponding status) (#365)
- BREAKING: Transition to status `TODO` for a given compute task is done after the function is built (#365)
- BREAKING: Rename `TODO` to `WAITING_FOR_EXECUTOR_SLOT` and `WAITING` to `WAITING_FOR_PARENT_TASKS` (#366)
- incorrect link in documentation (#307)
- Skip compute task permission checks when the action is propagated from the function status change (#367)
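A client that still holds pre-rename task status names may need to translate them. The sketch below is a minimal, hypothetical helper: the mapping comes from the 0.38.0 entry above and the 0.39.0 entry (`DOING` to `EXECUTING`), while the `migrate_status` function and the use of bare status strings (without any enum prefix) are assumptions made for illustration.

```python
# Renames taken from the 0.38.0 and 0.39.0 entries of this changelog.
STATUS_RENAMES = {
    "TODO": "WAITING_FOR_EXECUTOR_SLOT",
    "WAITING": "WAITING_FOR_PARENT_TASKS",
    "DOING": "EXECUTING",
}


def migrate_status(name: str) -> str:
    """Return the current name for a task status (hypothetical helper).

    Names that were not renamed are returned unchanged.
    """
    return STATUS_RENAMES.get(name, name)
```

Unrenamed statuses pass through untouched, so the helper can be applied to every stored status value without filtering first.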
0.37.0 - 2023-10-18
- Source code formatting (#313)
- Bump go version to 1.21 (#316)
0.36.1 - 2023-10-06
- `three_orgs` Skaffold profile for standalone orchestrator (#280)
0.36.0 - 2023-09-07
- Replace deprecated ioutil package with os (#269)
0.35.2 - 2023-07-25
- Minor dependency updates. See commit history for more details.
0.35.1 - 2023-06-27
- Minor dependency updates. See commit history for more details.
0.35.0 - 2023-06-12
- No changes on the app
0.34.0 - 2023-05-11
- A Performance is now unique with respect to a compute task key, a metric key and a compute task output identifier (#197)
- Metric from Performance (#213)
0.33.0 - 2023-03-31
- BREAKING: rename Algo to Function (#139)
0.32.0 - 2023-01-31
- Context in fsm calls (#127)
- Contributing, contributors & code of conduct files (#123)
- Test Only field for data samples (#116)
0.31.1 - 2023-01-09
- bump app version to `0.31.0`
- end-2-end postgres dump upload (server version)
- (BREAKING) task category
- (BREAKING) task specific data
- TASK_UNKNOWN is a valid category
- Allow registration of performances on any task category
- Update the TLS certificates (#91)
- allow setting gRPC keepalive enforcement policy
- (BREAKING) `delete_intermediary_models` property in `ComputePlan` and `NewComputePlan`
- (BREAKING) ModelCategory and associated Model & NewModel fields
- (BREAKING) AlgoCategory and associated Algo and NewAlgo fields
- failure reports: Build errors now have a logs address
- (BREAKING): Replaced `algo` by `algo_key` in ComputeTask
- 000042_reference_task_outputs use `compute_task_key` instead of `asset_key` for `assetKey`
WARNING: Some migrations in this version are destructive: once applied, you will not be able to restore algo categories.
- (BREAKING) Algo.category: do not rely on categories anymore, all algo categories will be returned as UNKNOWN
- NewAlgo.category: No category is expected
- (BREAKING): Replaced `algo` by `algo_key` in ComputeTask
- Task outputs events come after their referenced asset creation event
- Worker field on NewAggregateTrainTaskData, use NewComputeTask.Worker field instead
- DisableModel rpc in Model service, use DisableOutput in ComputeTask service instead
- CanDisableModel rpc in Model service.
- `failure_date` field to `ComputePlan` protocol buffer schema
- `IsComputePlanRunning` gRPC method in `ComputePlanService`
- (BREAKING) compute plan status
- Worker field on NewComputeTask mandatory for tasks without input data
- NewAggregateTrainTaskData.worker: use NewComputeTask.Worker field instead
- `NewComputeTask.parent_task_keys` which was deprecated since 0.26.0
- Restriction on algo-task category matching
- Restriction on parent task category
- Images are built using `protoc-gen-go` v1.18.1, `grpc_health_probe` v0.4.12 and `migrate` v1.28.1
- Disable output RPC on distributed mode
- (BREAKING) ModelService.GetComputeTaskInputModels, use ComputeTaskAPI.GetInputAssets instead
- Test task rank special case: rank is not inherited from parent task anymore
- (BREAKING) QueryModels from model service: it was unused and model category will soon be deprecated
- `NewComputeTask.parent_task_keys` is deprecated since parent tasks are determined from task inputs
- New RPC to disable task outputs
- ModelService.GetComputeTaskInputModels, use ComputeTaskAPI.GetInputAssets instead
- Properly register compute task outputs in distributed mode
- (BREAKING) Remove RabbitMQ
- New service methods to update algo, compute_plan and data manager name
- gRPC method to get task input assets
- Prevent duplicate model registration based on task output definition
- Switched to zerolog logging library
- Build images with go 1.19
- Add a `Transient` field to the task inputs
- Return an error in distributed mode if a stored event has invalid event or asset kind
- Associate asset with task output on registration
- Task counts by status from ComputePlan responses
- Introduce gRPC SubscribeToEvents method in distributed mode
- Validate task inputs
- In standalone mode, lock the `events` table when inserting events to prevent missing events in `SubscribeToEvents` gRPC stream
- Category filter from QueryAlgos rpc
- Legacy compute task permission fields
- Introduce gRPC SubscribeToEvents method in standalone mode
- Dispatch updated asset event on ComputePlan cancellation
- Automatic transition to DONE when registering models or performances.
- updated grpc healthprobe to 0.4.11 in server image
- updated rabbitmq/amqp091-go lib to 1.4.0
- properly ignore mocks when building image locally
- SQL query for organization with null address
- Organization hostname in the organization object
- CancelationDate in the compute plan object
- SQL logging was enabled when `METRICS_ENABLED` flag was passed instead of documented `LOG_SQL_VERBOSE`
- Prevent disabling model if task has only predict or test children
- Don't timeout when canceling a compute plan
- Metadata set in events
- (BREAKING) Removed the `MetricKeys` property of test tasks in favor of the generic `Algo` field
- Enable transition to DONE through ApplyTaskAction
- (BREAKING) rename node to organization
- allow a worker to cancel a task it does not own
- Introduce Predict task type
- Introduce compute task outputs
- use go test to run e2e tests
- Introduce empty compute plan status
- base docker image from alpine 3.15 to alpine 3.16
- event asset migration
- only update status on task update.
- `conn busy` error when querying Tasks
- `conn busy` error when querying Algos
- In standalone mode, truncate TimeService time to microsecond resolution to match PostgreSQL timestamp resolution.
- Disable CGO.
- More validation of Algo inputs (data managers / data samples)
- Introduce compute task inputs; existing tasks won't have any inputs
- Embed historical assets in the event messages.
- New mandatory name field to compute plan
- Remove event column
- Add a new `ALGO_PREDICT` algo category
- Validate algo inputs and outputs
- Remove model asset column
- Remove performance asset column
- Remove datamanager asset column
- Remove datasample asset column
- Algos now have Inputs and Outputs
- The orchestrator-server doesn't run DB migrations on startup anymore
- `ASSET_METRIC` kind
- Build with go 1.18
- Update failure report asset column migration to prevent null value error when migrating a populated database
- Parent tasks keys format validation
- Order parent tasks keys by task position
- Added ALGO_METRICS Algo category
- Remove compute task asset column
- QueryAlgos filter "Category" is now "Categories"
- Remove failure report asset column
- Metrics gRPC routes. Use Algo gRPC routes and ALGO_METRICS category instead.
- Allow querying datasamples by keys
- Remove node asset column
- Remove algo asset column
- Remove compute plan asset column
- Do not panic on nil filter
- Expose gRPC metrics
- Expose database transaction and events metrics
- Expose task metrics
- Log SQL errors regardless of log level
- Remove `asset` column of `nodes` table
- Publish events sequentially, preserving the order
- add support for graceful shutdown on `SIGTERM` signal
- removed codegen layer and implicit protojson serialization
- Cancel all tasks when cancelling a compute plan
- Check for compute plan existence on task registration
- Disallow registration of tasks on a compute plan you don't own
- add `Start` and `End` timestamp filters for `EventQueryFilter`
- support composite tasks with two composite parents
- Add migration logs
- add owner field to failure report asset
- Add a new endpoint to register multiple models at the same time
- return `datasamples` list in `RegisterDataSamplesResponse`
- return `tasks` list in `RegisterTasksResponse`
- store the error type of a failed compute task in a failure report instead of an event
- improve performance of `compute_tasks` SQL indexes by using dedicated columns instead of JSONB
- improve performance of compute plan queries by leveraging a specific index for status count
- isolation level of read-only queries in standalone mode is now READ COMMITTED
- improve performance of model SQL indexes by using dedicated columns instead of JSONB
- set the correct name of the `RegisterFailureReport` service method used in distributed mode
- Return the correct models in `GetComputeTaskInputModels` for composite tasks
- timestamp comparison when performing event sorting and filtering in PostgreSQL
- ComputePlan query now uses correct SQL indexes
- Incorrect sort order when checking parent task compatibility
- `RegisterModel` gRPC method
- add a `logs_permission` field to the Dataset asset
- add a `GetDataSample` method to the DataSample service
- add filter for compute plan query
- chaincode now properly propagates the request ID in every log
- log events as JSON
- add FailureReport asset to store compute task failure information
- sort queried events
- expose basic metrics from server, chaincode and forwarder behind `METRICS_ENABLED` feature flag
- filter queried events on metadata
- (BREAKING) Replace objective by metric
- (BREAKING) Multiple metrics and performances per test task
- fail gRPC healthcheck and stop serving on message broker disconnection
- Get task counts grouped by status when querying compute plans
- Events queried from the gRPC API now have their channel properly set
- Leverage asset_key index when querying events
- Stable sorting of tasks
- Expose the orchestrator version and chaincode version
- Expose worker in task event metadata
- Assets expose a creation date
- Query algo by compute plan
- Handle event backlog
- Retry on fabric timeout
- Add request ID to log context
- Do not retry on assets out of sync
- Do not compute plan status on model deletion
- Reuse gateway connection in distributed mode
- Replace readinessProbe by startupProbe
- Do not cascade canceled status
- Properly retry on postgres' serialization error
- Filtering events by asset in distributed mode
- Input models for composite child of aggregate
- Automatic generation of graphviz documentation from *.proto file definition
- asset management
- asset event dispatch
- standalone database (postgresql) support
- distributed ledger (hyperledger-fabric) support