CHANGELOG
Release Notes - Mesos - Version 1.9.0 (WIP)
-------------------------------------------
This release contains the following highlights:

  * Security:

    * A new libprocess flag `--hostname_validation_scheme` has been added.
      This allows users to enable a new RFC 6125-compliant hostname verification
      scheme based on primitives provided by OpenSSL. This will also improve
      performance by getting rid of all reverse DNS lookups. (MESOS-9784)

    * The use of anonymous cipher suites is now disallowed when TLS certificate
      verification is enabled. (MESOS-9810)

  * Containerization:

    * [MESOS-9760] - A new `--docker_ignore_runtime` flag has been
      added. This causes the agent to ignore any runtime configuration
      present in Docker images.

    * [MESOS-9770] - Add no-new-privileges isolator. An additional
      Linux isolator has been added to support enabling the no_new_privs
      process control flag.

    * [MESOS-9771] - The Mesos containerizer now masks sensitive paths
      in `/proc` for containers that do not share the host's PID namespace.
Additional API Changes:

  * Mesos components will now forgo TLS certificate validation for incoming
    connections unless `LIBPROCESS_SSL_REQUIRE_CERT` is set to true.

  * The `Socket::connect(const Address&)` member function will now abort the
    program when called on a `LibeventSSLSocket`. Instead, the new overload
    `Socket::connect(const Address&, const TLSClientConfig&)` must be used.
    NOTE: This new overload is only available when libprocess is compiled
    with `--enable-ssl`.
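
As a rough analogy for the TLS items in this release (it is not Mesos or
libprocess code), the Python sketch below builds a client context that requires
a peer certificate, lets the TLS library verify the hostname, and excludes
anonymous cipher suites. The function name `make_strict_client_context` is
hypothetical, and the cipher string is just one way to express "no anonymous
suites" in OpenSSL's syntax.

```python
import ssl


def make_strict_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks."""
    # For clients, create_default_context() turns on CERT_REQUIRED and
    # check_hostname, so the peer name is matched against the certificate.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Exclude anonymous (unauthenticated) and null-encryption cipher suites.
    ctx.set_ciphers("HIGH:!aNULL:!eNULL")
    return ctx


ctx = make_strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The point of the sketch is the pairing: once certificate verification is on,
anonymous suites make no sense, because they skip the authentication that
verification depends on.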
Release Notes - Mesos - Version 1.8.2 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9785] - Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
Release Notes - Mesos - Version 1.8.1
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9395] - Check failure on `StorageLocalResourceProviderProcess::applyCreateDisk`.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9730] - Executors cannot reconnect with agents using TLS1.3
* [MESOS-9750] - Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown.
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9779] - `UPDATE_RESOURCE_PROVIDER_CONFIG` agent call returns 404 ambiguously.
* [MESOS-9782] - Random sorter fails to clear removed clients.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9803] - Memory leak caused by an infinite chain of futures in `UriDiskProfileAdaptor`.
* [MESOS-9831] - Master should not report disconnected resource providers.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
** Improvement
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
Release Notes - Mesos - Version 1.8.0
-------------------------------------
This release contains the following highlights:

  * Performance Improvements:

    * Frameworks can now specify the minimum resource quantities needed
      in an offer, which acts as an override of the global
      `--min_allocatable_resources` master flag. Updating schedulers to
      specify this field improves multi-scheduler scalability, as it
      reduces the number of offers declined for having insufficient
      resource quantities. Note that this feature currently requires that
      the scheduler re-subscribe each time it wants to change the
      minimum resource quantity offer filter information; see MESOS-7258.

    * The batching mechanism used for requests to the master's `/state`
      endpoint was extended to other read-only master endpoints like
      `/state-summary`, `/frameworks`, `/roles`, etc. (see MESOS-9158)
      In addition, responses for multiple concurrent requests to read-only
      master endpoints are now only computed once in cases where it can be
      guaranteed that all responses would be equal. (see MESOS-9224)
      This should significantly increase master responsiveness under
      heavy load.

    * Allocator cycle time is significantly decreased (around 40% for a
      small cluster and up to 70% for larger clusters) when quota is
      used. This greatly narrows the allocator performance gap between
      quota and non-quota usage scenarios.

  * CLI:

    * The new Mesos CLI now offers the task subcommand. The first
      command, attach, allows you to attach your terminal to a running
      task launched with a tty. The second command, exec, launches a
      new nested container inside a running task. To build the CLI,
      use the flag `--enable-new-cli` with Autotools and
      `-DENABLE_NEW_CLI=1` with CMake on macOS or Linux.

  * Operation Feedback:

    * V1 schedulers can now receive operation feedback for operations on agent
      default resources, i.e. normal cpu, memory, and disk. This means that the
      v1 scheduler API's operation feedback feature can now be used for all
      non-task-launch operations (any offer operations except for LAUNCH and
      LAUNCH_GROUP) on any type of resources.

    * The experimental operation feedback API for v1 schedulers made a breaking
      change: the RECONCILE_OPERATIONS call no longer returns a 200 OK response
      with a body containing the full reconciliation results. Instead, a
      successful request now returns 202 Accepted, and a series of operation
      status updates are sent on the scheduler's event stream to satisfy the
      reconciliation request. This is similar to the way in which the master
      replies to requests for task status reconciliation.

  * Containerization:

    * [MESOS-9029] - New `linux/seccomp` isolator: Containers launched
      by Mesos containerizer can be sandboxed by enabling filtering of
      system calls using a configurable policy.

    * [MESOS-9675] - Support pulling docker images with docker manifest
      V2 Schema2 on Mesos Containerizer.

    * [MESOS-9133] - Support custom port range option to the `network/ports`
      isolator. Added the `--container_ports_isolated_range` flag to the
      `network/ports` isolator. This allows the operator to specify a custom
      port range to be protected by the isolator.

    * [MESOS-5158] - Support XFS quota for persistent volumes. Added
      persistent volume support to the `disk/xfs` isolator.

    * [MESOS-9009] - Support an option to create non-existing host
      paths for host path volume in Mesos Containerizer. Added a new
      agent flag `--host_path_volume_force_creation` for the
      `volume/host_path` isolator.

  * Container Storage Interface (CSI):

    * **Experimental** Added support for the new CSI v1 API. Operators can
      deploy plugins that are compatible with either CSI v0 or v1 to create
      persistent volumes through storage local resource providers, and Mesos
      will automatically detect which CSI versions are supported by the plugins.
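
The per-framework minimum resource quantities described under Performance
Improvements can be pictured with a small Python sketch. This is not the
allocator's implementation. It assumes that an offer passes the filter when it
satisfies at least one of the configured quantity sets, and that an empty
filter accepts everything. The names `satisfies`, `is_allocatable`, and
`framework_filter` are hypothetical.

```python
from typing import Dict, List

Quantities = Dict[str, float]


def satisfies(offer: Quantities, minimum: Quantities) -> bool:
    """True if the offer holds at least the required amount of every resource."""
    return all(offer.get(name, 0.0) >= amount for name, amount in minimum.items())


def is_allocatable(offer: Quantities, minimum_sets: List[Quantities]) -> bool:
    """Assumption: no filter accepts everything; otherwise one matching set suffices."""
    return not minimum_sets or any(satisfies(offer, m) for m in minimum_sets)


# Hypothetical per-framework filter: at least 1 CPU, or at least 512 MB of memory.
framework_filter = [{"cpus": 1.0}, {"mem": 512.0}]

print(is_allocatable({"cpus": 0.5, "mem": 1024.0}, framework_filter))  # True
print(is_allocatable({"cpus": 0.5, "mem": 128.0}, framework_filter))   # False
```

A scheduler that declares such a filter never sees the second offer, which is
what avoids the decline round trip.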
Additional API Changes:

  * [MESOS-9540] - Improved the experimental `DESTROY_DISK` operations so
    frameworks can now deprovision any unwanted pre-provisioned CSI volume
    directly, if they are authorized to perform `DESTROY_RAW_DISK` actions.
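
For schedulers affected by the RECONCILE_OPERATIONS change noted in the
Operation Feedback highlight, the following Python sketch shows one way a
client might build the call and interpret the response code. The JSON field
layout is an assumption based on the v1 scheduler API and should be checked
against the protobuf definitions. The function names are hypothetical, and no
HTTP request is actually sent.

```python
import json
from typing import Dict, List


def build_reconcile_call(framework_id: str, operation_ids: List[str]) -> str:
    """Serialize a RECONCILE_OPERATIONS call (field layout is an assumption)."""
    call: Dict = {
        "framework_id": {"value": framework_id},
        "type": "RECONCILE_OPERATIONS",
        "reconcile_operations": {
            "operations": [{"operation_id": {"value": oid}} for oid in operation_ids]
        },
    }
    return json.dumps(call)


def results_arrive_on_event_stream(status_code: int) -> bool:
    """Decide where to look for reconciliation results.

    202 Accepted: results follow as operation status updates on the event stream.
    200 OK: earlier experimental behavior, results are in the response body.
    """
    if status_code == 202:
        return True
    if status_code == 200:
        return False
    raise RuntimeError(f"unexpected response to RECONCILE_OPERATIONS: {status_code}")
```

A client written this way keeps working against a master that still uses the
earlier experimental behavior, at the cost of carrying both code paths.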
Unresolved Critical Issues:
* [MESOS-9697] - Release RPMs are not uploaded to bintray
* [MESOS-9672] - Docker containerizer should ignore pids of executors that do not pass the connection check.
* [MESOS-9654] - `PUBLISH_RESOURCES` should fail if the resource version changes.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9609] - Master check failure when marking agent unreachable
* [MESOS-9579] - ExecutorHttpApiTest.HeartbeatCalls is flaky.
* [MESOS-9560] - ContentType/AgentAPITest.MarkResourceProviderGone/1 is flaky
* [MESOS-9536] - Nested container launched with non-root user may not be able to write to its sandbox via the environment variable
* [MESOS-9520] - IOTest.Read hangs on Windows
* [MESOS-9500] - spark submit with docker image on mesos cluster fails.
* [MESOS-9426] - ZK master detection can become forever pending.
* [MESOS-9393] - Fetcher crashes extracting archives with non-ASCII filenames.
* [MESOS-9365] - Windows - GET_CONTAINERS API call causes the Mesos agent to fail
* [MESOS-9355] - Persistent volume does not unmount correctly with wrong artifact URI
* [MESOS-9352] - Data in persistent volume deleted accidentally when using Docker container and Persistent volume
* [MESOS-9306] - Mesos containerizer can get stuck during cgroup cleanup
* [MESOS-9180] - tasks get stuck in TASK_KILLING on the default executor
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-8946] - CURL 7.58 causes Mesos to fail decoding raw responses.
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8679] - If the first KILL gets stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8608] - RmdirContinueOnErrorTest.RemoveWithContinueOnError fails.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8096] - Enqueueing events in MockHTTPScheduler can lead to segfaults.
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-7971] - PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5754] - CommandInfo.user not honored in docker containerizer
* [MESOS-2842] - Master crashes when framework changes principal on re-registration
All Resolved Issues:
** Bug
* [MESOS-5048] - MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
* [MESOS-5189] - SSLTest.ProtocolMismatch is slow
* [MESOS-6874] - Agent silently ignores FS isolation when protobuf is malformed
* [MESOS-6949] - SchedulerTest.MasterFailover is flaky
* [MESOS-6990] - PartitionTest.TaskCompletedOnPartitionedAgent is flaky.
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7076] - libprocess tests fail when using libevent 2.1.8
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-7564] - Introduce a heartbeat mechanism for v1 HTTP executor <-> agent communication.
* [MESOS-7883] - Quota heuristic check not accounting for mount volumes
* [MESOS-8156] - Add a socketpair helper to the stout net API
* [MESOS-8343] - SchedulerHttpApiTest.UpdatePidToHttpScheduler is flaky.
* [MESOS-8467] - Destroyed executors might be used after `Slave::publishResource()`.
* [MESOS-8470] - CHECK failure in DRFSorter due to invalid framework id.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8547] - Mount devpts with compatible defaults.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8782] - Transition operations to OPERATION_GONE_BY_OPERATOR when marking an agent gone.
* [MESOS-8783] - Transition pending operations to OPERATION_UNREACHABLE when an agent is removed.
* [MESOS-8797] - Check failed in the default executor while running `MesosContainerizer/DefaultExecutorTest.TaskUsesExecutor/0` test.
* [MESOS-8835] - mesos-tests takes a long time to execute no tests
* [MESOS-8872] - OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky.
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9056] - mesos-style.py messaging is poor
* [MESOS-9074] - Pylint is too noisy when using mesos-style.py
* [MESOS-9079] - Test MasterTestPrePostReservationRefinement.LaunchGroup is flaky.
* [MESOS-9089] - Test `PartitionTest.PartitionAwareTaskCompletedOnPartitionedAgent` is flaky.
* [MESOS-9112] - mesos-style reports violations on a clean checkout
* [MESOS-9124] - Agent reconfiguration can cause master to REVIVE on scheduler's behalf
* [MESOS-9130] - Test `StorageLocalResourceProviderTest.ROOT_ContainerTerminationMetric` is flaky.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9143] - MasterQuotaTest.RemoveSingleQuota is flaky.
* [MESOS-9168] - Libprocess' http client does not encode the outgoing query.
* [MESOS-9172] - Fetcher deadlock with duplicated URIs.
* [MESOS-9179] - ./support/python3/mesos-gtest-runner.py --help crashes
* [MESOS-9186] - Failed to build Mesos with Python 3.7 and new CLI enabled
* [MESOS-9187] - Add allocator benchmark to allow multiple framework/agent profiles.
* [MESOS-9190] - Test `StorageLocalResourceProviderTest.ROOT_CreateDestroyDiskRecovery` is flaky.
* [MESOS-9193] - Mesos build fails with Clang 3.5.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries
* [MESOS-9212] - Disable SIGCHLD handling in libev.
* [MESOS-9214] - Stout.FsTest.Used fails on macOS
* [MESOS-9217] - LongLivedDefaultExecutorRestart is flaky.
* [MESOS-9222] - Linking libevent should be avoided.
* [MESOS-9225] - Github's mesos/modules does not build.
* [MESOS-9228] - SLRP does not clean up plugin containers after it is removed.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9232] - verify-reviews.py broken after enabling python3 support scripts
* [MESOS-9240] - CSI protobuf build fails when dependency tracking is disabled.
* [MESOS-9253] - Reviewbot is failing when posting a review
* [MESOS-9266] - Whenever our packaging tasks trigger errors we run into permission problems.
* [MESOS-9274] - v1 Java scheduler library can drop TEARDOWN upon destruction.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9281] - SLRP gets a stale checkpoint after system crash.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9293] - If a framework loses operation information it cannot reconcile to acknowledge updates.
* [MESOS-9295] - Nested container launch could fail if the agent is upgraded with new cgroup subsystems.
* [MESOS-9300] - XFS isolator can mislabel project IDs on persistent volumes.
* [MESOS-9302] - Mesos fails to build on Fedora 28
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9316] - FsTest.Used is flaky
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9319] - Move root filesystem creation to the `filesystem/linux` isolator.
* [MESOS-9324] - Resource fragmentation: frameworks may be starved of port resources in the presence of a large number of frameworks with quota.
* [MESOS-9331] - Some library functions ignore failures from ::close which should probably be handled.
* [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returns.
* [MESOS-9350] - CLI build step is broken with CMake due to missing file.
* [MESOS-9354] - Automatically remount read-only bind mounts.
* [MESOS-9357] - FetcherTest.DuplicateFileURI fails on macOS
* [MESOS-9358] - Test `SlaveRecoveryTest.AgentReconfigurationWithRunningTask` is flaky.
* [MESOS-9362] - Test `CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively` is flaky.
* [MESOS-9366] - Test `HealthCheckTest.HealthyTaskNonShell` can hang.
* [MESOS-9367] - GetContainers call crashes when using XFS disk isolation.
* [MESOS-9370] - Unable to build new Mesos CLI with PyInstaller and Python 3.7.
* [MESOS-9382] - mesos-gtest-runner doesn't work on systems without ulimit binary
* [MESOS-9390] - Warnings in AdaptedOperation prevent clang build
* [MESOS-9397] - PosixRLimitsIsolatorTest.UnsetLimits is broken on macOS 10.14.2 beta3.
* [MESOS-9398] - post-reviews.py fails to update an existing chain.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9417] - User mesosphere made lots of incorrect ticket updates
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9434] - Completed framework update streams may retry forever
* [MESOS-9459] - Reviewbot is not verifying reviews that need verification
* [MESOS-9462] - Devices in a container are inaccessible due to `nodev` on `/var/run`.
* [MESOS-9469] - Mesos does not validate framework-supplied FrameworkIDs
* [MESOS-9474] - Master does not respect authorization result for `CREATE_DISK` and `DESTROY_DISK`.
* [MESOS-9479] - SLRP does not set RP ID in produced OperationStatus.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9495] - Test `MasterTest.CreateVolumesV1AuthorizationFailure` is flaky.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOswitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9505] - `make check` failed with linking errors when c-ares is installed.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9508] - Official 1.7.0 tarball can't be built on Ubuntu 16.04 LTS.
* [MESOS-9514] - Reviewboard bot fails on verify-reviews.py.
* [MESOS-9517] - SLRP should treat gRPC timeouts as non-terminal errors, instead of reporting OPERATION_FAILED.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9519] - Unable to build Mesos with CMake on Ubuntu 14.04.
* [MESOS-9521] - MasterAPITest.OperationUpdatesUponAgentGone is flaky
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true
* [MESOS-9531] - chown error handling is incorrect in createSandboxDirectory.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9537] - SLRP sends inconsistent status updates for dropped operations.
* [MESOS-9542] - Hierarchical allocator check failure when an operation on a shutdown framework finishes
* [MESOS-9544] - SLRP does not clean up destroyed persistent volumes.
* [MESOS-9549] - nvidia/cuda 10 does not work on GPU isolator.
* [MESOS-9554] - Allocator might skip allocations because a single framework is incapable of receiving certain resources.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
* [MESOS-9557] - Operations are leaked in Framework struct when agents are removed
* [MESOS-9559] - OPERATION_UNREACHABLE and OPERATION_GONE_BY_OPERATOR updates don't include the agent/RP IDs
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace
* [MESOS-9568] - SLRP does not clean up mount directories for destroyed MOUNT disks.
* [MESOS-9573] - Agent should not try to recover operation status update streams that haven't been created yet.
* [MESOS-9574] - Operation status update streams are not properly garbage collected.
* [MESOS-9582] - Reviewbot Jenkins job stops validating any reviews as soon as it sees a patch which does not apply
* [MESOS-9590] - Mesos CI sometimes, incorrectly, overwrites already-pushed mesos master nightly images with new images built from non-master branches.
* [MESOS-9592] - Mesos Websitebot is flaky
* [MESOS-9597] - Status update streams for operations affecting agent default resources should be stored under "meta/slaves/<slave_id>/operations/"
* [MESOS-9605] - mesos/mesos-centos nightly docker image has to include the SHA of the build.
* [MESOS-9607] - Removing a resource provider with consumers breaks resource publishing.
* [MESOS-9610] - Fetcher vulnerability - escaping from sandbox
* [MESOS-9612] - Resource provider manager assumes all operations are triggered by frameworks
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9621] - Mesos failed to build due to error LNK2019 on Windows using MSVC.
* [MESOS-9629] - Pylint reports cyclic dependencies in cli_new
* [MESOS-9635] - OperationReconciliationTest.AgentPendingOperationAfterMasterFailover is flaky again (3x) due to orphan operations
* [MESOS-9637] - Impossible to CREATE a volume on resource provider resources over the operator API
* [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
* [MESOS-9667] - Check failure when executor for task using resource provider resources subscribes before agent is registered.
* [MESOS-9688] - Quota is not enforced properly when subroles have reservations.
* [MESOS-9691] - Quota headroom calculation is off when subroles are involved.
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9696] - Test MasterQuotaTest.AvailableResourcesSingleDisconnectedAgent is flaky
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9711] - Avoid shutting down executors registering before a required resource provider.
* [MESOS-9712] - StorageLocalResourceProviderTest.CsiPluginRpcMetrics is flaky.
* [MESOS-9727] - Heartbeat calls from executor to agent are reported as errors.
* [MESOS-9729] - Unpublishing a volume that failed to publish crashes the agent with CSI v1.
* [MESOS-9733] - Random sorter generates non-uniform result for hierarchical roles.
* [MESOS-9740] - Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents from reregistering with 1.8+ masters
** Epic
* [MESOS-8054] - Feedback for operations
* [MESOS-8345] - Improve master responsiveness while serving state information.
* [MESOS-9029] - Seccomp syscall filtering in Mesos containerizer
* [MESOS-9211] - Make the new Mesos CLI production ready
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
** Story
* [MESOS-907] - Add Kerberos Authentication support
** Improvement
* [MESOS-4036] - Install instructions for CentOS 6.6 lead to errors running `perf`.
* [MESOS-4599] - ReviewBot should re-verify a review chain if any of the reviews is updated
* [MESOS-5158] - Provide XFS quota support for persistent volumes.
* [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
* [MESOS-6934] - Support pulling Docker images with V2 Schema 2 image manifest
* [MESOS-7124] - Replace monadic type get() functions with operator*
* [MESOS-7947] - Add GC capability to nested containers
* [MESOS-8025] - Update the master field in the new CLI config to accept a URL instead of an <ip:port>
* [MESOS-8206] - Add the pip-requirements from other modules to the pylint virtual environment
* [MESOS-8380] - Update WebUI to show local resource providers.
* [MESOS-8403] - Add agent HTTP API operator call to mark local resource providers as gone
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-8999] - Add default bodies for libprocess HTTP error responses.
* [MESOS-9133] - Make the range of ports protected by the network/ports isolator configurable.
* [MESOS-9158] - Parallel serving of state-related read-only requests in the Master.
* [MESOS-9194] - Extend request batching to '/roles' endpoint
* [MESOS-9223] - Storage local provider does not sufficiently handle container launch failures or errors
* [MESOS-9224] - De-duplicate read-only requests to master based on principal.
* [MESOS-9239] - Improve sorting performance in the DRF sorter.
* [MESOS-9249] - Avoid dirtying the DRF sorter when allocating resources.
* [MESOS-9255] - Use consistent "totals" across role / framework DRF.
* [MESOS-9258] - Prevent subscribers to the master's event stream from leaking connections
* [MESOS-9275] - Allow optional `profile` to be specified in `CREATE_DISK` offer operation.
* [MESOS-9292] - Rejected quotas request error messages should specify which resources were overcommitted.
* [MESOS-9301] - Add flag to disable per-framework metrics.
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
* [MESOS-9315] - Adding support for implicit allocation of mandatory custom resources in Mesos
* [MESOS-9321] - Add an optional `vendor` field in `Resource.DiskInfo.Source`.
* [MESOS-9340] - Log all socket errors in libprocess.
* [MESOS-9384] - Resource providers reported by master should reflect connected resource providers
* [MESOS-9406] - Allow for optionally unbundled leveldb from CMake builds.
* [MESOS-9486] - Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
* [MESOS-9504] - Use ResourceQuantities in the allocator and sorter to improve performance.
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
* [MESOS-9523] - Add per-framework allocatable resources matcher/filter.
* [MESOS-9540] - Support `DESTROY_DISK` on preprovisioned CSI volumes.
* [MESOS-9608] - Refactor and Improve `class ResourceQuantity`.
* [MESOS-9613] - Support seccomp `unconfined` option for whitelisting.
* [MESOS-9628] - Consider running tox as part of test suite, not as part of style checking
* [MESOS-9642] - Avoid reading host mount table when allocating a gid in GIDManager.
* [MESOS-9643] - Make setting volume ownership asynchronous in volume gid manager
* [MESOS-9655] - Improving SLRP tests for preprovisioned volumes.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
** Task
* [MESOS-4509] - Remove deprecated .json endpoints.
* [MESOS-5827] - Add example framework for using inverse offers
* [MESOS-6551] - Add attach/exec commands to the Mesos CLI
* [MESOS-6630] - Add some benchmark test for quota allocation
* [MESOS-6840] - Tests for quota capacity heuristic.
* [MESOS-8241] - Add metrics for offer operation feedback
* [MESOS-8528] - Design Doc for Storage External Resource Provider (SERP) support.
* [MESOS-8770] - Use Python3 for Mesos support scripts
* [MESOS-8810] - Grant non-root task user the permissions to access the SANDBOX_PATH volume of PARENT type
* [MESOS-8813] - Allow multiple tasks with different users to access a persistent volume.
* [MESOS-8957] - Install Python 3 on Mesos CI instances
* [MESOS-8975] - Problem and solution overview for the slow API issue.
* [MESOS-9009] - Support for creating non-existing host paths in a whitelist as source paths
* [MESOS-9032] - Update build scripts to support `seccomp-isolator` flag and `libseccomp` library
* [MESOS-9033] - Add Seccomp-related protobufs
* [MESOS-9034] - Implement a wrapper class for `libseccomp` API
* [MESOS-9035] - Implement `linux/seccomp` isolator
* [MESOS-9099] - Add allocator quota tests regarding reserve/unreserve already allocated resources.
* [MESOS-9105] - Implement Docker Seccomp profile parser.
* [MESOS-9106] - Add seccomp filter into containerizer launcher.
* [MESOS-9229] - Install Python3 on ubuntu-16.04-arm docker image
* [MESOS-9265] - Analyse and pinpoint libprocess SSL failures when using libevent 2.1.8.
* [MESOS-9270] - Get rid of dependency on `net-tools` in network/cni isolator.
* [MESOS-9278] - Add an operation status update manager to the agent
* [MESOS-9318] - Consider providing better operation status updates while an RP is recovering
* [MESOS-9333] - Document usage and build of new Mesos CLI
* [MESOS-9356] - Make agent atomically checkpoint operations and resources
* [MESOS-9392] - Implement tests for Seccomp parser
* [MESOS-9396] - Use the built CLI binary when running new CLI integration tests in CI
* [MESOS-9399] - Update 'mesos task list' to only list running tasks
* [MESOS-9409] - Implement Seccomp isolator tests
* [MESOS-9471] - Master should track operations on agent default resources.
* [MESOS-9472] - Unblock operation feedback on agent default resources.
* [MESOS-9473] - Add end to end tests for operations on agent default resources.
* [MESOS-9477] - Documentation for operation feedback
* [MESOS-9525] - Agent capability for operation feedback on default resources
* [MESOS-9535] - Master should clean up operations from downgraded agents
* [MESOS-9538] - Agent `ReconcileOperations` handler should handle operation affecting default resources
* [MESOS-9578] - Document per framework minimal allocatable resources in framework development guides
* [MESOS-9596] - Add a new `UPDATE_QUOTA` operator call.
* [MESOS-9604] - Clean up `QuotaRequest` and `QuotaInfo`.
* [MESOS-9615] - Example framework for feedback on agent default resources
* [MESOS-9620] - Add metrics for volume gid manager
* [MESOS-9622] - Refactor SLRP with a CSI volume manager.
* [MESOS-9623] - Implement CSI volume manager with CSI v1.
* [MESOS-9624] - Bundle CSI spec v1.0 in Mesos.
* [MESOS-9625] - Make `DiskProfileAdaptor` agnostic to CSI spec version.
* [MESOS-9626] - Make SLRP pick the appropriate CSI versions for plugins.
* [MESOS-9632] - Refactor SLRP with a CSI service manager.
* [MESOS-9639] - Make CSI plugin RPC metrics agnostic to CSI versions.
* [MESOS-9648] - Make operation reconciliation send asynchronous updates
* [MESOS-9651] - Design for docker registry v2 schema2 basic support.
* [MESOS-9676] - Add prettyjws support for docker v2 s1 manifest.
* [MESOS-9694] - Refactor UCR docker store to construct 'Image' protobuf at Puller.
** Documentation
* [MESOS-9036] - Document `linux/seccomp` isolator
Release Notes - Mesos - Version 1.7.3 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-8467] - Destroyed executors might be used after `Slave::publishResource()`.
* [MESOS-9124] - Agent reconfiguration can cause master to unsuppress on scheduler's behalf.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true.
* [MESOS-9549] - nvidia/cuda 10 does not work on GPU isolator.
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace.
* [MESOS-9568] - SLRP does not clean up mount directories for destroyed MOUNT disks.
* [MESOS-9607] - Removing a resource provider with consumers breaks resource publishing.
* [MESOS-9610] - Fetcher vulnerability - escaping from sandbox.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9661] - Agent crashes when SLRP recovers dropped operations.
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9750] - Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown.
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9785] - Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9803] - Memory leak caused by an infinite chain of futures in `UriDiskProfileAdaptor`.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
** Improvement
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9540] - Support `DESTROY_DISK` on preprovisioned CSI volumes.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
Release Notes - Mesos - Version 1.7.2
-------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-9210] - Mesos v1 scheduler library does not properly handle SUBSCRIBE retries.
* [MESOS-9517] - SLRP should treat gRPC timeouts as non-terminal errors, instead of reporting OPERATION_FAILED.
* [MESOS-9531] - chown error handling is incorrect in createSandboxDirectory.
* [MESOS-9532] - ResourceOffersTest.ResourceOfferWithMultipleSlaves is flaky.
* [MESOS-9533] - CniIsolatorTest.ROOT_CleanupAfterReboot is flaky.
* [MESOS-9537] - SLRP sends inconsistent status updates for dropped operations.
* [MESOS-9544] - SLRP does not clean up destroyed persistent volumes.
* [MESOS-9554] - Allocator might skip allocations because a single framework is incapable of receiving certain resources.
* [MESOS-9555] - Allocator CHECK failure: reservationScalarQuantities.contains(role).
** Improvement
* [MESOS-9340] - Log all socket errors in libprocess.
Release Notes - Mesos - Version 1.7.1
-------------------------------------
* This is a bug fix release. It also includes performance and API
improvements:
* **Allocator**: Improved allocation cycle time substantially
(see MESOS-9239 and MESOS-9249). These reduce the allocation
cycle time in some benchmarks by 80%.
* **Scheduler API**: Improved the experimental `CREATE_DISK` and
`DESTROY_DISK` operations for CSI volume recovery (see MESOS-9275
and MESOS-9321). Storage local resource providers now return disk
resources with the `source.vendor` field set, so frameworks need to
upgrade the `Resource` protobuf definitions.
* **Scheduler API**: Offer operation feedback now includes agent
IDs and resource provider IDs (see MESOS-9293).
** Bug
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.
* [MESOS-9152] - Close all file descriptors except whitelist_fds in posix/subprocess.
* [MESOS-9154] - MasterTest.TaskStateMetrics is flaky
* [MESOS-9164] - Subprocess should unset CLOEXEC on whitelisted file descriptors.
* [MESOS-9228] - SLRP does not clean up plugin containers after it is removed.
* [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
* [MESOS-9266] - Whenever our packaging tasks trigger errors we run into permission problems.
* [MESOS-9267] - Mesos agent crashes when CNI network is not configured but used.
* [MESOS-9274] - v1 JAVA scheduler library can drop TEARDOWN upon destruction.
* [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
* [MESOS-9281] - SLRP gets a stale checkpoint after system crash.
* [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
* [MESOS-9293] - If a framework loses operation information it cannot reconcile to acknowledge updates.
* [MESOS-9295] - Nested container launch could fail if the agent is upgraded with new cgroup subsystems.
* [MESOS-9308] - URI disk profile adaptor could deadlock.
* [MESOS-9317] - Some master endpoints do not handle failed authorization properly.
* [MESOS-9324] - Resource fragmentation: frameworks may be starved of port resources in the presence of a large number of frameworks with quota.
* [MESOS-9332] - Nested container should run as the same user as its parent container by default.
* [MESOS-9334] - Container stuck at ISOLATING state because libevent poll never returns.
* [MESOS-9362] - Test `CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively` is flaky.
* [MESOS-9411] - Validation of JWT tokens using HS256 hashing algorithm is not thread safe.
* [MESOS-9418] - Add support for the `Discard` blkio operation type.
* [MESOS-9419] - Executor to framework message crashes master if framework has not re-registered.
* [MESOS-9474] - Master does not respect authorization result for `CREATE_DISK` and `DESTROY_DISK`.
* [MESOS-9479] - SLRP does not set RP ID in produced OperationStatus.
* [MESOS-9480] - Master may skip processing authorization results for `LAUNCH_GROUP`.
* [MESOS-9492] - Persist CNI working directory across reboot.
* [MESOS-9501] - Mesos executor fails to terminate and gets stuck after agent host reboot.
* [MESOS-9502] - IOSwitchboard cleanup could get stuck due to FD leak from a race.
* [MESOS-9505] - `make check` failed with linking errors when c-ares is installed.
* [MESOS-9508] - Official 1.7.0 tarball can't be built on Ubuntu 16.04 LTS.
* [MESOS-9518] - CNI_NETNS should not be set for orphan containers that do not have network namespace.
* [MESOS-9519] - Unable to build Mesos with CMake on Ubuntu 14.04.
** Improvement
* [MESOS-6765] - Make the Resources wrapper "copy-on-write" to improve performance.
* [MESOS-9239] - Improve sorting performance in the DRF sorter.
* [MESOS-9249] - Avoid dirtying the DRF sorter when allocating resources.
* [MESOS-9255] - Use consistent "totals" across role / framework DRF.
* [MESOS-9275] - Allow optional `profile` to be specified in `CREATE_DISK` offer operation.
* [MESOS-9305] - Create cgroup recursively to work around systemd deleting cgroups_root.
* [MESOS-9321] - Add an optional `vendor` field in `Resource.DiskInfo.Source`.
* [MESOS-9325] - Optimize `Resources::filter` operation.
* [MESOS-9486] - Set up `object.value` for `CREATE_DISK` and `DESTROY_DISK` authorizations.
* [MESOS-9510] - Disallowed nan, inf and so on in `Value::Scalar`.
* [MESOS-9516] - Extend `min_allocatable_resources` flag to cover non-scalar resources.
Release Notes - Mesos - Version 1.7.0
-------------------------------------
This release contains the following highlights:
* Performance Improvements:
* **Master `/state` endpoint:** Adopted RapidJSON and reduced
copying for a ~130% throughput improvement due to a ~55%
decrease in latency (MESOS-9092). Also, added parallel
processing of `/state` requests to reduce master backlogging
/ interference under high request load (MESOS-9122).
* **Allocator:** Improved allocator cycle time significantly
(MESOS-9087). This, together with the reduced master
backlogging from `/state` improvements, reduces the
end-to-end offer cycling time between Mesos and schedulers.
* **Agent `/containers` endpoint:** Fixed a performance issue
that caused high latency / cpu consumption when there are
many containers on the agent (MESOS-8418).
* **Agent container launching performance improvements:**
The expensive `cgroups::verify()` calls were removed, which
provides a significant improvement to container launch /
destroy throughput (MESOS-9081).
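The relationship between the latency and throughput figures quoted for the
master `/state` endpoint above can be checked with a little arithmetic. The
sketch below assumes a simple model in which throughput is inversely
proportional to per-request latency; that model and the helper name
`throughput_gain` are illustrative assumptions, not taken from the Mesos
benchmarks. Because the quoted figures are approximate, only rough agreement
should be expected.

```python
def throughput_gain(latency_decrease: float) -> float:
    """Relative throughput gain for a fractional latency decrease.

    Assumes throughput is inversely proportional to latency
    (an illustrative model, not the actual benchmark methodology).
    """
    if not 0.0 <= latency_decrease < 1.0:
        raise ValueError("latency_decrease must be in [0, 1)")
    return 1.0 / (1.0 - latency_decrease) - 1.0


# A 55% latency decrease gives roughly a 122% throughput gain under this
# model, in the same range as the ~130% quoted above.
print(f"{throughput_gain(0.55):.0%}")
```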
* Containerization:
* [MESOS-8794] - **Experimental** Supported docker image tarball
fetching from HDFS through the `--docker_registry` agent flag.
* [MESOS-7691] - Added a new option `cgroups/all` to the agent
flag `--isolation`. This allows the cgroups isolator to
automatically load all the locally enabled cgroups subsystems.
If this option is specified in the agent flag `--isolation`
along with other cgroups related options
(e.g., `cgroups/cpu`), those options will simply be ignored.
* [MESOS-7947] - Added a new `--gc_non_executor_container_sandboxes`
option which tells the agent to garbage collect sandboxes created
via the LAUNCH_NESTED_CONTAINER API. The same flag will apply to
standalone container sandboxes in future.
* [MESOS-8327] - Added container-specific cgroups mounts under
`/sys/fs/cgroup` to containers with an image launched by the Mesos
containerizer.
* [MESOS-5647] - Expose network statistics for containers on
CNI network in the `network/cni` isolator.
* [MESOS-8792] - Added a new `linux/devices` isolator that
automatically populates containers with devices that have
been whitelisted with the `--allowed_devices` agent flag.
* [MESOS-8340] - Added a new `--enforce_container_ports`
option to toggle ports resource enforcement by the
`network/ports` isolator.
* [MESOS-6451] - Add timer and percentile metrics for docker
pull latency distribution.
* Windows:
* [MESOS-8668] - Added support to libprocess for the Windows
Thread Pool API, replacing libevent with the native Windows
event and thread pool library. This can be enabled with
`-DENABLE_LIBWINIO=ON` during CMake configuration. By
utilizing I/O Completion Ports, this enables non-blocking
asynchronous I/O on Windows for sockets, pipes, and files.
* Multi-Framework Workloads:
* [MESOS-8842] - **Experimental** Added per-framework metrics
to the master. These new metrics provide detailed information
about the behavior of each framework and can help with
scalability testing, debugging, and fine grained monitoring.
Please refer to docs/monitoring.md for more details.
* [MESOS-8238] - Documentation was added in the framework
development guide to provide recommendations on how schedulers
can behave co-operatively in a multi-framework setting, as
well as how to operationally configure Mesos in such a setting.
* [MESOS-8936] - A new weighted random sorter was added as an
alternative to the existing DRF sorter; this allows users
who don't need DRF behavior to opt out.
Additional API Changes:
* [MESOS-9066] - Introduced `CREATE_DISK` and `DESTROY_DISK` offer
operations to replace `CREATE_VOLUME`, `CREATE_BLOCK`,
`DESTROY_VOLUME` and `DESTROY_BLOCK`.
* Container logger module interface has been changed. The `prepare()` method
now takes `ContainerID` and `ContainerConfig` instead.
* `Isolator::recover` interface has been changed to take an `std::vector`
instead of `std::list`.
* JSON endpoints now use rapidjson to provide a performance improvement.
As a result, clients whose JSON deserializer does not conform to the
ECMA-404 spec for JSON may break. As an example,
Mesos would previously serialize '/' as '\/', but the spec does not
require the escaping and rapidjson does not escape '/'.
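To illustrate the '/' escaping change described above, here is a minimal
sketch of how a spec-conforming client sees both serializations. It uses
Python's standard `json` module as an example of a conforming deserializer;
the `path` key and its value are hypothetical and do not come from an actual
Mesos endpoint response.

```python
import json

# Older serialization: '/' escaped as '\/' (hypothetical payload).
escaped = r'{"path": "\/var\/lib\/mesos"}'

# rapidjson-style serialization: '/' left unescaped.
unescaped = '{"path": "/var/lib/mesos"}'

# A conforming parser decodes both forms to the same value, so such
# clients should be unaffected by the change.
assert json.loads(escaped) == json.loads(unescaped)
print(json.loads(escaped)["path"])
```

Clients that compare raw response text, or that rely on a hand-written parser
expecting the escaped form, are the ones likely to need changes.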
Changes to Dependencies:
* [MESOS-8395] - Made gRPC a requirement for Mesos builds. The `--enable-grpc`
Autotools option and the `-DENABLE_GRPC=ON` CMake option are now removed.
* [MESOS-8064] - Mesos now requires libarchive to programmatically decode
.zip, .tar, .gzip, and other common file compression schemes. Version 3.3.2
is bundled in Mesos.
* [MESOS-9092] - Adopt rapidjson for improved json serialization performance.
Version 1.1.0 is bundled in Mesos.
Unresolved Critical Issues:
* [MESOS-1718] - Command executor can overcommit the agent.
* [MESOS-2554] - Slave flaps when using --slave_subsystems that are not used for isolation.
* [MESOS-2774] - SIGSEGV received during process::MessageEncoder::encode()
* [MESOS-2842] - Update FrameworkInfo.principal on framework re-registration
* [MESOS-3747] - HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
* [MESOS-5396] - After failover, master does not remove agents with same UPID.
* [MESOS-5989] - Libevent SSL Socket downgrade code accesses uninitialized memory / assumes single peek is sufficient.
* [MESOS-5995] - Protobuf JSON deserialisation does not accept numbers formatted as strings
* [MESOS-6632] - ContainerLogger might leak FD if container launch fails.
* [MESOS-7076] - libprocess tests fail when using libevent 2.1.8
* [MESOS-7386] - Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
* [MESOS-7566] - Master crash due to failed check in DRFSorter::remove
* [MESOS-7622] - Agent can crash if a HTTP executor tries to retry subscription in running state.
* [MESOS-7721] - Master's agent removal rate limit also applies to agent unreachability.
* [MESOS-7748] - Slow subscribers of streaming APIs can lead to Mesos OOMing.
* [MESOS-7911] - Non-checkpointing framework's tasks should not be marked LOST when agent disconnects.
* [MESOS-7991] - fatal, check failed !framework->recovered()
* [MESOS-8038] - Launching GPU task sporadically fails.
* [MESOS-8137] - Mesos agent can hang during startup.
* [MESOS-8256] - Libprocess can silently deadlock due to worker thread exhaustion.
* [MESOS-8257] - Unified Containerizer "leaks" a target container mount path to the host FS when the target resolves to an absolute path
* [MESOS-8522] - `prepareMounts` in Mesos containerizer is flaky.
* [MESOS-8623] - Crashed framework brings down the whole Mesos cluster
* [MESOS-8679] - If the first KILL is stuck in the default executor, all other KILLs will be ignored.
* [MESOS-8703] - Mesos master can't reconnect to ZooKeeper
* [MESOS-8731] - mesos master APIs become latent
* [MESOS-8769] - Agent crashes when CNI config not defined
* [MESOS-8803] - Libprocess deadlocks in a test.
* [MESOS-8840] - `cpu.cfs_quota_us` may be accidentally set for command task using docker during agent recovery.
* [MESOS-8927] - Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.
* [MESOS-9006] - The agent's GET_AGENT leaks resource information when using authorization
* [MESOS-9022] - Race condition in task updates could cause missing event in streaming
* [MESOS-9049] - Agent GC could unmount a dangling persistent volume multiple times.
* [MESOS-9053] - Network ports isolator can falsely trigger while destroying containers.
* [MESOS-9109] - Windows agent uses reserved character :(colon) for file name and crashes when attempting to remove link
* [MESOS-9131] - Health checks launching nested containers while a container is being destroyed lead to unkillable tasks
* [MESOS-9157] - cannot pull docker image from dockerhub
* [MESOS-9169] - docker image fetching fails
All Resolved Issues:
** Bug
* [MESOS-2199] - Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
* [MESOS-3202] - Avoid role/framework offer starvation in DRF allocator.
* [MESOS-3475] - TestContainerizer should not modify global environment variables.
* [MESOS-3790] - ZooKeeper connection should retry on EAI_NONAME
* [MESOS-5371] - Implement `fcntl.hpp`
* [MESOS-5904] - Process routes implementation seems to drop routes on Windows.
* [MESOS-6092] - Docker containerizer launch command may access a "Container" struct after it has been destroyed
* [MESOS-6622] - NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage is flaky
* [MESOS-6823] - bool/UserContainerLoggerTest.ROOT_LOGROTATE_RotateWithSwitchUserTrueOrFalse/0 is flaky
* [MESOS-6985] - os::getenv() can segfault
* [MESOS-7032] - Mesos fails NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
* [MESOS-7168] - Agent should validate that the nested container ID does not exceed certain length.
* [MESOS-7220] - 'EXPECT_SOME' and other asserts don't work with 'Try's that have a custom error state.
* [MESOS-7342] - Port Docker tests
* [MESOS-7397] - apply-reviews.py silently fails when using chain mode.
* [MESOS-7658] - apply-reviews.py fails with Unicode characters
* [MESOS-7966] - check for maintenance on agent causes fatal error
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8134] - SlaveTest.ContainersEndpoint is flaky due to getenv crash.
* [MESOS-8429] - Clean up endpoint socket if the container daemon is destroyed while waiting.
* [MESOS-8499] - Change docker health check image to the new nanoserver one
* [MESOS-8567] - Test UriDiskProfileTest.FetchFromHTTP is flaky.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8613] - Test `MasterAllocatorTest/*.TaskFinished` is flaky.
* [MESOS-8626] - The 'allocatable' check in the allocator is problematic with multi-role frameworks
* [MESOS-8686] - Mesos build failed with /permissive- + MSVC on windows
* [MESOS-8687] - Check failure in `ProcessBase::_consume()`.
* [MESOS-8786] - CgroupIsolatorProcess accesses subsystem processes directly.
* [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent volume data
* [MESOS-8838] - Consider validating that resubscribing resource providers do not change their name or type
* [MESOS-8857] - Fix subprocess(flags) logic on Windows to handle arguments with quotes
* [MESOS-8871] - Agent may fail to recover if the agent dies before image store cache checkpointed.
* [MESOS-8873] - StorageLocalResourceProviderTest.ROOT_ZeroSizedDisk is flaky.
* [MESOS-8875] - `leveldb::PosixEnv::DeleteFile()` can segfault.
* [MESOS-8884] - Flaky `DockerContainerizerTest.ROOT_DOCKER_MaxCompletionTime`.
* [MESOS-8892] - MasterSlaveReconciliationTest.ReconcileDroppedOperation is flaky
* [MESOS-8897] - ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky
* [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile selectors.
* [MESOS-8913] - Resource provider manager registry leaks file descriptors into executors.
* [MESOS-8917] - Agent leaking file descriptors into forked processes
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions
* [MESOS-8932] - Quota guarantee metric does not handle removal correctly.
* [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and memory-only offers.
* [MESOS-8945] - Master check failure due to CHECK_SOME(providerId).
* [MESOS-8952] - process::await/collect n^2 performance issue
* [MESOS-8954] - python3/post-reviews.py errors due to TypeError.
* [MESOS-8958] - LinuxDevicesIsolatorTest.ROOT_PopulateWhitelistedDevices fails on some boxes.
* [MESOS-8963] - Executor crash trying to print container ID.
* [MESOS-8970] - Tests relying on metrics segfault on some Linux distros.
* [MESOS-8977] - BuildBot uses Docker with AUFS that has a max file length limit of 242 characters
* [MESOS-8979] - python3/push-commits.py fails due to TypeError
* [MESOS-8980] - mesos-slave can deadlock with docker pull
* [MESOS-8985] - Posting to the operator api with 'accept recordio' header can crash the agent
* [MESOS-8987] - Master asks agent to shutdown upon auth errors.
* [MESOS-9000] - Operator API event stream can miss task status updates.
* [MESOS-9007] - XFS disk isolator doesn't clean up project ID from symlinks
* [MESOS-9008] - Fetcher fails to extract some archives containing hardlinks
* [MESOS-9010] - `UPDATE_STATE` can race with `UPDATE_OPERATION_STATUS` for a resource provider.
* [MESOS-9014] - MasterAPITest.SubscribersReceiveHealthUpdates is flaky
* [MESOS-9025] - The container which joins CNI network and has checkpoint enabled will be mistakenly destroyed by agent
* [MESOS-9027] - GPU Isolator still depends on cgroups/devices agent flag given cgroups/all is supported.
* [MESOS-9037] - DefaultExecutorTest.SigkillExecutor is flaky
* [MESOS-9038] - Archiver utility extracts links within subdirectories incorrectly
* [MESOS-9039] - CNI isolator recovery should wait until unknown orphan cleanup is done
* [MESOS-9051] - Move agent call validation into common validation library.
* [MESOS-9065] - Apply the `override` keyword globally.
* [MESOS-9073] - Tox doesn't run in the support virtualenv when using Python 3 mesos-style.py
* [MESOS-9075] - Virtualenv management in support directory is buggy.
* [MESOS-9094] - On macOS libprocess_tests fail to link when compiling with gRPC
* [MESOS-9114] - cmake build is broken on macos
* [MESOS-9115] - Stout depends on missing rapidjson headers.
* [MESOS-9116] - Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.
* [MESOS-9125] - Port mapper CNI plugin might fail with "Resource temporarily unavailable"
* [MESOS-9127] - Port mapper CNI plugin might deadlock iptables on the agent.
* [MESOS-9137] - GRPC build fails to pass compiler flags
* [MESOS-9142] - CNI detach might fail due to missing network config file.
* [MESOS-9144] - Master authentication handling leads to request amplification.
* [MESOS-9145] - Master has a fragile burned-in 5s authentication timeout.
* [MESOS-9146] - Agent has a fragile burned-in 5s authentication timeout.
* [MESOS-9147] - Agent and scheduler driver authentication retry backoff time could overflow.
* [MESOS-9149] - Failed to build gRPC on Linux without OpenSSL.
* [MESOS-9151] - Container stuck at ISOLATING due to FD leak
* [MESOS-9156] - StorageLocalResourceProviderProcess can deadlock
* [MESOS-9160] - Failed to compile gRPC when the build path contains symlinks.
* [MESOS-9163] - `UriDiskProfileAdaptor` should not update profiles when a poll returns a non-OK HTTP status.
* [MESOS-9170] - Zookeeper doesn't compile with newer gcc due to format error
* [MESOS-9171] - Mesos agent crashes in CNI isolator when usage is queried
* [MESOS-9177] - Mesos master segfaults when responding to /state requests.
* [MESOS-9185] - An attempt to remove or destroy container in composing containerizer leads to segfault.
* [MESOS-9193] - Mesos build fails with Clang 3.5.
* [MESOS-9196] - Removing rootfs mounts may fail with EBUSY.
** Epic
* [MESOS-8564] - Port libprocess-tests suites to Windows
* [MESOS-8668] - Transition libprocess on Windows to use the Thread Pool API
* [MESOS-8705] - Composing containerizer improvements
* [MESOS-8842] - Per Framework Metrics on Master
* [MESOS-8916] - Allocation logic cleanup.
* [MESOS-9013] - Support container Cgroup FS mount.
** Improvement
* [MESOS-6451] - Add timer and percentile for docker pull latency distribution.
* [MESOS-7691] - Support local enabled cgroups subsystems automatically.
* [MESOS-7947] - Add GC capability to nested containers
* [MESOS-8064] - Add capability so mesos can programmatically decode .zip, .tar, .gzip, and other common file compression schemes
* [MESOS-8106] - Docker fetcher plugin unsupported scheme failure message is not accurate.
* [MESOS-8340] - Add a no-enforce option to the `network/ports` isolator.
* [MESOS-8418] - mesos-agent high cpu usage because of numerous /proc/mounts reads
* [MESOS-8680] - Rename variable names in slave.hpp to be more explicit.
* [MESOS-8788] - Add alg RS256 support for JWT generator and validator in libprocess
* [MESOS-8792] - Automatically create whitelisted devices.
* [MESOS-8798] - Build the "unsecure" gRPC libraries to remove SSL dependency.
* [MESOS-8829] - Get rid of extra `containerizer->wait()` calls in tests.
* [MESOS-8908] - Add -fno-omit-frame-pointer to improve debugging and profiling.
* [MESOS-8911] - Add framework metrics benchmark test.
* [MESOS-8919] - Per Framework SUBSCRIBE metrics.
* [MESOS-8920] - Support per-container container logger configuration.
* [MESOS-8924] - Refactor the libprocess gRPC wrapper.
* [MESOS-8955] - Manage Python2 and 3 in build steps
* [MESOS-8986] - `slave.available()` in the allocator is expensive and drags down allocation performance.
* [MESOS-8989] - Add a better benchmark for range type resources.
* [MESOS-8998] - Allow for unbundled libevent in CMake builds to work around 2.1.x SSL issues.
* [MESOS-9015] - Allow resources to be removed when updating the sorter.
* [MESOS-9055] - Make gRPC call deadline configurable.
* [MESOS-9067] - Improve performance of json parsing by avoiding conversion cost.
* [MESOS-9081] - cgroups::verify is expensive and is done implicitly during cgroups operations.
* [MESOS-9086] - Optimize range subtraction operation.
* [MESOS-9092] - Adopt rapidjson for improved json serialization performance.
* [MESOS-9104] - Refactor capability related logic in the allocator.
* [MESOS-9110] - Add move support to the Resources / Resource_ wrappers.
* [MESOS-9122] - Batch '/state' requests in the Master actor.
* [MESOS-9129] - Port mapper CNI plugin should use '-n' option with 'iptables --list'
* [MESOS-9213] - Avoid double copying of master->framework messages when incrementing metrics.
** Task
* [MESOS-2633] - Move implementations of Framework struct functions out of master.hpp.
* [MESOS-3442] - Port path_tests to Windows
* [MESOS-3444] - Port sendfile_tests
* [MESOS-5647] - Expose network statistics for containers on CNI network in the `network/cni` isolator.
* [MESOS-5814] - Port libprocess http_tests.cpp
* [MESOS-5817] - Port libprocess process_tests.cpp
* [MESOS-5941] - RemoteLink tests fail on Windows
* [MESOS-7329] - Authorize offer operations for converting disk resources
* [MESOS-7527] - Enable ProcessTest.THREADSAFE_Http2 on Windows.
* [MESOS-8314] - Add authorization to display of resource provider information in API calls and endpoints
* [MESOS-8327] - Add container-specific CGroup FS mounts under /sys/fs/cgroup/* to Mesos containers
* [MESOS-8383] - Add metrics for operations in Storage Local Resource Provider (SLRP).
* [MESOS-8395] - Made gRPC a requirement for Mesos builds.
* [MESOS-8473] - Authorize `GET_OPERATIONS` calls.
* [MESOS-8670] - Implement `process::io::read/write` using Thread Pool API
* [MESOS-8671] - Add EventLoop implementation using Thread Pool API
* [MESOS-8672] - Replace libprocess `PollSocketImpl` with IOCP and Thread Pool API
* [MESOS-8674] - Fix os::pipe to work in overlapped mode
* [MESOS-8681] - Clean up os::sendfile on Windows
* [MESOS-8712] - Remove `destroyed` promise from `Container` struct
* [MESOS-8713] - Synchronize result of `wait` and `destroy` composing c'zer methods
* [MESOS-8714] - Cleanup `containers_` hashmap once container exits
* [MESOS-8732] - Use composing containerizer in some agent tests.
* [MESOS-8734] - Restore `WaitAfterDestroy` test to check termination status of a terminated nested container.
* [MESOS-8736] - Implement a test which ensures that `wait` and `destroy` return the same result for a terminated nested container.
* [MESOS-8737] - Update composing containerizer tests.
* [MESOS-8774] - Authenticate and authorize calls to the resource provider manager's API
* [MESOS-8794] - Support docker image tarball hdfs based fetching.
* [MESOS-8814] - Mount the volume based on `Volume.mode`.
* [MESOS-8825] - Remove storage pools associated with missing profiles.
* [MESOS-8837] - Add test of resource provider manager recovery
* [MESOS-8843] - Per Framework CALL metrics
* [MESOS-8844] - Per Framework EVENT metrics
* [MESOS-8845] - Per Framework Operation metrics
* [MESOS-8846] - Per Framework state metrics
* [MESOS-8847] - Per Framework task state metrics
* [MESOS-8848] - Per Framework Offer metrics
* [MESOS-8849] - Per Framework resource allocation metrics
* [MESOS-8903] - Update the Python CLI to use Python 3
* [MESOS-8912] - Per Framework terminal task state metrics
* [MESOS-8931] - Add os::shell back to Windows
* [MESOS-8934] - Update python.m4 to support Python 3
* [MESOS-8936] - Implement a Random Sorter for offer allocations.
* [MESOS-8940] - Per Framework Offer metrics with a specific resource type
* [MESOS-8942] - Master streaming API does not send (health) check updates for tasks.
* [MESOS-8943] - Add metrics about CSI calls.
* [MESOS-8961] - Output of tasks gets corrupted if task defines the same environment variables as the executor container
* [MESOS-8990] - Build failure of the google-test dependency on Windows using MSVC.
* [MESOS-8995] - Add SLRP unit tests for missing profiles.
* [MESOS-8997] - Consider dropping PATH disk support for CSI volumes.
* [MESOS-9002] - GCC 8.1 build failure in os::Fork::Tree.
* [MESOS-9043] - Move check validators to the common validation library.
* [MESOS-9066] - Changing `CREATE_VOLUME` and `CREATE_BLOCK` to `CREATE_DISK`.
* [MESOS-9068] - Add a metrics benchmark in libprocess.
* [MESOS-9070] - Support systemd and freezer cgroup subsystems bind mount for container with rootfs.
* [MESOS-9148] - Make cgroups destroy timeout configurable for Mesos containerizer
** Documentation
* [MESOS-8740] - Update description of a Containerizer interface.
* [MESOS-9020] - Seccomp design doc
Release Notes - Mesos - Version 1.6.3 (WIP)
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-9124] - Agent reconfiguration can cause master to unsuppress on scheduler's behalf.
* [MESOS-9507] - Agent could not recover due to empty docker volume checkpointed files.
* [MESOS-9529] - `/proc` should be remounted even if a nested container set `share_pid_namespace` to true.
* [MESOS-9564] - Logrotate container logger lets tasks execute arbitrary commands in the Mesos agent's namespace.
* [MESOS-9616] - `Filters.refuse_seconds` declines resources not in offers.
* [MESOS-9619] - Mesos Master Crashes with Launch Group when using Port Resources
* [MESOS-9692] - Quota may be under allocated for disk resources.
* [MESOS-9695] - Remove the duplicate pid check in Docker containerizer
* [MESOS-9707] - Calling link::lo() may cause runtime error
* [MESOS-9766] - /__processes__ endpoint can hang.
* [MESOS-9786] - Race between two REMOVE_QUOTA calls crashes the master.
* [MESOS-9787] - Log slow SSL (TLS) peer reverse DNS lookup.
* [MESOS-9852] - Slow memory growth in master due to deferred deletion of offer filters and timers.
* [MESOS-9856] - REVIVE call with specified role(s) clears filters for all roles of a framework.
* [MESOS-9870] - Simultaneous adding/removal of a role from framework's roles and its suppressed roles crashes the master.
** Improvement
* [MESOS-8880] - Add minimum capabilities in the master.
* [MESOS-9159] - Support Foreign URLs in docker registry puller.
* [MESOS-9675] - Docker Manifest V2 Schema2 Support.
* [MESOS-9704] - Support docker manifest v2s2 config GC.
* [MESOS-9759] - Log required quota headroom and available quota headroom in the allocator.
Release Notes - Mesos - Version 1.6.2
-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7042] - Send SIGKILL after SIGTERM to IOSwitchboard after container termination.
* [MESOS-7474] - Mesos fetcher cache doesn't retry when missed.
* [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
* [MESOS-8418] - mesos-agent high CPU usage because of numerous /proc/mounts reads.
* [MESOS-8545] - AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
* [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
* [MESOS-8887] - Unreachable tasks are not GC'ed when unreachable agent is GC'ed.
* [MESOS-8907] - Docker image fetcher fails with HTTP/2.
* [MESOS-8917] - Agent leaking file descriptors into forked processes
* [MESOS-8921] - Autotools don't work with newer OpenJDK versions
* [MESOS-8978] - Command executor calling setsid breaks the tty support.
* [MESOS-9116] - Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.