psmgmt NEWS -- history of user-visible changes.
Copyright (C) 2010-2021 ParTec Cluster Competence Center GmbH, Munich
Copyright (C) 2021-2024 ParTec AG, Munich
Please send bug reports, questions and suggestions to <support@par-tec.com>
*************************************************************************
*************************************************************************
** **
** This release breaks all backward compatibility to psmgmt-5 due to **
** major changes in the core protocol for the sake of support of **
** PIDs with up to 31 bit (at the time Linux supports 22 bits) **
** **
** Rolling updates will *not* work and **
** might cause major harm to a system! **
** **
*************************************************************************
*************************************************************************
Version 6.0.0:
==============
See especially remarks in 6.0.beta1 for further changes
Bugfixes:
- Rework PSI_initClient() in view of (#269)
- MPI_Comm_spawn() must not be confused by environment
- Ensure PSP_DD_INHERITFAILED is actually sent
Enhancements:
- Ensure psmgmt-rrcomm-devel is self-contained (#268)
- Describe how to create a bugfix release (#273)
- Take the chance to reorganize message types
- Take the chance to reorganize hooks
- Add NVIDIA GH100 [H100 SXM5 94GB] to GPUDevices
- Utilize fragmentation layer for PSP_DD_SLOTSRES
- Always initialize fragmentation in libpsi and cleanup on exit
Additional changes:
- Introduce PSIDHOOK_RECEIVEPART and bump plugin API to 145
- Remove test_pse program and its supporting functions (#271)
- Replace PSP_CD_CREATEPART/NL by fragmented PSP_CD_REQUESTPART
- Replace PSP_DD_GETPART/NL by fragmented PSP_DD_CREATEPART
- Replace PSIDHOOK_CREATEPART/NL by single PSIDHOOK_REQUESTPART
- Replace PSP_DD_PROVIDEPART/SL by fragmented PSP_DD_PROVIDEPART
- Replace PSP_DD_PROVIDETASK/SL by fragmented PSP_DD_PROVIDETASK
- Replace PSP_DD_REGISTERPART/SL by fragmented PSP_DD_REGISTERPART
- Drop obsolete info types PSP_INFO_NROFNODES, PSP_INFO_RANKID, and
PSP_INFO_TASKSIZE
- Evict obsolete PSP_DD_PROVIDETASKSL and PSP_DD_REGISTERPARTSL types
- Evict obsolete PSP_[CD]D_GETNODES / PSP_[CD]D_NODESRES message types
- Add further test programs for (PMIx_)Spawn
- Adapt spawn_rrcomm to new size of PStask_ID_t
- IWYU fixes (unveiled by new version 0.23)
Version 5.1.63-5:
=================
Bugfixes:
- prevent use after free while growing psid's nodelist
Enhancements:
- Ensure freed list head might not get dereferenced
Version 5.1.63-4:
=================
Bugfixes:
- Ensure PMI_Abort() actually terminates job (#272)
Enhancements:
- Add test for PMI_Abort()
Version 6.0.beta1:
==================
Bugfixes:
- More consistent setup/calculation of message length
- Ensure CMD_BROKE_IO_CON is actually handled
- Ensure sendMsg() includes error message in PSID_sendCounter()
- Ensure stepforwarder's serialization actually sends
- Use correct type for local node ID
Enhancements:
- Make PStask_ID_t 64 bit wide to lift the limit on PIDs to 31 bits
- Bump protocol versions due to incompatible changes
- Handle all psslurm specific PSP messages through serialization layer
- Add NVIDIA GH100 [GH200 120GB / 480GB] to GPUDevices to psconfig defaults
- Evict extra debug used to dig down #159/jwt:#20747
Additional changes:
- Clean up all kinds of backward compatibility, message types, hooks, obsolete
functions, etc.
- Drop various ancient helper programs
- Send core part of tasks as PSDATA_DATA
- Introduce recvFragMsgInfo()
- Drop ability to build psmom
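The widening of PStask_ID_t to 64 bits noted above can be pictured as a node ID and a PID packed into a single integer. The following is a minimal sketch only: the field layout (PID in the low 32 bits, node ID above it) and the helper names make_task_id() and split_task_id() are assumptions made for illustration, not psmgmt's actual encoding.

```python
PID_BITS = 31                      # PID width named in the release notes
PID_FIELD = 32                     # assumed width of the field holding the PID
PID_MASK = (1 << PID_FIELD) - 1


def make_task_id(node_id, pid):
    """Pack a hypothetical (node_id, pid) pair into one 64-bit value."""
    if not 0 <= pid < (1 << PID_BITS):
        raise ValueError("pid does not fit into 31 bits")
    if not 0 <= node_id < (1 << (64 - PID_FIELD)):
        raise ValueError("node_id does not fit into the remaining bits")
    return (node_id << PID_FIELD) | pid


def split_task_id(task_id):
    """Return the (node_id, pid) pair stored in a packed task ID."""
    return task_id >> PID_FIELD, task_id & PID_MASK
```

Under this assumed layout a PID above the 22 bits currently used by Linux, such as 2**22, still round-trips through make_task_id() and split_task_id() without loss.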
Version 5.1.63-3:
=================
Bugfixes:
- Ensure buffers of according size are used (jwt:#33960)
Version 5.1.63-2:
=================
Bugfixes:
- Let psslurm set SLURM_NTASKS and SLURM_NPROCS (jwt:#33462)
Version 5.1.63-1:
=================
Bugfixes:
- Prevent reuse of destroyed strv_t (jwt:#33382)
- step credential ID must be calculated from job host-list (jwt:#33461)
- Prevent reuse of destroyed env_t
- Don't skip fast forwarded index while cgroup memory limits
- scan-build fix: fprintf() does not set errno
Enhancements:
- Catch broken PSP_CD_SPAWNREQUEST messages
- Ensure psserial does not leak stale pointers
- scan-build fix: don't leak FILE resources
- Prevent various false positive scan-build complains on
Version 5.1.63:
===============
Bugfixes:
- Actually use psslurm start time to fake node boot time (#260)
- Ensure jobscript stdout cannot be used directly by SPANK hooks
- Prevent leading comma in nodelist string
Enhancements:
- Initialize SPANK options from environment (JMS/S0/#35)
- Log remote address for dropped messages due to munge failures
- Adapt PSIDHOOK_SHUTDOWN according to (#245)
- Make pluginforwarder's sleep after failed fork flexible (#246)
- Let slurm_spank_log() log to child's stderr instead of syslog (#243)
- Let slurm_error() log to user's stderr in addition to syslog (#248)
- Allow to change SPANK log level using SlurmdDebug option
- pspmix: Add spawn infos "pspmix.srunconstraint", "pspmix.nodetype"
- pspmix: Add spawn infos "pspmix.mpiexecopts" and "pspmix.srunopts"
- pspmix: Always create server and session tmpdirs
- Use PSCio_setFDCloExec() to close psidforwarder's file descriptors on exec()
- Inform user if pspelogue is missing in prologue
- Fix memory leaks
Additional changes:
- Introduce reOpenSyslog() to plugincommon
- Introduce PSCio_setFDCloExec()
- Utilize get/addTaskId and get/addNodeId where appropriate
- pspmix: Consistently use v4.2 style info lists
- pspmix: Unify output in pspmixserver.c
- Utilize PMIx_Info_string() in pspmix dmodex request callback
- Evict all code parts only for PMIX_VERSION_MAJOR < 4
- Bump plugin API to 144
Version 5.1.62-2:
=================
Bugfixes:
- Ensure USER is always set for jobs and steps (jwt:#31717)
- Reclaim permissions and switch to correct user before task epilogue (jwt:#32154)
- Ensure a collect script pointer is still valid on shutdown (jwt:#31732)
- Adjust psslurmgetbind to weird slurmctld behavior and bump version to 23.11
Enhancements:
- Support external launchers (LXP:#1709)
Additional changes:
- Drop support for Slurm protocols 20.11 and 21.08
- Remove user managed I/O support dropped in Slurm 22.05
Version 5.1.62-1:
=================
Bugfixes:
- Only use GRes step credentials for step GPU pinning
- Emit warning in the correct if-branch
Version 5.1.62:
===============
Bugfixes:
- Prevent psaccount updates after child is gone (#183)
- Let psaccount restart collect scripts if they exit unexpectedly (#186)
- Retry failed fork attempts in pluginforwarder (#186)
- Fix segfault while parsing slurm.conf include statements (#242)
- Enable re-spawned children to determine parent's job-size in PMIx
- Ensure MESSAGE_TASK_EXIT, MESSAGE_NODE_REGISTRATION_STATUS, and
REQUEST_KILL_JOB contain all relevant info (#226)
- Ensure dependency is added for trigger in PSIDplugin
- Get rid of superfluous environment dump (#202)
Enhancements:
- Add support for Slurm 24.05 protocol
- Call SPANK_TASK_POST_FORK and SPANK_TASK_EXIT with root privileges (#180)
- Don't kill TG_PLUGINFW tasks unless killAdmTasks is set (#245)
- Create PMIx server's tmpdir to provide a place for dstors
- Introduce pset SPANK plugin
- Support concurrent re-spawns in pspmix (#230)
- Set MPIEXEC_UNIVERSE_SIZE from mpiexec -u (#225)
- Avoid detour via local psid (#234)
Additional changes:
- Drop support for Slurm protocol 20.02
- Use srun's colon syntax to re-spawn multiple apps in PMIx
- Add yet another shutdown phase to unload plugins gracefully
- Add --enable-psmom option to configure (#223)
- Rename execForwarder() -> execPluginForwarder() (#209)
- Check if task has TASKINFO_FORWARDER attached (#209)
- Omit obsolete PMIX_APP_SIZE_* environment (#222)
- Omit PMI remnant from pspmix (#235)
- Introduce PSIDplugin_finalizeAll()
- Replace StrBuffer_t, etc. by corresponding strbuf_t constructs (#218)
Version 5.1.61:
===============
Bugfixes:
- Ensure SLURMD_TRES_BIND and SLURMD_TRES_FREQ get set (#201)
- Do not exit psiadmin if handleRCfile() fails (#215)
- Cleanup pspmix's temporary directories (#143)
- Fix potential memory leak in strvAdd() (#208)
- Fix psslurm's marking thread usage in partition
- Ensure /etc/pam.d/psid is not installed without PAM
- Ensure cleanup if starting re-spawn's spawner fails (#199)
- Actually shred BCast credential and username
- Ensure psaccount plugin can be safely unloaded during psid's runtime
- Fix -Wcalloc-transposed-args of gcc 14
Enhancements:
- Add support for Slurm 23.11 protocol
- Introduce full support of PMIx_Spawn()
- psslurmgetbind: syntax checking for -B|--extra-node-info and add --version
- Ranks of service tasks fetched from logger where required (#192)
- Rework psslurm's pinning user messages
- Warn if configless is enabled in slurm.conf but not in psslurm
- Ensure hetjob handling does not depend on Slurm's backward compatibility
- Add plugin config list to pelogue plugin, replacing static sized array
- Detect explicit_bzero() in configure and protect shredding from optimizations
- Align log format of hetsteps to vanilla Slurm
- Do not send TERM_CLIENTS to empty recipients list in pspmix
Additional changes:
- Make [PSI_]sendSpawnReq() public and use it in pspmi[x]
- Remove PSI_sendSpawnMsg() and all its helpers (e.g. handling PSP_CD_SPAWNREQ)
- Introduce PSI_finReservation() and PSP_CD_FINRESERVATION
- Introduce PSIDHOOK_JOBCOMPLETE and PSIDHOOK_FILL_RESFINALIZED
=> bump plugin API to 142
- Add PSP_DD_RESFINALIZED and PSP_DD_JOBCOMPLETE
=> push PSDaemonProtocolVersion to 417
- Rename __PSSLURM_SPAWN_RANK environment to __PSSLURM_STEP_RANK
- Utilize env_t in implementation of psienv and introduce mergePSIEnv()
- Rename putPSIEnv() to addPSIEnv() similar to env_t
- Rework envPut() and rename it to envAdd(); add a new envPut() (#189)
- Split envCat() into envMerge() and envAppend()
- Introduce envEvict() to filter certain variables
- Change environment in PStask_t to env_t
- Evict obsolete injectedEnv member from PStask_t
- Update buildsystem to container versions
- Make logger_t opaque
- Load pspmix handles in psslurm and bump plugin version to 118
Version 5.1.60-2:
=================
Bugfixes:
- Ensure task structs are kept until inheritance is done to prevent
zombie entries in psiadmin
- psslurmgetbind: Set all threads in core map (jwt:#27756)
- Lots of additional fixes to the psslurmgetbind tool
Version 5.1.60-1:
=================
Bugfixes:
- psslurm: respect step core map with cpu-bind=sockets (psc:#449)
- psslurmgetbind: Adjust coremap generation (psc:#446)
- psslurmgetbind: Allow option combinations (psc:#451)
Enhancements:
- Print message if core requested to pin does not match step core map
- psslurmgetbind: Print core map if PSSLURM_PRINT_COREMAPS set
Additional changes:
- psslurmgetbind: Change default Slurm version to 23.02
Version 5.1.60:
===============
Bugfixes:
- Don't stop psaccount collect scripts on temporary error (#186)
- Ensure PSP_DD_CHILDRESREL is sent without delay (#177)
- Ensure -x and -H (-N, -f) work together in mpiexec (#160)
- psiadmin has to expect local answer on plugin (#167)
- Rework mpiexec's environment handling (#182)
- Ensure building works even with missing NUMA support (#12)
- Fix potential memory leak and unexpected NULL attribute unveiled by Clang
- Fix warnings unveiled by recent Clang versions
- Fix psidsession information when spawning w/out Slurm
Enhancements:
- Add psslurm configuration option SSTAT_USERS (psc:#438)
- Add extra variables to client environment (#188)
* PS_SESSION_ID, PS_JOB_ID, PS_RESERVATION_ID, PS_JOB_RANK, PS_JOB_SIZE
- Implement --pset option for mpiexec (#110)
- Add support for OCI containers to psslurm (via Slurm's oci.conf)
- Add PAM session support for user processes in psslurm
- Prepare for PMIx_Spawn() support -- this is not yet complete
- Close psidforwarder's logger connection on dropped FINALIZE message
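The extra client environment variables listed above (PS_SESSION_ID, PS_JOB_ID, PS_RESERVATION_ID, PS_JOB_RANK, PS_JOB_SIZE) can be inspected from within a client process. A minimal sketch, assuming nothing about the format of the values and therefore keeping them as plain strings; the helper name read_ps_environment() is hypothetical:

```python
import os

# Variable names as given in the release notes
PS_VARS = (
    "PS_SESSION_ID",
    "PS_JOB_ID",
    "PS_RESERVATION_ID",
    "PS_JOB_RANK",
    "PS_JOB_SIZE",
)


def read_ps_environment(environ=None):
    """Return the PS_* variables from environ; missing ones map to None."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in PS_VARS}


if __name__ == "__main__":
    for name, value in read_ps_environment().items():
        print(f"{name}={value}")
```

Run outside of a ParaStation job, the script simply reports None for every variable.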
Additional changes:
- Get rid of the OpenMPI tweaks in mpiexec
* These have long been obsoleted by native Slurm support
- Get rid of starters for ancient MPI versions
- PMIx tests: Add spawn and simple abort tests
- Add script to calculate cpumap from /proc/cpuinfo
- Rename __PMI_SPAWN_SERVICE_RANK to __SPAWNER_SERVICE_RANK
- Introduce PSIDHOOK_EXEC_CLIENT_EXEC and bump plugin API to 141
- Push rrcomm's plugin version to 2
- Make PSID_execFunc() usable outside psid itself
- Publish loadPlugin(), findPlugin() as PSIDplugin_load(), PSIDplugin_find()
- Move spawn definitions into common pluginspawn.[ch]
- Rework psidspawn's testExecutable() and changeToWorkDir()
- Introduce mmapFile() in pluginhelper
- Rework env_t to make it opaque, rework filter functionality, etc.
- Allow stealing of the whole array from strv_t
- IWYU fixes (after update to version 0.22)
- Add Intel Xeon MAX topology w/ and w/out flat mode recorded with v2.2
Version 5.1.59-5:
=================
Bugfixes:
- Ensure reading from cached pack info messages starts from the beginning
Version 5.1.59-4:
=================
Bugfixes:
- Adapt to changed return code in libbpf 1.0.0 and above (jwt:#26118)
- Fix handling of srun's --threads-per-core for pinning (psc:#443)
- Rework psslurm's fillHints() to avoid random 'invalid hint' (psc:#440)
Enhancements:
- Introduce PSSLURM_VERBOSE_PINNING to print additional information
- Suppress more error messages if quiet flag is given in bpf_loader
Version 5.1.59-3:
=================
Enhancements:
- Update bpf_cgroup_device for compatibility with Rocky 9
Additional changes:
- Rename bpf_cgroup_device to prevent automatic removal (#176)
Version 5.1.59-2:
=================
Bugfixes:
- Evict false positive warnings from syslog (#178)
Version 5.1.59-1:
=================
Bugfixes:
- Stop hard-blocking the build of BPF support in RHEL9, Rocky9, etc.
Version 5.1.59:
===============
Bugfixes:
- Adapt to new error behavior of Slurm 23.02 (jwt:#24558)
- Execute SPANK hooks in batchscript context (deep:#3325)
- Prevent possible DOS in psserial (#172)
- Ensure pluginforwarder sends sufficient data
- Prevent segfault when psaccount gets insufficient answer
- Prevent memory leak when pscommon's logger is re-initialized
- Set correct size for allocation of Req_Signal_Tasks_t
- Fix various memory leaks, double free(), etc. found during development
- Various fixes unveiled by scan-build
Enhancements:
- Add a generic energy readout script (psc:#411)
- Optimize PSI_spawnRsrvtn() (#155)
- Ensure slurmutils is installed with matching Slurm
- Set correct job and step memory limits for SPANK
- RRComm v2 supporting messages across jobs (created by MPI_Comm_spawn())
- Allow psmgmt to be built without SPANK support (#171)
Additional changes:
- Remove the cgroup plugin (obsoleted by jail) (#174)
- Rework PStask_t's handling of extra information (#156)
- Drop support for Slurm 19.05 protocol
- Drop support to spawn to PSProtocolVersion < 341 (before 2019)
- Use PS_DataBuffer for unpacking messages using psserial functions
- Enable the compiler to verify types for more psserial functions
- Introduce PSID_dbg() and replace PSID_log() as much as possible
- Introduce psslurmprototypes
- Remove unused PSI_spawnSingle()
- Various IWYU fixes
Version 5.1.58-1:
=================
Bugfixes:
- sss-nss re-uses closed fd (#159/jwt:#20747)
- prevent segfault from termJail() (jwt:#24361)
- ensure nLocalSlots is initialized (#162/jwt:#24377)
- POSIX does not guarantee that getpwuid_r() et al. set errno
Enhancements:
- librrcomm needs -fPIC to be linked into libpscom
*************************************************************************
*************************************************************************
** **
** Since the pspelogue executable was moved, the slurmctld prologue **
** has to be adapted accordingly. The new location is libexecdir, **
** i.e. /opt/parastation/libexec/psmgmt/pspelogue **
** **
** psmgmt 5.1.58-0 will require psconfig 5.2.10 or later to work **
** **
*************************************************************************
*************************************************************************
Version 5.1.58:
===============
Bugfixes:
- Pin only to cores allowed by the step in psslurm
- Fix support for spawn in PMI
- Ensure step forwarder task owns its strings
Enhancements:
- Add user-facing error messages to psslurm's pinning
- Handle hint compute_bound in psslurm
- Print coremaps to stderr for debugging if PSSLURM_PRINT_COREMAPS is set
- Make energy reporting manageable by srun (meluxina #1242)
- Cache user-facing messages until connection to srun appears
- Add support for libbpf versions 1.0.0 and above (cgroup v2 on Rocky 9)
Additional changes:
- Use share/psconfig/dumps for main psconfig dumps (requires psconfig 5.2.10)
- Add flag termAfterFWmsg to job and step to stop immediately after
delivery of queued messages to the user
- Move psid to sbin directory
- Move various helpers (pspelogue,ps_acc,psaccounter,psilogger) to libexecdir
- Major rework of psidsession (addressing tasks via session/job/app/rank)
- Utilize psid (instead of psilogger) to check validity of
installation directory
- Introduce PSIDHOOK_LAST_RESRELEASED, PSIDHOOK_LAST_CHILD_GONE and
bump plugin API version to 140
- Introduce the concept of sister partitions and push daemon protocol
to version 416
- Allow PSIDpart_register() to extend a partition
- Introduce PSID_flog() and PSID_fdbg()
- Introduce PSIDfwd_inForwarder()
- Move clrPartQueue() to PSpart_clrQueue() and make it public
- Rework strv_t behavior
Version 5.1.57-3:
=================
Bugfixes:
- Various fixes for cgroup v2 (jwt:#23342)
* Sanitize calculations of jail limits, especially memory
* Prevent race conditions leading to concurrent modifications of cgroup
Version 5.1.57-2:
=================
Bugfixes:
- Delete allocation before sending MESSAGE_EPILOG_COMPLETE (jwt:#23342)
Version 5.1.57-1:
=================
Bugfixes:
- Set correct state of allocation to ensure proper cleanup of processes
- Set the correct sequence when setting cgroup memory limits
- Ensure correct locking in cgroupv1
- Fix race condition and ensure all SSH processes get jailed into cgroups
- Ensure cgroupv2 controllers get enabled correctly
- Fix quoting and other errors unveiled by shellcheck in jail scripts
- Ensure local variable does not influence top level behavior
- Various checks if data was unavailable from message
- Avoid array boundary violation
Enhancements:
- Print messages from PMIx server log interface
- Improve cleanup of leftover cgroupv2 directories
- Add caller name to jail log functions
- Avoid superfluous mdsave call in cgroup jail script
- Avoid temporary buffers of excessive size
Additional changes:
- Various improvements in the build system (e.g. to support Rocky 9)
- Fix gcc-11 warning emitted via rpmbuild on Rocky 9
- Improvements of documentation comment
*************************************************************************
*************************************************************************
** **
** psmgmt 5.1.57-0 introduces support for cgroup v2. Since enhanced **
** cgroup support was moved into the jail-plugin machinery away from **
** the original cgroup-plugin, loading both plugins was useless but **
** not really harmful. This changes with psmgmt 5.1.57-0 when usage **
** of cgroup v2 support is enabled on the system and within Slurm. **
** In this specific case default settings for the cgroup-plugin will **
** clash with the expectations of the jail-plugin's cgroup machinery **
** and the standard setup of cgroups found on the systems **
** **
** Therefore it is highly recommended to: **
** **
** - properly setup the cgroups machinery of the jail-plugin **
** - disable the use of the old cgroup-plugin **
** **
*************************************************************************
*************************************************************************
Version 5.1.57:
===============
Bugfixes:
- Fix jail memory check (#118)
- Deny access to all devices not allocated by the user via cgroups
- Cleanup leftover cgroupv1 psid directories
- Do not let psslurm init fail if optional Spank plugins are broken
- Do not overwrite jail information if multiple GRes have devices
- Fix inet_ntop()'s second argument
Enhancements:
- Introduce support for cgroup v2
- Set default cgroup version to autodetect
- Add psslurm set options SPANK_LOAD, SPANK_UNLOAD and SPANK_FIN (#pct:428)
- Introduce support for port ranges in SlurmctldPort in psslurm
- Send SIGTERM only in the first round of killing cgroup processes
- psslurm takes hwloc info as default if SKIP_CORE_VERIFICATION is given
- Avoid lapsed constant PMIX_ERR_NOT_IMPLEMENTED in PMIx 4
- Mark processes started with PSID_execFunc() as non daemon
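The port-range support for SlurmctldPort mentioned above concerns values such as 6817-6818 in slurm.conf. A minimal sketch of expanding such a value follows; it illustrates the value format only, parse_port_spec() is a hypothetical helper, and psslurm's own parser may differ.

```python
def parse_port_spec(spec):
    """Expand "6817" or "6817-6819" into a list of port numbers."""
    first, sep, last = spec.strip().partition("-")
    low = int(first)
    high = int(last) if sep else low
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port specification: {spec!r}")
    return list(range(low, high + 1))
```

A single port yields a one-element list, so callers can treat both forms uniformly.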
Additional changes:
- Add support for changing IP addresses delivered in psmgmt-dynip
- Introduce PSC_traverseHostInfo() to avoid getaddrinfo()'s boilerplate code
- Introduce PSC_isDaemon() and rework PSC_setDaemonFlag()
- Add adjusted psconfig dumps for base and defaults configuration
- Refactor and streamline RDP
Version 5.1.56-2:
=================
Bugfixes:
- Ensure cached PSP_PACK_INFO info is handled (#131)
- Execute SPANK prologue/epilogue hooks without the presence of
a prologue/epilogue script
- Let doGetMsgBuf() (behind PSP_getMsgBuf() et al.) warn even if header.len is 0
Enhancements:
- Ensure cores is not used in gres.conf
- Include UID in PMIx' temp session dir name to help with node sharing
- Add sentence on ParaStation ID -1 to psiadmin's inline help
Version 5.1.56-1:
=================
Bugfixes:
- Set correct localNodeId for jobs on sister nodes (psc:#427)
- Prevent spank_api.so from being dlopen()ed twice
- Fix RPM's postun scriptlet
Enhancements:
- Add debugging output in psslurm's findGresCred()
Version 5.1.56:
===============
Bugfixes:
- Prevent segfault if resInfo is broken (jwt:#21216)
- Don't let exiting shell spoil output for steps with pty (jwt:#21114)
- Fix quoting to prevent the device jail script from generating unexpected files (dt:3151)
- Set correct cgroup configuration default if no cgroup.conf is present
- Ensure configuration updates will remove obsolete entries
- psslurm: Move SLURM_TRES_* setting place
- Fix one improbable segfault unveiled by gcc 13.1
Enhancements:
- pspmix: Use PSPMIX_ENV_TMOUT to steer environment timeout
- Add jail configuration option JAIL_INIT_SCRIPT and the corresponding script
- Save main psid PID to /run for jail scripts and pshc
- Let pluginforwarder jail pspmix server processes
- psslurm: Rework GPU pinning
- Avoid silent fails in GPU pinning
- Introduce spank_prepend_task_argv() expected for Slurm 23.11
- Let RPMs depend on PMIx minor but release version
Additional changes:
- pspmix: Print input to server_grp_cb()
Version 5.1.55-2:
=================
Bugfixes:
- Prevent free() on uninitialized pointer (jwt:#20926)
Version 5.1.55-1:
=================
Bugfixes:
- Ensure SPANK calls (esp. TASK_EXIT) are not called for service procs
Enhancements:
- Extra analysis checks for jwt:#20747
Version 5.1.55:
===============
Bugfixes:
- Ensure the correct order of SPANK hook calls
- Add SPANK option handling at all necessary places
- Actually use optional keyword of plugstack.conf
Enhancements:
- Add support for Slurm 23.02 protocol
- Add support for include statement in slurm.conf
- Set all Slurm configuration files to case-insensitive
- Allow SPANK hooks spank_init, spank_init_post_opt and spank_user_init
to modify the environment of corresponding user processes
- Rework and improve jail scripts
- Introduce PSIDnodes_lookupHostname() to internally resolve hostnames;
this utilizes <Psid.NetworkName>.Hostname in psconfig's node objects
- Add support for S_TASK_ARGV (might be in future Slurm vers) to psslurmspank
Additional changes:
- Add psslurm option SKIP_CORE_VERIFICATION as preparation for AWS pClusters
- Add ReFrame testsuite and first PMIx test
- Make pluginconfig's Config_t opaque
Version 5.1.54-3:
=================
Bugfixes:
- Ensure downward messages are not handled as upward (jwt:#19720)
Enhancements:
- Support srun option --threads-per-core in psslurmgetbind
- Reject mutually exclusive srun options in psslurmgetbind
Version 5.1.54-2:
=================
Bugfixes:
- slurm.conf might hold absolute or relative path to spank configuration
Version 5.1.54-1:
=================
Bugfixes:
- Fix possible segfault when parsing GRes usage in psslurm
- Update srun options for user triggered spawn in psslurm
Version 5.1.54:
===============
Bugfixes:
- Fix possible segfault if psslurm verifies an allocation (#96)
- Fix handling of hint nomultithread in pspmix (#104)
- Don't destroy steps in Job_delete()
- Ensure all delayed tasks get removed when allocation is gone
- Prevent premature exit of psilogger on short jobs
- Don't load psslurm if jail function handles are not available
- Ensure all data gets distributed during PMIx fence operation
Enhancements:
- Add cgroup support to psslurm (might limit various system resources)
- Major rework of jail plugin and scripts to allow node sharing
- Introduce automatic PMIx process sets to pspmix when acting as v4 server
- More scalable, tree protocol based PMIx fence operation in pspmix
- Introduce psslurm configuration option SLURM_CONFIG_DIR
- Rename psslurm configuration option SLURM_CONF_DIR to SLURM_CONF_CACHE
- Warn about missing multi gpu per tasks support in psslurm (mlx#842)
- Verify sender, dest and user IDs in pspmix communication
- Add new psslurm configuration option DENIED_USERS
- Speedup completing phase of steps which failed to start
- Introduce __PSI_LOGGER_IO_FILE to detour psidlogger's outputs
Additional changes:
- Show sender's TID for failed psaccount update messages
- Give some indication why psidforwarder's connection to logger gets lost
- Move jail scripts to separate folders
- Use pscompress and psexpand in jail scripts
- Switch psid's Timer facility to POSIX per-process timer
- Rename Step_clearByJobid() to Step_destroyByJobid()
- Rename traverseHostList() to traverseCompList()
- Add functionality to find entry in vector
- Print pmix version at the end of configure
- Avoid %exclude statement in spec file for each .la (#100)
Version 5.1.53-1:
=================
Bugfixes:
- Avoid double/wrong free with PMIx_Cpuset_destruct() in pspmix
Version 5.1.53:
===============
Bugfixes:
- Fix filter logic bug unveiled by Clang
- Fix syntax error not complained about by gcc-11 or later
- Don't mix loop variables
- Fix memory leak in pspmix
- Use PMIx_Cpuset_destruct() in current version in pspmix
- Add validity checks for array lengths in pspmix messages
- Sanitize file descriptor handling in pspmi
- Call GC for cbInfoPool only when necessary
- Fix print_array_if macro in gdbinit definitions
Enhancements:
- Set PMIX_LOCALITY_STRING using hwloc in pspmix
- Make logger print prefix look nicer and sort compatible
- Introduce rrcomm plugin and user-space library for Rank Router Communication
- Introduce mpiexec's --fullpartition (-P) option
- Use local reservation info for pinning (obsoletes PSP_DD_SPAWNLOC msgs)
- Support more fence info types used by openpmix
- Remove unnecessary initializations and reduce noise during pspmix's operation
- Add minRank/maxRank to PSIDsession's PSresinfo_t to speed up search
- Add memory cleanup functionality to PSIDpart module
- Introduce delayPSPMsg plugin (delays specific message types for debugging)
Additional changes:
- Introduce PSCio_recvBufB() for blocking receive
- Introduce tryRecvFragMsg() in psserial
- Introduce PSIDHOOK_SPAWN_TASK and bump plugin API version to 137
- Move pspmix server start to new hook HOOK_SPAWN_TASK
- Add hook PSIDHOOK_FRWRD_SETUP and bump plugin API version to 138
- Enhance PSCio_setFDblock() to report old setting on return
- Send local reservation infos to nodes (push PSDaemonProtocolVersion to 415)
- Use common definitions of crucial environment names
- Introduce Timer_restart()
- Introduce PSP_resolveType() and PSDaemonP_resolveType()
- Remove size limitation from fragmented messages
- Do not prevent sending 0-length fragmented messages
- Count number of selectors and make it available via Selector_getNum()
- Let psidforwarder rely on reported number of Selectors to decide on exit
- Store size in PSIDmsgbuf_t explicitly
- Make psserial's fragment types public
- Refactor PSIDsession to make use of PSitems
- Rework selector
- Remove obsolete remnants of psicomm (early sketches of rank-routing idea)
- Improve lots of documentation comments
Version 5.1.52-5:
=================
Bugfixes:
- Prevent segfault due to late REQUEST_LAUNCH_TASKS message
Version 5.1.52-4:
=================
Bugfixes:
- Prevent Selector deadlock (don't awaitWrite() on disabled Selector)
Version 5.1.52-3:
=================
Bugfixes:
- Ensure psslurm is fully initialized if config-less mode is combined
with Slurm healthcheck
- Allow calling psslurm's cleanup() even if not fully initialized
Version 5.1.52-2:
=================
Enhancements:
- Allow higher pmix release versions in pspmix RPM requirements
Version 5.1.52-1:
=================
Bugfixes:
- Adjust psslurmgetbind to Slurm's interpretation of -B option (#89)
- Fix hint nomultithread together with 'exact'
Version 5.1.52:
===============
Bugfixes:
- Allow directory as destination for srun's --bcast option (pct:#404)
- Use configured nodename for pspmix's namespace procmap (#82)
Enhancements:
- Add support for Slurm 22.05
- Add IPMI support to psslurm's energy monitoring
- psslurmgetbind: Add --exact option (implied by -c for Slurm >= 22.05)
- Try to autodetect Slurm version in psslurmgetbind if possible
- Show Slurm protocol version in config request log message
Additional changes:
- pspmix RPM requires the pmix version used to build
Version 5.1.51-2:
=================
Bugfixes:
- pspmix: Create namespace with only info arrays (works around OpenPMIx #2791)
- psslurm: Do not filter PMIX_MCA_* variables from user env
- pspmix: Fix memory leak
Version 5.1.51-1:
=================
Bugfixes:
- Fix round counter for pluginforwarder children
* This is used by pelogue to determine whether to run as root
*************************************************************************
*************************************************************************
** **
** psmgmt 5.1.51-0 renames psconfig parameters, i.e.: **
** RdpStatusTimeout => StatusTimeout **
** RdpStatusDeadLimit => DeadLimit **
** RdpStatusBroadcasts => StatusBroadcasts **
** This must be reflected in the psconfig database. To adapt the **
** defaults:psid object accordingly, the script **
** update_defaults_5.1.51.sh deployed in **
** /opt/parastation/share/doc/psmgmt/psconfig **
** must be run with corresponding rights on the system hosting the **
** psconfig database. If your configuration contains custom settings **
** of any parameter listed above, those have to be adapted, too. **
** Starting with version 5.1.51 the ParaStation daemon will refuse to **
** start if one of the now obsolete psconfig parameters is found. **
** **
*************************************************************************
*************************************************************************
Version 5.1.51:
===============
Bugfixes:
- Prevent writing behind end of buffer in spawner (#84)
- Reset FPE exceptions on spawn and unload if ENABLE_FPE_EXCEPTION is enabled
- Allow running PMIx jobs as root
- Prevent possible segfault in psgw at unload when it was uninitialized
- Fix various potential bugs unveiled by scan-build
- Fix format errors unveiled by cppcheck
- Ensure array size for gethostname() is sufficient
- Ensure debug loops have no bad side effects
Enhancements:
- Let MAP_LDOM pinning use physical domain numbers
- Support topology.conf and set SLURM_TOPOLOGY_* in rank environments
- Extend psslurm's accounting towards energy, interconnect and I/O
* Set their poll intervals from slurm.conf
* Add psaccount configuration option MONITOR_SCRIPT_PATH
* Introduce psaccount debug masks PSACC_LOG_FILESYS and PSACC_LOG_INTERCON
- Unload psslurm if Slurm configuration parsing in configless mode fails
- Enhance PMIx 4 support
- Add PMIx singleton support to pspmix
- Add default session ID explicitly (might be required by openpmix 4.1.2rc1)
Additional changes:
- Rename (misleading) psconfig parameters to match psiadmin
- Make use of PSC_concat() less error-prone (remove need for trailing 0L)
- Make flog() and fdbg() accessible to other plugins
- Introduce FW_CHILD_INFINITE to restart plugin forwarder's child endlessly
- Also call hookFWInitUser if user is root
- Add const qualifier to traverseHostList()
- Add some test programs for PMI
Version 5.1.50-5:
=================
Bugfixes:
- Do not pass node info array in pspmix with PMIx < 4 (jwt:#16581)
- Fix handling of PSPMIX_CLIENT_INIT/FINALIZE[_RES] message type
Enhancements:
- Integrate pspmix' info arrays output into debug logging
Version 5.1.50-4:
=================
Bugfixes:
- Rework request handling for Slurm messages messed up in 5.1.50-3
Version 5.1.50-3:
=================
Bugfixes:
- Ensure ptid is interpreted correctly if node goes down (jwt:#16170)
- Fix re-sending requests originally sent via sendSlurmctldReq() (jwt:#16170)
Version 5.1.50-2:
=================
Bugfixes:
- Ensure reqKeys.strings gets initialized (jwt:#16557)
Version 5.1.50-1:
=================
Bugfixes:
- Enforce PSP_DD_DAEMONCONNECT to be the first message delivered via RDP
Enhancements:
- Introduce PSSLURM_FAKE_UPTIME to help support Slurm's power saving feature
- Add example scripts for Slurm power saving feature
Additional changes:
- Introduce RDP_getState() and RDP_getNumPend()
Version 5.1.50:
===============
Bugfixes:
- Reorganize psslurm's con->info handling (#77)
- Let MASK_LDOM pinning use physical domain numbers (#72)
- Release delayed spawn requests in the step callback (meluxina:#219)
- Ensure an allocation is defined for spawn request (meluxina:#219)
- Do not crash psid when SLURM_OVERCOMMIT is not set
- Ensure psslurm's help will not crash psid if no key is given
- Avoid use-after-free for config.psiddomain
- Avoid memory leak upon connection reset in psslurm
- Fix memory leak in pspmix' fence data handling
- Substantial hardening of pspmix
- Retry sending Slurm message even if nothing has been sent yet
- Verify an allocation if no job/step is started
- Ensure openSlurmctldConEx() returns an error
Enhancements:
- Add basic support for PMIx 4.0 and bump pspmix plugin version to 2
- Continue pinning on full nodes when overcommit is set in psslurm
- Use psid domain as cluster ID in pspmix
- Introduce PSC_getwd()
- Introduce addStringArrayToMsg()
Additional changes:
- Adapt strv_t in analogy to env_t and add compatibility to getStringArrayM()
- Replace add/getEnviron() by addStringArrayToMsg() and getStringArrayM()
- Remove obsolete info parameter from sendSlurmMsg()
Version 5.1.49-5:
=================
Bugfixes:
- Fix user ID of response message for a BCast RPC (pct:#405)
Version 5.1.49-4:
=================
Bugfixes:
- Don't delete the allocation before all terminate messages are sent
Version 5.1.49-3:
=================
Bugfixes:
- Delete allocation if the slurmctld prologue fails (jwt:#14755)
- Also delete lingering allocation if result cannot be sent to pspelogue
Enhancements:
- Add psslurm kvs set commands DEL_ALLOC, DEL_JOB, DEL_STEP
- Add hook PSIDHOOK_PELOGUE_DROP and bump plugin API version to 136
Version 5.1.49-2:
=================
Bugfixes:
- Fix possible memory leak if slurmctld fails to send a reply to a request
- Prevent segfault when cleaning up clients in PMIx server
- Fix memory corruption if number of requested GRes is too large
- Fix memory leaks unveiled by valgrind
- Fix warnings unveiled by cppcheck
Enhancements:
- Add the ability to query psslurm's active Slurm connections via psiadmin
- Allow PMI client to connect multiple times (via TCP)
- Overwrite default PMI connection method by PMI_ENABLE_TCP environment
Additional changes:
- Raise preference of CONNECTED above RELEASED for PMI(x) status
Version 5.1.49-1:
=================
Messed up the tags, thus, need a new version
Version 5.1.49:
===============
Bugfixes:
- Avoid reading beyond end of buffer
- Ensure SIGPIPE is not ignored if started via systemd
- Prevent PSsignal_get() from being interrupted by RDP timeouts, too
Enhancements:
- Major rework of pspmix to a concept of one server per user
- Add function to deregister client from PMIx server library
- Unload psslurm if Slurm's healthcheck fails
- Introduce PStask_destroy() and rework PSIDtask_clearMem() to use it
- Plugin's help directive now takes an argument
- Rework update caching in providerloop.c
Additional changes:
- Store reservations in jobs as sets grouped by spawner
- Fix description of DEBUG_MASK in pspmix.conf
- Fix various gcc-12 warnings
- Split psidsession.[ch] from psidspawn.[ch]
- Introduce PSC_getVersionStr() and make use of it at various places
- Make PSID_findJobInSession() public
- Rename (original) PSjob_t to PSsession_t and PSresset_t to PSjob_t
- Rename PSI_sendSpawnReq() to sendSpawnReq() and make it private
- Consolidate env.[ch] into psenv.[ch]
- Centralize fixList() into list_fix() in list.h
- Add print_array_if macro to gdbinit definitions
Version 5.1.48-6:
=================
Bugfixes:
- Attach Slurm message hash to munge credential (psc:#402)
Enhancements:
- Allow encoding payload in psMungeEncodeRes()
- Improve debug log (unify resID print, minor fixes)
*************************************************************************
*************************************************************************
** **
** psmgmt 5.1.48-5 adapts psslurm to critical bugfixes in Slurm **
** (CVE-2022-29500, CVE-2022-29501, CVE-2022-29502). Therefore **
** psslurm in this and further versions is only compatible with **
** Slurm versions 20.11.9 and 21.08.8 and beyond **
** **
*************************************************************************
*************************************************************************
Version 5.1.48-5:
=================
Bugfixes:
- Allow only specific users to decode munge messages sent by psslurm
- Cleanup possibly sensitive information when spawning step- or job-forwarder
Version 5.1.48-4:
=================
Bugfixes:
- Set correct exit status for steps hitting the walltime limit
- Ensure symbol is found after the second %, too
Enhancements:
- Add `--exact` option to srun when spawning additional processes
Version 5.1.48-3:
=================
Bugfixes: