Releases: ClusterLabs/striker
Version 2.0.0 beta - Release Candidate 14
This release is fairly minor and covers the following;
Bug fixes;
- Added missing package dependencies for EL6.9 ISO builds.
Enhancements;
- CentOS 6.9 ISOs now recognized by anvil-generate-iso.
- Added a check to scan-ipmitool that determines when a sensor vanishes in one scan and returns in the next and suppresses false alarms.
- Updated alerts generated by ScanCore to set the 'Reply-To' field with a list of other alert recipients. This was done to simplify replying to an alert to inform other recipients on the nature of the alert.
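The scan-ipmitool check for sensors that vanish for a single scan can be pictured roughly as follows. This is an illustrative Python sketch only, not the agent's actual code; the function name `find_vanished_sensors`, the `missed_counts` dictionary and the two-scan threshold are all assumptions made for the example.

```python
def find_vanished_sensors(known, seen, missed_counts, threshold=2):
    """Return sensors that have been missing for `threshold` consecutive scans.

    known         -- set of sensor names recorded by earlier scans
    seen          -- set of sensor names reported by the current scan
    missed_counts -- dict of sensor name -> consecutive scans missed
                     (hypothetical bookkeeping, updated in place)
    """
    vanished = []
    for name in sorted(known):
        if name in seen:
            # The sensor is present (or has returned); forget any earlier miss.
            missed_counts.pop(name, None)
            continue
        missed_counts[name] = missed_counts.get(name, 0) + 1
        if missed_counts[name] == threshold:
            # Report once, at the moment the threshold is first reached.
            vanished.append(name)
    return vanished
```

With a threshold of two, a sensor that is absent from one scan and back in the next never produces an alert, which is the false alarm the release note describes.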
Version 2.0.0 beta - Release Candidate 13
This is a bug-fix release.
- Fixed a major bug with install manifests where an install would fail after setting passwords.
- Fixed several bugs in scan-storcli and scan-ipmitool that showed up with some configurations.
- Fixed a bug in striker-usb-insert where the check for whether nfs was running was done incorrectly.
- Several spelling mistakes were fixed and unused strings and templates were removed.
- Fixed several variable logging bugs.
Version 2.0.0 beta - Release Candidate 12
This release is a tail to RC11. It fixes an issue reported by a user where install manifests could not be deleted. It also fixes a bug where CGI-invoked scripts were stuck at log level 3.
Version 2.0.0 beta - Release Candidate 11
This release includes several bug fixes. Upgrading is strongly recommended.
Bug fixes of note;
- Updated striker-update to upgrade Anvil! systems as old as v2.0 TC1. Includes bug fixes. Now updates the ScanCore DB SQL schema and Alteeve EL6 repos.
- Added missing database resync functions to scan-remote-access, scan-ping-target and scan-server-resources.
- Fixed a bug where selecting to optimize for "Windows 2016" servers would fail.
- Re-enabled spice graphics for Win7 and Win2008 servers.
- ScanCore will now manually check if a pending alert's requested title and message keys exist in the language files and, if not, remove it and send a special alert. This prevents a problem where, if a bad key is specified, ScanCore errors out, effectively disabling further alerts.
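The alert key check in the last bug fix above can be sketched as follows. This is a minimal Python illustration, not ScanCore's implementation; the alert dictionary layout and the key names `special_title` and `special_message` are hypothetical.

```python
def filter_pending_alerts(pending_alerts, language_keys):
    """Replace alerts that reference unknown string keys with a special alert.

    pending_alerts -- list of dicts with 'title_key' and 'message_key'
                      (hypothetical layout)
    language_keys  -- set of keys known to exist in the language files
    """
    sendable = []
    for alert in pending_alerts:
        missing = [
            key
            for key in (alert["title_key"], alert["message_key"])
            if key not in language_keys
        ]
        if missing:
            # Drop the bad alert and queue a special one naming the bad keys,
            # so that one bad key cannot stop later alerts from being sent.
            sendable.append({
                "title_key": "special_title",      # hypothetical key name
                "message_key": "special_message",  # hypothetical key name
                "variables": {"missing_keys": missing},
            })
        else:
            sendable.append(alert)
    return sendable
```

Checking the keys before rendering means a bad key costs one degraded alert rather than an error that disables every alert after it.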
New features;
- A new scan agent was added for APC/Schneider brand switched PDUs.
Version 2.0.0 beta - Release Candidate 10
This fixes a compile-time issue with 'striker-installer'.
Version 2.0.0 beta - Release Candidate 9
The "Digimer clearly doesn't understand the concept of release candidates" release...
Notable Bug Fixes;
- Fixed a problem where very long node names (with long domain names) would be truncated by clustat causing match problems.
- Fixed a bug where Remote.pm->_create_rsync_wrapper() was logging passwords at log-level 3.
- Fixed a bug in a few scan agents that was preventing their '--help' switch from working.
- Fixed a bug in Install Manifest where, if updating the OS was disabled in the manifest's XML, the nodes' striker repos would not have their 'priority=1' constraint removed.
New features:
- Striker dashboards can now send email alerts. See striker.conf for details on how to use this.
- Created the new 'striker-update' tool that allows for OS and Striker/ScanCore updates to active dashboards and Anvil! systems. How far back this will work remains to be seen (the ScanCore database isn't updated, so versions old enough to have incompatible schemas will break unless the ScanCore database is dropped and recreated). This program will evolve over time as we determine how old the installs are that people try to upgrade.
- Added three new scan agents, all related to monitoring servers;
** scan-server-resources; This agent can talk to an optional daemon that runs on a target Windows (or anything that runs Python) server. The daemon reports CPU, RAM, swap and disk details in response to a query from the scan agent and will trigger alerts on low RAM or disk space.
** scan-remote-access; This agent works against any target system that runs an SSH server. It simply attempts to log into the target, echoes '1' and exits. If a server stops responding, it sends an alert. It will send a follow-up alert when the server is again accessible.
** scan-ping-target; This agent pings an address and throws an alert if it stops responding (or comes back).
- The striker-installer switch '--no-os-update' was linked to 'no-os-updates' to avoid an accidental OS update on typo.
- The scan-server agent will now detect two bad server states; First, if it is defined and in the Anvil!, it will be undefined. Second, if a server enters a 'paused' state, it will try to resume it. If that fails, it will force-off the server (and the cluster will boot it back up shortly after).
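The "alert when it stops responding, alert again when it comes back" behaviour described for scan-remote-access and scan-ping-target amounts to alerting on state changes only. A minimal Python sketch of that idea follows; it is not the agents' code, and the function name, the `last_state` dictionary and the choice to stay silent on the first observation are assumptions.

```python
def check_target(name, reachable, last_state):
    """Return an alert message when a target's state changes, else None.

    name       -- label for the target (address or host name)
    reachable  -- True if the current ping or login attempt succeeded
    last_state -- dict of name -> previous result (hypothetical, updated in place)
    """
    previous = last_state.get(name)
    last_state[name] = reachable
    if previous is None or previous == reachable:
        # First observation, or nothing changed: no alert.
        return None
    if reachable:
        return f"{name} is responding again"
    return f"{name} has stopped responding"
```

Because only transitions produce a message, a target that stays down sends one alert rather than one per scan.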
Let's see if this really is the end of new things...
Version 2.0.0 beta - Release Candidate 8
Fixed issues;
- Fixed a problem where, on very slow systems, adding a new server could send the command to enable the new server in rgmanager before it appeared in clustat.
- Fixed an issue with thermal load shedding logic that was preventing load shedding from working at all.
- Fixed an issue with anvil-kick-apc-ups where the UPSes wouldn't kick properly and Striker couldn't cancel or alter a shutdown timer.
Version 2.0.0 beta - Release Candidate 7
This release candidate fixes a lot of bugs... At this time, no known bugs remain, but a decent amount of testing is required and bugs are still expected to be found given how much changed leading up to this release.
-=] Highlights;
Features (new management tools that were deemed important enough to add in the RC cycle);
- New password changing tool created, striker-change-password, which can change a local Striker dashboard or a target Anvil! system's passwords (a previously arduous task).
- New command line monitoring tool, anvil-cli-state, which uses data from the new anvil-report-state tool and makes it a lot easier to do pre-production validation testing.
- Updated the Anvil!'s resource manager to use the 'no_kill' option so that a server that is shutting down won't be killed if it takes too long (it will enter a 'failed' state instead).
- Updated scan-clustat to shift which node the fence delay favours so that the node hosting all the running servers gets favoured in a comms-split induced fence call.
- Fixed a problem where scan-bond would hit a DB error if an interface was ifdown'ed because the bonding driver would stop reporting the primary slave or primary reselect policy.
- Reduced the default 'scancore::health::migration_delay' to 120 seconds (two minutes).
- Pushed the bond updelay back up to 120 seconds (hit slower switches in the wild, 2 minutes is safer).
- Changed anvil-run-jobs to act like a daemon, significantly reducing how long it takes to run jobs.
- Created a new method for picking which node should be shut down during emergency load shedding or a cold-stop operation. Overhauled how Cold-Stop works to be more reliable.
- Reworked how archiving tables in ScanCore works. Now there is a generic method for handling this task, and if the number of records to archive is too large, the archive process is broken into two or more smaller chunks. This ensures the process doesn't appear to hang, doesn't use too much RAM and doesn't hit the disk too hard.
- Removed support for automatic power transfer switches (APTSes) from Install Manifest. They cause more problems than they solve and are no longer recommended or supported.
- Updated ScanCore to check its own md5sum and exit if the sum of the copy on disk changed (indicating it was updated).
- Removed the requirement for an IFN gateway address in striker-installer and Install Manifest to better support air-gapped Anvil! systems.
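The chunked archiving mentioned in the list above (breaking a large archive job into two or more smaller pieces) can be illustrated with a short Python sketch. These notes do not say how ScanCore sizes or selects its chunks, so the function name, the (offset, count) representation and the chunk size are all assumptions.

```python
def chunk_ranges(total_records, max_per_chunk):
    """Yield (offset, count) pairs that cover total_records in bounded chunks.

    Both parameters are hypothetical; a real archiver would map each pair
    onto whatever query it uses to select and move rows.
    """
    if max_per_chunk <= 0:
        raise ValueError("max_per_chunk must be positive")
    offset = 0
    while offset < total_records:
        # The final chunk may be smaller than max_per_chunk.
        count = min(max_per_chunk, total_records - offset)
        yield offset, count
        offset += count
```

For example, `list(chunk_ranges(25000, 10000))` gives three chunks of 10000, 10000 and 5000 records. Each pass touches a bounded number of rows, which is what keeps memory use and disk load in check.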
Bug fixes;
- Fixed a package dependency problem when building RHEL-based Anvil! install media.
- Fixed a bug where the new way of enabling/disabling anvil-safe-start wasn't being checked.
- Fixed a bug where provisioning a server with less than 1 GiB of disk space failed because the requested size was rounded to '0'. Now specific MiB units are used, as they should have been.
- Fixed a bug where checking access to a node that rebooted could fail if the remote access file handle went stale.
- Updated how Striker dashboards are installed. The ISO used to build it is no longer mounted on subsequent boots. Instead, its data is used to populate the http:////x86_64/img directory, and then a new repo is generated. This fixes a bug where totally offline installs would fail when they tried to install third-party RPMs.
- Fixed a bug where very long host names on nodes would cause clustat to truncate the node names, breaking the linkage between where services were running and those services' details.
- Fixed a bug where servers that were stopped by a user would have that information cleared by scan-server later, causing the servers to boot when anvil-safe-start later ran.
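The sub-1 GiB provisioning bug in the list above comes down to unit arithmetic. The Python sketch below shows why a whole-GiB figure collapses to zero while a MiB figure does not; how Striker actually computed the size is not shown in these notes, so both helper functions are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def size_in_whole_gib(size_bytes):
    # Any request below 1 GiB becomes 0 when expressed in whole GiB.
    return size_bytes // GIB


def size_in_whole_mib(size_bytes):
    # Expressing the same request in MiB keeps sub-GiB sizes non-zero.
    return size_bytes // MIB
```

A 400 MiB request is 0 in whole GiB but 400 in MiB, so passing the size in MiB avoids asking for a zero-sized disk.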
Version 2.0.0 beta - Release Candidate 6
There will be another RC after this one.
Fixes and improvements;
- Fixed a problem with using --rhn in striker-installer when the dashboard is already registered.
- Added support for Striker dashboards using NVMe storage devices.
- Updated anvil-safe-start to use crontab's '@reboot' syntax, removing the need for the convoluted rc3.d symlink process. This also improves the performance of anvil-safe-start.
- Fixed a problem where some required RPMs were not being added to the generated Anvil! install ISO.
- Possibly fixed a sporadic bug where 'Cold Stop' would simply exit without an error message mid-run.
- Added support for D-Link branded USB3 to Gbit Ethernet adaptors.
Version 2.0.0 beta - Release Candidate 5
This is primarily a bug-fix release.
- Fixed a bug in deleting Anvil! systems when the nodes have cache data.
- Fixed a bug where deleting an Anvil! when a Striker dashboard is offline causes the Anvil! to be restored when the offline Striker returns.
- Fixed the handling of virtual machine manager so that deleted/changed nodes are deleted when an Anvil! is deleted.
- Fixed a bug where trying to connect to a remote Anvil! behind a single public IP using different ports was failing.
Also changed the generated kickstart files to install most needed packages in stage-1.
There are outstanding bugs, so there will be another release later this week.