pve-ha-manager (5.2.0) trixie; urgency=medium

  * add dynamic load scheduler that uses current resource usage stats (CPU
    load, memory) from pvestatd for scheduling decisions, complementing the
    existing static scheduler that only considers maximum configured capacity.
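    For illustration, the mode would be selected via the crs setting in
    datacenter.cfg; the 'dynamic' value below is an assumption based on this
    entry, not a confirmed option name:

      crs: ha=dynamic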

  * implement automatic rebalancing for HA resources. When enabled, the CRM
    monitors cluster node imbalance and automatically migrates services to
    restore balance. Resources in strict positive affinity rules are treated
    as bundles and moved together. Rebalancing is configurable through the new
    datacenter config options for imbalance threshold, hold duration, and
    improvement margin.
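    As a sketch, enabling this could look as follows in datacenter.cfg; the
    option names below are assumptions, as this entry does not spell them
    out:

      crs: ha=dynamic,ha-rebalance-on-load=1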

  * fix api: rules: allow updating a rule without setting the resources
    parameter, which previously failed with an assertion error due to
    validating an undefined value.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 02 Apr 2026 17:58:13 +0200

pve-ha-manager (5.1.3) trixie; urgency=medium

  * fix #2751: add disarm-ha and arm-ha commands for safe cluster-wide
    maintenance, allowing the admin to temporarily disable automatic fencing
    and recovery. Two resource modes are available: 'freeze' locks all
    services in place, 'ignore' suspends HA tracking so services can be
    managed manually. All HA service watchdogs are released when fully
    disarmed, but the underlying watchdog-mux must still keep the
    /dev/watchdog device open, as not all watchdog types support a graceful
    deactivation.
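    The workflow would look roughly as follows; the exact CLI syntax,
    especially the mode option, is an assumption derived from the command
    and mode names above:

      ha-manager disarm-ha --resource-mode freeze
      # perform cluster-wide maintenance here
      ha-manager arm-ha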

  * api: status: add fencing entry showing the HA stack's armed state and CRM
    watchdog status.

  * scheduler: fix type confusion regression for container resource stats that
    could cause adding a container with explicit 'cores' to fail.

 -- Proxmox Support Team <support@proxmox.com>  Fri, 27 Mar 2026 00:10:12 +0100

pve-ha-manager (5.1.2) trixie; urgency=medium

  * fix #7399: api: rules: reject empty nodes and resources properties, which
    could result in invalid rule sections being written to the config.

  * crm: become active to clean up stale service status entries that persist
    when services are removed from the HA config while the CRM is stopped,
    and which previously showed up as stuck in the "deleting" state.

  * resource placement scheduler: include running non-HA resources in load
    accounting for more accurate placement decisions.

  * resources: expand 'max_restart' option description to clarify behavior
    relative to 'max_relocate'.

  * watchdog-mux: improve logging by tracking the client PID and including it
    in all client-specific messages, and log on new client connect, graceful
    disconnect, and when all watchdog-mux connections are closed (underlying
    /dev/watchdog still needs to be active).

  * clear lock status cache on lock release to avoid spurious "lost lock"
    error when a lock is released and re-acquired within the same process
    lifetime. This was harmless but might have caused confusion.

  * various fixes for typos and misleading error messages across the tree.

 -- Proxmox Support Team <support@proxmox.com>  Sun, 22 Mar 2026 17:14:34 +0100

pve-ha-manager (5.1.1) trixie; urgency=medium

  * assert that a HA resource has the correct guest type, as otherwise a HA
    resource with the wrong guest type can be added to the HA resources
    configuration, which makes its methods (start, shutdown, migrate, ...)
    fail.

  * fix #7133: manager: skip update on group to affinity rules migration for
    resources that have no group configured anyway.

  * manager: group migration: bulk update changes to resource config to reduce
    the total time required for a migration.

  * ensure node is known to the cluster config before committing a new node
    status. While such unknown entries were pruned, that only happened after
    an hour, so any misspelled node would show up in the status until then.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 19 Feb 2026 18:15:40 +0100

pve-ha-manager (5.1.0) trixie; urgency=medium

  * fix #6801: fix an issue where resources with positive affinity could
    accidentally be migrated back to the source node after a migration. This
    could happen in case one of the resources took more than 10 seconds longer
    to be migrated than the other(s).

 -- Proxmox Support Team <support@proxmox.com>  Fri, 12 Dec 2025 14:05:59 +0100

pve-ha-manager (5.0.8) trixie; urgency=medium

  * api: resources: fix possible warning of uninitialized value when checking
    a resource's state.

  * various improvements for the HA simulator, among other things:
    - remove hidden service default groups.
    - ensure all resources have a valid static resource usage state from the
      start.
    - clean up static service stats on service deletion.
    - avoid overly verbose prints when changing a resource's usage stats.

 -- Proxmox Support Team <support@proxmox.com>  Mon, 17 Nov 2025 22:37:03 +0100

pve-ha-manager (5.0.7) trixie; urgency=medium

  * compile ha rules to a more efficient representation, dramatically reducing
    the time needed to evaluate all rules every round.

  * affinity rules: do not add ignored resources as dependent resources.

  * api: status: sync active service counting with lrm's helper.

  * fix #6613: update rules containing a resource that is currently being
    removed from management by the HA stack.

  * api: add purge parameter for HA resource removal.
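    For example, via the API; the pvesh invocation shown is an assumption,
    only the purge parameter itself is stated above:

      pvesh delete /cluster/ha/resources/vm:100 --purge 1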

 -- Proxmox Support Team <support@proxmox.com>  Fri, 14 Nov 2025 20:47:44 +0100

pve-ha-manager (5.0.6) trixie; urgency=medium

  * simulator: integrate basic support for simulating the cluster resource
    scheduler.

  * manager: make online node usage computation granular and do not rebuild it
    from scratch on every round. This greatly improves the performance when
    managing many HA resources, making scheduling up to ten thousand resources
    at the same time feasible.

 -- Proxmox Support Team <support@proxmox.com>  Fri, 14 Nov 2025 13:34:31 +0100

pve-ha-manager (5.0.5) trixie; urgency=medium

  * manager: support older versioning schema for ha group migration.

  * fix #6839: move PVE::Notify usage to "real" Env to avoid error in HA
    simulator on non-PVE systems.

  * fix #6881: api: relocate resource: fix check to avoid Perl warning.

  * manager: fix precedence in mixed resource affinity rule usage that could
    lead to a resource not being migrated even though it could be.

 -- Proxmox Support Team <support@proxmox.com>  Fri, 03 Oct 2025 22:51:47 +0200

pve-ha-manager (5.0.4) trixie; urgency=medium

  * api: rules: cope if disable parameter is explicitly set to a falsy value.

  * rules: make positive affinity resources migrate on single resource
    failure.

  * rules: allow the same resources in node and resource affinity rules, but
    restrict inter-plugin resource references to simple cases:
    - the resources of a resource affinity rule must not be part of any node
      affinity rule that has multiple priority groups. This is because of
      the dynamic nature of priority groups.
    - the resources of a positive resource affinity rule may be part of at
      most one node affinity rule, but no more. Otherwise, it is not (yet)
      easily decidable what the common node restrictions are.
    - a positive resource affinity rule that has at least one resource in a
      node affinity rule makes all of its resources part of that node
      affinity rule.
    - the resources of a negative resource affinity rule must not be
      restricted by their node affinity rules in such a way that these do
      not leave enough nodes to separate the resources on.

 -- Proxmox Support Team <support@proxmox.com>  Fri, 01 Aug 2025 19:34:52 +0200

pve-ha-manager (5.0.3) trixie; urgency=medium

  * rules: add global checks between node and resource affinity rules. These
    are currently overly strict in what is allowed, e.g. combinations of any
    resources-to-nodes and resources-to-resources rules are not allowed;
    this safety constraint will be loosened in a future version to allow
    simple cases that are always straightforward to resolve.

  * manager: apply resource affinity rules when selecting service nodes.

  * api: resources: add check for resource affinity in resource migrations.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 31 Jul 2025 11:24:29 +0200

pve-ha-manager (5.0.2) trixie; urgency=medium

  * watchdog-mux: restore if guard for watchdog updates to avoid a recent
    regression causing an extra round of delay when fencing due to client
    timeout.

  * introduce HA affinity rules, which will replace the HA groups and provide
    the groundwork for future service-to-service affinity rules.
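    As a minimal sketch, a node affinity rule in the new rules config could
    look like the following; section and property names are assumptions
    based on the feature description, not the final format:

      node-affinity: prefer-node1
              resources vm:100,ct:200
              nodes node1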

  * manager: migrate ha groups to node affinity rules in-memory and write out
    the new config once all nodes have a recent enough pve-manager version
    installed, which depends on the new ha-manager version.

  * api: groups: disallow calls to HA groups endpoints if already fully
    migrated to affinity rules.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 31 Jul 2025 09:01:39 +0200

pve-ha-manager (5.0.1) trixie; urgency=medium

  * LRM: count incoming migrations towards a node's active resources,
    ensuring an idle LRM will become active on incoming migrations.

  * watchdog-mux: log a warning when the watchdog counter is within 10s of
    expiration, and sync the journal in that case.

  * watchdog-mux: break out of the loop when updates are disabled to ensure
    we hit the code path that triggers another systemd-journal sync.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 17 Jul 2025 02:22:44 +0200

pve-ha-manager (5.0.0) trixie; urgency=medium

  * re-build for Debian 13 Trixie based Proxmox VE 9 releases.

 -- Proxmox Support Team <support@proxmox.com>  Tue, 17 Jun 2025 15:50:27 +0200

pve-ha-manager (4.0.7) bookworm; urgency=medium

  * notifications: overhaul fence notification templates.

 -- Proxmox Support Team <support@proxmox.com>  Mon, 07 Apr 2025 23:24:19 +0200

pve-ha-manager (4.0.6) bookworm; urgency=medium

  * tools: group verbose description: explicitly state that a higher number
    means a higher priority.

  * fix #5243: make CRM go idle after ~15 min of no service being configured.
    This is mostly cosmetic as the CRM never needed to trigger self-fencing on
    quorum-loss anyway, as all state under the CRM's control is managed by the
    pmxcfs, which is already protected by quorum and cluster synchronisation.

  * crm: get active if there are pending CRM commands and it seems that no
    CRM is already active. This ensures any CRM command, like disabling the
    node-maintenance mode, is properly processed. Currently this is mostly
    cosmetic, but not unimportant, as the maintenance state was recently
    added to the UI.

  * crm: get active if there are nodes that probably need to leave maintenance
    mode. This is very similar to the point above but is less likely to
    trigger if just CLI/API is used.

 -- Proxmox Support Team <support@proxmox.com>  Sun, 17 Nov 2024 20:36:17 +0100

pve-ha-manager (4.0.5) bookworm; urgency=medium

  * env: notify: use named templates instead of passing template strings

 -- Proxmox Support Team <support@proxmox.com>  Tue, 04 Jun 2024 11:10:05 +0200

pve-ha-manager (4.0.4) bookworm; urgency=medium

  * d/postinst: make deb-systemd-invoke non-fatal

 -- Proxmox Support Team <support@proxmox.com>  Mon, 22 Apr 2024 13:47:18 +0200

pve-ha-manager (4.0.3) bookworm; urgency=medium

  * manager: send notifications via new notification module

  * fix #4984: manager: add service to migration-target usage only if online

  * crs: avoid auto-vivification when adding node to service usage

 -- Proxmox Support Team <support@proxmox.com>  Fri, 17 Nov 2023 14:49:03 +0100

pve-ha-manager (4.0.2) bookworm; urgency=medium

  * cluster resource manager: clear stale maintenance node, which can be
    caused by simultaneous cluster shutdown

 -- Proxmox Support Team <support@proxmox.com>  Tue, 13 Jun 2023 08:35:52 +0200

pve-ha-manager (4.0.1) bookworm; urgency=medium

  * test, simulator: make it possible to add an already running service

  * lrm: do not migrate via rebalance-on-start if the service is already
    running

  * api: fix/add return description for status endpoint

  * resources: pve: avoid relying on internal configuration details, use new
    helpers in pve-container and qemu-server

 -- Proxmox Support Team <support@proxmox.com>  Fri, 09 Jun 2023 10:41:06 +0200

pve-ha-manager (4.0.0) bookworm; urgency=medium

  * re-build for Proxmox VE 8 / Debian 12 Bookworm

 -- Proxmox Support Team <support@proxmox.com>  Wed, 24 May 2023 19:26:51 +0200

pve-ha-manager (3.6.1) bullseye; urgency=medium

  * cli: assert that node exists when changing CRS request state to avoid
    creating a phantom node by mistake

  * manager: ensure node-request state gets transferred to new active CRM, so
    that the request for (manual) maintenance mode is upheld, even if the node
    that is in maintenance mode is also the current active CRM and gets
    rebooted.

  * lrm: ignore shutdown policy if (manual) maintenance mode is requested to
    avoid exiting from maintenance mode too early.

 -- Proxmox Support Team <support@proxmox.com>  Thu, 20 Apr 2023 14:16:14 +0200

pve-ha-manager (3.6.0) bullseye; urgency=medium

  * fix #4371: add CRM command to switch an online node manually into
    maintenance (without reboot), moving away all active services and
    automatically migrating them back once the maintenance mode is disabled
    again.
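    For example, to put a node into maintenance mode and take it out again:

      ha-manager crm-command node-maintenance enable node1
      ha-manager crm-command node-maintenance disable node1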

  * manager: service start: make EWRONG_NODE a non-fatal error, but try to
    find the actual node the service is residing on

  * manager: add new intermediate 'request_started' state for stop->start
    transitions

  * request start: optionally enable automatic selection of the best rated
    node by the CRS on service start up, bypassing the very high priority of
    the current node on which a service is located.
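    This is enabled via the crs setting in datacenter.cfg, for example:

      crs: ha=static,ha-rebalance-on-start=1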

 -- Proxmox Support Team <support@proxmox.com>  Mon, 20 Mar 2023 13:38:26 +0100

pve-ha-manager (3.5.1) bullseye; urgency=medium

  * manager: update crs scheduling mode once per round to avoid the need for a
    restart of the currently active manager.

  * api: status: add CRS info to manager if not set to default

 -- Proxmox Support Team <support@proxmox.com>  Sat, 19 Nov 2022 15:51:11 +0100

pve-ha-manager (3.5.0) bullseye; urgency=medium

  * env: datacenter config: include crs (cluster-resource-scheduling) setting

  * manager: use static resource scheduler when configured
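    For example, in datacenter.cfg:

      crs: ha=static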

  * manager: avoid scoring nodes if maintenance fallback node is valid

  * manager: avoid scoring nodes when not trying next and current node is
    valid

  * usage: static: use service count on nodes as a fallback

 -- Proxmox Support Team <support@proxmox.com>  Fri, 18 Nov 2022 15:02:55 +0100

pve-ha-manager (3.4.0) bullseye; urgency=medium

  * switch to native version formatting

  * fix accounting of online services when moving services due to their source
    node going gracefully nonoperational (maintenance mode). This ensures a
    better balance of services on the cluster after such an operation.

 -- Proxmox Support Team <support@proxmox.com>  Fri, 22 Jul 2022 09:21:20 +0200

pve-ha-manager (3.3-4) bullseye; urgency=medium

  * lrm: fix getting stuck on restart due to finished worker state not
    being collected

 -- Proxmox Support Team <support@proxmox.com>  Wed, 27 Apr 2022 14:01:55 +0200

pve-ha-manager (3.3-3) bullseye; urgency=medium

  * lrm: avoid possible job starvation on huge workloads

  * lrm: increase run_worker loop-time for doing actual work to 80%
    duty-cycle

 -- Proxmox Support Team <support@proxmox.com>  Thu, 20 Jan 2022 18:05:33 +0100

pve-ha-manager (3.3-2) bullseye; urgency=medium

  * fix #3826: fix restarting LRM/CRM when triggered by package management
    system due to other updates

  * lrm: also check CRM node-status for determining if there's a fence-request
    and avoid starting up in that case to ensure that the current manager can
    get our lock and do a clean fence -> unknown -> online FSM transition.
    This avoids a problematic edge case where an admin manually removed all
    services of a to-be-fenced node, and re-added them again before the
    manager could actually get that node's LRM lock.

  * manager: handle edge case where a node gets seemingly stuck in 'fence'
    state if all its services got manually removed by an admin before the
    fence transition could be finished. While in previous versions the LRM
    could come up again (it won't now, see the point above) and start/stop
    of services got executed, the node was seen as unavailable for all
    recovery, relocation and migrate actions.

 -- Proxmox Support Team <support@proxmox.com>  Wed, 19 Jan 2022 14:30:15 +0100

pve-ha-manager (3.3-1) bullseye; urgency=medium

  * LRM: release lock and close watchdog if no service configured for >10min

  * manager: make recovery an actual state in the finite state machine,
    showing a clear transition from fence -> recovery.

  * fix #3415: never switch into error state on recovery; instead try harder
    to find a new node. This improves using the HA manager for services with
    local resources (e.g., local storage) to ensure they always get started,
    which is an OK use-case as long as the service is restricted to a group
    with only that node. Previously, failure of that node had a high
    probability of sending the service into the error state, as no new node
    could be found. Now it will retry finding a new node, and if one of the
    restricted set, e.g., the node it was previously on, comes back up, it
    will start again there.

  * recovery: allow disabling an in-recovery service manually

 -- Proxmox Support Team <support@proxmox.com>  Fri, 02 Jul 2021 20:03:29 +0200

pve-ha-manager (3.2-2) bullseye; urgency=medium

  * fix systemd service restart behavior on package upgrade with Debian
    Bullseye

 -- Proxmox Support Team <support@proxmox.com>  Mon, 24 May 2021 11:38:42 +0200

pve-ha-manager (3.2-1) bullseye; urgency=medium

  * Re-build for Debian Bullseye / PVE 7

 -- Proxmox Support Team <support@proxmox.com>  Wed, 12 May 2021 20:55:53 +0200

pve-ha-manager (3.1-1) pve; urgency=medium

  * allow 'with-local-disks' migration for replicated guests

 -- Proxmox Support Team <support@proxmox.com>  Mon, 31 Aug 2020 10:52:23 +0200

pve-ha-manager (3.0-9) pve; urgency=medium

  * factor out service configured/delete helpers

  * typo and grammar fixes

 -- Proxmox Support Team <support@proxmox.com>  Thu, 12 Mar 2020 13:17:36 +0100

pve-ha-manager (3.0-8) pve; urgency=medium

  * bump LRM stop wait time to an hour

  * do not mark nodes in maintenance mode as unknown

  * api/status: extra handling of maintenance mode

 -- Proxmox Support Team <support@proxmox.com>  Mon, 02 Dec 2019 10:33:03 +0100

pve-ha-manager (3.0-6) pve; urgency=medium

  * add 'migrate' node shutdown policy
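    The policy is configured in datacenter.cfg, for example:

      ha: shutdown_policy=migrate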

  * do simple fallback if node comes back online from maintenance

  * account service to both source and target during migration

  * add 'After' ordering for SSH and pveproxy to LRM service, ensuring the
    node stays accessible until HA services got moved or shut down,
    depending on policy.

 -- Proxmox Support Team <support@proxmox.com>  Tue, 26 Nov 2019 18:03:26 +0100

pve-ha-manager (3.0-5) pve; urgency=medium

  * fix #1339: remove more locks from services IF the node got fenced

  * adapt to qemu-server code refactoring

 -- Proxmox Support Team <support@proxmox.com>  Wed, 20 Nov 2019 20:12:49 +0100

pve-ha-manager (3.0-4) pve; urgency=medium

  * use PVE::DataCenterConfig from new split-out cluster library package

 -- Proxmox Support Team <support@proxmox.com>  Mon, 18 Nov 2019 12:16:29 +0100

pve-ha-manager (3.0-3) pve; urgency=medium

  * fix #1919, #1920: improve handling zombie (without node) services

  * fix #2241: VM resource: allow migration with local device, when not running

  * HA status: render removal transition of service as 'deleting'

  * fix #1140: add crm command 'stop', which allows requesting an immediate
    service hard-stop if a timeout of zero (0) is passed
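    For example, to hard-stop a service immediately (syntax as assumed from
    the command description):

      ha-manager crm-command stop vm:100 0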

 -- Proxmox Support Team <support@proxmox.com>  Mon, 11 Nov 2019 17:04:35 +0100

pve-ha-manager (3.0-2) pve; urgency=medium

  * services: update PIDFile to point directly to /run

  * fix #2234: fix typo in service description

  * Add missing Dependencies to pve-ha-simulator

 -- Proxmox Support Team <support@proxmox.com>  Thu, 11 Jul 2019 19:26:03 +0200

# Older entries have been removed from this changelog.
# To read the complete changelog use `apt changelog pve-ha-manager`.
