Adjusting advanced cluster settings on larger installations

Last modified on 23 Aug, 2022. Revision 7
Up to date for: cOS Core 13.00.08
Supported since: cOS Core 10.xx.xx
Status: OK

Question:

My High Availability cluster is not synchronizing properly, and I have also seen incidents where the cluster changes roles for no apparent reason.

Answer:

Problems with synchronization and cluster role changes can have many causes, such as a hardware problem on the sync interface, a bad cable or an incorrect configuration, but some of the Advanced Settings for High Availability may also need to be adjusted. For most installations these settings never need to be changed, but for larger installations it is recommended to modify them to accommodate larger amounts of synchronization data and (depending on the scenario) to reduce the chance that the cluster performs a failover due to a lack of heartbeats from its peer.

The settings that we want to adjust are the following and can be found under System->High Availability->Advanced:

  • Sync Buffer Size, default value 4096
  • Recommended value: 4096

This setting controls how much synchronization data (in KB) can be buffered before waiting for an acknowledgement from the cluster peer. Today’s appliance models (E80 and above) have quite a lot of spare memory, so allocating 4 MB instead of 1 MB should be no problem; having a little extra buffer for synchronization never hurts.

Note: The default value in older versions was 1024. The value is not updated automatically on existing configurations; only new configurations created from around 2017 use the new default value.
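
The value can also be changed from the CLI. A minimal sketch in CLI-script form, assuming the settings object and property are named HighAvailabilitySettings and SyncBufSize based on the WebUI labels (verify the exact names on your version with the "show Settings" command):

  # SyncBufSize is an assumed property name; verify with "show Settings"
  set Settings HighAvailabilitySettings SyncBufSize=4096

Follow up with the usual activate and commit commands for the change to take effect and become permanent.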

  • Sync Packet Max Burst, default value 100
  • Recommended value: 100

This setting controls how many packets the active cluster peer can send in a synchronization state burst to the inactive node. For larger installations (100+ users) it is highly recommended to increase this value if it is still at the old default; too low a value can cause the active node to be unable to synchronize data fast enough, meaning the inactive node may not be fully synchronized with the active one.

Note: In older versions the default value was 20. The value is not updated automatically on existing configurations; only new configurations created from around 2017 use the new default value.
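
The same CLI pattern applies here, again with a property name assumed from the WebUI label:

  # SyncPktMaxBurst is an assumed property name; verify with "show Settings"
  set Settings HighAvailabilitySettings SyncPktMaxBurst=100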

  • HA Failover Time, default value 750 ms
  • Recommended value: 1500-2500 ms

This setting controls how long the inactive node will “wait” before going active if it has not received sufficient heartbeats from its peer within this time. Simply put, if the inactive node has not “seen” the active node for 750 ms, it will go active.

Depending on the scenario, size and network structure, 750 ms can be a bit low. If the system encounters network packet bursts, the inactive node could declare the active node dead and go active itself. The cluster would then briefly be in an active/active state, after which the nodes start to negotiate which of them should be the active one. This in turn could cause disruptions in the network.

One way to make the cluster less sensitive to minor network “hiccups” is to increase this value.

Note: The higher the value, the longer it will take for the inactive node to take over if something happens to the active node. The value configured here has to be based on what is reasonably acceptable: is 1.5-2.5 seconds of total network outage acceptable in case something happens to the active node? That is up to the administrator to decide.
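
Changing the failover time follows the same CLI pattern; the value is in milliseconds, and 2000 below is simply a middle value from the recommended 1500-2500 ms range (HAFailoverTime is an assumed property name based on the WebUI label):

  # HAFailoverTime is an assumed property name; value in milliseconds
  set Settings HighAvailabilitySettings HAFailoverTime=2000

As before, finish with activate and commit, and pick the value that matches the outage your network can reasonably tolerate.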
