Troubleshooting Heartbeat Not Working: Understanding and Resolving Unreachable Port 694 Issues

Running a Heartbeat cluster whose nodes cannot exchange heartbeats can be a frustrating experience, especially when critical services depend on the high availability the cluster provides. At revWhiteShadow, we understand the intricacies of ensuring seamless operation for your virtualized environments. When your Heartbeat service fails to establish communication, often indicated by an unreachable UDP port 694, synchronization and failover stop working. This article walks through diagnosing and fixing these connectivity problems so you can restore your cluster’s functionality.

The focus is a common scenario involving two virtual machines, osboxes and osboxes2, configured to use Heartbeat. We’ll examine the diagnostic information provided, including nmap scan results, tcpdump output, and the configuration files ha.cf and haresources. By dissecting these, we aim to pinpoint why heartbeats are not being exchanged and, more specifically, why port 694 is reported as unreachable, ultimately guiding you towards a stable and reliable clustered environment.

Understanding Heartbeat Communication and Port 694

Before we dive into the specifics of your setup, it’s paramount to grasp how Heartbeat establishes and maintains its cluster membership. At its core, Heartbeat relies on a constant stream of communication between cluster nodes to monitor their status. This communication is typically facilitated over the network.

The standard protocol used by Heartbeat for its primary cluster communication is UDP on port 694. This port serves as the dedicated channel for sending and receiving heartbeat packets, which are small messages indicating that a node is alive and functioning. When this port is blocked, filtered, or misconfigured, the nodes are effectively deaf and blind to each other, which is exactly the failure observed here.

The ha.cf file plays a pivotal role in defining these communication parameters. Directives such as udpport specify the port number to be used, while bcast and ucast define the network interfaces and methods for sending heartbeat messages. Understanding these settings is key to troubleshooting.
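
To see at a glance which communication directives a node is actually using, the relevant lines can be pulled out of ha.cf. The following is a minimal sketch: the helper name hb_media_directives is made up for illustration, and /etc/ha.d/ha.cf is assumed to be the file location on your system.

```shell
# Print the communication-related directives from a Heartbeat ha.cf file,
# ignoring comments and blank lines.
# Usage: hb_media_directives /etc/ha.d/ha.cf   (path is an assumption)
hb_media_directives() {
    conf="$1"
    # Strip comments, then keep only lines that start with one of the
    # directives that control how heartbeats are sent.
    sed -e 's/#.*$//' "$conf" |
        grep -E '^[[:space:]]*(udpport|bcast|ucast|mcast|serial)[[:space:]]'
}
```

Running it on both nodes and comparing the output side by side is a quick way to spot a mismatched port or interface.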

Analyzing Your Diagnostic Data: A Deep Dive

Let’s analyze the diagnostic information you’ve provided to see why heartbeats are failing and why port 694 appears closed.

Nmap Scan Results on Localhost

Your nmap localhost output offers a snapshot of the network services accessible on the machine where the scan was performed (presumably osboxes).

Nmap scan report for localhost (127.0.0.1)
Host is up (0.00017s latency).
Not shown: 991 closed ports
PORT      STATE SERVICE
22/tcp    open  ssh
25/tcp    open  smtp
53/tcp    open  domain
80/tcp    open  http
443/tcp   open  https
631/tcp   open  ipp
3306/tcp  open  mysql
9050/tcp  tor-socks
10000/tcp open  snet-sensor-mgmt

This output does not list port 694, but that tells us less than it appears to. By default nmap scans TCP ports only, and Heartbeat uses UDP, so 694/udp would not appear in this scan even if Heartbeat were running correctly; a UDP scan requires the -sU option. In addition, a scan of localhost (127.0.0.1) only reaches sockets bound to the loopback address or to all addresses, while the actual communication between your VMs occurs over the network interface eth0. The scan therefore neither confirms nor rules out a problem with the Heartbeat service.
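
Because Heartbeat uses UDP, a more direct check is to list the bound UDP sockets on the node itself. The sketch below assumes the ss utility from iproute2 is installed; the helper name udp_port_bound is hypothetical, and it only inspects the text piped into it.

```shell
# Succeeds (exit 0) if the `ss -lun` output read from stdin contains a
# socket whose local port equals the given port number.
# Usage on a cluster node: ss -lun | udp_port_bound 694 && echo "bound"
udp_port_bound() {
    port="$1"
    awk -v p="$port" '
        {
            # Look at every address:port field; the port is whatever
            # follows the last colon (works for 0.0.0.0:694 and [::]:694).
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, ":")
                if (n > 1 && parts[n] == p) found = 1
            }
        }
        END { exit !found }
    '
}
```

From the other node, the UDP counterpart of the scan above would be something like sudo nmap -sU -p 694 192.168.141.137.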

Tcpdump Analysis of eth0 Interface

The tcpdump output provides a real-time glimpse into the network traffic on the eth0 interface. This is where we expect to see Heartbeat packets flowing between osboxes and osboxes2.

18:38:22.267817 IP 192.168.141.135.48748 > 192.168.141.255.694: UDP, length 315
18:38:22.268640 IP 192.168.141.135.38234 > osboxes2.694: UDP, length 315
18:38:22.269421 IP osboxes2 > 192.168.141.135: ICMP osboxes2 udp port 694 unreachable, length 351

Let’s break down these lines:

18:38:22.267817 IP 192.168.141.135.48748 > 192.168.141.255.694: UDP, length 315: This line indicates that a UDP packet is being sent from osboxes (192.168.141.135) on a high-numbered ephemeral port (48748) to the broadcast address (192.168.141.255) on port 694. This is consistent with Heartbeat configured to use broadcast for its communication.
18:38:22.268640 IP 192.168.141.135.38234 > osboxes2.694: UDP, length 315: This line shows a UDP packet being sent from osboxes (192.168.141.135) on another ephemeral port (38234) to port 694 on osboxes2, whose address is 192.168.141.137 according to the ucast line in ha.cf and the ping test below. This is the unicast communication attempt.
18:38:22.269421 IP osboxes2 > 192.168.141.135: ICMP osboxes2 udp port 694 unreachable, length 351: This is the critical piece of information. osboxes2 answers with an ICMP “Destination Unreachable” message of the “port unreachable” type, which means the packet reached osboxes2 and was refused there. A host sends this reply when no process is bound to the UDP port, or when a firewall rule explicitly rejects the packet with that ICMP type (the default for an iptables REJECT target). A firewall that silently drops packets would produce no reply at all.

This tcpdump output shows that the problem lies on osboxes2 itself rather than on the network path between the nodes: the packets arrive, but nothing accepts them. The fact that osboxes is sending packets to port 694 means the configuration on osboxes is likely correct in terms of initiating the communication. The failure occurs at the receiving end on osboxes2.
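
When a capture runs for more than a few seconds, counting the rejections is easier than reading every line. The helper below is a small sketch (the name count_port_unreachable and the file name capture.txt are illustrative) that counts ICMP port-unreachable replies in saved tcpdump text.

```shell
# Count ICMP "udp port N unreachable" replies in tcpdump text read from stdin.
# Usage: sudo tcpdump -l -n -i eth0 'udp port 694 or icmp' | tee capture.txt
#        count_port_unreachable 694 < capture.txt
count_port_unreachable() {
    port="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "ICMP .* udp port ${port} unreachable"
}
```

A count that keeps growing at roughly the keepalive interval indicates that every unicast heartbeat is still being refused by the peer.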

Ping Test Between osboxes and osboxes2

Your ping test results are as follows:

PING 192.168.141.137 (192.168.141.137) 56(84) bytes of data.
64 bytes from 192.168.141.137: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.141.137: icmp_seq=2 ttl=64 time=0.291 ms
64 bytes from 192.168.141.137: icmp_seq=3 ttl=64 time=0.681 ms

These results are excellent! The successful ping from osboxes to osboxes2 (192.168.141.137) demonstrates that:

There is basic IP connectivity between the two virtual machines over the eth0 interface.
The IP addresses are correctly configured on both VMs and their respective network interfaces.
The subnet mask and gateway (if applicable) are set up correctly, allowing for inter-VM communication.
No fundamental network layer blockage (like a basic IP block) is preventing communication.

However, it’s crucial to remember that a successful ICMP (ping) does not guarantee that UDP on port 694 is open or accessible. Ping uses the ICMP protocol, whereas Heartbeat uses UDP. Firewalls and network configurations can be set up to allow ICMP while blocking specific UDP ports.

ha.cf Configuration on osboxes

Let’s scrutinize the ha.cf file from osboxes:

# Debug log file:
logfile /var/log/ha-log

# Log file
debugfile /var/log/ha-debug

# Where the logs go
logfacility local0

# Heartbeat frequency in seconds
keepalive 2

# Time after which a node is considered dead
deadtime 25

# Time heartbeat should wait for beats (not the beatboxers' kind of beat)
warntime 10

# Maximum time before declaring the other server dead
initdead 50

# Synchronization port
udpport 694

# Network broadcast address
bcast eth0

# I didn't understand this one. Look it up later if needed.
ucast eth0 192.168.141.137

# Whether the server goes back to being master once it responds again
auto_failback on

# Names of the cluster nodes
node osboxes2
node osboxes

This configuration appears largely correct for a basic Heartbeat setup:

logfile, debugfile, logfacility: These are for logging and are set up appropriately.
keepalive 2: This sets the interval for sending heartbeat packets to 2 seconds. This is a reasonable value.
deadtime 25: This defines how long a node waits without hearing from a peer before declaring it dead. It should be several times keepalive to allow for network latency and temporary packet loss; 25 seconds is comfortably above that.
warntime 10: How long Heartbeat waits before logging a “late heartbeat” warning. It should be shorter than deadtime.
initdead 50: The dead time applied when Heartbeat first starts, to allow for slow boots and network bring-up. It is conventionally at least twice deadtime.
udpport 694: This explicitly defines port 694 as the UDP port for heartbeat communication. This aligns with the standard.
bcast eth0: This instructs Heartbeat to use broadcast on the eth0 interface for sending heartbeat packets.
ucast eth0 192.168.141.137: This directive specifies a unicast destination. osboxes will send heartbeat packets directly to osboxes2’s IP address (192.168.141.137) on the eth0 interface. Each bcast or ucast line defines its own communication path, and the tcpdump capture above shows osboxes sending on both. For reliable communication, especially in virtualized environments, explicitly defining unicast destinations can sometimes be more robust than relying solely on broadcast, which can be subject to network device filtering.
auto_failback on: Enables automatic failback to the primary node when it becomes available.
node osboxes2 and node osboxes: These correctly list the participating nodes in the cluster.

The ha.cf file on osboxes seems to be configured correctly to initiate communication on port 694 via broadcast and unicast to osboxes2. The primary concern remains why osboxes2 is responding with an ICMP “unreachable” for this port.
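
The relationships between the timing directives can be checked mechanically. The sketch below encodes common rules of thumb (keepalive < warntime < deadtime, and initdead at least twice deadtime); these are conventions rather than limits Heartbeat enforces, the function name check_hb_timers is hypothetical, and the values are assumed to be plain seconds without a unit suffix.

```shell
# Check the timing directives of an ha.cf file against common rules of thumb:
#   keepalive < warntime < deadtime, and initdead >= 2 * deadtime.
# Assumes values are written as plain seconds (no "ms" suffix).
check_hb_timers() {
    conf="$1"
    sed -e 's/#.*$//' "$conf" | awk '
        $1 == "keepalive" { k = $2 }
        $1 == "warntime"  { w = $2 }
        $1 == "deadtime"  { d = $2 }
        $1 == "initdead"  { i = $2 }
        END {
            ok = 1
            if (!(k + 0 < w + 0)) { print "warntime should exceed keepalive"; ok = 0 }
            if (!(w + 0 < d + 0)) { print "deadtime should exceed warntime"; ok = 0 }
            if (!(i + 0 >= 2 * d)) { print "initdead should be at least twice deadtime"; ok = 0 }
            if (ok) print "timers look consistent"
            exit !ok
        }
    '
}
```

With the values shown above (2, 10, 25, 50) the check passes, which supports the conclusion that the timers are not the source of the problem.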

haresources from osboxes

osboxes 192.168.141.135 apache

This haresources file defines the resources managed by the cluster. Each line starts with the name of the node that normally owns the resources, followed by the resources themselves: here an IP address that the cluster brings up on the owning node, and the apache service. This configuration governs resource management and does not directly impact the heartbeat communication itself. One detail is worth revisiting later: 192.168.141.135 is also the address osboxes already uses on eth0 (see the tcpdump output), whereas the address in haresources is normally a separate service address that moves between nodes on failover.

Pinpointing the Cause: The Unreachable UDP Port 694

Based on the diagnostics, the core of the problem is the ICMP “Destination Unreachable” message that osboxes2 sends back when osboxes attempts to communicate on port 694.

This ICMP response signifies one of the following:

Firewall on osboxes2: A firewall (like iptables or firewalld on Linux) on osboxes2 is rejecting incoming UDP traffic on port 694. Note that only a rule that rejects with an ICMP port-unreachable reply (the default for an iptables REJECT target) produces this exact message; a rule that silently drops packets produces no reply at all.
Heartbeat Service Not Running on osboxes2: The Heartbeat service (heartbeat or pacemaker with heartbeat as its communicator) might not be running or properly initialized on osboxes2. If no process is bound to port 694, the operating system’s network stack answers incoming packets with exactly this “port unreachable” reply.
Incorrect Heartbeat Configuration on osboxes2: Although not directly visible in the provided data, the ha.cf file on osboxes2 might have an incorrect udpport setting, or a different network interface is specified, preventing it from receiving the heartbeat packets as intended.
Network Infrastructure Blocking: Unlikely here, because the ICMP reply comes from osboxes2 itself, which means the packets are arriving. A virtual network switch or hypervisor filter dropping UDP port 694 would normally result in silence rather than a reply from the peer.

Considering the data, the most probable scenarios are that the Heartbeat service is not running (or not configured correctly) on osboxes2, or that a firewall rule on osboxes2 is rejecting the traffic.

Step-by-Step Solutions to Resolve Heartbeat Connectivity

Let’s systematically address the potential causes to get your Heartbeat cluster up and running.

Step 1: Verify Heartbeat Service Status on osboxes2

First and foremost, ensure the Heartbeat service is actively running on osboxes2.

Command:
```
sudo systemctl status heartbeat
```
or if you are using Pacemaker with Heartbeat as the communicator:
```
sudo systemctl status pacemaker
```

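If Not Running: Start the service (run the second command only if Pacemaker is in use).

sudo systemctl start heartbeat

sudo systemctl start pacemaker

Enable on Boot: Ensure it starts automatically (again, the second command applies only to Pacemaker setups).

sudo systemctl enable heartbeat

sudo systemctl enable pacemaker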

Step 2: Inspect Heartbeat Configuration on osboxes2

While we don’t have the ha.cf from osboxes2, it’s crucial to ensure it mirrors the essential parameters of osboxes. Pay close attention to:

ha.cf:
- keepalive and deadtime: These should be consistent with osboxes for proper quorum and failover logic.
- udpport: Must be 694.
- bcast and ucast: If osboxes is using ucast eth0 192.168.141.137, osboxes2 should also be configured for unicast communication, likely to 192.168.141.135. Ensure the interface specified (eth0 in your case) is the correct network interface for inter-VM communication.
- node: Both nodes (osboxes and osboxes2) must be listed.
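authkeys: Heartbeat requires an authkeys file (normally /etc/ha.d/authkeys) on every node. Its contents must be identical on both nodes, and it must be readable only by root (mode 600); otherwise Heartbeat refuses to start.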
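
Comparing two ha.cf files by eye is error-prone once comments and ordering differ. The sketch below normalizes each file before diffing; the function names and the example file names are hypothetical, and ucast lines are deliberately excluded because they name the peer and so legitimately differ between nodes.

```shell
# Print the effective directives of a Heartbeat config file: comments and
# blank lines removed, ucast lines dropped, and the rest sorted.
hb_effective_conf() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' "$1" |
        grep -v -e '^$' -e '^ucast[[:space:]]' |
        sort
}

# Compare two copies of ha.cf; prints a diff and fails if they disagree.
# Usage (hypothetical file names, e.g. after copying the peer's file over):
#   hb_conf_diff /etc/ha.d/ha.cf /tmp/ha.cf.osboxes2
hb_conf_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    hb_effective_conf "$1" > "$a"
    hb_effective_conf "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

An empty diff means the two nodes agree on everything except their unicast peers, which still need to be checked by hand.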

Step 3: Address Firewall Rules on osboxes2

If the service is running and correctly configured, the firewall is the next suspect. We need to ensure that UDP traffic on port 694 is allowed to pass through the firewall on osboxes2.

#### Using iptables

If your system uses iptables, you’ll need to add a rule to accept incoming UDP traffic on port 694.

Command to allow UDP port 694:
```
sudo iptables -A INPUT -p udp --dport 694 -j ACCEPT
```
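This rule appends an entry to the INPUT chain that accepts UDP packets destined for port 694. Because -A adds the rule at the end of the chain, it has no effect if an earlier rule already rejects or drops the traffic; in that case insert it at the top with -I INPUT 1 instead.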
Consider Source IP for Specificity (Recommended): To enhance security and ensure traffic is only accepted from your known cluster node (osboxes), you can specify the source IP address:
```
sudo iptables -A INPUT -p udp -s 192.168.141.135 --dport 694 -j ACCEPT
```
Allowing Broadcast Traffic: If Heartbeat is configured for broadcast, you might also need to allow traffic to the broadcast address:
```
sudo iptables -A INPUT -p udp -d 192.168.141.255 --dport 694 -j ACCEPT
```
(Note: It’s generally better to rely on unicast for reliability. If unicast is correctly configured and allowed, broadcast might become redundant for direct node-to-node communication.)
Saving iptables Rules: iptables rules are volatile by default. To make them persistent across reboots, you’ll need to save them. The method varies slightly depending on your Linux distribution:
- Debian/Ubuntu:
```
sudo apt-get install iptables-persistent -y
sudo netfilter-persistent save
```
  During installation, it will ask if you want to save current IPv4 and IPv6 rules. Choose “Yes”.
- CentOS/RHEL/Fedora (older versions using iptables-services):
```
sudo service iptables save
```
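
Since iptables evaluates rules in order, it helps to confirm that the new ACCEPT rule really sits ahead of any blocking rule. The helper below is a heuristic sketch (the name accept_precedes_block is made up): it reads the output of iptables -S INPUT and only compares rule positions, without evaluating the other match conditions.

```shell
# Reads `iptables -S INPUT` output on stdin and succeeds only if an ACCEPT
# rule for the given UDP port appears before the first REJECT or DROP rule.
# Heuristic: positions are compared, other match conditions are ignored.
# Usage: sudo iptables -S INPUT | accept_precedes_block 694
accept_precedes_block() {
    port="$1"
    awk -v p="$port" '
        /^-A INPUT/ {
            n++
            if (!acc && /-p udp/ && /-j ACCEPT/ && $0 ~ ("--dport " p "( |$)")) acc = n
            if (!blk && /-j (REJECT|DROP)/) blk = n
        }
        END { exit !(acc && (!blk || acc < blk)) }
    '
}
```

If the check fails even though the rule was added, the rule was most likely appended after a catch-all REJECT or DROP and needs to be inserted earlier in the chain.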

#### Using firewalld

If your system uses firewalld, the process is slightly different.

Command to add UDP port 694 to the active zone (e.g., public):

sudo firewall-cmd --zone=public --add-port=694/udp --permanent

Reload firewalld to apply changes:
```
sudo firewall-cmd --reload
```

Consider Source IP with firewalld (more complex): firewalld can be configured with rich rules for more granular control, which is ideal for restricting access to specific IPs.

sudo firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.141.135" port protocol="udp" port="694" accept' --permanent
sudo firewall-cmd --reload
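
After the reload, the result can be verified from the command line. The following sketch (helper name port_listed is hypothetical) checks the space-separated list that firewall-cmd --list-ports prints; it matches exact entries only, so a port opened through a range or through a rich rule will not be detected this way.

```shell
# Succeeds if the port/protocol pair (e.g. 694/udp) appears in the
# space-separated list printed by `firewall-cmd --list-ports`.
# Usage: sudo firewall-cmd --zone=public --list-ports | port_listed 694/udp
port_listed() {
    want="$1"
    # Put each entry on its own line, then look for an exact match.
    tr -s ' \t' '\n\n' | grep -qxF -- "$want"
}
```

Rich rules are listed separately, with sudo firewall-cmd --zone=public --list-rich-rules.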

Step 4: Verify Network Configuration and Interface Naming

Double-check that eth0 is indeed the correct and active network interface responsible for communication between your VMs. Sometimes, virtualized environments might present interfaces with different names (e.g., ens18, enp0s3).

Command to list network interfaces:
```
ip a
```
or
```
ifconfig -a
```
Confirm that the IP address 192.168.141.135 is assigned to the interface you’ve specified in ha.cf for osboxes, and 192.168.141.137 for osboxes2.
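
To avoid scanning the interface list by hand, the lookup can be reversed: start from the address and print the interface that carries it. This is a minimal sketch that parses the one-line-per-address format of ip -o -4 addr show; the helper name iface_for_ip is hypothetical.

```shell
# Print the interface name that carries the given IPv4 address, based on
# `ip -o -4 addr show` output read from stdin.
# Usage: ip -o -4 addr show | iface_for_ip 192.168.141.135
iface_for_ip() {
    ip_addr="$1"
    awk -v want="$ip_addr" '
        $3 == "inet" {
            split($4, cidr, "/")      # $4 looks like 192.168.141.135/24
            if (cidr[1] == want) print $2
        }
    '
}
```

The name it prints is the one that belongs in the bcast and ucast lines of ha.cf on that node.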

Step 5: Test Connectivity Again After Changes

After implementing the firewall rules and ensuring the service is running on osboxes2, it’s time to re-test.

Restart Heartbeat on both nodes:
```
sudo systemctl restart heartbeat
```
(Or pacemaker if applicable)
Check logs: Monitor /var/log/ha-log and /var/log/ha-debug on both osboxes and osboxes2 for any new messages indicating successful node discovery or ongoing issues.
Use tcpdump again: Run tcpdump on the eth0 interface of both machines simultaneously to see if the UDP packets are now being exchanged without the ICMP “unreachable” response from osboxes2.
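
One way to read the new capture is to summarize who is sending to port 694. In the original capture only osboxes appears as a sender; once the cluster is healthy, both nodes should show up. The helper below is a sketch with a made-up name, senders_to_port, that works on tcpdump text output.

```shell
# Summarise which hosts are sending UDP datagrams to the given port, from
# tcpdump text on stdin. Prints "source count" pairs, one per line.
# Usage: sudo tcpdump -l -n -i eth0 -c 20 udp port 694 | senders_to_port 694
senders_to_port() {
    port="$1"
    awk -v p="$port" '
        $2 == "IP" && $4 == ">" && $5 ~ ("\\." p ":$") {
            src = $3
            sub(/\.[0-9]+$/, "", src)   # drop the ephemeral source port
            count[src]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

Seeing both 192.168.141.135 and 192.168.141.137 in the summary, with no ICMP replies in the capture, is a good sign that heartbeats now flow in both directions.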

Step 6: Consider Alternative Communication Methods (If Necessary)

While UDP port 694 is the standard, if you continue to face issues, and your network environment is particularly restrictive or complex, you might explore alternative communication methods for Heartbeat. However, this is usually a last resort.

Multicast Communication: Heartbeat’s ha.cf has no TCP transport; its heartbeat media are bcast, ucast, mcast, and serial. If broadcast is filtered and unicast is impractical, UDP multicast is the usual alternative. The mcast directive takes the interface, multicast group, UDP port, TTL, and a loop flag (which must be 0). For example, using the group 225.0.0.1:
```
# In ha.cf
mcast eth0 225.0.0.1 694 1 0
```
Then allow UDP port 694 for that multicast group through your firewall.
Dedicated Heartbeat Interface: For even greater isolation and control, you could dedicate a separate virtual network interface solely for Heartbeat communication, configured with specific IP addresses and subnet masks.

Common Pitfalls and Best Practices

When setting up Heartbeat, several common pitfalls can lead to failed heartbeats and an unreachable port. Adhering to the following practices will save you significant troubleshooting time.

Firewall Configuration: Always assume a firewall is active. Explicitly allow the necessary ports (UDP 694 by default) and protocols. Be as specific as possible with source and destination IP addresses.
Service Status: Never assume a service is running. Always verify its status, especially after installation or configuration changes.
Configuration Consistency: Ensure that critical configuration parameters in ha.cf (like keepalive, deadtime, udpport, and node names) are identical on all cluster nodes.
Network Interface Verification: Using the wrong network interface name in ha.cf is a frequent error. Always verify the actual interface name on your system.
Log Analysis: The ha-log and ha-debug files are your best friends. Regularly check them for errors and warnings.
Testing in Isolation: If possible, try to simplify your network environment when first setting up the cluster to rule out external factors.
Broadcast vs. Unicast: While broadcast is convenient, unicast is generally more reliable in complex or virtualized networks. If using unicast, ensure the ucast directive is correctly pointing to the peer’s IP address.

By methodically working through these steps, paying close attention to the diagnostic data, and applying the correct firewall rules and service configurations, you should be able to resolve heartbeat failures caused by port 694 being closed or unreachable. Remember that the ICMP “unreachable” message is a strong indicator of a rejecting firewall rule or the lack of a listening service on the destination port.

At revWhiteShadow, our aim is to provide clear, actionable guidance to ensure your critical systems remain operational. Understanding the flow of Heartbeat communication and systematically diagnosing network and service issues are key to achieving a robust high-availability solution.
