Heartbeat not working. Port not opened
Troubleshooting Heartbeat Not Working: Understanding and Resolving Unreachable Port 694 Issues
A Heartbeat cluster whose heartbeat is not working and whose crucial port is not open can be frustrating, especially when critical services depend on the high availability the cluster provides. At revWhiteShadow, we understand the intricacies of ensuring seamless operation for your virtualized environments. When the Heartbeat service fails to establish communication, often indicated by an unreachable UDP port 694, the synchronization and failover mechanisms stop working. This article diagnoses and fixes these connectivity problems so you can restore your cluster’s functionality.
This article focuses on a common scenario involving two virtual machines, osboxes and osboxes2, configured to use Heartbeat. We’ll examine the diagnostic information provided, including `nmap` scan results, `tcpdump` output, and the configuration files `ha.cf` and `haresources`. By dissecting these elements, we aim to pinpoint why heartbeat is not working and, more specifically, why port 694 is not open or reachable, and to guide you toward a stable and reliable clustered environment.
Understanding Heartbeat Communication and Port 694
Before we dive into the specifics of your setup, it’s paramount to grasp how Heartbeat establishes and maintains its cluster membership. At its core, Heartbeat relies on a constant stream of communication between cluster nodes to monitor their status. This communication is typically facilitated over the network.
The standard protocol used by Heartbeat for its primary cluster communication is UDP on port 694. This port serves as the dedicated channel for broadcasting and receiving heartbeat packets, which are essentially small messages indicating that a node is alive and functioning. When this port is blocked, filtered, or misconfigured, the nodes are effectively deaf and blind to each other, leading to the observed heartbeat not working scenario.
The `ha.cf` file plays a pivotal role in defining these communication parameters. Directives such as `udpport` specify the port number to be used, while `bcast` and `ucast` define the network interfaces and methods for sending heartbeat messages. Understanding these settings is key to troubleshooting.
Analyzing Your Diagnostic Data: A Deep Dive
Let’s analyze the diagnostic information you’ve provided to see why heartbeat is not working and why the port appears closed.
Nmap Scan Results on Localhost
Your `nmap localhost` output offers a snapshot of the network services accessible on the machine where the scan was performed (presumably osboxes).
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00017s latency).
Not shown: 991 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
53/tcp open domain
80/tcp open http
443/tcp open https
631/tcp open ipp
3306/tcp open mysql
9050/tcp tor-socks
10000/tcp open snet-sensor-mgmt
Port 694 is not listed in this output, but that absence proves less than it seems. By default `nmap` scans TCP ports only (every entry above is `/tcp`), and Heartbeat uses UDP, so UDP port 694 would not appear here even if Heartbeat were listening; a UDP scan requires the `-sU` option. A scan of `localhost` (127.0.0.1) also reaches only services bound to the loopback address or to all addresses, whereas the actual communication between your VMs occurs over the network interface `eth0`. Treat this result as inconclusive and check for the UDP socket directly.
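Because Heartbeat speaks UDP, a more direct check than a default TCP scan is to ask the kernel whether any UDP socket is bound to port 694. The sketch below assumes the `ss` tool from iproute2 is installed; the helper name `udp_port_bound` is hypothetical, and it reads `ss -lun` output on standard input so it can also be tried against saved output.

```shell
# Hypothetical helper: succeed if any listed UDP socket is bound to the port.
# Expects `ss -lun` output on stdin, where the local address is ADDR:PORT.
udp_port_bound() {
  awk -v p="$1" '
    { for (i = 1; i <= NF; i++) if ($i ~ (":" p "$")) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

On osboxes2 this could be run as `ss -lun | udp_port_bound 694 && echo bound || echo "not bound"`. From the peer, `sudo nmap -sU -p 694 192.168.141.137` is the UDP counterpart of the scan shown above.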
Tcpdump Analysis of eth0 Interface
The `tcpdump` output provides a real-time glimpse into the network traffic on the `eth0` interface. This is where we expect to see Heartbeat packets flowing between osboxes and osboxes2.
18:38:22.267817 IP 192.168.141.135.48748 > 192.168.141.255.694: UDP, length 315
18:38:22.268640 IP 192.168.141.135.38234 > osboxes2.694: UDP, length 315
18:38:22.269421 IP osboxes2 > 192.168.141.135: ICMP osboxes2 udp port 694 unreachable, length 351
Let’s break down these lines:
- `18:38:22.267817 IP 192.168.141.135.48748 > 192.168.141.255.694: UDP, length 315`: A UDP packet is sent from osboxes (192.168.141.135) on a high-numbered ephemeral port (48748) to the broadcast address (192.168.141.255) on port 694. This is consistent with Heartbeat configured to use broadcast for its communication.
- `18:38:22.268640 IP 192.168.141.135.38234 > osboxes2.694: UDP, length 315`: A UDP packet is sent from osboxes (192.168.141.135) on another ephemeral port (38234) to osboxes2 (its IP address on the subnet, likely 192.168.141.137) on port 694. This is a unicast communication attempt.
- `18:38:22.269421 IP osboxes2 > 192.168.141.135: ICMP osboxes2 udp port 694 unreachable, length 351`: This is the critical piece of information. osboxes2 answers osboxes with an ICMP “Destination Unreachable” message stating that UDP port 694 is unreachable. Because the reply comes from osboxes2 itself, the packet reached that host: either nothing is listening on UDP port 694 there (the kernel sends this reply when no socket is bound to the port), or a firewall on osboxes2 is rejecting the packet with this ICMP message. A firewall that silently drops packets would produce no reply at all.
This `tcpdump` output strongly suggests that the problem lies in osboxes2’s ability to receive or process UDP traffic on port 694. The fact that osboxes is sending packets to port 694 means the configuration on osboxes is likely correct in terms of initiating the communication. The failure occurs at the receiving end, on osboxes2.
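To repeat this check without reading the capture by eye, the rejection can be pulled out of `tcpdump`'s text output. This is a minimal sketch: `unreachable_senders` is a hypothetical helper name, and it assumes the one-line-per-packet format shown above.

```shell
# Hypothetical helper: list hosts that answered "udp port N unreachable".
# Reads tcpdump text output on stdin; field 3 is the sender of each packet.
unreachable_senders() {
  awk -v p="$1" '
    $2 == "IP" && index($0, "udp port " p " unreachable") { print $3 }
  ' | sort -u
}
```

A bounded capture such as `sudo tcpdump -ni eth0 -c 20 'udp port 694 or icmp' | unreachable_senders 694` prints each host that rejected a heartbeat packet; an empty result means no rejection was seen in that sample.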
Ping Test Between osboxes and osboxes2
Your ping test results are as follows:
PING 192.168.141.137 (192.168.141.137) 56(84) bytes of data.
64 bytes from 192.168.141.137: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.141.137: icmp_seq=2 ttl=64 time=0.291 ms
64 bytes from 192.168.141.137: icmp_seq=3 ttl=64 time=0.681 ms
These results are excellent! The successful ping from osboxes to osboxes2 (192.168.141.137) demonstrates that:
- There is basic IP connectivity between the two virtual machines over the `eth0` interface.
- The IP addresses are correctly configured on both VMs and their respective network interfaces.
- The subnet mask and gateway (if applicable) are set up correctly, allowing for inter-VM communication.
- No fundamental network layer blockage (like a basic IP block) is preventing communication.
However, it’s crucial to remember that a successful ICMP (ping) does not guarantee that UDP on port 694 is open or accessible. Ping uses the ICMP protocol, whereas Heartbeat uses UDP. Firewalls and network configurations can be set up to allow ICMP while blocking specific UDP ports.
ha.cf Configuration on osboxes
Let’s scrutinize the `ha.cf` file from osboxes:
# Debug log file:
logfile /var/log/ha-log
# Log file
debugfile /var/log/ha-debug
# Where the logs go
logfacility local0
# Heartbeat frequency in seconds
keepalive 2
# Time that indicates the node is dead
deadtime 25
# Time heartbeat should wait for beats (not the beatbox kind of beat)
warntime 10
# Maximum time to declare the other server dead
initdead 50
# Synchronization port
udpport 694
# Network broadcast address
bcast eth0
# Didn't understand this one. Look it up later if needed.
ucast eth0 192.168.141.137
# Whether the server returns to master when it responds again
auto_failback on
# Names of the cluster nodes
node osboxes2
node osboxes
This configuration appears largely correct for a basic Heartbeat setup:
- `logfile`, `debugfile`, `logfacility`: These are for logging and are set up appropriately.
- `keepalive 2`: Sets the interval between heartbeat packets to 2 seconds. This is a reasonable value.
- `deadtime 25`: Defines how long a node waits without hearing from a peer before considering it dead. It’s typically set to a multiple of `keepalive` to allow for network latency and temporary packet loss. `25` seconds is a common setting.
- `warntime 10`: How long a heartbeat may be late before a warning is logged, ahead of declaring the node dead.
- `initdead 50`: A longer timeout applied at startup, giving the nodes time to come up and synchronize.
- `udpport 694`: Explicitly defines port 694 as the UDP port for heartbeat communication. This aligns with the standard.
- `bcast eth0`: Instructs Heartbeat to use broadcast on the `eth0` interface for sending heartbeat packets.
- `ucast eth0 192.168.141.137`: Specifies a unicast destination. osboxes will send heartbeat packets directly to osboxes2’s IP address (192.168.141.137) on the `eth0` interface. With both `bcast` and `ucast` specified, Heartbeat sends on both, which is what the `tcpdump` capture shows: one packet to the broadcast address and one directly to osboxes2. In virtualized environments, explicitly defined unicast destinations can be more robust than broadcast alone, which can be subject to network device filtering.
- `auto_failback on`: Enables automatic failback to the primary node when it becomes available.
- `node osboxes2` and `node osboxes`: These correctly list the participating nodes in the cluster.
The `ha.cf` file on osboxes seems to be configured correctly to initiate communication on port 694 via broadcast and unicast to osboxes2. The primary concern remains why osboxes2 is responding with an ICMP “unreachable” for this port.
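When reviewing a commented file like this one, it can help to print only the value of a single directive. The sketch below is one way to do that; the helper name `ha_directive` is hypothetical, and the default path `/etc/ha.d/ha.cf` is an assumption about where the file lives on your system.

```shell
# Hypothetical helper: print the argument(s) of every line using a directive.
# Comment lines start with "#", so their first field never equals the key.
# $2 is an optional path; /etc/ha.d/ha.cf is an assumed default location.
ha_directive() {
  awk -v key="$1" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print }' \
    "${2:-/etc/ha.d/ha.cf}"
}
```

For example, `ha_directive udpport` shows the configured port, `ha_directive ucast` shows the unicast interface and peer address, and `ha_directive node` prints one node name per line.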
haresources from osboxes
osboxes 192.168.141.135 apache
This `haresources` file defines the resources managed by the cluster. The line names `osboxes` as the preferred node, followed by the resources it should run: the IP address `192.168.141.135` and the `apache` service. An IP address listed in `haresources` is a service address that Heartbeat brings up on whichever node currently holds the resources. Here it is the same address osboxes uses as its own in the `tcpdump` capture, which is worth reviewing separately, since a dedicated, otherwise unused address is the usual choice. This file is specific to resource management and does not directly affect heartbeat communication itself.
Pinpointing the Cause: The Unreachable UDP Port 694
Based on the analysis of your diagnostics, the core of the problem is the ICMP “Destination Unreachable” message that osboxes2 returns when osboxes attempts to communicate on port 694.
This ICMP response signifies one of the following:
- Firewall on osboxes2: The most common culprit. A firewall (like `iptables` or `firewalld` on Linux) on osboxes2 is actively blocking incoming UDP traffic on port 694.
- Heartbeat Service Not Running on osboxes2: The Heartbeat service (`heartbeat` or `pacemaker` with `heartbeat` as its communicator) might not be running or properly initialized on osboxes2. If the service isn’t listening on port 694, any incoming packets to that port will be met with an “unreachable” response from the operating system’s network stack.
- Incorrect Heartbeat Configuration on osboxes2: Although not directly visible in the provided data, the `ha.cf` file on osboxes2 might have an incorrect `udpport` setting, or a different network interface is specified, preventing it from receiving the heartbeat packets as intended.
- Network Infrastructure Blocking: Less likely given the successful ping, but possible: a virtual network switch, a hypervisor’s network configuration, or even a physical network device between the VMs could be filtering UDP port 694.
Considering the data, the most probable scenario is a firewall rule on osboxes2 or the Heartbeat service not running or configured correctly on osboxes2.
Step-by-Step Solutions to Resolve Heartbeat Connectivity
Let’s systematically address the potential causes to get your Heartbeat cluster up and running.
Step 1: Verify Heartbeat Service Status on osboxes2
First and foremost, ensure the Heartbeat service is actively running on osboxes2.
Command:
sudo systemctl status heartbeat
or if you are using Pacemaker with Heartbeat as the communicator:
sudo systemctl status pacemaker
If Not Running: Start the service.
sudo systemctl start heartbeat
or
sudo systemctl start pacemaker
Enable on Boot: Ensure it starts automatically.
sudo systemctl enable heartbeat
or
sudo systemctl enable pacemaker
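For a quick yes/no answer that can be reused in a script, the status check can be wrapped in a small function. This is only a sketch: `check_unit` is a hypothetical name, and it assumes the service is managed by systemd under the unit names used in the commands above.

```shell
# Hypothetical helper: report a systemd unit's state; succeed only if active.
check_unit() {
  state="$(systemctl is-active "$1" 2>/dev/null)"
  echo "$1: ${state:-unknown}"
  [ "$state" = "active" ]
}
```

Calling `check_unit heartbeat || sudo systemctl start heartbeat` on osboxes2 starts the service only when it is not already active.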
Step 2: Inspect Heartbeat Configuration on osboxes2
While we don’t have the `ha.cf` from osboxes2, it’s crucial to ensure it mirrors the essential parameters of osboxes. Pay close attention to:
- `ha.cf`:
  - `keepalive` and `deadtime`: These should be consistent with osboxes for proper failover logic.
  - `udpport`: Must be 694.
  - `bcast` and `ucast`: If osboxes is using `ucast eth0 192.168.141.137`, osboxes2 should also be configured for unicast communication, likely to `192.168.141.135`. Ensure the interface specified (`eth0` in your case) is the correct network interface for inter-VM communication.
  - `node`: Both nodes (`osboxes` and `osboxes2`) must be listed.
- `authkeys` (if used): Ensure authentication keys are identical on both nodes if authentication is configured.
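The comparison described above can be automated once both files are on the same machine. The following is a sketch under stated assumptions: `compare_ha_cf` is a hypothetical helper, it compares only the directives this article says should match (`ucast` is left out on purpose because each node points at the other), and it looks at the first argument of each directive only.

```shell
# Hypothetical helper: compare directives that should agree on both nodes.
# $1 and $2 are paths to the two ha.cf copies; node order is ignored.
compare_ha_cf() {
  rc=0
  for key in keepalive deadtime udpport node; do
    va="$(awk -v k="$key" '$1 == k { print $2 }' "$1" | sort | xargs)"
    vb="$(awk -v k="$key" '$1 == k { print $2 }' "$2" | sort | xargs)"
    if [ "$va" != "$vb" ]; then
      echo "MISMATCH $key: '$va' vs '$vb'"
      rc=1
    fi
  done
  return "$rc"
}
```

Assuming the file lives at `/etc/ha.d/ha.cf` on both nodes, one way to use it is `scp osboxes2:/etc/ha.d/ha.cf /tmp/ha.cf.osboxes2` followed by `compare_ha_cf /etc/ha.d/ha.cf /tmp/ha.cf.osboxes2`. No output means the checked directives agree.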
Step 3: Address Firewall Rules on osboxes2
This is the most probable area for the blockage. We need to ensure that UDP traffic on port 694 is allowed to pass through the firewall on osboxes2.
#### Using iptables
If your system uses `iptables`, you’ll need to add a rule to accept incoming UDP traffic on port 694.
- Command to allow UDP port 694:
  `sudo iptables -A INPUT -p udp --dport 694 -j ACCEPT`
  This rule appends an entry to the `INPUT` chain, allowing UDP packets destined for port 694. Because `-A` adds the rule at the end of the chain, it has no effect if an earlier rule already rejects or drops the traffic; in that case use `-I INPUT` to insert the rule at the top instead.
- Consider Source IP for Specificity (Recommended): To enhance security and ensure traffic is only accepted from your known cluster node (osboxes), you can specify the source IP address:
  `sudo iptables -A INPUT -p udp -s 192.168.141.135 --dport 694 -j ACCEPT`
- Allowing Broadcast Traffic: If Heartbeat is configured for broadcast, you might also need to allow traffic to the broadcast address:
  `sudo iptables -A INPUT -p udp -d 192.168.141.255 --dport 694 -j ACCEPT`
  (Note: It’s generally better to rely on unicast for reliability. If unicast is correctly configured and allowed, broadcast might become redundant for direct node-to-node communication.)
- Saving iptables Rules: `iptables` rules are volatile by default. To make them persistent across reboots, you’ll need to save them. The method varies slightly depending on your Linux distribution:
  - Debian/Ubuntu:
    `sudo apt-get install iptables-persistent -y`
    `sudo netfilter-persistent save`
    During installation, it will ask if you want to save current IPv4 and IPv6 rules. Choose “Yes”.
  - CentOS/RHEL/Fedora (older versions using `iptables-services`):
    `sudo service iptables save`
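Running the same `iptables` command several times leaves duplicate rules behind, so it can be worth checking for the rule before adding it. The sketch below uses `iptables -C` (check) and `-I` (insert at the top of the chain); the function name `allow_heartbeat_from` is hypothetical, and it is meant to be run from a root shell on osboxes2.

```shell
# Hypothetical helper: accept UDP heartbeat traffic from one peer, once.
# $1 is the peer address; $2 is the port (defaults to 694).
allow_heartbeat_from() {
  peer="$1"
  port="${2:-694}"
  if iptables -C INPUT -p udp -s "$peer" --dport "$port" -j ACCEPT 2>/dev/null; then
    echo "rule already present"
  else
    iptables -I INPUT 1 -p udp -s "$peer" --dport "$port" -j ACCEPT &&
      echo "rule inserted"
  fi
}
```

On osboxes2 the call would be `allow_heartbeat_from 192.168.141.135`. Inserting at position 1 places the rule ahead of any existing reject or drop rule; `iptables -S INPUT` shows the resulting order.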
#### Using firewalld
If your system uses `firewalld`, the process is slightly different.
- Command to add UDP port 694 to the active zone (e.g., `public`):
  `sudo firewall-cmd --zone=public --add-port=694/udp --permanent`
- Reload firewalld to apply changes:
  `sudo firewall-cmd --reload`
- Consider Source IP with firewalld (more complex): `firewalld` can be configured with rich rules for more granular control, which is ideal for restricting access to specific IPs.
  `sudo firewall-cmd --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.141.135" port protocol="udp" port="694" accept' --permanent`
  `sudo firewall-cmd --reload`
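After the reload it is worth confirming that the port really appears in the zone. A minimal sketch, assuming the port was opened with `--add-port` (ports opened through rich rules or port ranges are not covered by this check); `port_listed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if PORT/udp appears in the output of
# `firewall-cmd --list-ports`, read from stdin as space-separated entries.
port_listed() {
  tr ' ' '\n' | grep -qx "$1/udp"
}
```

For example, `sudo firewall-cmd --zone=public --list-ports | port_listed 694 && echo open`. If you are unsure which zone applies, `sudo firewall-cmd --get-zone-of-interface=eth0` prints the zone bound to the interface.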
Step 4: Verify Network Configuration and Interface Naming
Double-check that `eth0` is indeed the correct and active network interface responsible for communication between your VMs. Sometimes, virtualized environments might present interfaces with different names (e.g., `ens18`, `enp0s3`).
- Command to list network interfaces:
  `ip a`
  or
  `ifconfig -a`
  Confirm that the IP address `192.168.141.135` is assigned to the interface you’ve specified in `ha.cf` for osboxes, and `192.168.141.137` for osboxes2.
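The same check can be reduced to a single answer: which interface holds a given address. This sketch parses the one-line-per-address output of `ip -o -4 addr show`; the helper name `iface_for_ip` is hypothetical.

```shell
# Hypothetical helper: print the interface that carries an IPv4 address.
# Reads `ip -o -4 addr show` output on stdin; field 2 is the interface
# name and field 4 is ADDRESS/PREFIX.
iface_for_ip() {
  awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) print $2 }'
}
```

On osboxes, `ip -o -4 addr show | iface_for_ip 192.168.141.135` should print the interface name used in the `bcast` and `ucast` lines of `ha.cf`; if it prints something other than `eth0`, the directives need updating.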
Step 5: Test Connectivity Again After Changes
After implementing the firewall rules and ensuring the service is running on osboxes2, it’s time to re-test.
- Restart Heartbeat on both nodes:
  `sudo systemctl restart heartbeat`
  (Or `pacemaker` if applicable)
- Check logs: Monitor `/var/log/ha-log` and `/var/log/ha-debug` on both osboxes and osboxes2 for any new messages indicating successful node discovery or ongoing issues.
- Use `tcpdump` again: Run `tcpdump` on the `eth0` interface of both machines simultaneously to see if the UDP packets are now being exchanged without the ICMP “unreachable” response from osboxes2.
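A simple success criterion for the new capture is that both nodes now show up as senders of packets addressed to port 694. The sketch below extracts the distinct senders from `tcpdump` text output; `heartbeat_senders` is a hypothetical helper, and it assumes the same line format as the capture earlier in this article.

```shell
# Hypothetical helper: list distinct sources of packets sent to UDP port N.
# Reads tcpdump text output on stdin and strips the source port from field 3.
heartbeat_senders() {
  awk -v p="$1" '
    $2 == "IP" && $4 == ">" && $5 ~ ("\\." p ":$") {
      src = $3
      sub(/\.[0-9]+$/, "", src)
      print src
    }
  ' | sort -u
}
```

With `sudo tcpdump -ni eth0 -c 20 'udp port 694' | heartbeat_senders 694`, a healthy pair should list both 192.168.141.135 and 192.168.141.137. Seeing only one address means the other node is still not sending.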
Step 6: Consider Alternative Communication Methods (If Necessary)
While UDP port 694 is the standard, if you continue to face issues, and your network environment is particularly restrictive or complex, you might explore alternative communication methods for Heartbeat. However, this is usually a last resort.
- Other Heartbeat media: `ha.cf` has no TCP transport, so switching Heartbeat to TCP is not an option. Its network media are all UDP based (`bcast`, `ucast`, and `mcast`), with `serial` available for a direct serial link between nodes. If port 694 itself is the problem in your environment, you can choose a different UDP port with `udpport`, using the same value on both nodes, and adjust the firewall rules accordingly.
- Dedicated Heartbeat Interface: For even greater isolation and control, you could dedicate a separate virtual network interface solely for Heartbeat communication, configured with specific IP addresses and subnet masks.
Common Pitfalls and Best Practices
When setting up Heartbeat, several common pitfalls can lead to the “heartbeat not working” and “port not opened” scenarios. Adhering to best practices will save you significant troubleshooting time.
- Firewall Configuration: Always assume a firewall is active. Explicitly allow the necessary ports (UDP 694 by default) and protocols. Be as specific as possible with source and destination IP addresses.
- Service Status: Never assume a service is running. Always verify its status, especially after installation or configuration changes.
- Configuration Consistency: Ensure that critical configuration parameters in `ha.cf` (like `keepalive`, `deadtime`, `udpport`, and node names) are identical on all cluster nodes.
- Network Interface Verification: Using the wrong network interface name in `ha.cf` is a frequent error. Always verify the actual interface name on your system.
- Log Analysis: The `ha-log` and `ha-debug` files are your best friends. Regularly check them for errors and warnings.
- Testing in Isolation: If possible, try to simplify your network environment when first setting up the cluster to rule out external factors.
- Broadcast vs. Unicast: While broadcast is convenient, unicast is generally more reliable in complex or virtualized networks. If using unicast, ensure the `ucast` directive is correctly pointing to the peer’s IP address.
By methodically working through these steps, paying close attention to the diagnostic data, and applying the correct firewall rules and service configurations, you should be able to resolve the issue of heartbeat not working due to the port 694 not opened or being unreachable. Remember that the ICMP “unreachable” message is a strong indicator of a blockage or a lack of a listening service on the destination port.
At revWhiteShadow, our aim is to provide clear, actionable guidance to ensure your critical systems remain operational. Understanding the flow of Heartbeat communication and systematically diagnosing network and service issues are key to achieving a robust high-availability solution.