Why Your LXC Container Stops When You Close an SSH Session with the Host: Troubleshooting and Solutions

As revWhiteShadow, writing on the kts personal blog, we understand the frustration of having your LXC containers unexpectedly stop when you close your SSH session to the host machine. This is a common issue with several possible contributing factors. Let’s dive into the causes and, more importantly, how to resolve them, so your containers keep running independently of your SSH sessions.

Understanding the Problem: LXC Containers and SSH Sessions

The core issue stems from how processes are managed within a Linux environment. When you initiate an SSH session, a process is created to handle your connection. If the LXC container startup process is inadvertently linked to this SSH session’s lifespan, the container may be terminated when the session ends. This linkage can occur in various ways, often related to how the container is started or the user under which it runs.

Common Causes for LXC Container Shutdowns on SSH Session Closure

Let’s explore the most frequent reasons why your LXC container might be stopping when your SSH session closes:

1. User Session Dependency

The Issue:

If the container startup process is directly tied to your user session, closing the SSH session will signal the system to terminate all processes associated with that session, including the container. This is particularly relevant if you’re starting the container manually within the SSH session without properly detaching it.

The Solution:

The critical step is to start the container independently of your user session. This is usually achieved through systemd services, as you’ve already begun to implement. However, subtle configuration errors can still tie the service to your session; the next section examines the ones that most often cause this.
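Before changing anything, you can confirm the diagnosis by inspecting the cgroup of the container’s monitor process. This is a diagnostic sketch that assumes a container named dns1 (the example name used below); a cgroup path containing session- means the process belongs to your login session and will be cleaned up when that session ends.

```shell
# Diagnostic sketch: find the lxc-start monitor for container "dns1" (an
# assumed name) and print its cgroup. A path containing "session-" means
# it is tied to a login session; "user@" or "system.slice" means it is not.
CT=dns1
PID=$(pgrep -f "lxc-start -n $CT" | head -n 1)
if [ -n "$PID" ]; then
    cat "/proc/$PID/cgroup"
else
    echo "no lxc-start process found for $CT"
fi
```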

2. Systemd Service Configuration Errors

The Issue:

While using systemd is the recommended approach, incorrect settings within your service file (~/.config/systemd/user/container.service in your case) can prevent the container from running reliably in the background. Common pitfalls include a wrong Type=, a missing RemainAfterExit=, or a User= directive, which is not valid in user units.

The Solution:

Let’s examine and refine your service file. Because it uses the %i instance specifier, it must be saved as a template unit named container@.service; the @ is what allows per-container instances such as container@dns1.service:

# ~/.config/systemd/user/container@.service
#
[Unit]
Description=LXC container %i

[Service]
Type=oneshot
RemainAfterExit=yes
Delegate=yes
ExecStart=/usr/bin/lxc-start -n %i
ExecStop=/usr/bin/lxc-stop -n %i

[Install]
WantedBy=default.target
Key Changes and Explanations:
  • Template naming: systemd substitutes the instance name for %i, so systemctl --user start container@dns1.service starts the container dns1. One template file can manage any number of containers.
  • Type=oneshot with RemainAfterExit=yes: lxc-start daemonizes by default, so the process systemd launches exits almost immediately. RemainAfterExit=yes keeps the unit marked active afterwards instead of letting systemd consider it dead.
  • User= removed: User= is only valid in system units. A user unit already runs as the user whose systemd instance manages it; what keeps that instance alive after you log out is lingering, covered in section 6 below.
  • ExecStop=/usr/bin/lxc-stop -n %i: lxc-stop takes the container name via the -n flag, which the original --kill %i invocation omitted. A plain stop also requests a clean shutdown, whereas --kill forcibly terminates the container.
  • Wants=/After= on network.target and lxc.service removed: a systemd user instance manages only user units, so ordering directives that point at system units have no effect in a user service.
  • Ensure the full paths to lxc-start and lxc-stop match your distribution. You can confirm them with which lxc-start and which lxc-stop.
Enabling and Starting the Service:
  1. Save the updated unit file as container@.service in ~/.config/systemd/user/, then run systemctl --user daemon-reload so systemd picks up the change.

  2. Enable the service:

    systemctl --user enable container@dns1.service
    
  3. Start the service:

    systemctl --user start container@dns1.service
    
  4. Check the service status:

    systemctl --user status container@dns1.service
    

    Ensure the service is active (running) and there are no errors.
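The same check can be scripted non-interactively. This is a sketch that assumes the unit name container@dns1.service used above; it prints "unknown" where the user manager cannot be reached (for example, inside a build container).

```shell
# Sketch: query the unit state without paging output. Assumes the unit
# name container@dns1.service from the steps above.
UNIT=container@dns1.service
if command -v systemctl >/dev/null 2>&1; then
    STATE=$(systemctl --user is-active "$UNIT" 2>/dev/null) || true
    echo "$UNIT: ${STATE:-unknown}"
else
    echo "systemctl not found on this machine"
fi
```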

3. Network Configuration Issues

The Issue:

Even if the container is running, network connectivity problems can make it appear unresponsive from other machines. This can be caused by incorrect network settings within the container itself or issues with the host’s network configuration, such as firewall rules or routing problems.

The Solution:

  1. Verify Container Network Settings:

    • IP Address: Confirm that the container has a valid IP address within the expected network range (192.168.0.4 in your example). Use the following command inside the container:
      ip addr show
      
    • Gateway: Ensure the container has a correct default route so it can reach beyond its own network, typically via the host’s IP or bridge address on the relevant interface. Inside the container, ip route show displays the configured gateway.
    • DNS: Verify that the container can resolve domain names. Check the /etc/resolv.conf file within the container to ensure it has valid DNS server addresses.
  2. Check Host Network Settings:

    • Firewall: The host’s firewall might be blocking traffic to or from the container. Use sudo iptables -L -n or sudo ufw status (if using UFW) to inspect the rules. Ensure that traffic to the container’s IP address (192.168.0.4) on the relevant ports (e.g., SSH port 22) is allowed.
    • Routing: If the container is on a different subnet than the machines trying to access it, you may need to configure routing on the host to forward traffic to the container’s network.
    • IP Forwarding: Ensure IP forwarding is enabled on the host. This allows the host to forward traffic between different network interfaces. You can enable it temporarily with:
      sudo sysctl net.ipv4.ip_forward=1
      
      To make it permanent, edit /etc/sysctl.conf and uncomment or add the line net.ipv4.ip_forward=1. Then, run sudo sysctl -p.
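The container-side and host-side checks above can be collected into one read-only script. This is a sketch: the container name dns1 and IP 192.168.0.4 are the example values used in this article, and each check is skipped or tolerated if its tool is unavailable.

```shell
# Read-only sketch of the checks in this section; nothing is modified.
# "dns1" and 192.168.0.4 are the example values from the article.
CT=dns1
CT_IP=192.168.0.4

# Container side, via lxc-attach (if installed on this machine):
if command -v lxc-attach >/dev/null 2>&1; then
    sudo lxc-attach -n "$CT" -- ip addr show          # IP address
    sudo lxc-attach -n "$CT" -- ip route show         # default route (gateway)
    sudo lxc-attach -n "$CT" -- cat /etc/resolv.conf  # DNS servers
fi

# Host side:
if command -v iptables >/dev/null 2>&1; then
    sudo iptables -L -n 2>/dev/null | grep -F "$CT_IP" || echo "no rules mention $CT_IP"
fi
command -v ip >/dev/null 2>&1 && ip route show        # routes toward the subnet
cat /proc/sys/net/ipv4/ip_forward 2>/dev/null || echo "ip_forward state unknown"
```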

4. Terminal Multiplexers (Screen, tmux) as a Temporary Workaround (But Not a Solution)

The Issue:

While not a direct solution, using terminal multiplexers like screen or tmux can temporarily prevent the container from stopping when you close your SSH session. This is because the container is started within the screen or tmux session, which remains active even after you disconnect.

Why It’s Not a Solution:

This is merely a workaround. It doesn’t address the underlying problem of the container being tied to your user session. It also introduces additional complexity and reliance on the screen or tmux session.

How to Use It (Temporarily):

  1. Start a screen or tmux session:
    screen  # or tmux new-session
    
  2. Start your container within the session:
    lxc-start -n dns1
    
  3. Detach from the session:
    • screen: Press Ctrl+A, then D.
    • tmux: Press Ctrl+B, then D.

The container will continue running within the detached session, and you can later reattach (screen -r or tmux attach) to monitor it.
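Reattaching can be sketched defensively like this; the script simply reports when neither multiplexer has a session to attach to.

```shell
# Sketch: reattach to a detached multiplexer session, if one exists.
# Falls through to a message when neither screen nor tmux has a session.
MSG="no detached screen or tmux session found"
if screen -ls 2>/dev/null | grep -q Detached; then
    screen -r
elif tmux has-session 2>/dev/null; then
    tmux attach
else
    echo "$MSG"
fi
```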

5. LXC Configuration Issues

The Issue:

Problems within the LXC container’s configuration file can also cause unexpected behavior. This includes incorrect network settings, resource limits, or other configuration errors.

The Solution:

  1. Check the Container’s Configuration File:

    • For privileged containers, the main configuration file is typically /var/lib/lxc/<container_name>/config; for unprivileged containers it usually lives at ~/.local/share/lxc/<container_name>/config. Examine this file for any errors or inconsistencies.
    • Pay close attention to network settings (IP address, gateway, DNS), resource limits (CPU, memory), and any custom configurations you’ve made.
  2. Verify Resource Limits:

    • Ensure that the container has sufficient resources (CPU, memory) allocated to it. If the container is starved of resources, it may become unstable or crash.
    • You can adjust resource limits in the container’s configuration file or using LXC commands.
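For orientation, here is an illustrative excerpt showing what the network and resource-limit keys look like with modern (LXC 3.x and later) key names. The bridge name lxcbr0, the gateway address, and the memory cap are placeholder example values, not settings taken from your system:

```ini
# Illustrative excerpt of /var/lib/lxc/dns1/config (LXC 3.x+ key names).
# lxcbr0, 192.168.0.1, and the 512M limit are placeholder example values.
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.ipv4.address = 192.168.0.4/24
lxc.net.0.ipv4.gateway = 192.168.0.1
lxc.cgroup2.memory.max = 512M
```

Older LXC 2.x installations use the lxc.network.* key family instead, so match the syntax to the version your distribution ships.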

6. systemd User Lingering

The Issue:

In some cases, systemd user sessions might be automatically killed when the last login session for a user ends, even if there are services still running under that user.

The Solution:

Enable lingering for the user that runs the container. This will keep the user’s systemd instance running even after the user logs out.

  1. Enable Lingering:

    loginctl enable-linger <username>
    

    Replace <username> with the username specified in the User= directive of your systemd service file.

  2. Verify Lingering Status:

    loginctl show-user <username> | grep Linger
    

    Ensure that Linger=yes is displayed in the output.

Putting It All Together: A Comprehensive Approach

To ensure your LXC container runs reliably after you close your SSH session, follow these steps:

  1. Update your service file with the corrected configuration, saving it under the template name container@.service and using the correct lxc-start and lxc-stop paths.
  2. Enable and start the systemd service using systemctl --user enable container@dns1.service and systemctl --user start container@dns1.service.
  3. Check the service status using systemctl --user status container@dns1.service to verify it’s running correctly.
  4. Verify the container’s network settings to ensure it has a valid IP address, gateway, and DNS configuration.
  5. Check the host’s firewall and routing settings to ensure traffic to and from the container is allowed.
  6. Enable IP forwarding on the host if necessary.
  7. Examine the container’s configuration file for any errors or inconsistencies.
  8. Verify resource limits to ensure the container has sufficient CPU and memory.
  9. Enable lingering for the user running the container, if necessary.

By systematically addressing these potential issues, you can ensure that your LXC container runs independently of your SSH sessions, providing a stable and reliable environment for your applications.