Why does my LXC container stop when I close the SSH session with the host?
Why Your LXC Container Stops When You Close an SSH Session with the Host: Troubleshooting and Solutions
Here at revWhiteShadow, we understand the frustration of having your LXC containers stop unexpectedly when you close your SSH session to the host machine. This is a common issue, and several factors can contribute to it. Let’s dive into the possible causes and, more importantly, how to resolve them, so your containers keep running independently of your SSH sessions.
Understanding the Problem: LXC Containers and SSH Sessions
The core issue stems from how processes are managed within a Linux environment. When you initiate an SSH session, a process is created to handle your connection. If the LXC container startup process is inadvertently linked to this SSH session’s lifespan, the container may be terminated when the session ends. This linkage can occur in various ways, often related to how the container is started or the user under which it runs.
Common Causes for LXC Container Shutdowns on SSH Session Closure
Let’s explore the most frequent reasons why your LXC container might be stopping when your SSH session closes:
1. User Session Dependency
The Issue:
If the container startup process is directly tied to your user session, closing the SSH session will signal the system to terminate all processes associated with that session, including the container. This is particularly relevant if you’re starting the container manually within the SSH session without properly detaching it.
The Solution:
The critical step here is to ensure the container is started independently of your user session. This is often achieved through systemd services, as you’ve already begun to implement. However, subtle configuration errors can still lead to dependency issues. We will examine the configurations that cause dependency issues.
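You can see this linkage concretely by comparing session IDs: a process launched from your shell inherits the shell’s session, and that shared session is exactly what takes a manually started container down with the SSH login. A minimal sketch (here `sleep` merely stands in for `lxc-start`):

```shell
# A child started from the current shell inherits our session ID (SID),
# so it is cleaned up together with the login session. "sleep" stands in
# here for a manually launched lxc-start.
sleep 60 &
child=$!
# Field 6 of /proc/<pid>/stat is the session ID.
parent_sid=$(awk '{print $6}' "/proc/$$/stat")
child_sid=$(awk '{print $6}' "/proc/$child/stat")
if [ "$parent_sid" = "$child_sid" ]; then
  echo "child shares session $child_sid and dies when the session ends"
fi
kill "$child"
```

A process started under its own session (for example, by systemd) would show a different SID and survive the logout.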
2. Systemd Service Configuration Errors
The Issue:
While using systemd is the recommended approach, incorrect settings within your service file (`~/.config/systemd/user/container.service` in your case) can prevent the container from running reliably in the background. Common pitfalls include missing or incorrect `Type`, `RemainAfterExit`, or `User` directives.
The Solution:
Let’s examine and refine your service file. Because you start instances like `container@dns1.service`, the unit must be a template named `container@.service`:

```ini
# ~/.config/systemd/user/container@.service
[Unit]
Description=LXC container %i
Wants=network.target
After=network.target

[Service]
# lxc-start detaches into the background by default, so model the unit
# as a oneshot that remains "active" after the start command exits
Type=oneshot
RemainAfterExit=yes
Delegate=yes
ExecStart=/usr/bin/lxc-start -n %i
ExecStop=/usr/bin/lxc-stop -n %i --kill

[Install]
WantedBy=default.target
```

Key Changes and Explanations:

- `After=network.target`: the network should be up before the container service starts. Note that a user unit cannot declare ordering against system units such as `lxc.service`, so that dependency has been dropped.
- No `User=` directive: `User=` is ignored in user services. A unit under `~/.config/systemd/user/` already runs as the user who owns that directory, independently of your SSH session; keeping that a non-root user is good security practice. (Only a system unit under `/etc/systemd/system/` would need an explicit `User=`.)
- `Type=oneshot` with `RemainAfterExit=yes`: because `lxc-start` daemonizes, the start command exits quickly; `RemainAfterExit=yes` keeps the unit marked active so that `ExecStop` still runs on `systemctl --user stop`.
- `ExecStart=/usr/bin/lxc-start -n %i` and `ExecStop=/usr/bin/lxc-stop -n %i --kill`: the `%i` placeholder expands to the instance name, so `systemctl --user start container@dns1.service` starts the container `dns1`. One template can therefore manage any number of containers. Note that `lxc-stop`, like `lxc-start`, needs the `-n` flag before the container name.
- Full paths: keep absolute paths to `lxc-start` and `lxc-stop`; you can confirm them with `which lxc-start` and `which lxc-stop`.
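If you would rather not depend on a user session at all, a system-wide template unit is a common alternative; it survives logouts without any lingering setup. The sketch below is illustrative only (the unit name `lxc-container@.service` and the `youruser` account are placeholders, and an unprivileged user’s environment may need extra setup when started from a system unit):

```
# Hypothetical /etc/systemd/system/lxc-container@.service
# (system-wide alternative; "youruser" is a placeholder account)
[Unit]
Description=LXC container %i
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
Delegate=yes
User=youruser
ExecStart=/usr/bin/lxc-start -n %i
ExecStop=/usr/bin/lxc-stop -n %i --kill

[Install]
WantedBy=multi-user.target
```

You would manage it with `sudo systemctl enable --now lxc-container@dns1.service`; because system units are independent of login sessions, no lingering is required.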
Enabling and Starting the Service:

- Save the updated `container@.service` file in `~/.config/systemd/user/` and reload the user manager: `systemctl --user daemon-reload`
- Enable the service: `systemctl --user enable container@dns1.service`
- Start the service: `systemctl --user start container@dns1.service`
- Check the service status: `systemctl --user status container@dns1.service`. Ensure the unit is active and there are no errors; `lxc-ls --fancy` confirms whether the container itself is running.
3. Network Configuration Issues
The Issue:
Even if the container is running, network connectivity problems can make it appear unresponsive from other machines. This can be caused by incorrect network settings within the container itself or issues with the host’s network configuration, such as firewall rules or routing problems.
The Solution:
Verify Container Network Settings:

- IP Address: Confirm that the container has a valid IP address within the expected network range (192.168.0.4 in your example). Inside the container, run `ip addr show`.
- Gateway: Ensure the container has a correct default route, typically via the host’s IP on the relevant bridge interface, so it can reach the outside network. Check with `ip route show`.
- DNS: Verify that the container can resolve domain names. Check `/etc/resolv.conf` inside the container for valid DNS server addresses.
Check Host Network Settings:

- Firewall: The host’s firewall might be blocking traffic to or from the container. Use `iptables -L` or `ufw status` (if using UFW) to inspect the rules, and ensure traffic to the container’s IP address (192.168.0.4) on the relevant ports (e.g., SSH port 22) is allowed.
- Routing: If the container is on a different subnet than the machines trying to access it, you may need to configure routing on the host to forward traffic to the container’s network.
- IP Forwarding: Ensure IP forwarding is enabled on the host, so it can forward traffic between network interfaces. Enable it temporarily with `sudo sysctl net.ipv4.ip_forward=1`; to make it permanent, add or uncomment `net.ipv4.ip_forward=1` in `/etc/sysctl.conf` and run `sudo sysctl -p`.
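The forwarding check can also be done without `sysctl` by reading the proc interface directly; a small sketch:

```shell
# Read the host's current forwarding state straight from /proc.
# Prints net.ipv4.ip_forward=1 when forwarding is enabled, =0 otherwise.
state=$(cat /proc/sys/net/ipv4/ip_forward)
echo "net.ipv4.ip_forward=$state"
```

This is handy in scripts that should refuse to start a routed container on a host that cannot forward its traffic.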
4. Terminal Multiplexers (Screen, tmux) as a Temporary Workaround (But Not a Solution)
The Issue:
While not a direct solution, terminal multiplexers like `screen` or `tmux` can temporarily prevent the container from stopping when you close your SSH session. This is because the container is started within the `screen` or `tmux` session, which remains active even after you disconnect.
Why It’s Not a Solution:
This is merely a workaround. It doesn’t address the underlying problem of the container being tied to your user session. It also introduces additional complexity and reliance on the screen
or tmux
session.
How to Use It (Temporarily):

- Start a `screen` or `tmux` session: `screen` (or `tmux new-session`)
- Start your container within the session: `lxc-start -n dns1`
- Detach from the session: in `screen`, press `Ctrl+A`, then `D`; in `tmux`, press `Ctrl+B`, then `D`.
The container will continue running within the detached session. You can later reattach to the session to monitor the container.
5. LXC Configuration Issues
The Issue:
Problems within the LXC container’s configuration file can also cause unexpected behavior. This includes incorrect network settings, resource limits, or other configuration errors.
The Solution:
Check the Container’s Configuration File:

- The main configuration file is typically located at `/var/lib/lxc/<container_name>/config`. Examine this file for any errors or inconsistencies.
- Pay close attention to network settings (IP address, gateway, DNS), resource limits (CPU, memory), and any custom configurations you’ve made.

Verify Resource Limits:

- Ensure that the container has sufficient resources (CPU, memory) allocated to it. If the container is starved of resources, it may become unstable or crash.
- You can adjust resource limits in the container’s configuration file or using LXC commands.
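For reference, a hypothetical `config` excerpt for a container like `dns1` might combine static networking with cgroup v2 resource limits. The addresses, bridge name (`lxcbr0`), and limits below are illustrative values, not taken from your setup:

```
# Hypothetical /var/lib/lxc/dns1/config excerpt (values are examples)
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.ipv4.address = 192.168.0.4/24
lxc.net.0.ipv4.gateway = 192.168.0.1

# cgroup v2 limits: cap memory at 512 MiB and CPU at roughly one core
lxc.cgroup2.memory.max = 512M
lxc.cgroup2.cpu.max = 100000 100000
```

A typo in any of these keys can leave the container unreachable or unstable even though it starts, so check them against your actual bridge and subnet.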
6. systemd User Lingering
The Issue:
In some cases, systemd user sessions might be automatically killed when the last login session for a user ends, even if there are services still running under that user.
The Solution:
Enable lingering for the user that runs the container. This will keep the user’s systemd instance running even after the user logs out.
Enable Lingering:

```
loginctl enable-linger <username>
```

Replace `<username>` with the user whose systemd user instance runs the container service.

Verify Lingering Status:

```
loginctl show-user <username> | grep Linger
```

Ensure that `Linger=yes` is displayed in the output.
Putting It All Together: A Comprehensive Approach
To ensure your LXC container runs reliably after you close your SSH session, follow these steps:
- Update your `container@.service` template with the corrected configuration, including the `%i` instance placeholders and full paths to `lxc-start` and `lxc-stop`.
- Enable and start the systemd service using `systemctl --user enable container@dns1.service` and `systemctl --user start container@dns1.service`.
- Check the service status using `systemctl --user status container@dns1.service` to verify it’s running correctly.
- Verify the container’s network settings to ensure it has a valid IP address, gateway, and DNS configuration.
- Check the host’s firewall and routing settings to ensure traffic to and from the container is allowed.
- Enable IP forwarding on the host if necessary.
- Examine the container’s configuration file for any errors or inconsistencies.
- Verify resource limits to ensure the container has sufficient CPU and memory.
- Enable lingering for the user running the container, if necessary.
By systematically addressing these potential issues, you can ensure that your LXC container runs independently of your SSH sessions, providing a stable and reliable environment for your applications.