Working with Signals
Side Quest: Docker signals not handled
When a game session reaches a game over condition, I need to shutdown game session processes so they aren’t restarted by the Dynamic Supervisor that spawned them. I experiment with the restart strategies for child processes to see how the Docker termination signal is handled during a rolling deploy. I notice that the exit signals are not being properly handled in the game session processes despite explicitly trapping exits and defining the termination callback.
I spend a few hours troubleshooting by creating a new minimal reproduction of an Elixir application with a Dynamic Supervisor that spawns a GenServer with exit trapping.
I finally find the culprit is the CMD statement syntax in the Dockerfile.
The Dockerfile for Minotaur uses the following syntax:
CMD /app/bin/migrate && /app/bin/server
This command uses the “shell form” which is run by the container’s shell and does not forward process signals to child processes. The “exec form” runs the command directly as the main process and the Docker signals are handled in the Elixir application.
I update the command to CMD ["/bin/bash", "-c", "/app/bin/migrate && /app/bin/server" ] and the GenServer processes are trapping the SIGTERM exit codes during rolling deploys.
Stopping GenServer processes
Game session processes are GenServers which are supervised by a distributed DynamicSupervisor provided by the Horde library.
When I deploy a new version of the application, a new node is added to the cluster and the old node is signaled to terminate which allows game session processes to stash their state and respawn on the new node with minimal downtime.
This behavior works only when I configure the restart policy of the child GenServer processes to :permanent which means they will be restarted by their supervisor if they are terminated for any reason.
I need to make a change that will allow the game session processes to be shut down without triggering the DynamicSupervisor to start them again when the game reaches a game over state.
The usual configuration for a GenServer process that has a specified end of life is to use the :transient restart option which will only have its supervisor restart the process when it shuts down for reasons other than :normal, :shutdown, or {:shutdown, term}.
If I switch the game session processes to this option, they will lose the benefit of automatically restarting during a rolling deploy or restart.
This is because the SIGTERM sent by Docker is treated as a standard :shutdown reason.
It seems these two restart options are at odds with each other in terms of the two use cases I want to implement.
One option might be to intercept the shutdown signals at the supervisor level and add custom logic to support the restart behavior I want.
A simpler alternative could be to leave the restart option as :permanent and have the process call DynamicSupervisor.terminate_child/2 for itself when the game session is in a game over state.
I will pursue this second option in my next development session for this project.