I’ve added Ecto to my Minotaur project and confirmed it works locally in development and test environments. I have a production PostgreSQL server running on the same VPS that runs Minotaur in Docker Swarm. Now I want to merge the latest code changes and verify the Repo module starts correctly in production.

Running containers outside of Swarm

I connect to my VPS, pull the latest changes from the main branch, and build a new Docker image tagged minotaur:ecto. I start a single container outside of the swarm service with docker run minotaur:ecto, which crashes with the following error:

Can't set long node name!
Please check your configuration

This likely has to do with how I have hard-coded the release environment to always expect to run within the Docker Swarm service. I run the container again, but override the entrypoint to start an interactive shell so I can poke around and validate my assumptions.

docker run -it --entrypoint /bin/bash minotaur:ecto

I suspect the issue lies with the RELEASE_NODE variable which is set at startup by releases/0.1.0/env.sh. This file contains the following code:

# Get the IP address for the Docker service overlay network
get_overlay_ip() {
  # Get all IP addresses
  IPS=$(hostname -I)

  for IP in $IPS; do
    # Check if the IP address is within the expected range (10.0.1.0/24)
    if echo $IP | grep -q "^10\.0\.1\."; then
      echo $IP
      return
    fi
  done
}

IP_ADDRESS=$(get_overlay_ip)

# Set the release to work across nodes.
# RELEASE_DISTRIBUTION must be "sname" (local), "name" (distributed) or "none".
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=minotaur@${IP_ADDRESS}

If an IP address that matches the expected Docker network is not found, get_overlay_ip will not output anything to standard output and RELEASE_NODE will have an invalid longname of minotaur@. In order to start one-off container instances outside of the swarm for troubleshooting or manual tasks, I’ll need to modify this release configuration.
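
Outside the swarm, a container started with a plain docker run only gets an address on Docker’s default bridge network (172.17.0.0/16 by default), so nothing matches the 10.0.1.0/24 check. From the interactive shell started earlier, hostname -I would show something like:

hostname -I
# => 172.17.0.2 (example output from the default bridge network)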

To quickly test that fixing the longname will solve the startup issue, I override the export statement by running the following within the running container:

echo "export [email protected]" >> releases/0.1.0/env.sh

I attempt to start the server within the container with ./bin/server which gives a different error:

ERROR! Config provider Config.Reader failed with:
** (RuntimeError) environment variable DATABASE_URL is missing.
For example: ecto://USER:PASS@HOST/DATABASE

Perfect! This error shows the startup is proceeding past the previous longname failure and hitting the expected error for Ecto, since I haven’t yet set a DATABASE_URL variable.

I update the release configuration in my project to have a valid fallback IP if the container is running outside the swarm:

# Get the IP address for the Docker service overlay network
get_overlay_ip() {
  # Get all IP addresses
  IPS=$(hostname -I)

  for IP in $IPS; do
    # Check if the IP address is within the expected range (10.0.1.0/24)
    if echo $IP | grep -q "^10\.0\.1\."; then
      echo $IP
      return
    fi
  done

  echo "127.0.0.1"
}

IP_ADDRESS=$(get_overlay_ip)

# Set the release to work across nodes.
# RELEASE_DISTRIBUTION must be "sname" (local), "name" (distributed) or "none".
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=<%= @release.name %>@${IP_ADDRESS}
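
With this fallback in place, a release started outside the swarm should effectively end up with:

export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=minotaur@127.0.0.1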

Connecting to PostgreSQL

I commit the changes and rebuild the image. I set a DATABASE_URL value in my local shell environment with the format ecto://USER:PASS@HOST/DATABASE. For USER, PASS, and DATABASE I use the values I previously configured for PostgreSQL, and I set HOST to host.docker.internal:5432, which is meant to resolve to the host machine from within Docker containers.
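
As a sketch, with placeholder credentials, the value stored on the host looks something like this (MINOTAUR_DATABASE_URL is the host-side variable passed into the container as DATABASE_URL below):

export MINOTAUR_DATABASE_URL="ecto://minotaur_user:secret@host.docker.internal:5432/minotaur_prod"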

Next, I start the container and pass in the required environment variables for the application.

docker run -e SECRET_KEY_BASE=$MINOTAUR_SECRET_KEY_BASE -e DATABASE_URL=$MINOTAUR_DATABASE_URL minotaur:ecto

The container crashes with an unhelpful error in the crash dump:

Slogan: Runtime terminating during boot (terminating)

The only thing that changed before getting this error was setting DATABASE_URL. I try again, but replace the password segment of the URL with the literal text password to see what happens when the credentials are incorrect. This time the application boots successfully, and the Postgrex module logs errors for failed connections, which is what I would expect with a wrong password. However, the logged errors point to the hostname not resolving rather than a failed authentication.

[error] Postgrex.Protocol (#PID<0.1953.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (host.docker.internal:5432): non-existing domain - :nxdomain

Using the dummy password allowed the application to boot, which means there is likely an encoding problem with special characters in the real password. The other issue is that Docker’s DNS is not resolving the host.docker.internal hostname.

The Docker docs show I need to pass --add-host=host.docker.internal:host-gateway to the run command to map host.docker.internal to the host’s local IP.
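
Combined with the earlier environment variables, the run command looks something like this:

docker run --add-host=host.docker.internal:host-gateway \
  -e SECRET_KEY_BASE=$MINOTAUR_SECRET_KEY_BASE \
  -e DATABASE_URL=$MINOTAUR_DATABASE_URL \
  minotaur:ecto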

Running the container with the --add-host parameter allows the hostname to resolve, but the connection times out since I’m still using the dummy password.

[error] Postgrex.Protocol (#PID<0.1951.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (host.docker.internal:5432): timeout

Looking at my production password, I see it contains a # character, which is going to cause problems if not encoded. I take the quick path around the problem by updating the password in PostgreSQL to remove that character and updating DATABASE_URL to match. Unfortunately, I’m still seeing a timeout when trying to connect, so I’ll have to continue this investigation.
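
For reference, the alternative to changing the password would be percent-encoding the special character in the URL; # encodes as %23, so a hypothetical password like pa#ss would be written as:

export MINOTAUR_DATABASE_URL="ecto://minotaur_user:pa%23ss@host.docker.internal:5432/minotaur_prod"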