Test First
Writing the first test
Excited to start writing tests for the cluster behavior, I wrote up the skeleton for my first test case:
@cluster_opts [cluster_size: 2, boot_timeout: 4_000]
scenario "when a node shuts down while having an active game session", @cluster_opts do
test "game session is restarted on another node", _ctx do
assert false
end
end
The code compiles and the test fails as expected.
Next, I check that I can interact with the nodes using the interface provided by ex_unit_clustered_case.
I pick a function in Minotaur.GameEngine
module to run on a node to see that I get an expected result.
With a clean application state, there shouldn’t be any game processes.
test "game session is restarted on another node", %{cluster: cluster} do
[node1, _node2] = Cluster.members(cluster)
res = Cluster.call(node1, Minotaur.GameEngine, :get_game, ["ABCD"])
assert {:error, :game_not_alive} == res
end
The test fails.
1) test when a node shuts down while having an active game session game session is restarted on another node (Minotaur.Cluster.GameSessionTest)
test/cluster/game_session_test.exs:9
Assertion with == failed
code: assert {:error, :game_not_alive} == res
left: {:error, :game_not_alive}
right: {:error, %ArgumentError{message: "unknown registry: Minotaur.GameEngine.SessionRegistry"}}
stacktrace:
test/cluster/game_session_test.exs:12: (test)
The result of calling :get_game
on the node is {:error, %ArgumentError{message: "unknown registry: Minotaur.GameEngine.SessionRegistry"}}
which means the registry is likely not started.
This makes sense since the nodes have been started, but not the Minotaur application on each node.
The README for ex_unit_clustered_case
shows exactly how to do this with a convenience function node_setup
which works like the standard setup
callback in ExUnit, but applies the callback to each node in the cluster.
I update the scenario
block just like the provided README example:
scenario "when a node shuts down while having an active game session", @cluster_opts do
node_setup [:start_apps]
test "game session is restarted on another node", %{cluster: cluster} do
[node1, _node2] = Cluster.members(cluster)
res = Cluster.call(node1, Minotaur.GameEngine, :get_game, ["ABCD"])
assert {:error, :game_not_alive} == res
end
end
defp start_apps(ctx) do
Application.ensure_all_started(:minotaur)
end
However, running the test runs into a compilation error:
== Compilation error in file test/cluster/game_session_test.exs ==
** (FunctionClauseError) no function clause matching in ExUnit.ClusteredCase.node_setup/1
(ex_unit_clustered_case 0.5.0) expanding macro: ExUnit.ClusteredCase.node_setup/1
test/cluster/game_session_test.exs:9: Minotaur.Cluster.GameSessionTest (module)
(ex_unit_clustered_case 0.5.0) expanding macro: ExUnit.ClusteredCase.node_setup/1
test/cluster/game_session_test.exs:9: Minotaur.Cluster.GameSessionTest (module)
(ex_unit 1.16.2) expanding macro: ExUnit.Case.describe/2
test/cluster/game_session_test.exs:8: Minotaur.Cluster.GameSessionTest (module)
(ex_unit_clustered_case 0.5.0) expanding macro: ExUnit.ClusteredCase.scenario/3
test/cluster/game_session_test.exs:8: Minotaur.Cluster.GameSessionTest (module
It doesn’t like the use of node_setup
which is a macro being imported within the using
macro for ExUnit.ClusteredCase
module.
I can’t find this particular issue online and the repo has been quiet for the past 2 years.
I poke around the source code and believe the issue is coming from a recursive call within the macro definition when matching for a callback as a List
:
defmacro node_setup(callbacks) when is_list(callbacks) do
quote bind_quoted: [callbacks: callbacks] do
for cb <- callbacks do
unless is_atom(cb) do
raise ArgumentError, "expected list of callbacks as atoms, but got: #{callbacks}"
end
node_setup(cb)
end
end
end
As much as my curiousity is pushing me to try to find a fix or reach out to the library maintainer, I’d be taking on another side quest that I don’t need right now.
Instead, I find the definition for node_setup
that was attempting to be called from the macro and just extract the underlying logic from the helper to use in my test module directly.
The solution is very simple:
scenario "when a node shuts down while having an active game session", @cluster_opts do
setup [:start_apps]
test "game session is restarted on another node", %{cluster: cluster} do
# test content remains the same
# ...
assert {:error, :game_not_alive} == res
end
end
defp start_apps(%{cluster: cluster}) do
Cluster.map(cluster, fn ->
Application.ensure_all_started(:minotaur)
end)
end
This time, the Minotaur application is started on each node and the call to :get_game
returns the expected :game_not_alive
error result.
I belive (or really hope) that I have everything I need to start writing my test case to assert the desired behavior before I implement anything in the application code.
Describing behavior through tests
I clear out the inner block of the test and replace it with some placeholder comments for the code I’ll need to complete the test.
I also add another comment after the setup
callback to tell me I’ll need to add another function that starts a game on node1.
scenario "when a node shuts down while having an active game session", @cluster_opts do
setup [:start_apps]
# Start game on node1
test "game session is restarted on another node", _ctx do
# Stop node1
# Assert game is alive on node2
end
end
Let’s tackle the setup first.
I add another callback reference to the setup
block with the most obvsious name start_game_on_node1
.
Then, I implement the function:
import Minotaur.GameEngineFixtures
defp start_game_on_node1(%{cluster: cluster}) do
[node1, _node2] = Cluster.members(cluster)
game = game_fixture()
Cluster.call(node1, fn ->
Minotaur.GameEngine.continue_game(game)
end)
[game: game]
end
game_fixture/1
is a helper function imported by Minotaur.GameEngineFixtures
to build a valid Game
structure for tests.
continue_game/1
is an existing function which attempts to start a new game session process with existing state and register it with Minotaur.GameEngine.SessionRegistry
.
game
is returned by the setup function to add it to the test context so the game id can be referenced in the tests.
I validate that the game session is live by writing a temporary check in the test case:
test "game session is restarted on another node", ctx do
[node1, _] = Cluster.members(ctx.cluster)
{:ok, game} = Cluster.call(node1, fn -> Minotaur.GameEngine.get_game(ctx.game.id) end)
assert false == game.id
# Stop node1
# Assert game is alive on node2
end
I like to use assert
to quickly validate things as I’m building out the test code.
I can see the value of game.id
is “XYZA” which is the default value from the game fixture.
Now that I can see the node is running as expected with the game session process, I want to go back and do a bit of cleanup with the small bit of test code I’ve written.
Let’s create a reference to the individual nodes within the test context instead of calling CLuster.members/1
whenever we need to access a single node.
defp start_apps(%{cluster: cluster}) do
Cluster.map(cluster, fn ->
Application.ensure_all_started(:minotaur)
end)
[node1, node2] = Cluster.members(cluster)
[node1: node1, node2: node2]
end
defp start_game_on_node1(%{node1: node1}) do
game = game_fixture()
Cluster.call(node1, fn ->
Minotaur.GameEngine.continue_game(game)
end)
[game: game]
end
test "game session is restarted on another node", ctx do
{:ok, game} = Cluster.call(ctx.node1, fn -> Minotaur.GameEngine.get_game(ctx.game.id) end)
assert false == game.id
end
Next, node1
needs to be stopped which can be accomplished with Cluster.stop_node/2
.
I can verify that it works as expected by checking the number of elements in Cluster.members/1
.
test "game session is restarted on another node", ctx do
Cluster.stop_node(ctx.cluster, ctx.node1)
res = Cluster.members(ctx.cluster)
assert res == false
# Assert game is alive on node2
end
Assertion with == failed
code: assert res == false
left: [:"[email protected]"]
right: false
The last placeholder comment to implement is to check that the game process is accessible from node2
.
The logic for this respawn behavior has not yet been implemented, but creating this failing test first will give me a way to validate when I’ve achieved the minimum desired behavior by running the test case.
I will also have confidence that I don’t have a false positive in my test because I’m making sure it is in a failing state when I expect it to not yet pass.
test "game session is restarted on another node", ctx do
Cluster.stop_node(ctx.cluster, ctx.node1)
res =
Cluster.call(ctx.node2, fn ->
Minotaur.GameEngine.get_game(ctx.game.id)
end)
assert {:ok, %{id: game_id}} = res
assert ctx.game_id == game_id
end
match (=) failed
code: assert {:ok, %{id: game_id}} = res
left: {:ok, %{id: game_id}}
right: {:error, :game_not_alive}
Next Steps
The Horde DynamicSupervisor
appears to be a drop in replacement for the native Elixir DynamicSupervisor
and will automatically restart supervised processes on another node if the host node is shutdown.
Unfortunately, just swapping the modules out was not enough to make things work.
The dynamic GenServer
processes for game sessions are currently started using start_child/2
of DynamicSupervisor
module without an :id
value which I believe will need to be set to use Horde.DynamicSupervisor
.
I’m glad to be back in the application code now that the tests are setup, but I’ll have to tackle this problem another day.