You Don't Always Need a Supervisor
Supervision trees in Elixir are a wonderful thing. Really. You get guaranteed information about when your processes crash, with the foresight to go ahead and restart the process before you need to get involved. One of the truest bits of advice in Elixir:
Spawning a process without supervision means you don't care about whether or not the process works. Why even write the code in the first place?
This is why you see start_link() calls everywhere. Most processes are responsible for other processes in some way, and supervisors abstract away common tasks like monitoring and restarts.
But this isn't always a good thing.
In some cases, you may find yourself fighting against their behavior. If either of these applies to your situation, it may be wise to pick a different strategy:
- The parent process is a unified API of its children, routing and queuing messages
- A crashing child means the parent should clean up the other children and close
Now, both of these can be satisfied under a typical supervision structure, but there are some issues.
Supervisors Don't Want Your Messages
It would be nice to have handle_cast and handle_call callbacks in your supervisor, but sadly they aren't available. Messaging child processes involves either calling Supervisor.which_children/1 or registering everything in a Registry. Both involve a certain degree of boilerplate, because the client has to assemble and/or fetch a pid before actually making the call.
Queuing requires yet another child process, and that process must also be made aware of the siblings it routes messages to.
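For a sense of that boilerplate, here is a minimal sketch of the which_children/1 route. The modules Lookup.Worker and Lookup.Sup, the child ids :worker_a and :worker_b, the :ping message, and the call_child/3 helper are all hypothetical; the point is that the client has to scan the supervisor's children for a pid before it can make the real call.

```elixir
defmodule Lookup.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, {:pong, state}, state}
end

defmodule Lookup.Sup do
  use Supervisor

  def start_link(_opts), do: Supervisor.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    children = [
      Supervisor.child_spec({Lookup.Worker, :a}, id: :worker_a),
      Supervisor.child_spec({Lookup.Worker, :b}, id: :worker_b)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  # Hypothetical client helper: look the child up by id, then call it.
  def call_child(sup, id, msg) do
    found =
      Enum.find(Supervisor.which_children(sup), fn {child_id, _, _, _} -> child_id == id end)

    case found do
      # The pid slot can also be :restarting or :undefined, so check it.
      {^id, pid, _type, _modules} when is_pid(pid) -> GenServer.call(pid, msg)
      _ -> {:error, :not_running}
    end
  end
end
```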
Supervisors Don't Do Your Custom Cleanup
It's recommended to have another external process monitoring the children and executing cleanup as necessary. While this certainly works, you now have yet another process in the mix keeping track of the other siblings. If the parent also needs to terminate, this process will have to explicitly spawn a task calling Supervisor.stop/1, which can lead to interesting race conditions.
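A rough sketch of that external-cleanup arrangement might look like the following. Everything here is hypothetical (Cleanup.Worker, Cleanup.Watcher, Cleanup.Sup). It assumes the watcher is started as the last child, discovers its siblings through Supervisor.which_children/1 once the supervisor has finished starting, and calls Supervisor.stop/1 from an unlinked Task, because the supervisor terminates all of its children, the watcher included, while it stops.

```elixir
defmodule Cleanup.Worker do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}
end

defmodule Cleanup.Watcher do
  use GenServer

  # Hypothetical sibling process that watches the other children of `sup`.
  def start_link(sup), do: GenServer.start_link(__MODULE__, sup)

  @impl true
  def init(sup) do
    # The supervisor is still busy starting children during init/1,
    # so ask it for the sibling list only after init/1 has returned.
    {:ok, %{sup: sup, stopping: false}, {:continue, :monitor_siblings}}
  end

  @impl true
  def handle_continue(:monitor_siblings, state) do
    for {_id, pid, _type, _mods} <- Supervisor.which_children(state.sup),
        is_pid(pid) and pid != self() do
      Process.monitor(pid)
    end

    {:noreply, state}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, %{stopping: false} = state) do
    # Custom cleanup would go here. The stop runs in a separate, unlinked
    # task because the supervisor will terminate this watcher as it stops.
    Task.start(fn -> Supervisor.stop(state.sup) end)
    {:noreply, %{state | stopping: true}}
  end

  # A stop is already in progress; ignore further DOWN messages.
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state), do: {:noreply, state}
end

defmodule Cleanup.Sup do
  use Supervisor

  def start_link(_opts), do: Supervisor.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    children = [
      Supervisor.child_spec({Cleanup.Worker, :a}, id: :a),
      Supervisor.child_spec({Cleanup.Worker, :b}, id: :b),
      # self() is the supervisor's own pid while init/1 runs.
      {Cleanup.Watcher, self()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

Note how much coordination this takes: the watcher must know the supervisor's pid, wait for its siblings to exist, and guard against stopping twice.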
What's the alternative?
Trap Exits to the Rescue
By calling Process.flag(:trap_exit, true) in your parent process, you can keep track of what's going on with the children. Instead of a Supervisor, you can use a standard GenServer as the parent.
Normally, when a parent calls start_link() on a child process and the child exits abnormally, the parent dies with it. By trapping exits, the parent instead receives an {:EXIT, pid, reason} message, and from there it can decide what to do.
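To see that difference in isolation, here is a small sketch with bare processes rather than GenServers. TrapDemo and run/1 are hypothetical names; the helper runs the would-be parent in its own monitored process so that the non-trapping case can crash without taking the caller down.

```elixir
defmodule TrapDemo do
  # Starts a "parent" that links to a crashing child, and reports whether
  # the parent received an :EXIT message or died along with the child.
  def run(trap?) do
    caller = self()

    {pid, ref} =
      spawn_monitor(fn ->
        Process.flag(:trap_exit, trap?)
        spawn_link(fn -> exit(:boom) end)

        # Only reached if the parent survives the child's crash.
        receive do
          {:EXIT, _child, reason} -> send(caller, {:parent_saw, reason})
        end
      end)

    receive do
      {:parent_saw, reason} ->
        # Drain the parent's own DOWN message so nothing is left behind.
        receive do
          {:DOWN, ^ref, :process, ^pid, _} -> :ok
        end

        {:trapped, reason}

      {:DOWN, ^ref, :process, ^pid, reason} ->
        {:parent_died, reason}
    end
  end
end
```

TrapDemo.run(true) returns {:trapped, :boom}, while TrapDemo.run(false) returns {:parent_died, :boom}.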
Here's a really simple example:
defmodule Simple.Parent do
  use GenServer

  def start_link do
    # Register under the module name so do_work/1 can find this process.
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # Receive {:EXIT, pid, reason} messages instead of dying with the child.
    Process.flag(:trap_exit, true)

    case Simple.Child.start_link() do
      {:ok, pid} ->
        {:ok, %{child: pid}}

      {:error, _reason} ->
        # Something bad: the child couldn't start, so neither can the parent.
        {:stop, {:error, :child_failed}}
    end
  end

  def do_work(request) do
    GenServer.call(__MODULE__, {:do_work, request})
  end

  @impl true
  def handle_call({:do_work, request}, _from, state) do
    # Forward the message and pass the child's reply back,
    # or perhaps do something fancier.
    reply = GenServer.call(state.child, {:do_work, request})
    {:reply, reply, state}
  end

  @impl true
  def handle_info({:EXIT, _pid, _reason}, state) do
    # Child went down, we're done here.
    {:stop, :shutdown, state}
  end
end
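The example leaves Simple.Child undefined. A minimal stand-in, assuming the child only needs a start_link/0 and a handle_call/3 clause for the {:do_work, request} message the parent forwards, could be:

```elixir
defmodule Simple.Child do
  use GenServer

  # Hypothetical child: no real state, it just answers work requests.
  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:do_work, request}, _from, state) do
    # Placeholder reply; real work would happen here.
    {:reply, {:done, request}, state}
  end
end
```

The {:done, request} reply shape is an arbitrary placeholder.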
This example handles both of the situations above, and can be easily extended to handle more complex functionality. You can pattern match specific values of reason in the :EXIT message to differentiate between normal process shutdowns and crashes.
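As a sketch of that reason matching, assume a parent that runs several jobs as linked processes (Multi.Parent, child_count/1, and passing jobs as zero-arity functions are all hypothetical). A :normal exit simply drops the child from the state, while any other reason stops the parent:

```elixir
defmodule Multi.Parent do
  use GenServer

  # `jobs` is a list of zero-arity functions, each run in its own linked process.
  def start_link(jobs), do: GenServer.start_link(__MODULE__, jobs)

  def child_count(parent), do: GenServer.call(parent, :child_count)

  @impl true
  def init(jobs) do
    Process.flag(:trap_exit, true)
    pids = for job <- jobs, into: MapSet.new(), do: spawn_link(job)
    {:ok, pids}
  end

  @impl true
  def handle_call(:child_count, _from, pids), do: {:reply, MapSet.size(pids), pids}

  # A child finished its work: forget about it and keep running.
  @impl true
  def handle_info({:EXIT, pid, :normal}, pids) do
    {:noreply, MapSet.delete(pids, pid)}
  end

  # Any other reason is a crash: stop the parent. The remaining children
  # are linked and not trapping exits, so they go down with it.
  def handle_info({:EXIT, _pid, reason}, pids) do
    {:stop, {:shutdown, {:child_crashed, reason}}, pids}
  end
end
```

Stopping with a {:shutdown, term} reason, rather than a bare crash reason, also keeps OTP from logging an error report for the parent.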
Want a more complicated example? Take a look at v0.4.3 of Kadabra, an HTTP/2 client for Elixir.
In it you'll find:
- A client opens an HTTP/2 connection, which starts a ConnectionPool
- ConnectionPool starts a Connection instance
- Connection starts two instances of Hpack and one Socket
- Subsequent client HTTP/2 requests get routed to the Connection, which starts various instances of Stream that make the requests and clean up gracefully
- Excess requests are held in queue by ConnectionPool
- If Hpack, Socket or Stream instances crash, everything else is gracefully stopped, including Connection and its parent ConnectionPool
By structuring it this way, Kadabra was able to remove GenStage as a dependency for request queueing, remove the need for a process registry for child streams, and completely eliminate OTP crash reports during normal parent cleanup.
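The queueing idea can be sketched on its own. This is not Kadabra's code: Pool.Parent, request/2, the max concurrency parameter, and the use of zero-arity functions as stand-ins for HTTP/2 requests are all assumptions for illustration. At most max requests run at once in linked worker processes, and excess callers wait in an Erlang :queue until a worker exits normally.

```elixir
defmodule Pool.Parent do
  use GenServer

  def start_link(max), do: GenServer.start_link(__MODULE__, max)

  # `fun` is a zero-arity function standing in for a real request.
  def request(pool, fun), do: GenServer.call(pool, {:request, fun})

  @impl true
  def init(max) do
    Process.flag(:trap_exit, true)
    {:ok, %{max: max, running: %{}, queue: :queue.new()}}
  end

  @impl true
  def handle_call({:request, fun}, from, state) do
    if map_size(state.running) < state.max do
      {:noreply, run(state, fun, from)}
    else
      # No free slot: hold the request (and its caller) in the queue.
      {:noreply, %{state | queue: :queue.in({fun, from}, state.queue)}}
    end
  end

  # A worker finished normally: free its slot and start the next queued request.
  @impl true
  def handle_info({:EXIT, pid, :normal}, state) do
    state = %{state | running: Map.delete(state.running, pid)}

    case :queue.out(state.queue) do
      {{:value, {fun, from}}, rest} -> {:noreply, run(%{state | queue: rest}, fun, from)}
      {:empty, _} -> {:noreply, state}
    end
  end

  # A worker crashed: stop the pool, as in the examples above.
  def handle_info({:EXIT, _pid, reason}, state) do
    {:stop, {:shutdown, reason}, state}
  end

  # Each worker replies to its caller directly, then exits normally.
  defp run(state, fun, from) do
    pid = spawn_link(fn -> GenServer.reply(from, fun.()) end)
    %{state | running: Map.put(state.running, pid, from)}
  end
end
```

Because workers answer callers with GenServer.reply/2, the parent never blocks on a request; it only tracks slots and the queue.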
Kadabra is still a work in progress for general purpose HTTP/2 functionality. Interested in helping out? Contributions are welcome!