Disclaimer: Following the advice of this post does not mean abandoning supervisors entirely and rolling your own solution every time.

Supervision trees in Elixir are a wonderful thing. Really. You are guaranteed to find out when a process crashes, and the supervisor has the foresight to restart it before you need to get involved. One of the truest bits of advice in Elixir:

Spawning a process without supervision means you don’t care about whether or not the process works. Why even write the code in the first place?

This is why you see start_link() calls everywhere. Most processes are responsible for other processes in some way. Supervisors abstract away common tasks like monitoring and restarts.

But this isn’t always a good thing.

In some cases, you may find yourself fighting against their behavior. If either of these applies to your situation, it may be wise to pick a different strategy:

  1. The parent process is a unified API of its children, routing and queuing messages
  2. A crashing child means the parent should clean up the other children and close

Now, both of these can be satisfied under a typical supervision structure, but there are some issues.

Supervisors Don’t Want Your Messages

It would be nice to have handle_cast and handle_call callbacks in your supervisor, but sadly they aren’t available. Messaging child processes involves either calling Supervisor.which_children/1 or registering everything in a Registry. Both involve a certain degree of boilerplate to make the client assemble and/or fetch a pid before actually making the call.
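
To make that boilerplate concrete, here is a rough sketch of the Supervisor.which_children/1 route. Demo.Worker and Demo.Client are hypothetical names invented for this illustration; the only assumption is a single worker running under a plain Supervisor with its default child id.

```elixir
defmodule Demo.Worker do
  # Hypothetical worker, used only for this illustration.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_call({:do_work, request}, _from, state) do
    {:reply, {:done, request}, state}
  end
end

defmodule Demo.Client do
  # The client has to dig the worker pid out of the supervisor on every call.
  def do_work(sup, request) do
    case find_child(sup, Demo.Worker) do
      {:ok, pid} -> GenServer.call(pid, {:do_work, request})
      :error -> {:error, :no_worker}
    end
  end

  defp find_child(sup, id) do
    sup
    |> Supervisor.which_children()
    |> Enum.find_value(:error, fn
      # The second element is :undefined or :restarting while the child is down.
      {^id, pid, _type, _modules} when is_pid(pid) -> {:ok, pid}
      _other -> nil
    end)
  end
end

{:ok, sup} = Supervisor.start_link([{Demo.Worker, :initial_state}], strategy: :one_for_one)
IO.inspect(Demo.Client.do_work(sup, :ping))
# => {:done, :ping}
```

None of this is hard, but every client call now pays for a lookup, and the caller has to decide what to do when the child is missing or mid-restart.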

Queuing requires another child process entirely, and that process must also be made aware of the siblings it routes messages to.

Supervisors Don’t Do Your Custom Cleanup

The usual recommendation is to have another external process monitor the children and run cleanup as necessary. While this certainly works, you now have yet another process in the mix keeping track of its siblings. If the parent also needs to terminate, this process has to explicitly spawn a task that calls Supervisor.stop/1, which can lead to interesting race conditions.

What’s the alternative?

Trap Exits to the Rescue

By calling Process.flag(:trap_exit, true) in your parent process, you can keep track of what’s going on with its linked children. Instead of a Supervisor, you can use a standard GenServer as the parent.

Normally, when a parent calls start_link() on a child process and the child later exits abnormally, the parent dies with it. When the parent traps exits, it instead receives an {:EXIT, pid, reason} message, and from there it can decide what to do.
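
To see the mechanism in isolation, here is a minimal sketch that uses a bare spawn_link/1 instead of a GenServer. The :boom reason and the one-second timeout are arbitrary choices for the demonstration.

```elixir
# Without this flag, the abnormal exit below would take this process down too.
Process.flag(:trap_exit, true)

# The linked child exits immediately with a non-normal reason.
child = spawn_link(fn -> exit(:boom) end)

result =
  receive do
    # The exit signal arrives as an ordinary message.
    {:EXIT, ^child, reason} -> {:child_exited, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(result)
# => {:child_exited, :boom}
```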

Here’s a really simple example:

defmodule Simple.Parent do
  use GenServer

  def start_link do
    # Register under the module name so do_work/1 can reach this process.
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # Turn exit signals from linked processes into {:EXIT, pid, reason} messages.
    Process.flag(:trap_exit, true)

    case Simple.Child.start_link() do
      {:ok, pid} ->
        {:ok, %{child: pid}}

      {:error, _reason} ->
        # The child could not start, so the parent refuses to start too.
        {:stop, {:error, :child_failed}}
    end
  end

  def do_work(request) do
    GenServer.call(__MODULE__, {:do_work, request})
  end

  @impl true
  def handle_call({:do_work, request}, _from, state) do
    # Forward the message, or perhaps do something fancier,
    # and hand the child's reply back to the caller.
    reply = GenServer.call(state.child, {:do_work, request})
    {:reply, reply, state}
  end

  @impl true
  def handle_info({:EXIT, _pid, _reason}, state) do
    # Child went down, we're done here.
    {:stop, :shutdown, state}
  end
end

This example handles both of the situations above, and can be easily extended to handle more complex functionality. You can pattern match specific values of reason in the :EXIT message to differentiate between normal process shutdowns and crashes.
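
As a rough sketch of that pattern matching, the module below treats :normal and :shutdown as clean exits and everything else as a crash. Demo.ExitAware is a hypothetical module written for this illustration, not part of Kadabra. The demo starts it with GenServer.start/2 (no link) so the calling process can watch it stop through a monitor, and the child sleeps briefly so the monitor is in place before the crash.

```elixir
defmodule Demo.ExitAware do
  # Hypothetical parent that spawns one linked child from a function.
  use GenServer

  @impl true
  def init(child_fun) do
    Process.flag(:trap_exit, true)
    {:ok, %{child: spawn_link(child_fun)}}
  end

  # Clean exit: the child finished its work, so stop quietly.
  @impl true
  def handle_info({:EXIT, pid, reason}, %{child: pid} = state)
      when reason in [:normal, :shutdown] do
    {:stop, :normal, state}
  end

  # Anything else is a crash. Wrapping the reason in {:shutdown, term}
  # keeps the original reason visible without logging an error report.
  def handle_info({:EXIT, pid, reason}, %{child: pid} = state) do
    {:stop, {:shutdown, {:child_crashed, reason}}, state}
  end
end

crashing_child = fn ->
  Process.sleep(50)
  exit(:boom)
end

{:ok, parent} = GenServer.start(Demo.ExitAware, crashing_child)
ref = Process.monitor(parent)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^parent, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(down_reason)
# => {:shutdown, {:child_crashed, :boom}}
```

This is also the place to hang any custom cleanup, such as stopping the remaining children before the parent itself stops.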

Want a more complicated example? Take a look at v0.4.3 of Kadabra, an HTTP/2 client for Elixir.

In it you’ll find:

  1. A client opens an HTTP/2 connection, which starts a ConnectionPool
  2. ConnectionPool starts a Connection instance
  3. Connection starts two instances of Hpack and one Socket
  4. Subsequent client HTTP/2 requests get routed to the Connection, which starts various instances of Stream that make the requests and clean up gracefully.
  5. Excess requests are held in a queue by ConnectionPool
  6. If Hpack, Socket or Stream instances crash, everything else is gracefully stopped, including Connection and its parent ConnectionPool

By structuring it this way, Kadabra was able to remove GenStage as a dependency for request queueing, remove the need for a process registry for child streams, and completely eliminate OTP crash reports during normal parent cleanup.

Kadabra is still a work in progress for general purpose HTTP/2 functionality. Interested in helping out? Contributions are welcome!