Waiting for an answer

I want to describe my first iteration of exsim, the core server for the large-scale simulation I described in my last blog post.
The Listener module opens a socket and listens for incoming connections. When a connection comes in, a process is spawned to handle the login, and the listener goes back to accepting new connections.
Once the login completes, a Player is created and a Solarsystem is started (if it isn't running already). The solar system in turn starts a PhysicsProxy, and the player starts a Ship. These are all GenServer processes.
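The accept loop might look roughly like this - a minimal sketch of the pattern, not the actual exsim code (module and function names here are illustrative):

```elixir
defmodule Listener do
  def start(port) do
    {:ok, listen_socket} =
      :gen_tcp.listen(port, [:binary, packet: :line, active: false, reuseaddr: true])
    accept_loop(listen_socket)
  end

  defp accept_loop(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)
    # Hand the connection off to a new process for the login handshake,
    # then go straight back to accepting.
    {:ok, pid} = Task.start(fn -> handle_login(socket) end)
    :ok = :gen_tcp.controlling_process(socket, pid)
    accept_loop(listen_socket)
  end

  defp handle_login(socket) do
    {:ok, _line} = :gen_tcp.recv(socket, 0)
    # ...validate the login, then start the Player and hand it the socket...
  end
end
```

The important bit is `:gen_tcp.controlling_process/2`, which transfers socket ownership to the spawned process so the listener can forget about the connection entirely.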
The source for this is up on GitHub:


The player takes ownership of the TCP connection and handles communication with the game client (or bot). Incoming messages are parsed in handle_info/2 and handled by the player or routed to the ship, as appropriate.
The player creates the ship in its init/1 function.
The state for the player holds the ship and the name of the player.
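In outline, the player looks something like this (a sketch; the message format and command names are assumptions, not the real protocol):

```elixir
defmodule Player do
  use GenServer

  def init({name, socket}) do
    # The player creates its ship on startup and keeps both in its state.
    {:ok, ship} = Ship.start_link(name)
    {:ok, %{name: name, ship: ship, socket: socket}}
  end

  # With the socket in active mode, incoming TCP data arrives as messages.
  def handle_info({:tcp, _socket, data}, state) do
    case Poison.decode(data) do
      {:ok, %{"ship_command" => _} = msg} ->
        Ship.handle_command(state.ship, msg)  # ship commands are routed onward
      {:ok, msg} ->
        handle_player_command(msg, state)     # everything else is handled here
      {:error, _} ->
        :ok                                   # ignore malformed packets
    end
    {:noreply, state}
  end
end
```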


The ship holds the state of the ship - its position, velocity, list of ships in range, etc. It also accepts commands from the player and queues them up for sending to the physics simulation.
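One way the command queue might be shaped: commands accumulate in the ship's state and are flushed to the physics proxy on the next update tick (a sketch, with assumed field and function names):

```elixir
# Player commands are queued, not executed immediately.
def handle_cast({:command, cmd}, state) do
  {:noreply, %{state | pending_commands: [cmd | state.pending_commands]}}
end

# On each tick, the queued commands are sent to the physics simulation
# and the solar system is told this ship is done.
def handle_cast({:update, physics_proxy}, state) do
  state.pending_commands
  |> Enum.reverse()  # restore submission order
  |> Enum.each(&PhysicsProxy.send_command(physics_proxy, &1))
  Solarsystem.notify_ship_updated(state.solarsystem, self())
  {:noreply, %{state | pending_commands: []}}
end
```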


The physics proxy manages the connection to the physics simulation, which is run in a separate OS process. The connection is a TCP socket, and the communication is done with JSON packets.
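Assuming one JSON object per line over the socket, the proxy reduces to two small handlers (again a sketch, not the actual module):

```elixir
# Outgoing: encode a command map and write it to the physics process.
def handle_cast({:send_command, cmd}, state) do
  {:ok, json} = Poison.encode(cmd)
  :ok = :gen_tcp.send(state.socket, json <> "\n")
  {:noreply, state}
end

# Incoming: replies from the physics process arrive as TCP messages,
# get decoded, and are forwarded to the solar system.
def handle_info({:tcp, _socket, line}, state) do
  {:ok, msg} = Poison.decode(line)
  Solarsystem.handle_physics_state(state.solarsystem, msg)
  {:noreply, state}
end
```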


The solar system holds a list of ships present in the system, plus the link to the physics proxy.
It manages the ticking of the simulation for the system, which goes something like this:
  1. Save current list of ships as pending ships
  2. Call update on each ship
    1. Ship sends physics commands, and notifies system when done
    2. System removes ship from pending list once notification is received
  3. Once all ships are updated, the solar system updates the physics simulation
    1. Sends a stepsimulation command
    2. Sends a getstate command
  4. When the physics proxy receives the state from the physics simulation, it sends it to the solar system
  5. The solar system distributes the state:
    1. Sets the state for each ship (position, list of ships in range)
    2. Tells each ship to send the state to its client
      1. Ship gathers state from each ship within range, accumulating into a list
      2. Ship encodes the state to JSON and sends to client
      3. Ship notifies solar system that state has been delivered
  6. Once all ships have delivered their state, the next tick is scheduled
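The pending-set bookkeeping in steps 1-3 can be sketched like this (illustrative; the real module tracks more state than shown):

```elixir
# Step 1-2: snapshot the ship list as pending, then tell every ship to update.
def handle_cast(:start_tick, state) do
  pending = MapSet.new(state.ships)
  Enum.each(state.ships, &Ship.update(&1))
  {:noreply, %{state | pending: pending}}
end

# Each ship reports back when its physics commands have been sent.
def handle_cast({:ship_updated, ship}, state) do
  pending = MapSet.delete(state.pending, ship)
  if MapSet.size(pending) == 0 do
    # Step 3: all ships are done; advance the simulation and ask for the result.
    PhysicsProxy.send_command(state.physics, %{"command" => "stepsimulation"})
    PhysicsProxy.send_command(state.physics, %{"command" => "getstate"})
  end
  {:noreply, %{state | pending: pending}}
end
```

Step 6 works the same way with a second pending set, scheduling the next tick once the last delivery notification comes in.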
If I leave out the step of gathering state from each ship within range, this seems to work just fine. It is disappointing to see how slow the encoding and decoding of JSON is. I was hoping to get to a decent number of bots with this simplistic approach, but with only a few hundred bots running I'm already spending over a second per tick, most of it on JSON.
That's fine - I never expected to scale up with a fat text-based protocol; it was convenient for getting started. Being able to connect to the server (or directly to the physics server) with Telnet, give it commands and read the output was very useful in the very first steps. I've started looking into other options: either rolling my own binary protocol or using FlatBuffers.
What's worse, I'm running into deadlocks with this setup if I let each ship store its own state.
Here's the code for gathering the state:
  def handle_cast({:send_solarsystem_state, solarsystem_state}, state) do
    me = %{"owner" => state[:owner], "type" => state[:typeid], "position" => state[:pos]}
    # Fold over the ships currently in range, starting with this ship's own entry
    ships = List.foldl(state[:ships_in_range], [me],
      fn (other, acc) ->
        Logger.debug "Finding pid for #{other}"
        other_ship = GenServer.whereis({:global, "ship_#{other}"})
        other_desc = %{
          "owner" => other,
          "type" => Ship.get_typeid(other_ship),
          "position" => Ship.get_position(other_ship)
        }
        [other_desc | acc]
      end)
    {:ok, json} = Poison.encode(%{"state" => %{"ships" => ships}})
    :gen_tcp.send(state[:socket], json)
    Solarsystem.notify_ship_state_delivered(state[:solarsystem], self())
    {:noreply, state}
  end
Each ship is its own GenServer process, and the solar system casts this message to all ships, so they all run this function concurrently. This works most of the time, but eventually I get an error like this:
23:24:42.472 [error] GenServer "ship_8" terminating
** (stop) exited in: GenServer.call(#PID<0.173.0>, {:get_typeid}, 5000)
    ** (EXIT) time out
    (elixir) lib/gen_server.ex:774: GenServer.call/3
    (solarsystem) lib/ship.ex:140: anonymous fn/2 in Ship.handle_cast/2
    (elixir) lib/list.ex:186: List."-foldl/3-lists^foldl/2-0-"/3
    (solarsystem) lib/ship.ex:132: Ship.handle_cast/2
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
The problem is that get_typeid/1 and similar functions need a reply from the GenServer for the ship, but that ship may also be calling another ship requesting information, and sooner or later I run into a deadlock, where ship A is waiting for a response from ship B, which is waiting for a response from ship C, which is waiting for a response from ship A.

Dumbing it down

The solution, or at least a solution, is probably to stop storing state in the Ship process. The state comes from the solar system anyway, so there may be no need to break it up and have each ship store its own piece of the information. If I keep all the state in the solar system and pass it down to the ships, each ship may as well gather the relevant bits to send to its client from the original big blob of state. Then this function in the Ship doesn't need to call other ships synchronously, and I should be free from deadlocks. I guess I'm still thinking too much along the lines of object-oriented programming.
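With the full state blob passed down, the same handler can filter locally and never make a synchronous call to another ship. A sketch of how that could look (the field names inside `solarsystem_state` are assumptions):

```elixir
def handle_cast({:send_solarsystem_state, solarsystem_state}, state) do
  in_range = MapSet.new(state[:ships_in_range])
  # Pick out this ship and its in-range neighbours from the blob -
  # no GenServer.call to any other ship, so no chance of a deadlock cycle.
  ships =
    solarsystem_state["ships"]
    |> Enum.filter(fn ship ->
      ship["owner"] == state[:owner] or MapSet.member?(in_range, ship["owner"])
    end)
  {:ok, json} = Poison.encode(%{"state" => %{"ships" => ships}})
  :gen_tcp.send(state[:socket], json)
  Solarsystem.notify_ship_state_delivered(state[:solarsystem], self())
  {:noreply, state}
end
```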

I must be missing something

I'm a little surprised at how easy it was to paint myself into a corner with Elixir. Erlang and Elixir make it very easy to do certain things efficiently, making good use of concurrency to keep things going with good performance.
I need to understand better how to use GenServers, where to store state and how to prevent deadlocks. The inherent problems of concurrency don't just disappear, even though the programming language provides mechanisms and conventions to deal with them.

