
Optimizing Wine on OS X

I've been doing some performance analysis of EVE running under Wine on OS X. My main test cases are a series of scenes run with the EVE Probe - our internal benchmarking tool. This is far more convenient than running the full EVE client, as it focuses purely on the graphics performance and does not require any user input.

Wine Staging

One thing I tried was to build Wine Staging. On its own, that did not really change anything. Turning on CSMT (command stream multithreading), on the other hand, made quite a difference, taking the average frame time down by 30% for the test scene I used. The boost was significant, but so were the rendering glitches, with parts of the scene flickering in and out. Too bad - it means I can't consider CSMT for EVE yet, but I will keep monitoring its progress.

OpenGL Profiler

Apple has the very useful OpenGL profiler available for download. I tried running one of the simpler scenes under the profiler to capture statistics on the OpenGL calls made.

One thing that stood out to me was the high number of glGetError calls. Disabling them turned out to be easy enough, by recompiling Wine with -DWINE_NO_DEBUG_MSGS=1 added to CFLAGS. Running that same scene again took the number of glGetError calls down from 7.4 million to 18 thousand. The overall impact on performance was not significant, though.
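To picture how a compile-time flag can make that many error queries vanish, here is a minimal sketch of the general pattern: a check macro that expands to nothing when the flag is defined. It is an illustration only, not Wine's actual macro; fake_get_error, CHECK_GL_CALL, fake_draw and count_error_queries are made-up names, and the assumption that the checks sit behind a debug macro is just that, an assumption.

```c
#include <stdio.h>

/* Hypothetical stand-in for glGetError(); counts how often it is called. */
static unsigned long error_queries = 0;

static int fake_get_error(void)
{
    ++error_queries;
    return 0; /* 0 plays the role of GL_NO_ERROR */
}

/* Simplified, hypothetical per-call check macro (not Wine's real one).
 * Building with -DWINE_NO_DEBUG_MSGS=1 turns it into a no-op. */
#ifdef WINE_NO_DEBUG_MSGS
#define CHECK_GL_CALL(what) do { } while (0)
#else
#define CHECK_GL_CALL(what) \
    do { \
        int err_ = fake_get_error(); \
        if (err_ != 0) \
            fprintf(stderr, "%s failed with error %d\n", (what), err_); \
    } while (0)
#endif

/* Hypothetical draw routine: one "GL call" followed by one check. */
static void fake_draw(void)
{
    /* ... a real OpenGL call would go here ... */
    CHECK_GL_CALL("fake_draw");
}

/* Issues `calls` fake draws and returns how many error queries they caused. */
static unsigned long count_error_queries(unsigned long calls)
{
    unsigned long i;

    error_queries = 0;
    for (i = 0; i < calls; ++i)
        fake_draw();
    return error_queries;
}
```

Compiled normally, every fake draw costs one error query. Compiled with -DWINE_NO_DEBUG_MSGS=1, the macro body is empty and the count stays at zero.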

The profiler can also capture a full trace of all OpenGL calls made by an application. Looking at the full trace can be interesting, if somewhat daunting. A minute-long capture of this fairly simple scene yields 5.8M function calls. I plan to do further tests later on, with ultra-simple scenes, to better understand how DirectX calls get translated to OpenGL calls.

Multithreaded OpenGL engine

Apple's OpenGL implementation has something called the multithreaded OpenGL engine. This needs to be enabled explicitly. Once enabled, the engine creates a worker thread and transfers some of its calculations to that thread. On a multicore system (as Macs are today), this allows internal OpenGL calculations performed on the CPU to run in parallel with Wine, improving performance.

It took me a little while to figure out where to put the code to enable this, and it didn't help that the CGLEnable function did not complain when I accidentally passed in an invalid context. Eventually I added the following lines in dlls/winemac.drv/opengl.c, in the function create_context:
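    /* Opt this context in to Apple's multithreaded OpenGL engine. */
    TRACE("Enabling the multithreaded OpenGL engine\n");
    err = CGLEnable(context->cglcontext, kCGLCEMPEngine);
    if (err != kCGLNoError)
        WARN("Enabling the multithreaded OpenGL engine failed: %d %s\n", err, CGLErrorString(err));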
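
Since CGLEnable stayed silent about an invalid context, one defensive option is to read the setting back with CGLIsEnabled after enabling it. The following is a standalone sketch of that idea, outside of Wine; enable_mp_engine and enable_mp_engine_for_current_context are hypothetical helper names, and whether the read-back catches every kind of invalid context is an untested assumption. The non-Apple stub is only there so the sketch compiles on other platforms.

```c
#include <stdio.h>

#ifdef __APPLE__
#include <OpenGL/OpenGL.h>

/* Hypothetical helper: enables the multithreaded engine on ctx and reads
 * the setting back. Returns 1 if the engine is reported as enabled. */
static int enable_mp_engine(CGLContextObj ctx)
{
    CGLError err;
    GLint enabled = 0;

    if (!ctx)
    {
        fprintf(stderr, "No CGL context to enable the engine on\n");
        return 0;
    }

    err = CGLEnable(ctx, kCGLCEMPEngine);
    if (err != kCGLNoError)
    {
        fprintf(stderr, "CGLEnable failed: %d %s\n", (int)err, CGLErrorString(err));
        return 0;
    }

    /* Read the flag back rather than trusting the return code alone. */
    err = CGLIsEnabled(ctx, kCGLCEMPEngine, &enabled);
    if (err != kCGLNoError)
    {
        fprintf(stderr, "CGLIsEnabled failed: %d %s\n", (int)err, CGLErrorString(err));
        return 0;
    }

    return enabled != 0;
}

/* Hypothetical wrapper: applies the helper to the calling thread's
 * current context, if there is one. */
static int enable_mp_engine_for_current_context(void)
{
    return enable_mp_engine(CGLGetCurrentContext());
}

#else

/* Stub so the sketch still compiles on platforms without CGL. */
static int enable_mp_engine_for_current_context(void)
{
    fprintf(stderr, "CGL is only available on OS X\n");
    return 0;
}

#endif
```

Called on a thread with no current OpenGL context, the wrapper returns 0 instead of silently doing nothing. On OS X the sketch needs to be linked against the OpenGL framework.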

Enabling the multithreaded engine did help with performance, although initial tests indicated the opposite. It wasn't until I tested with the glGetError calls disabled and the multithreaded engine enabled that I saw some positive results. The boost in performance was more significant for the heavier scenes, such as our deathcube of 1000 ships.

The future

I'm just starting to get to know the Wine code base and it's hard to say what optimization opportunities are in there. One thing is certain, though: I will spend a lot more time analyzing it and trying out different things, both in our codebase and in Wine.

