gfx-rs is a Rust project aiming to make graphics programming more accessible and portable, focusing on exposing a universal Vulkan-like API. It’s a single Rust API with multiple backends that implement it: Direct3D 12/11, Metal, Vulkan, and even OpenGL. We are also building a Vulkan Portability implementation based on it, which allows non-Rust applications using Vulkan to run everywhere. This post is focused on the Metal backend only.
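To illustrate the “single API, multiple backends” design, here is a minimal sketch in Rust. Application code is written once against a trait, and each backend maps the call onto its native API. The `Backend` trait, the backend types, and the strings they produce are hypothetical simplifications, not gfx-hal’s real interface.

```rust
/// Hypothetical, heavily simplified portable API: application code targets this
/// trait once, and each backend translates calls to its native API.
trait Backend {
    /// Name of the native API this backend targets.
    fn native_api(&self) -> &'static str;
    /// Translate a portable draw call into a backend-specific description.
    fn draw(&self, vertex_count: u32) -> String;
}

struct MetalBackend;
struct VulkanBackend;

impl Backend for MetalBackend {
    fn native_api(&self) -> &'static str {
        "Metal"
    }
    fn draw(&self, vertex_count: u32) -> String {
        format!("{}: drawPrimitives(vertexCount: {})", self.native_api(), vertex_count)
    }
}

impl Backend for VulkanBackend {
    fn native_api(&self) -> &'static str {
        "Vulkan"
    }
    fn draw(&self, vertex_count: u32) -> String {
        format!("{}: vkCmdDraw(vertexCount: {})", self.native_api(), vertex_count)
    }
}

/// Portable application code: generic over the backend, unaware of the native API.
fn render_triangle<B: Backend>(backend: &B) -> String {
    backend.draw(3)
}

fn main() {
    println!("{}", render_triangle(&MetalBackend));
    println!("{}", render_triangle(&VulkanBackend));
}
```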
Previously, we benchmarked Dota2 and were able to run many other applications and engines successfully, including Dolphin Emulator. For Dolphin, our earlier focus was visual correctness. Once games appeared to render correctly, we shifted our focus to performance to make sure they also render quickly.
@MayImilae proposed a simple benchmark scenario: run Metroid Prime 2 (US), load into Sanctuary Fortress, wait for the animation to finish, and then record 20 seconds of frame times without providing any input to the game. We made sure the game window was on screen and in focus while it was being benchmarked.
The Dolphin settings used for the benchmark were:
- Store EFB Copies to Texture Only must be enabled
- Speed Limit: Unlimited
- 4x native internal resolution
- Vsync: Off
As with Dota2, gfx’s Metal backend was tested in two modes: Immediate command recording and Deferred command recording. The mode was selected with the `GFX_METAL_RECORDING` environment variable, and gfx-portability itself was selected by pointing the `LIBVULKAN_PATH` environment variable to it. The library was built from tag 0.5 using a simple `make version-release` command. We also experimented with Dolphin’s “Backend multi-threading” option (“MT” for short), because we had doubts whether it is the right approach when used with a normal Vulkan driver.
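A minimal sketch of how a backend could pick its recording mode from an environment variable such as `GFX_METAL_RECORDING`. The enum, the parser, the default, and the accepted spellings (`immediate`, `deferred`) are assumptions for illustration, not gfx-portability’s actual code or documented values.

```rust
use std::env;

/// The two command recording modes compared in the benchmark.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecordingMode {
    Immediate,
    Deferred,
}

/// Hypothetical parser: the accepted spellings are illustrative only.
fn parse_recording_mode(value: &str) -> Option<RecordingMode> {
    match value.to_ascii_lowercase().as_str() {
        "immediate" => Some(RecordingMode::Immediate),
        "deferred" => Some(RecordingMode::Deferred),
        _ => None,
    }
}

/// Reads GFX_METAL_RECORDING, falling back to `default` when unset or unrecognized.
fn recording_mode_from_env(default: RecordingMode) -> RecordingMode {
    env::var("GFX_METAL_RECORDING")
        .ok()
        .and_then(|v| parse_recording_mode(&v))
        .unwrap_or(default)
}

fn main() {
    // The default here is an arbitrary choice for the sketch.
    let mode = recording_mode_from_env(RecordingMode::Immediate);
    println!("recording mode: {:?}", mode);
}
```

Running the same binary as `GFX_METAL_RECORDING=deferred ./app` would then switch modes without recompiling, which is what makes this kind of A/B comparison cheap.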
| Platform | Metric | | | | | |
|---|---|---|---|---|---|---|
| A (Intel, dual-core) | Frame time average (ms) | 14.933781 | 15.989498 | 14.827277 | 15.731309 | 15.492961 |
| | Frame time variance (ms²) | 2.3165195 | 2.1808865 | 1.753293 | 3.0022306 | 4.5931387 |
| B (AMD, quad-core) | Frame time average (ms) | 14.572058 | 14.32026 | 14.479047 | 18.306593 | 18.41038 |
| | Frame time variance (ms²) | 17.192923 | 2.0200737 | 2.1380246 | 30.974926 | 29.487541 |
Frame times were gathered using Dolphin’s built-in logging, which was manually turned on and off around the 20-second window. The output was then fed to a simple analysis tool that computed the average and variance of the numbers.
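The analysis step can be sketched as a small Rust function. This is not the tool we used: it assumes a hypothetical log format with one frame time per line, and it computes the population variance (dividing by n), whereas the original tool may have divided by n - 1.

```rust
/// Returns (mean, population variance) of the samples, or None for empty input.
/// Assumption: population variance (divide by n), not the sample variance (n - 1).
fn mean_and_variance(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|&x| (x - mean).powi(2)).sum::<f64>() / n;
    Some((mean, variance))
}

fn main() {
    // Hypothetical log excerpt: one frame time (ms) per line; unparsable lines are skipped.
    let log = "14.2\n15.1\n16.0\n14.7\n";
    let samples: Vec<f64> = log
        .lines()
        .filter_map(|line| line.trim().parse().ok())
        .collect();
    if let Some((mean, var)) = mean_and_variance(&samples) {
        println!("frame time average: {mean:.3}, variance: {var:.3}");
    }
}
```

Variance is what captures the stutter that an average alone hides: two runs can have the same mean frame time while one of them has occasional long frames.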
In Dolphin, gfx-portability provides faster and more consistent frame rates. The average frame time decreased by 4% on the Intel machine and by a significant 22% on the AMD machine. The difference in consistency is especially visible on AMD, where gfx-portability delivers a rock-solid frame rate. Subjectively, the game also plays much smoother with gfx-portability.
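For reference, a percentage decrease in average frame time is computed relative to the baseline. The sketch below assumes that convention; the numbers in `main` are illustrative only, not taken from the table above.

```rust
/// Relative decrease of `new` versus `baseline`, in percent.
/// Positive means `new` has a lower (better) frame time.
fn percent_decrease(baseline: f64, new: f64) -> f64 {
    (baseline - new) / baseline * 100.0
}

fn main() {
    // Illustrative numbers: 20 ms baseline vs 15 ms new average frame time.
    let d = percent_decrease(20.0, 15.0);
    println!("{d:.1}% lower average frame time"); // prints "25.0% lower average frame time"
}
```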
Of the gfx configurations tested, Deferred+MT showed the best results. This matches the Dota2 results, but we still find it surprising that Immediate did not come out ahead this time. Unlike Dota2, this workload did not have many small command buffers for Deferred recording to stitch together. We therefore conclude that the explanation lies in the Metal implementations/drivers, which work most efficiently when the hardware queue is immediately available (which is not the case with Immediate recording).
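The stitching difference between the two recording modes can be modeled roughly as follows. This is a hypothetical simplification: the `Command` and `NativeQueue` types are invented for illustration and do not reflect gfx’s Metal backend. In this model, every API-level command buffer becomes its own native command buffer in Immediate mode, while Deferred mode stores commands and can merge several small buffers into one native buffer at submission time.

```rust
/// Simplified stand-in for a native command such as a draw or a copy.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    Draw { vertex_count: u32 },
    Copy { size: u64 },
}

/// Hypothetical model of a native queue that counts committed command buffers.
#[derive(Debug, Default)]
struct NativeQueue {
    committed_buffers: usize,
    executed: Vec<Command>,
}

impl NativeQueue {
    /// Commit one native command buffer containing `commands`.
    fn commit(&mut self, commands: impl IntoIterator<Item = Command>) {
        self.executed.extend(commands);
        self.committed_buffers += 1;
    }
}

/// Immediate model: each API-level buffer was encoded natively while recording,
/// so each one is committed as a separate native command buffer.
fn submit_immediate(queue: &mut NativeQueue, buffers: Vec<Vec<Command>>) {
    for commands in buffers {
        queue.commit(commands);
    }
}

/// Deferred model: commands were only stored while recording; at submission they
/// are replayed, stitching all buffers into a single native command buffer.
fn submit_deferred(queue: &mut NativeQueue, buffers: Vec<Vec<Command>>) {
    queue.commit(buffers.into_iter().flatten());
}

fn main() {
    let frame = || {
        vec![
            vec![Command::Draw { vertex_count: 3 }],
            vec![Command::Copy { size: 256 }],
            vec![Command::Draw { vertex_count: 6 }],
        ]
    };
    let mut immediate = NativeQueue::default();
    submit_immediate(&mut immediate, frame());
    let mut deferred = NativeQueue::default();
    submit_deferred(&mut deferred, frame());
    println!(
        "immediate: {} native buffers, deferred: {}",
        immediate.committed_buffers, deferred.committed_buffers
    ); // prints "immediate: 3 native buffers, deferred: 1"
}
```

Both models execute the same commands in the same order; only the number of native command buffers differs, which is the overhead that stitching is meant to save.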
Rust continues to show its strength (and potential!), although we are approaching a point where zero-cost abstractions start breaking down (at the quantum level?). For example, the `copyless` crate lets us use the same standard containers while LLVM generates fewer `memcpy` instructions. Hopefully Rust’s optimization story will keep evolving, so that we can eventually deprecate the crate and programs will run faster out of the box.
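The general idea behind avoiding those copies is to reserve the destination slot first and then construct the value directly into it, instead of building it on the stack and moving it in. The sketch below shows that idea with only the standard library (`Vec::spare_capacity_mut`, Rust 1.60+). The `push_in_place` helper is hypothetical and is not `copyless`’s API, and whether LLVM actually elides the temporary copy is not guaranteed.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: reserve space for one element, then write the value
/// straight into the vector's spare capacity before bumping its length.
fn push_in_place<T>(v: &mut Vec<T>, make: impl FnOnce() -> T) {
    v.reserve(1); // guarantee room for one more element before building the value
    let slot: &mut MaybeUninit<T> = &mut v.spare_capacity_mut()[0];
    slot.write(make());
    // SAFETY: the element at index `len` was initialized by the write above.
    // If `make` panics, nothing was written and the length stays unchanged.
    unsafe { v.set_len(v.len() + 1) };
}

fn main() {
    // Large values are where an extra memcpy is most costly.
    let mut pages: Vec<[u8; 4096]> = Vec::new();
    push_in_place(&mut pages, || [0u8; 4096]);
    println!("{} page(s) stored", pages.len());
}
```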
Finally, the usual disclaimer: we are not benchmarking specialists, and the results here should be taken with a grain of salt. We’ll be happy to assist anyone who attempts to reproduce them.