Computer vision with Unreal Engine: A challenge of inLab FIB


Let’s imagine for a moment that we are tasked with developing a very special piece of software. The project is driven by a client, Foxtenn, whose business is the officiating of sports events. It requires complex calculations performed at great speed and, at the same time, a high-quality three-dimensional representation suitable for public display.

This was precisely the challenge we faced at inLab FIB: a project combining 3D rendering with computer vision under very tight computation-time constraints.

We took on the development of this product with a team of four people.

Choosing the Right Technology Stack

The first step was to select the technology we would work with, based on the established requirements. For confidentiality reasons, we will not disclose the exact details of the problem, but we can say that it required a person detector capable of processing six images in 10 ms, a rendering system with modern lighting, and several traditional computer vision techniques that also had to run within those same 10 ms.
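If, hypothetically, the six images were processed one after another, each would get an average of about 10 ms / 6 ≈ 1.67 ms, and the computer vision steps would have to fit into that same window. The minimal C++ sketch below shows one way a processing loop could check itself against such a deadline. FrameBudget and perCameraSlice are illustrative names, not code from the project.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that measures one processing cycle (all six camera
// images) against a fixed deadline, e.g. the 10 ms from the requirements.
class FrameBudget {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    explicit FrameBudget(std::chrono::microseconds budget)
        : budget_(budget), start_(Clock::now()) {
        assert(budget.count() > 0 && "budget must be positive");
    }

    // Call at the start of each new cycle.
    void reset() { start_ = Clock::now(); }

    std::chrono::microseconds elapsed() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - start_);
    }

    // Time left before the deadline, clamped at zero once it has passed.
    std::chrono::microseconds remaining() const {
        const std::chrono::microseconds left = budget_ - elapsed();
        return left.count() > 0 ? left : std::chrono::microseconds(0);
    }

    bool exceeded() const { return elapsed() > budget_; }

private:
    std::chrono::microseconds budget_;
    Clock::time_point start_;
};

// Average slice per camera if, hypothetically, the images were processed
// sequentially (integer division, so 10000 us / 6 gives 1666 us).
constexpr std::chrono::microseconds perCameraSlice(std::chrono::microseconds budget, int cameras) {
    return budget / cameras;
}
```

A loop would call reset() at the start of each cycle and could skip optional work when remaining() gets too small. steady_clock is used rather than system_clock because it never jumps backwards.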

Faced with this challenge, we evaluated various technologies. The first question was how to detect people in images. For this, we chose the YOLO object detection model, specifically version 8 (YOLOv8). The model itself is platform-independent, but it is usually run through a Python API combined with an inference runtime.

However, Python's performance was a concern given our time budget, which led us to discard it almost immediately. The other viable option for running YOLO was C++, a language that, despite the challenges of manual memory management, is one of the most efficient available. For context, C++ is the de facto language of the video game industry and of most applications with high computational requirements.
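Running YOLO from C++ means that the post-processing the Python API normally hides has to be written by hand. The sketch below assumes a COCO-trained YOLOv8 detection model whose raw output is a channel-major tensor: 84 rows (four box values cx, cy, w, h, followed by 80 class scores) by 8400 candidates for a 640×640 input. It keeps only class 0 ("person" in COCO) and applies greedy non-maximum suppression. How the inference runtime produces the tensor is outside the sketch, and Box, iou and decodePersons are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box {
    float x1, y1, x2, y2;  // corners in network-input pixels
    float score;           // person confidence
};

// Intersection-over-union of two axis-aligned boxes.
inline float iou(const Box& a, const Box& b) {
    const float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = areaA + areaB - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Decodes person detections from a YOLOv8-style output tensor.
// Assumed layout: out[row * numCandidates + i], rows 0-3 = (cx, cy, w, h),
// row 4 + k = score of class k. Class 0 is "person" in COCO.
std::vector<Box> decodePersons(const std::vector<float>& out,
                               std::size_t numChannels,
                               std::size_t numCandidates,
                               float scoreThreshold,
                               float iouThreshold) {
    assert(numChannels >= 5 && out.size() == numChannels * numCandidates);
    const std::size_t personRow = 4;  // first class row
    std::vector<Box> candidates;
    for (std::size_t i = 0; i < numCandidates; ++i) {
        const float score = out[personRow * numCandidates + i];
        if (score < scoreThreshold) continue;
        const float cx = out[0 * numCandidates + i];
        const float cy = out[1 * numCandidates + i];
        const float w = out[2 * numCandidates + i];
        const float h = out[3 * numCandidates + i];
        candidates.push_back({cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score});
    }
    // Greedy NMS: visit boxes from best to worst, drop any that overlap a kept one.
    std::sort(candidates.begin(), candidates.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& c : candidates) {
        bool overlaps = false;
        for (const Box& k : kept) {
            if (iou(c, k) > iouThreshold) { overlaps = true; break; }
        }
        if (!overlaps) kept.push_back(c);
    }
    return kept;
}
```

For the standard COCO model this would be called with numChannels = 84 and numCandidates = 8400. The returned boxes are in network-input coordinates, so any resizing or letterboxing applied to the camera image would still have to be undone.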

Once the AI question was settled, we needed a way to perform the computer vision calculations. Since we had already chosen C++, the natural decision was to use OpenCV, one of the most popular image-processing libraries.

Rendering and 3D Representation

For the system’s visualization, one option we considered was a game engine. Although our project was not for entertainment purposes, it had a lot in common with video games. In this context, we evaluated two alternatives: Unity3D and Unreal Engine 5.

Unity3D uses C# for scripting and supports native plugins, typically written in C++. Although its lighting system is very powerful, integrating the plugin logic with the 3D scene presented difficulties, especially when converting data between the two sides. One of the main issues is that Unity's C# code runs under a garbage collector, while memory in C++ is managed manually.

Additionally, the 3D environment and all engine elements are exposed to scripts through C#, so we would have needed to build an API for communicating with the C++ plugin. Such APIs tend to be quite limited and should, in general, be simple: exactly the opposite of what we intended to do.
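The memory-model mismatch is easier to see in code. Unity calls native plugins through C functions declared with DllImport on the C# side, so only plain data can cross the boundary. A common way to keep such an API safe is to let the garbage-collected caller own every buffer and have the C++ side only copy into it, as in this hypothetical sketch (Detection and the Plugin_* functions are illustrative names, not a Unity API).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace {
// Results computed on the C++ side; their lifetime is managed entirely here.
struct Detection { float x, y, w, h, score; };
std::vector<Detection> g_lastDetections;
}  // namespace

extern "C" {

// Number of detections currently available to the managed caller.
std::int32_t Plugin_GetDetectionCount() {
    return static_cast<std::int32_t>(g_lastDetections.size());
}

// Copies up to `capacity` detections, five floats each, into a buffer that the
// managed caller allocated and owns. Returns the number of detections written.
std::int32_t Plugin_CopyDetections(float* out, std::int32_t capacity) {
    if (out == nullptr || capacity <= 0) return 0;
    const std::int32_t n = std::min(capacity, Plugin_GetDetectionCount());
    for (std::int32_t i = 0; i < n; ++i) {
        const Detection& d = g_lastDetections[static_cast<std::size_t>(i)];
        const float fields[5] = {d.x, d.y, d.w, d.h, d.score};
        std::memcpy(out + 5 * i, fields, sizeof fields);
    }
    return n;
}

}  // extern "C"
```

Anything richer than flat arrays of numbers, such as scene objects or callbacks, needs additional marshalling code on both sides, which is the kind of limitation that made this route unattractive for us.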

Given these considerations, Unreal Engine 5 turned out to be the best option. Although it may seem like a choice by elimination, it offered several key advantages:

  1. Its scripting is in C++, allowing direct integration with our code.
  2. Its rendering system is one of the best currently available.
  3. Its licensing model perfectly fit the client’s needs.

Ensuring Decoupled System Architecture

Even though we chose Unreal Engine, one of the things we were clear about at inLab FIB was that the core design of our solution had to be independent of the engine. In other words, the 3D rendering code could be aware of the computation domain’s existence, but not vice versa.

With this idea in mind, we divided our system into two layers:

  • The “Unreal” layer, responsible for displaying information to the user and acquiring UI data.
  • The “Logic” layer, responsible for processing all information and capturing images from the six cameras.

This structure forced us to make extensive use of the dependency inversion principle. Our main goal was to keep the logic layer’s code completely independent of Unreal Engine’s API and libraries, so that if the client wanted a different representation in the future (e.g., using ImGui [3]), they could switch without major issues.
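The dependency direction can be sketched in a few lines of standard C++. In this hypothetical example, the logic layer declares an abstract IResultPresenter and depends only on it, while the presentation layer supplies the implementation. In the real system that implementation would live in the Unreal layer (or, later, in an ImGui front end); here a RecordingPresenter stands in for it. All type names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---------- Logic layer: standard C++ only, no engine headers ----------

struct PersonPosition {
    int cameraId;  // which of the six cameras produced the detection
    float x, y;    // position; units and coordinate frame are assumed here
};

// Output port owned by the logic layer. The presentation layer implements it,
// so the compile-time dependency points from presentation to logic.
class IResultPresenter {
public:
    virtual ~IResultPresenter() = default;
    virtual void showPositions(const std::vector<PersonPosition>& positions) = 0;
};

class TrackingPipeline {
public:
    // Non-owning: whoever creates the presenter must keep it alive longer
    // than the pipeline.
    explicit TrackingPipeline(IResultPresenter& presenter) : presenter_(presenter) {}

    // Stand-in for one processing cycle; real detection and geometry would
    // happen here before the results are handed to the presenter.
    void processCycle(const std::vector<PersonPosition>& detected) {
        presenter_.showPositions(detected);
    }

private:
    IResultPresenter& presenter_;
};

// ---------- Presentation layer (Unreal, ImGui, or a test double) ----------

class RecordingPresenter : public IResultPresenter {
public:
    void showPositions(const std::vector<PersonPosition>& positions) override {
        lastCount = positions.size();
    }
    std::size_t lastCount = 0;
};
```

Holding a non-owning reference leaves lifetime control with the layer that creates the presenter, which avoids tying engine-managed objects to standard smart pointers.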

GPU Challenges and Performance Optimization

One of the biggest risks we faced was that both Unreal Engine and AI models rely heavily on the GPU. We were working with a CUDA-based backend [1] on an RTX 3090 graphics card. Initially this hardware seemed sufficient, but it ultimately proved unable to process the six images and render the scene within the 10 ms required by the client.

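Fortunately, the sports events the client worked with took place on small courts with smooth surfaces and controlled lighting from spotlights. Because the geometry was simple and the lighting fixed, there was no need for virtualized geometry, and the lighting could be precomputed instead of being calculated every frame, leaving more GPU time for detection. The solution, therefore, was to disable Nanite [2] in Unreal Engine and opt for a classic lightmap-based rendering model [4].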

UI Implementation Challenges

The second problem we encountered was that the client wanted not only a 3D representation for public display, but also a UI to control everything in real time during the event and to manage the various cameras. The sophistication of this UI posed a significant challenge, especially with Unreal Engine’s UMG system [5].

Unfortunately, there was no better alternative for implementing the UI, so we ultimately brought in a specialist developer to handle this task.

Final Refinements: Efficient Mathematical Calculations

Despite the challenges, Unreal Engine provided a major advantage: its extensive collection of built-in 3D math functions. Although the logic layer had to remain separate from Unreal, many of the operations it needed were already implemented in the engine.

Our solution was to create a small mathematical layer with simple vector implementations that redirected calculations to Unreal when more complex operations were required. This way, if the client decided to replace Unreal in the future, they could do so without breaking the system’s functionality.
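A minimal sketch of such a math layer, assuming the split described above: cheap operations (addition, dot and cross products) are implemented directly on an engine-independent Vec3, while heavier ones go through an IMathBackend interface. An Unreal build could provide a backend that forwards to engine types such as FVector and FQuat; the portable fallback shown here uses Rodrigues' rotation formula instead. Vec3, IMathBackend and PortableMathBackend are hypothetical names, not taken from the project.

```cpp
#include <cassert>
#include <cmath>

// Minimal engine-independent vector type for the logic layer.
struct Vec3 {
    double x, y, z;
};

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Heavier operations go through a backend interface, so the engine-specific
// implementation can be swapped without touching the logic layer.
class IMathBackend {
public:
    virtual ~IMathBackend() = default;
    // Rotates v by angleRad around the unit-length axis.
    virtual Vec3 rotateAroundAxis(Vec3 v, Vec3 axis, double angleRad) const = 0;
};

// Portable fallback that needs only <cmath>.
class PortableMathBackend : public IMathBackend {
public:
    // Rodrigues: v*cos + (k x v)*sin + k*(k . v)*(1 - cos)
    Vec3 rotateAroundAxis(Vec3 v, Vec3 axis, double angleRad) const override {
        const double c = std::cos(angleRad), s = std::sin(angleRad);
        return v * c + cross(axis, v) * s + axis * (dot(axis, v) * (1 - c));
    }
};
```

Logic-layer code would receive an IMathBackend reference, in the same way as the presenter in the architecture section, so replacing Unreal would only mean selecting a different backend.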

Another great advantage of Unreal Engine was its rendering system. Even without advanced lighting techniques, the engine’s default rendering produced very satisfactory visual results. We had no trouble representing what we needed, and its debug visualization tools were more than sufficient throughout development.

Conclusion

Choosing Unreal Engine for this project was the right decision, despite the challenges it posed. Having our algorithm compete with the engine for GPU resources was a significant difficulty. However, such a powerful engine gave us very useful tools and a faster-than-expected development process.

References

[1] CUDA Toolkit – NVIDIA
[2] Nanite – Virtualized Geometry in Unreal Engine
[3] ImGui – Immediate Mode GUI Library
[4] Understanding Lightmapping in Unreal Engine
[5] UMG – Unreal Engine UI Designer