Hello Captains! Zuff here bringing you exciting news about the latest patch v0.4.12 that has just landed on the experimental branch!
TL;DR: The game now loads 2-8x faster, save files are 2-3x smaller, and FPS is 2-3x higher! This patch is without a doubt the largest performance update to Captain of Industry to date! And there are also some nice changes regarding construction visualization and minor fixes. Keep on reading to learn how these amazing speedups were achieved or skip to the end for full patch notes.
Before we dive into the technical details about speedups, I would like to announce a change in the schedule of Captain Diaries. Until now, we were doing CD posts every two weeks but going forward we won’t keep a strict schedule. These posts take a long time to prepare and sometimes we’d prefer to keep working on things and announce them when ready, not when a two-week deadline ends. In fact, that’s precisely what happened with this post.
Here’s Marek and Filip with the technical details about the performance improvements.
Rendering performance improvements
Ahoy! Captain Marek speaking and I am thrilled to tell you about some serious performance improvements that we have done. The two techniques used to achieve this were GPU instancing and Level of Detail (LOD).
As you might recall, our previous performance improvements also revolved around GPU instancing (batch-rendering) of many small entities. Transport pillars mentioned in Captain's Diary #15 gave us 2-3x more FPS, and building ports mentioned in Captain’s Diary #29 yielded 20-50% more FPS.
We have this handy tool that can selectively disable rendering of objects and based on this performance analysis, the biggest performance offenders were 1) buildings/machines, 2) transports, and 3) construction cubes (when present). We will skip transports for now and focus on buildings and construction cubes. Let me start with the cubes optimizations.
Construction cubes optimizations
Construction cubes are these little boxes representing scaffolding that appear over a building when it is being constructed, upgraded, or deconstructed. They appear for each voxel of the built entity. For common entities that's 100-1000 cubes per building. But a single greenhouse needs over 10k cubes! You probably see where this is going. Tens of thousands of individual cubes updated every frame (since they are animated) and we have a perfect recipe for FPS disaster.
We thought about two ways of addressing this issue. One, use GPU instancing and don’t worry about the high counts of cubes. Alternatively, we could reduce the number of cubes and make them larger to avoid too many individual objects. Well, we decided to go all the way and do both optimizations!
GPU instancing was already explained in the previous posts. Basically, it allows the GPU to render multiple instances of an object given a list of data describing how each instance is rendered. And you can probably imagine how four 1x1 cubes can be combined into one 2x2 cube, reducing the total number of cubes. I will just note that to make this super efficient, cube position, size, color, and animation state are all handled in the GPU shader and cubes are no longer updated on the CPU every frame.
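To make the cube-merging idea a bit more concrete, here is a minimal sketch in Python. It is not our actual code: the function name `merge_cubes`, the `max_size` parameter, and the simple greedy strategy are all assumptions made for illustration, and it works on a flat 2D footprint only.

```python
def merge_cubes(occupied, max_size=4):
    """Greedily cover a set of occupied (x, y) cells with square blocks.

    Returns a list of (x, y, size) blocks. Every occupied cell is covered
    exactly once. This is a simple greedy pass, not an optimal packing.
    """
    remaining = set(occupied)
    blocks = []
    # Visit cells in a stable order so the result is deterministic.
    for x, y in sorted(occupied):
        if (x, y) not in remaining:
            continue  # already swallowed by an earlier, larger block
        size = 1
        # Grow the square while the next size still fits inside `remaining`.
        while size < max_size and all(
            (x + dx, y + dy) in remaining
            for dx in range(size + 1)
            for dy in range(size + 1)
        ):
            size += 1
        # Claim all cells of the chosen square.
        for dx in range(size):
            for dy in range(size):
                remaining.discard((x + dx, y + dy))
        blocks.append((x, y, size))
    return blocks


# A hypothetical 4x4 footprint merged into 2x2 blocks.
footprint = {(x, y) for x in range(4) for y in range(4)}
blocks = merge_cubes(footprint, max_size=2)
print(len(footprint), "cells ->", len(blocks), "blocks")
```

Each merged block then becomes a single entry in the per-instance data list, so fewer, larger cubes mean less data to send for the same building footprint.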
For performance testing, we have set up an empty island where 5 greenhouses were being constructed at the same time. The game without construction cubes was running at 123 FPS. Before optimizations, this dropped to 7.4 FPS when all 50k construction cubes appeared. After our optimizations the FPS is back at 120! Suffice it to say that construction cubes will no longer lag your game!
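FPS numbers can hide how big the difference really is, so here is a tiny worked example that converts the figures above into frame times. The helper name `frame_ms` is made up for this sketch, and attributing the whole difference to cubes is a simplifying assumption.

```python
def frame_ms(fps):
    """Convert frames per second into milliseconds spent on one frame."""
    return 1000.0 / fps


baseline = frame_ms(123)  # no construction cubes on screen
before = frame_ms(7.4)    # 50k cubes, before optimizations
after = frame_ms(120)     # 50k cubes, after optimizations

# Assumption: everything above the baseline is time spent on cubes.
cost_before = before - baseline
cost_after = after - baseline
print(f"cube cost before: {cost_before:.1f} ms per frame")
print(f"cube cost after:  {cost_after:.1f} ms per frame")
```

Under that assumption, the cubes went from costing roughly 127 ms per frame to a fraction of a millisecond.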
GPU instancing for buildings
Building optimizations have a similar vibe as construction cubes or building ports, using GPU instancing to reduce the overhead of rendering many objects separately, but there is a problem. GPU instancing handles only rendering of 3D models, but buildings need much more than that. They need to be selectable by the player, have animations, sounds, particles, be able to collapse, and none of this can be “GPU-instanced”.
Based on our performance data shown below, we have decided to go after the low-hanging fruit and optimize only the buildings that have no animations, sounds, or particles. Coincidentally, these are actually taking the majority of the frame time because there are so many of them. I am talking about retaining walls, connectors, balancers, solar panels, vehicle ramps, etc.
There were additional issues with GPU-instanced rendering of buildings. One of them is flipping. Most buildings can be flipped horizontally or vertically to aid with connection constraints. The issue is that when a 3D model is flipped, all triangles will change their winding order (whether the vertices are ordered clockwise or counterclockwise). Why does this matter? A typical optimization done by virtually any 3D game/engine is to do something called back-face culling. Basically, if the 3D models have all triangles with the same winding order, GPU can quickly detect whether a triangle is facing “towards” or “away” from the camera by checking its winding order and if it is facing away, it can skip drawing its pixels. Have you ever wondered why 3D models are “invisible” from inside when the camera clips inside? It is exactly thanks to this optimization. Triangles are visible only from one side.
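Here is a minimal 2D sketch of why flipping matters. It assumes the common convention that counterclockwise triangles face the camera; the function names are made up for illustration and real GPUs do this check in hardware on projected vertices.

```python
def signed_area(a, b, c):
    """Twice the signed area of a 2D triangle; positive means counterclockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_front_facing(tri):
    # Assumed convention: counterclockwise triangles face the camera.
    return signed_area(*tri) > 0


def flip_x(tri):
    """Mirror a triangle by negating X, as a horizontally flipped building would be."""
    return [(-x, y) for x, y in tri]


triangle = [(0, 0), (1, 0), (0, 1)]
print(is_front_facing(triangle))          # True
print(is_front_facing(flip_x(triangle)))  # False
```

The vertices are listed in the same order in both cases, yet mirroring alone turns a front-facing triangle into one that back-face culling would skip.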
So what is the issue with flipping again? GPU instancing can render all the instances of one 3D model at once, but the setting for back-face culling must be the same for all instances. So you either enable back-face culling and have all flipped buildings rendered “inside-out”, or disable it and draw twice as many triangles.
Actually, there is a third option. Split buildings into two sets, flipped and non-flipped, and draw each set with a different setting of back-face culling. While this makes code infrastructure slightly more complicated and introduces a second draw-call for each building type, we went with this solution to avoid drawing unnecessary triangles.
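The split into flipped and non-flipped sets can be sketched roughly like this. The dictionary layout, the `build_batches` name, and the "cull front/back" labels are assumptions for illustration, not our actual rendering code.

```python
from collections import defaultdict


def build_batches(buildings):
    """Group buildings by (model, flipped) so each group shares one culling setting."""
    batches = defaultdict(list)
    for b in buildings:
        batches[(b["model"], b["flipped"])].append(b["position"])
    return batches


# Hypothetical buildings placed by the player.
buildings = [
    {"model": "retaining_wall", "flipped": False, "position": (0, 0)},
    {"model": "retaining_wall", "flipped": True, "position": (1, 0)},
    {"model": "retaining_wall", "flipped": False, "position": (2, 0)},
    {"model": "solar_panel", "flipped": False, "position": (5, 5)},
]

batches = build_batches(buildings)
for (model, flipped), positions in sorted(batches.items()):
    # Flipped models have reversed winding, so the opposite faces are culled.
    cull = "front" if flipped else "back"
    print(f"draw {model}: {len(positions)} instances, cull {cull} faces")
```

Each key in `batches` stands for one instanced draw call, so a building type costs at most two draw calls no matter how many copies exist.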
Another issue was building selection and highlighting. We had to set up proxy elements in the scene that are invisible but allow us to select buildings with a cursor. Highlighting is done by rendering into a special buffer and this is handled by a separate draw call (a third one). Here, we do not distinguish between flipped and non-flipped buildings.
And a final issue that I’d like to touch on is the collapse mechanic. When a building is dug under it will collapse and this must work for buildings rendered with GPU instancing. We have actually borrowed code similar to what animates construction cubes and used it to animate the fall of a building. When collapse happens, the static building model is hidden and a same-looking but animated building model is displayed. We also add a proxy-object with dust particles.
A visualization of a frame as rendered by the GPU. On the left you can see the scene being rendered before GPU-instancing optimizations and the water collectors are rendered one by one. Compare this to the right half where instancing allows the GPU to render large chunks of buildings at once. Note that this comparison is not exact in the representation of time taken to render each entity.
Level of detail for buildings
GPU instancing is making triangle drawing more efficient for the GPU but it actually does not reduce the number of processed triangles. At some point, no matter how efficiently we feed the GPU with triangles, it won’t be able to keep up processing and rendering them. And we know that some of you will just keep building bigger and bigger so we’ve implemented another layer of optimization – Level of detail (LOD).
The idea behind LOD is simple. Models far from the camera can be replaced by simpler ones, reducing the number of rendered triangles. When done well, you can hardly tell the difference.
The disadvantage of this technique is added code complexity and, more importantly, we need to make simplified models for each building which takes a lot of extra 3D modeling work. Because of the extra work, we are LOD-ing only the most important models for now and will be adding more as we go.
The kicker is that LOD and GPU instancing optimizations can go hand-in-hand. We group entities into chunks of 256x256 tiles and all entities on each chunk are rendered with the respective LOD based on camera distance. This allows us to utilize both optimizations and also compute the LOD level only once per chunk, not for each entity.
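A rough sketch of per-chunk LOD selection could look like the following. Only the 256-tile chunk size comes from the text above; the distance thresholds, the entity tuples, and all function names are hypothetical.

```python
import math
from collections import defaultdict

CHUNK_SIZE = 256                 # tiles per chunk side
LOD_DISTANCES = [500.0, 1500.0]  # hypothetical thresholds, in tiles


def chunk_of(x, y):
    """Chunk coordinates of a tile position."""
    return (x // CHUNK_SIZE, y // CHUNK_SIZE)


def chunk_center(chunk):
    cx, cy = chunk
    return ((cx + 0.5) * CHUNK_SIZE, (cy + 0.5) * CHUNK_SIZE)


def lod_for_distance(distance):
    """LOD 0 is the most detailed; higher levels are simpler models."""
    for level, limit in enumerate(LOD_DISTANCES):
        if distance < limit:
            return level
    return len(LOD_DISTANCES)


def lods_per_chunk(entities, camera_xy):
    """Group entities by chunk and pick one LOD level per chunk."""
    chunks = defaultdict(list)
    for name, x, y in entities:
        chunks[chunk_of(x, y)].append(name)
    result = {}
    for chunk, names in chunks.items():
        # One distance check per chunk, not per entity.
        distance = math.dist(chunk_center(chunk), camera_xy)
        result[chunk] = (lod_for_distance(distance), names)
    return result


entities = [("wall_a", 10, 20), ("wall_b", 200, 100), ("solar", 2000, 2000)]
result = lods_per_chunk(entities, camera_xy=(100, 100))
for chunk, (lod, names) in sorted(result.items()):
    print(chunk, "LOD", lod, names)
```

All entities in a chunk share one LOD level, which is what lets them be drawn together in one instanced batch per model.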
Final results for building rendering optimizations
We will present the results on a rendering benchmark of several selected factories from the community, namely from MaddProf, Yandersen, and NeetEngineer – thanks for sharing your amazing factories with us!
The overall results are impressive! We saw 2-3x more FPS on very large factories. The more buildings, the greater the speedup. MaddProf’s sea of retaining walls (over 4k walls to be specific) saw an improvement from 17 to 25 FPS. A factory from Yandersen with over 1700 solar panels and 480 rainwater collectors saw an FPS increase from 10 to 25 (when the gigantic solar farm was in the view). Finally, NeetEngineer's factory is something else. With nearly 4k solar panels and 5k transport connectors, it is the largest factory that we have ever loaded and due to the insane amount of buildings, we saw FPS improving from 7 to 23 (depending on the camera view). With these improvements we are looking forward to seeing what else you can build!
That was a lot of info, but wait, there is more! Filip has detailed info regarding optimizations of loading and save files!
Captain Filip here. We have heard your feedback on long load times and we have been noticing them as well. It has become our priority to understand what is going on and reduce them. The good news is that we managed to shave off a lot.
Game init consists of several phases:
Game init + load
The first critical one was game load. It turned out that for large factories it became crazily slow. I have actually received a save file from JDPlays from his YouTube series. You can check out his video here showcasing his island, it’s really impressive. We also used a couple more save files provided by folks from our community - JoneY and Manuel de Heer, thanks for that!
One of my theories was that the game simply decided to punish JD for trying to break it all the time which would make sense 🤣. But after a few technical investigations it turned out that several things were more expensive than we expected. If you recall how Marek previously improved game rendering by focusing on ports, well guess what, ports did strike again. Each port is represented by a class which turned out to be a bit too heavy because ports supported Events. Given that there can be 4-6 times more ports than entities, this quickly adds up. Also, our Events are a bit special as they are automatically saveable, but their callback compilation was taking way too big of a slice of our load time, so simplifying ports by removing these Events entirely helped a ton. We have also simplified terrain designations and their backing data structures for mine towers and assigned jobs, as players typically have lots of designations. With all this, the improvement got quite substantial.
First of all, the save size got reduced quite a bit. On large factories the save files are now 3 times smaller. Part of that reduction also comes from us fixing a leak in stats that not everyone was hitting.
The game load is now 6.5 times faster on large factories, and even more compact factories still see a 1.7 times improvement.
As a side effect of this effort, saving and auto-saving became faster as well, which is great because no one likes a slow auto save.
When investigating the game load we noticed that there is actually another phase that decided to cause troubles - UI initialization. It was eating 12 seconds from the start of the game. Now this one is interesting, because it does not depend on the size of a factory. So reducing this would reduce the waiting time for everyone. It turned out that Unity has a few footguns in its API, and we were paying hard for re-parenting during UI build.
For Unity fans out there, we build UI via code using Unity’s API. When you want to create a new UI element you need a GameObject. The most obvious API for that is “new GameObject()”, however what happens is that Unity creates a new separate root hierarchy for it and once you place that object into a new parent, it throws the old hierarchy away and notifies the entire sub-hierarchy about the new parent. The solution was to use Object.Instantiate which (unlike the GameObject constructor) accepts a parent, so the object is created in the right place. Sometimes we would build an entire window UI (such as research) before we put it into Unity’s Canvas. At that point we would pay for re-parenting of every single element of that window, almost the same as creating each of those elements again. When this happened on multiple levels of the UI it added up very quickly. In total, the re-parenting was eating half of our UI build duration.
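To show why re-parenting on multiple levels adds up, here is a toy model in Python. This is not Unity code and does not measure Unity itself: the `Node` class and its cost model (one "update" per node touched by a hierarchy change) are assumptions made purely for illustration.

```python
class Node:
    updates = 0  # counts how many nodes were touched by hierarchy changes

    def __init__(self, parent=None):
        self.children = []
        self.parent = None
        if parent is not None:
            # Created directly under its parent: only the new node is touched.
            parent.children.append(self)
            self.parent = parent
            Node.updates += 1

    def subtree_size(self):
        return 1 + sum(c.subtree_size() for c in self.children)

    def set_parent(self, parent):
        # Re-parenting touches this node and everything beneath it.
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = parent
        parent.children.append(self)
        Node.updates += self.subtree_size()


def build_detached(rows, cells):
    """Build the window bottom-up and attach it to the canvas at the end."""
    Node.updates = 0
    canvas, window = Node(), Node()
    for _ in range(rows):
        row = Node()
        for _ in range(cells):
            Node().set_parent(row)
        row.set_parent(window)
    window.set_parent(canvas)
    return Node.updates


def build_parented(rows, cells):
    """Create every element directly under its final parent."""
    Node.updates = 0
    window = Node(Node())  # window created straight under the canvas
    for _ in range(rows):
        row = Node(window)
        for _ in range(cells):
            Node(row)
    return Node.updates


print("detached:", build_detached(10, 10))
print("parented:", build_parented(10, 10))
```

In this toy model the bottom-up build touches each leaf three times (once per level it gets attached through), while the parent-first build touches each element exactly once.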
Another observation we made was that our bottom toolbar panels, which contain all the machines, were taking up significant time to build because we had a separate panel for each category. So we merged all of these into one UI view which is shared by all the categories.
And the final improvement was that we added caching of Unity Sprites to avoid duplicates when building the UI, which saves not just time but also memory.
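The caching idea itself is simple. Here is a minimal sketch, where the `SpriteCache` class, the `loader` callback, and the example path are hypothetical stand-ins rather than our real code or Unity's API.

```python
class SpriteCache:
    """Create each sprite at most once and hand out the shared instance."""

    def __init__(self, loader):
        self._loader = loader  # function that actually creates a sprite
        self._sprites = {}
        self.loads = 0         # how many times the loader really ran

    def get(self, path):
        if path not in self._sprites:
            self._sprites[path] = self._loader(path)
            self.loads += 1
        return self._sprites[path]


# A stand-in loader; a real one would build a sprite from a texture.
cache = SpriteCache(loader=lambda path: {"path": path})
first = cache.get("icons/diesel.png")
second = cache.get("icons/diesel.png")
print(first is second, cache.loads)
```

Every UI element that asks for the same icon gets the same object back, so the expensive creation happens once and the duplicates never exist in memory.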
In the end, all the improvements got us from 12 seconds to 2 seconds. Not too shabby! When we add the load duration improvement and the UI init improvement into one chart, here are our new game load + UI build improvements:
Now it becomes a bit more clear why JD's island got so well thought through. Whilst waiting for the game to load, he had so much time thinking about his next steps! 😉
Subsequent game load
We had several reports that subsequent load takes longer (loading while already in a loaded save). It was because of a memory leak. Well, not necessarily a memory leak but a misunderstanding between us and how Unity expects us to dispose of their native objects. Also we learned that running GC right before we load a new game might be a good call to prevent C# allocating an even bigger heap. Obviously systems with less RAM must have had a hard time because at that point the system might have ended up swapping memory pages to disk. So we fixed all of that and based on our experiments, subsequent load is no longer slower.
Improved buildings blueprint visualization
Part of the performance improvements was a partial rewrite of entities construction and visualization. While at it, we have improved construction visualization in two ways.
First, building blueprints now show the entity texture in a subtle way. This makes recognising blueprints slightly easier. Take a look at the following comparison.
Second, blueprint color as well as colors of construction cubes are now matching the construction state. Blueprints for buildings under construction are blue, paused construction is marked as gray, and deconstruction is red.
This new color scheme also eliminates the need for icons marking paused construction as they were sometimes in the way, visually cluttering the scene. All this and more is also described in a new game tutorial that is triggered when the first building construction is paused.
Changelog for v0.4.12
Optimized rendering of common buildings resulting in up to 3x more FPS.
Optimized rendering of construction cubes that no longer cause any FPS drop.
Optimized game loading which is 2-8x faster.
Optimized save data structures making save files 2-3x smaller.
Fixed issues causing consecutive loads consuming more memory and being slower.
Other improvements and fixes
[Important] Added `ChangeConfigs` method to `IMod` interface allowing mods to change configs. This is not a backwards-compatible change.
Building ghosts, construction cubes, and ports now have color based on construction state.
Building ghosts now have subtle texture to make them more recognisable.
Fixed that the cargo depot would accept any fluid as fuel instead of just diesel.
Fixed cargo depot that could give free fuel when adjusting sliders.
Fixed loud sound when quitting the game.
Fixed loading of files with special characters such as '['.
Fixed a special case where cargo depot did not allow assigning a product.
Fixed top bar jumping between single and double row when a date length was changing.
Clinic now prefers a higher tier of medical supplies if it has multiple types stored.