Optimization Basics

Often, with C# code (or any language, really), it's easy to get away with code that isn't exactly optimized for performance - often without even noticing just how badly it can perform. When developing code to work in a game engine such as Unity, this is especially true. In earlier versions of Unity, for example, a simple foreach loop generated garbage (the Mono compiler of the time boxed the loop's enumerator), which the garbage collector then had to spend time cleaning up.
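For illustration, here's the shape of the problem and the common workaround at the time - a minimal sketch (the enumerator boxing was a quirk of Unity's old Mono compiler, and was fixed in later versions):

using System.Collections.Generic;

public class LoopExample
{
    // In old Unity versions, the compiler boxed this loop's struct
    // enumerator on disposal, allocating a little garbage on every call:
    public static int SumForeach(List<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers)
        {
            total += n;
        }
        return total;
    }

    // An indexed for loop avoids the enumerator (and the garbage) entirely:
    public static int SumFor(List<int> numbers)
    {
        int total = 0;
        for (int i = 0; i < numbers.Count; i++)
        {
            total += numbers[i];
        }
        return total;
    }
}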

The impact of something like this on your application's overall performance is relatively minor - but what if the code containing the loop executes multiple times each frame? The unnecessary nanoseconds of execution time consumed by the loop start to add up, especially on devices with less processing power, such as phones and tablets.

As such, it's important to learn:

  • When to start optimizing
  • Types of optimization issues
  • How to track down and identify optimization issues
  • How to correct them

When to start optimizing

In general, this is the easy part. Perhaps your application isn't performing as well as you'd like. Perhaps the framerate is a bit lower than you'd expect, or varies a lot when certain code is executed.

Usually, you will have already written your code before the thought of optimization crosses your mind - and that's perfectly fine. Readability and maintainability go hand-in-hand when it comes to writing good code, and they should almost always be the first things on your mind.

This doesn't mean that you can't write efficient code right from the start, but that's something that comes with experience - and even then, it's often easier to get the code working properly first, and then worry about optimizing it later.

Types of optimization issues

This list is by no means comprehensive, but it covers the most common culprits:

  • Inefficient code

    This is fairly general, but in many cases code can be replaced with different code that achieves exactly the same end result, only faster and more efficiently. For example, LINQ statements are fantastic for writing extremely readable code very quickly. However, their performance is, to say the least, not great. That doesn't mean you should never use them - it's more a case of knowing when to use them, and when you'd be better off replacing them with more old-school (but faster) code (there's a sketch of this after the list below).

  • Code executing unnecessarily

    This one is fairly simple - if the code doesn't need to be executed, then it shouldn't be. This often goes hand-in-hand with caching - if you have a complex method which your application will call more than once (with the same expected result), then you can cache the result, and then avoid calling the method again in future. I'll go into more detail on caching efficiently in future.

  • Code generating unnecessary garbage

    One advantage of languages like C# is that they manage memory for you automatically, so you don't need to concern yourself with allocating or deallocating memory for every bit of code you write. However, the downside of this is that the Garbage Collector (which handles deallocation) executes on a regular basis, and the more work it has to do, the longer it takes. To that end, it's important to learn how to minimize the amount of garbage (memory that will need to be deallocated) generated by your code.
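To illustrate the first point, here's the same filter written with LINQ and as a plain loop (using a hypothetical Item type). Both produce the same list, but the LINQ version allocates iterator objects (and, depending on the compiler, delegates) on every call:

using System.Collections.Generic;
using System.Linq;

public class Item
{
    public float Weight;
}

public static class FilterExample
{
    // LINQ version: extremely readable, but allocates more than it needs to.
    public static List<Item> HeavyItemsLinq(List<Item> items)
    {
        return items.Where(i => i.Weight > 10f).ToList();
    }

    // Loop version: the same result, with only the result list allocated.
    public static List<Item> HeavyItemsLoop(List<Item> items)
    {
        var result = new List<Item>();
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i].Weight > 10f)
            {
                result.Add(items[i]);
            }
        }
        return result;
    }
}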

Identifying and fixing optimization issues

When working with game engines, you are often dealing with code which is executed repeatedly, whether it is every frame or just a repeated process (such as in an OnDrag() event in the UI). Often, you will find that your efforts are best spent focusing on such code - saving (for example) a millisecond of execution time each frame adds up to far more than saving a millisecond in initialization or event-handling code which executes infrequently.

In Unity, the best tool in your arsenal to track down optimization problems is the Unity Profiler. The Profiler shows a great deal of information, not the least of which is how long it took to execute a particular frame, and, even better, what was being executed that frame, how long it took, and how much memory it used.

In the screenshot above, you can see the Profiler window, with a graph showing frame times (how long it took to generate and render each frame) over time, and detail sections breaking down how much of each frame was spent rendering, running scripts, and so on.

Your interest will normally be in the time taken up by scripts (which is, for the most part, your code).

In the detailed section below the graph, you can see data about each method call in that frame (it's a hierarchy, so you can click to expand each method and see details about the methods it calls). The columns of particular interest are:

  • Total

    How much of the total time it took to render this frame was taken up by this method? (%)

  • Calls

    How many times was this method called this frame?

  • GC Alloc

    How much memory was allocated by this method (across all calls) this frame?

  • Time ms

    How much time (in milliseconds) did this method take to execute (across all calls) this frame?

If you look at the profile view in the screenshot above (which is taken from an evolution simulation I've been working on), you'll see that 1.8MB of memory has been allocated during a single frame (this is quite a lot to allocate at once), and even worse, that frame took over 40 milliseconds to execute, almost all of which was taken up executing scripts.

This is all located in the Update.ScriptRunBehaviourUpdate call, which is not unusual as that is the Unity method which calls all Update() methods on your scripts.

Expanding the hierarchy and looking in a bit more detail, we come across the culprit for the memory usage: GC.Alloc has been called a staggering 52 thousand times, totalling 1.8MB of memory. But what does this mean? GC.Alloc is the Profiler's marker for managed memory allocations - knowing that a lot of memory is being allocated is useful, but it doesn't really tell us where in the code to look.

This is because, by default, Unity's Profiler runs in a relatively fast mode without adding too much overhead, which unfortunately means that its reports lack some detail. You can correct this by enabling the 'Deep Profile' mode by clicking the button at the top of the profiler (and again, later to turn it back off when you're done). Unity will need to reload its scripts in order to enter or leave Deep Profile mode - it can do so without restarting your application, but it might be best to restart it anyway to get the most accurate results.

Once you enter Deep Profile mode, you will unfortunately have to locate another spike on the graph, as the frame we were looking at was recorded in regular profiling mode.

The primary difference between regular profiling and Deep Profiling is that GC.Alloc calls show up under the method in which the memory was actually allocated, rather than at the top level. This allows you to track down memory-intensive methods.
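Incidentally, if Deep Profile adds too much overhead for your project, you can get similar (if narrower) information by wrapping suspect code in manual profiler samples. A minimal sketch, using a hypothetical MonoBehaviour:

using UnityEngine;
using UnityEngine.Profiling;

public class CreatureBehaviour : MonoBehaviour
{
    void Update()
    {
        // Everything between BeginSample and EndSample shows up as a named
        // entry in the Profiler hierarchy, even without Deep Profile enabled.
        Profiler.BeginSample("Creature.FindFoodCells");
        // ... the code under suspicion ...
        Profiler.EndSample();
    }
}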

Now I've managed to look through the hierarchy and track down the call which is both allocating so much memory and occupying most of the execution time during this frame - List.Sort(), inside my Creature.FindFoodCells() method.

private List<Cell> FindFoodCells()
{
    List<Cell> foodCells = new List<Cell>();

    // Gather every sensed cell that currently contains food.
    foreach (var cell in SensedCells)
    {
        if (cell.Food != null)
        {
            foodCells.Add(cell);
        }
    }

    Comparison<Cell> sortMethod = (a, b) =>
    {
        var distanceA = grid.GetDistanceFrom(CurrentCell, a);
        var distanceB = grid.GetDistanceFrom(CurrentCell, b);

        // Closer cells sort first...
        if (distanceA != distanceB)
        {
            return distanceA > distanceB ? 1 : -1;
        }

        // ...and distance ties are broken by energy value, highest first.
        return b.Food.EnergyValue.CompareTo(a.Food.EnergyValue);
    };

    foodCells.Sort(sortMethod);

    return foodCells;
}

This method, used by my simulation, looks for 'food' near the 'creature' (in the area defined by 'SensedCells' - cells on a grid that are within range of the creature's senses), and then sorts the results by a) distance from the creature, and b) how valuable the food is to the creature. The idea is that food closer to the creature, or with a higher energy value, will be at the beginning of the collection, and less valuable food will be at the end.

Looking at this method and comparing it to the profiler graph makes me think of a few things:

  • Sort() is a very effective and efficient C# method for sorting collections. However, it can only be as efficient as its comparer, and when I check in detail, it appears that my grid.GetDistanceFrom() method (called twice for every comparison) is responsible for the bulk of the memory allocation and time usage. As such, it's probably a good idea for me to take a look at that method and see what I can improve about it.
  • Secondly, my creature is only interested in a single food cell (the closest and most valuable one). Is there an approach I could use to just get that cell without sorting all of them? (See the sketch below.)
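For that second point, a single pass over the sensed cells could track the best candidate directly, with no sorting (and no intermediate list) at all. A rough sketch of the idea, using a hypothetical FindBestFoodCell() method:

private Cell FindBestFoodCell()
{
    Cell best = null;
    float bestDistance = float.MaxValue;

    foreach (var cell in SensedCells)
    {
        if (cell.Food == null) continue;

        var distance = grid.GetDistanceFrom(CurrentCell, cell);

        // Prefer closer cells; on a distance tie, prefer more energy.
        if (best == null
            || distance < bestDistance
            || (distance == bestDistance && cell.Food.EnergyValue > best.Food.EnergyValue))
        {
            best = cell;
            bestDistance = distance;
        }
    }

    return best;
}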

In earlier iterations, this method used LINQ with an OrderBy()/ThenByDescending() combination - this wasn't particularly efficient either. So I probably need an entirely different approach here.
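For reference, that earlier version was roughly along these lines - readable, but allocating iterators and delegates on every call, on top of the sort itself:

var foodCells = SensedCells
    .Where(cell => cell.Food != null)
    .OrderBy(cell => grid.GetDistanceFrom(CurrentCell, cell))
    .ThenByDescending(cell => cell.Food.EnergyValue)
    .ToList();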

To start with, I tried optimizing the sort comparer by extracting it into its own class (implementing IComparer<Cell> so it can be passed to Sort()) and then keeping a reference to it, so as to avoid it being created from scratch repeatedly.

private static SortMethod sortMethod;

private List<Cell> FindFoodCells()
{
    List<Cell> foodCells = new List<Cell>();

    foreach (var cell in SensedCells)
    {
        if (cell.Food != null)
        {
            foodCells.Add(cell);
        }
    }

    // Create the shared comparer once, then reuse it for every sort.
    if (sortMethod == null) sortMethod = new SortMethod(grid);
    sortMethod.currentCell = CurrentCell;

    foodCells.Sort(sortMethod);

    return foodCells;
}

private class SortMethod : IComparer<Cell>
{
    private SimulationGrid grid;
    public Cell currentCell;

    public SortMethod(SimulationGrid grid)
    {
        this.grid = grid;
    }

    public int Compare(Cell a, Cell b)
    {
        var distanceA = grid.GetDistanceFrom(currentCell, a);
        var distanceB = grid.GetDistanceFrom(currentCell, b);

        // Same ordering as before: closer cells first, then highest energy.
        if (distanceA != distanceB)
        {
            return distanceA > distanceB ? 1 : -1;
        }

        return b.Food.EnergyValue.CompareTo(a.Food.EnergyValue);
    }
}

This helped a bit by avoiding variable capture (a complex topic, but essentially I took direct control over how the 'currentCell' and 'grid' variables are used by the comparer), and avoiding creating new instances of the comparer all the time. By making 'sortMethod' static, we've also avoided creating a new instance for each creature - instead, now all creatures share a single instance.
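To give a contrived example of capture (not from the simulation): when a lambda uses a local variable, the compiler generates a hidden closure object to hold it, and both that closure and a new delegate are allocated every time the code runs:

void SortByDistanceFrom(List<Cell> cells, Cell origin)
{
    // 'origin' is a local variable, so the compiler generates a hidden
    // closure class to hold it - a new closure instance and delegate are
    // allocated on every call to this method:
    cells.Sort((a, b) => grid.GetDistanceFrom(origin, a)
        .CompareTo(grid.GetDistanceFrom(origin, b)));
}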

However, the actual difference made by these changes was not enough - we went from approximately 1.8MB of memory allocated to somewhere around 1.5MB.

It's time for me to look into the grid.GetDistanceFrom() method.

private static Dictionary<string, float> distanceCache = new Dictionary<string, float>();

public float GetDistanceFrom(Cell a, Cell b)
{
    // Build a unique key for this pair of cells.
    var cacheString = string.Format("{0}-{1}-{2}-{3}", a.X, a.Y, b.X, b.Y);

    if (!distanceCache.ContainsKey(cacheString))
    {
        distanceCache.Add(cacheString, Vector2.Distance(new Vector2(a.X, a.Y), new Vector2(b.X, b.Y)));
    }

    return distanceCache[cacheString];
}

As you can see, I'd already done a bit of optimization on this method - I cache the result of each distance calculation between any two cells, and from then on use the cached result rather than performing the calculation again.

However, looking at the profiler, I've identified a potential issue with this caching - the string.Format call is allocating a lot of memory. This makes me wonder whether it might be better not to cache this data at all - after all, if it takes more time (and memory) to build the key and look up the result than to perform the actual calculation, then caching is actually having a negative effect on performance.

public float GetDistanceFrom(Cell a, Cell b)
{
    return Vector2.Distance(new Vector2(a.X, a.Y), new Vector2(b.X, b.Y));
}

So, after returning the calculation to its original uncached state, I suddenly find performance increased by a huge amount. Whoops - that'll teach me to optimize pre-emptively.

One nice thing with this solution is that it appears that I get to more-or-less keep my original approach - the issue turned out to be not with the idea of sorting the collection, but with how that sort was being handled.

I'm going to leave this here for now - I hope that this helps give you some idea of how to go about getting familiar with optimization. I'll have more on this sort of thing in the future!

If you're interested in the simulation, which admittedly is just for fun, you can check it out here. (I'll update it from time to time as I make changes).