Runtime Optimizations

pg_interactive
Posts: 25
Joined: Thu Feb 22, 2018 2:42 pm

Post by pg_interactive » Sun Jul 22, 2018 1:54 pm

May I suggest a runtime optimization for the Creature Unity runtimes: reducing garbage creation, which causes in-engine stutter whenever the garbage is collected.

In the Unity Editor, open Window / Profiler and toggle Deep Profile ON when testing for garbage generation.


What creates garbage in Creature Unity Runtimes?

CAUSE:
Heaviest garbage generation occurs due to per-frame usage of the "new" keyword.

EXAMPLE:


Example 1A)
MeshBone::computeWorldDeltaTransforms
XnaGeometry.Vector3 cur_tangent = new XnaGeometry.Vector3(calc.Item1.X, calc.Item1.Y, 0);

Example 1B)
MeshBone::computeDirs
return new Tuple<XnaGeometry.Vector4, XnaGeometry.Vector4> (tangent, normal);

Example 1C)
CreatureManager::RunCreature
public void RunCreature()
{
 ...
     for (int i = 0; i < 2; i++)
     {
          ...
          foreach (var cur_bone_packet in bones_map)
          {
               XnaGeometry.Vector4 cur_bone_start_pt = new XnaGeometry.Vector4(0, 0, 0, 0);
               XnaGeometry.Vector4 cur_bone_end_pt = new XnaGeometry.Vector4(0, 0, 0, 0);
          }
     }
 ...
}
SOLUTION:
Instance caching. Cache and reuse the object instances that cause garbage (for example cache and reuse your custom Tuple and XnaGeometry.Vector4 instances).
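
A minimal sketch of the instance caching idea, assuming the vector type is a reference type (class). `Vec4` and `BoneUpdater` are hypothetical stand-ins for illustration only, not the actual XnaGeometry or Creature types:

```csharp
using System;

// Hypothetical stand-in for a reference-type vector (not the real XnaGeometry.Vector4).
public sealed class Vec4
{
    public double X, Y, Z, W;

    // Overwrites all components in place, so no new object is needed.
    public void Set(double x, double y, double z, double w)
    {
        X = x; Y = y; Z = z; W = w;
    }
}

// Hypothetical per-frame updater: allocate once, reuse every frame.
public sealed class BoneUpdater
{
    // Allocated once when the updater is constructed and reused afterwards.
    private readonly Vec4 cachedStart = new Vec4();
    private readonly Vec4 cachedEnd = new Vec4();

    // Exposed only so a caller can confirm the same instance is reused.
    public Vec4 CachedEnd { get { return cachedEnd; } }

    // Called every frame: resets the cached instances instead of calling "new".
    public double UpdateBone(double length)
    {
        cachedStart.Set(0, 0, 0, 1);
        cachedEnd.Set(length, 0, 0, 1);
        return cachedEnd.X - cachedStart.X;
    }
}
```

The trade-off is that cached instances are shared state: anything that keeps a reference to them across frames will see them overwritten on the next update.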

COST
The current per-frame garbage cost of using uncached instances, in the simple example we used, is 26.3 kB. This is significant, and can certainly be brought down to 0 kB per frame. A quick glance through the code reveals that using metadata for events, for example, adds further to this number: foreach runs over all available frame_callback(s) on every frame. (Side note: tryTrigger could certainly be replaced with a better solution; for now, you could simply use for instead of foreach.)

Avoid foreach in per-frame code (it can allocate an enumerator, depending on the collection type and compiler version), and if you must use it, use IEnumerator with it. A plain for() loop is GC-free. There are other things to avoid, all of which create garbage: Delegates (on subscribe/unsubscribe/call), Coroutines (when called), runtime Serialization/Deserialization, object creation (described above) and Destruction, dynamic strings (use StringBuilder instead, if you must), and others. Use structs instead of classes where applicable: like other value types, they are not allocated on the Heap when used as local variables, so they cause no garbage. Also, if you find that you need multiple instances of the same object each frame, you might want to create an ObjectPool for such instances.
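
The object pool mentioned above could look roughly like this. It is a sketch only: `SimplePool<T>` is a hypothetical helper, not part of Unity or the Creature runtimes, and it is not thread-safe:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool: hands out cached instances instead of allocating new ones.
public sealed class SimplePool<T> where T : class, new()
{
    private readonly Stack<T> free = new Stack<T>();

    // Pre-allocate up front (for example at load time) so per-frame code does not allocate.
    public SimplePool(int initialSize)
    {
        for (int i = 0; i < initialSize; i++)
        {
            free.Push(new T());
        }
    }

    // Number of instances currently available for reuse.
    public int FreeCount { get { return free.Count; } }

    // Returns a pooled instance; allocates only if the pool has run dry.
    public T Rent()
    {
        return free.Count > 0 ? free.Pop() : new T();
    }

    // Hands an instance back so that a later Rent() can reuse it.
    public void Return(T item)
    {
        free.Push(item);
    }
}
```

A caller would Rent() at the start of the per-frame work and Return() the instance when done, resetting its state itself, since the pool does not clear returned objects.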

Bottom line: use the Profiler with Deep Profile to profile your code, and especially optimize the code inside loops and nested loops. You can easily see the ms/kB cost of per-frame GC in the Profiler and focus on the heavy hitters first. Also, in Unity C#, Profiler.BeginSample("SampleName") / Profiler.EndSample() might help to locate the exact line of code where the garbage is being created.

I've seen other animation solutions for Unity run with 0GC-hit per frame, I'm sure You can too! Best wishes, hope this can help to make Creature better.
Last edited by pg_interactive on Sun Jul 22, 2018 6:45 pm, edited 1 time in total.

chong
Posts: 1178
Joined: Thu Feb 19, 2015 2:21 am

Re: Runtime Optimizations

Post by chong » Sun Jul 22, 2018 4:29 pm

Hello,

Thanks for the very useful + important tips! I will look into the Unity runtime optimizations when I get the cycles. Right now the focus is on rolling out the next big Creature update, and then yes the focus will shift back to maintenance, bug fixes and runtime optimizations/improvements.

Btw, if you are using the regular Unity runtimes, you can try enabling point caching. It is a lot faster and bypasses the posing engine (assuming you are not doing custom runtime procedural bone posing). Having said that, yes, the tips you provided will be used as a guide for runtime optimization.

Cheers

Post by pg_interactive » Sun Jul 22, 2018 6:25 pm

chong wrote (Sun Jul 22, 2018 4:29 pm):
"you can try enabling point caching. It is a lot faster and bypasses the posing engine"

Thanks, this is a great suggestion!

Post by pg_interactive » Tue Dec 25, 2018 5:21 pm

Hello there,

The garbage issue still seems to persist. In short, 5.5 ms of each frame is spent on releasing garbage, and that time can't be used for gameplay. This was tested on a generic quad-core i7 CPU. That is approximately 33% (one third) of the available frame time if you want your game to run at 60 fps. In other words, using Creature animation forces one to sacrifice a third of the frame budget on animation (this is with fully animating 1 character: bones, posing and everything). I would consider this a major problem with Creature at the moment.
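
As a quick check of that figure, assuming the frame budget is simply one second divided by the target frame rate:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{60} \approx 16.67\ \text{ms},
\qquad
\frac{5.5\ \text{ms}}{16.67\ \text{ms}} \approx 0.33
```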

This might help: a screenshot of the profiler, showing all the code responsible for garbage generation in a generic CreatureJSON + MetaJSON character import that uses CharacterAsset:
Image
NOTE: the 30.46 ms in the example is due to using the Unity Editor Deep Profiler. The actual runtime cost is 5.5 ms. The 19.2 kB per-frame cost is the same for both.

The code mainly responsible for the garbage is the custom Tuple.ctor and the other bits I mentioned in my previous post.

Are there any plans to have these issues resolved on the roadmap, or not yet? Looking for a heads-up, if any.

Thank You.

Post by chong » Tue Dec 25, 2018 10:28 pm

Hello,

Thanks for the very useful profiling! Will pre-allocating the Tuple and not creating it every update help? Will investigate...

Update: As a side note while I investigate, you can try the following. It seems like it's really due to the computeDirs method, which returns a tuple. Instead, just change computeDirs to something like:

public void computeDirs(XnaGeometry.Vector4 start_pt, XnaGeometry.Vector4 end_pt, ref XnaGeometry.Vector4 tangent, ref XnaGeometry.Vector4 normal)
{
    // Write the results into caller-owned vectors instead of returning a new Tuple.
    tangent = end_pt - start_pt;
    tangent.Normalize();

    normal = Utils.rotateVec4_90(tangent);
}

Then, for the places that call it: they only use computeDirs as temporary computation storage (the results are used permanently later, but the tangent/normal themselves are temporary). So first go into the constructor of MeshBone and pre-allocate a normal and a tangent vector so that they stay alive for the entire lifetime of the MeshBone instance (you can even make them more global than that if you want, probably static).
Then pass your new normal and tangent variables to the new computeDirs() method, which does not return a new tuple and thus hopefully will not incur the added GC cost.
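
The calling side described above might be sketched as follows. `DirVec4` and `SketchBone` are hypothetical stand-ins for XnaGeometry.Vector4 and MeshBone, the vector is assumed to be a value type, and the 90-degree rotation is an assumed stand-in for Utils.rotateVec4_90, whose real implementation is not shown in this thread:

```csharp
using System;

// Hypothetical value-type vector standing in for XnaGeometry.Vector4.
public struct DirVec4
{
    public double X, Y, Z, W;

    public DirVec4(double x, double y, double z, double w)
    {
        X = x; Y = y; Z = z; W = w;
    }

    public static DirVec4 operator -(DirVec4 a, DirVec4 b)
    {
        return new DirVec4(a.X - b.X, a.Y - b.Y, a.Z - b.Z, a.W - b.W);
    }

    // Scales the vector to unit length in place; a zero vector is left untouched.
    public void Normalize()
    {
        double len = Math.Sqrt(X * X + Y * Y + Z * Z + W * W);
        if (len > 0) { X /= len; Y /= len; Z /= len; W /= len; }
    }
}

// Hypothetical stand-in for MeshBone.
public sealed class SketchBone
{
    // Scratch vectors that live as long as the bone and are reused on every call.
    private DirVec4 tangent;
    private DirVec4 normal;

    public static void ComputeDirs(DirVec4 startPt, DirVec4 endPt,
                                   ref DirVec4 tangentOut, ref DirVec4 normalOut)
    {
        tangentOut = endPt - startPt;
        tangentOut.Normalize();
        // Assumed rotation by 90 degrees in the XY plane: (x, y) -> (-y, x).
        normalOut = new DirVec4(-tangentOut.Y, tangentOut.X, tangentOut.Z, tangentOut.W);
    }

    // Per-frame caller: fills the pre-allocated fields, no tuple is created.
    public DirVec4 UpdateNormal(DirVec4 startPt, DirVec4 endPt)
    {
        ComputeDirs(startPt, endPt, ref tangent, ref normal);
        return normal;
    }
}
```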

Thanks

Post by pg_interactive » Wed Dec 26, 2018 1:45 am

Hi again,

I could do the pooling myself, but I'd prefer that you update the tool, so I don't have to re-apply the changes myself each time a new version comes out.

Yes, as you mentioned, creating a new Tuple each time creates garbage once it is discarded. Basically, any time you use the "new" keyword on a complex data type like a class, it's best to cache/pool the instances and reuse them later. The alternative is to use structs instead. If you declared your Tuple as a struct instead of a class (as long as it doesn't reference other complex data types within itself, which I'd guess Tuple won't; I don't have the code at hand right now, so I can't comment on that), it would get allocated on the Stack (as structs and simple data types are when used as locals) instead of on the Heap (as complex data types are), which doesn't trigger the garbage collector at all (the GC doesn't release the Stack, because the Stack is considered in-use memory that a GC never releases).
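
A rough sketch of the struct alternative. `DirPair<T1, T2>` and `DirMath` are hypothetical names for illustration and are not the Creature runtime's own Tuple:

```csharp
using System;

// Hypothetical value-type replacement for a class-based Tuple.
public struct DirPair<T1, T2>
{
    public readonly T1 Item1;
    public readonly T2 Item2;

    public DirPair(T1 item1, T2 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
}

public static class DirMath
{
    // Returns the normalized direction from (sx, sy) to (ex, ey) as a struct pair.
    // Because DirPair is a struct, returning it by value creates no heap object.
    public static DirPair<double, double> Direction(double sx, double sy, double ex, double ey)
    {
        double dx = ex - sx;
        double dy = ey - sy;
        double len = Math.Sqrt(dx * dx + dy * dy);
        if (len > 0) { dx /= len; dy /= len; }
        return new DirPair<double, double>(dx, dy);
    }
}
```

One caveat: a struct still ends up on the heap if it is boxed, for example when it is cast to object or to an interface, so it should be passed around by its concrete type.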

Let me know if you need a simple guide on pooling (but yes, you can just make a simple array to store those Tuple instances, that's fine). If you don't mind, I'll give you feedback later: after you do the updates, I'll profile it again, because I have a 0-garbage policy, so double-checking is a must for me anyway.

EDIT: I just checked the code you posted. It's probably best if you make the changes and include them in the next version; I am not in a hurry to have this today. That is, if you can Deep Profile it yourself. Or, if you want, I can apply your suggestions and pool whatever else needs pooling; I'd be happy to help, but let's not do double work. In that case I should just fork the repo and make the changes that way, assuming I don't mess anything up (I'm a master at breaking repositories xD). I could have a look at it sometime this week or next.

Thanks

Post by chong » Wed Dec 26, 2018 6:49 am

Just made the optimization to no longer allocate tuples when calling computeDirs, you can update and give it a go:

https://github.com/kestrelm/Creature_Unity

Thanks

Post by pg_interactive » Wed Dec 26, 2018 7:31 pm

Hi again,

Good news: we are down from 19.2 kB per frame to 13.8 kB (for the example character I am using)!

So it's not just the Tuples; other objects need pooling too. Please also understand that I'm not using things like FlatBuffers or Point Cache, or perhaps others. It would be good to test all use cases.

How is your experience with Deep Profiling in Unity? Can you have a look at it someday, or should I help with any aspect? Let me know. I could paste the new profiler screenshot for you if that helps. Also, if you do the profiling, to locate the exact line where the allocation happens, you might want to use these; they should suffice for the job:
https://docs.unity3d.com/ScriptReferenc ... ample.html
https://docs.unity3d.com/ScriptReferenc ... ample.html

Let me know if I can help further; I would be willing to spend some extra time on this to make it garbage-free. As I said, I have a 0-garbage policy, and it's a requirement before I can use this tool in production.

EDIT: Updated profiler pic:
Image

Cheers

Post by chong » Wed Dec 26, 2018 9:41 pm

Hello,

Sounds good and thanks for the useful info, will take a look. In the meantime, you should consider FlatBuffers (faster loading) or Point Caching (much faster posing if you are not going to be setting bone positions in game).

If you need lots of background characters ( crowds ), consider also using CreaturePack which is even faster.

Cheers

Post by pg_interactive » Thu Dec 27, 2018 3:43 am

Thanks a lot!

FlatBuffers, Point Caching and CreaturePack: I reviewed those already. They are great features, and I will use them selectively later. Most importantly for this thread, if I do so, I'll add the results of their GC testing (if applicable).
