Getting a crash when trying to destroy Cesium3DTileset Actor

We have this scenario where we spawn Cesium3DTileset at runtime for separate URLs and when we change a level, we destroy the Cesium3DTileset Actor. This was working fine but for some reason it’s not working anymore. We’re using UE 5.3 and Cesium 2.2.0 on Oculus Quest / Android.

So far we’re noticing that when we call DestroyActor, the garbage collector calls IsReadyForFinishDestroy so that it can remove the actor from its queue. In Cesium3DTileset this function is overridden, and it waits for this to be true:
this->_tilesetsBeingDestroyed == 0;
For some reason, when we print the value of _tilesetsBeingDestroyed, it’s always 1. It works correctly on Windows, but on Android the garbage collector keeps waiting for this function to return true for this object; it never does, and the app eventually crashes.


void ACesium3DTileset::BeginLoading()
{
    this->LoadTileset();
}

[Image: Screenshot 2024-01-04 at 12.34.43 AM]

These are the stack traces + screenshots of blueprint nodes which we are calling when we destroy the actor. The temp fix until we hear back from you is to remove this line:
ready &= this->_tilesetsBeingDestroyed == 0;
from IsReadyForFinishDestroy

Can you point us in the right direction?

I looked at your blueprint and noticed that SuspendUpdate is being set to true before calling Destroy Actor.

Does your hang / crash on Android still happen if you don’t do this? (leave SuspendUpdate off)

Yes we have tried removing SuspendUpdate, and it still crashes. @Kevin_Ring @Brian_Langevin have you seen this happening elsewhere per chance? What else can we do?

Sounds like a potential bug in cesium-native, where the event returned from getAsyncDestructionCompleteEvent never fires, but from what you said, only on Android. It’s fine on Windows.

I’m not aware of any existing issues like this, but @Kevin_Ring might know.

Is there a small sample level you could make for us that could help reproduce this on our end?

(apologies for any delays, many members of the team are still coming back from holidays)

Can you tell us in which version this last worked, and in what version it broke? I’m not sure what’s going on, but that will help us narrow it down.

@Brian_Langevin @Kevin_Ring we just noticed it in the new production build, which is using v2.0.0; before that, in v1.31.2, we didn’t notice it.

Can you see when you introduced getAsyncDestructionCompleteEvent ?
We noticed that on Android this function is returning, but we suspect that it sometimes does not return (i.e. it returns about 1 time in 2 on average). It works inside the Unreal Editor; we’re not sure about a Windows packaged build.

We shared above the way we unload the tilesets. We thought it was maybe to do with the way we were unloading the tilesets that’s causing the problem, but we tried another approach where we only destroy the actor and it still does it.

To repro, you need to spawn the tileset at runtime and set the URL correctly, and then you need code that destroys these tileset actors. On Android it always crashes.

To fix this, we initially created a patch where we removed the _tilesetsBeingDestroyed check from the IsReadyForFinishDestroy function, but that was causing us to crash sometimes. Right now we’re not destroying the actors at all, so IsReadyForFinishDestroy is never called and we basically end up with an empty actor in the level, which is not an ideal approach.

Please see if you can repro this in a sample project.

Thanks,
Carl

Can you please try v2.2.0 and see if that helps?

getAsyncDestructionCompleteEvent has been around for well over a year now.

Ok, so here’s how this works at a high level. When you destroy a tileset, a bunch of things need to be cleaned up. But very often, that tileset is loading tiles, so Cesium for Unreal is actively waiting for network requests and it’s doing processing of downloaded tiles in a background thread at the moment we try to delete the tileset. If we just deleted the tileset immediately, we’d crash when those background threads tried to use it.

Instead, we wait for those background operations to complete. Generally this is quick, maybe a second or two at worst. And we don’t block the game loop while it’s happening. However, it can be worse in extreme situations. For example, if you’re loading tiles from a server that is very slow, or over a very slow network, it might take a while for in-flight network requests to complete. A broken server could keep a connection hanging for a minute or more, and during that time Cesium for Unreal can’t finish the tileset destruction. I’m not aware of a time limit for IsReadyForFinishDestroy to return true - certainly I’ve never hit it - but I can imagine that one might exist. It might even be configurable.
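To make the mechanism above concrete, here is a minimal standalone sketch in standard C++. It is not the Cesium for Unreal implementation: MockTilesetActor, beginDestroy, and pollUntilReady are hypothetical names, and a sleeping background task stands in for an in-flight network request. It only illustrates the pattern of counting pending destructions and polling without blocking.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a tileset actor; NOT the real ACesium3DTileset.
class MockTilesetActor {
public:
  // Simulates destroying the actor while work is still in flight: the
  // pending destruction is counted now and completed on a background thread.
  void beginDestroy(std::chrono::milliseconds simulatedLoadTime) {
    ++_tilesetsBeingDestroyed;
    _pending = std::async(std::launch::async, [this, simulatedLoadTime] {
      // Stand-in for waiting on an in-flight request or tile processing.
      std::this_thread::sleep_for(simulatedLoadTime);
      assert(_tilesetsBeingDestroyed.load() > 0);
      --_tilesetsBeingDestroyed; // models the "destruction complete" event
    });
  }

  // Polled by the (simulated) garbage collector; never blocks.
  bool isReadyForFinishDestroy() const {
    return _tilesetsBeingDestroyed.load() == 0;
  }

private:
  std::atomic<int> _tilesetsBeingDestroyed{0};
  std::future<void> _pending; // declared last so it is destroyed first
};

// Polls every 5 ms. Returns the number of polls that came back "not ready"
// before success, or -1 if maxPolls was exhausted (the "stuck" case).
int pollUntilReady(const MockTilesetActor& actor, int maxPolls) {
  for (int i = 0; i < maxPolls; ++i) {
    if (actor.isReadyForFinishDestroy()) {
      return i;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
  }
  return -1;
}
```

In this model, a completion event that never fires leaves the counter at 1 forever, so pollUntilReady can only time out. That matches the symptom reported on Android, although the sketch says nothing about why the real event would fail to fire.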

So if you’re suddenly having this problem, it might not be a change in Cesium for Unreal. It might just be that the server you’re connecting to is struggling more than it used to. It’s worth going back to a version of Cesium for Unreal that was previously working for you to confirm if that is the case.

@Kevin_Ring Yes, it happens on both versions. As you said, the _tilesetsBeingDestroyed counter is used to let the object complete its async tasks, which makes sense. Can you point out what are the AsyncTask it waits for, so I can try to debug by putting logs? It would be great if you could point to the code. Before Destroying the TileSet, I always set the URL empty, is this not going to cancel the task? Are there any logs that can tell us which async tasks are still pending? Our current patch is: instead of destroying the actor immediately, I wait for _tilesetsBeingDestroyed to become 0 and then destroy the actor. On Android, however, I don’t see any tileset actor being destroyed, which means that on Android all tilesets stay stuck forever. In the Unreal Editor it works correctly, but on Android it never recovers. Doesn’t that mean it is not server related?

Before Destroying the TileSet, I always set the URL empty, is this not going to cancel the task?

It will not. In fact, it might start a new request. I don’t recommend that you do this.

Can you point out what are the AsyncTask it waits for, so I can try to debug by putting logs?

Tileset destruction is typically blocked by a Tile load started here:

Or - especially if you’ve just set a new URL - a top-level layer.json / tileset.json load started by one of the TilesetContentManager constructors.

Destruction can also be blocked by someone holding a std::shared_ptr to the TilesetContentManager, but this is not too likely.
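As a small illustration of that last point, the sketch below uses plain std::shared_ptr with a hypothetical MockContentManager (not cesium-native's TilesetContentManager) to show that the destructor cannot run while any other owner still holds a reference. The helper name destroyedWhenOwnerReleases is also made up for this example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in; NOT cesium-native's TilesetContentManager.
struct MockContentManager {
  explicit MockContentManager(bool& destroyedFlag) : destroyed(destroyedFlag) {
    destroyed = false;
  }
  ~MockContentManager() { destroyed = true; }
  bool& destroyed; // set to true when the destructor runs
};

// Returns true if the manager was destroyed as soon as the main owner let
// go of it, false if an extra reference kept it alive past that point.
bool destroyedWhenOwnerReleases(bool someoneElseHoldsReference) {
  bool destroyed = false;
  auto owner = std::make_shared<MockContentManager>(destroyed);

  std::shared_ptr<MockContentManager> extra;
  if (someoneElseHoldsReference) {
    extra = owner; // e.g. a callback or cache that captured the pointer
  }
  assert(owner.use_count() == (someoneElseHoldsReference ? 2 : 1));

  owner.reset(); // the "tileset" releases its reference
  const bool destroyedAtRelease = destroyed;

  extra.reset(); // only now can the destructor run in the held case
  assert(destroyed);
  return destroyedAtRelease;
}
```

Calling destroyedWhenOwnerReleases(false) yields true, while destroyedWhenOwnerReleases(true) yields false: the object outlives its main owner for as long as the extra reference exists.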

@Kevin_Ring thanks.

We are getting some crashes in Cesium3DTileset->OnConstruction->ResolveGeoreference. In our setup we always have a GeoReference, but it is inside the Sub Level, not inside the Persistent Level. I also have a BaseCesiumObject, which acts as a manager that spawns more Cesium3DTileset objects and also manages destroying them. BaseCesiumObject also knows about the GeoReference, so I wanted to give all the new Cesium3DTileset objects the same GeoReference before any ResolveGeoreference is called, because resolution is not needed when we already know it. So I’ve made some changes to Cesium3DTileset: I have added the ExposeOnSpawn Blueprint tag to Georeference so that I can set it when spawning the object. This used to work in the old version, but it does not work any more. Maybe GeoReference was a hard Reference in older version, now it is soft? I noticed the only place where the call to ResolveGeoreference is coming from is _updateTilesetToUnrealRelativeWorldTransform. This is after removing all the other calls (OnConstruction, BeginPlay, Tick). Is this going to cause any issue, removing those calls ? What is the purpose of _updateTilesetToUnrealRelativeWorldTransform? I ask because I want to delay it, and once i will set the Georeference from manager then i will allow it to update. Thank you.

@Kevin_Ring did you have a chance to take a look? Just making sure it didn’t slip through the cracks. Much appreciated.

GeoReference was a hard Reference in older version, now it is soft

Yep. This was necessary to allow an object in a sub-level to reference a CesiumGeoreference in the persistent level. It’s not clear why this would cause problems for you, though.

Is this going to cause any issue, removing those calls ?

I mean, I don’t know, but probably yes. In general, my going-in position is that any change to the code, even changes I make myself, will break things. Until I test it and demonstrate otherwise. Not a very good answer there, but I don’t have a better one.

because I want to delay it, and once i will set the Georeference from manager then i will allow it to update.

Hmm I’m not quite sure why this needs to be so complicated. I suggest you create the tileset, set the CesiumGeoreference property, then set the URL or ion Asset ID properties last. The tileset won’t start loading until it has a valid URL or asset ID.
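The ordering suggested above can be modeled with a tiny standalone sketch. Everything here is hypothetical (MockTileset, MockGeoreference, the example URL); it is not the Cesium for Unreal API. It only encodes the stated rule that loading does not begin until a non-empty URL is set, so a georeference assigned first is already in place when loading starts.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; NOT the real ACesiumGeoreference / ACesium3DTileset.
struct MockGeoreference {
  std::string name;
};

class MockTileset {
public:
  void setGeoreference(const MockGeoreference* georeference) {
    _georeference = georeference;
  }

  // Assumption taken from the reply above: nothing loads without a URL.
  void setUrl(const std::string& url) {
    _url = url;
    if (!_url.empty()) {
      startLoading();
    }
  }

  bool isLoading() const { return _loading; }
  bool loadedWithExplicitGeoreference() const { return _usedExplicit; }

private:
  void startLoading() {
    _loading = true;
    // Record whether a georeference had been assigned before loading began.
    _usedExplicit = (_georeference != nullptr);
  }

  const MockGeoreference* _georeference = nullptr;
  std::string _url;
  bool _loading = false;
  bool _usedExplicit = false;
};

// Create the tileset, assign the georeference, then set the URL last.
bool spawnInRecommendedOrder() {
  MockGeoreference shared{"ManagerGeoreference"};
  MockTileset tileset;
  assert(!tileset.isLoading()); // no URL yet, so nothing has started
  tileset.setGeoreference(&shared);
  tileset.setUrl("https://example.com/tileset.json"); // placeholder URL
  return tileset.isLoading() && tileset.loadedWithExplicitGeoreference();
}
```

In this model, setting the URL before the georeference would start loading with no explicit georeference assigned, which is the situation the suggested ordering avoids.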

But it’s also not clear to me why the default georeference resolution logic isn’t working for you. Most applications have exactly one, and there’s no drama in finding it. Are you using multiple georeferences for some reason?