Metadata: Nodata / nullable?

In EXT_structural_metadata, properties can have a noData value (see 3d-tiles/specification/Metadata at main · CesiumGS/3d-tiles · GitHub).

A drawback of this method is that we have to define and use a noData value when data is missing, like -1 or -999999 for int values. When using the uint8 type (range 0-255), for example, it’s tricky to define a noData value, because chances are that every possible value in the range is already in use.
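To illustrate the problem, here is a minimal sketch of how a reader might apply a noData sentinel. The function name apply_no_data and the sample values are made up for this example and are not part of any 3D Tiles library.

```python
from typing import List, Optional


def apply_no_data(raw_values: List[int], no_data: int) -> List[Optional[int]]:
    """Replace every occurrence of the noData sentinel with None."""
    return [None if v == no_data else v for v in raw_values]


# For a wide integer type there is plenty of room for a sentinel like -999999.
heights = apply_no_data([12, -999999, 30], no_data=-999999)

# For uint8, all 256 bit patterns may be legitimate values. Choosing 255 as
# the sentinel silently turns a real measurement of 255 into "missing".
levels = apply_no_data([0, 255, 128], no_data=255)
```

The second call shows the drawback: the reader cannot tell whether 255 was a real value or the marker for missing data.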

A common solution for this is to make a property nullable, so we can use nulls when data is missing.

Is there a way to make a property nullable?

There are some details to go into here (caveats and corner cases). But one reason why it’s not so easy to introduce some boolean nullable flag is that there is no way to represent a real null (or undefined) in the binary representation of metadata.

Actually, your recent questions and some other issues eventually led to the following note on my TODO list:

TODO Check specs if we say something about “undefined” strings in JSON-based metadata entities
(In binary it is not possible: see Fix metadata migration of null strings by javagl · Pull Request #96 · CesiumGS/3d-tiles-tools · GitHub)

The STRING case is special in some way. But the broader question is indeed about the case where someone says noData: null. This should probably be disallowed or addressed in the spec (at least as an “Implementation Note” - and for now, I’m still ignoring the quirks of null-vs-undefined in JavaScript…).

(It might be worth tracking that in a dedicated issue…)

OK, but why is it not possible to store a null in binary? In something like good old DBF (well, the more recent versions) it’s possible.

I’m not familiar with DBF/dBASE in detail, and don’t know how null is represented there on the lowest level.

For the metadata, the binary representation is essentially the same as the representation of binary data in glTF, with the bufferView/buffer concepts.

(The schema roughly serves the purpose of the accessor. Using the actual glTF accessor concept was considered, but it was too constrained in terms of the data types - some details omitted here)

So at the lowest level, there may be a buffer that just consists of a byte[16] array under the hood, and that is viewed as something that resembles a typed short[8] array. In this typed array, each array element has a (numeric!) value, and there is no representation of null.
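A small sketch of that point, using only the Python standard library: 16 raw bytes are interpreted as eight little-endian 16-bit signed integers. The concrete numbers are arbitrary sample data.

```python
import struct

# 16 raw bytes, corresponding to the byte[16] array mentioned above.
buffer = bytearray(16)

# Write eight little-endian signed 16-bit values into the buffer.
struct.pack_into("<8h", buffer, 0, 10, -3, 0, 7, 32767, -32768, 1, 2)

# Reading the same bytes back as a typed "short[8]" view.
values = list(struct.unpack_from("<8h", buffer, 0))

# Every possible 2-byte pattern decodes to some integer, so no bit pattern
# is left over that could mean "null".
all_decoded = {
    struct.unpack("<h", bytes([lo, hi]))[0]
    for lo in range(256)
    for hi in range(256)
}
```

All 65536 bit patterns map to distinct numbers, which is why a null would have to take the place of one of the regular values (that is exactly what noData does).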

Of course, one could think about dozens of possible representations of null. The “numeric” types are one thing. For STRING typed data, it may be even more tricky: The string values are “sliced out” of the underlying byte array by using two consecutive stringOffsets values. So it is possible to define empty strings (by offset[i+1]===offset[i]), but null is not easily possible there either. All this also has to work together with array-typed properties, which again have a very specific representation on the binary level.
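The string slicing can be sketched as follows. This assumes 32-bit unsigned little-endian offsets and UTF-8 text. The function name decode_strings and the sample data are made up for illustration.

```python
import struct
from typing import List


def decode_strings(values: bytes, string_offsets: bytes, count: int) -> List[str]:
    """Slice `count` strings out of `values` using count + 1 offsets."""
    offsets = struct.unpack_from(f"<{count + 1}I", string_offsets, 0)
    return [
        values[offsets[i]:offsets[i + 1]].decode("utf-8")
        for i in range(count)
    ]


# The bytes of "ab", "" and "cde" stored back to back.
values = "abcde".encode("utf-8")

# offsets[1] == offsets[2] yields the empty string for the middle entry.
string_offsets = struct.pack("<4I", 0, 2, 2, 5)

strings = decode_strings(values, string_offsets, 3)
```

Equal consecutive offsets can only ever produce an empty string. There is no pair of offsets that would mean "no string at all".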

Given these different representations, there doesn’t seem to be an obvious representation of null that is applicable to all of them. One “generic” (although cumbersome) approach would probably be some sort of “bitstream” that just contains something like nullEntries = bitstream(00000100101100001) to reflect which values are null. But this is essentially just a BOOLEAN in disguise.
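A rough sketch of such a bitstream, packing one isNull flag per value into bytes. The helper names are hypothetical, and the bit order (least significant bit first within each byte) is an assumption made for this example.

```python
from typing import List


def pack_null_mask(is_null: List[bool]) -> bytes:
    """Pack one flag per value into a byte string, 8 flags per byte."""
    out = bytearray((len(is_null) + 7) // 8)
    for i, flag in enumerate(is_null):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_null_mask(mask: bytes, count: int) -> List[bool]:
    """Read `count` flags back out of a packed mask."""
    return [bool((mask[i // 8] >> (i % 8)) & 1) for i in range(count)]


# The 17 flags from the example bitstream above.
flags = [c == "1" for c in "00000100101100001"]
mask = pack_null_mask(flags)
restored = unpack_null_mask(mask, len(flags))
```

The 17 flags fit into 3 bytes, but they are still a second property that has to be kept in sync with the actual values.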

So for now, when there is the (domain) requirement of an explicit representation of null, it will eventually be necessary to model that based on the existing types.

But… do you have an idea how null could sensibly be represented in binary metadata?

Yeah, I was also thinking that storing a list of boolean flags per property (isNull) would be possible and simple. Or just a special 00 value.

For the array types it’s more complex (also for the current solution with offsets): how to store an attribute that is, for example, [100, null, 101]?

The “bitstream”/BOOLEAN approach was just a first thought, but as I said: There are different representations for the data, depending on their type and structure, and it’s easy to overlook some caveat here.

For arrays, one could probably go with a boolean array as well, and store the isNull: [false, true, false] (which boils down to just 3 bits in binary form).
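For the [100, null, 101] case, a reader could combine the two arrays roughly like this. The name merge_with_mask is hypothetical, and the placeholder 0 in the null slot is an arbitrary choice, since the binary array still needs some number there.

```python
from typing import List, Optional


def merge_with_mask(values: List[int], is_null: List[bool]) -> List[Optional[int]]:
    """Combine a numeric array with a parallel isNull array."""
    if len(values) != len(is_null):
        raise ValueError("values and is_null must have the same length")
    return [None if flag else v for v, flag in zip(values, is_null)]


# The middle slot holds a placeholder that is ignored because of the flag.
merged = merge_with_mask([100, 0, 101], [False, True, False])
```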

Depending on how large the arrays are, and how many elements are null, there could be other options. For example, one could think about modeling “large, sparse” arrays with something like

arrayLength: 1000;
nonNullIndices: [4, 26, 676, 985];
nonNullValues: [12, 34, 56, 78];

to make that sparsity more explicit, and not have to store that as
[null, null, null, null, 12, null, ...(21 nulls in total), 34, null, ...(649 nulls in total), 56, ... ]
with the additional overhead for the isNull-bitstream.
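A sketch of how the sparse form could be expanded again, using the numbers from above. The function expand_sparse is hypothetical and only mirrors the three fields of the brainstormed structure.

```python
from typing import List, Optional


def expand_sparse(
    array_length: int,
    non_null_indices: List[int],
    non_null_values: List[int],
) -> List[Optional[int]]:
    """Rebuild the dense array, with None at every index that is not listed."""
    if len(non_null_indices) != len(non_null_values):
        raise ValueError("indices and values must have the same length")
    dense: List[Optional[int]] = [None] * array_length
    for index, value in zip(non_null_indices, non_null_values):
        dense[index] = value
    return dense


dense = expand_sparse(1000, [4, 26, 676, 985], [12, 34, 56, 78])
null_count = sum(1 for v in dense if v is None)
```

Here only 8 numbers plus the length are stored, instead of 1000 values and a 1000-bit mask.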

Conversely, one could store something like nullIndices: [4, 65, 123, 566 ] instead of a bitstream when the arrays are large but have “few” null elements.

(NOTE: This is just some wild brainstorming right now. The main point is that there currently is no nullable-concept for the metadata (mainly due to the binary representation), and there are different possibilities for how this information could be modeled, depending on the exact use-case)