The “bitstream”/boolean approach was just a first thought. But as I said: there are different representations for the data, depending on their type and structure, and it’s easy to overlook some caveat here.
For arrays, one could probably go with a boolean array as well, and store the isNull information as [false, true, false] (which boils down to just 3 bits in binary form).
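To illustrate the "3 bits in binary form" point, here is a minimal Python sketch (the helper names pack_bits/unpack_bits are hypothetical, not part of any existing API) that packs such an isNull array into a compact bitstream and recovers it again:

```python
# Hypothetical sketch: packing an isNull boolean array into a bitstream.
def pack_bits(flags):
    """Pack a list of booleans into bytes, 8 flags per byte (LSB first)."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def unpack_bits(packed, count):
    """Recover the first `count` boolean flags from the packed bytes."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]

is_null = [False, True, False]
packed = pack_bits(is_null)  # the 3 flags fit into a single byte
assert unpack_bits(packed, len(is_null)) == is_null
```

The bit order (LSB-first here) is an arbitrary choice for the sketch; the actual binary format would have to pin this down.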
Depending on how large the arrays are and how many elements are null, there could be other options. For example, one could think about modeling “large, sparse” arrays with something like
arrayLength: 1000;
nonNullIndices: [4, 26, 676, 985];
nonNullValues: [12, 34, 56, 78];
to make that sparsity more explicit, and not have to store that as
[null, null, null, null, 12, null, ...(22 times), 34, null, ...(650 times), 56, ... ]
with the additional overhead of the isNull bitstream.
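The conversion between the dense form and that sparse (arrayLength, nonNullIndices, nonNullValues) triple could look roughly like this Python sketch (to_sparse/from_sparse are made-up helper names, using None for null):

```python
def to_sparse(values):
    """Encode a list with many nulls (None) as a sparse triple."""
    indices = [i for i, v in enumerate(values) if v is not None]
    return {
        "arrayLength": len(values),
        "nonNullIndices": indices,
        "nonNullValues": [values[i] for i in indices],
    }

def from_sparse(sparse):
    """Expand the sparse triple back into the dense list with explicit nulls."""
    dense = [None] * sparse["arrayLength"]
    for i, v in zip(sparse["nonNullIndices"], sparse["nonNullValues"]):
        dense[i] = v
    return dense

# The example from above: 1000 elements, only 4 of them non-null.
dense = [None] * 1000
for i, v in zip([4, 26, 676, 985], [12, 34, 56, 78]):
    dense[i] = v

sparse = to_sparse(dense)
assert sparse["nonNullIndices"] == [4, 26, 676, 985]
assert from_sparse(sparse) == dense
```

For the 1000-element example this stores 4 indices and 4 values instead of 1000 slots plus a 1000-bit isNull stream.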
Conversely, one could store something like nullIndices: [4, 65, 123, 566] instead of a bitstream when the arrays are large but have “few” null elements.
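Which of the three encodings (isNull bitstream, nullIndices, nonNullIndices) is smallest could even be decided per array from the null count. A rough Python sketch of that decision, under the assumed costs of 1 bit per bitstream entry and 32 bits per stored index (both assumptions, not part of any existing format):

```python
def choose_null_encoding(values, index_bits=32):
    """Pick the roughly cheapest null-encoding for a list using None as null.

    Assumed costs (hypothetical): a bitstream costs 1 bit per element;
    an index list costs `index_bits` bits per stored index.
    """
    n = len(values)
    null_count = sum(1 for v in values if v is None)
    costs = {
        "isNullBitstream": n,
        "nullIndices": null_count * index_bits,
        "nonNullIndices": (n - null_count) * index_bits,
    }
    return min(costs, key=costs.get)

# Large array, few nulls: listing the null indices wins.
mostly_filled = [1] * 996 + [None] * 4
print(choose_null_encoding(mostly_filled))  # nullIndices

# Large, sparse array: listing the non-null indices wins.
mostly_null = [None] * 996 + [1] * 4
print(choose_null_encoding(mostly_null))  # nonNullIndices
```

With roughly balanced null/non-null counts, both index lists are expensive and the plain bitstream wins, which matches the intuition that the index-based forms only pay off at the extremes.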
(NOTE: This is just some wild brainstorming right now. The main point is that there currently is no nullable concept for the metadata (mainly due to the binary representation), and there are different possibilities for how this information could be modeled, depending on the exact use case.)