Non-English Label Support

Hi Cesium-Dev Team,

First, thanks for contributing this fantastic library.

I’m a Thai engineer and am just starting to adopt Cesium for 3D visualization.

I followed the Sandcastle Label example to show Thai text and found that some Thai vowels and tone marks in the upper (superscript) position were invisible in the scene. (For example, “ตึก” is displayed as “ตก”.)

I checked and found that Glyph.billboard for the missing character is undefined.

Any ideas or suggestions for displaying the label correctly?

Thanks in advance.

This should work. I’m not super familiar with language encodings, but if you’re using Sandcastle, could this be caused by the page encoding not matching the language you’re using?

Can you post your code to recreate the issue? Thanks.

Hi Matthew,

Thanks for your reply.

I tried changing the page encoding from UTF-8 to TIS-620 on my localhost, but the result is still the same.
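This result is what you would expect: the page encoding only controls how bytes are turned into a string. Once decoded, “ตึก” is the same three code points (U+0E15, U+0E36, U+0E01) whether the bytes arrived as UTF-8 or TIS-620. A minimal sketch, assuming a runtime whose TextDecoder supports the WHATWG label 'tis-620' (current browsers and Node.js builds with full ICU do); the variable names are illustrative:

```javascript
// The word 'ตึก' as raw bytes in two different encodings.
const utf8Bytes = new Uint8Array([0xe0, 0xb8, 0x95, 0xe0, 0xb8, 0xb6, 0xe0, 0xb8, 0x81]);
const tis620Bytes = new Uint8Array([0xb5, 0xd6, 0xa1]);

// Decode both byte sequences into JavaScript (UTF-16) strings.
const fromUtf8 = new TextDecoder('utf-8').decode(utf8Bytes);
const fromTis620 = new TextDecoder('tis-620').decode(tis620Bytes);

// After decoding, both strings hold the same three code points.
console.log(fromUtf8 === fromTis620); // true
console.log(fromUtf8.length); // 3
```

So switching the page encoding cannot change what the label renderer receives.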

I’ve put a simple reproduction on jsfiddle here:

http://jsfiddle.net/mjsvpoyg/

Thanomsak.

Hi Cesium Dev Team,

I’m facing the same problem. When I use label text with Hindi characters, extra characters are shown along with the Hindi text. Any solution would be great.

Thanks,
Gowtham.

So I looked into this some and I think the main problem is the way Unicode works in JavaScript. Here’s a fairly extensive article on the subject: https://mathiasbynens.be/notes/javascript-unicode

The problem is specifically the way JavaScript deals with combining characters (several code points that together form one visible glyph). For example, alert('ตึก'.length) reports 3, even though visually there are clearly only 2 character positions: the vowel mark U+0E36 is drawn on top of the consonant U+0E15. To scale to a large number of strings, LabelCollection renders each character as a separate image, so when we iterate the string, we create two separate images for what is visually one glyph (because JavaScript itself treats the base character and its mark as two characters).
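To see the split concretely, the sketch below lists each code point of “ตึก” and flags which ones are combining marks, using the Unicode property escape \p{M} (ES2018, so it assumes a reasonably modern runtime). The helper name describeCodePoints is illustrative only, not part of Cesium.

```javascript
// 'ตึก' = TO TAO (U+0E15) + SARA UE (U+0E36, an above-vowel mark) + KO KAI (U+0E01)
const word = '\u0E15\u0E36\u0E01';

// List each code point with its hex value and whether it is a combining mark
// (Unicode general category M), i.e. drawn over or under the previous character.
function describeCodePoints(str) {
  return Array.from(str).map((ch) => ({
    char: ch,
    hex: ch.codePointAt(0).toString(16),
    combining: /\p{M}/u.test(ch),
  }));
}

console.log(word.length); // 3, although only 2 character positions are visible
console.log(describeCodePoints(word));
```

A renderer that turns each entry into its own image ends up with a standalone mark that has no base character to sit on.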

I’m not sure there’s an easy solution for this problem with our current label rendering technique. The good news is that it’s incredibly easy to work around by using billboards instead (which is how labels are implemented anyway). Below is an example of doing that in Cesium. Basically, rename LabelCollection to BillboardCollection, and instead of assigning a text property, use writeTextToCanvas to create an image from the whole text and assign it to the image property.

The only downside to this approach is that it uses more texture memory, and if your text changes frequently (dynamic textures), you could run out. Give it a try and let me know how it works out. I also opened a GitHub issue so that we look at this in the future: https://github.com/AnalyticalGraphicsInc/cesium/issues/2521

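var viewer = new Cesium.Viewer('cesiumContainer');
var scene = viewer.scene;
var camera = scene.camera;

// Look at central Bangkok from about 50 km up. lookAt(target, offset) takes the
// offset in meters in the east-north-up frame at the target; (0, 8400, 50000)
// approximately reproduces the original eye position (about 8.4 km north).
camera.lookAt(Cesium.Cartesian3.fromDegrees(100.5382368, 13.7242002, 0),
              new Cesium.Cartesian3(0.0, 8400.0, 50000.0));

// A BillboardCollection instead of a LabelCollection.
var labels = scene.primitives.add(new Cesium.BillboardCollection());

// Render the whole string to one canvas so the browser's text shaping places
// the above-vowel mark correctly, then use that canvas as the billboard image.
labels.add({
    position : Cesium.Cartesian3.fromDegrees(100.545624, 13.743179),
    image : Cesium.writeTextToCanvas('ตึก', { font : '24px sans-serif' })
});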

Let me know how it works out.

Thanks,

Matt

You can replace the text with this and get the same result:

text: String.fromCharCode(0xe15,0xe36,0xe01)

The codes come from:

console.log('ตึก'.charCodeAt(0).toString(16)); // e15
console.log('ตึก'.charCodeAt(1).toString(16)); // e36
console.log('ตึก'.charCodeAt(2).toString(16)); // e01
console.log(String.fromCharCode(0xe15, 0xe36, 0xe01)); // ตึก

Doing this, you can see the individual parts of the combined glyph:

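console.log('ตึก'.charAt(0)); // ต (U+0E15, base consonant)
console.log('ตึก'.charAt(1)); // ึ (U+0E36, above-vowel mark on its own)
console.log('ตึก'.charAt(2)); // ก (U+0E01, base consonant)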

The first glyph is a combination of 0xe15 (the bottom of the first glyph) and 0xe36 (the top of the first glyph). For some reason, the label isn’t showing the top part of the first glyph. Though I’m not sure how one is supposed to know that the 2 glyphs are supposed to be printed at the same character position.

“Though I’m not sure how one is supposed to know that the 2 glyphs are supposed to be printed at the same character position.”

Exactly, that’s the root of the problem. It appears that they are trying to fix this for ES6, and we may be able to use the code from here to fix it:
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/
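For the question of how to know which code points share a character position, one option in modern JavaScript (not available when this thread was written, and not what Cesium does) is Intl.Segmenter with granularity 'grapheme', which groups a base character with the marks attached to it. This is a swapped-in technique; the function name splitGraphemes is hypothetical, and it assumes a runtime with Intl.Segmenter (current browsers, Node.js 16+).

```javascript
// Hypothetical helper: split a string into user-perceived characters
// (extended grapheme clusters) instead of UTF-16 code units.
function splitGraphemes(str, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

const thai = '\u0E15\u0E36\u0E01'; // 'ตึก'
console.log(thai.split(''));             // 3 pieces; the mark U+0E36 stands alone
console.log(splitGraphemes(thai, 'th')); // 2 pieces: 'ตึ' and 'ก'
```

A per-glyph renderer could draw one image per cluster so the mark stays with its base. Clusters are not a complete answer for every script, though: whether Devanagari conjuncts form one cluster depends on the Unicode version the runtime implements, so rendering the whole string at once (as writeTextToCanvas does) remains the safer route.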

I was reading through some of the links. There are some wild Unicode characters out there built from combining characters! This one is 75 characters long (according to .length), yet it only takes up 6 character slots (at least horizontally; vertically it takes about 5):

console.log("Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞".length);
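The gap between .length and the visible slots can be measured directly. The sketch below counts a string three ways; it assumes Intl.Segmenter support, and the short Zalgo-style sample and the measure name are illustrative. For combining marks, counting code points is no better than .length; only grapheme segmentation matches what is on screen.

```javascript
// Compare three ways of counting "characters" in a string.
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return {
    codeUnits: str.length,                                // UTF-16 code units
    codePoints: Array.from(str).length,                   // Unicode code points
    graphemes: Array.from(segmenter.segment(str)).length, // user-perceived characters
  };
}

// 'Z' and 'A', each followed by stacked combining marks.
const sample = 'Z\u0301\u0302\u0323A\u0303\u0330';
console.log(measure(sample)); // { codeUnits: 7, codePoints: 7, graphemes: 2 }
```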

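They also talk about the new methods codePointAt() and String.fromCodePoint(). Whereas charCodeAt returns a single 16-bit UTF-16 code unit, codePointAt returns the full Unicode code point, combining a surrogate pair into one value when the character lies outside the Basic Multilingual Plane.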
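A short sketch of the difference: for an astral character (here U+1D306, the example used in the linked article) the two methods disagree, while for “ตึก” they agree because every Thai character is in the Basic Multilingual Plane. In other words, the ES6 methods fix surrogate pairs but do not by themselves attach a combining mark to its base character.

```javascript
// U+1D306 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const astral = '\u{1D306}';
console.log(astral.length);                      // 2 code units
console.log(astral.charCodeAt(0).toString(16));  // 'd834' (high surrogate only)
console.log(astral.codePointAt(0).toString(16)); // '1d306' (full code point)
console.log(String.fromCodePoint(0x1d306) === astral); // true

// All Thai characters are in the BMP, so both methods return the same values.
const thai = '\u0E15\u0E36\u0E01'; // 'ตึก'
const units = Array.from({ length: thai.length }, (_, i) => thai.charCodeAt(i));
const points = Array.from(thai, (ch) => ch.codePointAt(0));
console.log(units, points); // both [3605, 3638, 3585]
```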