Non-English Label Support

Thanomsak_Ajjanapany · December 24, 2014, 6:40am

Hi Cesium-Dev Team,

First, thanks for contributing this fantastic library.

I’m a Thai engineer and am starting to adopt Cesium for 3D visualization.

I followed the Sandcastle Label example to display Thai text and found that some Thai vowels and tone marks in the upper (superscript) position were invisible in the scene (e.g. “ตึก” shows as “ตก”).

I checked and found that Glyph.billboard for the missing character is undefined.

Any ideas or suggestions for displaying the label correctly?

Thanks in advance.

Matt_Amato · December 29, 2014, 4:48pm

This should work. I’m not super familiar with language encoding; but if you’re using Sandcastle, could this possibly be caused by the page encoding itself not matching the language you’re using?

Can you post your code to recreate the issue? Thanks.

Thanomsak_Ajjanapany · January 7, 2015, 7:27am

Hi Matthew,

Thanks for your reply.

I tried changing the page encoding from UTF-8 to TIS-620 on my localhost, but the result is still the same.

I’ve put simple code to reproduce the issue on jsfiddle here:

http://jsfiddle.net/mjsvpoyg/

Thanomsak.

Venkat_Gowtham_Kumar · February 27, 2015, 9:51am

Hi Cesium Dev Team,

I am facing the same problem as mentioned above. When I use label text with Hindi characters, extra characters are shown along with the Hindi characters. If you could provide a solution, that would be great.

Thanks,
Gowtham.

Matt_Amato · February 27, 2015, 2:40pm

So I looked into this some and I think the main problem is the way Unicode works in JavaScript. Here’s a fairly extensive article on the subject: https://mathiasbynens.be/notes/javascript-unicode

The problem is specifically the way JavaScript deals with combining characters (a base character plus one or more marks that together form one visible glyph). For example, alert('ตึก'.length) reports 3, even though visually there are clearly 2 characters. In order to support a large number of strings, the LabelCollection renders each glyph as a separate image, so when we iterate the string, we actually create two separate characters for one glyph (because JavaScript itself treats a base character and its combining mark as two characters). Surrogate pairs would be split apart by the same per-character iteration.

I’m not sure there’s an easy solution for this problem using our current label rendering techniques. The good news is that it’s incredibly easy to work around by just using Billboards instead (which is how labels are implemented anyway). Below is an example of doing that in Cesium. Basically, rename LabelCollection to BillboardCollection and, instead of assigning a text property, use writeTextToCanvas to create an image from the text and assign it to the image property.

The only downside to this approach is that it will use more texture memory, and if you have dynamic textures, you could run out. Give it a try and let me know how it works out. I also opened a GitHub issue so that we can look at this in the future: https://github.com/AnalyticalGraphicsInc/cesium/issues/2521

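// Create the viewer and grab the scene and camera.
var viewer = new Cesium.Viewer('cesiumContainer');
var scene = viewer.scene;
var camera = viewer.scene.camera;

// Point the camera at the area around the label.
camera.lookAt(Cesium.Cartesian3.fromDegrees(100.5382368, 13.8, 50000),
    Cesium.Cartesian3.fromDegrees(100.5382368, 13.7242002, 0), Cesium.Cartesian3.UNIT_Z);

// Use a BillboardCollection in place of a LabelCollection.
var labels = scene.primitives.add(new Cesium.BillboardCollection());
labels.add({
    position : Cesium.Cartesian3.fromDegrees(100.545624, 13.743179),
    // Draw the whole string to a canvas so the browser lays out the combining marks.
    image : Cesium.writeTextToCanvas('ตึก', { font: '24px sans-serif' })
});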

Let me know how it works out.

Thanks,

Matt

Hyper_Sonic · February 27, 2015, 4:28pm

You can replace the text with this and get the same result:

text: String.fromCharCode(0xe15, 0xe36, 0xe01)

I got the codes from:

console.log('ตึก'.charCodeAt(0).toString(16));
console.log('ตึก'.charCodeAt(1).toString(16));
console.log('ตึก'.charCodeAt(2).toString(16));
console.log(String.fromCharCode(0xe15, 0xe36, 0xe01));

Doing this, you can see the individual components of the combined glyph:

console.log('ตึก'.charAt(0));
console.log('ตึก'.charAt(1));
console.log('ตึก'.charAt(2));

The first glyph is a combination of 0xe15 (the bottom of the first glyph) and 0xe36 (the top of the first glyph). For some reason the label isn’t showing the top part of the first glyph. Though I’m not sure how one is supposed to know that the 2 glyphs are supposed to be printed at the same character position.
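
One possible answer to that question, as a rough sketch rather than anything Cesium itself does: treat every character in the Unicode "Mark" category as belonging to the base character before it. The helper name splitIntoClusters is made up for illustration, and the regex relies on Unicode property escapes (\p{M}), which I'm assuming the JavaScript engine supports (ES2018 or later). This is not full grapheme segmentation, so some scripts will still be split in the wrong places.

```javascript
// Hypothetical helper (not part of Cesium): group each base character with
// the combining marks that follow it.
function splitIntoClusters(text) {
    // \P{M} matches one code point that is not a mark; \p{M}* matches any
    // marks right after it. The second alternative keeps stray leading marks
    // as their own cluster instead of dropping them.
    return text.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

var thai = '\u0E15\u0E36\u0E01'; // the same three code units as in the posts above
var clusters = splitIntoClusters(thai);

console.log(thai.length);     // 3 code units
console.log(clusters.length); // 2 clusters
console.log(clusters);
```

Each cluster could then be drawn as one unit, so the mark 0xe36 stays attached to 0xe15 instead of becoming a glyph of its own.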

Matt_Amato · February 27, 2015, 4:36pm

Though I’m not sure how one is supposed to know that the 2 glyphs are supposed to be printed at the same character position.

Exactly, that’s the root of the problem. It appears that they are trying to fix this for ES6, and we may be able to use the code from here to fix it:
http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/

Hyper_Sonic · February 27, 2015, 5:37pm

I was reading through some of the links. There are some wild Unicode characters out there from character combining! This one is 75 characters long, yet it only takes 6 character slots (well, horizontally; vertically it takes like 5).

console.log(“Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞”.length);

They also talk about the new methods codePointAt() and fromCodePoint(). charCodeAt returns a single 16-bit UTF-16 code unit, while codePointAt returns the full code point, even when that code point is stored as a surrogate pair of two 16-bit units.
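
A small sketch of the difference, using only the standard string methods. The emoji (U+1F600) is my own example of a character outside the 16-bit range; the Thai text in this thread never needs a surrogate pair, which is why charCodeAt and codePointAt agree on it.

```javascript
// U+1F600 lies outside the 16-bit range, so it is stored as two code units.
var emoji = String.fromCodePoint(0x1F600);

console.log(emoji.length);                      // 2
console.log(emoji.charCodeAt(0).toString(16));  // d83d (first half of the surrogate pair)
console.log(emoji.charCodeAt(1).toString(16));  // de00 (second half)
console.log(emoji.codePointAt(0).toString(16)); // 1f600 (the whole code point)

// The Thai string from this thread fits in 16 bits per character, so both
// methods report the same value for the vowel mark.
var thai = '\u0E15\u0E36\u0E01';
console.log(thai.charCodeAt(1).toString(16));   // e36
console.log(thai.codePointAt(1).toString(16));  // e36
```

So codePointAt helps with surrogate pairs, but the missing Thai vowel is a combining mark, and neither method says which base character it belongs to.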
