Why is CesiumJS rewriting my tileset url?

Hi folks,

We are serving a tileset whose URL is this:

    'https://myserver/mytileset.json?somename=some_value_with_a_comma,and_an_encoded_comma%2C'

And we’re trying to use this tileset in a CesiumJS client program, using a Cesium3DTileset.
So we give the desired url string to the constructor:

  tileset = new Cesium.Cesium3DTileset({ 
    url : 'https://myserver/mytileset.json?somename=some_value_with_a_comma,and_an_encoded_comma%2C', 
  });

But that’s not the request that is sent over the network. Instead, what is sent is:

  'https://myserver/mytileset.json?somename=some_value_with_a_comma%2Cand_an_encoded_comma%2C'

This has thrown us into complete turmoil :slight_smile:

Questions:

(1) Can we express the desired URL at all?
It seems to be impossible, using a Cesium3DTileset,
and so we are actually changing/redesigning our tileset server to work around this.
Specifically, it seems that the actual url can’t contain any ‘,’ characters,
since those just simply can’t be expressed at all. (Is that right?)
The amount of design work and discussions we are having right now
in order to figure out what contortions to use to avoid the ‘,’
(and maybe other characters? we don’t even know) is mind-boggling.

(2) Why on earth is CesiumJS doing anything to the ‘url’ at all?
Is there any reason it shouldn’t be simply sending the url string
I give it, as the actual request URL string?

(3) What other restrictions / transformations is it doing?
We know it percent-encodes ‘,’ (in only some positions in the url? not sure),
and ‘%’ apparently gets left alone in the above example;
but in other cases, ‘%’ seems to make it silently fail
(no requests are sent at all), e.g. each of the following examples:
    url = 'https://myserver/mytileset%.json'
    url = 'https://myserver/mytileset.json%'
    url = 'https://myserver/mytileset.json?name=value_with_a_percent_%'
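One possible cause, offered as an assumption rather than something traced through the CesiumJS source: a ‘%’ that is not followed by two hex digits is not a valid percent-escape, and JavaScript's standard decodeURIComponent throws a URIError on such input. If any step of the URL handling decodes the string, each of the three examples above would throw before a request goes out. A minimal sketch (tryDecode is a hypothetical helper, not a Cesium function):

```javascript
// Hypothetical helper: reports whether the standard decoder accepts a string.
function tryDecode(text) {
  try {
    return { ok: true, value: decodeURIComponent(text) };
  } catch (error) {
    // decodeURIComponent throws URIError when '%' is not followed by two hex digits.
    return { ok: false, error: error.name };
  }
}

const wellFormed = tryDecode('and_an_encoded_comma%2C');      // valid escape
const bareInPath = tryDecode('mytileset%.json');              // '%.j' is not an escape
const bareAtEnd = tryDecode('value_with_a_percent_%');        // trailing '%'
const escapedPercent = tryDecode('value_with_a_percent_%25'); // '%' written as '%25'

console.log(wellFormed, bareInPath, bareAtEnd, escapedPercent);
```

If this is indeed what happens, writing a literal percent sign as ‘%25’ would avoid the exception, although it would then be subject to the same decode/re-encode behavior as the comma.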

Thanks,
Don Hatch

You need to decode the URL on the server side to get the actual URL, like this:
decodeURIComponent('https://myserver/mytileset.json?somename=some_value_with_a_comma%2Cand_an_encoded_comma%2C');

Regards

Hi Jacky, thanks for your reply.

I’m afraid I’m not following your answer at all :frowning:

As I said, the url to the tileset is exactly “https://myserver/mytileset.json?somename=some_value_with_a_comma,and_an_encoded_comma%2C” ,
but I can’t figure out how to get cesiumjs to send that as the https request.
Are you saying that’s not a legitimate URL for a 3dtileset?

Note that the algorithm that the server uses to parse that url is an implementation detail (not exposed and not nailed down) and doesn’t necessarily use decodeURIComponent(); in fact, there probably isn’t a function called decodeURIComponent() in the C++ server framework we’re using.

Thanks,
Don

What service has a comma in their URL path? That’s violating several best-practices on good URLs. Now, they’re not evil (as in, they are not considered invalid), but you won’t have any guarantees that a comma is preserved through the whole pipeline.

In other words, it might not be Cesium that can’t deal with it, it can be the URL handling code, JS/TS, maybe if you’re using a framework, and equally on your side as the server side. I find it very odd to have commas in the service, it’s a recipe for disaster as you can’t guarantee it. Is there some other way? Maybe use some other character?

If not, if this is maintained by the service provider, maybe have a word with them? See if they can use other characters to denote what the comma denotes?

Cheers,

Alex

The comma , is a reserved character. They usually have to be encoded when the URLs are passed around between applications. It’s sometimes difficult to get this right, though.

I had a short look at the relevant CesiumJS code. It basically extracts the query parameters, resulting in some_value_with_a_comma,and_an_encoded_comma, (note that the originally encoded comma is now decoded!). This is later used to build the actual URL that is sent out - including the query parameters. But this time, they are properly encoded, resulting in the some_value_with_a_comma%2Cand_an_encoded_comma%2C that you see.
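The effect can be reproduced outside of CesiumJS. The following is only a sketch of the decode-then-re-encode round trip described above, using the standard decodeURIComponent and encodeURIComponent functions; it is not the CesiumJS source, and roundTripQueryValue is a made-up name:

```javascript
// Sketch only: mimics "extract (decode) the query value, then encode it again".
function roundTripQueryValue(rawValue) {
  const decoded = decodeURIComponent(rawValue); // '%2C' becomes ','
  return encodeURIComponent(decoded);           // every ',' becomes '%2C'
}

const rawValue = 'some_value_with_a_comma,and_an_encoded_comma%2C';
const decodedValue = decodeURIComponent(rawValue);
const sentValue = roundTripQueryValue(rawValue);

console.log(decodedValue); // some_value_with_a_comma,and_an_encoded_comma,
console.log(sentValue);    // some_value_with_a_comma%2Cand_an_encoded_comma%2C
```

After the decoding step, the literal comma and the encoded comma can no longer be told apart, so the re-encoding step has to treat them the same way.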

So much for the reason. Finding a solution might be trickier. I’m even a bit on the fence about whether this should be considered a “bug”: Strictly speaking, the % in
some_value_with_a_comma,and_an_encoded_comma%2C
should be encoded as %25, so that this part would become
some_value_with_a_comma%2Cand_an_encoded_comma%252C
Decoding this would yield the original URL.
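As a quick check of that round trip, here is a small sketch that treats the original query value as opaque data and runs it through the standard encodeURIComponent and decodeURIComponent functions (whether CesiumJS should behave this way is exactly the open question):

```javascript
// Treat the original value as opaque data: encode everything, including '%'.
const originalValue = 'some_value_with_a_comma,and_an_encoded_comma%2C';

const encodedOnce = encodeURIComponent(originalValue);
const decodedBack = decodeURIComponent(encodedOnce);

console.log(encodedOnce); // some_value_with_a_comma%2Cand_an_encoded_comma%252C
console.log(decodedBack === originalValue); // true
```

Note that a server receiving this form would have to decode it once before it sees the original string.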

Hi Alex and Marco,

Alexander_Johannesen wrote:

What service has a comma in their URL path?

It’s a service we wrote, that produces a 3DTiles tileset.

That’s violating several best-practices on good URLs. Now, they’re not evil (as in, they are not considered invalid), but you won’t have any guarantees that a comma is preserved through the whole pipeline.

I don’t follow your reasoning-- while I agree that we never have a guarantee that anything is preserved through the whole pipeline, due to bugs in the various pipeline components, my reading of RFC3986 is that pipeline components that conform to that RFC do in fact guarantee that comma will be preserved, since it’s a reserved character (more on this below). What are these best-practices you’re referring to?

In other words, it might not be Cesium that can’t deal with it, it can be the URL handling code, JS/TS, maybe if you’re using a framework, and equally on your side as the server side. I find it very odd to have commas in the service, it’s a recipe for disaster as you can’t guarantee it. Is there some other way? Maybe use some other character?

Again, I don’t understand the reasoning. Yes, all of these components may, and do, have bugs, not just Cesium, but I don’t understand why having commas in URIs would be considered odd. RFC3986 specifically says comma is one of the characters reserved for being used like this (that’s precisely what “reserved” means, if I understand that RFC correctly; more on this below).

Marco13 wrote:

The comma , is a reserved character. They usually have to be encoded when the URLs are passed around between applications. It’s sometimes difficult to get this right, though.

Yes, comma is a “reserved character” as defined by RFC3986 Section 2.2. Reserved Characters.
This implies, according to that section:

Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

My reading of that is that “reserved” means that it’s reserved for exactly what my server (the “producer-specific algorithm”) is doing with it (that is, “delimiting data subcomponents within a URI”) and so this passage is specifically saying that comma is “protected from normalization”, i.e. cesiumjs shouldn’t be messing with it (that is, turning ‘,’ and ‘%2C’ into the same thing, which is a normalization).

BTW, I’m also (trying to) use comma in the path part (to the left of the ‘?’) rather than the query part (to the right of the ‘?’) of the URI. In this case, cesium apparently does a different normalization: instead of turning both ‘,’ and ‘%2C’ into ‘%2C’, it turns them both into ‘,’. This is another illegal normalization on cesium’s part, according to my reading of RFC3986, and it seems even more bizarre than the first one, since I can’t think of why cesium would want to do any transformations on the path part at all.

It’s still not clear to me why cesium is doing any decoding/encoding of anything at all-- even the query params part of the URI, let alone the path part. Why doesn’t it just use the URI it’s given?
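For comparison, here is a small sketch using the standard URL and URLSearchParams classes (available in browsers and Node.js). It says nothing about what CesiumJS does internally; the path example is made up. Parsing and serializing with URL alone leaves both ‘,’ and ‘%2C’ as given, while going through URLSearchParams decodes the value and re-serializes it in form-encoded style, which produces the same rewriting seen on the network:

```javascript
const given =
  'https://myserver/mytileset.json?somename=some_value_with_a_comma,and_an_encoded_comma%2C';

// Parse and serialize only: the query string is kept as written.
const parsed = new URL(given);
const serialized = parsed.href;

// Same for a comma and an encoded comma in the path (hypothetical path).
const pathOnly = new URL('https://myserver/tiles,a%2Cb/tileset.json').pathname;

// Going through URLSearchParams decodes the value ...
const decodedValue = parsed.searchParams.get('somename');
// ... and serializing it again percent-encodes every comma.
const reserialized = new URLSearchParams({ somename: decodedValue }).toString();

console.log(serialized === given); // true
console.log(pathOnly);             // /tiles,a%2Cb/tileset.json
console.log(reserialized);         // somename=some_value_with_a_comma%2Cand_an_encoded_comma%2C
```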

Hi there,

The short answer is that you can use commas in URLs all you want, and you can argue that all software shouldn’t have bugs dealing with commas, no need to listen to anyone telling you it’s a bad idea (because they may have ventured down that path before), and that commas aren’t used much in URLs for a variety of reasons, including those bugs or otherwise. All of that is perfectly fine. Happy coding, and good luck.

Cheers,

Alex

Hi Alex, I’m sorry if I’ve offended you. Your paraphrasing of what I’ve said sounds like I’m being unreasonable and dismissive of you, and I didn’t mean to come off that way at all.

Thank you for your clarification, it’s clearer to me now and I believe I understand what you’re saying, and never meant to dismiss it. It’s valid advice, and a valid point of view.

I’m also reporting this bug (it seems increasingly clear that it is in fact a clear bug in CesiumJS, which wasn’t obvious to me at first) with the hope that it might be addressed, for obvious benefits, which, I hope you agree, is also valid and not unreasonable.

Cheers to you,
Don

No need to apologize, mate, and I’m not offended. I’m just pointing out that using commas in a URL is setting yourself up for a world of pain. Using commas might be fine in terms of standards and definitions, and it will certainly work as intended a large portion of the time (which is why it’s often too late to change, because there’s a 70% success rate until you bump into something), but in the real world there are tons of layers and frameworks, and each of them may treat special characters in URLs, like a comma, slightly differently. This is why most services don’t use commas in their URLs, but rather use things like pipe and tilde, or even semi-colon (although there are some warnings about that one, too). It might not be Cesium that’s your issue, but a third-party library that Cesium uses, or anything else along the chain.

I know that going back to a service and changing it, especially if there are people already using your originally designed way, can be hard. But my advice, with some 35 years of experience as a developer and 27 of those with web technologies, is to not use commas in URLs; they will cause you grief.

Cheers,

Alex

It has been 5 months since this was brought up. I had looked at some details of RFC3986 back then. But I’d need to re-read some of the implementation and specs as a refresher for some of the details.

From what I remember, my impression eventually was that CesiumJS did “The Right Thing”. (Even though I wasn’t 100% confident here: The spec has ~60 pages, and some passages appeared a bit vague to me - probably due to my lack of understanding of certain parts of the … let’s call it ‘domain-specific vocabulary’).

For example, @donhatch , you quoted one paragraph of the spec. But the quoted part is immediately followed by the statement

If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

So… does the , have to be percent-encoded/decoded in this particular scenario? (And if so: Where exactly does it have to be encoded, and where does it have to be decoded?). This might be clearer for people who have a more specific understanding of what terms like “purpose” or “conflict” are supposed to mean here…
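One way to read that passage, sketched under assumptions: the producer uses a literal ‘,’ as a delimiter between items, and percent-encodes any comma that is part of an item's data. The consumer then has to split on the literal comma first and decode afterwards. The item values and function names below are made up for illustration:

```javascript
// Producer side: percent-encode each item, then join with a literal ','.
function buildListValue(items) {
  return items.map((item) => encodeURIComponent(item)).join(',');
}

// Consumer side: split on the literal ',' first, then decode each piece.
function parseListValue(rawValue) {
  return rawValue.split(',').map((piece) => decodeURIComponent(piece));
}

const items = ['red,green', 'blue'];
const rawValue = buildListValue(items);      // 'red%2Cgreen,blue'
const parsedItems = parseListValue(rawValue); // ['red,green', 'blue']

// If some intermediary decodes the whole value before the consumer sees it,
// the delimiter comma and the data comma become indistinguishable.
const splitAfterDecoding = decodeURIComponent(rawValue).split(',');

console.log(rawValue);
console.log(parsedItems);
console.log(splitAfterDecoding); // three pieces instead of two
```

Under this reading, ‘,’ and ‘%2C’ carry different meanings in the same component, which is why the order of splitting and decoding matters.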

The statement

may sound a bit snarky, but I’m sure that this was neither caused by any sort of offense, nor intended to cause any sort of offense. (It might just be a sign of the ubiquitous “Programmer Weltschmerz” - there are just too many bugs out there :laughing: ).

I think the main point of this statement was: You have to anticipate that there is buggy software out there. But you should not (have to) anticipate that CesiumJS is buggy. If there is a bug in the URL handling of CesiumJS, then it should be reported and fixed.

(Again: I had read a bit of the spec, and the behavior seemed right to me back then. But if you suspect that this is a bug in CesiumJS, then the technical details could better be sorted out in an issue - also to better keep track of the reasoning why it was implemented one way or the other)