Sandcastle Embedding Search: We want your input!

Sandcastle’s user interface has been significantly enhanced over the past several months. To build on these developments, we’re experimenting with embedding-based search to make finding examples more intuitive and more resilient to differences in wording. We’d love your input on how it feels in practice.

Where things stand

  • Draft pull request can be viewed here.

  • Deployed sample can be found here.

  • In the PR, we’ve fully replaced the current text-based search with the new embedding search for evaluation.

  • The approach uses an open-source, MIT-licensed embeddings model from HuggingFace.
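For readers new to embedding search, here is a minimal sketch of the ranking step, assuming the query and every example have already been turned into vectors by the model (producing those vectors is out of scope here). The `IndexedExample` shape and function names are hypothetical, not taken from the PR. Each example is scored by cosine similarity to the query, so examples that mean similar things rank highly even when they share no keywords.

```typescript
// Hypothetical index entry: one precomputed embedding per Sandcastle example.
interface IndexedExample {
  title: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length (1 = same direction, 0 = orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // avoid dividing by zero for an all-zero vector
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every example against the query embedding and sort most similar first.
function rankBySimilarity(
  queryEmbedding: number[],
  index: IndexedExample[],
): { title: string; score: number }[] {
  return index
    .map((example) => ({
      title: example.title,
      score: cosineSimilarity(queryEmbedding, example.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}

// Toy 3-dimensional vectors; real model embeddings typically have hundreds of dimensions.
const toyIndex: IndexedExample[] = [
  { title: "3D Tiles", embedding: [0, 1, 0] },
  { title: "Clustering", embedding: [0.9, 0.1, 0] },
];
console.log(rankBySimilarity([1, 0, 0], toyIndex).map((r) => r.title));
```

If the model outputs normalized (unit-length) vectors, cosine similarity reduces to a plain dot product, which saves the two square roots per comparison.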

What we’d like your feedback on

  • Does this improve your workflow and help you find the right sandcastles more quickly?

  • Are startup download size and runtime memory acceptable for your environment? The model increases the initial download size and memory usage.

  • Do you see any regressions in relevance or quality when trying different queries?

Examples, links, and real-world use cases are especially helpful. We look forward to hearing your thoughts.

See some before-and-after results below:




In the deployed sample, the labels don’t seem to work. I wanted to test the search’s impact with the labels, so I tried turning ‘Showcases’ on and off, but nothing changed. I’m a huge hater of the default label ‘Showcases’ being turned on: with anything I do in Sandcastle, it’s the first thing I have to turn off, and that’s only because I know it’s there. It’s a poor design choice.

Cheers,
Alex

The new Cesium Sandcastle is very impressive and easy to use.

The labels are a cool addition, but I personally don’t see myself using them as much as I probably should. Anyway, the labels didn’t seem to update my search results :cry:

Design: The menu looks great. I would prefer checkboxes for selecting multiple labels, since it’s easier to see the check mark (on the left). A “remove all” button would be amazing as well!

I look forward to seeing this in!

Thanks @Alexander_Johannesen and @rileyhowley for the feedback! One note on both of your comments: this PR and demo are scoped to the new search bar mechanics, not the labels feature (though the labels regression can definitely be fixed :slight_smile:). The goal of this change was to make search more effective on its own, so that it doesn’t necessarily require labels.

In your use cases, would you typically filter by a label, and then type something in to search? With this in mind, one other idea we had discussed in this area would be to use AI to automatically generate better labels for each sandcastle. Does that seem like something that would improve your typical use case for discovering sandcastles?

My apologies, I 100% read that wrong. My typical use case would be just to search for exactly what I want to find. I can now see the search gives me more examples.

One thing I did notice: when I search “Cluster” I get some results for clustering, but then I get a bunch of others, and as far as I can tell nothing explains why they are returned for that query.

I’ll continue to use this for research and finding things, and see how it goes :laughing:

Yes, though I was trying to test whether the new search helped with the problem of the Showcases label being on by default; I don’t know whether what I saw was a regression or not. Any other search improvement would be welcome. :slight_smile:

Another thing to test (unless you’re already doing this) is searching the code: specifically the comments, but it would also be nice to see which sandcastles use certain features of the API. That would be really helpful, as the documentation on best practices is a bit sparse. (Which raises the question of whether the Sandcastle code examples are best practice or not … different thread. :slight_smile: )

Addendum: I personally think the labels get in the way of good UX here, as they operate as a strict filter, and the problem is that it isn’t obvious that any labels are ticked (hence my main issue with them). There are several ways to fix this, the simplest being a note at the top saying these results are based on Label X (5 results), click here to see all results (20 results). You also have many labels, and from the look of things only one label works as a filter at a time? Multi-select labels might work OK (with some merging of results?). I don’t know; tagging/labels were the old-school way of trying to fix bad search, and they still might be helpful, but I don’t think their mechanics are clear in the UI. Anyway, consider me a hater. :slight_smile:
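Alexander’s two suggestions (a visible note whenever a label filter is active, and multi-select labels with merged results) could look roughly like this. It is a sketch under stated assumptions: a hypothetical `LabeledExample` shape and OR semantics, meaning a result is kept if it carries any selected label. It is not how Sandcastle’s labels currently behave.

```typescript
// Hypothetical shape of a search result carrying its Sandcastle labels.
interface LabeledExample {
  title: string;
  labels: string[];
}

interface FilteredView {
  shown: LabeledExample[];
  total: number;
  notice: string | null; // null when no filter is active
}

// Multi-select label filter with OR semantics: keep a result if it has any selected label.
// The notice makes an active filter visible instead of silently hiding results.
function applyLabelFilter(
  results: LabeledExample[],
  selectedLabels: Set<string>,
): FilteredView {
  if (selectedLabels.size === 0) {
    return { shown: results, total: results.length, notice: null };
  }
  const shown = results.filter((example) =>
    example.labels.some((label) => selectedLabels.has(label)),
  );
  const labelList = Array.from(selectedLabels).join(", ");
  const notice = `Showing ${shown.length} of ${results.length} results (filtered by: ${labelList}).`;
  return { shown, total: results.length, notice };
}

// Placeholder titles and labels for illustration only.
const demoResults: LabeledExample[] = [
  { title: "Example A", labels: ["Showcases"] },
  { title: "Example B", labels: ["3D Tiles"] },
  { title: "Example C", labels: [] },
];
console.log(applyLabelFilter(demoResults, new Set(["Showcases"])).notice);
```

Switching to AND semantics (a result must carry every selected label) would only change `some` to a check that every selected label appears in `example.labels`; which one feels more natural is exactly the kind of UI question raised above.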


Hi Riley! Thanks for the feedback! I updated the branch to address some of the concerns you raised. The initial embedding search always returned a fixed number of results, which usually included results unrelated to the search. Based on some testing, I set a static distance threshold that should reduce the number of irrelevant results. I also re-introduced the Pagefind search, so it is now a hybrid of the Pagefind search and the embeddings search. Give it another try and let me know if that improves the results for you!
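The two changes described above (a distance cutoff instead of a fixed result count, and merging in Pagefind results) can be sketched as follows. The result shapes, the `hybridMerge` name, the merge order, and the `MAX_DISTANCE` value are all assumptions for illustration; the post doesn’t say how the PR combines the two lists or what threshold it settled on.

```typescript
// Hypothetical result shapes; the PR's actual data structures are not described here.
interface KeywordHit {
  id: string;
}

interface EmbeddingHit {
  id: string;
  distance: number; // e.g. cosine distance = 1 - cosine similarity; smaller means closer
}

// Hypothetical cutoff for illustration; the threshold in the PR was tuned by testing.
const MAX_DISTANCE = 0.6;

// Keep keyword (Pagefind) hits first, then append embedding hits that pass the
// distance threshold, closest first, skipping anything already listed.
function hybridMerge(
  keywordHits: KeywordHit[],
  embeddingHits: EmbeddingHit[],
  maxDistance: number = MAX_DISTANCE,
): string[] {
  const merged: string[] = [];
  const seen = new Set<string>();
  const add = (id: string): void => {
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  };
  keywordHits.forEach((hit) => add(hit.id));
  embeddingHits
    .filter((hit) => hit.distance <= maxDistance) // drop weak semantic matches
    .sort((a, b) => a.distance - b.distance)
    .forEach((hit) => add(hit.id));
  return merged;
}

console.log(
  hybridMerge(
    [{ id: "example-a" }],
    [
      { id: "example-a", distance: 0.2 },
      { id: "example-c", distance: 0.9 },
      { id: "example-b", distance: 0.4 },
    ],
  ),
);
```

Listing keyword hits first favors exact matches on titles and text; a common alternative for combining two ranked lists is reciprocal rank fusion, which interleaves them by rank instead.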

In my testing, the Showcases filter is applied by default before a search is entered, but it is removed once you start searching. So I don’t see it applying in this case (though I can totally understand an argument for starting with no filter applied). But my hope with this embeddings search is that it eventually makes the tagging system less necessary, as ideally the search would capture the tags implicitly within the embeddings.

I did some initial testing with embedding the code, though this was done using some of the larger embedding models available. On smaller models (like the one used in this PR), performance degraded notably. But I think it would be a great idea in the future to expand this search to embed the code as well, if we find this initial implementation worth building on.

Hey @tomdicarlo,

Sorry for the delay in getting back to you! I hope you had a good new year!

I noticed some weird behavior when searching for the first time.

But if I remove one letter and put it back, I get this (without refreshing):

Then if I search before loading finishes:

Hi Riley, thanks for the additional feedback! Your first issue was caused by a race condition, and I’ve just pushed additional logic to prevent it. The second looks like it was a result of the Showcases filter being applied. Also, based on Alexander’s feedback, I’ve removed that filter from the default, as it made for a confusing search experience: it would start applied and then be removed after the first search.

Let me know what you think!

Tom
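The race condition Tom mentions isn’t described in detail, so the following is only one common way to guard against this class of bug, not the actual fix in the PR. Each search gets an increasing request ID, and results are delivered only if no newer search has started since; a slow early query (for example, one issued before the model finished loading) can then no longer overwrite the results of a later one. `latestOnly`, `SearchFn`, and `ResultsHandler` are hypothetical names.

```typescript
type SearchFn = (query: string) => Promise<string[]>;
type ResultsHandler = (query: string, results: string[]) => void;

// Wrap an async search so that only the most recently started query delivers results.
// If an older query resolves after a newer one, its results are discarded instead of
// overwriting what the UI is showing.
function latestOnly(
  search: SearchFn,
  onResults: ResultsHandler,
): (query: string) => Promise<void> {
  let latestRequestId = 0;
  return async (query: string): Promise<void> => {
    const requestId = ++latestRequestId; // claim a new ID before awaiting
    const results = await search(query);
    if (requestId === latestRequestId) {
      onResults(query, results); // still the newest query: safe to display
    }
  };
}
```

If the underlying search supports cancellation, an `AbortController` could additionally cancel the outdated request instead of letting it finish and discarding its result.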
