Abstract Wikipedia/Updates/2023-10-25
Running on WebAssembly
A few weeks ago we opened up Wikifunctions for some community members, but we have yet to open it up to wider contribution and usage. Thanks to the brilliant input of some community members, most notably Lockal, we were made aware of some potential security issues before they could be exploited. This led us to limit function calls to logged-in users while we implemented some security mitigations.
Our original plan was to rely on a multi-layered approach to security, where we split up the backend into two parts: the orchestrator, which collects all necessary data, and the evaluator, which actually runs the code written by Wikifunctions editors. The evaluator would run in a Docker container with very limited rights. But, as we opened up Wikifunctions, issues arose that, although not yet exploitable themselves, might become so in the future.
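To make the split concrete, here is a minimal sketch in Python. It is not the actual Wikifunctions code: the names `orchestrate`, `evaluate`, and `function_store`, and the convention that user code defines an entry point called `f`, are all assumptions made for illustration. A plain child process stands in for the evaluator; on its own it is not a security boundary, which is why the real evaluator sits behind further layers.

```python
import json
import subprocess
import sys


def evaluate(code: str, args: list, timeout_s: float = 5.0):
    """Hypothetical evaluator: run editor-written code in a separate process.

    Only JSON crosses the process boundary. A child process alone is NOT a
    sandbox; it merely illustrates keeping execution apart from data access.
    """
    # Program run by the child: read the request, define the function, call it.
    child_program = (
        "import json, sys\n"
        "request = json.load(sys.stdin)\n"
        "scope = {}\n"
        "exec(request['code'], scope)\n"
        "print(json.dumps(scope['f'](*request['args'])))\n"
    )
    completed = subprocess.run(
        [sys.executable, "-I", "-c", child_program],  # -I: isolated mode
        input=json.dumps({"code": code, "args": args}),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # stop runaway code
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return json.loads(completed.stdout)


def orchestrate(function_store: dict, name: str, args: list):
    """Hypothetical orchestrator: collect the data, then delegate execution."""
    code = function_store[name]  # the evaluator never touches the store
    return evaluate(code, args)


# Made-up example function, standing in for code written by an editor.
store = {"double": "def f(x):\n    return x * 2\n"}
result = orchestrate(store, "double", [21])
```

The point of the design is that the part that can read data never executes editor-written code, and the part that executes it receives only what it was handed.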
We partnered with the SRE and Security teams in response to the new concerns, and together we brainstormed ideas and hammered out potential solutions to add further layers of protection. The idea is to provide additional security in depth. One major component of our revised security strategy required a complete rewrite of the evaluator encapsulation service: instead of running user-written code in language runtimes directly in Docker, we will run them on top of a WebAssembly runtime inside the container.
What is WebAssembly? WebAssembly, or "WASM" for short, is a low-level programming language, meaning it is comparatively simple and doesn’t directly support higher levels of abstraction. There are many different runtimes for WebAssembly, the most prominent of which are built into essentially all modern browsers (thus the “Web” in the name). As with many other low-level programming languages, it can also serve as a compilation target for other programming languages, meaning that you can take, for example, code written in C or Rust and compile it to WebAssembly. This allows programs that were written for the desktop to be run in the browser. One example is the jump-and-run game SuperTux, which was originally written in C++ and can now be run in the browser.
WebAssembly does not have to be run in the browser; it can also be run on a server. In the last few years, a flurry of activity has created dozens of runtimes. One advantage of WebAssembly is that the runtime that runs WebAssembly is easy to control and limit; thus, translating code to WebAssembly adds an additional layer of security.
As of this week, we have deployed the new version of the evaluator for JavaScript. We will be monitoring how this change affects the performance and cost of running Wikifunctions. Note that the WebAssembly runtime does not replace the other security measures; it is an additional layer on top of them. If you inspect the "Details" of a function run on JavaScript now, you'll see that it's run on QuickJS v0.5.0 inside WASM (specifically, on WasmEdge), rather than Node v16.17.1. We are also working on switching the evaluator for Python to one based on WebAssembly soon.
One previous decision has made things a bit more challenging, though: our choice to start with JavaScript and Python. WebAssembly is geared towards compiled programming languages such as C, Rust, or Go, whereas Python and JavaScript are interpreted languages. Eventually, we found Python and JavaScript interpreters that can be compiled to WebAssembly, and then these compiled builds are used to run the actual Python and JavaScript code. We live in interesting times.
In fact, the tooling around WASM for Python and JS is so novel and bleeding-edge as to have caused some "fascinating" bugs during adoption. At one point, we had got our Python executor running on WebAssembly, using (among other things) a great tool called wasmtime, developed by the Bytecode Alliance. Our tests were reliably green for a couple of weeks, even up to the day we decided to switch our staging Python executor to use WASM. However, once our new release reached the staging area, Python function calls mysteriously failed. After debugging, we found that our call to the WASM command-line tool was the culprit. It turned out that the WASM runner we were using had pushed a new major version, flagged as a breaking change, less than an hour before we built the image for deployment. The fix for that issue was easy: we simply specified that our build should download and use the previous version of the command-line tool. Still, this demonstrates how fast-moving the world of WASM can be.
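The lesson generalizes: a build that fetches "the latest" version of a tool can break without any change on your side. As a minimal sketch, a build step could compare the installed tool's version against a pinned one and fail early if the major version differs. The helper names and the version numbers below are made up for illustration; they are not the versions involved in the incident, and this is not the check we actually run.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as 'v1.4.2' into a tuple such as (1, 4, 2)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def same_major(pinned: str, installed: str) -> bool:
    """True if both versions share a major number (no breaking change expected)."""
    return parse_version(pinned)[0] == parse_version(installed)[0]


def check_tool(name: str, pinned: str, installed: str) -> None:
    """Fail the build early instead of discovering the mismatch in staging."""
    if not same_major(pinned, installed):
        raise RuntimeError(
            f"{name}: pinned {pinned}, but found {installed} (major version changed)"
        )


# Hypothetical versions: a minor update passes, a major update is rejected.
check_tool("wasm-runner", pinned="1.4.2", installed="1.9.0")
```

Pinning the exact version in the build, as we ended up doing, is stricter still; a check like this only catches the mismatch when it happens.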
Where will we go next? We will be monitoring the load that the new architecture puts on our servers, to see if the system is sustainable. There will be some change in the speed of evaluating functions, but we expect it to be barely noticeable overall. We hope that the additional layer of protection will hold up, but if you do find a way past it, let us know.
We think there is quite some room for improvement in terms of runtime speed. WebAssembly runtimes have seen a whirlwind of development in the last few years, and the field still seems rife with opportunities, particularly for interpreted languages. One way to improve the runtime characteristics of Wikifunctions is to add support for languages that are a more natural fit for WebAssembly, such as Rust or C. Given that the system automatically favors the fastest implementation of a function, usage might swiftly consolidate on the more efficient implementations. But compiled languages would also need a slightly different architecture, as the compilation results would need to be stored somewhere. One interesting option would be to push function evaluation to the user’s browser, since it contains a WebAssembly runtime as well. But we would need to understand the consequences of that, particularly for slower devices.
We also used this change as an opportunity to experimentally switch on the right for everybody, not just logged-in users, to run community-approved functions on Wikifunctions. As you can see, this change is buried deep in this update, and it might be rolled back again at any time. We will monitor the system to see how stable it is, and we will keep you up to date in this newsletter.
Thanks to Cory for taking the lead on this project, James for taking it to production, and the Security and SRE teams, who supported us so helpfully! It is great to see it deployed, taking us a big step closer to opening up Wikifunctions to everyone.