Ok, so you’ve decided to use WebAssembly for your application. With over 30 runtimes to choose from, the next question will probably be “What runtime should I use?”. It is a tough decision and, as always, there is no one-size-fits-all answer.

Choosing a WebAssembly Runtime

This is part 2 of the 4-part mini-series I’m writing about embedded WebAssembly. In the last part I discussed the common issues development teams can encounter when porting their code to WebAssembly. In this article I am going to focus on choosing the best WebAssembly runtime for your needs.

The Criteria

The runtime you pick will depend on your requirements, so it’s probably best that I explain the criteria I have been using. My predominant requirement is to be able to run WebAssembly on a range of devices, from large machines through to smaller embedded devices. Apart from the technical requirements, there are always best practices to follow when selecting an open source project, and that includes understanding the open source community.

An Open Source Project and its Community

As with adopting any open source project, it is important to understand whether the project you pick is going to be supported. This can be less important for a technology demo or research project. It can be much more important if you are planning to use the runtime in a commercial product. If your product is going to require additional features and you’d like to add them to the open source project (upstream them), then it’s best to understand whether these are the types of features your preferred project would also want. This is a bit like dating, but with an open source community and project. Ask yourself “Is this project going to be a good fit for my use case?”.

Any software project, whether commercial / closed source or open source, is about the people and organisations that support it. I’d recommend spending some time getting to know the folks behind the projects. When you take a dependency on an open source project you are effectively partnering with the folks behind your chosen project.

With that being said, let’s get back to the technical requirements.

What does Embedded mean to you?

This mini-series is all about WebAssembly for embedded systems, and embedded systems come in all shapes and sizes. They range from small microcontrollers all the way up to PCs embedded in larger products. Naturally the available operating systems and hardware capabilities vary widely.

A Quick and Dirty Map of Embedded Hardware and Operating Systems

This, of course, has a huge impact on the range of operating systems (platforms) and hardware (instruction sets). On the smaller end there may be devices which run without an operating system at all, a bare metal environment; then there may be devices which run real time operating systems; and on the larger side, embedded Linux. As a very rough rule of thumb, we can map this out as follows.

A Rough Map of Embedded Systems

The diagram is a quick and rough estimate of the hardware types and operating systems that are used in embedded systems. Generally, as hardware capabilities decrease, the software moves from fully fledged desktop-like operating systems to Real Time Operating Systems (RTOS) and then, as hardware gets ever smaller, eventually to bare metal programming with no operating system at all.

There are of course exceptions to this rule. I’ve seen bare metal programming on an x86-64 with gigabytes of memory, so it’s not impossible; it is just a bit less likely to happen.

WebAssembly versus Containers versus Virtual Machines

As I mentioned in the previous post there are two key reasons why WebAssembly is useful:

It provides a common compilation target across a diverse landscape
It provides a sandboxing option for devices which have previously not been able to support one

While this is true, it is also important to highlight the alternatives to WebAssembly. Standard operating systems like Linux, Windows and BSD provide de facto standard compilation targets. As you can see from the diagram, as devices become more capable the adoption of well known and widely used operating systems provides an increasingly well supported technology stack and, effectively, a common compilation target.

Likewise, as a device’s capabilities increase, the options for sandboxing also increase. Linux of course provides containers, BSD provides Jails, and all operating systems including Windows support hypervisors. These are all viable sandboxing alternatives and are all really well known and understood.

As device capability increases, the likelihood of other sandboxing / isolation options and pre-existing compilation targets being used increases. Simply put, the more capable the device, the more options you will have. With more hardware power, more software is available and more choices and design decisions become possible.

A Rough Map of Embedded Systems

We can show this by updating our map of the embedded systems. Hypervisors are more widely available in the higher end RTOS systems than containers are.

WebAssembly the Software Matryoshka

Apart from the lower footprint, and being able to address smaller devices in the embedded landscape, the other thing that WebAssembly excels at is the ability to tightly embed itself inside other software. This design pattern starts to look like a Russian doll (a Matryoshka), with software embedded inside software.

If you need to host some sandboxed 3rd party code inside your application then WebAssembly is a great choice; Istio is a great example of this. There are some runtimes specifically designed for this software-within-software scenario, and some others that have been designed to make it easy for developers to embed them inside their preferred language environment.

It is far easier to integrate a WebAssembly runtime into existing software than it would be to integrate an OS or platform primitive, like a BSD Jail, Linux Container, or hypervisor for a virtual machine. Each of these implementation options also requires binding the resulting solution to a specific operating system and potentially a hardware platform. Using WebAssembly allows this implementation decision to be delayed.

The WebAssembly Runtimes

This site lists over 30 WebAssembly runtimes. I am going to provide an overview of probably the most widely known non-web runtimes. This includes:

Wasm2C
WAMR
WasmTime
WasmEdge
Wasmer
Wazero

Each of these has its pros and cons, and many are developed for specific use cases. For each of these runtimes we can detail the footprint, in terms of either ROM image size or RAM requirements, and the operating systems supported. This will allow us to place them on the map of embedded systems shown above. Let me go through and quickly introduce them.

Wasm2C

This is a tool which will convert (transpile) any *.wasm file into a block of ‘C’ code. This can then be compiled into your application. Wasm2C is reliable and is used in production today by Mozilla in the Firefox browser. It forms part of a solution Mozilla uses called RLBox, which was developed by the University of California, San Diego to provide a form of micro-encapsulation and sandboxing to protect Firefox from software supply chain attacks. Wasm2C is awesome for compiling core WebAssembly to C. If your WebAssembly module has any imports, Wasm2C will create the stub functions for you to fill in with the missing functionality.

There is a great presentation from Keith Winstein (Stanford University), who maintains Wasm2C, and Dominik Tacke from Siemens which demonstrates how small WebAssembly can be. TL;DR: 3 KB of ROM and hundreds of bytes of RAM. The presentation is available on YouTube here. When you transpile from wasm to C, the resulting C code includes both the logic you originally had in your wasm file and the bits of a runtime you need. The resulting output is ANSI C, and can be compiled on any platform you have a C compiler for. In the embedded world, this is almost every target platform.

Wasm2C is currently bundled as part of WABT, the WebAssembly Binary Toolkit (pronounced “wabbit”). You can use Wasm2C wherever you have a ‘C’ compiler.
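To give a feel for what “a wasm file turned into C” means in practice, here is a small hand-written sketch. It is not the code Wasm2C generates, and every name in it (`sandbox_t`, `host_log_fn`, `module_sum_bytes`, `host_log`) is hypothetical. It only illustrates the general shape I’d expect: the module’s linear memory becomes a plain byte array, loads are bounds checked, and an import is a function the host has to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: a hand-written illustration,
   not the output of the real wasm2c tool. */

#define MEM_SIZE 256u /* toy linear memory, far smaller than a real page */

typedef struct sandbox sandbox_t;

/* An "import": a function the module needs but the host must provide. */
typedef void (*host_log_fn)(sandbox_t *sb, uint32_t value);

struct sandbox {
    uint8_t     memory[MEM_SIZE]; /* the module's entire addressable world */
    host_log_fn log;              /* import slot filled in by the host */
    int         trapped;          /* set when the module misbehaves */
};

/* Every load is bounds checked, so module code cannot read outside
   its own linear memory. */
static uint8_t load_u8(sandbox_t *sb, uint64_t addr)
{
    if (addr >= MEM_SIZE) {
        sb->trapped = 1;
        return 0;
    }
    return sb->memory[addr];
}

/* Stand-in for a transpiled module function: sums `len` bytes starting
   at `addr` and reports the result through the import. */
uint32_t module_sum_bytes(sandbox_t *sb, uint32_t addr, uint32_t len)
{
    uint32_t total = 0;

    assert(sb != NULL && sb->log != NULL); /* host must supply the import */
    for (uint32_t i = 0; i < len && !sb->trapped; i++) {
        total += load_u8(sb, (uint64_t)addr + i);
    }
    if (!sb->trapped) {
        sb->log(sb, total);
    }
    return total;
}

/* Host side: the "missing functionality" behind the import stub. */
uint32_t last_logged = 0;

void host_log(sandbox_t *sb, uint32_t value)
{
    (void)sb;
    last_logged = value;
}
```

The host zero-initialises a `sandbox_t`, points `log` at `host_log`, copies data into `memory` and calls `module_sum_bytes`. A read past the end of the array sets `trapped` instead of touching host memory, which is the sandboxing property in miniature.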

WAMR (Wasm Micro Runtime) aka `iWasm`

The WAMR runtime was originally developed by Intel for microcontrollers. It is very efficient, and some researchers have reported that for some specific workloads they see WAMR’s AoT mode delivering faster than native performance, although in general and for most workloads there is a performance hit. In terms of compiled size, WAMR is small and can come in at around 50 KB for the AoT-only version; the default MacOS and Linux versions both come in at ~426 KB. It is possible to run WAMR on devices with just 340 KB of RAM.

WAMR is reliable and has been used in production by Amazon Prime Video, Xiaomi, Intel and others. Information shared by Xiaomi and Amazon via the Bytecode Alliance suggests that there are millions of embedded devices deployed with customers today which run WAMR. It is also the runtime of choice for many members of the Embedded Special Interest Group at the Bytecode Alliance.

WAMR comes with a command line utility called iwasm, and it is sometimes referred to by that name. In terms of both performance and resulting binary size, WAMR and Wasm2C are the two best performing runtimes. For footprint, they are both very small.

With the exception of Wasm2C, which just needs a ‘C’ compiler, WAMR has by far the longest list of supported platforms of any of the runtimes. So, here goes: WAMR is available for Windows, MacOS, Linux and Linux SGX, Android, iOS, FreeBSD, NuttX, VxWorks, Zephyr, AliOS, Mbed OS, RT-Thread, RIOT and more… (Phew!)

WasmTime

The WasmTime runtime is also reliable and is used by Fastly, Cosmonic, Fermyon and F5. The full WasmTime footprint is large, coming in with a 38 MB binary. It was designed initially for server side deployments. There has been a lot of work recently to reduce the footprint of WasmTime, and specifically for the embedded world this has produced a ROM footprint of about 2 MB, a huge drop from the original 38 MB. WasmTime is available for Windows, MacOS, Linux, and Android. WasmTime is written in Rust.

WasmEdge

WasmEdge was developed to help bring cloud developer experiences to the edge and to allow the same binaries to be deployed in both locations. It has support for Windows, MacOS, and Linux. It’s written in C++. The default compiled version for Linux comes in at roughly 82 MB.

Wasmer

Wasmer is another great runtime. It has support for Windows, MacOS, and Linux and, like WasmTime, it is also written in Rust. Once installed it comes in at roughly 108 MB on a Linux machine. So for Linux machines running at the edge, but not necessarily requiring the tight embedded requirements it

Wazero

Wazero is a Go-based implementation and as such would be ideal for a Go developer wanting to add safe sandboxing to their Go application. It has support for Windows, MacOS, Linux, FreeBSD, NetBSD, OpenBSD, DragonFly BSD, illumos and Solaris, and has one of the smallest ROM footprints of the server targeted runtimes at roughly 5.5 MB.

Updating the Embedded Map

With this information we can quickly update our embedded map, to see where each of these runtimes can be used.

A Rough Map of Embedded Systems

Published Performance Figures

Some of the latest published performance figures comparing WebAssembly runtimes have been compiled by Frank Denis and made available on his personal blog. The results are basically in line with what I personally have observed: Wasm2C and WAMR are generally the two fastest implementations, with Wasmer and WasmTime following up.

Runtimes and Modes of Operation

All of the projects have made improvements since Frank’s blog post, and I’m sure that the performance figures have improved across the board. However, this is good enough to get a sense of where the runtimes currently sit when compared to each other.

Covering performance testing and comparisons between different runtimes probably requires its own dedicated blog post. What’s important to note when you see the performance figures is that each of these runtimes has various “modes” of operation, and when you check Frank’s blog post you’ll see this mentioned. It is therefore probably important to cover them briefly from an embedded perspective. There are roughly three basic modes of operation each runtime can implement. They are as follows:

Interpretation
JIT (Just in Time Compilation)
AoT (Ahead of Time)

Some runtimes offer all three modes and some only do interpretation. But this does have an impact on how we can use the runtimes, so let’s quickly review these three modes.

Interpretation

As you know, WebAssembly bytecode is a format which cannot be executed directly on hardware; instead the software runtime needs to read the bytecode and work out what to do. This essentially is what the interpreter mode does. This of course can be slow, as it is interpreting the meaning of each wasm bytecode in turn.
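The fetch, decode and execute loop described above can be sketched in a few lines of C. This is a toy of my own, not code from any of the runtimes: it borrows four real opcode values from the WebAssembly binary format, but it treats the `i32.const` immediate as a single plain byte (real modules use LEB128, which only coincides with this for values 0 to 63) and it ignores functions, locals, memory and validation entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four real WebAssembly opcode values; everything else about this
   interpreter is a simplification for illustration. */
enum {
    OP_END       = 0x0B,
    OP_I32_CONST = 0x41, /* followed here by a single-byte immediate */
    OP_I32_ADD   = 0x6A,
    OP_I32_MUL   = 0x6C
};

#define STACK_MAX 16

/* Runs `code` and stores the top of the stack in *result.
   Returns 0 on success, -1 on malformed or unsupported bytecode. */
int interpret(const uint8_t *code, size_t len, int32_t *result)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next byte to read */

    assert(code != NULL && result != NULL);

    while (pc < len) {
        uint8_t op = code[pc++]; /* fetch */
        switch (op) {            /* decode and execute */
        case OP_I32_CONST:
            if (pc >= len || sp >= STACK_MAX) return -1;
            stack[sp++] = (int32_t)code[pc++];
            break;
        case OP_I32_ADD:
        case OP_I32_MUL: {
            if (sp < 2) return -1;
            uint32_t b = (uint32_t)stack[--sp];
            uint32_t a = (uint32_t)stack[--sp];
            /* unsigned arithmetic gives the wrap-around i32 semantics */
            stack[sp++] = (int32_t)((op == OP_I32_ADD) ? a + b : a * b);
            break;
        }
        case OP_END:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1; /* opcode this toy does not understand */
        }
    }
    return -1; /* ran off the end without reaching OP_END */
}
```

Feeding it the bytes `0x41 2 0x41 3 0x6A 0x41 4 0x6C 0x0B` computes (2 + 3) * 4. Every single operation goes through the `switch`, and that per-instruction overhead is what the other two modes try to remove.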

Just in Time (JIT)

A faster way to do this would be to compile short snippets of the WebAssembly bytecode you’ve been asked to execute into native code, just before you need to execute them. You then save that compiled code in memory so the next time you need to execute it you don’t need to compile it again; you just jump to the pre-compiled native code and run it. This is basically what a Just In Time runtime does. It can be faster for larger applications, but for a small, short-lived application interpretation might be faster, since you don’t have to wait for the WebAssembly bytecode to be compiled at all.

JIT does have the side effect that it can produce variations in speed, which can be troublesome for embedded systems or cyber-physical systems. The control code could execute slowly and then all of a sudden speed up, and if the runtime runs out of memory it may throw away a pre-compiled snippet, meaning it needs to be compiled again, also reducing speed. So performance can be variable and unpredictable; this is often referred to as “jitter”. In addition, it requires the inclusion of a compiler, which can make the runtime large.
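The compile-on-first-use and eviction behaviour is easier to see in code. The sketch below is a simulation only: there is no real code generation, the “native code” is just two ordinary C functions, and all of the names (`jit_t`, `jit_call`, `jit_compile`) are made up for this example. The cache has a single slot on purpose so that eviction happens immediately.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*native_fn)(int);

/* Stand-ins for machine code a real JIT would generate at run time. */
static int native_double(int x) { return x * 2; }
static int native_square(int x) { return x * x; }

#define CACHE_SLOTS 1 /* deliberately tiny so eviction is easy to see */

typedef struct {
    int       func_id; /* which function this entry holds, -1 if empty */
    native_fn code;
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
    int         compile_count; /* how many times the compile cost was paid */
} jit_t;

void jit_init(jit_t *jit)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        jit->slots[i].func_id = -1;
        jit->slots[i].code = NULL;
    }
    jit->compile_count = 0;
}

/* The slow path: pretend to compile function `func_id` to native code. */
static native_fn jit_compile(jit_t *jit, int func_id)
{
    jit->compile_count++;
    return (func_id == 0) ? native_double : native_square;
}

/* Calls function `func_id`, compiling it first if it is not cached. */
int jit_call(jit_t *jit, int func_id, int arg)
{
    cache_entry *slot;

    assert(func_id == 0 || func_id == 1);
    slot = &jit->slots[(size_t)func_id % CACHE_SLOTS];
    if (slot->func_id != func_id) { /* miss: compile, evicting the old entry */
        slot->code = jit_compile(jit, func_id);
        slot->func_id = func_id;
    }
    return slot->code(arg); /* run the cached native code */
}
```

Calling function 0 twice compiles it once. Calling function 1 and then function 0 again compiles twice more, because each evicts the other. Those unpredictable trips through the slow path are where the jitter comes from.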

Ahead of Time (AoT)

If you could do all of the JIT work for your whole wasm file up front, generating native code for every snippet and saving it in advance of needing to execute it, then you would get JIT-like speed without the jitter. You also wouldn’t need to include a compiler in your runtime, so you’d have a small runtime. This is basically what AoT (Ahead of Time) compilation does. This is often a manual step that the developer needs to do after the wasm file is built. The resulting pre-compiled file will be platform specific, so it will lose the portability of the original WebAssembly file, but it will gain significantly in performance. The WAMR, WasmEdge and Wasmer runtimes all offer this feature.

A Very Quick Example of AoT

Just to demonstrate, this is how the WAMR runtime supports AoT. Let’s assume we’ve got a really simple “hello world” C application:

// hello.c
#include <stdio.h>
int main(void) {
  printf("Hello World\n");
  return 0;
}

We can compile that to WebAssembly using the WASI-SDK like this:

/opt/wasi-sdk/bin/clang -o hello.wasm ./hello.c

Of course we can run this using the WAMR runtime like this:

mad@Workstation:~/tmp$ iwasm ./hello.wasm
Hello World
mad@Workstation:~/tmp$

We can compile this to a WAMR AoT file using the WAMR compiler, cunningly called “wamrc”, as follows:

mad@Workstation:~/tmp$ wamrc -o hello.aot ./hello.wasm
Create AoT compiler with:
  target:        x86_64
  target cpu:    haswell
  target triple: x86_64-unknown-linux-gnu
  cpu features:
  opt level:     3
  size level:    3
  output format: AoT file
Compile success, file hello.aot was generated.
mad@Workstation:~/tmp$

We can execute that with WAMR like this:

mad@Workstation:~/tmp$ iwasm ./hello.aot
Hello World
mad@Workstation:~/tmp$

For embedded systems, if you can get away with it, using AoT and compiling ahead of time will reduce the load on the eventual target system you are supporting.
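Notice that the same iwasm command accepted both hello.wasm and hello.aot, so something has to tell the two formats apart. One plausible way is to look at the first four bytes of the file. The sketch below is mine, not WAMR’s loader: the WebAssembly magic bytes (`\0asm`) come from the WebAssembly specification, while the `\0aot` magic for WAMR AoT files is my assumption and should be checked against the WAMR sources before relying on it. The names `file_kind` and `classify_module` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FILE_UNKNOWN, FILE_WASM, FILE_WAMR_AOT } file_kind;

/* Classifies a buffer by its first four bytes. */
file_kind classify_module(const uint8_t *buf, size_t len)
{
    /* Standard WebAssembly binary magic: 0x00 'a' 's' 'm'. */
    static const uint8_t wasm_magic[4] = {0x00, 'a', 's', 'm'};
    /* Assumption: WAMR AoT files begin with 0x00 'a' 'o' 't'. */
    static const uint8_t aot_magic[4]  = {0x00, 'a', 'o', 't'};

    assert(buf != NULL || len == 0);
    if (len < 4) return FILE_UNKNOWN;
    if (memcmp(buf, wasm_magic, 4) == 0) return FILE_WASM;
    if (memcmp(buf, aot_magic, 4) == 0) return FILE_WAMR_AOT;
    return FILE_UNKNOWN;
}
```

On a constrained device this kind of check is cheap, and it lets a single firmware image accept portable wasm files during development and pre-compiled AoT files in production.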

The Rough Rule of Thumb

There is a range of WebAssembly runtimes available; for embedded systems the two most useful are WAMR (WebAssembly Micro Runtime) and Wasm2C. The other runtimes require either more resources or more capable operating systems. As a quick rule of thumb, if the device is small use Wasm2C; if the device is capable, WAMR is a good fit. If the device is big, then your choices expand and you can use a wider range of runtimes, or equally use hypervisors or containers for sandboxing.

Rough Hardware Estimation	Operating System	WebAssembly Runtime
Under 340 KB RAM	Bare Metal / RTOS	Wasm2C
340 KB+ RAM	Bare Metal / RTOS / Linux	Wasm2C or WAMR
Linux Capable Device / 8 MB+ RAM	Linux / RTOS	Wasm2C, WAMR, WasmTime, Wasmer, Wazero, WasmEdge + others
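If you want to bake this rule of thumb into a build script or provisioning tool, it boils down to two thresholds. The sketch below simply encodes the table; the function and enum names are hypothetical, and I am reading the last row as “Linux capable and at least 8 MB of RAM”, which is my interpretation rather than a hard rule.

```c
#include <assert.h>

/* Thresholds taken from the table above; treat them as rough guides. */
#define WAMR_MIN_RAM_KB  340u
#define LINUX_MIN_RAM_KB (8u * 1024u)

typedef enum {
    TIER_WASM2C_ONLY,    /* Wasm2C */
    TIER_WASM2C_OR_WAMR, /* Wasm2C or WAMR */
    TIER_WIDE_CHOICE     /* the above plus WasmTime, Wasmer, Wazero, WasmEdge */
} runtime_tier;

/* Maps a device's RAM (in KB) and Linux capability to a row of the table. */
runtime_tier suggest_tier(unsigned ram_kb, int linux_capable)
{
    if (ram_kb < WAMR_MIN_RAM_KB) {
        return TIER_WASM2C_ONLY;
    }
    if (linux_capable && ram_kb >= LINUX_MIN_RAM_KB) {
        return TIER_WIDE_CHOICE;
    }
    return TIER_WASM2C_OR_WAMR;
}
```

A 256 KB microcontroller lands in the first tier, a 512 KB RTOS board in the second, and a 64 MB embedded Linux box in the third.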

Note: By default WebAssembly allocates memory for WebAssembly modules in pages, and these pages are 64 KB in size. There is a proposal called (unsurprisingly) Custom Page Sizes which aims to allow page sizes of any arbitrary value.
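To see why the page size matters on small devices, it helps to do the arithmetic. The helpers below are a sketch of my own (the function names are hypothetical) and they ignore the runtime’s own RAM usage, so the real headroom will be smaller.

```c
#include <assert.h>
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u /* default WebAssembly page size in bytes */

/* Number of whole pages needed to hold `bytes` of linear memory. */
uint32_t pages_needed(uint32_t bytes)
{
    return (bytes / WASM_PAGE_SIZE) + (bytes % WASM_PAGE_SIZE != 0 ? 1u : 0u);
}

/* How many whole pages fit into `ram_kb` kilobytes of RAM,
   ignoring whatever the runtime itself needs. */
uint32_t pages_that_fit(uint32_t ram_kb)
{
    assert(ram_kb <= UINT32_MAX / 1024u); /* avoid overflow in the multiply */
    return (ram_kb * 1024u) / WASM_PAGE_SIZE;
}
```

A module that needs 65,537 bytes of linear memory has to reserve two full pages (128 KB), and a device with 340 KB of RAM can hold at most five pages in total. That is why smaller page sizes are interesting for embedded targets.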

Next Time

In the first part of this mini-series I covered how to port software to WebAssembly. In this article I’ve given an overview of the available runtimes, focusing on those most suitable for embedded systems. In the next post, I’ll focus on how to integrate a runtime into your embedded system, how the runtime gets access to system resources, and how to share data between the WebAssembly application and the host system.
