The Growing WASI-Size Questions
I have taken a break from the series on Embedded WebAssembly to look at a change in the WASI-SDK. Back in July 2024, WASI-SDK 23 was released. This release dramatically increased the size of the resulting *.wasm file, by a whopping 1029%. This is because WASI-SDK 23 turned on debug information by default. If you are linking against WASI Lib-C, the usual compiler switches cannot turn this off: -g and -g0, which traditionally turn debug information on and off respectively, simply don’t work. It is always on. There are ways to remove the debug information, either by using post-compilation tools or via some funky linker commands.
The bottom line is that the WASI-SDK does not adhere to typical compiler convention on this subject. The result? Confusion, particularly for new entrants to the WebAssembly ecosystem. It is important to note that this change only impacts applications linked against WASI. If you compile your application using the WASI-SDK but don’t use or link against WASI, then you won’t see this issue at all. As a result of the size increase, I’ve personally seen teams reject WASI and simply create their own system interfaces, or use WAMR’s built-in version. These teams simply deemed the WASI output impractical and too big. Even if you are familiar with WebAssembly, unless you’ve been watching each and every toolchain release and delving into every issue / PR linked to in the release notes, you might have missed what changed and why.
In this blog post I’m going to go through the change in a bit more detail and share tips with you on how to reduce the size of the resulting WASI-based *.wasm file, getting it down to a size which is in line with what you might expect for WASI-SDK releases prior to 23.
Getting Caught out by Toolchain Behavior - Identifying what changed
Unless you’ve set up a CI/CD pipeline which builds with the latest WASI-SDK, you might have missed this change. Very few of us update our WASI-SDK toolchains regularly. I have to admit that I personally don’t. Like many of us, I set up a toolchain and continued to use it for a while, only really updating when I needed to set up a new development environment. So it came as a bit of a shock when I upgraded from WASI-SDK 19 to WASI-SDK 25 and saw the size of my simple hello world jump.
The example application was really simple:
// hello.c
#include <stdio.h>
int main(void) {
    printf("hello world\n");
    return 0;
}
Compiling with the traditional optimizations on (-O2 / -O3) didn’t make a difference; it still produced a large size difference. So I knew something was up. But what changed, and when? To get to the bottom of this I built a script which would download as many versions of the WASI-SDK as I could try and build hello world with -O3 using each of them. The results looked like this:
Which I converted into a chart:
Obviously something had changed in WASI-SDK 23, but what exactly? At first glance nothing in the release notes for 23 stood out to me.
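The version sweep described above could be sketched roughly as follows. This is not the script I used: it skips the download step and assumes the SDKs are already unpacked side by side as wasi-sdk-* directories under one folder, each with its own bin/clang that defaults to the WASI target and sysroot. The function names and the output file naming are illustrative only.

```python
import subprocess
from pathlib import Path


def percent_increase(old_size, new_size):
    """Return the growth from old_size to new_size as a percentage."""
    return (new_size - old_size) * 100.0 / old_size


def build_command(sdk_dir, source, output):
    """Build the clang invocation for one SDK install.

    Assumes the SDK directory contains bin/clang (hypothetical layout).
    """
    clang = Path(sdk_dir) / "bin" / "clang"
    return [str(clang), "-O3", str(source), "-o", str(output)]


def sweep(sdk_root, source="hello.c"):
    """Compile `source` with every wasi-sdk-* directory under sdk_root.

    Returns a dict mapping the SDK directory name to the size in bytes
    of the .wasm file it produced.
    """
    sizes = {}
    for sdk_dir in sorted(Path(sdk_root).glob("wasi-sdk-*")):
        output = Path(f"O3_hello_{sdk_dir.name}.wasm")
        subprocess.run(build_command(sdk_dir, source, output), check=True)
        sizes[sdk_dir.name] = output.stat().st_size
    return sizes
```

Calling sweep on the folder holding the SDKs gives a size per version, and percent_increase on two neighbouring entries shows where the jump happens.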
Has the code significantly changed?
My first concern was that the library size had jumped for some reason. I knew the change had occurred in 23, but I wanted to know what the latest WASI-SDK was doing. To do this I focused on the changes between 22, the last small binary, and 25, the latest SDK and largest binary. A really quick way to check the resulting code is to use the wasm2wat tool, which is part of the WABT project. It converts the binary into a human-readable text format.
The resulting file sizes are very similar. Because of this, it looked unlikely that a code change was the real culprit. So, before going any further, I converted these files back to *.wasm files using the wat2wasm tool. This allowed me to confidently see the compiled code size. It left me with the following:
If you look at the O3_hello_22_from_wat.wasm and O3_hello_25_from_wat.wasm files you will see that they are within 15 bytes of each other. This rules out a code change as the root cause of the *.wasm file size increase. It must, therefore, be something else which is present in the *.wasm files.
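The round trip through the text format can be scripted. The sketch below assumes wasm2wat and wat2wasm from WABT are on the PATH and uses their -o option for the output file. The _from_wat suffix follows the file names used in this post, and the helper names are my own.

```python
import subprocess
from pathlib import Path


def roundtrip_commands(wasm_path):
    """Return the wasm2wat and wat2wasm command lines for one module."""
    wasm_path = Path(wasm_path)
    wat_path = wasm_path.with_suffix(".wat")
    rebuilt = wasm_path.with_name(wasm_path.stem + "_from_wat.wasm")
    return [
        ["wasm2wat", str(wasm_path), "-o", str(wat_path)],
        ["wat2wasm", str(wat_path), "-o", str(rebuilt)],
    ]


def roundtrip_sizes(wasm_path):
    """Run both tools and return (original_bytes, rebuilt_bytes)."""
    commands = roundtrip_commands(wasm_path)
    for command in commands:
        subprocess.run(command, check=True)
    rebuilt = Path(commands[1][-1])
    return Path(wasm_path).stat().st_size, rebuilt.stat().st_size
```

Comparing the two numbers returned by roundtrip_sizes for each SDK version separates the size of the code itself from everything else in the file.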
Introducing Sections
A *.wasm file is made up of a set of sections, some of which are compulsory, like the code, and some of which are optional. The optional sections are referred to as “custom sections”.
There is a tool called wasm-objdump, also part of WABT, which can show us which sections are present in a *.wasm file. We can run this on our 4 files: the original and wat2wasm versions of our WASI-SDK 22 and 25 binaries.
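To make the section layout concrete, here is a minimal sketch of a section lister. It is not a replacement for wasm-objdump and it does no validation beyond the magic number. It relies on the binary format's layout: an 8-byte header, then a sequence of sections, each with a one-byte id, a LEB128-encoded size and a payload. Custom sections have id 0 and start their payload with a length-prefixed name.

```python
SECTION_NAMES = {
    1: "type", 2: "import", 3: "function", 4: "table", 5: "memory",
    6: "global", 7: "export", 8: "start", 9: "element", 10: "code",
    11: "data", 12: "datacount",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return a list of (section_id, name, payload_size) tuples."""
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic number and the 4-byte version
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_uleb128(data, pos + 1)
        if section_id == 0:
            # Custom section: the payload begins with its own name.
            name_len, name_start = read_uleb128(data, payload_start)
            name = data[name_start:name_start + name_len].decode("utf-8")
        else:
            name = SECTION_NAMES.get(section_id, f"unknown({section_id})")
        sections.append((section_id, name, size))
        pos = payload_start + size
    return sections
```

Reading a file with open(path, "rb").read() and passing the bytes to list_sections gives one tuple per section, so any entry with id 0 is a custom section.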
WASI-SDK 22 Sections versus WASI-SDK 25 Sections
Interesting. As you can see, there are a lot more custom sections in the WASI-SDK 25 binary than in the WASI-SDK 22 one. Since the size jump occurred in WASI-SDK 22 -> WASI-SDK 23, let’s double check what’s in WASI-SDK 23:
Checking the WAT2WASM files
The files we converted to WebAssembly’s human-readable format (WAT) and then converted back to WASM were small. Let’s see what they contained:
These small files contained only compulsory sections; all the other sections were removed.
Custom Sections Investigation
I know you, dear reader, are thinking the same thing: “what do all these custom sections do?”. A big thanks to Daniel Mangum, who has a great blog describing the internal structure of a wasm file. This was really useful in understanding some of the custom sections.
- The target_features section was introduced in 2019. It provides additional information that allows linkers to correctly link binaries.
- The name section is defined in the WebAssembly specification and is intended to provide additional human-readable information which can be used when debugging, or when converting the Wasm code into human-readable formats.
- The producers section details the source language and compiler that produced the WebAssembly module.
- The .debug_* sections are the result of an effort to add DWARF debugging information to WebAssembly binaries.
As you can see, WASI-SDK 23 introduced DWARF debug information, and indeed this is mentioned in PR #422 in WASI-SDK 23’s release notes: “Add DWARF debugging information to all artifacts by default”. So why is this debug information appearing by default, and what would you normally expect a compiler toolchain to do?
DWARF Debugging Information
It is important to understand developer expectations and how this behavior differs from them. To do this, let’s make a quick comparison between what happens with DWARF information when compiling native binaries, and what happens when building *.wasm binaries.
Native Experience with Debug Information
The DWARF debug information is typically only placed in a compiled binary when it is requested; it needs to be enabled by specifying -g on the clang command line. We can compile the same hello.c file to a native ELF binary using clang, then inspect what DWARF information is contained within it as follows:
As you can see, no DWARF information is included. If we would like to include debug information, we can use the -g switch as follows:
WebAssembly Experience with Debug Information
Without WASI
For the sake of simplicity, let’s start with a very simple no-wasi.c file.
// no-wasi.c
// A simple c file which doesn't use WASI at all
#ifdef __wasm__
#define WASM_IMPORT(A, B) __attribute__((__import_module__((A)), __import_name__((B))))
#define WASM_EXPORT(A) __attribute__((export_name(A)))
#else
#define WASM_IMPORT(A, B)
#define WASM_EXPORT(A)
#endif
WASM_EXPORT("add") int add(int a, int b) {
    return a + b;
}
This file contains a single function. Let’s compile it to WebAssembly using WASI-SDK 25 as follows:
It produces a very small *.wasm binary with no debug information. Let’s see if enabling the debug information with the -g switch works:
The file size increases and we do get debug information. It should be possible to explicitly ask clang to remove all debug information via the -g0 switch as follows:
This returns the file to its original 326 bytes.
With WASI
Now what happens if we use our hello.c and link against the WASI-SDK? First, let’s compile without debug information, so we’ll exclude the -g flag:
Ah, this appears to include the debug information, even when we didn’t want it. Let’s try this again with the -g0 flag to turn this off:
This doesn’t appear to make any difference to the resulting binary. It appears to always contain debug information. The WASI-SDK, in ignoring these switches, is behaving abnormally.
How WASI-SDK Behaves
The latest versions of WASI-SDK effectively turn on -g, so debug information is always provided, and they have removed the ability to produce a release-like build using the usual -g / -g0 compiler switches. This only applies when you link against the WASI LibC. If you compile without the standard library, which is similar to just using the compiler without the “sysroot”, then everything works as expected.
The Release Notes for WASI-SDK 23 - PR 422
The release notes for WASI-SDK 23 include PR 422.
PR 422 explains that it turns on the -g switch for all WASI generated artifacts, along with the following note:
The main downside to this is binary size of generated artifacts will, by default, be larger. Stripping debug information from an artifact though involves removing custom sections which is generally pretty easy to do through wasm tooling.
Removing Unwanted Custom Sections
There are three ways in which unwanted sections can be removed, one during compilation and two post compilation. Let’s cover the post compilation methods first.
Post compilation
After you’ve built your *.wasm file you can strip the custom sections using these two methods:
- wasm2wat and wat2wasm: As discussed earlier in the article, by extracting the code to the human-readable WAT format you eliminate the custom sections. You can then convert this back to a *.wasm file using wat2wasm.
- wasm-strip: This tool, which is part of WABT like wasm2wat and wat2wasm, allows you to simply remove the custom sections in one pass.
The image above shows the impact of wasm-strip removing the custom sections and reducing the *.wasm file from nearly 50kb to under 4kb.
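To show what removing the custom sections actually involves, here is a small sketch that does it directly on the bytes. It is an illustration of the idea rather than a reimplementation of wasm-strip: it copies the 8-byte header and every section whose id is not 0, and drops the rest. Like the tools above, it also throws away the name and producers sections, not just the .debug_* ones.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def strip_custom_sections(data):
    """Return a copy of the module with every custom section removed."""
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    out = bytearray(data[:8])  # keep the magic number and version
    pos = 8
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_uleb128(data, pos + 1)
        end = payload_start + size
        if section_id != 0:
            # Standard section: copy id, size and payload unchanged.
            out += data[pos:end]
        pos = end
    return bytes(out)
```

Writing the returned bytes to a new file and comparing sizes with the original shows how much of the module was made up of custom sections.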
During Compilation
As I’ve already mentioned, the -g and -g0 compiler switches are ineffective with WASI-SDK. Instead, there is a linker option you can pass through clang to strip the custom sections: -Wl,--strip-all (note the capital W), or alternatively -Wl,--strip-debug, which removes only the debug information.
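If your build is driven from a script, the linker option slots into the clang command line like this. It is a sketch only: the helper name is made up, and the clang path in the usage note is an assumed install location for the SDK, so adjust it to wherever yours lives.

```python
import subprocess


def strip_build_command(clang, source, output, strip="all"):
    """Return a clang command line that strips sections at link time.

    strip="all" passes --strip-all to the linker and strip="debug"
    passes --strip-debug. Both go through clang's -Wl, prefix.
    """
    if strip not in ("all", "debug"):
        raise ValueError("strip must be 'all' or 'debug'")
    return [clang, "-O3", f"-Wl,--strip-{strip}", source, "-o", output]


def build_stripped(clang, source, output, strip="all"):
    """Run the build and raise if clang reports an error."""
    subprocess.run(strip_build_command(clang, source, output, strip), check=True)
```

For example, build_stripped("/opt/wasi-sdk/bin/clang", "hello.c", "hello.wasm") would produce a stripped hello.wasm, assuming the SDK is installed at that path.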
Conclusion
The WASI-SDK is not following compiler toolchain convention: it turns on debug output by default. This forces developers to remove the debug information explicitly, using post-compilation steps or specific additional compilation flags. For this, wasm-strip is your friend.