I should subtitle this post “That’s not a pointer… This is a Pointer!”. In the previous two posts I showed how to import and export functions between Wasm and the host environment. This seems pretty safe, but it can open up an attack vector which extern_ref shuts down. Let’s check that out…

The Attack

One of the most common attacks is host pointer, or reference, manipulation. Let me explain. Let’s assume that you open a file with a call to fopen that gives you back a file descriptor, which typically is just an integer. An attacker could change that number and then call fread, hoping that whoever implemented the host functions doesn’t validate the file descriptor. If the attack succeeds, the attacking code can read files it never opened.
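To make the attack concrete, here is a minimal sketch in plain C. It is not taken from any real host: the file table, naive_read and attack are all hypothetical names, and the "files" are just strings. It only shows how little work a forged integer handle needs when the host trusts it.

```c
#include <stddef.h>

/* Hypothetical host-side file table: a handle is just an index into it. */
#define MAX_FILES 4

static const char *host_files[MAX_FILES] = {
    "public notes",   /* handle 0: the file the guest legitimately opened */
    "private keys",   /* handle 1: a file opened by somebody else */
    NULL,
    NULL
};

/* Naive read: checks the range, but trusts that the handle belongs to the caller. */
const char *naive_read(int handle) {
    if (handle < 0 || handle >= MAX_FILES) {
        return NULL;
    }
    return host_files[handle];
}

/* The attack: take a legitimate handle and simply add one to it. */
const char *attack(int legit_handle) {
    int forged = legit_handle + 1;
    return naive_read(forged);
}
```

Calling attack(0) hands back the contents behind handle 1, even though the caller only ever opened handle 0.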

The Mitigation - Introducing extern_ref

To mitigate this, most code does a whole bunch of checking to ensure that the value it has been handed is one it originally gave out. This is OK, but it’s a lot of work. What if we could prevent the code that called fopen, or any other host function, from modifying the value we give it? This is what extern_ref provides.
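The "whole bunch of checking" usually boils down to remembering who was given each handle. The sketch below is one hypothetical way to do that in C; the names checked_open and checked_read, the guest_id parameter and the fixed-size table are all assumptions made for illustration, not part of any real runtime.

```c
#include <stddef.h>

#define MAX_FILES 4
#define NO_OWNER  (-1)

/* Each slot remembers which guest opened it. */
typedef struct {
    int owner;             /* id of the guest that opened the entry */
    const char *contents;  /* stand-in for the real file state */
} file_entry;

static file_entry files[MAX_FILES] = {
    { NO_OWNER, NULL }, { NO_OWNER, NULL },
    { NO_OWNER, NULL }, { NO_OWNER, NULL }
};

/* Open: claim the first free slot and return its index as the handle. */
int checked_open(int guest_id, const char *contents) {
    for (int i = 0; i < MAX_FILES; i++) {
        if (files[i].owner == NO_OWNER) {
            files[i].owner = guest_id;
            files[i].contents = contents;
            return i;
        }
    }
    return -1; /* table full */
}

/* Read: reject out-of-range handles and handles owned by another guest. */
const char *checked_read(int guest_id, int handle) {
    if (handle < 0 || handle >= MAX_FILES) {
        return NULL;
    }
    if (files[handle].owner != guest_id) {
        return NULL;
    }
    return files[handle].contents;
}
```

This works, but every host function that accepts a handle has to repeat the same checks, which is exactly the burden described above.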

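extern_ref (spelled externref in the Wasm text format, and __externref_t in clang) is a new data type which cannot be modified by any code compiled to Wasm. It is an opaque reference: the host supplies a value to the Wasm code, and the Wasm code can pass it around and hand it back but can never change it. This is enforced by both the compiler and the runtime.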

Using it can be a little tricky as there are not many examples of this around. So in this post I’ll quickly cover how to use extern_ref and provide you with a C example. The code for this is on GitHub: woodsmc/wasm_examples (a repository of examples to follow withbighair’s blog). But before we go any further I’d better explain tables.

Introducing Wasm Tables

WebAssembly is a Harvard machine, not a Von Neumann machine. If you know what those are then that’ll make perfect sense. If you don’t, I might as well be speaking Dutch. Note: I can’t speak Dutch, so here is the quick and dirty explanation.

A Trip Back in Time

Just after the Second World War, two researchers in the US started looking at how computers should be structured, and two competing architectures appeared. The big question at the time was: how should code and data be stored?

Howard Aiken at Harvard University came up with the idea of storing the data separately from the code. Meanwhile, John Von Neumann at the University of Pennsylvania promoted the idea that data and code should all be kept in the same memory store, and that we could treat code as a special kind of data.

[Figure: Harvard and Von Neumann architectures compared]

Von Neumann’s ideas won out and most modern computers, certainly all PCs, are Von Neumann machines. This allows data and code to share the same space (RAM), which in turn allows us to do cool things like take pointers (which would normally point to data) and point them at code to create function pointers.

Back to Wasm


WebAssembly is not a Von Neumann machine; it is instead a Harvard machine. This means that it should not be possible to create pointers to functions, since those functions live in a completely different store to the data. C, and hence most languages we use today, use pointers to functions a lot, since they are all designed for a Von Neumann architecture.

The Hidden Table

To get around this limitation the WebAssembly folks created a hidden table, called the function table. It is a hidden construct which the developer usually doesn’t see. Essentially, the functions in a WebAssembly module whose addresses are needed are enumerated and placed into a table. Then, rather than creating a real pointer to a function, the developer gets a pointer object (void* etc.) which actually contains the index of the function in the table. When that function pointer is called, the compiler emits an indirect call which looks up the index and invokes the correct function in the table.

You can see this when you print the address of functions:

#include <stdio.h>
void function_a(void) {
    printf("I'm function %s with address %p\n", __func__, &function_a);
}

void function_b(void) {
    printf("I'm function %s with address %p\n", __func__, &function_b);
}

int main(void) {
    printf("A quick function pointer example, compare this as a native or wasm based application\n");
    #ifdef __wasm__
    	printf("This is WebAssembly's version:\n");
    #else
    	printf("This is the native version:\n");
    #endif // __wasm__

    function_a();
    function_b();
    printf("thanks.... bye..\n");
    return 0;
}

Running this will produce the following:

Native:

I'm function function_a with address 0x5617fdaf6150
I'm function function_b with address 0x5617fdaf6170

Wasm:

I'm function function_a with address 0x1
I'm function function_b with address 0x2
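The tiny Wasm "addresses" (0x1 and 0x2) are what you would expect if a function pointer is really a table index. Below is a rough native C model of that idea. Everything in it is hypothetical (function_table, func_index and call_indirect are illustration names only), and a real Wasm indirect call also checks the function's type signature, which this sketch skips.

```c
#include <stddef.h>

typedef int (*func_t)(void);

static int function_a(void) { return 10; }
static int function_b(void) { return 20; }

/* Slot 0 is left empty in this model, so real functions start at index 1
 * as in the output above. With that layout a null function pointer
 * (index 0) never refers to a valid function. */
static const func_t function_table[] = { NULL, function_a, function_b };

#define TABLE_LEN (sizeof function_table / sizeof function_table[0])

/* A "function pointer" as the guest sees it: just an index. */
typedef unsigned int func_index;

/* Rough analogue of an indirect call: bounds check, empty-slot check,
 * then call. Returns -1 where a Wasm runtime would trap instead. */
int call_indirect(func_index idx) {
    if (idx >= TABLE_LEN || function_table[idx] == NULL) {
        return -1;
    }
    return function_table[idx]();
}
```

In this model, call_indirect(1) runs function_a and call_indirect(2) runs function_b, while index 0 and anything past the end of the table are rejected rather than jumping to an arbitrary address.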

How Tables Relate to extern_ref

The goal is to have a data type that the Wasm code cannot modify, but Wasm applications are going to need some way to store these values for later use. If the value were stored in regular memory, something could come along and try to change it, and it would be very hard to prevent this from happening. Instead, extern_ref re-uses the table concept that Wasm had already introduced for function pointers. That concept is extended so that you can create tables of extern_ref. This is pretty handy, as now when you store your extern_ref in a table it is protected by the compiler and the runtime, which can control what operations happen on it… essentially keeping it safe.


Using extern_ref in C

Now we’ve got the principles out of the way, let’s have a go at using it. I’ll create a small example where our host calls our Wasm code and gives it a function pointer for safekeeping, and then at a later stage asks for it back.

The Wasm C Code for extern_ref

LLVM and the clang compiler have introduced some new data types and builtin functions which allow us to handle an extern_ref and to create and deal with the tables needed to store it. Probably the best way to describe this is to jump into it.

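/* WASM_EXPORT is a helper macro from the example project; if it is not
 * already defined, this is one assumed way to define it with clang. */
#ifndef WASM_EXPORT
#define WASM_EXPORT(name) __attribute__((export_name(name)))
#endif

static __externref_t table[0];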

WASM_EXPORT("set_ref") void set_ref(__externref_t r) {
  if ( __builtin_wasm_table_size(table) == 0 ) {
    __builtin_wasm_table_grow(table, r, 1);
  }
    
  __builtin_wasm_table_set(table, 0, r);
}

WASM_EXPORT("get_ref") __externref_t get_ref() {
  __externref_t retval = __builtin_wasm_table_get(table, 0);

  return retval;
}

You can see in the code above that we create a zero-length array, table[0]; this is the table where we’ll store our extern_ref objects. Clang provides us with a set of builtin functions to manipulate tables, so in the set_ref function we check the size of the table, making sure that we have space for one entry. If we don’t, we grow the table by one entry. Note that the grow call needs an instance of the type we’re storing, which it uses to fill the new entry. Then we store the value in the table.

The get_ref function pulls the value back out of the table and returns it to the caller.
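If you want to step through the same control flow in a native debugger, the sketch below models the table operations in plain C. This is only a model under stated assumptions: model_ref, MODEL_MAX and the model_table_* helpers are hypothetical stand-ins, not clang builtins, and a void pointer is used in place of __externref_t. One detail it does mirror is that growing a table reports the previous size, or -1 on failure, as the Wasm table.grow instruction does.

```c
#include <stddef.h>

#define MODEL_MAX 8               /* arbitrary cap for this model */

typedef void *model_ref;          /* stands in for __externref_t */

static model_ref slots[MODEL_MAX];
static int slot_count = 0;        /* the table starts empty, like table[0] */

static int model_table_size(void) {
    return slot_count;
}

/* Grow by delta entries, each initialised to init.
 * Returns the old size, or -1 if the table cannot grow. */
static int model_table_grow(model_ref init, int delta) {
    if (delta < 0 || slot_count + delta > MODEL_MAX) {
        return -1;
    }
    int old_size = slot_count;
    for (int i = 0; i < delta; i++) {
        slots[slot_count++] = init;
    }
    return old_size;
}

/* A real runtime traps on an out-of-range index; this model ignores the write. */
static void model_table_set(int idx, model_ref r) {
    if (idx >= 0 && idx < slot_count) {
        slots[idx] = r;
    }
}

/* A real runtime traps on an out-of-range index; this model returns NULL. */
static model_ref model_table_get(int idx) {
    return (idx >= 0 && idx < slot_count) ? slots[idx] : NULL;
}

/* Same shape as the Wasm set_ref: grow on first use, then store in slot 0. */
void set_ref(model_ref r) {
    if (model_table_size() == 0) {
        model_table_grow(r, 1);
    }
    model_table_set(0, r);
}

/* Same shape as the Wasm get_ref: read slot 0 back. */
model_ref get_ref(void) {
    return model_table_get(0);
}
```

The important difference is that in this native model nothing stops the code from tampering with the slots array directly, whereas in Wasm the table lives outside linear memory and can only be reached through the table instructions.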

A complete list of the Wasm builtins available to the developer can be found on the LLVM GitHub pages: llvm-project/clang/include/clang/Basic/BuiltinsWebAssembly.def at 81dcbefba3901545d3aef79f7030d45e81e798be · llvm/llvm-project

Compiling our Wasm

We can now compile our Wasm, but we have to supply an additional command line argument to enable extern_ref. This is the -mreference-types argument in the line below:

/opt/wasi-sdk/bin/clang -O3 -nostdlib -Wl,--no-entry -mreference-types -o extern_ref_example.wasm ./extern_ref_example.c

This is going to produce an extern_ref_example.wasm file which is about 283 bytes in size. Again, we can check what was actually produced by running it through the wasm2wat tool:

wasm2wat -o extern_ref_example.wat extern_ref_example.wasm

If you peek into extern_ref_example.wat you’ll see that the Wasm module is now using some table instructions to store our extern_ref object:

(module
  (type (;0;) (func (param externref)))
  (type (;1;) (func (result externref)))
  (func $set_ref (type 0) (param externref)
    block  ;; label = @1
      table.size 0
      br_if 0 (;@1;)
      local.get 0
      i32.const 1
      table.grow 0
      drop
    end
    i32.const 0
    local.get 0
    table.set 0)
  (func $get_ref (type 1) (result externref)
    i32.const 0
    table.get 0)
  (table (;0;) 0 externref)
  (table (;1;) 1 1 funcref)
  (memory (;0;) 2)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (export "memory" (memory 0))
  (export "set_ref" (func $set_ref))
  (export "get_ref" (func $get_ref)))

Now we just need to update our WebAssembly host environment to pass and receive parameters as extern_ref.

Dealing with extern_ref in the host

Again, setting up the host environment can be rather lengthy, so to save time the code to do this is available in the wasm_examples project I created on GitHub. Dealing with extern_ref from the host side is actually rather simple: you just need to specify it as part of the return value or parameters of the function, using the wasm_val_t type that we used in the previous blog post:

wasm_val_t parameters[] = {
    {.kind = WASM_EXTERNREF
    ,.of.ref = ptr}
};

The complete code for invoking the set_ref function then looks like this:

void invoke_set_ref(wasm_module_inst_t module_instance, wasm_exec_env_t exec_env, void* ptr) {
	const char* WASM_FUNCTION = "set_ref";
	printf("look up a function [%s]\n", WASM_FUNCTION);
	wasm_function_inst_t func = wasm_runtime_lookup_function(module_instance, WASM_FUNCTION);
	if (!func) {
		printf("couldn't find function [%s]\n", WASM_FUNCTION);
		return;
	}
	printf(">function found!\n");	
	wasm_val_t parameters[] = {
		{.kind = WASM_EXTERNREF
		,.of.ref = ptr}
	};

	printf(">calling function with parameters(%p)\n", parameters[0].of.ref);
	bool executed_ok = wasm_runtime_call_wasm_a(exec_env, func, 0, NULL, 1, &parameters[0]);
	if (!executed_ok) {
		printf("call failed!\n");
		return;
	}	
}

In the example code linked to above, I get the address of a function, pass it to set_ref, then request it back from get_ref and check that the pointer values match before invoking the function.

	// The good stuff is here.... 
	TFunctionPtr ptr = function_a;

	printf("set %p\n", ptr);
	invoke_set_ref(module_inst, exec_env, ptr);
	TFunctionPtr ptrBack = invoke_get_ref(module_inst, exec_env);
	printf("get %p\n", ptrBack);
	if (ptr == ptrBack) {
		printf("These pointers match!\n");
	} else {
		printf("There was an error!!\n");
	}
	ptrBack();
	// the good stuff stops here...

And that is pretty much it for extern_ref… at least for now. But it does open up the possibility to use these tables in our application code, either to store extern_ref or even to store func_refs.

You see, func_refs refer to the original concept of tables of function pointers. These now become programmable, which means we have an extra level of hidden indirection when we dereference a function pointer. This could be a lot of fun, but that’s got to be a topic for another time.