Emu86: an x86 emulator for teaching assembler

Gene Callahan
NYU Tandon School of Engineering

The motivation for the project

Students in classes on system architecture, operating systems, C, and compilers should understand some assembler.

What do they have to do to get there?

Assembling and linking on a Mac.
(Source: http://www.idryman.org/blog/2014/12/02/writing-64-bit-assembly-on-mac-os-x/)

Oh yeah, and students will have to know things like:

Oh, and when they want to debug, we'll have to teach them:

And these tools, parameters, etc., will differ depending on their laptop OS!

Solution?

Everyone uses a single, web-based interpreted assembly, with built-in debugging!
Emu86!

Making a RESTful interpreter

To make things like code stepping work, we toss a "virtual machine" back and forth from server to client to server.
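
A minimal sketch of that round trip, assuming the virtual machine can be serialized as JSON. The state layout, the tuple-based program format, and the names new_vm and step are hypothetical illustrations, not Emu86's actual code. The server keeps no state between requests: it rebuilds the machine from what the client sends, executes one instruction, and hands the updated machine back.

```python
import json


def new_vm():
    # Hypothetical VM layout: a few registers plus an instruction pointer.
    return {"registers": {"EAX": 0, "EBX": 0}, "ip": 0}


def step(vm_json, program):
    """Rebuild the VM from the client's copy, run one instruction, return it."""
    vm = json.loads(vm_json)
    if vm["ip"] < len(program):
        op, reg, val = program[vm["ip"]]
        if op == "MOV":
            vm["registers"][reg] = val
        elif op == "ADD":
            vm["registers"][reg] += val
        vm["ip"] += 1
    return json.dumps(vm)


# Each call stands in for one request/response cycle.
program = [("MOV", "EAX", 2), ("ADD", "EAX", 3)]
state = json.dumps(new_vm())      # server -> client
state = step(state, program)      # client -> server -> client
state = step(state, program)      # and again for the next step
final_vm = json.loads(state)
```

Because the whole machine travels with every request, "step" is just "run one instruction and return", and no per-user session has to live on the server.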

A clean separation between the interpreter and the interface

A single entry point into the interpreter.

Given the separation of concerns between the interpreter and the interface, we can already run the assembler from an IPython (or plain Python) shell, simply by calling the assemble() function with the code we want to run.

Other interfaces

We are working on making the interpreter an IPython kernel, so we can simply type assembly code straight into the shell.
We could create an assembler microservice that allows any interface one wishes.

Implementation details

Use classes to reflect the language structure.

Domain-driven design

The base class for all tokens (instructions, registers, numbers, etc.)

Make instructions classes.

Other types of tokens are subclassed from Token as well.

The classes for all things addressable.
(Registers and Symbols are locations, but not addresses.)

Our class hierarchy nicely reflects our domain.

This structure makes parsing simple: is the next token we get of the right class to be in that place? In a line of assembler code, the first item must be an instruction: so we can just check if the first token is an instance of Instruction, without worrying about what exact instruction it is. The following items must be operands... again, we can just check if they are instances of the high-level class Operand and not worry about their specific class. If we are trying to move a value, the first operand must be a location (we can't move into a constant!), and we just check if the token is an instance of Location.
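
The checks described above could look roughly like this. The class names Token, Instruction, Operand, Location, and Register follow the talk; Constant, Mov, ParseError, and check_line are hypothetical names introduced for the sketch, and the real Emu86 classes carry more behavior.

```python
class Token:
    """Base class for everything the lexer can produce."""
    def __init__(self, name, val=0):
        self.name = name
        self.val = val


class Instruction(Token):
    pass


class Operand(Token):
    pass


class Location(Operand):
    """An operand that can be written to."""


class Register(Location):
    pass


class Constant(Operand):
    """A literal number: readable, but not writable (hypothetical name)."""


class Mov(Instruction):
    pass


class ParseError(Exception):
    pass


def check_line(tokens):
    """Validate one line by asking only which class each token belongs to."""
    if not tokens or not isinstance(tokens[0], Instruction):
        raise ParseError("a line must start with an instruction")
    ops = tokens[1:]
    for op in ops:
        if not isinstance(op, Operand):
            raise ParseError(f"{op.name} is not an operand")
    # A move needs somewhere to put the value.
    if isinstance(tokens[0], Mov):
        if not ops or not isinstance(ops[0], Location):
            raise ParseError("cannot move into a non-location")
    return True


ok = check_line([Mov("MOV"), Register("EAX"), Constant("5", 5)])
```

Swapping the operands, as in check_line([Mov("MOV"), Constant("5", 5), Register("EAX")]), raises ParseError, because a Constant is an Operand but not a Location.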

These classes let us create the jump table quite easily.

We use this to execute instructions by calling the f() method of whatever the current instruction is:
last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)
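
A minimal sketch of such a jump table, assuming it is a dictionary from mnemonic to an instruction object. The Mov and Inc bodies, the instructions dictionary, and the shape of gd (taken here to be a dictionary holding the registers) are assumptions made for illustration; only the f(ops, gd) call shape and the INSTR and OPS indices come from the line above.

```python
class Instruction:
    def __init__(self, name):
        self.name = name

    def f(self, ops, gd):
        raise NotImplementedError


class Mov(Instruction):
    def f(self, ops, gd):
        dest, src = ops
        gd["registers"][dest] = src
        return self.name


class Inc(Instruction):
    def f(self, ops, gd):
        gd["registers"][ops[0]] += 1
        return self.name


# The jump table: one instance per mnemonic, looked up by name.
instructions = {instr.name: instr for instr in (Mov("MOV"), Inc("INC"))}

INSTR, OPS = 0, 1
gd = {"registers": {"EAX": 0}}
code = [
    (instructions["MOV"], ["EAX", 41]),
    (instructions["INC"], ["EAX"]),
]

# The executor never needs to know which instruction it is running.
for curr_instr in code:
    last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)
```

Adding an instruction then means writing one class and adding one entry to the table; the dispatch loop stays unchanged.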

Use exceptions to make jumps.
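
One way this can work, as a sketch: a jump instruction raises an exception carrying its target label, and the execution loop catches it and moves the instruction pointer. The names FlowBreak, jmp, inc, and run, and the label table, are hypothetical and chosen for this example.

```python
class FlowBreak(Exception):
    """Raised by a jump instruction to interrupt sequential execution."""
    def __init__(self, label):
        super().__init__(label)
        self.label = label


def jmp(ops, vm):
    raise FlowBreak(ops[0])


def inc(ops, vm):
    vm[ops[0]] += 1


def run(code, labels, vm):
    ip = 0
    while ip < len(code):
        func, ops = code[ip]
        try:
            func(ops, vm)
            ip += 1                  # normal case: fall through to the next line
        except FlowBreak as brk:
            ip = labels[brk.label]   # jump: resume at the labelled line
    return vm


code = [
    (inc, ["EAX"]),
    (jmp, ["done"]),
    (inc, ["EAX"]),   # skipped by the jump
    (inc, ["EBX"]),   # the line labelled "done"
]
labels = {"done": 3}
vm = run(code, labels, {"EAX": 0, "EBX": 0})
```

The instruction itself never touches the instruction pointer; only the loop that catches the exception does.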

Using operators as functions.

This was very handy with arithmetic instructions:

Using the functional version of operators.
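
A sketch of the idea using Python's standard operator module, which exposes +, -, *, &, |, and ^ as ordinary functions. The helper names value_of and two_op_arith, the ARITH table, and the register dictionary are assumptions for illustration: one routine handles every two-operand arithmetic instruction, and only the function passed in changes.

```python
import operator


def value_of(operand, registers):
    # In this sketch a string names a register; anything else is a literal.
    return registers[operand] if isinstance(operand, str) else operand


def two_op_arith(ops, registers, func):
    """Apply func to (dest, src) and store the result back in dest."""
    dest, src = ops
    registers[dest] = func(registers[dest], value_of(src, registers))


# Mnemonic -> the function form of the matching Python operator.
ARITH = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "IMUL": operator.mul,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}

registers = {"EAX": 6, "EBX": 3}
two_op_arith(["EAX", "EBX"], registers, ARITH["ADD"])   # EAX = 6 + 3
two_op_arith(["EAX", 2], registers, ARITH["IMUL"])      # EAX = 9 * 2
two_op_arith(["EAX", 0xF], registers, ARITH["XOR"])     # EAX = 18 ^ 15
```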

The role of Django

Right now, it is mostly used for its template capabilities.
But down the road, we could allow users to store source code, as well as storing sample programs.
What else might we store?

Adding JavaScript enhancements

Binary translation
Error messages
Code storage and loading
Because of separation of concerns, the JavaScript code need know nothing about the Python code.

The proof is in the pudding

Lessons

Python is a great language for building "little languages."
There is still "low-hanging fruit."
Emu86 is now in use by ~300 students at NYU.
This is all open source: please feel free to join in the fun!
Our repo: https://github.com/gcallah/Emu86
We still need:
- A data section.
- System calls.
- Better stepping.

Sources

Emu graphic: https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Tasmanian_Emu.jpg/440px-Tasmanian_Emu.jpg