Emu86: an x86 emulator for teaching assembler

Gene Callahan
NYU Tandon School of Engineering

The motivation for the project

Students in classes on system architecture, operating systems, C, and compilers should understand some assembler.

What does it take to get them there?
Assembling and linking on a Mac.
(Source: http://www.idryman.org/blog/2014/12/02/writing-64-bit-assembly-on-mac-os-x/)

Oh yeah, and students will have to know things like:

Setting the stack size
Oh, and when they want to debug, we'll have to teach them:
A typical disassembler

And these tools, parameters, etc., will differ depending on their laptop OS!

Solution?

Everyone uses a single, web-based assembly interpreter with built-in debugging!
Emu86!

Making a RESTful interpreter

To make things like code stepping work, we toss a "virtual machine" back and forth from server to client to server.
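A minimal sketch of the idea, not Emu86's actual code: the whole machine state travels as JSON, so the server keeps nothing between requests. The names new_vm and step, the dict layout, and the tuple-based program format are all assumptions made for illustration.

```python
import json

def new_vm():
    """Return a fresh machine state as a plain, JSON-serializable dict."""
    return {"registers": {"EAX": 0, "EBX": 0}, "ip": 0}

def step(vm_json, program):
    """Execute one instruction and return the updated state as JSON.

    The server remembers nothing between calls: everything it needs
    arrives in vm_json and leaves again in the return value.
    """
    vm = json.loads(vm_json)
    if vm["ip"] >= len(program):
        return json.dumps(vm)  # program finished; state unchanged
    op, reg, value = program[vm["ip"]]
    if op == "MOV":
        vm["registers"][reg] = value
    elif op == "ADD":
        vm["registers"][reg] += value
    else:
        raise ValueError(f"unknown instruction: {op}")
    vm["ip"] += 1
    return json.dumps(vm)

# Hypothetical two-line program, already parsed into tuples.
program = [("MOV", "EAX", 2), ("ADD", "EAX", 3)]

state = json.dumps(new_vm())   # what the client would hold
state = step(state, program)   # first round trip: one instruction
state = step(state, program)   # second round trip: the next one
final_vm = json.loads(state)
```

Because each call is self-contained, single-stepping is simply one request per instruction, with the client sending back the state it received last time.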

A clean separation between the interpreter and the interface
A single entry point into the interpreter.
An IPython session

Given the separation of concerns between the interpreter and the interface, we can run the assembler from an IPython (or plain Python) shell today, simply by calling the assemble() function with the code we want to run.

Other interfaces
  • We are working on making the interpreter an IPython kernel, so we can simply type assembly code straight into the shell.
  • We could create an assembler microservice that allows any interface one wishes.
Implementation details
Use classes to reflect the language structure.

Domain-driven design

The base class for all tokens (instructions, registers, numbers, etc.)
Make instructions into classes.
The base class for all instructions.
Other types of tokens are subclassed from Token as well.
The classes for all things addressable.
(Registers and Symbols are locations, but not addresses.)
Our class hierarchy nicely reflects our domain.

This structure makes parsing simple: is the next token we get of the right class to be in that place? In a line of assembler code, the first item must be an instruction, so we can just check whether the first token is an instance of Instruction, without worrying about which exact instruction it is. The items that follow must be operands. Again, we can just check whether they are instances of the high-level class Operand and not worry about their specific class. If we are trying to move a value, the first operand must be a location (we can't move into a constant!), so we just check whether the token is an instance of Location.
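The checks described above might look roughly like this. It is a sketch under assumptions: the class names Token, Instruction, Operand, Location and Register come from the slides, but Constant, Mov, ParseError, check_line and the exact inheritance chain (Location as a subclass of Operand) are hypothetical stand-ins.

```python
class ParseError(Exception):
    """Raised when a token is of the wrong class for its position."""

class Token:
    def __init__(self, name):
        self.name = name

class Instruction(Token):
    pass

class Operand(Token):
    pass

class Constant(Operand):   # hypothetical name for a numeric literal
    pass

class Location(Operand):   # something a value can be moved into
    pass

class Register(Location):
    pass

class Mov(Instruction):
    pass

def check_line(tokens):
    """Validate one line of tokens purely by class membership."""
    if not tokens or not isinstance(tokens[0], Instruction):
        raise ParseError("a line must begin with an instruction")
    instr, ops = tokens[0], tokens[1:]
    for op in ops:
        if not isinstance(op, Operand):
            raise ParseError(f"{op!r} is not an operand")
    if isinstance(instr, Mov):
        # We can't move into a constant: the destination must be a Location.
        if len(ops) != 2 or not isinstance(ops[0], Location):
            raise ParseError("MOV needs a location as its first operand")
    return instr, ops

instr, ops = check_line([Mov("MOV"), Register("EAX"), Constant("1")])
```

Swapping the two operands, as in check_line([Mov("MOV"), Constant("1"), Register("EAX")]), raises ParseError without the parser ever asking which register or constant it was given.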

These classes let us create the jump table quite easily.
The first few lines of the jump table.

We use this to execute instructions by calling the f() method of whatever the current instruction is:
last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)
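To show how a line like that can work, here is a small self-contained sketch. Only the call shape f(ops, gd) and the index names INSTR and OPS are taken from the slide; the index values, the Mov and Inc classes, the instructions dict, and treating gd as a dict holding the registers are assumptions.

```python
INSTR, OPS = 0, 1  # assumed positions within a parsed line

class Instruction:
    def __init__(self, name):
        self.name = name

    def f(self, ops, gd):
        raise NotImplementedError

class Mov(Instruction):
    def f(self, ops, gd):
        dest, value = ops
        gd["registers"][dest] = value
        return self.name

class Inc(Instruction):
    def f(self, ops, gd):
        gd["registers"][ops[0]] += 1
        return self.name

# The table maps each mnemonic to an instruction object,
# so dispatch needs no chain of if/elif tests.
instructions = {
    "MOV": Mov("MOV"),
    "INC": Inc("INC"),
}

gd = {"registers": {"EAX": 0}}  # hypothetical stand-in for the machine state

curr_instr = (instructions["MOV"], ["EAX", 41])
last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)

curr_instr = (instructions["INC"], ["EAX"])
last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)
```

The execution loop never needs to know which instruction it is running; adding a new one means writing a class and adding one entry to the table.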

Use exceptions to make jumps.
How to jump with exceptions
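One way this can be done, sketched with hypothetical names (FlowBreak, run, the labels dict, and the function-based program format are not taken from the Emu86 source): a jump instruction raises an exception carrying its target label, and the execution loop catches it and resets the instruction pointer.

```python
class FlowBreak(Exception):
    """Raised by a jump instruction; carries the target label."""
    def __init__(self, label):
        super().__init__(label)
        self.label = label

def jmp(ops, vm):
    raise FlowBreak(ops[0])

def inc(ops, vm):
    vm["registers"][ops[0]] += 1

def run(program, labels, vm, max_steps=1000):
    """Run until the program ends or max_steps instructions have executed."""
    ip = 0
    steps = 0
    while ip < len(program) and steps < max_steps:
        func, ops = program[ip]
        try:
            func(ops, vm)
            ip += 1                    # normal case: fall through
        except FlowBreak as brk:
            ip = labels[brk.label]     # jump: continue at the label
        steps += 1
    return vm

program = [
    (inc, ["EAX"]),
    (jmp, ["done"]),
    (inc, ["EAX"]),    # skipped by the jump
    (inc, ["EBX"]),    # the label "done" points here
]
labels = {"done": 3}
vm = run(program, labels, {"registers": {"EAX": 0, "EBX": 0}})
```

The instruction itself stays ignorant of the instruction pointer; only the loop that owns the pointer decides what a jump means. The max_steps cap is there so a backward jump cannot loop forever in this sketch.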
Using operators as functions.

This was very handy with arithmetic instructions:

Using the functional version of operators.
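As an illustration, Python's standard operator module exposes +, -, & and friends as ordinary functions, so one generic routine can serve every two-operand arithmetic instruction. The helper make_arith, the arith table, and the dict of registers are assumptions for this sketch, not the project's code.

```python
import operator

def make_arith(op_func):
    """Build an instruction body from a two-argument operator function."""
    def execute(ops, registers):
        dest, src = ops
        # src may be a register name or an integer literal.
        value = registers[src] if src in registers else src
        registers[dest] = op_func(registers[dest], value)
    return execute

arith = {
    "ADD": make_arith(operator.add),
    "SUB": make_arith(operator.sub),
    "AND": make_arith(operator.and_),
    "OR": make_arith(operator.or_),
    "XOR": make_arith(operator.xor),
}

registers = {"EAX": 6, "EBX": 3}
arith["ADD"](["EAX", "EBX"], registers)   # EAX = 6 + 3
arith["SUB"](["EAX", 4], registers)       # EAX = 9 - 4
after_sub = registers["EAX"]
arith["XOR"](["EAX", "EAX"], registers)   # XOR with itself clears the register
```

Five instructions share one function body; the only thing that varies is which operator function gets passed in.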
The role of Django
Can it help more?
  • Right now, it is mostly used for its template capabilities.
  • But down the road, we could allow users to store source code, as well as storing sample programs.
  • What else might we store?
Adding JavaScript enhancements
  • Binary translation
  • Error messages
  • Code storage and loading
  • Because of the separation of concerns, the JavaScript code needs to know nothing about the Python code.
The proof is in the pudding
The pudding of which we speak

Lessons

  • Python is a great language for building "little languages."
  • There is still "low-hanging fruit."
  • Emu86 is now in use by ~300 students at NYU.
  • This is all open source: please feel free to join in the fun!
    Our repo: https://github.com/gcallah/Emu86
    We still need:
    • A data section.
    • System calls.
    • Better stepping.

Sources

Emu graphic: https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Tasmanian_Emu.jpg/440px-Tasmanian_Emu.jpg