Emu86: an x86 emulator for teaching assembler
The motivation for the project
Students in classes on system architecture, operating systems, C, and compilers should understand some assembler.
What do they normally have to do to learn it?
Oh yeah, and students will have to know things like:
Oh, and when they want to debug, we'll have to teach them:
And these tools, parameters, etc., will differ depending on their laptop OS!
Solution?
Everyone uses a single, web-based interpreted assembly, with built-in debugging!
Emu86!
Making a RESTful interpreter
To make things like code stepping work, we toss a "virtual machine" back and forth from server to client to server.
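A minimal sketch of this round trip, assuming the machine state can be serialized as JSON. The state layout, the `new_vm()` and `step()` helpers, and the toy `INC` instruction are hypothetical illustrations, not Emu86's actual code. The point is that the server keeps nothing between requests: each step receives the whole machine and returns the whole machine.

```python
import json


def new_vm():
    """Return a fresh, JSON-serializable machine state (hypothetical layout)."""
    return {"registers": {"EAX": 0, "EBX": 0}, "ip": 0}


def step(code_lines, vm_json):
    """Run one line of code against the state the client sent us.

    The server holds no state of its own: it decodes the machine,
    executes a single instruction, and hands the machine back.
    Only a toy 'INC reg' instruction is understood here.
    """
    vm = json.loads(vm_json)
    if vm["ip"] < len(code_lines):
        op, reg = code_lines[vm["ip"]].split()
        if op.upper() != "INC":
            raise ValueError(f"unsupported instruction: {op}")
        vm["registers"][reg.upper()] += 1
        vm["ip"] += 1
    return json.dumps(vm)


program = ["INC EAX", "INC EAX", "INC EBX"]
state = json.dumps(new_vm())      # what the client would hold
for _ in program:                 # each pass stands in for one request
    state = step(program, state)
final_state = json.loads(state)
```

Because the full state travels with every request, stepping through code is just a matter of the client sending back what it last received.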
A clean separation between the interpreter and the interface
A single entry point into the interpreter.
Given the separation of concerns between the interpreter and the interface, we can run the assembler from an IPython (or Python) shell today, simply by calling the assemble() function with the code we want to run.
Other interfaces
- We are working on making the interpreter an IPython kernel, so we can simply type assembly code straight into the shell.
- We could create an assembler microservice that allows any interface one wishes.
Implementation details
Use classes to reflect the language structure.
Domain-driven design
Make instructions classes.
Other types of tokens are subclassed from Token as well.
Our class hierarchy nicely reflects our domain.
This structure makes parsing simple: is the next token we get of the right class to be in that place? In a line of assembler code, the first item must be an instruction: so we can just check if the first token is an instance of Instruction, without worrying about what exact instruction it is. The following items must be operands... again, we can just check if they are instances of the high-level class Operand and not worry about their specific class. If we are trying to move a value, the first operand must be a location (we can't move into a constant!), and we just check if the token is an instance of Location.
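The checks described above can be sketched roughly as follows. Token, Instruction, Operand, and Location are the classes named in the text. The `Register`, `IntOp`, `Mov`, `ParseError`, and `check_line()` names are assumptions made for this illustration and may not match the project's own.

```python
class Token:
    def __init__(self, name):
        self.name = name


class Instruction(Token):
    pass


class Operand(Token):
    pass


class Location(Operand):      # anything a value can be stored into
    pass


class Register(Location):     # hypothetical concrete location
    pass


class IntOp(Operand):         # hypothetical constant operand: not a Location
    pass


class Mov(Instruction):       # hypothetical concrete instruction
    pass


class ParseError(Exception):
    pass


def check_line(tokens):
    """Validate one line purely by asking which class each token belongs to."""
    if not tokens or not isinstance(tokens[0], Instruction):
        raise ParseError("a line must start with an instruction")
    for tok in tokens[1:]:
        if not isinstance(tok, Operand):
            raise ParseError(f"{tok.name} is not an operand")
    if isinstance(tokens[0], Mov):
        if len(tokens) != 3:
            raise ParseError("MOV takes two operands")
        if not isinstance(tokens[1], Location):
            raise ParseError("cannot move into a constant")
    return True
```

For example, `check_line([Mov("MOV"), Register("EAX"), IntOp("5")])` passes, while swapping the two operands raises `ParseError`, because a constant is an Operand but not a Location.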
These classes let us create the jump table quite easily.
We use this to execute instructions by calling the f() method of whatever the current instruction is:
    last_instr = curr_instr[INSTR].f(curr_instr[OPS], gd)
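A rough sketch of how such a jump table might be built and used. The f(ops, gd) call shape comes from the line above. Everything else here is an assumption for illustration: the `Inc` and `Mov` classes, the dictionary keyed by mnemonic, and `gd` as a plain dict holding registers.

```python
class Instruction:
    def __init__(self, name):
        self.name = name

    def f(self, ops, gd):
        raise NotImplementedError


class Inc(Instruction):
    def f(self, ops, gd):
        gd["registers"][ops[0]] += 1


class Mov(Instruction):
    def f(self, ops, gd):
        dest, value = ops
        gd["registers"][dest] = value


# The jump table: one instance per instruction, looked up by mnemonic.
instructions = {instr.name: instr for instr in (Inc("INC"), Mov("MOV"))}

INSTR, OPS = 0, 1                      # positions within a parsed line
gd = {"registers": {"EAX": 0}}         # stand-in for the machine's global data

program = [("MOV", ["EAX", 41]), ("INC", ["EAX"])]
for name, ops in program:
    curr_instr = (instructions[name], ops)
    # No if/elif chain: the instruction object knows how to execute itself.
    curr_instr[INSTR].f(curr_instr[OPS], gd)
```

Adding a new instruction then means writing one class and registering one instance. The execution loop itself does not change.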
Use exceptions to make jumps.
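One way this idea could look, as a sketch under stated assumptions: a jump instruction raises an exception carrying its target label, and the execution loop catches it and moves the instruction pointer. The `Jump` class, the `run()` loop, and the toy `jump_if_nonzero` instruction (which tests a register directly rather than a flag, unlike real x86) are hypothetical.

```python
class Jump(Exception):
    """Raised by a jump instruction; carries the label to continue from."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def dec(ops, gd):
    gd["registers"][ops[0]] -= 1


def jump_if_nonzero(ops, gd):
    reg, label = ops
    if gd["registers"][reg] != 0:
        raise Jump(label)


def run(program, labels, gd, max_steps=1000):
    """Execute until the program ends; return how many instructions ran."""
    ip = 0
    steps = 0
    while ip < len(program) and steps < max_steps:
        func, ops = program[ip]
        steps += 1
        try:
            func(ops, gd)
            ip += 1                    # normal case: fall through to next line
        except Jump as jump:
            ip = labels[jump.label]    # jump case: continue at the label
    return steps


gd = {"registers": {"ECX": 3}}
labels = {"loop": 0}
program = [(dec, ["ECX"]), (jump_if_nonzero, ["ECX", "loop"])]
steps = run(program, labels, gd)
```

The instruction itself never touches the instruction pointer. It only signals that control should move, and the loop decides how.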
Using operators as functions.
This was very handy with arithmetic instructions:
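As an illustration, Python's standard `operator` module exposes the arithmetic and bitwise operators as ordinary functions, so a family of two-operand instructions can share one implementation. The `ARITH` table, `value_of()`, and `execute()` below are hypothetical names for this sketch, not the project's own.

```python
import operator

# Each mnemonic maps to the plain function that does its arithmetic.
ARITH = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "IMUL": operator.mul,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def value_of(operand, regs):
    """A string names a register; anything else is treated as a constant."""
    return regs[operand] if isinstance(operand, str) else operand


def execute(name, ops, gd):
    """Apply a two-operand arithmetic instruction, storing into the first operand."""
    dest, src = ops
    regs = gd["registers"]
    regs[dest] = ARITH[name](regs[dest], value_of(src, regs))


gd = {"registers": {"EAX": 6, "EBX": 7}}
execute("IMUL", ["EAX", "EBX"], gd)    # EAX = 6 * 7
after_mul = gd["registers"]["EAX"]
execute("SUB", ["EAX", 2], gd)         # EAX = 42 - 2
after_sub = gd["registers"]["EAX"]
execute("XOR", ["EAX", "EAX"], gd)     # EAX = 0
```

Only the operator function differs between these instructions, so the shared logic is written once.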
The role of Django
- Right now, it is mostly used for its template capabilities.
- But down the road, we could allow users to store source code, as well as storing sample programs.
- What else might we store?
Adding JavaScript enhancements
- Binary translation
- Error messages
- Code storage and loading
- Because of separation of concerns, the JavaScript code need know nothing about the Python code.
The proof is in the pudding
Lessons
- Python is a great language for building "little languages."
- There is still "low-hanging fruit."
- Emu86 is now in use by ~300 students at NYU.
This is all open source: please feel free to join in the fun!
Our repo: https://github.com/gcallah/Emu86
We still need:
- A data section.
- System calls.
- Better stepping.
Sources
Emu graphic: https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Tasmanian_Emu.jpg/440px-Tasmanian_Emu.jpg