VM(s) + Repository = MagLev
MagLev is a new Ruby VM with built-in object persistence. GemStone has recently released a public alpha of MagLev. I’m going to give a quick overview of how the VM and the object Repository work to serve up Ruby objects.
The Virtual Machine
MagLev is a Virtual Machine (VM) based implementation of Ruby. There are several other VM based Rubies, including Ruby 1.9.x, Rubinius and JRuby. A Ruby VM is a program that provides an environment to instantiate Ruby objects and invoke methods on them. The VM typically provides a host of other services: memory allocation, garbage collection, managing threads and other runtime structures. But without concrete objects to work on, there is little the VM can do on its own.
The Repository
So, where do objects come from? They are constructed by calling new, e.g.:
a_new_object = Object.new
But that raises the question of where all of the objects needed to execute that simple expression came from. Among the objects needed to execute a_new_object = Object.new are:
- the Object class
- Object’s class, Class
- Kernel, which is mixed into Object
- Module, since Kernel’s class is Module
- and other objects as well (e.g., MagLev has a pure Ruby parser, which is made of a bunch of objects that also need a home).
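The relationships in the list above can be checked from any Ruby interpreter using ordinary reflection. This is a minimal sketch that uses only core methods and assumes nothing MagLev-specific; the exact ancestor list printed at the end can vary with the Ruby version and loaded libraries.

```ruby
# Inspect the objects that must already exist before Object.new can run.
relationships = {
  "Object.class"            => Object.class,            # Object's class
  "Kernel.class"            => Kernel.class,            # Kernel's class
  "Object.include?(Kernel)" => Object.include?(Kernel), # Kernel is mixed into Object
  "Class.superclass"        => Class.superclass         # Class itself derives from Module
}

relationships.each { |expr, value| puts "#{expr} => #{value.inspect}" }

a_new_object = Object.new
# The method lookup chain for the new object; it includes Object and Kernel.
puts a_new_object.class.ancestors.inspect
```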
While all Ruby VMs allow the creation of new objects, MagLev distinguishes itself from the other VMs in how it provides its pre-defined objects. For MagLev, all persistent objects (which includes all of the Ruby “core” classes and modules) are stored in something called the repository.
The default Ruby repository is just a large file: $MAGLEV_HOME/data/maglev/extent/extent0.ruby.dbf [1]. The repository contains all of the (persistent) objects, classes and methods known to the system at VM startup. All of the Ruby objects in the repository have previously been parsed and compiled from some .rb file and saved.
A MagLev VM starts with an empty object memory, and goes through a process similar to what the Java Virtual Machine (JVM) class loader does when it initializes its object memory. Whenever code references an object that is not already loaded into the VM, the VM asks the repository for that object. If the repository has it, the object is read off disk and placed into VM object memory, where it becomes a live object.
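The load-on-first-reference idea can be sketched as a toy model in plain Ruby. ToyRepository and ToyVM are hypothetical names invented for this illustration; they are not MagLev classes, and the real VM's loading mechanism is certainly more involved than a hash lookup.

```ruby
# Toy model only: ToyRepository and ToyVM are hypothetical, not MagLev classes.
class ToyRepository
  attr_reader :reads

  def initialize(objects)
    @objects = objects  # stands in for the objects stored on disk
    @reads = 0          # counts simulated disk reads
  end

  def fetch(name)
    @reads += 1
    @objects.fetch(name) { raise KeyError, "no persistent object named #{name}" }
  end
end

class ToyVM
  def initialize(repository)
    @repository = repository
    @object_memory = {}  # starts empty, like a freshly started VM
  end

  # Return the live object, asking the repository only on first reference.
  def reference(name)
    @object_memory[name] ||= @repository.fetch(name)
  end

  def loaded
    @object_memory.keys
  end
end

repo = ToyRepository.new({ Object: "core class Object",
                           Kernel: "core module Kernel",
                           String: "core class String" })
vm = ToyVM.new(repo)

vm.reference(:Object)
vm.reference(:Object)  # second reference is served from object memory
vm.reference(:Kernel)

puts vm.loaded.inspect  # [:Object, :Kernel]
puts repo.reads         # 2
```

Note that String is never loaded in this run, because nothing referenced it.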
Until an object is in the VM’s object memory, it is not active. However, not all objects that live in the VM’s memory are persistent. Many objects live only briefly in the VM and are then garbage collected. They never make it into the repository. For example, the object a_new_object created previously lives only in the VM’s memory, as we have not asked MagLev to save it to the repository. If the VM exits, then a_new_object will be lost. If you want a persistent object, you’ll need to save it (or its state) to some persistent store. For most Rubies, this means saving to an RDBMS via ActiveRecord or some other ORM, or marshaling to the file system. MagLev provides the repository as a built-in alternative for storing long-lived objects.
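For comparison, here is roughly what “marshaling to the file system” looks like in a standard Ruby, using only Marshal and Tempfile from the standard library. The settings hash is made up for the example. Note that what comes back is a copy with the same state, not the original object, and that Marshal.load should only be used on data you trust.

```ruby
require "tempfile"

# A transient object: it lives only in this process's memory.
settings = { "name" => "example", "retries" => 3 }

restored = Tempfile.create("settings") do |file|
  file.binmode
  file.write(Marshal.dump(settings))  # serialize the object's state to bytes
  file.flush

  # A later process would start here: read the bytes and rebuild the object.
  Marshal.load(File.binread(file.path))
end

puts restored == settings       # true: same state
puts restored.equal?(settings)  # false: a copy, not the same object
```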
Saving an object to the repository is the topic of a future post, but here’s a preview:
Maglev::PERSISTENT_ROOT[:my_object] = Hash.new
Maglev.commit_transaction
and then, from the same VM or a different one:
p Maglev::PERSISTENT_ROOT[:my_object] # => {}
For another example, see the hat trick.
Bringing it together
So, when the MagLev VM runs some Ruby code, it pulls objects out of the repository, puts them into memory, and then calls their methods (which were also brought out of the repository with the object [2]). Those methods may create more objects, which might be temporary objects or might be saved to the MagLev repository.
Shared Repository and the Stone
A MagLev repository can be shared by several VMs. This means that each VM connected to a given repository sees the same instances of persistent objects. If VM_1 and VM_2 are sharing a repository, then new objects, classes and methods created and saved by VM_1 will also be available to VM_2.
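The visibility rule can be illustrated with another toy model in plain Ruby. SharedRepository and SharingVM are hypothetical names for this sketch, not MagLev classes, and the model deliberately ignores transaction views, conflicts and caching; it only shows that an object becomes visible to a second VM once the first VM has saved it.

```ruby
# Toy model only: SharedRepository and SharingVM are hypothetical, not MagLev classes.
class SharedRepository
  def initialize
    @committed = {}
  end

  def read(key)
    @committed[key]
  end

  def commit(changes)
    @committed.merge!(changes)
  end
end

class SharingVM
  def initialize(repository)
    @repository = repository
    @pending = {}  # objects created in this VM but not yet saved
  end

  def []=(key, value)
    @pending[key] = value
  end

  # Look in this VM's unsaved objects first, then in the shared repository.
  def [](key)
    @pending.fetch(key) { @repository.read(key) }
  end

  def commit
    @repository.commit(@pending)
    @pending = {}
  end
end

repo = SharedRepository.new
vm_1 = SharingVM.new(repo)
vm_2 = SharingVM.new(repo)

vm_1[:my_object] = { "greeting" => "hello" }
before = vm_2[:my_object]  # not saved yet, so vm_2 cannot see it
vm_1.commit
after = vm_2[:my_object]   # saved, so vm_2 now sees it

puts before.inspect     # nil
puts after["greeting"]  # hello
```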
If a repository were used by only a single VM, then the VM could simply read and write repository data directly. Since MagLev repositories are shared, there needs to be coordination among the VMs to ensure the repository state remains consistent. MagLev VMs don’t access the repository directly; they go through a distributed cache, with coordination provided by the Stone. The Stone is a collection of processes that manage a given repository (the MagLev VMs are called “gems” and they talk to the “stone”, hence “GemStone”).
Among the services provided by the Stone processes are:
- transaction monitoring: coordination among all the VMs to maintain ACID access to the repository.
- disk-based garbage collection: clean up objects that are no longer referenced by any other persistent object in the repository.
- distributed cache: allows efficient sharing and access to the objects in the repository.
So, before you can successfully run Ruby code in MagLev, you need to start a Stone. You can start a Stone with rake:
$ cd $MAGLEV_HOME
$ rake maglev:start # starts a stone named "maglev"
$ rake
(in /Users/pmclain/GemStone/snapshots/MagLev-2009-11-30)
Status  Version  Owner    Pid    Port   Started       Type   Name
------  -------  -------  -----  -----  ------------  -----  ----
OK      3.0.0    pmclain  50923  58892  Dec 02 09:25  Stone  maglev
OK      3.0.0    pmclain  50924  58884  Dec 02 09:25  cache  maglev@cairo.gemstone.com
The stone is a long-lived process. You can stop it with:
$ rake maglev:stop
(in /Users/pmclain/GemStone/snapshots/MagLev-2009-11-30)
stopstone[Info]: initiating "maglev" shutdown...
stopstone[Info]: Stone repository monitor "maglev" has been stopped.
When you type maglev-ruby, it starts a MagLev VM. If the VM cannot connect to the stone, it reports an error:
$ maglev-ruby -e 'puts "Hello"'
maglev-ruby: [Error] The MagLev server "maglev" is not running.
To fix this run "rake maglev:start" before running maglev-ruby.
You can think of the Stone as the analogue of mysqld, or the Oracle DB Server. The VM will not have any Ruby objects to work on until it connects to the Stone and gets objects from there.
Review
In MagLev, persistent objects, including the core Ruby objects, are stored on disk in a repository. A MagLev VM starts off with no Ruby objects, and must connect to a Stone to get its core Ruby objects. The VM even has to request core objects such as Object, Kernel and String from the Stone.
Once the required core objects are loaded into the VM memory, the Ruby code executes as normal. Newly created objects exist only in the memory of the VM that created them. Writing objects out to the repository requires explicit action by the Ruby code (a topic for another post).
Footnotes
- [1]
- In GemStone, a repository is not limited to a file. It can be spread across a number of “extents”, allowing repositories to be larger than any single disk or filesystem. Extents may be files, or raw disk partitions or a mixture of the two.
- [2]
- MagLev stores methods in the repository as bytecode. When methods are executed for the first time in the VM, they are compiled from bytecode into native code for that machine. This allows MagLev to have a single repository that can serve VMs running on different architectures simultaneously.



Ah, the pieces are finally starting to come together – great post! A couple of questions:
Given that all of the core objects need to be loaded from the stone, what is the startup time like? (Yes, I haven’t built MagLev on my own machine yet.) Is it a concern?
What is the protocol over which the VM and the stone communicate? I’m assuming that at startup you can specify the remote address of the stone?
Architecturally, are you limited to a single stone server? Or can I start up multiple on different nodes and then distribute my objects transparently between all of them? Failover, sharding? Oi, so many questions!
Ilya Grigorik
December 14, 2009 at 4:40 pm
You ask a lot of good questions, but the comments aren’t the place to answer them fully. Quick answers to some of your questions:
Startup time depends on a number of things, including: is the repository local or remote? Are there other VMs on the machine that have already loaded the objects you need (all VMs on a machine share an object cache)? Plus, you don’t have to load all of the core objects, just those you need. Your questions are worth a couple of blog entries, which I hope to get out soon. In the meantime, you can read the manuals from the 2.x GemStone VM here: manuals. Yes, you can specify a particular stone at VM startup (--stone stone_name). In the standard offering, a VM communicates with only one stone.
More to come soon…
pbmclain
December 14, 2009 at 10:33 pm
Ah, the manual will come in handy — will check it out, thanks! Looking forward to future blog posts, this is great stuff.
Ilya Grigorik
December 16, 2009 at 2:43 pm
[...] The following is a blog by one of the MagLev programmers http://maglevity.wordpress.com/2009/12/03/vms-repository-maglev/#more-9 [...]
Ruby con persistencia transparente – MagLev « Administración del Conocimiento usando Smalltalk
December 21, 2009 at 11:04 am
[...] connect to the stone and retrieve all of their data from this service. Ruby classes are stored as bytecode in the stone server, which is transported via shared memory for local connections, and via optimized binary protocol [...]
Distributed Ruby with the MagLev VM - igvita.com
January 15, 2010 at 9:02 am