Maglevity

MagLev, Ruby, …

VM(s) + Repository = MagLev


MagLev is a new Ruby VM with built-in object persistence. GemStone has recently released a public alpha of MagLev.  I’m going to give a quick overview of how the VM and the object Repository work to serve up Ruby objects.

The Virtual Machine

MagLev is a Virtual Machine (VM)-based implementation of Ruby. There are several other VM-based Rubies, including Ruby 1.9.x, Rubinius and JRuby. A Ruby VM is a program that provides an environment to instantiate Ruby objects and invoke methods on them. The VM typically provides a host of other services: memory allocation, garbage collection, and management of threads and other runtime structures. But without concrete objects to work on, there is little the VM can do on its own.

[Figure: A VM with no objects.]

The Repository

So, where do objects come from? They are constructed by calling new, e.g.:

  a_new_object = Object.new

But that raises the question of where all of the objects needed to execute that simple expression came from. Among the objects needed to execute a_new_object = Object.new are:

  • the Object class
  • Object’s class, Class
  • Kernel, which is mixed into Object
  • Module, since Kernel’s class is Module
  • and other objects as well (e.g., MagLev has a pure-Ruby parser, which is made of a bunch of objects that also need a home).
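
The relationships in this list can be checked from any Ruby interpreter. The snippet below is a small illustration using only standard reflection methods; it is not specific to MagLev, and the constant name CORE_FACTS is made up for the example.

```ruby
# Inspect the objects that must already exist before Object.new can run.
CORE_FACTS = {
  object_class:    Object.class,             # Object's class is Class
  kernel_class:    Kernel.class,             # Kernel's class is Module
  kernel_mixed_in: Object.include?(Kernel),  # Kernel is mixed into Object
  class_parent:    Class.superclass          # Class itself inherits from Module
}

CORE_FACTS.each { |name, value| puts "#{name}: #{value}" }
```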

While all Ruby VMs allow the creation of new objects, MagLev distinguishes itself from the other VMs in how it provides its pre-defined objects. For MagLev, all persistent objects (including all of the Ruby “core” classes and modules) are stored in something called the repository.

The default Ruby repository is just a large file: $MAGLEV_HOME/data/maglev/extent/extent0.ruby.dbf [1]. The repository contains all of the (persistent) objects, classes and methods known to the system at VM startup. All of the Ruby objects in the repository have previously been parsed and compiled from some .rb file and saved.

[Figure: VM loads objects from Repository.]

A MagLev VM starts with an empty object memory, and goes through a process similar to what the Java Virtual Machine (JVM) class loader does when it initializes its object memory. Every time there is a reference to an object, and that object is not already loaded into the VM, the VM will ask the repository for that object. If the repository has it, then it is pulled off disk, placed into the VM’s object memory, and becomes a live object.
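
The load-on-first-reference idea can be sketched in plain Ruby. This is a toy model, not MagLev’s implementation: ToyRepository, ToyVM, resolve and the loads counter are all names invented for the illustration, and a Hash stands in for the objects stored on disk.

```ruby
# Toy model of on-demand loading; not MagLev's actual implementation.
# The "repository" is a Hash standing in for objects stored on disk.
class ToyRepository
  def initialize(stored)
    @stored = stored
  end

  # Return a copy of the stored object, or nil if the repository lacks it.
  def fetch(id)
    @stored.key?(id) ? Marshal.load(Marshal.dump(@stored[id])) : nil
  end
end

class ToyVM
  attr_reader :loads

  def initialize(repository)
    @repository = repository
    @memory = {}   # object memory starts empty
    @loads = 0     # number of trips to the repository
  end

  # Resolve a reference: use the in-memory object if present,
  # otherwise ask the repository and keep the result in memory.
  def resolve(id)
    return @memory[id] if @memory.key?(id)

    object = @repository.fetch(id)
    raise KeyError, "no such object: #{id}" if object.nil?

    @loads += 1
    @memory[id] = object
  end
end

repo = ToyRepository.new({ greeting: "hello", numbers: [1, 2, 3] })
vm = ToyVM.new(repo)
vm.resolve(:greeting)   # first reference: loaded from the repository
vm.resolve(:greeting)   # second reference: already in memory
puts vm.loads           # => 1
```

Only the objects that are actually referenced get loaded; :numbers never leaves the toy repository in this run.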

Until an object is in the VM’s object memory, it is not active. However, not all objects that live in the VM’s memory are persistent. Many objects live only briefly in the VM and are then garbage collected; they never make it into the repository. For example, the object a_new_object created previously lives only in the VM’s memory, as we have not asked MagLev to save it to the repository. If the VM exits, then a_new_object will be lost. If you want a persistent object, you’ll need to save it (or its state) to some persistent store. For most Rubies, this means saving to an RDBMS via ActiveRecord or some other ORM, or marshaling to the file system. MagLev provides the repository as a built-in alternative for storing long-lived objects.
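
For comparison, here is roughly what the “marshaling to the file system” option looks like in any Ruby, using the standard Marshal module. The helper names save_object and load_object and the file name settings.bin are made up for this sketch.

```ruby
require "tmpdir"

# Save an object's state to a file so it outlives the process that created it.
def save_object(object, path)
  File.binwrite(path, Marshal.dump(object))
end

# Rebuild an equal object from the saved bytes (possibly in another process).
def load_object(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "settings.bin")   # hypothetical file name
  save_object({ "theme" => "dark", "retries" => 3 }, path)
  p load_object(path)
end
```

Note that the reloaded Hash is an equal copy rather than the same object, and the program has to choose file names and decide when to write.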

Saving an object to the repository is the topic of a future post, but here’s a preview:

  Maglev::PERSISTENT_ROOT[:my_object] = Hash.new
  Maglev.commit_transaction

and then, from the same VM or a different one:

  p Maglev::PERSISTENT_ROOT[:my_object]   # => {}

For another example, see the hat trick.

Bringing it together

So, when the MagLev VM runs some Ruby code, it pulls objects out of the repository, puts them into memory, and then calls their methods (which were also brought out of the repository with the object [2]). Those methods may create more objects, which might be temporary objects or might be saved to the MagLev repository.

Shared Repository and the Stone

A MagLev repository can be shared by several VMs. This means that each VM connected to a given repository sees the same instances of persistent objects. If VM_1 and VM_2 are sharing a repository, then new objects, classes and methods created and saved by VM_1 will also be available to VM_2.
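
A toy model in plain Ruby may help picture the VM_1 / VM_2 scenario. It is only a sketch: SharedRepository and ToySession are invented names, a Hash stands in for the repository, and transaction boundaries and conflict handling are ignored entirely.

```ruby
# Toy model of a shared repository; not MagLev's actual implementation.
class SharedRepository
  def initialize
    @committed = {}
  end

  # Committed objects are visible to every connected session.
  def read(key)
    @committed[key]
  end

  def commit(changes)
    @committed.merge!(changes)
  end
end

# Stands in for one VM connected to the repository.
class ToySession
  def initialize(repository)
    @repository = repository
    @pending = {}   # changes only this session can see
  end

  def [](key)
    @pending.fetch(key) { @repository.read(key) }
  end

  def []=(key, value)
    @pending[key] = value
  end

  # Publish this session's pending changes to the shared repository.
  def commit
    @repository.commit(@pending)
    @pending = {}
  end
end

repo = SharedRepository.new
vm_1 = ToySession.new(repo)
vm_2 = ToySession.new(repo)

vm_1[:my_object] = { "visits" => 1 }
before = vm_2[:my_object]   # nil: vm_1 has not committed yet
vm_1.commit
after = vm_2[:my_object]    # the committed Hash is now visible to vm_2
p before, after
```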

If a repository were used by only a single VM, then the VM could simply read and write repository data directly. Since MagLev repositories are shared, there needs to be coordination among the VMs to ensure the repository state remains consistent. MagLev VMs don’t access the repository directly; they go through a distributed cache, with coordination provided by the Stone. The Stone is a collection of processes that manage a given repository (the MagLev VMs are called “gems” and they talk to the “stone”, hence “GemStone”).

[Figure: VM connects to Repository through Stone processes.]

Among the services provided by the Stone processes are:

  • transaction monitoring: coordination among all the VMs to maintain ACID access to the repository.
  • disk-based garbage collection: clean up objects that are no longer referenced by any other persistent object in the repository.
  • distributed cache: allows efficient sharing and access to the objects in the repository.
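
To make the garbage collection item a little more concrete, here is a toy reachability scan in plain Ruby. It is not the Stone’s algorithm; it assumes that “no longer referenced” means “not reachable from a persistent root”, and the method names and the shape of the objects table are invented for the sketch.

```ruby
require "set"

# Toy reachability scan; not the Stone's actual algorithm.
# `objects` maps an object id to the ids it references.
def reachable_ids(objects, root_id)
  seen = Set.new
  pending = [root_id]
  until pending.empty?
    id = pending.pop
    next unless seen.add?(id)   # add? returns nil when id was already seen
    pending.concat(objects.fetch(id, []))
  end
  seen
end

# Everything stored but not reachable from the root is garbage.
def garbage_ids(objects, root_id)
  objects.keys - reachable_ids(objects, root_id).to_a
end

objects = {
  root: [:a],
  a: [:b],
  b: [],
  orphan: [:b]   # references b, but nothing reachable references orphan
}
p garbage_ids(objects, :root)   # => [:orphan]
```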

So, before you can successfully run Ruby code in MagLev, you need to start a Stone. You can start a Stone with rake:

    $ cd $MAGLEV_HOME
    $ rake maglev:start    # starts a stone named "maglev"
    $ rake
    (in /Users/pmclain/GemStone/snapshots/MagLev-2009-11-30)
    Status  Version    Owner    Pid   Port   Started     Type  Name
    ------ --------- --------- ----- ----- ------------ ------ ----
      OK   3.0.0     pmclain   50923 58892 Dec 02 09:25 Stone  maglev
      OK   3.0.0     pmclain   50924 58884 Dec 02 09:25 cache  maglev@cairo.gemstone.com

The stone is a long-lived process. You can stop it with:

    $ rake maglev:stop
    (in /Users/pmclain/GemStone/snapshots/MagLev-2009-11-30)
    stopstone[Info]: initiating "maglev" shutdown...
    stopstone[Info]: Stone repository monitor "maglev" has been stopped.

When you type maglev-ruby, it starts a MagLev VM. If the VM cannot connect to the stone, it reports an error:

    $ maglev-ruby -e 'puts "Hello"'
    maglev-ruby: [Error] The MagLev server "maglev" is not running.
    To fix this run "rake maglev:start" before running maglev-ruby.

You can think of the Stone as the analogue of mysqld, or the Oracle DB Server. The VM will not have any Ruby objects to work on until it connects to the Stone and gets objects from there.

Review

In MagLev, persistent objects, including the core Ruby objects, are stored on disk in a repository. A MagLev VM starts off with no Ruby objects, and must connect to a Stone to get its core Ruby objects. The VM even has to request core objects such as Object, Kernel and String from the Stone.

Once the required core objects are loaded into the VM memory, the Ruby code executes as normal. Newly created objects exist only in the memory of the VM that created them. Writing objects out to the repository requires explicit action by the Ruby code (a topic for another post).

Footnotes

[1]
In GemStone, a repository is not limited to a single file. It can be spread across a number of “extents”, allowing repositories to be larger than any single disk or filesystem. Extents may be files, raw disk partitions, or a mixture of the two.
[2]
MagLev stores methods in the repository as bytecode. When methods are executed for the first time in the VM, they are compiled from bytecode into native code for that machine. This allows MagLev to have a single repository that can serve VMs running on different architectures simultaneously.

Written by maglevdevelopment

December 3, 2009 at 2:33 pm

Posted in MagLev

5 Responses


  1. Ah, the pieces are finally starting to come together – great post! A couple of questions:

    Given that all of the core objects need to be loaded from the stone, what is the startup time like? (Yes, I haven’t built MagLev on my own machine.) Is it a concern?

    What is the protocol over which the VM and the stone communicate? I’m assuming that at startup you can specify the remote address of the stone?

    Architecturally, are you limited to a single stone server? Or can I start up multiple stones on different nodes and then distribute my objects transparently between all of them? Failover, sharding? Oi, so many questions!

    Ilya Grigorik

    December 14, 2009 at 4:40 pm

    • You ask a lot of good questions, but the comments aren’t the place to answer them fully. Quick answers to some of your questions:

      Startup time depends on a number of things, including: Is the repository local or remote? Are there other VMs on the machine that have already loaded the objects you need (all VMs on a machine share an object cache)? Plus, you don’t have to load all of the core objects, just those you need. Your questions are worth a couple of blog entries, which I hope to get out soon. In the meantime, you can read the manuals from the 2.x GemStone VM here: manuals. Yes, you can specify a particular stone at VM startup (--stone stone_name). In the standard offering, a VM communicates with only one stone.

      More to come soon…

      pbmclain

      December 14, 2009 at 10:33 pm

      • Ah, the manual will come in handy — will check it out, thanks! Looking forward to future blog posts, this is great stuff.

        Ilya Grigorik

        December 16, 2009 at 2:43 pm

  2. […] connect to the stone and retrieve all of their data from this service. Ruby classes are stored as bytecode in the stone server, which is transported via shared memory for local connections, and via optimized binary protocol […]

