We’re happy to announce that Codeship, the hosted Continuous Integration and Deployment platform, has built support for Continuous Deployment to dotCloud.

With Codeship you can test your code and deploy your GitHub and Bitbucket projects. Should your tests fail, Codeship will not deploy your application. Should all your tests pass, Codeship will automatically deploy your app in a matter of minutes.

Continuous Deployment to dotCloud with Codeship

All you need to deploy to dotCloud is your API token. Within two minutes, you can configure Codeship to deploy your app to dotCloud.

All you have to do is:

  • Retrieve your dotCloud API token from your account page
  • Fill in the API token
  • Choose a name for your application

As soon as you’ve configured your deployment, Codeship will deploy your application to dotCloud with every build. The dotCloud command line tool gets installed during deployment and is used to push your app to dotCloud.
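For the curious, here is roughly what that deployment step amounts to, sketched in Python with subprocess. This is not Codeship’s actual implementation; it assumes the dotCloud CLI is installed and already configured with your API token (e.g. via dotcloud setup), and the application name is a placeholder.

    # Hypothetical sketch of the deploy step; not Codeship's real code.
    import subprocess

    APP_NAME = "myapp"  # placeholder: the application name you chose on Codeship

    # Create the application if it doesn't exist yet (ignore failure if it
    # already does), then push the working directory; `push` builds and deploys.
    subprocess.call(["dotcloud", "create", APP_NAME])
    subprocess.check_call(["dotcloud", "push"])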

Have a look at the videos below for a step-by-step introduction to setting up Codeship. Getting started is really easy. Go ahead and give Codeship a try!

If you like this news, consider sharing it!

How to continuously deploy a Django app from GitHub to dotCloud

How to continuously deploy a Django app from Bitbucket to dotCloud

About Codeship

Codeship is a hosted Continuous Integration and Continuous Deployment platform. Be sure to check out the Codeship blog and follow them on Twitter; you will learn about software testing, Continuous Integration, and Continuous Deployment.

I am thrilled to be joining dotCloud as CEO and excited to be joining the talented, passionate and rapidly growing Docker community.

I started following dotCloud in 2011, when the standard PaaS model was to offer a single stack that ran on a single provider’s infrastructure. I was impressed by dotCloud’s vision of a multi-language PaaS, which offered developers a wide variety of different stacks that worked well together. In the process, dotCloud built a great business around public PaaS.

In the past two years, however, it has become clear that the industry has a set of opportunities that even the broadest-based public PaaS can’t address. Developers want to be able to build their applications using an unlimited set of stacks, and run those apps on any available hardware, in any environment. Operators both inside and outside the enterprise want to be able to run applications seamlessly. Almost every enterprise wants its own PaaS-like environment.

In other words, the industry seems to want not just a multi-language PaaS, but a limitless-language, multi-environment, and multi-enterprise PaaS.

Clearly, this is beyond the capabilities of any one organization or solution to deliver. But an ecosystem, with the right open source technology, can deliver this.

So, I was exceptionally impressed when, in March of this year, Solomon Hykes and the dotCloud team took the bold step of releasing much of their core technology as the open source project, Docker. I’ve spent the past three months as an advisor to the Docker project, and have been consistently amazed by both the vision of the team and by the incredible momentum and community that has built up behind Docker. I was so impressed that I decided to come on board full time.

This is the new dotCloud/Docker vision of what PaaS (and software deployment in general) should be:

  1. Developers build their applications using their choice of any available services
  2. An application and its dependencies are packaged into a lightweight container
  3. Containerized applications run anywhere (a laptop, a VM, an OpenStack cluster, the public cloud) without modification or delay

With Docker, developers can finally build once and run virtually anywhere. Operators can configure once, and run virtually anything.

We think this will have huge implications for a wide variety of use cases, from developers shipping code, to continuous integration, to web scale deployment and hybrid clouds. Indeed, most of the biggest trends in IT today (hybrid clouds, scale out architecture, big data) depend on making some version of this vision work.

The community seems to agree. In a little more than four months, we’ve gotten over 4,000 GitHub stars, 30,000 pulls, over 100 significant contributors, and have seen huge numbers of applications getting “Dockerized”. Moreover, we’ve seen some of the largest web companies start to deploy Docker inside their environments. We’ve seen over 100 derivative projects built on top of Docker. And our community has integrated Docker into key open source ecosystem projects like Chef, Puppet, Vagrant, Jenkins, and OpenStack.

So…why am I excited? I’ve been fortunate to build businesses at four successful startups (twice as CEO). I’ve learned there are few things as rewarding as joining a great team and community, using innovative and disruptive technology, and solving wide-ranging and important problems. Combined with great investors, obvious momentum, a sound existing business, and some exciting new business models, I can’t imagine a better place to be than dotCloud and Docker.

With thanks to Solomon, the team at dotCloud, and the whole community, I look forward to the road ahead!

Read the full press release here.

The new dotCloud Sandbox with Docker

As announced, the dotCloud Sandbox has been sunset and we have been working on an open-source project which replicates the dotCloud builder. This project lets you develop and host your dotCloud applications anywhere.

We are releasing it today, and the community can now build, deploy, and run the dotCloud sandbox on top of Docker. The project is named Sandbox, and you can find it on GitHub.

Sandbox takes your application (and its dotcloud.yml) as input, and outputs a Docker image for each service that it knows how to build. The resulting Docker images can be directly started in Docker.
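To make this concrete, here is a minimal dotcloud.yml declaring a single Python code service, and a tiny sketch of how a build tool could enumerate the services it defines. The YAML content is a hypothetical example, and this is not Sandbox’s actual parsing code (it requires PyYAML).

    # A minimal dotcloud.yml with one "python" code service, parsed with PyYAML.
    import yaml

    DOTCLOUD_YML = """\
    www:
      type: python
    """

    # Sandbox produces one Docker image per buildable service.
    for name, service in yaml.safe_load(DOTCLOUD_YML).items():
        print("service %r (type %r) -> one Docker image" % (name, service["type"]))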

Sandbox supports the full build pipeline: it takes your code, unpacks it into a Docker container, installs system packages and application dependencies, configures Supervisor, and generates the environment files. It has been designed to be extensible, so you can easily add support for new service types. Moreover, since it uses Docker, you are no longer limited to Ubuntu 10.04 LTS “Lucid Lynx”: you can build your apps on top of your favorite release of Debian or Ubuntu GNU/Linux.

Note, however, that Sandbox only knows how to build and run “code services”: databases are not implemented. Unlike the dotCloud platform, Sandbox doesn’t do any kind of orchestration; it just builds and runs individual services. Sandbox doesn’t know how to generate credentials for a database and inject them into the environment of another service. This means that the development workflow with Sandbox is a bit different from what you are used to on dotCloud: Sandbox gives you a build system, but you’ll have to deploy your databases and stateful services beforehand.

As an example of how to use this sandbox, you can check out the Flask/ZeroRPC example in the Sandbox repository. Here is the screencast hosted on ascii.io:

dotCloud Sandbox Screencast

When compared to the dotCloud platform, Sandbox has a more limited feature set. But contributing to Sandbox is easy, and if you want to get involved, here are some possible next steps:

  • add more services (right now only python, python-worker and the custom service are supported);
  • add a mechanism to automatically select the base image used to build a service (this would enable incremental builds and a --clean flag like the dotCloud CLI’s).

Development happens on https://github.com/dotcloud/sandbox and on the #docker IRC channel.

About Louis Opter
Louis Opter is a platform engineer at dotCloud. He has been working with us since day one, back in 2009. He’s passionate about systems programming and specializes in Python. He likes to code while listening to music and is a Vietnamese martial arts enthusiast (Tay Son Vo Dao).

Connect with Louis on Twitter! @1opter

Mo' data, mo' problems!

An example graph of the new memory metrics.

A while ago, we published a detailed blog post explaining How to Optimize the Memory Usage of Your Apps. There was a strong emphasis on metrics, because knowing the amount of used and available RAM is not enough when you’re trying to assess whether or not your apps need more memory.

With this in mind, we just released a new version of the dotCloud Dashboard, which exposes more detailed memory metrics. You will now see that the memory allocated to your app is split into four parts: Resident Set Size, Active Page Cache, Inactive Page Cache, and Free Memory. Let’s review what they mean for your apps.

Resident Set Size

That’s essentially the memory used by processes when they malloc() or do anonymous mmap(). This memory is inelastic: it will amount to exactly what your app has been asking for, no more, no less. If your app asks for more than what is available, it will be restarted. If the memory usage was due to a leak or to the occasional odd request, restarting the app will get it back on track. However, if your app constantly needs more of this kind of memory than what is available, it will constantly be restarted, and it will appear to be unstable.
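If you want to cross-check the dashboard from inside your app, the kernel exposes this number directly. Here is a minimal sketch (Linux only); this is not how our metrics pipeline is implemented.

    # Read the resident set size (VmRSS) of a process from /proc (Linux).
    def resident_set_size_kb(pid="self"):
        with open("/proc/%s/status" % pid) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is reported in kB

    print("RSS: %s kB" % resident_set_size_kb())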

We detect out-of-memory conditions, and we report them to you: we send e-mail notifications, and we record them to display them on the dashboard. When you receive those notifications, you should take them very seriously, and scale up your app — or audit your code to reduce your memory footprint.

On the new memory graph, the resident set size is drawn in solid dark blue. It’s the baseline of your memory usage, and you should not scale your memory below that amount.

Active and Inactive Page Cache

When your app reads and writes from disk, data never goes directly into the application buffers. It transits through the system’s buffer cache, or page cache. It stays there for a while, so that if you request the same data again some time later, it will be available immediately, without performing actual disk I/O. Likewise, when you write something, it transits through the same buffer cache; this lets the system perform some optimizations regarding the order in which writes should be committed to disk.

The page cache is elastic: when you run out of memory, the system will happily discard it (since the cached data can be re-read anytime from the disk), or commit it to disk (in the case of cached writes). Conversely, if you have tons of memory, the system will happily retain as much as it can in the cache, which can lead to absurdly high memory usage for seemingly trivial apps. Typical example: a tiny HTTP server, handling requests for 10 MB of content, and using a few GB of page cache. How? Why? Well, because it’s also logging requests, and the log happens to be on disk. And Linux will keep the log in memory as well, if memory is available. Of course, if at some point you need the memory, Linux will free it up instantly. But meanwhile, if you look at your usage graphs, you will see the big memory usage.

On Linux, the page cache is split into two different pools: active and inactive. As the names imply, the active pool contains data that has been accessed recently, while the inactive pool contains data that is accessed less frequently. To make an informed scaling decision, it is important to understand how “active” and “inactive” really work under the hood. The memory is divided into pages, which are blocks of 4 KB. A given page of the buffer cache will start its existence (when it is loaded from the disk) as an active page. When an inactive page is accessed, it gets moved to the active pool. That part is easy! Now, when does an active page get moved to the inactive pool? This doesn’t happen out of “old age” (i.e., a page being left untouched for a while). It happens when the active pool becomes bigger than the inactive pool! When there are more active pages than inactive ones, the kernel scans the active pages, and demotes a few of them to the inactive pool. Some time later, if there are still more active than inactive pages, it will do it again, and it will go on until the balance is restored. However, at the same time, your app is running and accessing memory, potentially moving inactive pages back to the active pool.

What does this mean? The bottom line is the following: you should look at the active:inactive ratio. If this ratio is big (e.g. 200 MB of active memory vs. 20 MB of inactive memory), it means that the system is under heavy pressure: it’s constantly moving pages from active to inactive (to meet the 1:1 ratio), but the activity of your app is constantly moving pages back from inactive to active. In that case, it would be wise to scale vertically, to achieve better I/O performance (since more data will fit in the cache). As you add more memory, the ratio will decrease, getting closer to 1:1. A ratio of 1:1 (or even lower) means that the system is at equilibrium: it has moved all it could to inactive memory, and there was no strong pressure to put things back into active memory. You want to get close to this ratio (at least if you need good I/O performance).
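Here is a minimal sketch of that check, reading the page cache pools from /proc/meminfo (Linux only). The 1.5 threshold is an arbitrary illustration, not an official dotCloud heuristic.

    # Compute the active:inactive ratio of the page cache from /proc/meminfo.
    def meminfo_kb():
        values = {}
        with open("/proc/meminfo") as meminfo:
            for line in meminfo:
                key, rest = line.split(":", 1)
                values[key] = int(rest.split()[0])  # values are in kB
        return values

    mem = meminfo_kb()
    # "Active(file)" / "Inactive(file)" are the file-backed (page cache) pools.
    ratio = float(mem["Active(file)"]) / mem["Inactive(file)"]
    print("active:inactive = %.2f" % ratio)
    if ratio > 1.5:  # arbitrary threshold, for illustration only
        print("cache under pressure; consider scaling up for better I/O")
    else:
        print("close to 1:1; the system is at equilibrium")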

On the new dashboard, the active and inactive memory pools are shown in medium-blue and light-blue shades respectively, to highlight the fact that they are still important, but less so than the (darker) resident set size.

Free Memory

Well, that one at least doesn’t deserve a long, technical explanation! If the metrics show that your app consistently has a comfortable margin of free memory, you can definitely consider scaling down by that amount.

Warning: even if it’s often said that “free RAM is wasted RAM”, be wary of spikes! Take, for instance, a 1 GB Java app which constantly shows 200 MB of Free Memory. Before scaling down to 800 MB, make sure that it is not experiencing occasional spikes that consume that Free Memory! If you scale down, your app will run out of memory during the spikes, and will most likely crash. Also, remember that the long-term graphs (like the 7-day and 30-day trends) show average values, meaning that short bursts will not show up on those graphs. The metrics sample rate is one data point per minute, and that’s about the resolution that you can get on the 1-hour and 6-hour graphs. This means that, unfortunately, short spikes (less than one minute) won’t appear on any graph.
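To see why averages hide bursts, consider a toy example (all numbers made up):

    # One hour of per-minute samples: an app idling at 800 MB, with a single
    # sample catching a short spike to 1000 MB.
    samples_mb = [800] * 60
    samples_mb[30] = 1000

    print("peak: %d MB" % max(samples_mb))                        # 1000
    print("hourly average: %.0f MB" % (sum(samples_mb) / 60.0))   # ~803

A spike shorter than the one-minute sampling interval may not even produce that one high sample, and on a long-term trend, the hourly average of roughly 803 MB is all you would see.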

On the new dashboard, the free memory is shown in light grey.

Putting It All Together

This is a lot of new information, but the new dashboard should make it very easy for you to figure out the appropriate vertical scaling for your application.

  • For code services, make sure that the Resident Set Size (dark blue) never maxes out the available memory. If it gets close, you should add more memory before you receive out-of-memory notifications. Conversely, do not hesitate to cut into the Free Memory and the Inactive Page Cache (grey and light blue areas). The Page Cache will typically be small compared to the Resident Set Size.
  • For database services (and static services), the previous rule applies as well, but the Page Cache (both Active and Inactive) will very likely be much bigger, and you will have to pay attention to that, too. As a rule of thumb, compare the Active and Inactive amounts during peak times. If Active is bigger than Inactive, your memory usage is close to optimal. If they are equivalent (or if Inactive is larger), it means that you can scale down a little bit. This should be an iterative process: scale down, wait for memory usage to stabilize, check again, and repeat until the Active pool starts being larger.

We hope that the new dashboard will help you make informed scaling decisions, and cut down significantly on your dotCloud bill!


Dear dotCloud Customers,

We are going open-source.

It has been a wild week for dotCloud. Of course, as we prepared to open-source Docker, the container technology that powers the platform, we hoped it would be well received, like ZeroRPC and Hipache before it. But nothing could have prepared us for the magnitude of the response. Now, 6 days, 50,000 visits, 1,000 GitHub follows, and 300 pull requests later… we think we get the message. You want an open-source dotCloud, and we’re going to give it to you.

Today, as the first step in our new open-source strategy, we are announcing an important change to our free Sandbox. In the coming weeks we will hand it over to the community as an open-source project which can be deployed and hosted anywhere. As part of this transition we will be sunsetting our free hosting tier – see below for details. The resources freed by this transition will be re-invested in our open-source roadmap.

I want to emphasize that this transition does not affect our Live and Enterprise flavors, and it does not change our business model. Our core competency is and will continue to be the operation and support of large-scale cloud services, for tens of millions of visitors, 24 hours a day, every day. We intend to continue expanding that business, and we believe the best way to do that is by embracing open-source.

1. Going open source

Our approach to open-source is simple: solve fundamental problems, one at a time, with the simplest possible tool. The result is a collection of components which can be used separately, or combined to solve increasingly large problems.

So far dotCloud’s open-source toolbox includes:

  • ZeroRPC, a communication layer for distributed services;
  • Hipache, a routing layer for HTTP and WebSockets traffic;
  • Stack.io, a communication framework for real-time web applications;
  • Docker, a runtime for Linux containers;
  • Recipes for automatically deploying NodeJS, Django, Memcache, and dozens of other software components as cloud services.

All these components are already available, and the open-source community is using them to build alternative implementations of dotCloud’s development sandbox. We want to make that even easier by open-sourcing the remaining proprietary components – including our uploader, build system, database components, application server configuration, and more.

To learn more about future open-source announcements, follow the Docker repository and join the Docker mailing list.


2. Sunsetting the hosted sandbox

In order to properly focus resources on our ongoing open-source effort, we will be phasing out the hosted version of the free Sandbox. Going forward, the recommended way to kick the tires on dotCloud will be to deploy a Live dotCloud application. For your existing Sandbox applications, we can provide an easy upgrade. If you don’t feel ready to pay us quite yet, take a look at what the community is building.

Below is a calendar of the sunset. As usual, our support and ops team will be happy to assist you in every way we can during the transition.


  • April 8th: no change.
  • April 22nd: all Sandbox applications become unreachable via HTTP. You can still access them via SSH to download your code and data.
  • April 25th: all Sandbox applications will be destroyed.

Note that we’ve pushed out the sunset dates since first posting this blog: we’ve removed the ‘no push’ week of April 8th and extended HTTP access to the 22nd.

How to Graduate from the Sandbox

We’ve made it easy for you to change your Sandbox application to a Live flavor if you want to keep it running on the dotCloud platform:

  1. Add your billing information to your account.
  2. File a support ticket telling us which applications to migrate. Please use your account email address and give the full URLs of the applications.
  3. We’ll do the rest.

If you don’t want to move to a paid service, you can use several techniques to download your data and files before they are destroyed.

For those of you who have been using the Sandbox as staging for paid applications, we’re sorry for the inconvenience. We hope our hourly billing will help keep your staging and testing costs down, and that developing in a paid service will ease testing related to scaling.

Looking Back, Looking Forward

We want to thank you, our sandbox users, for trying out the dotCloud platform. We hope that you will enjoy experimenting with our open-source version, discovering the awesome features of our Live flavor, or both!

We look forward to helping you be the most awesome and productive developers out there.

Happy hacking!

/Solomon Hykes

CEO, dotCloud

“This ebook from dotCloud about DIY PaaS is very interesting!” @nmerouze


Developers are always asking about the technologies that power dotCloud, partly because they are either in the middle of choosing a PaaS provider or contemplating building their own PaaS.

  • What technology stack does dotCloud use?
  • How is application isolation accomplished?
  • How does the platform handle data isolation?
  • How does dotCloud provide security and resiliency?
  • Is dotCloud an open source project?

We decided to write a series of blog posts to expose the essential technologies under the hood. The first five blog posts in the PaaS Under the Hood series have been compiled into one eBook; register for your own copy.

Overview of the eBook

Just to give you a sense of the complexity: tens of thousands of apps are constantly deployed and migrated onto the dotCloud platform. Every minute, millions of metrics are collected, aggregated, and analyzed, and millions of HTTP requests are routed through the platform. We are lifting the hood to show you the essential technologies, such as lightweight virtualization with Linux Containers (LXC), cgroups, and other specific Linux kernel features. We will also discuss our open source Hipache project, the distributed proxy that powers dotCloud’s routing layer.

Happy reading and keep on hacking.


“Things in nightlife are very subjective because it is a business based off of people first, and products (alcohol) come second, so it is hard to build an algorithm to replicate the job of an operator or doorman as far as reservations via a website go” @NYNightLife

The bars and nightclubs industry is a fragmented $23Bn industry with high turnover. IBISWorld’s Bars & Nightclubs market research reported that there are approximately 65,774 family-owned and operated businesses in the US, with 98% of them employing fewer than 50 people. The competition for clientele is extremely keen, especially with high concentrations of clubs in metropolitan cities.

It is tougher for nightclub owners than restaurant owners to turn a profit, as there are fewer hours and days of operation per year. To add to the problem, nightclub clientele tend to occupy their tables until closing, or at least longer than restaurant diners, which means an empty table is lost revenue.

According to Chef’s Blade, there are many fixed costs that nightclub owners cannot change, such as rent, equipment, insurance, inventory, and payroll. Clubbing Owl aims to provide a full suite of venue management and outbound marketing software to nightclub owners so that they can positively impact cash flow.

Unlike traditional club management software that serves the back office, Clubbing Owl is designed to serve three communities: club-goers, nightclub owners, and promoters.

For club-goers, Clubbing Owl’s platform can confirm guest admissions through SMS text messaging. The system is integrated in real time with guest list management so that no guest is ever turned away at the door. The integration with Facebook allows Clubbing Owl to update club-goers’ Facebook status once they have been confirmed. The status updates not only let their friends know about the clubs they frequent, but also allow club owners and promoters to tap into their guests’ networks of friends.

For promoters, Clubbing Owl helps with guest list management. Promoters can send SMS messages to club-goers as soon as they are confirmed on the guest list.

For nightclub owners, Clubbing Owl provides live chat so that the entire staff and extended team of promoters can communicate in real time using smartphones and tablets. Clubbing Owl’s Host Check-in app is also synchronized with guest confirmation.


PyCon 2013 March 13-21 in Santa Clara, CA

We couldn’t be more excited for PyCon 2013!

If you’re new to PyCon we suggest catching up with PyDanny’s Beginner’s Guide to PyCon. It’s a 4-part series but his most recent posts cover the actual conference days. Here’s his guide to Friday and Saturday events.

Open Spaces

Open spaces are a way for people to come together to talk about topics, ideas or whatever they want. There’s a board by the registration booth where you can schedule an open space.

The open space schedule is like an un-conference. Anyone can suggest a topic, claim a room and dazzle attendees. You can find the tentative schedule here.

Team dotCloud will have our own open space on Saturday night from 9-10pm in room 202 to showcase “Buildbot on dotCloud”.

Buildbot on dotCloud

Continuous integration and testing are critical to application performance. At dotCloud, we have implemented Buildbot at a large scale to make our platform more reliable and robust. You should too.
In this open session, dotCloud engineer Daniel Mizyrycki will share how we implemented Buildbot at a large scale within dotCloud. He will also show how to easily integrate Buildbot as a service on dotCloud.
Ultimately we will open-source this project on GitHub.

Plus a special Lightning Talk…

We have been working on something big here at dotCloud and we can’t wait to unveil it at a Lightning talk session during PyCon.

Stop by booth #157 for more details and to meet with our engineers.


“dotCloud gave us the solid and flexible abstraction layer we needed to get our business off the ground. We went from a working prototype to a professional product in just under four months.” Brian Schwartz, Founder & CEO

Titan modernizes traditional IT procurement by taking the process online through a competitive marketplace, similar to what NASDAQ did for institutional trading by providing a view into market participants’ quotation activity. By enabling the most qualified local suppliers to compete for IT projects nationwide, companies can now get the highest-quality IT services at the best possible price. The company name comes from the Greek gods: Titan represents incredible strength, and one can’t spell Titan without “IT”.

The founders of Titan have a deep background in private equity and enterprise IT procurement. Since most companies lack expertise in technology, large IT purchases are often made with insufficient information, which results in non-competitive pricing or overpaying. This is in stark contrast to financial trading transactions where market efficiency prevents sellers from charging more than fair market price.

Envision a restaurant chain looking to procure software to optimize its inventory management operations. Generic software isn’t going to bring enough efficiency to its unique operational structure, and hiring a local supplier for ongoing customizations to make it fit is an expensive and endless endeavor. Alternatively, a global supplier can build custom software to the exact specifications, software that will scale as the operations grow more complex, but at a prohibitive premium over development costs.

Titan eliminates this tradeoff between quality and price by allowing companies to bid out their IT projects in a competition where only qualified suppliers can participate. As a result, companies can get high-quality IT resources that meet their requirements, at a fair market price.

How Titan Works

Titan lets companies post their IT projects along with any requirements; qualified suppliers then have two weeks to submit competitive bids. Once the winning bid is chosen, Titan processes the work contract and handles milestone payments through its escrow service. It’s a win-win for buyers and suppliers, and is 100% free to use.

Titan specializes in the supplier due-diligence process, which is particularly important to get right when it comes to large IT projects. In order to bid, suppliers must prove they are both financially secure and qualified for the particular project. Titan’s qualification process includes performing annual business credit checks, interviewing the team, and verifying past work references. For example, when a supplier claims on Titan to have worked with a Fortune 500 client, it means that Titan has already verified this claim with positive references provided by the supplier.


SCALE 11x happened two weeks ago in Los Angeles. There were so many interesting talks that on multiple occasions I had a hard time deciding which one to attend! I cannot summarize them all, so I decided to pick a few.

Is PostgreSQL Web Scale?

The actual title of the talk was “Unstructured Data in PostgreSQL”. Christophe Pettus gave a very good overview of the “schema-less” features of PostgreSQL, including its ability to handle JSON, XML data, and the hstore data type. PostgreSQL support for JSON might sound very basic, because it doesn’t have “native” indexing. For instance, if you put the following JSON document in an author column:
    { "name": "Jerome", "company": "dotCloud" }

You won’t be able to select the row by referencing the name or company attribute. Unless! Unless you specifically create an index, using a PL/V8 function to extract that attribute (and index according to it) each time the document is updated. It turns out that this is very easy to do!
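Here is a sketch of that trick, driven from Python with psycopg2. It assumes a posts table with an author column of type json, a server with the plv8 extension installed, and a plv8 version where json arguments arrive as strings (hence the JSON.parse); adjust to your setup.

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # hypothetical connection string
    cur = conn.cursor()

    # A PL/V8 function that extracts one attribute from a JSON document.
    cur.execute("""
    CREATE OR REPLACE FUNCTION json_string(data json, key text) RETURNS text AS $$
      return JSON.parse(data)[key];
    $$ LANGUAGE plv8 IMMUTABLE STRICT;
    """)

    # Index the extracted attribute; queries whose predicate matches the
    # indexed expression can now use the index.
    cur.execute(
        "CREATE INDEX posts_author_name ON posts (json_string(author, 'name'))")

    cur.execute(
        "SELECT * FROM posts WHERE json_string(author, 'name') = %s", ("Jerome",))
    print(cur.fetchall())
    conn.commit()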

I mentioned PL/V8: if you’re familiar with PostgreSQL, you probably know that PL/ is the prefix for the languages “embedded” within PostgreSQL. E.g. PL/Python lets you execute Python code manipulating your data within the PostgreSQL server itself. And if you’re familiar with the Javascript world, you probably know that V8 is Google’s Javascript engine, featured (among others) in Chrome, Node.js, and now PostgreSQL. You can e.g. store JSON data in a column, and use triggers to update other columns or tables when the JSON data is modified.

Christophe Pettus did some performance testing on PostgreSQL with JSON, comparing it with MongoDB. The results were indeed very curious: in almost all the tests, PostgreSQL vastly outperformed MongoDB! This doesn’t mean that everybody should ditch MongoDB and switch to PostgreSQL. It just shows that if you are familiar with PostgreSQL, and need a NoSQL database for its schemaless properties (rather than its sharding characteristics), it’s fine to use PostgreSQL.
If you didn’t get the “Web Scale” question, and enjoy a good cup of controversy with your cookie, have a look at the humorous video “MongoDB is Web Scale”…

File bug reports—like a boss!

Jon Erdman gave a talk titled “Submitting a proper Postgres bug report”, and I dare say that his talk was not Postgres-specific at all. Allow me to make a very bold statement: the PostgreSQL people are probably among the most demanding, quality-obsessed developers in the whole Open Source community.

Jon told us all the things that you need to gather when submitting a bug report, explained why those things are important, and showed how to gather them. Some of those items were very generic (“provide a simplified test case showing how to reproduce the problem”), others were more technical (“if there is a crash, obtain a core dump and provide a stack trace”).

If you apply the same guidelines to your projects, I can guarantee that your developers will waste less time and be way more efficient when dealing with bugs, and that your product quality will increase as a result!

Advancements in Open Virtualization with KVM

Mike Day, from IBM, gave a nice overview of the progress of KVM. The talk was quite interesting because it dealt with technical as well as business aspects. According to Mike, IBM is backing KVM with full force. My interpretation is that they need a virtualization solution to compete with, say, VMware, and KVM perfectly fits the bill.

I tend to prefer Xen for server virtualization, but I reckon that KVM has improved tremendously over the last few years; and while Xen is still a perfectly good option, if I had to pick one of them tomorrow, I might go with KVM this time.

Containers Checkpoint and Restore in Userland

I was eager to listen to that talk, and I wasn’t disappointed! We use containers at dotCloud; they are a key part of our infrastructure. “Checkpoint and Restore” means that you can save a container (just like you can save a virtual machine) and restart it later, in the exact same state.

This is an interesting story. A while ago, a bunch of skilled developers forked the Linux kernel and implemented OpenVZ, a process isolation system. It was very similar to FreeBSD Jails or Solaris Zones, and it implemented a lot of nifty features, including “Live Migration”. Imagine that you are logged into a server (with SSH). Suddenly, the SSH session freezes for a few seconds. What’s happening? Ah, it’s back to normal; it was probably a problem with the network connection. No! In fact, during those few seconds, the whole server (all its processes!) was moved from one machine to another. The state of the connections was kept as well; that’s why your SSH session froze for a few seconds (while the state was transferred) but didn’t crash.

Sounds cool, right? Very cool indeed. But it involved a lot of deep patching in the Linux kernel. And Linux kernel maintainers don’t like deep patching, because you can never know what kind of weird and seemingly-unrelated bug you could introduce with those patches. So they Just Said No to the inclusion of OpenVZ into the kernel.

OpenVZ developers are not rancorous people. Instead of arguing, fighting, trolling, and whatnot, they decided to work on LXC. LXC brings most of the features of OpenVZ to the kernel, but one little bit at a time, in a less intrusive way. However, “Checkpoint and Restore” was still missing. Since it was so hard to do in a way that would satisfy the very stringent Linux kernel development guidelines, they decided to implement it in userland (i.e., as regular processes, as opposed to privileged code running inside the kernel). Checkpoint and Restore In Userland (CRIU) was born!

The trick was to isolate the code which could really run in userland, and to figure out which “backdoors” (or “entry points”, if you prefer) were needed. Basically, you need to be able to access some internal state (like file descriptors) when doing the Checkpoint operation, and to set that internal state back when doing Restore.

Good news, everyone: they made it. They isolated all those features in a relatively small number of patches (~100), which were eventually accepted into the “vanilla” kernel. Those features were accepted not only because they were smaller and easier to study (than a huge, tentacular patch), but also because they could have other purposes than just Checkpoint and Restore for Containers.

At the end of his presentation, Kirill Kolyshkin told us that it was possible to Checkpoint, Restore, and Migrate containers, provided that you have a post-3.7 kernel. His team did extensive testing (like saving and restoring a MongoDB server while it was performing some kind of benchmark), but now they need more people to expose the occasional corner-case bug that might be creeping around.
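If you want to experiment, the basic workflow looks like this; a hedged sketch driving the criu tool from Python (flags vary across CRIU versions, and the PID and paths are hypothetical).

    # Checkpoint a process tree to an image directory, then restore it.
    # Copy the directory to another machine in between to "migrate".
    import subprocess

    PID = 1234                  # hypothetical process to checkpoint
    IMAGES = "/tmp/checkpoint"  # where the image files will be written

    subprocess.check_call(["criu", "dump", "--tree", str(PID),
                           "--images-dir", IMAGES, "--shell-job"])
    # ... copy IMAGES to another host here if migrating ...
    subprocess.check_call(["criu", "restore", "--images-dir", IMAGES,
                           "--shell-job"])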

We will definitely investigate that at dotCloud, because it brings very exciting possibilities (e.g. moving a service from one machine to another without restarting the processes).

Lightweight Virtualization with LXC containers

That’s my talk! The idea came while I was writing the first episodes of “PaaS Under the Hood”, a series of blog posts giving in-depth technical details about Linux Containers and how we use them at dotCloud. If you are the kind of person who wants to know how things work before entrusting them with anything important, and if you want to dive into Linux Containers, I bet (or at least, I sincerely hope!) that you will find those posts interesting! If you want to skim quickly over the material, you can also check the slides of my talk below:

The slides for the other mentioned talks should be online soon on the conference’s website.

About Jérôme Petazzoni
Jérôme is a senior engineer at dotCloud, where he rotates between Ops, Support, and Evangelist duties, and has earned the nickname of “master Yoda”. In a previous life he built and operated large-scale Xen hosting back when EC2 was just the name of a plane, supervised the deployment of fiber interconnects through the French subway, built a specialized GIS to visualize fiber infrastructure, specialized in commando deployments of large-scale computer systems in bandwidth-constrained environments such as conference centers, and performed various other feats of technical wizardry. He cares for the servers powering dotCloud, helps our users feel at home on the platform, and documents the many ways to use dotCloud in articles, tutorials, and sample applications. He’s also an avid dotCloud power user who has deployed just about anything on dotCloud; look for one of his many custom services on our GitHub repository.
Connect with Jérôme on Twitter! @jpetazzo