Kyo Lee

Open-Source Cloud Blog


A Developer’s Story on How Eucalyptus Saved the Day

This is a short story about how a UI developer at Eucalyptus was able to use Eucalyptus to save time on the development of Eucalyptus.


The agenda for the day is to set up Travis CI for the latest Eucalyptus user console, Koala. Quoted from Wikipedia, “Travis CI is a hosted, distributed continuous integration service used to build and test projects hosted at GitHub.” In other words, we want to set up an automated service hook on Koala’s GitHub repository so that whenever developers commit new code, “auto-magic” takes place somewhere on the Internet, which ensures that the developers did not screw things up by mistake, which, in turn, allows us developers to sit back and enjoy a warm cup of “post-commit-victory” tea while our thoughts drift away on the sea of Reddit.

But, before arriving at such Utopia, first things first. Must read the instructions on Travis CI.

Luckily, Travis CI has put together a nice and comforting set of documentation on how to hook a project up to Travis CI (http://about.travis-ci.org/docs/user/getting-started/), as well as an impressive, pain-free registration interface. So far, things are as easy as clicking buttons.

And, of course, nothing ever comes that easy. Now I am looking at the part where I need to create the YAML configuration file (.travis.yml) that describes Koala’s build procedures. Done with the button-clicking. Time to put down the cup of tea because now I have some reading and thinking to do.

A few minutes later, I am stunned by the line below:

“Travis CI virtual machines are based on Ubuntu 12.04 LTS Server Edition 64 bit.”

Oh, bummer.

The main development platform for Koala has been CentOS 6, meaning Koala’s build dependencies and scripts have been targeted at a CentOS 6 environment. This means I need to go over the build procedures and dependency settings so that Koala can be built and tested on Ubuntu 12.04. But, first, where do I find those Ubuntu machines?

Then, the realization, ‘Wait a second here. I have Eucalyptus.’

I open up a browser and log in to the Eucalyptus system that I have been using as a backend to develop Koala. I launch a couple of Ubuntu 12.04 instances. Within a minute, I have 2 fresh Ubuntu 12.04 virtual machines up and running.

Immediately I log in to the first instance and start installing Koala to validate the build procedures in an Ubuntu 12.04 environment. Along the way, I discover various little issues in this new environment and tweak things around to fine-tune Koala’s build procedures. Once things feel ready, I log in to the other Ubuntu instance to verify the newly adjusted build procedures in its fresh setting. More mistakes and issues are captured, and more adjustments are made. Meanwhile, the first instance has been shut down and a new Ubuntu instance has been brought up. With this new instance, I am able to rinse and repeat the validation of the build procedures. Of course, there are some mistakes again. They get fixed and adjusted. Meanwhile, another instance goes down and comes up fresh.

The juggling of the instances lasts a couple more rounds until the build procedures are perfected. Now I am confident that Koala will build successfully in an Ubuntu 12.04 environment. Commit the new .travis.yml build configuration to GitHub. It’s time for the warm cup of “post-commit-victory” tea.



Eucalyptus Console is for Developers


One major event that has taken place in preparation for the upcoming release of Eucalyptus 3.4 is that Eucalyptus Console has found its own nest on GitHub (https://github.com/eucalyptus/eucalyptus-console) and is available as an independent project. This separation allows Eucalyptus Console to be developed and released at a different pace than its much bigger core system, Eucalyptus (https://github.com/eucalyptus/eucalyptus). Faster cycles should allow Eucalyptus Console to deliver patches and introduce new features in a much more responsive manner.

Another appealing characteristic of Eucalyptus Console in this release is that it can now be used as a stand-alone application — running on a developer’s laptop — to manage cloud resources on AWS. When an access key and a secret key are provided in Eucalyptus Console’s login window, the console sets its endpoint to AWS and allows the user to interact with the AWS cloud under the provided AWS credentials.

For many AWS enthusiasts and developers, this new Eucalyptus Console project on GitHub (https://github.com/eucalyptus/eucalyptus-console) offers an opportunity to develop a personal, customizable AWS management console for specific demands and use cases. Composed of a client-side web interface written in JavaScript, using Backbone.js and Rivets.js, and a server-side proxy written in Python, using Boto and Tornado, Eucalyptus Console is designed to deliver maximum flexibility when it comes to adding new features and adopting the latest web technologies.
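To give a sense of how the pieces fit together, here is a minimal sketch (not the console’s actual code) of the proxy pattern described above: a Tornado request handler that uses Boto to fetch keypairs from an EC2-compatible endpoint and returns them as JSON. The handler name, route, and credential handling are illustrative assumptions only.

import boto
import tornado.ioloop
import tornado.web

class KeypairsHandler(tornado.web.RequestHandler):
    # Illustrative handler: relay a "describe keypairs" call through Boto.
    def get(self):
        # In the real console the keys come from the login window; here
        # they are read from hypothetical application settings instead.
        conn = boto.connect_ec2(
            aws_access_key_id=self.settings["access_key"],
            aws_secret_access_key=self.settings["secret_key"])
        keypairs = conn.get_all_key_pairs()
        self.write({"keypairs": [kp.name for kp in keypairs]})

if __name__ == "__main__":
    application = tornado.web.Application(
        [(r"/keypairs", KeypairsHandler)],
        access_key="YOUR-ACCESS-KEY", secret_key="YOUR-SECRET-KEY")
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

Pointing the same Boto connection at a Eucalyptus endpoint instead of AWS is simply a matter of configuration, which is what makes the dual Eucalyptus/AWS behavior described above possible.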


Check out the link below to get the Eucalyptus Console VM image up and running on Vagrant:

https://github.com/eucalyptus/eucalyptus-console/wiki/Running-Pre-Baked-Eucalyptus-Console-Image-on-Vagrant

If you would rather install Eucalyptus Console from scratch on a CentOS 6 image/machine, check out the link below instead:

https://github.com/eucalyptus/eucalyptus-console/wiki/Installing-Eucalyptus-Console-on-a-Laptop-using-Vagrant


Introducing Metaleuca


What is Metaleuca?

Metaleuca is a bare-metal provision management system that interacts with the open-source software Cobbler via an EC2-like CLI.

Using Metaleuca, users can communicate with Cobbler to self-provision a group of bare-metal machines to boot up with new, fresh OS images. The main appeal of Metaleuca is that it allows users to manage bare-metal machines like EC2’s virtual instances, via command-line tools that feel much like ec2-tools or euca2ools.


Metaleuca Command-Line Tools

Metaleuca consists of a set of command-line tools that mirror some of the commands in ec2-tools or euca2ools. The list below shows a number of the core commands used in Metaleuca:

  • metaleuca-describe-profiles – Describe all the profiles provided in Cobbler
  • metaleuca-describe-systems – Describe all the bare-metal systems registered in Cobbler
  • metaleuca-reboot-system – Reboot the selected bare-metal system
  • metaleuca-run-instances – Initiate the provision sequence on the selected bare-metal systems
  • metaleuca-describe-instances – Describe the statuses of the provisioned bare-metal systems
  • metaleuca-terminate-instances – Terminate the bare-metal systems, returning them back to the resource pool

Metaleuca Configuration

Prior to installing Metaleuca, it is required that you have already configured Cobbler to provision a group of bare-metal machines in your datacenter. If you are new to Cobbler, please visit Cobbler’s homepage http://cobbler.github.com/ for more information on how to set up Cobbler.

Once you have Cobbler running in the datacenter, you will need to install Metaleuca on an Ubuntu machine or virtual machine. The installation guide for Metaleuca is provided at:

https://github.com/eucalyptus/metaleuca/wiki/Metaleuca-Installation-Guide

Metaleuca Walkthrough

Now that you have Metaleuca configured in the datacenter, let’s go over a scenario where you will want to launch two bare-metal instances with fresh CentOS 6.3 images.

First, you might want to use the command “metaleuca-describe-systems” to survey all the available systems registered in Cobbler.


Metaleuca allows users to directly select which bare-metal machines to provision — by using the machines’ IPs. However, those who are familiar with AWS will point out that this approach is not how virtual machines are provisioned on EC2; rather than specifying IPs, AWS users provide the number of instances to launch. For this reason, here we will cover EC2’s approach to provisioning the instances.

In Metaleuca, first you will need to find out which ‘profile’ is set to install the CentOS 6.3 image. In Cobbler, a profile maps to a preconfigured ‘kickstart’ file that contains the netboot instructions on which OS to install when a machine boots up and initiates PXE boot. In other words, you may compare the profiles in Cobbler to the instance images on EC2. In Metaleuca, you can display the available profiles using the command ‘metaleuca-describe-profiles’:


Let’s say that the profile “qa-centos6u3-x86_64-striped-drives” is what we want to use.

Next, you will want to determine which “system-group” you want the machines to be selected from. In Metaleuca, the bare-metal machines can be grouped into different resource pools. For instance, in our QA system at Eucalyptus, which utilizes Metaleuca, we partitioned the machines in the datacenter into 6 groups: qa00, qa01, dev00, dev01, test00, and test01. Such grouping attaches semantics to the machine pools based on their usage, which lets us create a resource allocation policy for the users, who are mainly developers and QA engineers. The command “metaleuca-describe-system-groups” displays all the machines and their groups accordingly:


However, keep in mind that not all machines in the list will be available to be provisioned; some of the machines might be in use by other users. Thus, you will want to run the command “metaleuca-describe-system-groups -f” to discover which machines are free to use. Fortunately, at this moment, among those 6 system-groups mentioned above, the group ‘test01’ has 2 machines available, which are labeled as “FREED” in the screenshot below:


Once you have figured out the profile and the system-group availability information, you are ready to provision the bare-metal machines. The command “metaleuca-run-instances” takes as input how many instances you want to launch, which profile to use, which group to draw from, and finally a user string to mark the machines:

./metaleuca-run-instances -n 2 -g test01 -p qa-centos6u3-x86_64-striped-drives -u kyo_machines_for_demo


And, similar to ec2-tools and euca2ools, you can monitor the progress of the provisioned machines using the command “metaleuca-describe-instances -u“:


Notice that the instances are at the “pending” state at the moment. Soon, in about 8 minutes after launching, the instances will be shown as “running”:


At that point, you may ssh into the bare metal machines to verify that they are up and running with the fresh CentOS 6.3 OS installed.

Later, the command “metaleuca-describe-system-user -u” comes in handy when you want to find out which machines are provisioned under your name:


When you are done with the machines, you may “free” the machines so that they can return to the resource pool; the command for that case is “metaleuca-terminate-instances -u”:

When running the command “metaleuca-describe-instances -u”, you will notice that the machines have been successfully freed:


Metaleuca as an Open Source Project


Metaleuca evolved out of internal usage — the development and test environment for engineers — at Eucalyptus, and is now available as an open source project on GitHub under the Apache license. The goal of the project is to complete the integration of Metaleuca into the Eucalyptus system so that it can serve as a “bare-metal only” zone in Eucalyptus. Your contribution is much appreciated!

Check out the project at:

https://github.com/eucalyptus/metaleuca

\m/

Beyond Continuous Integration: Locking Steps with Dev, QA, and Release

Continuous integration: the practice of frequently integrating one’s new or changed code with the existing code repository [wikipedia]

In this blog we will talk about how the continuous integration process was put in place for the new component, Eucalyptus User Console, in order to coordinate the efforts of the dev, QA and release teams throughout the development cycle of Eucalyptus 3.2.

Background

Eucalyptus User Console is a newly introduced component in Eucalyptus, whose main goal is to provide an easy-to-use, intuitive browser-based interface to cloud users, thus assisting in dev/test cloud deployments among IT organizations and enterprises. Eucalyptus User Console consists of two components: a JavaScript-based client-side application and a Tornado-based user console proxy server.

Early Involvement

The first phase of the development was to come up with a quick prototype to demonstrate how the user console would work under the given initial design of the architecture (see the Eucalyptus Console components layout diagram above). As soon as the prototype was evaluated and its feasibility was verified, the release team started creating the packages for two major Linux OS platforms: Ubuntu and CentOS/RHEL.

The early involvement of the release team turned out to be the best help any developers or QA engineers could ask for; since the very beginning of the development, the release team was able to provide invaluable information that served as a guardrail for the fast-moving development. Such information included advising on how the files should be named and organized and identifying which dependencies should or should not be used in order to meet the requirements of various Linux distributions. Dealing with such issues at a later stage of the development would undoubtedly have been a major pain in the back-end.


Furthermore, the release team was able to ensure that the development of the new user console would never go off track with respect to the Linux distro requirements by setting up an automated daily package-building process using Jenkins — which utilizes the VM resources from our Release cloud that runs on Eucalyptus.

Keeping Up With Eucalyptus

Setting up the automated process to build the packages allowed the release team to keep an eye on the progress of the user console’s development in terms of the ability to build the packages according to the constraints set by the Linux distributions. However, it would not guarantee that the newly built packages contain a version of the user console that works with the current, up-to-date Eucalyptus cloud that was also in development.

Thus, the challenge was to ensure that the latest built user console packages work with the latest built Eucalyptus throughout the development.

In order to solve this issue, the QA team created a testunit that automatically installs the latest user console packages on a newly built Eucalyptus. Then, the testunit was added to the main test sequences used by the Eucalyptus 3.2 development in our automated QA system, making the installation of the latest user console packages accessible by all developers at Eucalyptus.

This setup ensured that a failure in the user console package installation would be visible to any developer throughout the development, thus allowing the failure to be detected and reported quickly.


The testunit ui_setup can be seen in action above in the table which displays the results of the test sequence run by the automated QA system. Check out the link below for more details on this testunit:

https://github.com/eucalyptus-qa/ui_setup

Circle of Trust

As the user console evolved out of its prototype state and took the form of a more product-like shape, the QA team was working in parallel, figuring out how to set up the automated testing process for the user console. The blog here talks in detail about how Selenium was used to create the automated web-browser testing tools, se34euca.


In the mid-stage of the development, as the features of the user console started functioning in a reasonably stable manner, 3 automated tests were added — incrementally — to ensure the working state of the user console throughout the development.

Those 3 tests are:

  1. user_console_view_page_test – https://github.com/eucalyptus-qa/user_console_view_page_test
  2. user_console_generate_keypair_test – https://github.com/eucalyptus-qa/user_console_generate_keypair_test
  3. user_console_launch_instance_test – https://github.com/eucalyptus-qa/user_console_launch_instance_test

These automated tests were to ask the 3 simple questions below on a daily basis:

  1. Can the user log in and see all the landing pages on the latest user console?
  2. Can the user generate a new keypair using the latest user console?
  3. Can the user launch a VM instance using the latest user console?

Of course, it would be possible, and desirable, to ask more questions in a more complicated fashion. However, during the rapid development phase, asking those 3 simple questions on a daily basis turned out to be sufficient, and effective, for understanding whether something terrible had happened to the user console or not.


The goal of these automated tests at this stage of the development was not to detect every little defect in the product; it was too soon for that.

The main purpose was rather to serve as an indicator for the developers, QA engineers, and release engineers, assuring ourselves that the change that went into the code earlier today did not ruin the delicate trust among the three groups, meaning that the build, installation, and configuration procedures were still intact. Having such assurance checked by mechanical means made the three groups extremely effective at discovering issues during the development, since it allowed each member to narrow down exactly what was responsible for a defect within a greatly reduced time frame: hours, rather than days or weeks.

Guardrail For Development

Having the automated package build process and the automated installation/configuration process in place at an early stage of the development proved to be extremely useful; rather than merely agreeing on written procedures, the dev, QA, and release teams materialized such agreements into actual implementation and put them to work using various automated mechanisms that run on a daily basis. Therefore, throughout the development, we were able to witness and assure ourselves that we were making progress in accordance with the plan and our self-imposed restrictions.

Check out the Eucalyptus Open QA webpage to see the continuous integration at Eucalyptus in action:

Eucalyptus Open QA (beta) – http://ec2-50-112-61-121.us-west-2.compute.amazonaws.com/open_qa.php

Simulate 150 Cloud User Activities Using Open Source Tools

For the 3.2 release this December, Eucalyptus is coming out with an intuitive, easy-to-use cloud user console, which aims to support on-premise dev/test cloud adoption among IT organizations and enterprises.


This easy-to-use Eucalyptus User Console consists of two main components: a browser-side JavaScript application, written in jQuery, and a proxy server, written in Python Tornado — an open source version of the scalable, non-blocking web server developed by Facebook — that utilizes Python Boto to relay requests to the Eucalyptus cloud.

The target scale for the initial version of the user console is set to handle 150 simultaneous user activities under a single user console proxy.

Now, the challenge is how to simulate these 150 users to ensure that the user console and the proxy are able to withstand the workload of 150 active cloud users; more importantly, how to ensure that such a workload does not jeopardize the user experience on the console.

One obvious answer is to find 150 people, train them thoroughly, and ask them to participate in the load testing. After all, 150 is doable.

However, what is not doable is having those 150 people repeat the process over and over during the entire life cycle of the development until the release.

Then, the most “realistic” answer is to simulate those 150 people using machines. It turns out that the machines are really good at repeating the same things over and over, and they tend to behave in a very predictable manner when tuned properly.

At Eucalyptus, we use Selenium, an open-source web-testing automation tool, to simulate actual user interactions on the user console.

The steps are as follows. First, use Selenium IDE on Firefox to write an automation script that completes a single path of a cloud user workflow — for instance, one simple user workflow is to log into the console, create a new keypair, and log out, and another is to log in, create a new volume, and log out. Second, repeat the first step for all possible use cases to ensure that all, or most, of the functionality on the console is covered, allowing every use case to be automatically executable via Selenium IDE. Third, export those automated IDE scripts to the Selenium Python WebDriver format, which allows the automated scripts to run on a remote server without needing to actually open up a browser. Finally, create a wrapper for each exported script so that each test case can be executed as a command-line tool on Linux.
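As an illustration of what such an exported script might end up looking like, here is a minimal Selenium WebDriver sketch (not one of the actual se34euca scripts) for the “log in, create a keypair, log out” workflow; the URL and element IDs below are hypothetical placeholders, and the account values simply echo the example command line shown later.

from selenium import webdriver

driver = webdriver.Firefox()
try:
    # Log in to the user console (URL and element IDs are placeholders).
    driver.get("https://192.168.51.6:8888/login")
    driver.find_element_by_id("account").send_keys("ui-test-acct-00")
    driver.find_element_by_id("username").send_keys("user00")
    driver.find_element_by_id("password").send_keys("mypassword1")
    driver.find_element_by_id("login-button").click()

    # Create a new keypair, then confirm it shows up on the keypair page.
    driver.get("https://192.168.51.6:8888/keypairs")
    driver.find_element_by_id("create-keypair").click()
    driver.find_element_by_id("keypair-name").send_keys("se34euca-demo-key")
    driver.find_element_by_id("confirm-create").click()
    assert "se34euca-demo-key" in driver.page_source

    # Log out and clean up.
    driver.find_element_by_id("logout").click()
finally:
    driver.quit()

Wrapped in a small command-line front end, a script like this can be launched repeatedly from any Linux machine, which is the role tools like runtest_view_page.py play in se34euca.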

The link below contains the collection of automated Selenium WebDriver test scripts, command-line tools, and their installer for testing the Eucalyptus User Console:

Se34Euca (Selenium34Eucalyptus) – https://github.com/eucalyptus/se34euca

With Se34Euca, you can instantaneously convert any machine — or virtual machine if you are already a cloud geek 😉 — into a Eucalyptus cloud user simulator.

The steps are, on an Ubuntu image, to run the commands below to install and set up Se34Euca:

sudo apt-get -y install git-core

git clone git://github.com/eucalyptus/se34euca.git

cd ./se34euca/script/

./installer_se34euca.py

Then, running the actual test can be as simple as:

export DISPLAY=:0

./runtest_view_page.py -i 192.168.51.6 -p 8888 -a ui-test-acct-00 -u user00 -w mypassword1 -t view_all_page_in_loop

The command line above will simulate a cloud user clicking through every single landing page on the user console within 2 seconds, then taking a rest for 5 seconds, and repeating the frantic, yet controlled, clicking again and again and again.

However, funny enough, it turned out that the automated script’s ability to click through all pages on the user console within 2 seconds was well beyond the capability of a human user. The graph below renders the normal behavior of an actual human user. The X-axis in the graph shows the total length of TCP packets seen in a second on the user console proxy server machine via tcpdump. Notice the peak in the beginning as the user logs in, and a group of little ripples that mark the user clicking buttons or viewing different pages over a 7-minute period:
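For the curious, a measurement like this can be gathered with a small script that tails tcpdump on the proxy port and sums the reported TCP segment lengths into per-second buckets. The sketch below is only an assumption about how such numbers could be collected (port 8888 and the trailing “length N” line format are assumptions, and tcpdump output varies between versions); it is not the actual tooling behind these graphs.

import re
import subprocess
from collections import defaultdict

# Tail tcpdump on the proxy port, printing epoch timestamps (-tt), and sum
# the "length N" field of each TCP line per second (requires capture privileges).
proc = subprocess.Popen(
    ["tcpdump", "-l", "-nn", "-tt", "tcp port 8888"],
    stdout=subprocess.PIPE)

bytes_per_second = defaultdict(int)
length_re = re.compile(r"length (\d+)$")

for raw in iter(proc.stdout.readline, b""):
    line = raw.decode("utf-8", "replace").strip()
    match = length_re.search(line)
    if not match:
        continue
    second = int(float(line.split()[0]))
    bytes_per_second[second] += int(match.group(1))
    print("%d %d" % (second, bytes_per_second[second]))

Plotting bytes_per_second over time yields graphs of the kind described here.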

And, the graph below shows the difference between the actual user behavior and the automated script behavior simulated by a single instance of Se34Euca. Notice the super-human strength of the automated script — the first half of the graph below shows the same 7-minute period shown in the graph above. According to the graph below, the automated script is able to generate 10 times the workload of a human user.

This discovery turns out to be good news; given that one Se34Euca instance can generate 10x the workload of a human user, all I need to do is launch 15 instances of Se34Euca to simulate 150 users. So, I provisioned 3 Ubuntu machines and launched 5 instances of Se34Euca on each machine:

The first fifth of the graph above covers the same period as the second graph above. What you are looking at is 15 instances of Se34Euca clicking through every single page on the Eucalyptus User Console for about two hours, starting at the 21:00 mark.

When the packet length per second is averaged over a 60-second observation period, the graph looks like the one below:

The graph above shows that when 150 users are simultaneously logged in to the user console, the average packet transmission throughput rate seen on the wire is about 750Kb per second. Assuming that the user console proxy server is hooked up to a 1 Gig link, a throughput of 750Kb per second is certainly “doable” by all means. 😉

Then, how do we ensure the user experience of the console?

Simple. While the user console proxy server is being slammed by 150 click-monkeys, I’m opening up my own browser to verify that my interaction with the console is smooth as usual. 🙂

On my next blog, I will cover more details on the exact setup of the Eucalyptus User Console load-testing, including the selenium scripts and monitoring setup, and dig deeper into the analysis of the data. Please, stay tuned 😉

Meanwhile, feel free to check out the blog below if you would like to preview the Eucalyptus User Console for yourself:

http://coderslike.us/2012/11/11/installing-the-eucalyptus-console-from-source-and-packages/

10 Steps to Euca Monkey

Euca Monkey is an easy-to-deploy test tool designed for performing stress tests on a Eucalyptus cloud. The tool repeatedly generates and tears down 6 types of cloud user resources: running instances, volumes, snapshots, security groups, keypairs, and IP addresses. As the resources are being populated and released, the cloud is actively queried to validate that such resources are indeed being allocated correctly per request. Then, the tool renders the progress of the stress testing using Gnuplot, an open source graphing tool, and displays the graphs as a web service in real time.

Euca Monkey uses cloud-resource-populator — which utilizes Eutester, which is based on Boto, thus making the tool AWS-API-compatible — to populate and release resources from Eucalyptus as a user. The input for cloud-resource-populator looks like the following:

[USER INFO]
account: ui-test-acct-23
user: user-23
password: mypassword23
[RESOURCES]
running instances: 2
volumes:  2
snapshots: 1
security groups: 10
keypairs: 3
ip addresses: 2
[ITERATIONS]
iterations: 200

With the given input above, cloud-resource-populator will generate the resources as specified in the [RESOURCES] section as the user 'user-23' under the account 'ui-test-acct-23'. When viewed from Eucalyptus user console, it will look as below:

As soon as the resources are populated according to the specification, cloud-resource-populator will immediately send requests to the cloud to release all the allocated resources, which makes the console view look as below:

And, as you would have guessed, the process of populating and releasing the resources is repeated [ITERATIONS] times.
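Under the hood, each iteration boils down to a handful of EC2-compatible API calls. The sketch below is a heavily simplified, plain-Boto illustration of one populate-and-release cycle covering a few of the six resource types; it is not the actual cloud-resource-populator code, the endpoint, image ID, zone name, and counts are placeholders, and the real tool drives the calls through Eutester and validates every resource against what the cloud reports.

import boto

# Placeholder credentials; the real tool reads the [USER INFO] section.
conn = boto.connect_ec2(aws_access_key_id="ACCESS-KEY",
                        aws_secret_access_key="SECRET-KEY")

def populate(image_id="emi-12345678", instances=2, volumes=2,
             security_groups=10, keypairs=3, addresses=2):
    # Allocate each resource type, keeping handles for the teardown phase.
    reservation = conn.run_instances(image_id, min_count=instances,
                                     max_count=instances)
    vols = [conn.create_volume(1, "cluster01") for _ in range(volumes)]
    groups = [conn.create_security_group("monkey-group-%d" % i, "stress test")
              for i in range(security_groups)]
    keys = [conn.create_key_pair("monkey-key-%d" % i) for i in range(keypairs)]
    addrs = [conn.allocate_address() for _ in range(addresses)]
    return reservation, vols, groups, keys, addrs

def release(reservation, vols, groups, keys, addrs):
    # Tear everything down, returning the resources to the cloud
    # (the real tool waits for resources to settle before deleting them).
    conn.terminate_instances([i.id for i in reservation.instances])
    for vol in vols:
        conn.delete_volume(vol.id)
    for group in groups:
        conn.delete_security_group(group.name)
    for key in keys:
        conn.delete_key_pair(key.name)
    for addr in addrs:
        conn.release_address(addr.public_ip)

for iteration in range(200):
    release(*populate())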

The most appealing feature of Euca Monkey is that it launches a webservice to render the progress of the stress-testing in real time.

The graph above shows the input values for a few iterations of the resource population and tear-down process. Tracing the running-instance line, which is in red, this graph tells us that 20 instances were started on the first request, then those 20 instances were all terminated on the second mark, thus bringing the count down to 0. And, 20 instances were started again, then terminated, and so on. Such operations were repeated 7 times on the graph above. Also notice that there are 4 other resources being populated and released 7 times as well.

When the cloud is behaving nicely, the actual resources should be populated and released in step with the input values in the graph above, thus resulting in an output graph like the one below:

While the first graph shown above renders the input values to the tool cloud-resource-populator, this graph is showing the actual values reported by the cloud. The fact that these two graphs look the same means “Yay Cloud!!”

However, occasionally, during the development of Eucalyptus, you would see a graph like the one below on a rainy day in Santa Barbara:

The graph above reveals an interesting state of the cloud. Notice that the “running instance” line went from the “nice” behavior pattern to a flat line. It means that the cloud was able to launch and terminate 20 instances during the first phase, but somehow it got stuck in a state where it was not able to release instances, nor launch more instances, thus remaining stuck with 18 running instances. But notice that the other resources were making the usual progress as before, except the security groups. It turned out that what we were witnessing was a deadlock in the Eucalyptus Cluster Controller, which occurred around the 8th hour of the stress-testing. R.I.P CC. 😦

The purpose of stress-testing is to push the limit of the system to the point where malfunctions and faulty behaviors of the system can be observed. Such stress testing is crucial for the development of a distributed system like Eucalyptus; many unknowns and bugs are constantly introduced to the system as new features from various components are being integrated. Thus, having an easily deployable stress-testing tool with visualization support, such as Euca Monkey, yields tremendous benefits for the developers in an agile development environment since the tool aims to ensure the system’s stability and reliability throughout the rapid development cycle.

If you would like to take Euca Monkey for a spin, feel free to check out the GitHub link below. On a fresh Centos 6 machine or VM, it will take 10 simple steps to launch your own monkey.

https://github.com/eucalyptus/euca-monkey

Other Resources:

Eutester – https://github.com/eucalyptus/eutester

Boto – https://github.com/boto/boto

Eucalyptus QA – https://github.com/eucalyptus-qa

Open QA for Eucalyptus – https://github.com/eucalyptus/open-qa

Quality Flow in Eucalyptus

When being interviewed at Eucalyptus, one is often asked, “when do you stop testing software?” This is not a trick question. As a matter of fact, this is not even a question; there is only one answer, and everyone knows it.

You never stop testing.

In Eucalyptus, we take the answer above very much literally.

Introduction

Eucalyptus has always displayed a strong interest in creating an innovative, state-of-the-art software development workflow that supports four value-driven development practices: continuous integration, continuous testing, extreme automation, and open development. Based on these core principles, Eucalyptus engineering has designed an advanced quality control process, backed by a versatile automated QA system, whose function is to maintain a quality standard in Eucalyptus that is persistent and unchallengeable.

The graph below shows the overview of Eucalyptus software development workflow:

Graph 1. Quality Flow of Eucalyptus Software Development

Overview

The Eucalyptus quality control process ensures a monotonic increase in the quality of the code as it travels through quality assurance checkpoints, known as Quality Gates, represented as green bars in Graph 1. Each quality gate is a set of automated operations performed by Eucalyptus’s automated QA system, which manages testing of Eucalyptus in a completely automated fashion — including bare-metal provisioning, installation and configuration of Eucalyptus, sequencing of automated tests, and providing feedback via various media.

Eucalyptus quality control process is not just a quality assurance process; it is fully integrated into Eucalyptus software development. The process combines continuous integration and continuous testing into a unified workflow, which is powered by extreme automation via the QA system. The goal is to get rid of mundane tasks for developers so that they can focus on tasks that they are good at doing — developing a cloud infrastructure system of the highest quality.

Advantages

Guarantees in Quality

The branch #master (https://github.com/eucalyptus/eucalyptus) is the face of Eucalyptus’s open development model; every Eucalyptus engineer and community member checks out from the branch #master for development, experimentation, or contribution. Thus, it is absolutely crucial that at all times the quality of the branch #master meets the standard of a functional cloud infrastructure system. In other words, at any given moment, a person should be able to check out from the branch #master and immediately start developing on top of the existing system without wasting time wondering if the system has any major problems that may influence further development. It is Eucalyptus’s responsibility to provide such guarantees of quality in the open branch #master. It is such guarantees of quality that encourage community members to actively engage with Eucalyptus, since they “defrictionalize” the merging of contributions into the main body of the code.

Commit Often and Fail Fast

In the early days, without formal guidance in place, Eucalyptus experienced its fair share of growing pains, with development branches drifting far, far away. The only known, effective solution to the “drifting development branch” problem is to have developers commit and check out the code as often as possible (continuous integration). However, the challenge in this practice is how to minimize the effort that goes into this checking-in and checking-out process. The more tedious and complicated this process is, the more a developer is inclined to delay it, causing his or her branch to drift far, far away. On the other hand, an overly simplified commit-and-checkout process would severely jeopardize the quality of the code, since such a process would likely permit unchecked mistakes to sneak into the main body of the code.

Eucalyptus quality control process attempts to address this issue by automating the entire “code-commit to quality-validation” process with a click of a button (extreme automation). As seen on TV, a developer can “just click it and forget it” when committing the code – that is until he or she gets a notification email indicating whether the commit has passed or failed to merge into the branch #master. The goal is to encourage developers to commit frequently by providing an easy mechanism that automatically performs quality verification on the code and returns fast feedback.

Procedures

In Eucalyptus software development workflow, a commit from a development branch to the branch #master is a two-step process. As seen in Graph 1., there is no direct path that leads to #master. Every commit must go through the aggregation branch called #testing. When the code on a development branch – from the far left stage on the graph – is committed to #testing, leaving the hand of a developer, the automated quality assurance process – represented as “Quality Gate 2” in the graph – will determine whether the code is allowed to move to #master. If the code fails to meet the quality standard enforced by Quality Gate 2, the branch #testing will be reverted to remove the history of the failed code. The purpose of this action is to bring the state of the branch #testing back to the last known stable quality. This action allows other developers to continuously commit to #testing even after previous commits in #testing fail to enter the branch #master.

The automated tests triggered by Quality Gate 2 are set to consume about 20 QA machines running tests for 5 hours. The cost of operating Quality Gate 2 is very high. Therefore, there is a need for a pre-commit check-up process — represented as “Quality Gate 1” in the graph above — in order to perform quick spot-checks on development branches. Such a preliminary check-up process aims to minimize failures caused by preventable mistakes when committing to #testing.

Challenges

1. Scale and Resources

Ideally, every commit to the branch #testing should be examined and validated independently. However, in reality, with limited resources and the frequency of commits, it becomes impossible to process all the commits on time if they are examined one at a time. Let’s say that, in order to perform all the necessary automated tests kicked off by Quality Gate 2, it takes 20 QA machines running tests for 5 hours. Given this resource requirement, processing 5 commits individually would take 100 machines running tests for 5 hours in parallel, or the entire day if those commits end up competing for the resources. It is not difficult to see how fast the wait-line in the queue will grow for pending commits.

The current solution to this problem is batch-processing of commits at scheduled times. There are 2 or 3 times a day when Quality Gate 2 opens up to provide the opportunity for pending commits to “try out” for the branch #master. All the commits are bundled together and go through the automated quality assurance testing as a group. If the group passes, then all the commits enter the branch #master. But if it fails, then all the commits are rejected together, and each developer who submitted a commit is notified accordingly. By examining the test-results report attached to the notification email, the developers can communicate with each other to determine exactly what went wrong in the system and learn whose commit was responsible for the failure, which allows the other, non-responsible developers to retry their commits.

2. Quality of Automated Tests

Eucalyptus quality control process counts on an extensive range of automated tests that cover all the essential functionality of a cloud infrastructure system. Those automated tests are designed to capture the accurate view of the target system. At the end of testing, there must be a clear “passed” or “failed” signal for each automated test with a high level of reliability. Any automated test that lacks such reliability provides no value in Eucalyptus quality control process since it would introduce noise when trying to examine the overall quality of the system.
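As a toy illustration of what a clear signal looks like in practice, the sketch below (plain Boto with placeholder credentials, not eutester) performs a single check and exits with an unambiguous pass-or-fail status that an automated QA system can act on:

import sys
import boto

def keypair_roundtrip_passes():
    # Create a keypair, confirm the cloud reports it, then clean up.
    conn = boto.connect_ec2(aws_access_key_id="ACCESS-KEY",
                            aws_secret_access_key="SECRET-KEY")
    name = "qa-signal-check"
    conn.create_key_pair(name)
    try:
        return name in [kp.name for kp in conn.get_all_key_pairs()]
    finally:
        conn.delete_key_pair(name)

if __name__ == "__main__":
    if keypair_roundtrip_passes():
        print("PASSED")
        sys.exit(0)
    print("FAILED")
    sys.exit(1)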

To make matters more complex, in agile-like development, today’s feature tests are tomorrow’s regression tests. In other words, in such a fast-paced development cycle, the automated tests written for testing new features in this release cycle will be used for detecting regressions in the next release cycle. In such a continuous testing environment, combined with agile development, there exists a close relationship between the quality of the features and the quality of the automated tests; poorly written tests for a feature would result in a poorly maintained feature, since such tests would fail to detect regressions as the development progresses. Thus, in order to develop good quality features, Eucalyptus quality control process must provide rapid ways to:

  • Ease the maintenance of existing automated tests (for detecting regressions)
  • Ramp up the maturity time of newly created automated tests (for validating new features)

Eucalyptus’s approach to these challenges is to develop a sophisticated testing framework, called “eutester“(https://github.com/eucalyptus/eutester), in order to enhance the automated test writing and maintenance experience.

3. Extremeness in Extreme Automation

Eucalyptus quality control process is glued together by complete automation. The entire process is put on auto-pilot, guided by various automated pieces and modules working in concert. Each independent component should have the ability to make decisions based on its own intelligence, meaning that it knows how to configure itself, run tasks, detect failures, examine the conditions of the failures, recover from the failures, gather information, communicate with other components, and so on. A collection of such intelligent components is essential to keep the quality process in motion with minimal interruptions. Now, the challenge is, how do we make it think like a QA engineer?

FYI, Eucalyptus is currently hiring talented Quality Assurance engineers who share the same vision and willpower to attack these challenges (http://www.eucalyptus.com/careers/jobs).

A Developer Walks through Cloud

Skip Directly to [Instruction on How to Run the Video Processing Prototype]


1. Little Phone, Big Cloud

A few months ago, a phrase caught my attention: “Instagram for Video”. It was an interesting idea for a mobile application. As a software designer, I dug into the idea, soon to realize one major implementation challenge.


It turns out that video is a collection of pictures–many, many pictures. Given the standard 24 frames-per-second rate, even a one-minute-long video would consist of 1440 pictures, which means image-processing 1440 pictures on a mobile phone. That is a lot of pictures for the small battery in your mobile phone to handle.


There is an alternative to this scenario; let’s consider moving the image-processing task over to a remote machine that is bigger, stronger, and meaner. In this scenario, the mobile phone could upload the video to a server via the internet, have it processed remotely, and retrieve the processed video back in a seamless fashion.


However, there is one absolutely crucial requirement in this scenario; we are going to need a big, big, big machine–big enough to handle millions of requests once this killer application goes viral (go big or go home). There is only one answer to this type of demand: “the Cloud.”

Luckily, there is an open-source cloud available; Eucalyptus is an open-source Infrastructure-as-a-Service cloud platform whose APIs are compatible with those of Amazon’s EC2. This makes Eucalyptus an ideal in-house cloud application development platform. It guarantees that once my killer application runs on Eucalyptus, it will also run on EC2 with no modifications required, thus creating a truly portable cloud application with world-wide deployability.

2. IaaS Cloud

For those who are not familiar with IaaS clouds, let me take you on a quick walkthrough of the cloud.

Eucalyptus and Amazon’s EC2 offer “Infrastructure-as-a-Service” cloud platforms. It means that a cloud-user can request, “Hey cloud, I need 5 machines with full network connectivity and access to the storage,” then within minutes, the user will have the complete system ready for use.


Take this concept a little further; instead of requesting machines for generic purposes, the cloud-user could specify, at creation time, which machines are to serve which purposes. For instance, using the example in this article, the cloud-user could have asked, “Hey cloud, I want one machine to work as a collector and the rest as image-processors, and have them process my cat video immediately!” Then, the cloud would have brought up a network of machines with a specific task assigned to each machine, and they would have started processing the cat video right away. Once the processing was complete, the machines would have been self-terminated, leaving only the processed cat video behind.


3. App on the Cloud

Let’s go back to the video-processing application on the cloud. Here I will cover some major design considerations when developing applications on the cloud.

3.1. Parallelism and Elasticity

Designing an application on a distributed system requires a process to be broken down into small tasks. Then, one must identify the tasks that can bring parallelism into the process. In this video-processing application, the process can be broken down into 3 major steps: decoding the video into images, processing the images, and encoding the processed images back to a video. Given these breakdowns, the natural approach is to distribute the image-processing task over multiple machines and assign a single machine to perform the encoding and decoding.


One important characteristic of the cloud that you must recognize at the core of the design is the elasticity of the cloud. Elasticity is what differentiates cloud applications from traditional distributed applications. Traditionally, in a distributed computing environment, the number of nodes N in the system is a static value that is unchangeable during a job. However, in the cloud environment, there is no bound on the number N; theoretically, N is limitless. This means that at any given point during the job, the system should expect the number N to grow, or even shrink in some cases. For instance, in our video-processing application, we could initially start with 5 machines assigned to be image-processing nodes; however, in the middle of the processing, we should be able to add 5 more nodes to boost productivity. Taking such advantage of the elasticity must be considered at the design level of the application.


3.2. Prototype

The following is an overview of the prototype of the video-processing application in the cloud.

For more detailed instructions, please go to the page [Instruction on How to Run the Video Processing Prototype]

The goal of the prototype is to demonstrate a cloud application that performs image-processing tasks in a distributed fashion. The application takes a video file as input, performs image-processing in parallel, and, when it terminates, the processed video file is stored in a known, provided storage location.

For the simplicity of the prototype, let’s assume that there is a machine that works as a file server, with an Apache web service running in the open, which is accessible from the cloud. In other words, any virtual instances (nodes) spawned on the cloud will have access to the files on the file server via download (wget). Given this setup, for instance, when we trigger the collector node, it can download the input video file from the file server to start the process.


For the prototype, we need to construct two types of nodes: a collector node and image-processing nodes. However, before I go into further details, I must explain what takes place when the cloud-user requests an instance from the IaaS cloud.

When the cloud-user asks the cloud, “Hey cloud, I need one machine,” the user is required to specify the image for the machine. In other words, the cloud-user must request, “Hey cloud, I need one machine with the RHEL 6.1 image that I have prepared for this video-processing prototype.” Then, the cloud will bring up a virtual instance that is flashed with the specified RHEL 6.1 image. Since users can prepare and upload images of their choice to the cloud, the possibilities for what the instances can do or become are limitless.


For this particular prototype, I prepared a single image that is used by all nodes. I took a generic Ubuntu Karmic image as the base image and modified its ‘rc.local’ script, which is the default script that gets executed automatically when the image boots up. The modified ‘rc.local’ script is set to read a line from the ‘user-data’ field, which gets passed to the instance from the cloud-user at creation. This small modification allows me to control the roles of the instances while having only one image. For example, I can request, “Hey cloud, I want one instance with my special Ubuntu image and have it run the script ‘collector.pl'”, then later, I can ask, “Hey cloud, I want another instance with the same image, but this one will run the script ‘processor.pl’.”
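The post does not include the modified ‘rc.local’ itself, but the idea can be sketched roughly as follows: on boot, read the user-data line from the instance metadata service, fetch the named script from the file server, and run it with the remaining arguments. Treat this as an assumption-laden illustration of that bootstrap step, not the original script.

import subprocess

METADATA_URL = "http://169.254.169.254/latest/user-data"

# Read the user-data line, e.g. "collector.pl 192.168.7.77 [lovemycat.avi]".
user_data = subprocess.check_output(["curl", "-s", METADATA_URL]).decode().strip()
parts = user_data.split()
script, file_server = parts[0], parts[1]
script_args = [arg.strip("[]") for arg in parts[2:]]  # drop the bracket syntax

# Fetch the role script from the file server and run it with its arguments.
subprocess.check_call(["wget", "-q", "http://%s/%s" % (file_server, script)])
subprocess.check_call(["chmod", "+x", script])
subprocess.check_call(["./" + script] + script_args)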

The requests in the example above would look like the ones below. Notice the use of the same image ID 'emi-9BD01749' but different 'user-data' values (-d).

First request to bring up a collector:

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "collector.pl"

Second request to bring up a processor:

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "processor.pl"
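The same two requests can also be issued programmatically; below is a rough Boto equivalent (the connection setup is a placeholder and would need to be pointed at the Eucalyptus endpoint):

import boto

conn = boto.connect_ec2(aws_access_key_id="ACCESS-KEY",
                        aws_secret_access_key="SECRET-KEY")

# First request: one collector node.
conn.run_instances("emi-9BD01749", min_count=1, max_count=1,
                   key_name="mykey0", security_groups=["group0"],
                   instance_type="c1.medium", user_data="collector.pl")

# Second request: one processor node, same image, different user-data.
conn.run_instances("emi-9BD01749", min_count=1, max_count=1,
                   key_name="mykey0", security_groups=["group0"],
                   instance_type="c1.medium", user_data="processor.pl")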


In the prototype, the actual requests contain more information than just a script name. The first request looks like,

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "collector.pl 192.168.7.77 [lovemycat.avi]"

This command translates to: after the instance boots up, it downloads the specified script ‘collector.pl’ from the file server at ‘192.168.7.77’ via wget and executes the script. The purpose of the script ‘collector.pl’ is to turn the instance into the collector node for the video-processing application. First, the script installs all the necessary software via apt-get commands in Ubuntu; it uses various open-source software for the encoding and decoding tasks. It also installs the NFS server to create a shared directory that the processing nodes can access. Second, it downloads the target video file ‘lovemycat.avi’ from the file server at ‘192.168.7.77’ (for the convenience of the prototype, the file server is designed to provide all the external file resources to the instances). Then, the collector node decodes the AVI file into a collection of JPEG images. These image files are stored in the shared directory exported by the NFS server. Now, the collector node waits for the image files to be processed by the processing nodes. The collector node’s job is to periodically scan the shared directory to check on the progress.


After the collector node enters the stage where it idles and scans, the next step is to start a group of processing nodes by requesting:

euca-run-instances emi-9BD01749 -k mykey0 -n 3 -g group0 -t c1.medium -d "processor.pl 192.168.7.77 [10.219.1.2 neon.scm]"

As a result, 3 instances will boot up, download the specified script ‘processor.pl’ from the file server at ‘192.168.7.77’, and convert themselves into image-processing nodes. Each installs the open-source image-processing software GIMP and the NFS client, and performs an NFS mount of the shared directory of the collector node, whose IP is ‘10.219.1.2’. Then, these processing nodes will start picking up image files from the shared directory and performing image-processing with GIMP according to the script ‘neon.scm’.

The syntax of the user-data for this image is:

-d “<script> <file_server_IP> [ <arguments_for_script> ]”.


Now, here is one crucial design decision that complements the elasticity of the cloud. The work-unit for the image-processing is set to be 20 images at a time. This means that each node is only allowed to grab a chunk of 20 images at a time for image-processing. Under this policy, the processing nodes must frequently come back for a small amount of work, instead of having the complete workload for each processing node pre-determined before the processing begins. This approach allows more processing nodes to be added to the system at any moment, thus taking full advantage of the elasticity.
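A rough sketch of that claiming loop on a processing node might look like the following. This is illustrative logic rather than the actual ‘processor.pl’; the claim-by-rename scheme, the directory name, and the filter stub are assumptions, and real code would need to handle NFS locking races more carefully.

import os
import time

SHARED_DIR = "/mnt/shared/frames"   # NFS-mounted from the collector node
CHUNK_SIZE = 20

def apply_filter(frame):
    # Placeholder for the actual GIMP batch invocation driven by neon.scm.
    pass

def claim_chunk():
    # Grab up to 20 unclaimed JPEG frames by renaming them; a successful
    # rename marks a frame as ours, so two nodes never share a frame.
    claimed = []
    for name in sorted(os.listdir(SHARED_DIR)):
        if not name.endswith(".jpg"):
            continue
        src = os.path.join(SHARED_DIR, name)
        dst = src + ".claimed"
        try:
            os.rename(src, dst)
        except OSError:
            continue  # another node got there first
        claimed.append(dst)
        if len(claimed) == CHUNK_SIZE:
            break
    return claimed

while True:
    chunk = claim_chunk()
    if not chunk:
        break  # no work left; the node can now self-terminate
    for frame in chunk:
        apply_filter(frame)
        os.rename(frame, frame.replace(".claimed", ".done"))
    time.sleep(1)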


When the processing nodes discover that there are no more images to be processed, they self-terminate, freeing up the computing resources for the cloud. When the collector node learns that all the images have been processed, it wakes up and encodes the images into a new video file. The final AVI file is then uploaded to a storage location belonging to the cloud-user. Eucalyptus and EC2 offer S3 storage that allows such an operation; however, I will save the details for later.


This prototype demonstrates how a complex operation, such as distributed video-processing, can be automated using the cloud. However, the automation is just the tip of the iceberg for the cloud. The raw power of the cloud comes from the ability to instantly replicate the application at a massive scale across the world. Such capability of the cloud contributes to the recent boom in Software-as-a-Service (SaaS) solutions.


Extra. Links to Processed Videos

Using Invert Filter –

Using Edge Filter –

Using Motion Blur Filter –

Related. Links to Project Home Page –

