Kyo Lee

Open-Source Cloud Blog


[Cloud Application] Run Eucalyptus UI Tester on your Mac using Vagrant


Initially, this post was written as a technical blog describing how to run the Eucalyptus UI Tester (se34euca) on your Mac using Vagrant and VirtualBox. However, writing it made me revisit the benefits of running and developing applications on a virtual machine.

Background: Automated Tester As An Application (ATAAA)

When developing software, you need an automated test suite readily available; with the click of a button, a developer should be able to run a sequence of automated tests as a speedy sanity check on the code being worked on.

Traditionally, a couple of in-house machines would be dedicated to serve as automated testers, shared by all developers. In such a setup, developers have to interact with the tester machines over VPN, which can get quite hectic, especially for developers who like to work from a local coffee shop.

Now, with Vagrant and VirtualBox, you can have your own personal automated tester as a “cloud application” running on a laptop. When the code is ready for testing, you can quickly run a set of automated tests on your laptop by launching a virtual image that has been pre-configured as the automated tester for the project. When finished, the virtual instance can be killed immediately to free up the laptop’s resources.
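
For instance, with the tester box prepared, the day-to-day workflow is only a few commands. Below is a minimal sketch, assuming a Vagrantfile that points at the pre-configured tester box; the test command run inside the VM is illustrative:

vagrant up                                    # boot the pre-configured tester VM
vagrant ssh -c "cd se34euca && ./run_tests"   # run the automated test suite inside the VM
vagrant destroy -f                            # tear the VM down and free the laptop's resources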


Benefits of Running Applications on a Virtual Instance

As mentioned in the introduction, while preparing the Eucalyptus UI Tester to run as a cloud application, I rediscovered my appreciation for using virtual machines as part of the software development environment. Running the application on a virtual image brings the following benefits: contain-ability, snapshot-ability, and portability.

1. Contain-ability

Running the application on a virtual instance means that no matter how messy the application’s dependencies are, they all get installed in a contained virtual environment. This means you get to keep your precious laptop clean and tidy, protecting it from all those unwanted, unstable, experimental packages.

2. Snapshot-ability

When working with a virtual instance, at some point you should be able to stabilize the application, polish it up to a known state, and take a snapshot of the virtual image to freeze that moment. Once the snapshot is taken and preserved, you can bring the application back to that known state at any time. It’s just like having a time machine.
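
For example, with Vagrant and the VirtualBox provider, freezing the current state of the tester VM into a reusable box takes one command; the box name below is illustrative:

vagrant package --output se34euca-tester.box            # snapshot the current VM into a box file
vagrant box add se34euca-tester ./se34euca-tester.box   # register the box for future "vagrant up" runs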


3. Portability

When working with a team or a community, the portability of the application on a virtual image might be the most appealing benefit of all. Once you polish up the application to run nicely on a virtual image, the promise is that it will also run smoothly on any other virtual machine out there, including your fellow developers’ laptops, the massive server farms in a data center, or the cloud somewhere. Your application truly becomes “write once, run everywhere.”


Running Eucalyptus UI Tester on Your Mac Laptop via Vagrant

If you would like to run Eucalyptus UI Tester from scratch, follow the steps below:

1. Installing Vagrant and Virtual Box on Mac OS X in 5 Steps

and

2. Installing Eucalyptus UI Tester on CentOS 6 image via Vagrant

If you would like to run Eucalyptus UI Tester from the pre-baked Vagrant image, follow the steps below:

1. Installing Vagrant and Virtual Box on Mac OS X in 5 Steps

then

3. Running PreBaked Eucalyptus UI Tester Image using Vagrant

Also see 4. Creating a New Vagrant Package Image if you are interested in creating a new image via Vagrant.

Instructions

1. Installing Vagrant and Virtual Box on Mac OS X in 5 Steps

https://github.com/eucalyptus/se34euca/wiki/Installing-Virtual-Box-and-Vagrant-on-Mac-OS-X

2. Installing Eucalyptus UI Tester on CentOS 6 image via Vagrant

https://github.com/eucalyptus/se34euca/wiki/Installing-se34euca-on-Centos-6

3. Running PreBaked Eucalyptus UI Tester Image using Vagrant

https://github.com/eucalyptus/se34euca/wiki/Running-PreBaked-se34euca-Image-using-Vagrant

4. Creating a New Vagrant Package Image

https://github.com/eucalyptus/se34euca/wiki/Creating-a-New-Vagrant-Package-Image



TCP Dumpster: Monitoring TCP for Eucalyptus User Console

This is part III of the Eucalyptus Open QA blog series, which covers various topics on the quality assurance process for Eucalyptus’s new user console.

In this post, we would like to share how we monitor the traffic on the user console proxy, using the Linux command ‘tcpdump‘ and its rendering application ‘tcpdumpster‘, to understand how users behave when interacting with the user console.

Background

The Eucalyptus user console consists of two components: a JavaScript-based client application and a Tornado-based user console proxy. When a user logs in, the client-side application, which runs in the user’s web browser, polls the user’s cloud resource data at a certain interval, and the user console proxy, which sits between the cloud and the users, relays the requests originating from the client applications.

[Diagram: user console component view]

Recall from the first blog of the series that our challenge was this: when 100+ users are logged into the Eucalyptus user console at the same time, can the user console proxy withstand the traffic generated by those 100+ users? And how do we ensure a good user experience under such a heavy load?

The answer to the questions above was provided in detail here.

The short answer is to generate the traffic of 100+ users using Selenium, the open-source automated web-browser testing tool, while manually evaluating the user experience on the user console.

However, prior to answering those questions, we first needed a way to quickly yet effectively monitor the traffic between the clients and the proxy in order to observe its patterns and behaviors.

TCP Dump

‘tcpdump‘ is a standard tool for monitoring TCP traffic on Linux. For instance, if the user console proxy is running on port 8888 of the machine 192.168.51.6, monitoring the traffic on that port can be as simple as running the command below in a Linux terminal on 192.168.51.6:

tcpdump port 8888

This command “dumps” information about every packet that crosses port 8888 on the machine 192.168.51.6. However, the raw output is overwhelming; it flies by on the terminal screen as soon as the user consoles start interacting with the proxy. There had to be a better way to render the output of ‘tcpdump‘.
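
In practice, a couple of standard tcpdump flags make the output easier to log and parse later; the interface name and log path below are illustrative:

tcpdump -i eth0 -nn -l port 8888 | tee /var/log/port8888.log

(-nn skips name resolution and -l makes the output line-buffered, so it can be piped or saved as it arrives.)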

 TCP Dumpster

At Eucalyptus, using the automated QA system, a new, up-to-date Eucalyptus system is constantly installed and torn down within a life span of a day or two (check out here to see the Eucalyptus QA system in action). For this reason, we needed a quick way to set up the monitoring application on the machine where the proxy was installed. Plus, we wanted all the necessary monitoring information displayed on a single HTML page for a quick glance, making it easier for the observer to intuitively grasp the big picture. As a result, ‘tcpdumpster‘ was born.


The application ‘tcpdumpster‘ runs on the same machine where the proxy is installed. It runs the Linux command “tcpdump port 8888” and parses its output into a list file. This list tracks 8 attributes of the TCP traffic:

  • Unique connections, based on IP
  • Unique connections, based on Port
  • Connection count, per second
  • Connection count, averaged over a minute
  • Connection count, in total
  • Packet length, per second
  • Packet length, averaged over a minute
  • Packet length, in total
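
As a rough illustration of the parsing involved, the snippet below boils a saved capture (such as the log file from the tcpdump example above) down to per-second packet counts and byte totals. This is only a sketch, not the actual ‘tcpdumpster‘ implementation:

awk '{
  split($1, t, ".");  sec = t[1]                      # timestamp truncated to the second
  count[sec]++                                        # packets seen during this second
  if (match($0, /length [0-9]+/))                     # "length N" as reported by tcpdump
    bytes[sec] += substr($0, RSTART + 7, RLENGTH - 7)
}
END { for (s in count) printf "%s  packets=%d  bytes=%d\n", s, count[s], bytes[s] }' /var/log/port8888.log | sort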

Those 8 attributes are displayed on a single HTML page, which can be accessed via:

http://192.168.51.6/tcpdumpster.php

With that page in view, we were able to make some interesting observations about how the traffic behaves as the user console starts interacting with the proxy.

TCP Dumpster Examples

The graph below shows seven minutes of traffic generated by a single user logged in to the user console.

[Graph: seven minutes of traffic from a single logged-in user]

Notice the first peak, which represents the user’s log-in, followed by the periodic peaks that show the polling of cloud resource data; user actions appear as the blobs among the peaks.

The graph below shows the traffic pattern as more Selenium-based automated scripts are activated to simulate a large number of users.

[Graph: traffic as more Selenium scripts are activated]

The first block shows when 1 and 2 Selenium scripts are active, and the second block shows when 6 and 12 Selenium scripts are active (check out here to learn how Selenium was used). When the data is averaged over a minute, the differences between the stages become more visible:

[Graph: connection counts averaged over a minute]

When graphed all together, along with the connection data, they look like this:

[Graph: combined connection and packet-length data]

‘tcpdumpster‘ turns out to be very useful for validating whether a newly written Selenium script is behaving correctly. The graph below shows a Selenium script that launches a new instance, waits until the instance is running, terminates the instance, waits a few minutes, and repeats:

[Graph: launch/wait/terminate cycle driven by a Selenium script]

And, of course, ‘tcpdumpster‘ is very handy when running a long-term test; it allows me to set up the test, go to sleep, and wake up the next day to check the results. The graph below shows how the proxy withstood constant ‘refresh’ operations from multiple connections for more than 5 hours:

[Graph: 5+ hours of constant refresh operations from multiple connections]

Now, can you guess what is going on in the graph below?

[Graph: unlabeled traffic pattern]

Check out the GitHub link below and try out ‘tcpdumpster‘ on your own Eucalyptus user console proxy to find out for yourself:

https://github.com/eucalyptus/tcpdumpster

Pigeons on a Euca: Eucalyptus Cloud Monitoring Mobile App via Twitter

Being a system administrator is the easiest job in the building when the system is working; no one questions your presence or existence. Your tasks are highly under-appreciated in times of peace, yet you do not mind such vanity since you’d rather be reading blogs and watching YouTube in serenity. Every once in a while, an idiot cracks an ancient joke: “hey, aren’t you supposed to be working?” But I am working, you imbecile employee-of-the-month.

However, the curse begins once you step out of the building, entering the realm of unknowns, far removed from the comfort of your MacBook and Wi-Fi. While staring at endless tail-lights on a freeway, taking a long walk on the beach, or being queued behind 7 shopping carts at the grocery market, your mind begins to wonder, “how’s my system doing?”

Since day one, you have set up numerous layers of email-notification alarms, but it’s never enough; “getting the e-mails” only means “it’s too late.” There is always an urge to log in. But you can’t. You are cut off. You are trapped; the lady in front of you just pulled out a checkbook while the sign clearly says “credit or cash only.” You begin to panic. You compulsively refresh the email on your smartphone, but no answers. The silence is deafening. No news is never good news. The only exit is when the system whispers in your ear: “Have no worries. Everything is working… For now.”

Now, your days of worry are over. In the midst of the 3G wilderness, the application Pigeons on a Euca will deliver peace and tranquility comparable to that of a laptop on VPN. Of course, that is only if you carry a smartphone at all times and the system is running the Eucalyptus cloud.

The trick is to run a periodic Cloud monitoring app via Twitter.

Instead of being passively notified by email when problems arise in the system, you can set up the application Pigeons on a Euca, which runs a small script that actively “tweets” the status of the cloud for you and your fellow sysadmins to follow.

[Screenshot: Pigeons on a Euca on iPhone]

Here are the requirements:

  • Have a twitter account opened for this application.
  • Have a machine, or a virtual machine, running Linux with network capability.
  • Have the cloud admin’s credentials.
  • Have a smartphone with Twitter Client App installed.

Current (Beta) Features**:

  • Every 1 minute, it tweets status changes of running instances*.
  • Every 10 minutes, it tweets the number of currently running instances in the cloud*.
  • Every 10 minutes, it tweets the number of newly launched instances in the cloud*.
  • Every 10 minutes, it tweets each availability zone’s information.

* These features rely on the new version of euca2ools (v 2.0)

** The application is highly configurable so that more reporting can easily be added when needed.
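
As a rough illustration of the first feature above, here is how a status change could be detected and handed to the tweet script shown later in the instructions. This is only a sketch; the real Pigeons on a Euca scripts are written in Perl, and the paths, file names, and field positions below are assumptions:

#!/bin/bash
source ./credentials/eucarc                      # load the cloud admin credentials
touch /tmp/instances.prev
while true; do
  euca-describe-instances | grep '^INSTANCE' | awk '{print $2, $6}' | sort > /tmp/instances.now
  if ! diff -q /tmp/instances.prev /tmp/instances.now > /dev/null; then
    diff /tmp/instances.prev /tmp/instances.now | grep '^[<>]' > ./tweets/status_change.tweet
    perl ./tweet_it_away.pl ./tweets/status_change.tweet   # tweet the detected status change
  fi
  mv /tmp/instances.now /tmp/instances.prev
  sleep 60                                       # check once per minute
done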

Instructions on How to Set up the Application Pigeons on a Euca

First, you need to open a new twitter account.

1. Go to twitter.com and sign up for a new account; if you already have an account with Twitter, you are going to need a new email address to create a new twitter account for this application.

2. As soon as the new twitter account is open, check the “Protect my Tweets” box on the Tweet Privacy section so that your tweets do not accidentally get broadcast to the public channel.

3. Apply for a developer account at dev.twitter.com.

4. After opening the twitter developer account, you need to create an application at dev.twitter.com.

5. Fill out the application details form. There is no requirement on what you put in the name and description boxes. For the website box, you may specify any working web URL of your choice; it won’t matter for this application.

6. After creating the application, on the “settings” page, change the access level to “Read and Write”.

7. Click the button “Change this Twitter application’s settings” at the bottom of the page to apply the change. It might take a few minutes for the change to be applied.

8. On the “Details” page, verify that the access level is changed to “Read and write“. After seeing that the change has taken place, click on the button “Create my access token” at the bottom of the page.

9. Go to the page “OAuth tool” and verify the consumer key and access token are generated. You will need these keys to configure the application Pigeons on a Euca later.

10. At this point, your twitter account is configured to receive script-generated tweets from the application Pigeons on a Euca.

Second, after the twitter account is ready, you need to set up the machine where the application Pigeons on a Euca will be running.

1. Install a perl module “Net::Twitter::Lite” on your Linux box.

You may install the module from source by visiting the link:

http://search.cpan.org/~mmims/Net-Twitter-Lite-0.10004/lib/Net/Twitter/Lite.pm

Or, for Ubuntu distributions, such as Lucid, you may simply add the line:

deb http://ubuntu.mirror.cambrium.nl/ubuntu/ lucid main universe

to “/etc/apt/sources.list”.

Then, install the Perl module via the package libnet-twitter-lite-perl by running:

apt-get update
apt-get install libnet-twitter-lite-perl

2. Install the latest version of “euca2ools” (v 2.0)

Visit the website (http://open.eucalyptus.com/downloads) for detailed instructions on how to install the latest euca2ools.

For Ubuntu distributions, such as Lucid, add the line:

deb http://downloads.eucalyptus.com/software/euca2ools/2.0/ubuntu lucid universe

to “/etc/apt/sources.list”.

Then, install euca2ools by using the commands:

apt-get update
apt-get install euca2ools

Third, after all the necessary modules and tools are installed, you can finish setting up the application on the machine.

1. Download the tarball pigeons_on_a_euca.tar.gz from the project repository:

https://projects.eucalyptus.com/redmine/projects/pigeons-on-a-euca/files

or

https://github.com/eucalyptus/pigeons_on_a_euca

2. Untar the tarball at a directory of your choice:

tar zxvf pigeons_on_a_euca.tar.gz

3. On the “my applications” page on dev.twitter.com, copy the lines in the “OAuth tool” section, as shown in Step 9 of the first instruction set.

4. Change the lines in the file “./pigeon_on_a_euca/pigeon_cage/key/o_auth_settings.key” with your account’s actual values.

5. Perform a quick check to validate the setups so far by running the commands:

cd ./pigeon_on_a_euca/pigeon_cage

perl ./tweet_it_away.pl ./tweets/mytest.tweet

6. Check the twitter account to verify that the line “this is a test” was tweeted. Also notice the lock sign on the tweet, which indicates that the tweet is private.

7. After verifying the test tweet, go to the directory “./pigeons_on_a_euca/credentials” and store your Eucalyptus cloud’s admin credentials.

8. Verify that you can talk to your Eucalyptus cloud via the admin credentials by running the commands:

cd ./pigeons_on_a_euca/credentials

source eucarc

euca-describe-availability-zones verbose

9. At this point, the application is all set to run. Do a quick check by running the main script:

perl ./activate_the_pigeons.pl

10. Check the twitter account to verify that the statuses of the instances running on the cloud are being tweeted.

11. To run the main script in the background, do:

nohup perl ./activate_the_pigeons.pl > ./stdout.log 2>> ./stderr.log &

12. To monitor the run:

tail -f stdout.log

Last, install any Twitter Client App on your smartphone and follow the account that you created above.

Now you have a mobile application that keeps you updated on the status of the cloud.

Warning: The number of tweets generated by the application can be overwhelming; at its maximum rate, it will post 350 tweets per hour. It is recommended that you and your co-workers open a separate twitter account exclusively for receiving tweets from this application.

And, if you decide to modify the script, please be aware of the hourly limit on tweet updates, which is 350 tweets per hour. Carefully limit your tweets so that the application maintains consistent tweet-ability.

Thank you for your interest in the application, and feel free to contribute and share.

Kyo

A Developer Walks through Cloud

Skip Directly to [Instruction on How to Run the Video Processing Prototype]



1. Little Phone, Big Cloud

A few months ago, a phrase caught my attention: “Instagram for Video.” It was an interesting idea for a mobile application. As a software designer, I dug into the idea, only to soon realize one major implementation challenge.


It turns out that a video is a collection of pictures: many, many pictures. Given the standard 24 frames-per-second rate, even a one-minute video comprises 1,440 pictures, which means image-processing 1,440 pictures on a mobile phone. That is a lot of pictures for the small battery in your mobile phone to handle.


There is an alternative to this scenario: move the image-processing task to a remote machine that is bigger, stronger, and meaner. In this scenario, the mobile phone uploads the video to a server over the internet, the server processes it remotely, and the phone retrieves the processed video back in a seamless fashion.


However, there is one absolutely crucial requirement in this scenario: we are going to need a big, big, big machine, big enough to handle millions of requests once this killer application goes viral (go big or go home). There is only one answer to this kind of demand: “the Cloud.”

Luckily, there is an open-source cloud available: Eucalyptus is an open-source Infrastructure-as-a-Service cloud platform whose APIs are compatible with those of Amazon EC2. This makes Eucalyptus an ideal in-house platform for cloud application development. It guarantees that once my killer application runs on Eucalyptus, it will also run on EC2 with no modification required, creating a truly portable cloud application with worldwide deployability.

2. IaaS Cloud

For those who are not familiar with IaaS clouds, let me take you on a quick walkthrough.

Eucalyptus and Amazon EC2 offer “Infrastructure-as-a-Service” cloud platforms. This means a cloud user can request, “Hey cloud, I need 5 machines with full network connectivity and access to storage,” and within minutes the user will have the complete system ready for use.
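
In euca2ools terms, that request is a single command; the keypair and security group below are illustrative, and the image ID is the one reused in the prototype later in this post:

euca-run-instances emi-9BD01749 -n 5 -k mykey0 -g group0 -t c1.medium   # ask the cloud for 5 instances
euca-describe-instances                                                 # watch them go from pending to running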


Take this concept a little further: instead of requesting machines for generic purposes, the cloud user can specify which machine serves which purpose at creation time. For instance, using the example in this article, the cloud user could ask, “Hey cloud, I want one machine to work as a collector and the rest as image-processors, and have them process my cat video immediately!” The cloud would then bring up a network of machines with a specific task assigned to each, and they would start processing the cat video right away. Once the processing was complete, the machines would terminate themselves, leaving only the processed cat video behind.


3. App on the Cloud

Let’s go back to the video-processing application on the cloud. Here I will cover some major design considerations when developing applications on the cloud.

3.1. Parallelism and Elasticity

Designing an application on a distributed system requires breaking a process down into small tasks. Then, one must identify the tasks that can bring parallelism into the process. In this video-processing application, the process can be broken down into 3 major steps: decoding the video into images, processing the images, and encoding the processed images back into a video. Given this breakdown, the natural approach is to distribute the image-processing task over multiple machines and assign a single machine to perform the encoding and decoding.


One important characteristic of the cloud that must sit at the core of the design is elasticity. Elasticity is what differentiates cloud applications from traditional distributed applications. Traditionally, in a distributed computing environment, the number of nodes N in the system is a static value that cannot change during a job. In the cloud environment, however, there is no bound on N; theoretically, N is limitless. This means that at any given point during the job, the system should expect N to grow, or even shrink in some cases. For instance, in our video-processing application, we could initially start with 5 machines assigned as image-processing nodes, and in the middle of the processing, we should be able to add 5 more nodes to boost productivity. Taking advantage of this elasticity must be considered at the design level of the application.


3.2. Prototype

The following is an overview of the prototype of the video-processing application in the cloud.

For more detailed instructions, please go to the page [Instruction on How to Run the Video Processing Prototype]

The goal of the prototype is to demonstrate a cloud application that performs image-processing tasks in a distributed fashion. The application takes a video file as input, performs image-processing in parallel, and, when it terminates, stores the processed video file in a known storage location.

To keep the prototype simple, let’s assume there is a machine that works as a file server, running an Apache web service in the open and accessible from the cloud. In other words, any virtual instances (nodes) spawned on the cloud can download files from the file server (via wget). Given this setup, when we trigger the collector node, it can download the input video file from the file server to start the process.


For the prototype, we need to construct two types of nodes: the collector node and the image-processing node. However, before I go into further details, I must explain what takes place when the cloud user requests an instance from the IaaS cloud.

When the cloud user asks the cloud, “Hey cloud, I need one machine,” the user is required to specify the image for that machine. In other words, the cloud user must request, “Hey cloud, I need one machine with the RHEL 6.1 image that I have prepared for this video-processing prototype.” Then the cloud will bring up a virtual instance flashed with the specified RHEL 6.1 image. Since users can prepare and upload images of their choice to the cloud, the possibilities for what the instances can do or become are limitless.


For this particular prototype, I prepared a single image to be used by all nodes. I took a generic Ubuntu Karmic image as the base image and modified its ‘rc.local’ script, the default script that is executed automatically when the image boots up. The modified ‘rc.local’ script reads a line from the ‘user-data’ field, which is passed to the instance by the cloud user at creation time. This small modification allows me to control the roles of the instances while having only one image. For example, I can request, “Hey cloud, I want one instance with my special Ubuntu image and have it run the script ‘collector.pl’,” and later, I can ask, “Hey cloud, I want another instance with the same image, but this one will run the script ‘processor.pl’.”

The requests in the example above would look like the following. Notice that they use the same image ID ‘emi-9BD01749’ but different ‘user-data’ values (-d).

First request to bring up a collector:

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "collector.pl"

Second request to bring up a processor:

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "processor.pl"


In the prototype, the actual requests contain more information than just a script name. The first request looks like,

euca-run-instances emi-9BD01749 -k mykey0 -n 1 -g group0 -t c1.medium -d "collector.pl 192.168.7.77 [lovemycat.avi]"

This command translates to: after the instance boots up, it downloads the specified script ‘collector.pl’ from the file server at ‘192.168.7.77’ via wget and executes it. The purpose of ‘collector.pl’ is to turn the instance into the collector node for the video-processing application. First, the script installs all the necessary software via apt-get (it uses various open-source tools for the encoding and decoding tasks), including the NFS server used to create a shared directory that the processing nodes can access. Second, it downloads the target video file ‘lovemycat.avi’ from the file server at ‘192.168.7.77’ (for the convenience of the prototype, the file server provides all external file resources to the instances). Then, the collector node decodes the AVI file into a collection of JPEG images, stored in the shared directory exported by the NFS server. Now the collector node waits for the image files to be processed by the processing nodes; its job is to periodically scan the shared directory for progress.


After the collector node enters its idle-and-scan stage, the next step is to start a group of processing nodes by requesting:

euca-run-instances emi-9BD01749 -k mykey0 -n 3 -g group0 -t c1.medium -d "processor.pl 192.168.7.77 [10.219.1.2 neon.scm]"

As a result, 3 instances will boot up, download the specified script ‘processor.pl’ from the file server at ‘192.168.7.77’, and convert themselves into image-processing nodes. Each installs the open-source image-processing software GIMP and the NFS client, and NFS-mounts the shared directory of the collector node, whose IP is ‘10.219.1.2’. Then these processing nodes start picking up image files from the shared directory and processing them with GIMP according to the script ‘neon.scm’.

The syntax of the user-data for this image is:

-d "<script> <file_server_IP> [ <arguments_for_script> ]"
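
To make the mechanism concrete, here is a minimal sketch of what the modified ‘rc.local’ could look like. It assumes the EC2-style instance metadata URL, which Eucalyptus also provides; the exact script baked into the prototype image may differ:

#!/bin/sh
# read the user-data passed at creation, e.g. "collector.pl 192.168.7.77 [lovemycat.avi]"
USER_DATA=$(wget -q -O - http://169.254.169.254/latest/user-data)
SCRIPT=$(echo "$USER_DATA" | awk '{print $1}')               # role script, e.g. collector.pl
FILE_SERVER=$(echo "$USER_DATA" | awk '{print $2}')          # file server IP, e.g. 192.168.7.77
wget -q "http://$FILE_SERVER/$SCRIPT" -O "/root/$SCRIPT"     # fetch the role script from the file server
perl "/root/$SCRIPT" $USER_DATA > /var/log/role.log 2>&1 &   # run it; the script parses the remaining arguments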


Now, here is one crucial design decision that complements the elasticity of the cloud: the work unit for image-processing is set to 20 images at a time. Each node is only allowed to grab a chunk of 20 images at a time to process. Under this policy, the processing nodes must frequently ask the collector node for small amounts of work, instead of having the complete workload pre-assigned to each processing node before processing begins. This approach allows more processing nodes to be added to the system at any moment, taking full advantage of the elasticity.
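
Here is a sketch of what that 20-image work unit could look like on a processing node, assuming the shared directory is NFS-mounted at /mnt/shared and frames are moved between subdirectories to mark progress. All names are illustrative, and the prototype’s actual bookkeeping may differ:

SHARED=/mnt/shared                                          # NFS mount exported by the collector node
while true; do
  CHUNK=$(ls "$SHARED"/todo/*.jpg 2>/dev/null | head -20)   # take at most 20 unprocessed frames
  [ -z "$CHUNK" ] && break                                  # no work left
  for f in $CHUNK; do
    mv "$f" "$SHARED"/inprogress/ 2>/dev/null || continue   # claim the frame; skip if another node got it first
    # ...run GIMP in batch mode with neon.scm on the claimed frame, then move the result to $SHARED/done...
  done
done
halt                                                        # self-terminate to free up cloud resources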


When the processing nodes discover that there are no more images to be processed, they terminate themselves, freeing up computing resources for the cloud. When the collector node learns that all the images have been processed, it wakes up and encodes the images into a new video file. The final AVI file is uploaded to a storage location belonging to the cloud user; Eucalyptus and EC2 offer S3 storage that supports such an operation, but I will save those details for later.


This prototype demonstrates how a complex operation, such as distributed video-processing, can be automated using the cloud. However, automation is just the tip of the iceberg. The raw power of the cloud comes from the ability to instantly replicate the application at massive scale across the world. This capability is what drives the recent boom in Software-as-a-Service (SaaS) solutions.


Extra. Links to Processed Videos

Using Invert Filter –

Using Edge Filter –

Using Motion Blur Filter –

Related. Links to Project Home Page –

