DevOps Culture — Fail Fast on Eucalyptus
At a meetup event down in San Diego, California, Eucalyptus had a chance to meet Sander van Zoest (@svanzoest), the VP of technology at OneHealth (http://www.onehealth.com/), who is also the organizer of the San Diego DevOps group (http://www.meetup.com/sddevops/). Sander and his team at OneHealth have been using Eucalyptus cloud for some time. Asked why OneHealth runs Eucalyptus in-house, Sander had some interesting stories to say about dealing with health-related data and the company’s DevOps engineering culture.
Due to the strict regulations on Protected Health Information (PHI), OneHealth needs to take extra strong measures if they are to provide the services on AWS; Sander spent a good amount of time explaining to us how demanding it is to satisfy the regulations. Such barriers make things complicated to push any personal identifiable health information to the cloud.
For the AWS case, the very specific barrier was that AWS provides no legal protection when storing sensitive data in the cloud storage space. For instance, it is required by HIPAA and HITECH regulations that One Health needs to be able to promise a 72-hour response time to inform their customers about the breach of the data, should it ever happen, and provide an ETA to identify and patch the security hole that caused the breach.
Sander points out that at the moment, AWS does not guarantee such protections/services. For this reason, OneHealth’s production environment is deployed at Rackspace’s co-location since it provides HIPAA Business Associate Addendums. However, it is noted that given the evolving nature of the public cloud, it is very “cloudy” to predict how things are going to change in the near future. The recent announcement by AWS on CloudHSM (http://aws.typepad.com/aws/2013/03/aws-cloud-hsm-secure-key-storage-and-cryptographic-operations.html) — although it doesn’t cover the legal protection — is a good indicator showing AWS’s interest in providing secure storage service as moving forward.
What this uncertain, “cloudy” future means for engineering at OneHealth is employing a variety of infrastructure environments to take advantage of each platform while staying flexible. It becomes essential to design OneHealth’s services and applications to be deployable on bare-metal systems at Rackspace (production environment), AWS (sandbox/staging environment), Eucalyptus (in-house continuous integration and testing environment), and engineers’ laptops using Vagrant (development and testing environment). (http://www.vagrantup.com/).
Under such heterogeneous systems, from its production down to the engineer’s laptop, the development environment — the OS, dependencies, configurations, etc — needs to be kept uniform via virtualization and automation, allowing seamless pushing of new code from the laptop up to the production. For handling the life cycle of machines and VM instances, the engineers at OneHealth are big fans of Chef (http://www.opscode.com/chef/), which makes the configuration management portable on any infrastructure platforms. For virtual machines, the instance images are prepared via debian preseed files while leveraging a open source tool VeeWee (https://github.com/jedi4ever/veewee).
At OneHealth, the philosophy of DevOps is deeply embedded in every aspect of its development and operation. The concept of DevOps was not new to many engineers who brought in the ideas of “Infrastructure as Code” and “Commit Often and Fail Fast” from previous companies such as MP3.com and Joost.
Speaking of DevOps culture, one fun fact Sander mentioned — which goes against intuition for many traditional IT shops — was that the operation team at OneHealth likes to take down the instances and rebuild them regularly. The recycling of the instances ensures the “freshness” of the deployed services and applications. The operation engineers should be more concerned if an instance’s uptime was longer than, say, 30 days because it meant that the content of the instance was outdated, possibly containing unfixed bugs or security issues. If the deployment setup was doing what it was supposed to be doing, then it should have killed the outdated instance and brought up a new instance with the latest updates.
The same goes for the development environment. It would be much better to refresh the dev environment instances with frequent relaunching and reconstructing than having the developers working on a stale dev environment, which turns out to be more harmful for the development. Plus, this destroy-and-rebuild enforcement encourages the developers to consistently check in the code to a version-controlled code repository, allowing early detection of conflicts in code.
All of these procedures, bringing together datacenter automation and configuration management, are part of a very new movement in software development now labeled as “DevOps”. The DevOps folks often joke around and say even a few years ago, the terminology didn’t even exist, but now, DevOps has become the most sought-after practice in IT. All thanks to the wide spread of cloud computing, giving birth to the programmable infrastructure.