Cloud Computing: Programming Support and Emerging Software Environments
Programming Support of Google App Engine
This section introduces the Google App Engine infrastructure. We describe the programming model supported by GAE. The access links to the GAE platform are provided.
a) Programming the Google App Engine
Several web resources, books, and articles discuss how to program GAE. We summarize some key features of the GAE programming model for two supported languages: Java and Python. A client environment that includes an Eclipse plug-in for Java allows you to debug your GAE application on your local machine. The Google Web Toolkit (GWT) is also available for Java web application developers. Developers can use Java, or any other language with a JVM-based interpreter or compiler, such as JavaScript or Ruby.
b) Google File System (GFS)
GFS was built primarily as the fundamental storage service for Google’s search engine. Because the volume of crawled and saved web data was substantial, Google needed a distributed file system to redundantly store massive amounts of data on cheap and unreliable computers. None of the traditional distributed file systems could provide such functions or hold such large amounts of data. In addition, GFS was designed for Google applications, and Google applications were built for GFS. In traditional file system design, such a philosophy is not attractive, as there should be a clear interface between applications and the file system, such as a POSIX interface.
c) Bigtable, Google’s NOSQL System
In this section, we continue discussing key technologies in the Google cloud environment. We have already discussed MapReduce, the best-known Google technology, and Sawzall. Bigtable was designed to provide a service for storing and retrieving structured and semistructured data. Bigtable applications include storage of web pages, per-user data, and geographic locations. Per-user data holds information for a specific user, such as preference settings, recent queries and search results, and the user’s e-mails. Geographic locations are used in Google’s well-known Google Earth software. They include physical entities (shops, restaurants, etc.), roads, satellite image data, and user annotations.
d) Chubby, Google’s Distributed Lock Service
Chubby is intended to provide a coarse-grained locking service. It can also store small files in Chubby storage, which provides a simple namespace organized as a file system tree. The files stored in Chubby are quite small compared to the huge files in GFS. Because it is based on the Paxos agreement protocol, the Chubby system can remain reliable despite the failure of any member node. Each Chubby cell contains five servers. Each server in the cell has the same file system namespace.
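The reliability claim rests on majority voting: a Paxos-style protocol can make progress as long as a majority of the servers in a cell can reach each other. The following is a minimal sketch of that arithmetic for the five-server cell described above. The function names are illustrative only and are not part of any Chubby interface.

```python
def majority(cell_size: int) -> int:
    """Smallest number of servers that forms a majority of the cell."""
    return cell_size // 2 + 1


def tolerated_failures(cell_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return cell_size - majority(cell_size)


def can_make_progress(cell_size: int, failed: int) -> bool:
    """True if the surviving servers still form a majority."""
    return cell_size - failed >= majority(cell_size)


CELL_SIZE = 5  # cell size stated in the text
quorum = majority(CELL_SIZE)
spare = tolerated_failures(CELL_SIZE)
```

With five servers a majority is three, so under this model the cell keeps working with up to two servers down and stalls once a third one fails.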
Programming on Amazon AWS and Microsoft Azure
In this section, we consider the programming support in the AWS platform. First we review the AWS platform and its updated service offerings. We then turn to individual services, including SimpleDB. Returning to the programming environment features, Amazon (like Azure) offers a Relational Database Service (RDS) with a messaging interface. The Elastic MapReduce capability is equivalent to Hadoop running on the basic EC2 offering. Amazon has NOSQL support in SimpleDB.
● Programming on Amazon EC2
Amazon was the first company to introduce VMs in application hosting. Customers can rent VMs instead of physical machines to run their own applications. By using VMs, customers can load any software of their choice. The elastic feature of such a service is that a customer can create, launch, and terminate server instances as needed, paying by the hour for active servers. Amazon provides several types of preinstalled VMs. The templates for these VMs are called Amazon Machine Images (AMIs); they are preconfigured with operating systems based on Linux or Windows, plus additional software, and instances are launched from them.
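The create, launch, and terminate cycle with hourly payment can be sketched as a toy model. This is not an AWS API: the class, the image identifier, and the hourly rate are hypothetical, and the sketch assumes that every started hour is billed in full.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Toy stand-in for a rented VM; not an AWS API object."""
    image_id: str                          # hypothetical AMI identifier
    hourly_rate: float                     # hypothetical price per hour
    launched_at: Optional[float] = None    # time in hours
    terminated_at: Optional[float] = None  # time in hours

    def launch(self, now: float) -> None:
        self.launched_at = now

    def terminate(self, now: float) -> None:
        if self.launched_at is None:
            raise RuntimeError("instance was never launched")
        self.terminated_at = now

    def billed_hours(self) -> int:
        """Hours charged, assuming each started hour is billed in full."""
        if self.launched_at is None or self.terminated_at is None:
            return 0
        return math.ceil(self.terminated_at - self.launched_at)

    def cost(self) -> float:
        return self.billed_hours() * self.hourly_rate
```

Under these assumptions, an instance that runs for two and a half hours is charged for three hours, and an instance that is never launched costs nothing.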
a) Amazon Simple Storage Service (S3)
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. S3 provides an object-based storage service for users. Users can access their objects through the Simple Object Access Protocol (SOAP) with either browsers or other client programs that support SOAP. A related service, the Simple Queue Service (SQS), is responsible for ensuring a reliable message service between two processes, even if the receiver processes are not running.
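The property attributed to SQS, that a sender can hand off messages while no receiver is running, comes from the queue itself holding the messages. The following in-memory sketch illustrates only that idea. The class and method names are hypothetical and do not correspond to the SQS API, and the strict first-in, first-out order is a simplification of this sketch.

```python
from collections import deque
from typing import Deque, Optional


class MessageQueue:
    """In-memory stand-in for a hosted queue; names are illustrative only."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, body: str) -> None:
        """Store the message; works whether or not a receiver is running."""
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        """Return the oldest stored message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


queue = MessageQueue()
queue.send("job-1")  # the receiver process has not started yet
queue.send("job-2")
# Later, the receiver starts and drains what was stored in the meantime.
received = [queue.receive(), queue.receive(), queue.receive()]
```

The sender and receiver never need to be alive at the same time; the third `receive()` call returns `None` because the queue has been drained.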
b) Amazon Elastic Block Store (EBS) and SimpleDB
The Elastic Block Store (EBS) provides the volume block interface for saving and restoring the virtual images of EC2 instances. Traditionally, the state of an EC2 instance is lost once the instance is terminated. With EBS, that state can be saved after the machine is shut down. Users can use EBS to save persistent data and mount the volumes on running EC2 instances. Note that S3 is “Storage as a Service” with a messaging interface, whereas EBS is analogous to a distributed file system accessed by traditional OS disk access mechanisms. EBS allows you to create storage volumes from 1 GB to 1 TB that can be mounted by EC2 instances.
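The persistence behavior described here can be sketched with a toy volume object: data written to a volume outlives the instance it was attached to and is visible again after re-attachment. This is not the EBS API. The class, method names, and instance identifiers are hypothetical, and the upper size bound assumes 1 TB = 1024 GB.

```python
from typing import Dict, Optional

MIN_GB = 1     # lower bound stated in the text
MAX_GB = 1024  # upper bound stated in the text, assuming 1 TB = 1024 GB


class Volume:
    """Toy block volume; not the EBS API."""

    def __init__(self, size_gb: int) -> None:
        if not MIN_GB <= size_gb <= MAX_GB:
            raise ValueError(f"size must be between {MIN_GB} and {MAX_GB} GB")
        self.size_gb = size_gb
        self.blocks: Dict[int, bytes] = {}
        self.attached_to: Optional[str] = None

    def attach(self, instance_id: str) -> None:
        if self.attached_to is not None:
            raise RuntimeError("volume is already attached")
        self.attached_to = instance_id

    def detach(self) -> None:
        # The data in self.blocks is kept; only the attachment goes away.
        self.attached_to = None


vol = Volume(size_gb=10)
vol.attach("instance-a")     # hypothetical instance identifier
vol.blocks[0] = b"persistent data"
vol.detach()                 # e.g. instance-a was shut down
vol.attach("instance-b")     # a new instance sees the same data
```

The design point is that the volume, not the instance, owns the data, which is why shutting down an instance does not discard it.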
c) Microsoft Azure Programming Support
We now describe the Azure programming model in more detail, covering key programming components including the client development environment, SQLAzure, and the rich storage and programming subsystems. First, the underlying Azure fabric consists of virtualized hardware together with a sophisticated control environment that implements dynamic assignment of resources and fault tolerance. The fabric also implements domain name system (DNS) and monitoring capabilities. Automated service management allows service models to be defined by an XML template and multiple service copies to be instantiated on request.
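The idea of defining a service model in an XML template and instantiating several copies from it can be illustrated as follows. The template below is invented for this sketch: its element and attribute names are assumptions and do not follow the real Azure service definition schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical template; element and attribute names are invented for this
# sketch and do not follow the real Azure service definition schema.
TEMPLATE = """
<service name="orders">
  <role name="web" instances="3"/>
  <role name="worker" instances="2"/>
</service>
"""


def instantiate(template_xml: str) -> List[str]:
    """Return one instance name per requested copy of each role."""
    root = ET.fromstring(template_xml)
    service = root.get("name")
    instances: List[str] = []
    for role in root.findall("role"):
        count = int(role.get("instances", "1"))
        for index in range(count):
            instances.append(f"{service}.{role.get('name')}.{index}")
    return instances


copies = instantiate(TEMPLATE)
```

Changing the `instances` attribute in the template is enough to change how many copies are produced, which is the essence of template-driven service management.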
Emerging Cloud Software Environments
We assess popular cloud operating systems and emerging software environments. We cover the open source Eucalyptus and Nimbus, then examine OpenNebula, Sector/Sphere, and OpenStack. We also cover the Aneka cloud programming tools recently developed at the University of Melbourne.
a) Open Source Eucalyptus and Nimbus
Eucalyptus
Eucalyptus is a product from Eucalyptus Systems (www.eucalyptus.com) that was developed out of a research project at the University of California, Santa Barbara. Eucalyptus was initially aimed at bringing the cloud computing paradigm to academic supercomputers and clusters. Eucalyptus provides an AWS-compliant EC2-based web service interface for interacting with the cloud service. Additionally, Eucalyptus provides services, such as the AWS-compliant Walrus, and a user interface for managing users and images.
● Eucalyptus Architecture
The Eucalyptus system is an open software environment whose architecture was presented in a Eucalyptus white paper. The system supports cloud programmers in VM image management, as described next. Essentially, the system has been extended to support the development of both the compute cloud and the storage cloud.
● VM Image Management
Eucalyptus takes many design cues from Amazon’s EC2, and its image management system is no different. Eucalyptus stores images in Walrus, the storage system that is analogous to the Amazon S3 service. As such, any user can bundle her own root file system, upload it, and then register the image and link it with a particular kernel and ramdisk image. The image is uploaded into a user-defined bucket within Walrus and can be retrieved at any time from any availability zone.
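The bundle, upload, and register sequence can be sketched with a toy in-memory store. This is only an illustration of the workflow in the paragraph above: the class, method names, and identifiers are hypothetical and are not the Walrus or Eucalyptus API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RegisteredImage:
    bucket: str
    key: str
    kernel_id: str
    ramdisk_id: str


class ImageStore:
    """Toy bucket store plus image registry; not the Walrus API."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = {}
        self.registry: Dict[str, RegisteredImage] = {}

    def upload(self, bucket: str, key: str, bundle: bytes) -> None:
        """Put a bundled root file system into a user-defined bucket."""
        self.buckets.setdefault(bucket, {})[key] = bundle

    def register(self, image_id: str, bucket: str, key: str,
                 kernel_id: str, ramdisk_id: str) -> RegisteredImage:
        """Link an uploaded bundle with a kernel and a ramdisk image."""
        if key not in self.buckets.get(bucket, {}):
            raise KeyError("upload the bundle before registering it")
        image = RegisteredImage(bucket, key, kernel_id, ramdisk_id)
        self.registry[image_id] = image
        return image

    def fetch(self, image_id: str) -> bytes:
        """Retrieve the bundle that belongs to a registered image."""
        image = self.registry[image_id]
        return self.buckets[image.bucket][image.key]


store = ImageStore()
store.upload("my-images", "rootfs.img", b"root-fs-bytes")
store.register("img-1", "my-images", "rootfs.img", "kernel-1", "ramdisk-1")
```

Registration is kept separate from upload so that the same stored bundle could, in principle, be linked with different kernel and ramdisk combinations.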
Nimbus
Nimbus is a set of open source tools that together provide an IaaS cloud computing solution. Nimbus allows a client to lease remote resources by deploying VMs on those resources and configuring them to represent the desired environment.
b) OpenNebula, Sector/Sphere, and OpenStack
● OpenNebula
OpenNebula is an open source toolkit that allows users to transform existing infrastructure into an IaaS cloud with cloud-like interfaces. The architecture of OpenNebula is designed to be flexible and modular, so that it can integrate with different storage and network infrastructure configurations and with different hypervisor technologies.
● Sector/Sphere
Sector/Sphere is a software platform that supports very large distributed data storage and simplified distributed data processing over large clusters of commodity computers, either within a data center or across multiple data centers. The system consists of the Sector distributed file system and the Sphere parallel data processing framework. Sector is a distributed file system (DFS) that can be deployed over a wide area and allows users to manage large data sets from any location with a high-speed network connection. Fault tolerance is implemented by replicating data in the file system and managing the replicas.
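Fault tolerance through replication and replica management can be sketched with a toy tracker that keeps a target number of copies per file and repairs the count when a node fails. This is not Sector's actual implementation: the class, the node names, and the target replica count are assumptions made for illustration.

```python
from typing import Dict, Set


class ReplicaManager:
    """Toy replica tracker; not Sector's actual implementation."""

    def __init__(self, nodes: Set[str], copies: int) -> None:
        self.nodes = set(nodes)
        self.copies = copies  # assumed target number of replicas per file
        self.placement: Dict[str, Set[str]] = {}

    def store(self, filename: str) -> None:
        """Record a new file and place its replicas."""
        self.placement[filename] = set()
        self._repair(filename)

    def fail(self, node: str) -> None:
        """Drop a node, then restore the replica count of affected files."""
        self.nodes.discard(node)
        for filename, holders in self.placement.items():
            if node in holders:
                holders.discard(node)
                self._repair(filename)

    def _repair(self, filename: str) -> None:
        holders = self.placement[filename]
        # sorted() makes the choice of nodes deterministic for this sketch.
        for candidate in sorted(self.nodes - holders):
            if len(holders) >= self.copies:
                break
            holders.add(candidate)


manager = ReplicaManager({"n1", "n2", "n3", "n4"}, copies=2)
manager.store("data.bin")
manager.fail("n1")  # the lost replica is re-created on another node
```

As long as at least one holder of a file survives and a spare node exists, the tracker can bring the file back to its target replica count.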
● OpenStack
OpenStack was introduced by Rackspace and NASA in July 2010. The project is building an open source community spanning technologists, developers, researchers, and industry to share resources and technologies with the goal of creating a massively scalable and secure cloud infrastructure. In the tradition of other open source projects, the software is open source and limited to just open source APIs such as Amazon's.