Overview
The Elastic Processing Unit Cloud project implements elastic machine provisioning for cloud environments, based on the jclouds framework.
The general idea is that when the Elastic Processing Unit needs to scale out, it provisions more compute resources from the cloud, and when those resources are no longer needed, they are released. In addition to provisioning the compute instances, this project also sets up and launches GigaSpaces on the remote machine so that it immediately joins the GigaSpaces cluster without any manual steps. This is called Agentless Installation.
Elastic Processing Unit
(Excerpt from the GigaSpaces Wiki)
An Elastic Processing Unit (EPU) is a Processing Unit with additional capabilities that simplify its deployment across multiple machines. Containers and machine resources such as Memory and CPU are automatically provisioned based on Memory and CPU requirements. When a machine failure occurs, or when scale requirements change, new machines are provisioned and the Processing Unit deployment distribution is balanced automatically. The PU scale is triggered by modifying the requirements through an API call. From that point in time the EPU continuously maintains the specified capacity (indefinitely, or until the next scale trigger).
For more details about the Elastic Processing Unit see: Elastic Processing Unit at GigaSpaces
jclouds
(Excerpt from the jclouds site)
jclouds is an open source library that helps you get started in the cloud and reuse your java development skills. Our api allows you the freedom to use portable abstractions or cloud-specific features. We support many clouds including Amazon, GoGrid, Azure, vCloud, and Rackspace.
For more details about jclouds see: The jclouds Web Site
Agentless Installation
The Agentless Installation process copies (or downloads) GigaSpaces and its dependencies to the newly started compute node, performs any required processing, and starts the GigaSpaces agent. Once the agent is up, it will join an already running GigaSpaces cluster and so will be accessible using the GigaSpaces API and tools.
How Agentless Installation Works
The installation process is often a point of some confusion, so we will start by explaining the reasoning behind this design, and how it works.
The idea behind agentless installation is to create an installation process that works the same on all clouds. This requires relying on the lowest common denominator among clouds, rather than on services that may not exist on some cloud vendors. Services we avoid depending on include:
- VM Images - these do not exist on all clouds (bare metal clouds, for example) and are often difficult to maintain over time and across multiple versions
- Blob Storage - may not exist on all vendors.
To meet these requirements, the installation process requires only that the newly started machine run a Linux-style operating system and have SSH available. All installation and configuration steps are performed over SSH. Installation files can be pushed to the new node over the SSH connection or downloaded from a different location (like Amazon S3 or a software repository) by a start-up script.
We find this process is suited for a rapid development cycle - only a base Linux node is required, and all changes can be built on the development machine and pushed to the new node when it first starts up.
The installation process is as follows:
- Installer polls the target IP until the SSH port (22 by default) becomes available.
- Files are uploaded from a configurable local directory to a configurable remote directory. If a file already exists on the remote machine, it is copied only if the local file is newer. Note that the local directory may be empty, in which case no files are uploaded.
- If this machine is joining a running GigaSpaces cluster, the following command is executed on the remote machine, over SSH: chmod +x ~/gs-files/start-esm.sh;~/gs-files/start-esm.sh <LOCATOR_IP> agent <MACHINE_IP> &
  LOCATOR_IP is the private IP of the machine running the GigaSpaces Lookup Service (LUS) and MACHINE_IP is the private IP of the remote machine (useful in environments with multiple network cards).
- If this machine is the first node of a new GigaSpaces cluster, the following command is executed on the remote machine, over SSH: chmod +x ~/gs-files/start-esm.sh;~/gs-files/start-esm.sh <MACHINE_IP> lus <MACHINE_IP> &
- If the new machine is the first in the cluster, the installer waits until the LUS port (4166 by default) on the new machine is available. If the new machine is joining an existing cluster, the installer waits until the new machine joins the cluster (using the Admin API).
Note that every step has a timeout. If a timeout is exceeded, the installation is considered to have failed.
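The port-polling steps above (waiting for SSH on port 22, or for the LUS port) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual installer code: the class name PortWaiter, the method name waitForPort, and the 2-second cap per connection attempt are all hypothetical choices made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the project: waits for a TCP port to open.
final class PortWaiter {

    /**
     * Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
     * Returns true if the port became reachable in time, false on timeout.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long remaining;
        while ((remaining = deadline - System.currentTimeMillis()) > 0) {
            try (Socket socket = new Socket()) {
                // Cap each attempt at the remaining time (a timeout of 0 would mean "wait forever").
                int attemptMillis = (int) Math.max(1, Math.min(remaining, 2000));
                socket.connect(new InetSocketAddress(host, port), attemptMillis);
                return true; // something is listening on the port
            } catch (IOException e) {
                // Not reachable yet (refused or timed out); pause before retrying.
                Thread.sleep(Math.min(pollMillis, remaining));
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Example: wait a few seconds for SSH on the given host (defaults to localhost).
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        boolean up = waitForPort(host, 22, 3_000, 500);
        System.out.println(up ? "SSH port is open" : "Timed out waiting for SSH port");
    }
}
```

Returning false on timeout, rather than throwing, leaves it to the caller to decide how to report the failed installation and whether to release the machine.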
Design
This section describes the technical design of the main components of the project.
Provisioning Handler
The main implementation class is org.openspaces.cloud.esm.CloudMachineProvisioning. It implements the org.openspaces.grid.gsm.machines.plugins.ElasticMachineProvisioning interface, described here: ElasticMachineProvisioning Javadoc. The important methods are startMachine and stopMachine - the first launches and installs a new cloud compute node, the second releases it.
Note that all cloud API calls are actually implemented by org.openspaces.cloud.jclouds.JCloudsDeployer - this is a simple facade on top of the jclouds context that only exposes the jclouds functionality required by the Provisioning Handler.
Agentless Installer
The agentless installer relies on a few key components:
- commons-vfs - used for file transfer over SSH
- Ant - used for remote shell command execution, using Ant's SSH task
- OpenSpaces Admin API
Note that there is nothing 'cloud' about this process - it can be reused in any system, but is especially useful in this environment.
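The "copy only if the local file is newer" rule used during file transfer can be illustrated with plain java.nio, using two local directories as a stand-in for the local and remote sides. This sketch does not use commons-vfs and is not the project's code. The class name NewerFileSync and the method name syncNewer are hypothetical, and the sketch assumes a flat directory (it does not recurse).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: local directories stand in for the remote machine.
final class NewerFileSync {

    /**
     * Copies each regular file from sourceDir to targetDir when the target
     * is missing or older than the source. Returns the names of copied files.
     */
    static List<String> syncNewer(Path sourceDir, Path targetDir) throws IOException {
        List<String> copied = new ArrayList<>();
        Files.createDirectories(targetDir);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path source : entries) {
                if (!Files.isRegularFile(source)) {
                    continue; // this sketch does not recurse into subdirectories
                }
                Path target = targetDir.resolve(source.getFileName().toString());
                // Compare at millisecond granularity, since some platforms
                // truncate finer-grained timestamps when copying attributes.
                boolean sourceIsNewer = !Files.exists(target)
                        || Files.getLastModifiedTime(source).toMillis()
                                > Files.getLastModifiedTime(target).toMillis();
                if (sourceIsNewer) {
                    // COPY_ATTRIBUTES carries the timestamp over, so an
                    // unchanged file is not copied again on the next pass.
                    Files.copy(source, target,
                            StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                    copied.add(source.getFileName().toString());
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path local = Files.createTempDirectory("local");
        Path remote = Files.createTempDirectory("remote");
        Files.write(local.resolve("start-esm.sh"), "#!/bin/bash\n".getBytes());
        System.out.println("first pass:  " + syncNewer(local, remote));
        System.out.println("second pass: " + syncNewer(local, remote));
    }
}
```

In the demo, the second pass should find nothing to copy, which is the behavior that keeps repeated installations against an already prepared machine cheap.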
Startup Script
The startup script (start-esm.sh) is the script that actually sets up GigaSpaces on a machine. The script must be present on the remote machine, either uploaded during the installation process or already in place. It is often adapted to the requirements of each environment, as clouds and operating systems differ widely in the functionality they offer.
Here is a sample of how the startup script looks on one cloud vendor (this sample targets Rackspace; an updated version for AWS is still pending):
#! /bin/bash
#############################################################################
# This script starts a GigaSpaces agent for use with the GigaSpaces
# Elastic Scaling Module. The node will either run a LUS, ESM and GSM
# if this is the first node, or no services other than the GSA if it is not.
#
# Parameters:
# $1 - IP of the head node that runs a LUS and ESM. May be this node's IP.
# $2 - 'agent' if this node should join an already running node. Otherwise,
# any value.
# $3 - The IP of this server (useful if multiple NICs exist)
#############################################################################
export EXT_JAVA_OPTIONS="-Xmx1024m -Xms1024m"
# Some distros do not come with unzip built-in
if [ ! -f "/usr/bin/unzip" ]; then
  chmod +x /opt/gs-files/unzip
  chmod +x /opt/gs-files/unzipsfx
  cp /opt/gs-files/unzip /usr/bin
  cp /opt/gs-files/unzipsfx /usr/bin
fi
if [ ! -d "/opt/java" -o /opt/gs-files/java.zip -nt /opt/java ]; then
  rm -rf /opt/java
  mkdir /opt/java
  unzip -q /opt/gs-files/java.zip -d /opt/java
  chmod -R 777 /opt/java
fi
if [ ! -d "/opt/gigaspaces" -o /opt/gs-files/gigaspaces.zip -nt /opt/gigaspaces ]; then
  rm -rf /opt/gigaspaces
  mkdir /opt/gigaspaces
  unzip -q /opt/gs-files/gigaspaces.zip -d /opt/gigaspaces
  chmod -R 777 /opt/gigaspaces
  mv /opt/gigaspaces/*/* /opt/gigaspaces
fi
if [ ! -f /opt/gigaspaces/lib/platform/esm/rackspace_esm.jar -o /opt/gs-files/esm.zip -nt /opt/gigaspaces/lib/platform/esm/rackspace_esm.jar ]; then
  rm -f /opt/gigaspaces/lib/platform/esm/*
  unzip -q /opt/gs-files/esm.zip -d /opt/gigaspaces/lib/platform/esm
  chmod -R 777 /opt/gigaspaces/lib/platform/esm
fi
# UPDATE SETENV SCRIPT...
echo Updating environment script
cd /opt/gigaspaces/bin
sed -i '1i export JAVA_HOME=/opt/java' setenv.sh
sed -i "1i export NIC_ADDR=$3" setenv.sh
sed -i "1i export LOOKUPLOCATORS=$1" setenv.sh
# DISABLE LINUX FIREWALL
service iptables save
service iptables stop
chkconfig iptables off
# Restart network cards - This is a recurring problem on Rackspace
# service network restart
# START THE LOOKUP SERVICE AND ESM
if [ "$2" = "agent" ]; then
  nohup ./gs-agent.sh gsa.lus=0 gsa.global.lus=0 gsa.gsm=0 gsa.global.gsm=0 gsa.esm=0 gsa.gsc=0 > /dev/null &
else
  nohup ./gs-agent.sh gsa.lus=1 gsa.global.lus=0 gsa.gsm=1 gsa.global.gsm=0 gsa.esm=1 gsa.gsc=0 > /dev/null &
fi
exit 0
Note how the script expects to find the files it depends on in the gs-files directory, assuming they were previously uploaded there. In other cases, the script could download these files from S3, or set up the system using a package manager (like apt, yum, and others).
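To show how the three script parameters line up with the commands the installer runs over SSH, here is a small sketch that assembles the command line for both cases (joining an existing cluster, or starting the first node). The class name StartCommand, the method name build, and the IP addresses are hypothetical, and the real installer may build its command differently.

```java
// Hypothetical helper: assembles the remote command line that invokes start-esm.sh.
final class StartCommand {

    /**
     * @param scriptPath path of start-esm.sh on the remote machine
     * @param locatorIp  private IP of the machine running the LUS (ignored for the first node)
     * @param machineIp  private IP of the machine being installed
     * @param firstNode  true if this machine starts a new cluster
     */
    static String build(String scriptPath, String locatorIp, String machineIp, boolean firstNode) {
        // The first node hosts the LUS itself, so it acts as its own locator.
        String headIp = firstNode ? machineIp : locatorIp;
        String mode = firstNode ? "lus" : "agent";
        return "chmod +x " + scriptPath + ";" + scriptPath + " "
                + headIp + " " + mode + " " + machineIp + " &";
    }

    public static void main(String[] args) {
        // Illustrative addresses only.
        System.out.println(build("~/gs-files/start-esm.sh", "10.0.0.1", "10.0.0.2", false));
        System.out.println(build("~/gs-files/start-esm.sh", null, "10.0.0.1", true));
    }
}
```

The script itself only checks whether the second argument equals 'agent', so any other value (here 'lus') selects the first-node branch.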
Configuration
Configuration File
The provisioning handler is configured via a Java properties file, available in the root of the project. The properties are explained inline:
# Properties file for Elastic PU on Amazon EC2
# Username for the cloud provider
user=CLOUD_USERNAME
# Api key for the cloud provider, may be a password or a token
apiKey=CLOUD_PASSWORD
# Provider name, as defined by jclouds
provider=ec2
# The local directory where files will be copied from
localDir=C:/docBaseAws
# The remote directory where files will be copied to
remoteDir=/home/ec2-user/gs-files
# The ID of the image to use when scaling out
imageId=us-east-1/ami-76f0061f
# Optional. Minimum size of RAM to request
minRamMegabytes=
# The Hardware ID to use when scaling out
hardwareId=m1.small
# ec2 only. The ec2 security group
securityGroup=default
# ec2 only. Key pair to use when scaling out
keyPair=cloud-demo
# Optional. Path to the key file to use when establishing an SSH connection.
keyFile=cloud-demo.pem
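A file in this format can be read with the standard java.util.Properties class. The sketch below loads a few of the keys shown above and treats a blank minRamMegabytes as unset. The class name CloudConfig and the choice of which keys are mandatory are assumptions made for the example. The project's own loading code may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for a few of the settings above; not the project's own class.
final class CloudConfig {
    final String provider;
    final String imageId;
    final String remoteDir;
    final Integer minRamMegabytes; // null when the optional property is blank or absent

    private CloudConfig(String provider, String imageId, String remoteDir, Integer minRam) {
        this.provider = provider;
        this.imageId = imageId;
        this.remoteDir = remoteDir;
        this.minRamMegabytes = minRam;
    }

    static CloudConfig load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        String minRam = props.getProperty("minRamMegabytes", "").trim();
        return new CloudConfig(
                required(props, "provider"),
                required(props, "imageId"),
                required(props, "remoteDir"),
                minRam.isEmpty() ? null : Integer.valueOf(minRam));
    }

    // Assumption: these keys are treated as mandatory in this sketch.
    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        String sample = "provider=ec2\n"
                + "imageId=us-east-1/ami-76f0061f\n"
                + "remoteDir=/home/ec2-user/gs-files\n"
                + "minRamMegabytes=\n";
        CloudConfig config = load(new StringReader(sample));
        System.out.println(config.provider + " " + config.imageId
                + " minRam=" + config.minRamMegabytes);
    }
}
```

Failing fast on a missing key surfaces configuration mistakes before any cloud resources are requested.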
Network Configuration
Nodes of a GigaSpaces cluster are expected to communicate directly with each other over multiple ports. This may require some network configuration, like setting up a security group on Amazon EC2.
In addition, clients of a GigaSpaces cluster also need to connect directly to instances of a space cluster. If the client is running outside of the cloud, this may require additional network configuration. During development, it is recommended to run the client on a separate machine on the same cloud.
Examples
The project includes an example of deploying an elastic data grid on a cloud. The example is available in the class: org.openspaces.cloud.examples.ElasticProcessingUnitCloudDemo and can be executed from the command line or an IDE.
Video
Below is a recording of this handler in action.
TODO
Future Directions
The following features are planned for a future release of this project:
Windows Support
Support Windows as well as Linux for compute node instances. This will require supporting Windows file sharing (SMB) and Windows-based remote execution (possibly secure telnet).
Highly Available LUS
In any production setting, you should always have two instances of the GigaSpaces services, including the Lookup Service (LUS) and GSM. The current example deploys only one instance, for demonstration purposes. An update should allow a user to specify how many 'head' nodes are deployed initially.