Introduction
With the emergence of grid economy in the world, there is a growing requirement to enhance current billing models, and to add more models for measuring grid resources utilization upon which customers are charged.
Most existing grid middleware implementations provide basic pricing components and very basic billing capabilities. Furthermore, they do not provide standard billing APIs (Grid Detailed utilization Record - GDR) that Telco and utility billing-system providers could adopt as a standard. Telco and utility billing stacks are built around the requirement to monitor service usage and to store accounting information about the services accessed and the resulting utilization of computing resources. This information enables pricing for service access, as well as charging and billing the users that access these services. These requirements are very similar to those of prospective grid billing systems.
Obtaining well-defined, standard billing APIs (GDR) is key to encouraging Telco and utility billing-component suppliers to expand their current offerings and support the emerging grid economy.
In this document, we will define an API and Grid Detailed utilization Record (GDR) that can be used as a standard for billing systems and as a base for grid monitoring. We will define several levels of utilization measuring, beginning from overall machine utilization, through process-level utilization, to a single message / transaction utilization.
In addition, we will develop an implementation that demonstrates the API usage.
Purpose
In a distributed environment like a computing Grid, the ability to find out which resources are available is crucial. This information might cover which computers are available to run jobs at various sites, including their current load and the software installed on them. Information might also be needed about mass data-storage facilities, including their current status, maximum size, and number of stored files. It is also crucial to monitor the progress of jobs, especially since the user usually doesn't know where in the Grid environment a job is going to be executed.
Overview
This document defines the Grid Detailed utilization Record (GDR), a new standard for grid monitoring and billing, including APIs for major languages and platforms.
Implementing grid monitoring on top of GigaSpaces middleware allows you to:
- Collect and record hardware utilization parameters (memory, CPU)
- Collect and record application utilization parameters (read, write, transactions, messages/size, throughput)
- Obtain real-time monitoring capabilities over the GigaSpaces ultra-fast middleware
- Support database persistency
- Provide monitoring and accounting for SLA-based services
The GigaSpaces-Ganglia integration provides a scalable distributed monitoring system for high-performance computing systems like clusters and grids. It is based on a hierarchical design, targeted at federations of clusters.
Software Dependency
Platform Dependency
The current GDR implementation supports any Unix platform.
Installation
- Install JDK 1.5+ (the API shown in this document uses generics and varargs)
- Install Ganglia on every host
Architecture
The GDR monitoring API provides the following levels of runtime information - HostRuntime, ProcessRuntime and a transaction monitoring API.
HostRuntime - contains host resource utilization (totals per host: CPU, RAM, network, etc.).
Embedded processing running within the GigaSpaces XAP server collects the host utilization from each host participating in the Grid, and caches it in the GigaSpaces server as a POJO named IHostRuntime.

ProcessRuntime - an agent running on a desired machine collects utilization information (CPU, RAM) for all processes running on this host, and caches it in the GigaSpaces server as a POJO named IProcess.
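The text above does not show how host metrics are read before they are cached. As a rough sketch only, assuming the monitor consumes the XML report served by a Ganglia gmond agent (HOST elements containing METRIC elements with NAME, VAL and UNITS attributes), the parsing step could look like the following. The class name GangliaHostSnapshot and the sample XML are hypothetical and are not part of the GDR code base.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: turns one host's gmond-style XML into a metric map. */
class GangliaHostSnapshot {

    /** Reads the METRIC elements of the first HOST element into name -> "value units". */
    static Map<String, String> parseFirstHost(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element host = (Element) doc.getElementsByTagName("HOST").item(0);
            if (host == null) {
                throw new IllegalArgumentException("No HOST element found");
            }
            Map<String, String> metrics = new LinkedHashMap<>();
            metrics.put("hostname", host.getAttribute("NAME"));
            NodeList nodes = host.getElementsByTagName("METRIC");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element metric = (Element) nodes.item(i);
                String value = metric.getAttribute("VAL");
                String units = metric.getAttribute("UNITS");
                // Append the unit only when one is present.
                metrics.put(metric.getAttribute("NAME"),
                        units.isEmpty() ? value : value + " " + units);
            }
            return metrics;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed gmond layout, not captured output.
        String sample = "<GANGLIA_XML><CLUSTER NAME=\"lab\">"
                + "<HOST NAME=\"pc-lab9.gspaces.com\">"
                + "<METRIC NAME=\"cpu_idle\" VAL=\"27.8\" UNITS=\"%\"/>"
                + "<METRIC NAME=\"mem_free\" VAL=\"2624800\" UNITS=\"KB\"/>"
                + "</HOST></CLUSTER></GANGLIA_XML>";
        System.out.println(parseFirstHost(sample));
    }
}
```

A map like this could then be copied into whatever object implements IHostRuntime before it is written to the space; that last step depends on the GDR adapter and is not sketched here.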

Getting Started with Master AMI and Nodes
Running Master AMI
- Run: ec2-run-instances ami-b30feada -k your-keypair -t m1.large -d "master"
- Log on to the AMI and run the ". start.sh" script.
Running the Nodes
- Run: ec2-run-instances ami-b30feada -k your-keypair -n number-of-machines -t m1.large -d "master-machine-internal-host-name"
- Log on to each of the AMIs and run the ". start.sh" script.
- To generate transaction monitoring data, run: GridEcon/bin/startTxnTest.sh
- To get the report, run on the master AMI: GridEcon/bin/reporter.sh -all
Retrieving GDR Runtime Information
It is possible to retrieve all cached GDR runtime information in two ways: via CLI, and directly, using the reporter API.
CLI Reporter
Run: GridEconRootDir/bin/reporter.sh -all
The HostRuntime output is as follows:
*******************
* GDR REPORTER *
*******************
Total Host Runtime=28
Hostname: 192.168.8.40
Machine Type: x86_64
OS Name: Linux
OS Release: 2.6.9-34.ELsmp
CPU Total: 4
CPU Speed: 2613 MHz
CPU Idle: 31.4%
Memory Total: 4045848 KB
Memory Free: 19796 KB
Disk Total: 199.674 GB
Disk Free: 73.154 GB
Swap Total: 4194296 KB
Swap Free: 1749596 KB
Bytes In: 12229468.00 bytes/sec
Packets out: 5413.92 packets/sec
Hostname: pc-lab9.gspaces.com
Machine Type: x86_64
OS Name: Linux
OS Release: 2.4.21-15.EL
CPU Total: 4
CPU Speed: 3600 MHz
CPU Idle: 27.8%
Memory Total: 4058164 KB
Memory Free: 2624800 KB
Disk Total: 44.470 GB
Disk Free: 38.512 GB
Swap Total: 2096472 KB
Swap Free: 2096472 KB
Bytes In: 2751757.00 bytes/sec
Packets out: 6480.83 packets/sec
Process Runtime Output
Total Running Processes=3
Hostname : pc-lab32
User : igor
ProcessID : 5839
Command : java
CPU : 8.8%
Memory : 629312 KB
Virtual Memory: 711424 bytes
Avarage :
CPU : 28.68%
Memory : 629184 KB
Hostname : pc-lab32
User : igor
ProcessID : 5507
Command : java
CPU : 15.4%
Memory : 667648 KB
Virtual Memory: 749760 bytes
Avarage :
CPU : 18.31%
Memory : 667034 KB
Hostname : pc-lab32
User : igor
ProcessID : 6116
Command : java
CPU : 242%
Memory : 626432 KB
Virtual Memory: 708544 bytes
Avarage :
CPU : 242%
Memory : 626432 KB
Transaction Runtime Output
Total User Transactions=4
GDRTransaction:
UserId: Robert
ID: Process Robert Txn
Total: 1
GDRTransaction:
UserId: Robert
ID: Starting Robert Txn
Total: 1
GDRTransaction:
UserId: Robert
ID: End Robert Txn
Total: 1
GDRTransaction:
UserId: Eugene
ID: Start Eugene Txn
Total: 1
GDR Data Model
IHost interface
public interface IHost
{
public String getHostname();
public String getHostIP();
public String getCPUNum();
public String getCPUSpeed();
public String getOSName();
public String getOSRelease();
public String getMachineType();
public List<IProcess> getProcesses();
}
IHostRuntime interface description table
package org.gridecon.gdr.model;
public interface IHostRuntime extends IHost
| Method name | Type | Description | Data example |
| getHostname() | String | Host machine name | pc-rose |
| getHostIP() | String | Host IP | 192.168.8.40 |
| getCPUNum() | String | Number of CPUs | 4 |
| getCPUSpeed() | String | CPU speed | 2613 MHz |
| getOSName() | String | OS name (Linux, Windows, Mac) | Linux |
| getOSRelease() | String | OS release | 2.6.9-34.ELsmp |
| getMachineType() | String | Machine type | x86_64 |
| getBytesIn() | String | Total input bytes/sec | 12229468.00 bytes/sec |
| getPacketsOut() | String | Total output packets/sec | 5413.92 packets/sec |
| getCPUIdle() | String | Total idle CPU | 31.40% |
| getDiskFree() | String | Total free disk in GB | 73.154 GB |
| getDiskTotal() | String | Total disk capacity in GB | 199.674 GB |
| getMemFree() | String | Total free memory in KB | 19796 KB |
| getMemTotal() | String | Total memory in KB | 4045848 KB |
| getSwapFree() | String | Total free swap in KB | 1749596 KB |
| getSwapTotal() | String | Total swap in KB | 4194296 KB |
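Because every getter in the table returns a String that carries its unit, a consumer such as a billing component has to parse the values before doing arithmetic. The following is a minimal sketch of that step, assuming the value formats shown in the data examples; the class GdrValues and its method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper for turning the String-typed GDR values into numbers. */
class GdrValues {

    /** Extracts the leading number from values such as "4045848 KB" or "31.40%". */
    static double leadingNumber(String value) {
        String s = value.trim();
        int end = 0;
        while (end < s.length()
                && (Character.isDigit(s.charAt(end)) || s.charAt(end) == '.')) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No leading number in: " + value);
        }
        return Double.parseDouble(s.substring(0, end));
    }

    /** Fraction in use, given a total and a free value expressed in the same unit. */
    static double usedFraction(String total, String free) {
        double t = leadingNumber(total);
        if (t <= 0) {
            throw new IllegalArgumentException("Total must be positive: " + total);
        }
        return (t - leadingNumber(free)) / t;
    }

    public static void main(String[] args) {
        // Values taken from the getMemTotal() and getMemFree() data examples.
        double used = usedFraction("4045848 KB", "19796 KB");
        System.out.println(String.format(Locale.ROOT, "Memory used: %.1f%%", used * 100));
    }
}
```

The same two methods apply to the disk and swap pairs, since each pair shares a unit.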
IProcess interface
package org.gridecon.gdr.model;
public interface IProcess
{
public String getUser();
public String getProcessID();
public String getCommand();
public String getMemory();
public String getCPU();
public String getAvrCPU();
public String getAvrMemory();
}
IProcess interface description table:
| Method name | Type | Description | Data example |
| getUser() | String | User name of the running process | Alex |
| getProcessID() | String | The process ID | 5507 |
| getCommand() | String | The command name | java |
| getMemory() | String | Consumed memory | 667648 KB |
| getCPU() | String | Consumed CPU | 15.40% |
| getAvrCPU() | String | Average CPU so far | 18.31% |
| getAvrMemory() | String | Average memory so far | 667034 KB |
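The table describes getAvrCPU() and getAvrMemory() as averages "so far" but does not say how they are maintained. One common way, shown here purely as an assumption about how an agent could do it, is an incremental mean that is updated with each new sample and never stores the sample history. The class name RunningAverage is hypothetical.

```java
/** Hypothetical accumulator for an "average so far" value such as getAvrCPU(). */
class RunningAverage {
    private long samples;
    private double mean;

    /** Adds one sample and returns the updated mean (incremental formula). */
    double add(double sample) {
        samples++;
        mean += (sample - mean) / samples;
        return mean;
    }

    /** Mean of all samples added so far, or 0.0 if none were added. */
    double mean() {
        return mean;
    }

    /** Number of samples added so far. */
    long count() {
        return samples;
    }

    public static void main(String[] args) {
        RunningAverage cpu = new RunningAverage();
        cpu.add(10.0);
        cpu.add(20.0);
        cpu.add(30.0);
        System.out.println(cpu.mean() + " over " + cpu.count() + " samples");
    }
}
```

With a single sample the mean equals the current value, which is consistent with a process whose current and average figures are identical in the reporter output.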
Transaction interface
public interface Transaction
{
/** @return transaction id */
public String getId();
/** @return the username this transaction belongs to */
public String getUserId();
/** @return total number of committed transactions with the same id */
public AtomicInteger getTotal();
/** commit transaction */
public void commit();
/** rollback transaction */
public void rollback();
/** create new instance of nested transaction */
public Transaction newTxn( String id );
}
Transaction interface description table:
| Method name | Type | Description | Data example |
| getId() | String | Transaction ID | "Start connection" |
| getUserId() | String | The user name this transaction was created by | Alex |
| getTotal() | AtomicInteger | Total committed transactions with the same ID | 15 |
GDR Reporter API with example:
XAPGDRConfig config = new XAPGDRConfig( XAPGDRAdapter.class.getName() );
/* set the space URL (ex: an rmi: URL) */
config.setSpaceURL( spaceURL );
/* get GDR Runtime */
GDRruntime runtime = GDRruntime.getRuntime( config );
IGDRAdapter adapter = runtime.getGDRAdapter();
/* get GDRReporter from adapter */
IGDRReporter reporter = adapter.getReporter();
/* get and print all transactions */
List<Transaction> allTxn = reporter.getTransactions();
printTxn( allTxn );
/* print all transactions belonging to user Robert */
List<Transaction> robTxn = reporter.getTransactions("Robert");
printTxn( robTxn );
/* print all transactions belonging to users Robert and Eugene */
List<Transaction> robEugTxn = reporter.getTransactions("Robert", "Eugene");
printTxn( robEugTxn );
/* get all host runtimes */
List<IHostRuntime> hostRuntime = reporter.getHostRuntimes();
/* get all process runtimes */
List<IProcess> processRuntime = reporter.getProcesses();
/* helper that prints each transaction on its own line */
public static void printTxn( List<Transaction> txnList )
{
for( Transaction txn : txnList )
System.out.println( txn );
}
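For billing, the list returned by reporter.getTransactions() usually has to be reduced to a figure per user. The sketch below sums getTotal() per getUserId(). So that it runs on its own, it uses a local stand-in interface, CommittedTxn, that mirrors only those two getters of Transaction; the class TxnUsageSummary and the factory method txn are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-user summary of committed transaction totals. */
class TxnUsageSummary {

    /** Local stand-in for the two Transaction getters used here. */
    interface CommittedTxn {
        String getUserId();
        AtomicInteger getTotal();
    }

    /** Sums getTotal() per user, keeping users in first-seen order. */
    static Map<String, Integer> totalsPerUser(List<? extends CommittedTxn> txns) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (CommittedTxn txn : txns) {
            totals.merge(txn.getUserId(), txn.getTotal().get(), Integer::sum);
        }
        return totals;
    }

    /** Builds a fixed-value stand-in transaction for demos and tests. */
    static CommittedTxn txn(final String user, final int total) {
        return new CommittedTxn() {
            public String getUserId() { return user; }
            public AtomicInteger getTotal() { return new AtomicInteger(total); }
        };
    }

    public static void main(String[] args) {
        // Mirrors the four entries of the Transaction Runtime Output sample.
        List<CommittedTxn> sample = new ArrayList<>();
        sample.add(txn("Robert", 1));
        sample.add(txn("Robert", 1));
        sample.add(txn("Robert", 1));
        sample.add(txn("Eugene", 1));
        System.out.println(totalsPerUser(sample)); // {Robert=3, Eugene=1}
    }
}
```

Against the real API, the same loop would iterate over List<Transaction> directly; multiplying each user's total by a price per transaction is then a one-line step.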
Interfaces and POJO Description
Report Interface:
public interface IGDRReporter
{
/** generate report with different generic keys like -all -pr -txn */
public void generate( String...type );
/** @return list of all running processes */
public List<IProcess> getProcesses();
/** @return list of all available HostRuntime */
public List<IHostRuntime> getHostRuntimes();
/**
* Returns all committed transactions belonging to the supplied users.
*
* @param userId list of users.
* @return list of all committed transactions belonging to the supplied users
*/
public List<Transaction> getTransactions( String...userId );
}
Transaction API
public interface Transaction
{
/** @return transaction id */
public String getId();
/** @return the username this transaction belongs to */
public String getUserId();
/** @return total committed transaction with the same id */
public AtomicInteger getTotal();
/** commit transaction */
public void commit();
/** rollback transaction */
public void rollback();
/** create new instance of nested transaction */
public Transaction newTxn( String id );
}
Transaction Example:
/* get transaction manager for Robert user */
TransactionManager txnMng = adapter.getTxnManager("Robert");
Transaction txn1 = txnMng.newTxn("Starting Robert Txn");
Transaction txn2 = txnMng.newTxn("Process Robert Txn");
/* open new nested transaction from txn2 */
Transaction txn3 = txn2.newTxn("End Robert Txn");
txn1.commit();
/* commit txn2 together with its nested transaction txn3 */
txn2.commit();
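The example relies on two behaviours: committing a transaction also commits the nested transactions opened from it, and each commit raises the total kept for that transaction ID. The in-memory model below illustrates those semantics under that assumption. It is not the GDR implementation, and the class TxnTally and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory model of nested commit counting. */
class TxnTally {
    /** Committed totals keyed by "user/id". */
    private final Map<String, AtomicInteger> totals = new LinkedHashMap<>();

    class Txn {
        private final String userId;
        private final String id;
        private final List<Txn> nested = new ArrayList<>();
        private boolean open = true;

        Txn(String userId, String id) {
            this.userId = userId;
            this.id = id;
        }

        /** Opens a nested transaction that is completed together with this one. */
        Txn newTxn(String nestedId) {
            Txn child = new Txn(userId, nestedId);
            nested.add(child);
            return child;
        }

        /** Counts this transaction and, by assumption, its still-open nested ones. */
        void commit() {
            if (!open) {
                return;
            }
            open = false;
            totals.computeIfAbsent(userId + "/" + id, k -> new AtomicInteger())
                  .incrementAndGet();
            for (Txn child : nested) {
                child.commit();
            }
        }

        /** Closes this transaction and its nested ones without counting them. */
        void rollback() {
            if (!open) {
                return;
            }
            open = false;
            for (Txn child : nested) {
                child.rollback();
            }
        }
    }

    Txn newTxn(String userId, String id) {
        return new Txn(userId, id);
    }

    /** Number of commits recorded for the given user and transaction id. */
    int total(String userId, String id) {
        AtomicInteger n = totals.get(userId + "/" + id);
        return n == null ? 0 : n.get();
    }

    public static void main(String[] args) {
        TxnTally tally = new TxnTally();
        Txn txn1 = tally.newTxn("Robert", "Starting Robert Txn");
        Txn txn2 = tally.newTxn("Robert", "Process Robert Txn");
        txn2.newTxn("End Robert Txn");
        txn1.commit();
        txn2.commit();
        System.out.println(tally.total("Robert", "End Robert Txn"));
    }
}
```

Under this model the flow above leaves each of the three Robert transaction IDs with a total of 1, which matches the Transaction Runtime Output sample.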
Installation Instructions
Download and extract GDRInstall.zip
Setup Environment (Classpath, System Properties)
GridEcon/bin - contains 5 main GridEcon project scripts:
- set-ge-env.sh - contains environment variables
- proc-mon.sh - process monitor agent
- reporter.sh - GDR reporter
- startTxnTest.sh - transaction example
- start-ganglia-monitor.sh - standalone Ganglia monitoring application; takes a spaceURL argument and the Ganglia host:port
To start working with the GDR API, source the set-ge-env.sh script and launch your own main class through the $COMMAND_LINE variable ($COMMAND_LINE USER_MAIN_CLASS).
This is the easiest way to inherit the classpath, system properties, and other settings from the set-ge-env.sh script.
For example:
#!/bin/bash
. set-ge-env.sh
${COMMAND_LINE} org.gridecon.gdr.xap.XAPReporter ${SPACE_URL} "$@"
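A main class launched this way receives the space URL as its first argument and any remaining script arguments after it. The skeleton below is a minimal sketch of such a USER_MAIN_CLASS; the class name MyGdrTool is hypothetical, and the comment at the end only points to where the GDR calls from the reporter example would go.

```java
import java.util.Arrays;

/** Hypothetical main class to launch through ${COMMAND_LINE}. */
class MyGdrTool {

    /** First argument: the space URL passed by the script. */
    static String spaceUrl(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Missing space URL argument");
        }
        return args[0];
    }

    /** Remaining arguments, for example report keys such as -all. */
    static String[] options(String[] args) {
        return args.length <= 1
                ? new String[0]
                : Arrays.copyOfRange(args, 1, args.length);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: MyGdrTool <spaceURL> [options]");
            return;
        }
        System.out.println("Space URL: " + spaceUrl(args));
        System.out.println("Options:   " + Arrays.toString(options(args)));
        // From here a real tool would create the XAPGDRConfig, set the space URL
        // on it and obtain the reporter, as in the GDR Reporter API example.
    }
}
```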
Running in standalone mode without Amazon EC2
Run the following scripts in this order:
1. start-ganglia-monitor.sh ganglia-host:8846
2. proc-mon.sh
3. reporter.sh -all
NOTE: start-ganglia-monitor.sh takes the gmon host and port as its argument; you can pass the host of any gmon agent.