Dr. Hossein Eslambolchi
March 2012
Grid computing architecture is a collaborative, network-based model that enables the sharing of computing resources: data, applications, storage and computing cycles. Distributed computing serves as the core of grid computing. Using IP-based networks, a grid links hundreds or thousands of servers and desktop computers into a supercomputing engine capable of delivering massive amounts of computational power and other IT resources.
KEY POINTS
• Grid computing creates a distributed network of computers so that applications can share resources.
• In general, sharing can occur over diverse operating systems (e.g. Linux, Solaris, UNIX and Windows).
• The sharing involved in grid technologies is not primarily file exchange, but rather direct access to computers, storage, software, data and other resources.
• Some supporters – including HP, IBM, Sun Microsystems and Platform Computing – now say that grid computing is ready for primetime. But serious limits to the technology remain.
• Major hurdles for grid computing include nonfunctional system attributes such as security, reliability and execution latency. Likewise, grid computing requires a communications infrastructure that can support large-scale sharing of data.
• Technical and economic concerns, including end-to-end security and usage metering, need to be addressed before widespread adoption is possible.
• Traditional business software is generally poorly suited for grid computing. However, new versions are being developed to harness grid power. The licensing of grid application business software is still an open issue.
• Some industries have successfully adopted grid computing technology. The financial services industry uses it for derivatives analysis, statistical analysis and portfolio risk analysis. The insurance industry uses it for certain tasks. Life sciences firms use grid technology for cancer research, protein sequencing and protein folding.
THE TECHNOLOGY
The current grid computing model emphasizes the sharing of computational cycles, and is tailored to compute-intensive and parallelizable applications. As this model evolves, it will allow systems to share other resources, such as storage, data and software. This evolution will increase the requirement for high bandwidth communications across the grid – and increase its relevance as a possible source of revenue.
Grids create a distributed network of computers that share resources over a heterogeneous set of systems, pooling resources so that many computers can share work and conveniently access remote resources. There is a clear need for grid computing in drug design, geophysical prospecting, and mechanical engineering. Grid computing harnesses the idle time of hundreds or thousands of servers that can be rented out to anyone who needs massive processing power.
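To make the cycle-sharing model concrete, the sketch below uses a local process pool as a stand-in for grid workers: a parallelizable job is split into independent tasks, scattered across workers, and the results gathered. The work function and task list are hypothetical; a real grid would dispatch these tasks to remote machines through scheduling middleware.

```python
# Minimal scatter/gather sketch. A local process pool stands in for the
# grid's worker nodes; real middleware would dispatch to remote servers.
from concurrent.futures import ProcessPoolExecutor

def simulate(parameter: float) -> float:
    """Hypothetical compute-intensive, independent unit of work."""
    return sum(i * parameter for i in range(1_000_000))

if __name__ == "__main__":
    tasks = [0.1 * n for n in range(32)]           # independent work units
    with ProcessPoolExecutor() as pool:            # "grid" of local workers
        results = list(pool.map(simulate, tasks))  # scatter, then gather
    print(f"aggregated result: {sum(results):.3e}")
```

The pattern only pays off when tasks are independent and compute-heavy relative to the cost of shipping their inputs and outputs across the network, which is why compute-intensive, parallelizable applications dominate early grid workloads.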
In an enterprise, servers often sit idle, with only 10 percent to 20 percent of servers utilized. Personal computer resources are even less utilized – about 1 or 2 percent of these resources are used on average. This means that grid computing can leverage significant amounts of idle enterprise resources.
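Back-of-the-envelope arithmetic shows why this matters. The fleet sizes below are hypothetical; the utilization figures come from the paragraph above.

```python
# Idle capacity in a hypothetical enterprise, using the utilization
# figures cited above (servers ~10-20% busy, desktops ~1-2% busy).
servers, server_util = 1_000, 0.15     # midpoint of 10-20%
desktops, desktop_util = 5_000, 0.015  # midpoint of 1-2%

idle_servers = servers * (1 - server_util)
idle_desktops = desktops * (1 - desktop_util)
print(f"idle server capacity:  ~{idle_servers:,.0f} machine-equivalents")
print(f"idle desktop capacity: ~{idle_desktops:,.0f} machine-equivalents")
# A grid can put much of this otherwise wasted capacity to work.
```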
By pooling IT assets across servers, storage systems, computational resources, networks and data centers, grids may help reduce IT complexity. Resources can be quickly allocated, used as needed, and reallocated to address the changing infrastructure needs of an enterprise. As a consequence, fewer boxes and fewer management systems are required to meet overall computing needs. In this model, grid technologies serve as the infrastructure for utility or “on-demand” computing.
Advocates like HP, IBM, Sun Microsystems and Platform Computing believe that grid computing is ready for primetime, but there are still many limits to what current grid computing architectures can accomplish in a business environment. Grid computing is still in the early adoption phase, and pieces of the infrastructure are still being developed. It will take a few years before grids are deployed on a large scale.
Major issues related to grid computing include nonfunctional system attributes: security, reliability and execution latency all pose challenges.
The necessary communications infrastructure to support large-scale grid computing has not yet been developed. Security is a prime concern; it may be difficult to persuade corporate buyers to invest in a technology that seems to give outsiders access to their servers. Sophisticated resource management is also necessary in the architecture of the grid. Network limitations will not prevent users from accessing computational power, but they will constrain the development of complete resource-sharing models. Finally, broad adoption of grid computing depends on solving technical and economic concerns, including end-to-end security and usage metering.
Another challenge is posed by traditional business software, which is not tailored to support the grid model. It is crucial for grids to have interoperability standards that accommodate components from different vendors. XML is beginning to play an important role in solving this problem.
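As a rough illustration of the role XML plays, consider a vendor-neutral job descriptor. The <job> schema below is invented for this sketch (the Global Grid Forum standardized real formats along these lines); the point is that any vendor's scheduler can parse the same document.

```python
# Parsing a vendor-neutral XML job descriptor. The schema is invented for
# illustration; only the interoperability pattern matters.
import xml.etree.ElementTree as ET

JOB_XML = """
<job name="risk-analysis">
  <executable>/opt/analytics/var_model</executable>
  <resources cpus="64" memoryMB="2048"/>
  <input>portfolio.dat</input>
</job>
"""

job = ET.fromstring(JOB_XML)
res = job.find("resources")
print(f"job '{job.get('name')}' requests {res.get('cpus')} CPUs "
      f"and {res.get('memoryMB')} MB of memory per node")
```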
Real-world examples of grid deployments include:
• Hartford Life is using a grid network to handle intensive financial analytics, measuring market conditions and market behavior.
• Entelos, a biotechnology firm in California, uses a grid to speed the process of drug discovery, running simulations in a matter of hours or days.
• Wachovia’s fixed-income derivatives trading unit runs trading simulations on a grid network to reduce risks and enable the firm to make larger trades.
• Cadence Design Systems integrates grids into every aspect of its production environment. In particular, it incorporates grid technology into its software development and chip design processes.
• Johnson & Johnson uses grid technology to run powerful applications that model clinical trials of pharmaceuticals.
• Pharmaceutical giant Novartis has linked nearly 3,000 of its researchers’ personal computers in a grid that delivers more than 5 teraflops of computing power. This enables their researchers to examine bigger data sets with greater precision and to target new problems.
• Bank One is using grid middleware technology to distribute risk-analytics processing. It aims to cut hardware costs while increasing performance of analytics for its interest-rate derivatives trading business.
THE PLAYERS
Leading vendors in grid computing include:
• DataSynapse offers its GridServer software for commercial applications. GridServer creates a self-managed grid computing infrastructure.
• HP has endorsed grid technology in a big way. Its current support includes grid-enabled systems running HP-UX, Linux and Tru64 UNIX, using the Globus Toolkit for HP platforms.
• IBM offers grid products based on DataSynapse middleware for the banking and financial industries, including risk management and customer analytics products. The company has landed hundreds of millions of dollars in contracts to build grid infrastructures for universities and governments. IBM has relationships with leading grid tool and application providers.
• Oracle provides substantial grid computing technology. Oracle Database 10g leverages grid-enabling hardware innovations and automatically provisions clustered storage and servers to different databases running in a grid.
• Platform Computing has developed distributed computing software since 1992. Its commercial software, the LSF line, is running in about 1,500 of the Fortune 2000 companies.
Leading grid standardization bodies include:
• Globus Consortium – This non-profit organization was launched by IBM, HP, Intel and Sun Microsystems to champion open-source grid technologies in the enterprise. It aims to advance the use of the Globus Toolkit in the enterprise by drawing together the resources of interested parties throughout the community, including vendors, enterprise IT groups and open source developers.
• Globus Alliance – This international organization conducts research and development to create fundamental grid technologies. Its members contribute to the development of the Globus Toolkit.
• Global Grid Forum – An international organization working to address grid architecture, infrastructure, and standards issues. Community-initiated working groups carry out the work of GGF, developing best practices and specifications.
• Organization for the Advancement of Structured Information Standards (OASIS) – This international consortium drives the development, convergence and adoption of e-business standards. The web services standards developed by OASIS now form the foundation for the grid computing standards developed by the Global Grid Forum.
• Enterprise Grid Alliance – This non-profit group was founded in April 2004 to develop enterprise grid solutions and accelerate the adoption of grid computing in enterprises. It addresses the near-term requirements for deploying commercial applications in a grid environment. Initial focus areas include reference models, provisioning, security and accounting. The group unveiled its first reference model in May 2005.
• Open Grid Services Architecture (OGSA) is a product of the grid community at large, and is a major focal point of the Global Grid Forum. It represents an evolution toward a grid system architecture based on Web Services concepts and technologies. OGSA defines mechanisms to create, manage and exchange information between grid services, a special type of web service.
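As a loose analogy to the OGSA service model – not its actual interfaces – the sketch below shows a factory that creates stateful service instances with managed lifetimes. All class and attribute names are invented; real OGSA services are web services defined in WSDL.

```python
# Loose analogy to OGSA-style grid services: a factory creates stateful
# service instances with a managed (soft-state) lifetime. All names are
# invented for illustration.
import time
import uuid

class GridServiceInstance:
    def __init__(self, lifetime_s: float):
        self.handle = str(uuid.uuid4())           # unique service handle
        self.expires = time.time() + lifetime_s   # soft-state lifetime
        self.state = {}                           # data a client can query

    def alive(self) -> bool:
        return time.time() < self.expires

class GridServiceFactory:
    """Creates and tracks service instances, factory-style."""
    def __init__(self):
        self.instances = {}

    def create(self, lifetime_s: float = 60.0) -> str:
        inst = GridServiceInstance(lifetime_s)
        self.instances[inst.handle] = inst
        return inst.handle

factory = GridServiceFactory()
handle = factory.create(lifetime_s=30.0)
print(handle, factory.instances[handle].alive())  # alive until it expires
```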
Notable projects include:
• Condor – The goal of this project is to develop, implement, deploy and evaluate mechanisms and policies that support high throughput computing on large collections of distributively owned computing resources. Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring and resource management. The Condor Team continues to build software tools that enable scientists and engineers to increase their computing throughput. (A toy sketch of this kind of priority-based job queue follows this list.)
• TeraGrid – This project is an open scientific discovery infrastructure combining resources at eight partner sites to create an integrated, persistent computational resource. It is coordinated through the Grid Infrastructure Group at the University of Chicago, working in partnership with the Resource Provider sites that participated in the TeraGrid construction project. Deployment was completed in September 2004, bringing more than 40 teraflops of computing power, nearly 4 petabytes of rotating storage, and specialized data analysis and visualization resources into production. A dedicated national network interconnects the whole at 10-30 gigabits per second.
• NEES – The Network for Earthquake Engineering Simulation (NEES) grid links earthquake researchers across the United States with computing resources and research equipment, allowing collaborative teams to plan, perform and publish their experiments. It is maintained by the NEES Consortium.
• Biomedical Informatics Research Network – BIRN is a National Institutes of Health initiative that fosters distributed collaborations in biomedical science. The BIRN data grid is used for computation and data sharing.
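The toy queue promised in the Condor entry above: jobs wait in a priority queue and are matched to idle machines that satisfy their requirements. This is a sketch of the general queuing-and-matching technique, not Condor's actual matchmaking algorithm; all job and machine data are invented.

```python
# Toy priority job queue matched against idle machines, illustrating the
# queuing/priority/matching mechanisms described above.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                          # lower value = runs sooner
    name: str = field(compare=False)
    min_memory_mb: int = field(compare=False)

machines = [{"name": "node1", "memory_mb": 4096, "idle": True},
            {"name": "node2", "memory_mb": 1024, "idle": True}]

queue = []
heapq.heappush(queue, Job(2, "protein-fold", 2048))
heapq.heappush(queue, Job(1, "render-frame", 512))

while queue:
    job = heapq.heappop(queue)             # highest-priority job first
    match = next((m for m in machines
                  if m["idle"] and m["memory_mb"] >= job.min_memory_mb), None)
    if match:
        match["idle"] = False              # claim the machine for this job
        print(f"{job.name} -> {match['name']}")
    else:
        print(f"{job.name}: no idle machine meets its requirements")
```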
POTENTIAL IMPACTS
Grid computing promises to deliver to the computing world what it successfully delivered to the networking world: an information technology service based on a distributed architecture that is ubiquitous, flexible and always available. This promise cannot be realized without a network able to support and link the various elements that make up the grid – in a secure, reliable, scalable and billable way. The need for such a network – and for the higher-level services and capabilities that may exploit it – represents the key opportunity for grid technologies.