Grids, Clusters, Virtualized Environment, and All of That1, Page 2
This is where virtualization comes to the rescue. The idea is that you might not be able to have 99.999% reliability at the node level, but can you achieve this level of reliability at the system level, across multiple nodes? Distributed virtual environments can add reliability on top of unreliable environments.
Figure 1: Added Reliability of a Virtualized Environment
Interconnected nodes are rather unreliable. If a job is running on a node and the node crashes, your job is left in an unknown state at best. Because virtualization adds a layer of indirection between you and your environment, it can add a certain level of reliability to that environment. I have shown this in Figure 1, with red representing a job that has failed because a box crashed. The virtualized environment reschedules the same job on another node (green) and the job continues. You might not be able to recover the work already done (there is no checkpointing), but you can be assured that your job will be completed in the face of a failure.
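The rescheduling behavior described above can be illustrated with a minimal sketch. This is not how any particular grid product works; the `Node` class, the `NodeCrashed` exception, and the `run_with_rescheduling` function are hypothetical names invented for this example, and a node crash is simulated with a simple flag.

```python
class NodeCrashed(Exception):
    """Raised when a (simulated) node dies while running a job."""


class Node:
    """Hypothetical stand-in for a compute node in the environment."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, job):
        # A crashed node never returns a result; simulate that with an exception.
        if not self.healthy:
            raise NodeCrashed(self.name)
        return job()


def run_with_rescheduling(job, nodes):
    """Try the job on each node in turn, restarting from scratch after a crash."""
    attempts = []
    for node in nodes:
        attempts.append(node.name)
        try:
            return node.run(job), attempts
        except NodeCrashed:
            # No checkpointing: whatever the crashed node had done is lost,
            # so the whole job is simply started again on the next node.
            continue
    raise RuntimeError("job failed on every available node")


# The first node crashes (red in Figure 1); the job is rescheduled (green).
nodes = [Node("node-a", healthy=False), Node("node-b")]
result, attempts = run_with_rescheduling(lambda: sum(range(10)), nodes)
```

The caller only ever talks to `run_with_rescheduling`, never to a node directly. That is the layer of indirection at work: the failure of `node-a` shows up only in the list of attempts, not in the result.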
The added layer provided by the virtualization software can essentially keep the job (yellow) in an uncompleted state until the result returns and the job can be marked as completed. There are a number of side effects to this, as you can imagine. You are adding another component to the picture, and it can fail; the added component also adds delay to the system, which may not be acceptable; and so forth. The good news is that the software and hardware packages available today have already tried to address these shortcomings of virtualization. Their methods include redundant scheduling, where the scheduler schedules the task on multiple nodes simultaneously; synchronized node managers, where a primary and a backup node are kept in sync so that if one fails the other takes over; reliable interconnects; and many other techniques.
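Redundant scheduling, as mentioned above, can be sketched in a few lines. The example below is an assumption-laden illustration, not a real scheduler: threads stand in for nodes, and `run_redundantly`, `healthy_node`, and `crashing_node` are hypothetical names made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_redundantly(task, node_runners):
    """Submit the same task to every runner and return the first good result."""
    errors = []
    with ThreadPoolExecutor(max_workers=len(node_runners)) as pool:
        futures = [pool.submit(runner, task) for runner in node_runners]
        for future in as_completed(futures):
            try:
                # The first replica to finish successfully wins.
                # Note: leaving the "with" block still waits for the other
                # replicas to finish; a real scheduler would cancel them.
                return future.result()
            except Exception as exc:
                errors.append(exc)
    raise RuntimeError(f"all {len(errors)} replicas failed")


def healthy_node(task):
    # Hypothetical node that runs the task normally.
    return task()


def crashing_node(task):
    # Hypothetical node that fails before producing a result.
    raise ConnectionError("node went down")


answer = run_redundantly(lambda: 6 * 7, [crashing_node, healthy_node])
```

The trade-off is the one the text hints at: you pay for the same work several times in exchange for not having to wait for a failure to be detected before the job is rescheduled.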
... but Now What?
What you see today in the industry is that many emerging technologies surrounding interconnects, processing technologies, virtualized environments, and so on, are moving toward more coupled environments. The reason for this is rather simple: Web services, standardized communication protocols, open operating systems, open source projects, and many other factors have led this "open coupled system" movement. During much of the 1970s through the 1990s, performance was equated with proprietary systems such as IBM's mainframes, Sun's powerful workstations, HP's servers, and the like. But now, the same systems are open: Sun's release of Solaris into open source, IBM's introduction of Linux on the mainframe, and HP's desire to add more to commodity processors are only some of the trends that you have seen over the past decade. Many up-and-coming vendors are taking advantage of this situation as well. If you are able to conform to open standards and provide a common interface to your users, you can take advantage of innovative ways of providing environments that can meet the needs and complexities of the business.
This article covered many of the terms you hear today in relation to High Performance Computing (HPC). You need to spend a good deal of time understanding what I covered in Table 1, because every single project that you come across will focus on one or more of these classes. You need to understand the problem that you are tackling; otherwise, you will be spinning your wheels and spending money solving a problem that was not a problem to begin with.
The HPC industry has gone through many changes over the years, and it is now stronger and more important than ever. It is certain that the need for more computing power is increasing, but achieving this power at a cost that won't break your back or your wallet is the key.
... till next time
- For those of you who suffered like I did through Vector Calculus, this title might seem very familiar to you. For those who were lucky enough to not have to take such a class, you should still check out Div, grad, curl, and all that by H. M. Schey; it was a life saver.
- It's funny to use the term "classical" when you speak about computers and IT, given that the industry itself is less than 100 years old; that goes to show how fast this industry has grown.
About the Author
Mr. Sedighi is currently the CTO and founder of SoftModule. SoftModule is a startup company with engineering and development offices in Boston and Tel-Aviv and a sales and management office in New York. He is also the Chief Architect for SoftModule's Grid Appliance product, which has arisen from a current need in the market to manage excessive demands for computing power at a lower cost.
Before SoftModule, Mr. Sedighi held a Senior Consulting Engineer position at DataSynapse, where he designed and implemented Grid and Distributed Computing fabrics for the Fortune 500. Before DataSynapse, Mr. Sedighi spent a number of years at TIBCO Software, where he implemented high-speed messaging solutions for organizations such as the New York Stock Exchange, UBS, Credit Suisse, US Department of Energy, US Department of Defense, and many others. Mr. Sedighi received his BS in Electrical Engineering and MS in Computer Science, both from Rensselaer Polytechnic Institute.