http://www.developer.com/tech/article.php/3747006/The-Large-Grid-Issue.htm
It is one thing to discuss Grid and/or virtualized environments of small stature, but how can one manage, maintain, support, or even build large-Grid environments? In this article, I will focus the discussion on this very question, trying to shed some light on the available technologies, ideas, and methodologies surrounding this issue.

To expand; to deal with departmental politics; to deal with the IT department; to deal with multiple users who think their applications are the most important; are you ready for the challenge? I mentioned in one of my previous articles that a Center of Excellence (COE) is usually established internally to support new projects such as the build-out of a large-Grid environment. But how do you handle the scenario when there is no COE, and you are on your own?

I propose that it is harder for the second and third applications to be Grid-enabled than it is for the 10th application. Furthermore, I will make a radical proposition here and now: "A large Grid is any extension to the original and first application deployment on that Grid."

What do I mean by this? Very few, if any, medium-to-large organizations establish a relationship with a Grid vendor at the outset. The process is a stepwise approach; in other words, the first application tests a model, followed by the second, the third, and so forth. After a critical mass has been reached, the company chooses a vendor to be used by the subsequent applications and departments. Typically, this initial vendor will not address the broad range of demands from the user community. Frankly, that is not the point of this article.

The problem arises when the second, the third, and other departments get wind of this new and useful vendor, and they all want to get on the bandwagon and deploy applications on the Grid. After all, "grid" is the coolest and catchiest phrase in any company at this point! I will leave the question of whether an application is Grid-worthy for another time, and assume that these applications are in fact perfect candidates for the Grid. What are you to do now?
How do you extend your tiny Grid infrastructure to the masses? At this point, it doesn't matter whether your installation has 10 Processing Elements (PEs), 1000 PEs, or any other number. All that matters is: How can you ensure that all parties play nicely and don't step on each other's toes?

I will not delve into the resource and organizational aspects of expanding your Grid installation because that topic is worthy of an independent article (or book). This article aims at the technical aspects of evolving from the initial app support to full COE infrastructure. What do I mean by this? I will focus on the "interim phase"; in other words, the period of time during which your Grid support demands evolve beyond the initial installs, but do not yet qualify for the full support infrastructure found in a COE. Put differently, your daily battle becomes increasingly challenging, but you can't yet justify a call for troop expansion. Figure 1 depicts the situation visually.

Figure 1: Timeline of Grid expansion

As you can see from Figure 1, you face major challenges in the early and interim phases of Grid expansion. With the initial app deployment, you have no infrastructure and no support, are greatly dependent on the Grid vendor, and have minimal experience to draw from. Furthermore, at this stage, you will need to define a QA cycle, a versioning strategy, and so on. The experience you derive from this effort may provide some insights as you deploy your second and third apps, but practically speaking, you are still "flying by the seat of your pants." You are simultaneously learning and executing.

Let me dig a little deeper and cover some of the technical aspects of your expansion. I have compiled a list of attributes in Table 1 that I believe are critical. Here are a few assumptions that I make about Table 1 and the timeline.
Table 1: Attributes to look for in expanding your Grid

Throughout this analysis, keep in mind what I said in the previous section: the first application group is more willing to deal with the inefficiencies of the vendor, but this "tolerance" does not persist with the second or third application teams. As such, and as indicated in Table 1, many of the Grid attributes are trivial to the first app group. They are primarily focused on early-stage execution success. In contrast, the second and third app teams are very demanding of the Grid vendor. These users are evaluating the overall attributes of the product, and thus are more critical. Features such as resource sharing, administration, network utilization, and the like are very important to them, for appropriate reasons.

Imagine a scenario where you have a couple of application teams that are interested in Grid technology and are looking into what it would take for them to undertake the effort. There is no central point of support, and asking the vendor for support will cost some hefty consulting dollars. The new users don't necessarily feel that they were involved in the original decision-making process, and they have to live with the decision that was already made. They are looking for the path of least resistance to deployment and for immediate ROI. This makes all the usability features very important to these folks.

Now, there are two paths that can be taken here: a number of smaller pockets of Grid installations, or one Grid that is slowly growing. If one or more of the "very important" items are not met, these new users opt either to 1) use a different vendor, or 2) stay out of the first Grid install and create an installation of their own. Neither case is good for the organization, but the second, in my opinion, is worse. If multiple pockets of Grid installations take place, it gives the organization a false sense of security about moving towards the COE model that I spoke about.
Ideally, you want more applications to join the Grid, and after the critical mass has been reached, move to the COE model. If multiple installations take place, a management decision will "force" these pockets to merge, and that's never easy. Adoption will slow down, and you will feel resistance from every newcomer that you want to join the COE.

Much of the time, an organization does not foresee the challenges ahead in Grid expansion. It is not that they are ignorant, but that they simply have not encountered the issues previously. This is why the pre-COE era is so critical and challenging. Therefore, as you approach this scenario, make certain to select a Grid vendor that shares your vision of the future, while demonstrating flexibility for change. After all, a decade ago, having a 1000-node Grid was rare. Today, it is common and even expected for most organizations. Do everything you can to prepare organizationally, technologically, and strategically to ensure that the evolution process is as painless as possible.

Mr. Sedighi is currently the CTO and founder of SoftModule. SoftModule is a startup company with engineering and development offices in Boston and Tel-Aviv and a Sales and Management office in New York. He is also the Chief Architect for SoftModule's Grid Appliance product, which has arisen from a current need in the market to manage excessive demands for computing power at a lower cost. Before SoftModule, Mr. Sedighi held a Senior Consulting Engineer position at DataSynapse, where he designed and implemented Grid and Distributed Computing fabrics for the Fortune 500. Before DataSynapse, Mr. Sedighi spent a number of years at TIBCO Software, where he implemented high-speed messaging solutions for organizations such as the New York Stock Exchange, UBS, Credit Suisse, US Department of Energy, US Department of Defense, and many others. Mr. Sedighi received his BS in Electrical Engineering and MS in Computer Science, both from Rensselaer Polytechnic Institute.
The Large-Grid Issue
May 15, 2008
Overview
To Boldly Do What No One Has Done Before...
"A large Grid is any extension to the original and first application deployment on that Grid."
Qualifying for a Large Grid?


Decoding the Grid
Conclusion
About the Author