http://www.developer.com/


The Large-Grid Issue


May 15, 2008

Overview

It is one thing to discuss small Grid or virtualized environments, but how can one build, manage, maintain, and support large-Grid environments? In this article, I will focus on this very question and try to shed some light on the available technologies, ideas, and methodologies surrounding it.

To Boldly Do What No One Has Done Before...

To expand; to deal with departmental politics; to deal with the IT department; to deal with multiple users who think their applications are the most important; are you ready for the challenge?

I mentioned in one of my previous articles that a Center of Excellence (COE) is usually established internally to support new projects such as the build-out of a large-Grid environment. But how do you handle the scenario in which there is no COE and you are on your own? I propose that it is harder to Grid-enable the second and third applications than it is the 10th.

Furthermore, I will make a radical proposition here and now:

"A large Grid is any extension to the original and first application deployment on that Grid."

What do I mean by this?

Very few, if any, medium-to-large organizations establish a relationship with a Grid vendor at the outset. The process is stepwise; in other words, the first application tests a model, followed by the second, the third, and so forth. After a critical mass has been reached, the company chooses a vendor to be used by the subsequent applications and departments. Typically, this initial vendor will not address the broad range of demands from the user community. Frankly, that is not the point of this article.

The problem arises when the second, the third, and other departments get wind of this new and useful vendor, and they all want to get on the bandwagon and deploy applications on the Grid. After all, "grid" is the coolest and catchiest phrase in any company at this point! I will leave the question of whether an application is Grid-worthy for another time, and assume that these applications are in fact perfect candidates for the Grid. What are you to do now? How do you extend your tiny Grid infrastructure to the masses? At this point, it doesn't matter whether your installation is 10 Processing Elements (PEs), 1,000 PEs, or whatever. All that matters is: How can you ensure that all parties play nicely and don't step on each other's toes?

Qualifying for a Large Grid?

I will not delve into the resource and organizational aspects of expanding your Grid installation because that topic is worthy of an independent article (or book). This article addresses the technical aspects of evolving from initial application support to a full COE infrastructure. Specifically, I will focus on the "interim phase": the period during which your Grid support demands have grown beyond the initial installs but do not yet qualify for the full support infrastructure found in a COE. Put another way, your daily battle becomes increasingly challenging, but you can't yet justify a call for troop expansion. Figure 1 depicts the situation visually.




Figure 1: Timeline of Grid expansion

As you can see from Figure 1, you face major challenges in the early and interim phases of Grid expansion. With the initial app deployment, you have no infrastructure, no support, are greatly dependent on the Grid vendor, and have minimal experience to draw from. Furthermore, at this stage, you will need to define a QA cycle, versioning strategy, and so on. The experience you derive from this effort may provide some insights as you deploy your second and third apps, but practically speaking, you are still "flying by the seat of your pants." You are simultaneously learning and executing.

Let me dig a little deeper and cover some of the technical aspects of your expansion. I have compiled a list of attributes in Table 1 that I believe are critical.

Here are a few assumptions that I make about Table 1 and the timeline.

  1. First-application developers are more willing to take a chance. Remember, these are the pioneers in an organization and are more willing to go the extra mile to get a system to work. They have a vision of what Grid technology can do for the organization in the long haul and are willing to deal with some of its early-stage inefficiencies.
  2. Second, third, or fourth application groups are typically not willing to deal with any of the inefficiencies that the first group did. They see the benefit of Grid, and they want to get on the bandwagon because they have a critical issue that needs to be solved or they want to share the spotlight. In either case, they are not willing to deal with the headaches. It's highly likely that multiple Grid installs will take place in the form of pockets of smaller Grids emerging in various locations. Administration is problematic and there is a requirement for inter-departmental collaboration. Most of the traditional benefits of "the Grid" are not being realized in this environment. As the number of applications continues to increase, the inefficiencies become more apparent and the need for internal support infrastructure becomes more obvious.
  3. The organization is at a point where the "pockets of installs" have gone out of control, thus justifying a group dedicated to supporting and promoting Grid across the organization. Best practices, training, and ROIs are developed and now the COE takes control of the Grid.




    Table 1: Attributes to look for in expanding your Grid

Decoding the Grid

Throughout this analysis, keep in mind what I said in the previous section: the first application group is more willing to deal with the inefficiencies of the vendor, but this "tolerance" does not persist with the second or third application teams. As such, and as indicated in Table 1, many of the Grid attributes are trivial to the first app group, which is primarily focused on early-stage execution success. By contrast, the second and third app teams are very demanding of the Grid vendor. These users are evaluating the overall attributes of the product, and thus are more critical. Features such as resource sharing, administration, and network utilization are very important to them, and with good reason.
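To make "resource sharing" a little more concrete, here is a minimal sketch of one way a shared Grid might divide its Processing Elements among application teams in proportion to agreed weights. This is purely illustrative: the function name `allocate_pes`, the team names, and the weights are hypothetical, and the largest-remainder rounding used here is my assumption, not a description of how any particular Grid vendor schedules work.

```python
from math import floor


def allocate_pes(total_pes, weights):
    """Split total_pes among applications in proportion to their weights.

    Hypothetical helper for illustration only. Uses largest-remainder
    rounding so that the integer shares always add up to total_pes.
    """
    if total_pes < 0:
        raise ValueError("total_pes must be non-negative")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")

    weight_sum = sum(weights.values())
    # Exact (possibly fractional) share for each application.
    exact = {app: total_pes * w / weight_sum for app, w in weights.items()}
    # Start from the rounded-down share.
    shares = {app: floor(x) for app, x in exact.items()}
    leftover = total_pes - sum(shares.values())

    # Hand the leftover PEs to the largest fractional remainders.
    by_remainder = sorted(
        weights, key=lambda app: exact[app] - shares[app], reverse=True
    )
    for app in by_remainder[:leftover]:
        shares[app] += 1
    return shares


if __name__ == "__main__":
    # Assumed example: a 10-PE Grid shared by three teams.
    print(allocate_pes(10, {"risk": 2, "pricing": 1, "reporting": 1}))
```

Even a simple, explicit rule like this gives the second and third application teams something to point to when they ask how capacity is shared, which is exactly the kind of question the first team rarely had to answer.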

Imagine a scenario where you have a couple of applications whose teams are interested in Grid technology and are looking into what it would take to undertake the effort. There is no central point of support, and asking the vendor for support will cost some hefty consulting dollars. The new users don't necessarily feel that they were involved in the original decision-making process, and they have to live with the decision that was already made. They are looking for the path of least resistance to deployment and for immediate ROI. This makes all the usability features very important to these folks.

Now, one of two paths can be taken here: a number of smaller pockets of Grid installations, or one Grid that slowly grows. If one or more of the "very important" items are not met, these new users opt either to 1) use a different vendor, or 2) stay out of the first Grid install and create an installation of their own. Neither case is good for the organization, but the second, in my opinion, is worse. If multiple pockets of Grid installations take place, it gives the organization a false sense of security about moving toward the COE model that I spoke about. Ideally, you want more applications to join the Grid, and after the critical mass has been reached, move to the COE model. If multiple installations take place, a management decision will "force" these pockets to merge, and that's never easy. The adoption will slow down, and you will feel resistance from every newcomer that you want to join the COE.

Conclusion

Much of the time, an organization does not foresee the challenges ahead in Grid expansion. It is not that they are ignorant, but that they simply have not encountered the issues previously. This is why the pre-COE era is so critical and challenging. Therefore, as you approach this scenario, make certain to select a Grid vendor that shares your vision of the future, while demonstrating flexibility for change. After all, a decade ago, having a 1000-node Grid was rare. Today, it is common and even expected for most organizations. Do everything you can to prepare organizationally, technologically, and strategically to ensure that the evolution process is as painless as possible.

About the Author

Mr. Sedighi is currently the CTO and founder of SoftModule. SoftModule is a startup company with engineering and development offices in Boston and Tel-Aviv and a Sales and Management office in New York. He is also the Chief Architect for SoftModule's Grid Appliance product, which has arisen from a current need in the market to manage excessive demands of computing power at a lower cost.

Before SoftModule, Mr. Sedighi held a Senior Consulting Engineer position at DataSynapse, where he designed and implemented Grid and Distributed Computing fabrics for the Fortune 500. Before DataSynapse, Mr. Sedighi spent a number of years at TIBCO Software, where he implemented high-speed messaging solutions for organizations such as the New York Stock Exchange, UBS, Credit Suisse, US Department of Energy, US Department of Defense, and many others. Mr. Sedighi received his BS in Electrical Engineering and MS in Computer Science, both from Rensselaer Polytechnic Institute.
