The Large-Grid Issue
It is one thing to discuss Grid and/or virtualized environments of small stature, but how can one manage, maintain, support, or even build large-Grid environments? In this article, I will focus the discussion around this very question, trying to shed some light on available technologies, ideas, and methodologies surrounding this issue.
To Boldly Do What No One Has Done Before...
To expand, to deal with departmental politics, to deal with the IT department, to deal with multiple users who think their applications are the most important: are you ready for the challenge?
Furthermore, I will make a radical proposition here and now:
"A large Grid is any extension to the original and first application deployment on that Grid."
What do I mean by this?
Very few, if any, medium-to-large organizations establish a relationship with a Grid vendor at the outset. The process is stepwise; in other words, the first application tests a model, followed by the second, the third, and so forth. After a critical mass has been reached, the company chooses a vendor to be used by the subsequent applications and departments. Typically, this initial vendor will not address the broad range of demands from the user community. Frankly, that is not the point of this article.
The problem arises when the second, the third, and other departments get wind of this new and useful vendor, and they all want to get on the bandwagon and deploy applications on the Grid. After all, "Grid" is the coolest and catchiest word in any company at this point! I will leave the question of whether an application is Grid-worthy for another time, and assume that these applications are in fact perfect candidates for the Grid. What are you to do now? How do you extend your tiny Grid infrastructure to the masses? At this point, it doesn't matter whether your installation has 10 Processing Elements (PEs), 1,000 PEs, or any other number. All that matters is: How can you ensure that all parties play nicely and don't step on each other's toes?
Qualifying for a Large Grid?
I will not delve into the resource and organizational aspects of expanding your Grid installation because that topic is worthy of an independent article (or book). This article aims at the technical aspects of evolving from the initial app support to a full Center of Excellence (COE) infrastructure. What do I mean by this? I will focus on the "interim phase"; that is, the period of time during which your Grid support demands evolve beyond the initial installs, but do not yet qualify for the full support infrastructure found in a COE. Put differently, your daily battle becomes increasingly challenging, but you can't yet justify a call for troop expansion. Figure 1 depicts the situation visually.
Figure 1: Timeline of Grid expansion
As you can see from Figure 1, you face major challenges in the early and interim phases of Grid expansion. With the initial app deployment, you have no infrastructure and no support, are greatly dependent on the Grid vendor, and have minimal experience to draw from. Furthermore, at this stage, you will need to define a QA cycle, versioning strategy, and so on. The experience you derive from this effort may provide some insights as you deploy your second and third apps, but practically speaking, you are still "flying by the seat of your pants." You are simultaneously learning and executing.
Let me dig a little deeper and cover some of the technical aspects of your expansion. I have compiled in Table 1 a list of attributes that I believe are critical.
Here are a few assumptions that I make about Table 1 and the timeline.
- First-application developers are more willing to take a chance. Remember, these are the pioneers in an organization and are more willing to go the extra mile to get a system to work. They have a vision of what Grid technology can do for the organization in the long haul and are willing to deal with some of its early-stage inefficiencies.
- Second, third, or fourth application groups are typically not willing to deal with any of the inefficiencies that the first group did. They see the benefit of Grid, and they want to get on the bandwagon because they have a critical issue that needs to be solved or they want to share the spotlight. In either case, they are not willing to deal with the headaches. It's highly likely that multiple Grid installs will take place in the form of pockets of smaller Grids emerging in various locations. Administration is problematic and there is a requirement for inter-departmental collaboration. Most of the traditional benefits of "the Grid" are not being realized in this environment. As the number of applications continues to increase, the inefficiencies become more apparent and the need for internal support infrastructure becomes more obvious.
- Eventually, the organization reaches a point where the "pockets of installs" have grown out of control, thus justifying a group dedicated to supporting and promoting Grid across the organization. Best practices, training, and ROI models are developed, and the COE now takes control of the Grid.
Table 1: Attributes to look for in expanding your Grid