
Using Regular Expressions Groups to Isolate Sub-Matches

  • March 16, 2005
  • By Tom Archer

Extracting Specific Groups

Note from Figure 2 that each group collection contains, as its first group object, the entire match. Therefore, any groups you define (by placing parentheses in the pattern) start at the second group object in the group collection, which is at index 1. Since the DisplayGroups function already does most of what you need, you can simply modify it a bit to create a function, ExtractAreaCodes, that is specific to extracting area codes from a text value:
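#using <mscorlib.dll>
#using <System.dll>
#using <System.Windows.Forms.dll>

using namespace System;
using namespace System::Text;
using namespace System::Text::RegularExpressions;
using namespace System::Windows::Forms;

// Displays the area code of every phone number (###-###-####) found in input
void ExtractAreaCodes(String* input)
{
  try
  {
    StringBuilder* results = new StringBuilder();
    results->AppendFormat(S"The Area Codes for '{0}' are:\r\n\r\n", input);

    // Only the area code is parenthesized, so it is the only defined group
    String* pattern = S"(\\d{3})-\\d{3}-\\d{4}";

    Regex* rex = new Regex(pattern);

    // for all the matches
    for (Match* match = rex->Match(input); 
         match->Success; 
         match = match->NextMatch())
    {
      // Item[0] is the entire match; Item[1] is the first defined group
      results->AppendFormat(S"\t{0}\r\n", match->Groups->Item[1]->Value);
    }
    MessageBox::Show(results->ToString());
  }
  catch(Exception* pe)
  {
    MessageBox::Show(pe->Message);
  }
}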

As you can see, the only major changes to the function were to hard-code the pattern (since this function is dedicated to area codes) and to change the parameter passed to the results object's AppendFormat call. That parameter is the following expression, which extracts the second group (index 1) from the match's group collection:

match->Groups->Item[1]->Value
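The same indexing rule is not specific to .NET. As a minimal sketch, assuming standard C++11 or later and its <regex> header rather than the .NET Regex class used in this article, std::smatch numbers its sub-matches the same way: index 0 holds the entire match and index 1 holds the first parenthesized group. FirstMatchGroups is a hypothetical helper written only for this illustration.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {group 0, group 1} for the first phone number
// found in input, or a pair of empty strings if there is no match.
std::pair<std::string, std::string> FirstMatchGroups(const std::string& input) {
    // Same pattern as ExtractAreaCodes: only the area code is parenthesized.
    static const std::regex pattern(R"((\d{3})-\d{3}-\d{4})");
    std::smatch m;
    if (!std::regex_search(input, m, pattern)) {
        return {};
    }
    // m[0] holds the entire match (compare Groups->Item[0] in .NET);
    // m[1] holds the first parenthesized group (compare Groups->Item[1]).
    return {m[0].str(), m[1].str()};
}
```

For the input "Call 770-555-1212 today", group 0 is "770-555-1212" and group 1 is "770".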

Now you can test the ExtractAreaCodes function like this:

ExtractAreaCodes(S"My phone numbers are 770-555-1212 and 404-555-1212");

Doing so yields the expected results shown in Figure 3.


Figure 3. You Can Use Standard Collection Notation to Retrieve Specific Groups from a Match
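For comparison, a rough standard C++ counterpart of ExtractAreaCodes might look like the following. It is a sketch assuming C++11's <regex> header, not a translation of the .NET API: std::sregex_iterator takes the place of the Match/NextMatch loop, and the function returns the area codes in a std::vector instead of displaying them in a message box.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch of a standard C++ analogue of ExtractAreaCodes: collects the
// first capturing group (the area code) of every phone number in input.
std::vector<std::string> ExtractAreaCodes(const std::string& input) {
    static const std::regex pattern(R"((\d{3})-\d{3}-\d{4})");
    std::vector<std::string> areaCodes;
    // Visit every non-overlapping match, much like Match/NextMatch in .NET.
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it) {
        // (*it)[0] would be the whole number; (*it)[1] is the area code.
        areaCodes.push_back((*it)[1].str());
    }
    return areaCodes;
}
```

Called with the same test string used above, it returns the area codes "770" and "404", in that order.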

Looking Ahead

You've learned the basics of how to define groups or sub-matches within a regular expression pattern and how to enumerate all the groups of a match as well as extract a specific group. At this point, you should be able to modify the code in this tip for situations where you need to both locate a particular pattern in a string using regular expressions and then extract specific sub-matches.

One thing that's not so nice about the ExtractAreaCodes function is that it is hard-coded to retrieve the second object (index 1) from the group collection. What if the pattern changes so that another group appears before the area code? The programmer would need to change the ExtractAreaCodes function, as well as any other functions that depend on the specific order of groups within the group collection. Therefore, the next tip will cover how to name groups (to avoid this code-maintenance hassle) and explain how to define "non-capturing" groups.
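The maintenance problem is easy to demonstrate. The sketch below, again in standard C++ with std::regex rather than .NET, uses a hypothetical GroupAt helper and an assumed revised pattern, kWithCountryCode, that adds a one-digit country-code group in front of the area code. Once that group exists, index 1 returns the country code and the area code moves to index 2, so code hard-wired to index 1 quietly returns the wrong value.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns capture group `index` from the first match
// of `pattern` in `input`, or an empty string if there is no such group.
std::string GroupAt(const std::string& input, const std::regex& pattern,
                    std::size_t index) {
    std::smatch m;
    if (std::regex_search(input, m, pattern) && index < m.size()) {
        return m[index].str();
    }
    return "";
}

// Original pattern: the area code is the first (and only) group.
const std::regex kOriginal(R"((\d{3})-\d{3}-\d{4})");

// Assumed revision that also captures a one-digit country code. Because the
// new group comes first, the area code shifts from index 1 to index 2.
const std::regex kWithCountryCode(R"((\d)-(\d{3})-\d{3}-\d{4})");
```

For example, GroupAt("1-770-555-1212", kWithCountryCode, 1) yields "1" rather than "770".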

About the Author

Tom Archer owns his own training company, Archer Consulting Group, which specializes in educating and mentoring .NET programmers and providing project management consulting. If you would like to find out how the Archer Consulting Group can help you reduce development costs, get your software to market faster, and increase product revenue, contact Tom through his Web site.