The first thing the GetEmailAddresses function does is construct an ArrayList object, which holds every email address found in the input string and is ultimately returned to the caller. Next, the static Regex::Matches method is called with the desired email pattern (which I'll cover shortly), and it returns a MatchCollection. The function then enumerates the MatchCollection with a for loop, adds each "match" (representing one email address) to the ArrayList object, and finally returns the ArrayList to the caller.
The GetEmailAddresses function can be used as follows. The returned ArrayList object is enumerated, and each email address is written to the console:
The pattern used in the GetEmailAddresses function correctly yields the two addresses I specified in the sample call. The following is the pattern itself, with the double backslashes replaced by single backslashes (the doubling is C++ string-literal escaping, not part of the actual pattern):
If you've read the previous installments of this series, you should be able to read this pattern. Here's a breakdown of each component of the pattern:
The main differences between this pattern and the previous one are the following:
So the natural question at this point is, "Is this pattern guaranteed to find every single valid email address?" After quite a bit of research, it turns out that an all-encompassing email regular expression pattern is almost 6,000 bytes long! However, that pattern is needed only to catch the minuscule percentage of email addresses that the patterns illustrated in this article miss. The two patterns that I've covered will catch 99 percent of all email addresses.
Hopefully along the way, those of you who are new to regular expressions saw just how powerful they can be. Just think of how much manual text-parsing code would be necessary to search a block of text for (almost) every conceivable email address. Compare that with the single line of code it takes with regular expressions! For those who wish to learn still more about working with the .NET regular expressions classes, my book Extending MFC Applications with the .NET Framework provides a full 50-page chapter on the subject and introduces half a dozen demo applications with code that you can easily plug into your own production code.
Tom Archer owns his own training company, Archer Consulting Group, which specializes in educating and mentoring .NET programmers and providing project management consulting. If you would like to find out how the Archer Consulting Group can help you reduce development costs, get your software to market faster, and increase product revenue, contact Tom through his Web site.
Using Regular Expressions to Parse for Email Addresses
April 12, 2005
This final installment in my series on using the .NET regular expressions classes from Managed C++ takes much of what the previous installments taught and uses it to create production-quality regular expression patterns for validating email addresses and for parsing bodies of text for all email addresses. The first section begins with a basic pattern that, while not all-encompassing, causes the regular expressions parser to match the majority of email addresses in a supplied input string. The remainder of the column presents two more complex patterns that catch almost any email address format, and it fully explains the components of each pattern.
Basic Email Pattern
First, examine a generic function, GetEmailAddresses, that takes an input string as its only argument and returns an ArrayList of the email addresses found in it. The email regular expression pattern used in this function is very basic, but it "catches" the majority of email addresses you run across:
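using namespace System;
using namespace System::Text::RegularExpressions;
using namespace System::Windows::Forms;
using namespace System::Collections;
...
// Returns an ArrayList holding every substring of input that matches a basic
// email pattern, or a null pointer if an exception is thrown.
ArrayList* GetEmailAddresses(String* input)
{
  try
  {
    ArrayList* al = new ArrayList();
    // Find every match of the pattern in the input string
    MatchCollection* mc =
      Regex::Matches(input,
                     S"[\\w]+@[\\w]+.[\\w]{2,3}");
    // Copy the text of each match (one email address) into the list
    for (int i=0; i < mc->Count; i++)
      al->Add(mc->Item[i]->Value);
    return al;
  }
  catch(Exception* pe)
  {
    MessageBox::Show(pe->Message);
  }
  // Reached only when an exception was caught
  return 0;
}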
ArrayList* addrs =
GetEmailAddresses(S"I can be reached at tom@archerconsultinggroup.com "
S"or info@archerconsultinggroup.com.");
for (int i = 0; i < addrs->Count; i++)
{
Console::WriteLine(addrs->Item[i]);
}
[\w]+@[\w]+.[\w]{2,3}
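If you want to experiment with the basic pattern outside Managed C++, the following is a minimal sketch of the same logic in standard C++. It assumes std::regex (ECMAScript grammar) as a stand-in for the .NET engine; the function name get_email_addresses is hypothetical, and a std::vector<std::string> plays the role of the ArrayList. The raw string literal R"(...)" lets the pattern be written with single backslashes, exactly as it appears above. For plain ASCII text on a single line the results should agree with the .NET version, although .NET's \w also matches non-ASCII letters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical standard-C++ counterpart of GetEmailAddresses: collect every
// substring of `input` that matches the basic email pattern.
std::vector<std::string> get_email_addresses(const std::string& input)
{
    // Raw string literal: same pattern as S"[\\w]+@[\\w]+.[\\w]{2,3}",
    // written with single backslashes.
    static const std::regex pattern(R"([\w]+@[\w]+.[\w]{2,3})");

    std::vector<std::string> found;
    // Walk the matches left to right, like enumerating a MatchCollection.
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it)
    {
        found.push_back(it->str());  // text of the whole match
    }
    return found;
}
```

Because the . in this basic pattern is unescaped, it matches any character, not just a period: the sketch (like the original function) reports a string such as a@bcde as an address. The advanced patterns in the next section escape the dot as \. to avoid this.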

Advanced Email Regular Expressions Pattern
While the previous email pattern catches most email addresses, it is far from complete. This section illustrates, one step at a time, how to build a much more robust email pattern that catches just about every valid email address format. To begin with, the following pattern catches "exact matches." In other words, you shouldn't use it to parse a document, but rather to validate a single email address:
^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$
Personally, I find it easier to read a pattern by dissecting it into components and then working out how each component relates to the overall pattern. Read that way, this pattern breaks down into the following parts:
To test for exact matches, you need only a very simple function like the following:
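using namespace System;
using namespace System::Text::RegularExpressions;
...
// Returns true only if the entire string has the shape of an email address;
// the ^ and $ anchors prevent a match on just part of the string.
bool ValidateEmailAddressFormat(String* email)
{
  Regex* rex =
    new Regex(S"^[^@]+@([-\\w]+\\.)+[A-Za-z]{2,4}$");
  return rex->IsMatch(email);
}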
You can then call this function as follows:
bool b;
// SUCCESS: returns true
b = ValidateEmailAddressFormat(S"tom.archer@archerconsultinggroup.com");
// FAILURE: returns false (there is no @ sign)
b = ValidateEmailAddressFormat(S"tom.archerarcherconsultinggroup.com");
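The validation function can also be sketched in standard C++. This is a minimal sketch, assuming std::regex as a stand-in for System::Text::RegularExpressions; validate_email_address_format is a hypothetical name. Building the pattern from adjacent string literals lets each component carry its own comment, and the concatenated result is exactly ^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$. Like Regex::IsMatch, std::regex_search succeeds if a match occurs anywhere in the string, so it is the ^ and $ anchors that turn it into a whole-string check.

```cpp
#include <regex>
#include <string>

// Hypothetical standard-C++ counterpart of ValidateEmailAddressFormat.
// Adjacent literals concatenate to ^[^@]+@([-\w]+\.)+[A-Za-z]{2,4}$
bool validate_email_address_format(const std::string& email)
{
    static const std::regex pattern(
        "^"                 // anchor: match must start at the first character
        R"([^@]+)"          // local part: one or more characters other than '@'
        "@"                 // the literal '@' separator
        R"(([-\w]+\.)+)"    // one or more domain labels, each followed by '.'
        R"([A-Za-z]{2,4})"  // top-level domain: 2 to 4 letters
        "$");               // anchor: match must end at the last character
    // regex_search looks for a match anywhere; the anchors force that match
    // to cover the whole string.
    return std::regex_search(email, pattern);
}
```

With the anchors in place, an address followed by extra text (for example, "tom@archerconsultinggroup.com is my address") is rejected, while a multi-label domain such as mail.example.co.uk is accepted.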
Now, let's tweak the pattern so that it can be used to parse a document for all of its contained email addresses:
([-\.\w^@]+@(?:[-\w]+\.)+[A-Za-z]{2,4})+
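Here is a minimal sketch of applying this parsing pattern in standard C++, again assuming std::regex in place of the .NET classes (find_email_addresses is a hypothetical name). The pattern is transcribed unchanged into a raw string with a custom re delimiter. When reading it, keep in mind that ^ negates a character class only when it is the class's first character, so [-\.\w^@] admits a literal ^ and @ rather than excluding @.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: find every email address in a block of text using the
// document-parsing pattern, transcribed unchanged into a raw string literal.
std::vector<std::string> find_email_addresses(const std::string& text)
{
    // [-\.\w^@]+     characters before the '@' (the class also admits a
    //                literal '^' and '@', since '^' is not the first character)
    // @              the separator
    // (?:[-\w]+\.)+  one or more domain labels, each followed by '.'
    //                (non-capturing group)
    // [A-Za-z]{2,4}  top-level domain
    static const std::regex pattern(
        R"re(([-\.\w^@]+@(?:[-\w]+\.)+[A-Za-z]{2,4})+)re");

    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it)
    {
        found.push_back(it->str());  // whole match, not only the capture group
    }
    return found;
}
```

Run on the sample sentence used with GetEmailAddresses, this returns the same two addresses, and a dotted local part with a multi-label domain such as tom.archer@mail.example.co.uk is matched in full.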
Regular Expressions: A Lot of Ground to Cover
My original intention for a series on using the .NET regular expressions classes from Managed C++ was simply to cover some basic patterns and usages. However, the more I wrote, the more I realized needed to be covered, so the series turned out much longer than planned. It covered splitting strings, finding matches within a string, using regular expression metacharacters, grouping, creating named groups, working with captures, performing advanced search-and-replace functions, and finally writing a complex email pattern.
Acknowledgements
I would like to thank Don J. Plaistow, a Perl and regular expressions guru who helped me tremendously when I first started learning regular expressions. Don's help was especially valuable with regard to the email patterns in this article.
About the Author