
Validating Emails with PHP


How many times have you heard email termed “the ultimate killer application”? Indeed, the impact that it has made on all of our lives in such a short period of time is immeasurable. After all, email offers a highly effective mode of communication, both in terms of QoS (quality of service) and cost. Furthermore, its asynchronous nature offers both parties the freedom to participate at a time most convenient to their schedule. Given such qualities, it isn’t a surprise that email has become a de facto tool for businesses communicating with their clientele. These days, users are expected to supply a valid email address for just about everything: in exchange for the privilege of downloading software, learning more about the latest sales, and even for simply offering a comment on a favorite blogger’s most recent entry. For many users, admittedly myself included at times, the constant begging for addresses has resulted in the prompt insertion of nonsensical, whimsical addresses such as “abc@defg”, “23424!@@@asdfa.com”, and my personal favorite, “blah”, whenever possible.

Nonetheless, there are certainly times when providing a valid email address is an absolute necessity, not only for the site operator but also for the user. For example, it is often in the interests of both parties that a confirmation email is sent to a specific address whenever goods are purchased from the site. A valid email address is also vital when the user is signing up for an email-based service, such as a newsletter or stock or weather alerts. And while one would hope that the user would have enough sense to steer clear of invalid addresses such as those described above, one must still take into account the possibility of a typing error when the address is supplied. Not taking the time to properly verify user input in this regard is a detriment not only to the organization, but also to the user who has shelled out time and perhaps money for your service!

In this article, I’ll show you how to use the PHP scripting language to aid in the validation of email addresses on not only the level of syntactical correctness, but also of actual existence on the destination domain! These easily implementable procedures will go a very long way towards eliminating future asdf’s, blah’s, and other invalid entries from your user database. As a byproduct, you’ll also learn a bit more about regular expressions, and PHP’s regular expression and networking functions.

Because validation is a process made up of two parts, syntax and existence, I’ll divide the discussion into these two components. I’ll then conclude the tutorial by assembling both processes and offering a few examples.

Validating Email Syntax

Certainly we’re all quite familiar with the typical email addressing structure: username@domain.suffix. However, you might not be aware that these are also all valid addresses:

  • fred_anderson@example.pps.k12.oh.us
  • i-l-o-v-e-e_m_a_i_l@example.info
  • ----____.---.___@example.com

As a result, you need to make sure that your validation code covers all possibilities! The only way to do so is to ensure that the supplied address conforms to the Internet message format rules as set forth by RFC 2822 (since superseded by RFC 5322). Because reading an RFC is about as entertaining as a root canal, I’ll offer a very broad summary of the rules here:

  1. An email address must follow the pattern: <username>@<domain>.<tld>
  2. The username can consist of the letters a through z, the numbers 0 through 9, and the underscore (‘_’), hyphen (‘-’), and period (‘.’) characters. Furthermore, the username cannot begin or conclude with a period.
  3. The domain part follows the same rules as those specified for the username.
  4. The TLD, short for “top-level domain”, can consist solely of one or more sequences of the letters a-z, each separated by a period. Furthermore, the suffix must begin with a period, and cannot conclude with a period. Finally, the suffix must be a valid Internet domain suffix as approved by the Internet Assigned Numbers Authority (IANA). Examples include “.com”, “.net”, “.co.uk”, “.tv”, and “.ca”. If you’re interested, IANA’s Web site offers a comprehensive list of all valid TLDs.
  5. Email addresses are treated as case-insensitive. (Strictly speaking, only the domain is guaranteed to be; in practice, mail systems almost always ignore case in the username as well.)

Because of the innumerable syntactical variations which could arise as a result of these rules, we’ll need to devise a regular expression capable of accounting for all possibilities. Furthermore, because addresses are case-insensitive, the regular expression should ignore character casing. I’ll provide the regular expression in its entirety here, and then offer a thorough explanation of its components. Note that this is not intended to be an introduction to regular expressions, although I think that those at least familiar with the concept should be able to follow along. If you’re a complete beginner to the matter, a quick search should turn up numerous excellent resources.

The Validation Expression

^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$

What a mouthful! Not to worry, as I elaborate upon the purpose of each component below.

Username (further broken down into two components)

^([_a-z0-9-]+)

The caret symbol signals that the ensuing pattern must match at the very beginning of the string. The characters located within the square brackets denote the allowable characters. Finally, the plus sign signals that at least one of the characters located in the preceding bracketed range is required. So for example, this regular expression would validate “jason”, “go49ers” and “999999”, but not “.ilovespam”, “:-)hi” or “%$^abc”.

(\.[_a-z0-9-]+)*

This second component operates exactly like the first, save for two important differences. First, it only comes into play in the case that it begins with a period. Secondly, because the expression is concluded with an asterisk, it doesn’t have to appear at all! Let’s take this and the first component into account, and offer a few usage examples. The following items are all examples of valid usernames:

  • jason
  • go49ers.sunday
  • a.b
  • jason_gilmore

While the following items are all examples of invalid usernames:

  • .hello.world
  • @-@.wow
  • $$big.spender$$
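To check the lists above for yourself, here is a minimal sketch that tests only the username portion. The helper name is_valid_username() is my own invention for illustration, and I’m assuming preg_match() with the i modifier for case-insensitive matching. Note that the period is escaped so that it matches a literal dot rather than any character.

```php
<?php
// Hypothetical helper: tests a string against the username portion only.
function is_valid_username(string $username): bool
{
    // First component, then zero or more ".more" groups; \. is a literal dot
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*$/i';
    return preg_match($pattern, $username) === 1;
}

$candidates = ['jason', 'go49ers.sunday', 'a.b', 'jason_gilmore',
               '.hello.world', '@-@.wow', '$$big.spender$$'];

foreach ($candidates as $candidate) {
    echo $candidate, ': ', is_valid_username($candidate) ? 'valid' : 'invalid', "\n";
}
```

The first four candidates should be reported as valid and the last three as invalid, matching the two lists above.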

Let’s move on to the next component.

At-Symbol

@

This denotes the ubiquitous “at” symbol. Because it is not followed by any frequency indicators (“*” or “+” for example), one and only one instance of this character is required.

Domain + TLD

[a-z0-9-]+(\.[a-z0-9-]+)*

The domain component of the regular expression very closely resembles the assembled username expression, except that underscores are not allowed! Periods still serve only as separators between parts. Therefore you can apply all of the same rules and examples described above to this component, provided that you keep in mind that the underscore is taboo.
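A quick way to see exactly which characters this component accepts is to test it in isolation. The following is only a sketch; is_valid_domain_part() is a hypothetical name, and I’ve anchored the pattern at both ends so it can be tried on its own.

```php
<?php
// Hypothetical helper: checks the domain portion on its own.
function is_valid_domain_part(string $domain): bool
{
    // Letters, digits and hyphens, optionally split into parts by literal dots
    return preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)*$/i', $domain) === 1;
}

var_dump(is_valid_domain_part('example.pps.k12.oh')); // dot-separated parts
var_dump(is_valid_domain_part('my-domain'));          // hyphen
var_dump(is_valid_domain_part('my_domain'));          // underscore
var_dump(is_valid_domain_part('.example'));           // leading period
```

The first two calls should report true and the last two false.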

TLD

(\.[a-z]{2,4})$

Again, given your familiarity with the previous components, nothing here should really come as a surprise. The top-level domain must begin with a period, and can consist solely of alphabetical characters (a-z). As denoted by the curly brackets, {2,4}, located towards the conclusion of the component, this alphabetical string must consist of no fewer than two and no more than four characters. Finally, the dollar sign located at the conclusion of the component signals that the string cannot contain any additional characters following this two-to-four character alphabetical string. So for example, this regular expression would satisfy “.info”, “.us”, “.com” and “.net”, but not “com”, “.123”, or “.jason”. Keep in mind that this range also rejects legitimate longer TLDs such as “.museum”; raise the upper bound if you need to accept them.
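Here is a small sketch for trying the TLD component by itself. The function name is_valid_tld_suffix() is hypothetical, and I’ve added a leading caret (not part of the component as shown above) so that the suffix is tested alone rather than as the tail of a longer string.

```php
<?php
// Hypothetical helper: tests a suffix such as ".com" in isolation.
function is_valid_tld_suffix(string $suffix): bool
{
    // A literal dot followed by two to four letters, and nothing else
    return preg_match('/^\.[a-z]{2,4}$/i', $suffix) === 1;
}

foreach (['.info', '.us', '.com', '.net', 'com', '.123', '.jason', '.museum'] as $suffix) {
    echo $suffix, ': ', is_valid_tld_suffix($suffix) ? 'accepted' : 'rejected', "\n";
}
```

The first four suffixes should be accepted and the rest rejected. The rejection of “.museum”, a real TLD, shows the cost of the four-letter ceiling; changing {2,4} to a wider range is one possible adjustment.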

Let’s implement this regular expression, using it in conjunction with one of PHP’s great regular expression functions, eregi(), to validate email syntax.

The eregi() Function

The regular expression function we’ll use to compare the input email against the regular expression is eregi(). This function operates identically to ereg(), except that it ignores character casing. Note that eregi() belongs to PHP’s POSIX regex extension, which was deprecated in PHP 5.3 and removed in PHP 7.0; on current versions, preg_match() with the i pattern modifier provides the same case-insensitive matching. A formal introduction follows.

eregi()

bool eregi(string pattern, string str [, array regs])

The eregi() function verifies whether str satisfies the regular expression defined by pattern. In the case that the optional regs array parameter is included, any parenthesized substrings located within the pattern will be stored here.

Let’s use this function and our regular expression to validate an email address’ syntax.
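What follows is a minimal sketch of such a script rather than a definitive listing. Because eregi() is unavailable on PHP 7.0 and later, I’m assuming preg_match() with the i modifier as a stand-in; the helper name has_valid_syntax() is hypothetical.

```php
<?php
// The validation expression, wrapped in / delimiters for preg_match().
// The trailing "i" makes the match case-insensitive, as eregi() would.
$pattern = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';

// Hypothetical helper, not a built-in PHP function
function has_valid_syntax(string $email, string $pattern): bool
{
    return preg_match($pattern, $email) === 1;
}

$email = 'johnny.rocket@example.com';

if (has_valid_syntax($email, $pattern)) {
    echo "Email syntax is valid!\n";
} else {
    echo "Email syntax is invalid!\n";
}
```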

Consider executing this script several times, inserting a variety of both valid and invalid addresses.

While this mechanism does a great job of validating email syntax, it does little in the case that a user does indeed manage to offer a syntactically correct, yet nonetheless nonexistent email. For example, what if the user meant to insert the address “johnny.rocket@example.com”, but instead mistakenly entered “johnny.rocket@exmple.com”? While this address validates in terms of the syntactical requirements, the user will nonetheless not receive any future correspondence! For that reason, I’d like to introduce a secondary mechanism into our validation scheme: one that queries DNS in an attempt to discern whether the domain exists and is configured to accept mail. This mechanism is introduced in the next section.

Validating Domain Existence

Once the email’s syntax is validated, it’s time to ensure that the domain exists and is configured to accept mail. The easiest way to do so is to verify that an MX (Mail Exchange) record for that domain exists. You can do so easily using PHP’s getmxrr() function, introduced here.

getmxrr()

bool getmxrr(string hostname, array mxrr [, array weight])

The getmxrr() function will contact a DNS server in an attempt to determine whether MX (mail exchange) records for that host exist, returning TRUE if the records are found, and FALSE otherwise. If records are found, they are placed within the input parameter mxrr. If the optional weight array parameter is included, then the respective weight attributes for each record are placed there. However, because we’re only interested in determining whether MX records for the given domain exist, we can invoke getmxrr() like so:

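<?php
   $email = "johnny.rocket@example.com";
   // Separate the username from the domain. explode() is used here
   // because split() was removed in PHP 7.0.
   list($username,$domain) = explode("@",$email);
   // getmxrr() fills $mxrecords and returns TRUE if any MX records exist
   if (getmxrr($domain,$mxrecords))
      echo "Email domain exists!";
   else
      echo "Email domain does not exist!";
?>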
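If you do want the optional weight parameter, a sketch like the following shows one way to use it. Both function names are hypothetical, and I’m assuming the usual MX convention that a lower weight means a more preferred server. The ordering logic is kept separate from the DNS call so that it can be tried with sample data.

```php
<?php
// Hypothetical helper: orders MX hosts so the lowest weight comes first.
function order_mx_hosts(array $hosts, array $weights): array
{
    $byHost = array_combine($hosts, $weights); // host name => weight
    asort($byHost);                            // sort by weight, keep keys
    return array_keys($byHost);
}

// Hypothetical wrapper: returns the ordered hosts, or an empty array
// when the domain publishes no MX records.
function preferred_mx_hosts(string $domain): array
{
    $hosts = [];
    $weights = [];
    if (!getmxrr($domain, $hosts, $weights)) {
        return [];
    }
    return order_mx_hosts($hosts, $weights);
}

// Sample data standing in for what getmxrr() might return
print_r(order_mx_hosts(['backup.example.com', 'mail.example.com'], [20, 10]));
```

With the sample data, mail.example.com is listed ahead of backup.example.com. Calling preferred_mx_hosts() performs a live DNS lookup, so its result depends on the domain and on your network.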

Putting it All Together

Let’s assemble both the syntax and existence validation logic. To encourage reuse, the logic is incorporated into a function named validate_email(), which accepts as input a single parameter, $email. The function, along with a usage example, is presented here:

<?php

function validate_email($email)
{

   // Create the syntactical validation regular expression. Periods are
   // escaped (\.) so they match a literal dot, and the trailing "i"
   // modifier makes the match case-insensitive.
   $regexp = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';

   // Presume that the email is invalid
   $valid = 0;

   // Validate the syntax. preg_match() with the "i" modifier is used in
   // place of eregi(), which was removed in PHP 7.0.
   if (preg_match($regexp, $email))
   {
      // explode() is used in place of split(), also removed in PHP 7.0
      list($username,$domaintld) = explode("@",$email);
      // Validate the domain
      if (getmxrr($domaintld,$mxrecords))
         $valid = 1;
   }

   return $valid;

}

$email = "johnny-rocket@example.com";

if (validate_email($email))
   echo "Email is valid!";
else
   echo "Email is invalid!";

?>
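One possible refinement, sketched here under a few assumptions, is to report which of the two checks failed so that the user can be told whether to fix a typo or try another address. The function explain_email() and its injectable $mxCheck parameter are hypothetical additions of mine; the parameter exists only so the logic can be exercised without a live DNS lookup, and it defaults to a wrapper around getmxrr().

```php
<?php
// Hypothetical variant of validate_email() that reports which check failed.
function explain_email(string $email, ?callable $mxCheck = null): string
{
    // Default to a real MX lookup when no checker is supplied
    $mxCheck = $mxCheck ?? function (string $domain): bool {
        $records = [];
        return getmxrr($domain, $records);
    };

    $pattern = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
    if (preg_match($pattern, $email) !== 1) {
        return 'bad syntax';
    }

    // The pattern allows exactly one "@", so explode() yields two parts;
    // only the second (the domain) is needed here.
    [, $domain] = explode('@', $email);

    return $mxCheck($domain) ? 'ok' : 'no mail server for domain';
}

// Stubbed lookup that pretends only example.com publishes MX records
$fakeMx = function (string $domain): bool {
    return $domain === 'example.com';
};

foreach (['johnny.rocket@example.com', 'johnny.rocket@exmple.com', 'blah'] as $candidate) {
    echo $candidate, ': ', explain_email($candidate, $fakeMx), "\n";
}
```

With the stubbed lookup, the three addresses are reported as ok, no mail server for domain, and bad syntax respectively. Omit the second argument to query DNS for real.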

Conclusion

Given the mission-critical importance of communicating order confirmations, download instructions, newsletters and the like via email, taking the steps necessary to validate such user input will be well worth your effort. As you’ve learned in this tutorial, the ease with which such a feature can be implemented leaves you with no real excuse for not doing so! If you wind up adding email validation to your application as a result of this tutorial, I’d love to hear about it! Please email me at jason@wjgilmore.com with all the details.

About the Author

W. Jason Gilmore (http://www.wjgilmore.com/) is an Internet application developer for the Fisher College of Business. He’s the author of the upcoming book, PHP 5 and MySQL: Novice to Pro, due out by Apress in 2004. His work has been featured within many of the computing industry’s leading publications, including Linux Magazine, O’Reillynet, Devshed, Zend.com, and Webreview. Jason is also the author of A Programmer’s Introduction to PHP 4.0 (453pp., Apress). Along with colleague Jon Shoberg, he’s co-author of “Out in the Open,” a monthly column published within Linux magazine.
