
Validating Emails with PHP

  • December 17, 2003
  • By W. Jason Gilmore

How many times have you heard email termed "the ultimate killer application"? Indeed, the impact it has made on all of our lives in such a short period of time is immeasurable. After all, email offers a highly effective mode of communication, in terms of both QoS (quality of service) and cost. Furthermore, its asynchronous nature gives both parties the freedom to participate at whatever time best suits their schedules. Given such qualities, it isn't a surprise that email has become a de facto tool for businesses communicating with their clientele. These days, users are expected to supply a valid email address for just about everything, in exchange for the privilege of downloading software, learning more about the latest sales, and even simply commenting on a favorite blogger's most recent entry. For many users, admittedly myself included at times, the constant begging for addresses has resulted in the prompt insertion of nonsensical, whimsical addresses such as "abc@defg", "23424!@@@asdfa.com", and my personal favorite, "blah", whenever possible.

Nonetheless, there are times when the provision of a valid email address is an absolute necessity, not only for the site operator but also for the user. For example, it is often in the interests of both parties that a confirmation email be sent to a specific address whenever goods are purchased from the site. A valid email address is also vital when the user is signing up for an email-based service, such as a newsletter or stock and weather alerts. And while one would hope that the user would have enough sense to steer clear of invalid addresses such as those described above, one must still take into account the possibility of a typing error when the address is supplied. Not taking the time to properly verify user input in this regard is a detriment not only to the organization, but also to the user who has shelled out time and perhaps money for your service!

In this article, I'll show you how to use the PHP scripting language to validate email addresses not only for syntactical correctness, but also for actual existence on the destination domain! These easily implemented procedures will go a very long way towards eliminating future asdfs, blahs, and other invalid entries from your user database. As a byproduct, you'll also learn a bit more about regular expressions and about PHP's regular expression and networking functions.

Because validation is a process comprising two parts, syntax and existence, I'll divide the discussion into these two components. I'll then conclude the tutorial by assembling both processes and offering a few examples.

Validating Email Syntax

Certainly we're all quite familiar with the typical email addressing structure: username@domain.suffix. However, you might not be aware that these are also all valid addresses:

  • fred_anderson@example.pps.k12.oh.us
  • i-l-o-v-e-e_m_a_i_l@example.info
  • ----____.---.___@example.com

As a result, you need to make sure that your validation code covers all possibilities! The only way to do so is to ensure that the supplied address conforms to the Internet message format rules as set forth by RFC 2822. Because reading an RFC is about as entertaining as a root canal, I'll offer a very broad summary of the rules here:

  1. An email address must follow the pattern: <username>@<domain>.<tld>
  2. The username can consist of the letters a through z, the numbers 0 through 9, and the underscore ('_'), hyphen ('-'), and period ('.') characters. Furthermore, the username cannot begin or conclude with a period.
  3. The domain part follows the same rules as those specified for the username, except that underscores are not allowed.
  4. The tld (short for "top-level domain") can consist solely of one or more sequences of the letters a-z, each separated by a period. Furthermore, the suffix must begin with a period, and cannot conclude with a period. Finally, the suffix must be a valid Internet domain suffix as approved by the Internet Assigned Numbers Authority (IANA). Examples include ".com", ".net", ".co.uk", ".tv", and ".ca". If you're interested, IANA's Web site offers a comprehensive list of all valid TLDs.
  5. Email addresses are case-insensitive.

Because of the innumerable syntactical variations which could arise as a result of these rules, we'll need to devise a regular expression capable of accounting for all possibilities. Furthermore, because addresses are case-insensitive, the regular expression should ignore character casing. I'll provide the regular expression in its entirety here, and then offer a thorough explanation of its components. Note that this is not intended to be an introduction to regular expressions, although I think that those at least familiar with the concept should be able to follow along. If you're a complete beginner to the matter, a quick search should turn up numerous excellent resources.

The Validation Expression

^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$

What a mouthful! Not to worry, as I elaborate upon the purpose of each component below.

Username (further broken down into two components)
^([_a-z0-9-]+)

The caret symbol signals that the pattern which follows must match at the very beginning of the string. The characters located within the square brackets denote the allowable characters. Finally, the plus sign signals that at least one of the characters located in the preceding bracketed range is required. So for example, this regular expression would validate "jason", "go49ers" and "999999", but not ".ilovespam", ":-)hi" or "%$^abc".
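To make the behavior of this first component concrete, here is a minimal sketch that runs it against the examples just mentioned. It assumes a current PHP installation, where eregi() is no longer available, so it substitutes preg_match() with the "i" modifier; the variable names are purely illustrative.

```php
<?php
// First username component, wrapped in PCRE delimiters.
// Assumption: preg_match() with the "i" modifier stands in for eregi().
$firstComponent = '/^([_a-z0-9-]+)/i';

$candidates = array('jason', 'go49ers', '999999', '.ilovespam', ':-)hi', '%$^abc');

$results = array();
foreach ($candidates as $candidate) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$candidate] = (preg_match($firstComponent, $candidate) === 1);
    echo str_pad($candidate, 12), ($results[$candidate] ? 'accepted' : 'rejected'), "\n";
}
```

Keep in mind that this fragment is anchored only at the start of the string, so on its own it checks just the leading characters; the remaining components take care of the rest.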

(\.[_a-z0-9-]+)*

This second component operates exactly like the first, save for two important differences. First, each occurrence must begin with a period. Second, because the expression is concluded with an asterisk, it may appear any number of times, or not at all! Let's take this and the first component into account, and offer a few usage examples. The following items are all examples of valid usernames:

  • jason
  • go49ers.sunday
  • a.b
  • jason_gilmore

While the following items are all examples of invalid usernames:

  • .hello.world
  • @-@.wow
  • $$big.spender$$
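The two username components can be checked together with a short sketch like the one below. Two things here are assumptions rather than part of the article: preg_match() with the "i" modifier replaces eregi(), and a trailing "$" is added so the username can be tested in isolation (in the full expression the at-symbol follows instead). The helper name username_ok() is hypothetical.

```php
<?php
// Both username components joined together.
// Assumption: the trailing "$" is added only for this isolated test.
$username = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*$/i';

// Hypothetical helper: true when the candidate satisfies the pattern.
function username_ok($pattern, $candidate)
{
    return preg_match($pattern, $candidate) === 1;
}

$valid   = array('jason', 'go49ers.sunday', 'a.b', 'jason_gilmore');
$invalid = array('.hello.world', '@-@.wow', '$$big.spender$$');

foreach (array_merge($valid, $invalid) as $candidate) {
    echo str_pad($candidate, 18), (username_ok($username, $candidate) ? 'valid' : 'invalid'), "\n";
}
```

Because every repetition of the second component must start with a period and contain at least one further character, names such as "jason." and "a..b" are rejected as well.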

Let's move on to the next component.

At-Symbol
@

This denotes the ubiquitous "at" symbol. Because it is not followed by any frequency indicators ("*" or "+" for example), one and only one instance of this character is required.

Domain + TLD
([a-z0-9-]+)(\.[a-z0-9-]+)*

The domain portion of the regular expression very closely resembles the assembled username expression, except that underscores are not allowed! Periods are still permitted, but only as separators between groups of characters. Therefore you can apply all of the same rules and examples described above to this component, provided that you keep in mind that the underscore is taboo.

TLD
(\.[a-z]{2,4})$

Again, given your familiarity with the previous components, nothing here should really come as a surprise. The top-level domain must begin with a period, and can consist solely of alphabetical characters (a-z). As is denoted by the curly brackets ({2,4}) located towards the conclusion of the string, this alphabetical string must consist of no fewer than two and no more than four characters. Finally, the dollar sign located at the conclusion of the component signals that the string cannot contain any additional characters following this two-to-four character alphabetical string. So for example, this regular expression would match ".info", ".us", ".com" and ".net", but not "com", ".123", or ".jason".
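The TLD component can be exercised on its own in the same way. This is only a sketch: preg_match() with the "i" modifier is assumed as a stand-in for eregi(), and tld_ok() is a hypothetical helper name.

```php
<?php
// TLD component: a period followed by two to four letters at the end of the string.
$tld = '/(\.[a-z]{2,4})$/i';

// Hypothetical helper: true when the suffix satisfies the TLD pattern.
function tld_ok($pattern, $suffix)
{
    return preg_match($pattern, $suffix) === 1;
}

foreach (array('.info', '.us', '.com', '.net', 'com', '.123', '.jason') as $suffix) {
    echo str_pad($suffix, 8), (tld_ok($tld, $suffix) ? 'match' : 'no match'), "\n";
}
```

The {2,4} limit is exactly what rejects ".jason" here, and by the same token it would reject any genuine TLD longer than four letters, so the upper bound is worth revisiting if your users may have such addresses.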

Let's implement this regular expression, using it in conjunction with one of PHP's great regular expression functions, eregi(), to validate email syntax.

The eregi() Function

The regular expression function we'll use to compare the input email against the regular expression is eregi(). This function operates identically to ereg(), except that it ignores character casing. A formal introduction follows.

eregi()

bool eregi(string pattern, string str [, array regs])

The eregi() function verifies whether str satisfies the regular expression defined by pattern. In the case that the optional regs array parameter is included, any parenthesized substrings located within the pattern will be stored here.
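To illustrate how parenthesized substrings are captured, the sketch below uses preg_match(), whose optional third argument plays much the same role as the regs array. This substitution is an assumption made because eregi() was deprecated in PHP 5.3 and removed in PHP 7; the sample address is illustrative.

```php
<?php
// The article's full expression, wrapped in PCRE delimiters with the "i" modifier.
$pattern = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';

$matches = array();
$found = preg_match($pattern, 'jason@example.com', $matches);

if ($found === 1) {
    // Index 0 holds the entire match; higher indexes follow the parentheses
    // in the pattern from left to right.
    printf("whole match : %s\n", $matches[0]);
    printf("username    : %s\n", $matches[1]);
    printf("domain      : %s\n", $matches[3]);
    printf("tld         : %s\n", $matches[5]);
}
```

Groups 2 and 4 are the optional repeated components; for this address they match nothing and come back as empty strings.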

Let's use this function and our regular expression to validate an email address' syntax.
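What follows is a minimal sketch of such a script rather than the original listing. It assumes a current PHP version and therefore calls preg_match() with the "i" modifier in place of eregi(); the function name is_valid_email_syntax() and the sample addresses are illustrative.

```php
<?php
// Hypothetical helper: true when $email satisfies the article's expression.
// Assumption: preg_match() with the "i" modifier stands in for eregi().
function is_valid_email_syntax($email)
{
    $pattern = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
    return preg_match($pattern, $email) === 1;
}

$addresses = array(
    'fred_anderson@example.pps.k12.oh.us',
    'i-l-o-v-e-e_m_a_i_l@example.info',
    '----____.---.___@example.com',
    'Jason@Example.COM',
    'abc@defg',
    '23424!@@@asdfa.com',
    'blah',
);

foreach ($addresses as $address) {
    echo str_pad($address, 38), (is_valid_email_syntax($address) ? 'valid syntax' : 'invalid syntax'), "\n";
}
```

One caveat when feeding form input to this function: in PCRE, "$" also matches just before a final newline, so it is prudent to trim() the submitted value first.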

Consider executing this script several times, inserting a variety of both valid and invalid addresses.

While this mechanism does a great job of validating email syntax, it does little when a user manages to offer a syntactically correct yet nonexistent email address. For example, what if the user meant to insert the address "johnny.rocket@example.com", but instead mistakenly entered "johnny.rocket@exmple.com"? While this address satisfies the syntactical requirements, the user will nonetheless not receive any future correspondence! As a result, I'd like to introduce a secondary mechanism into our validation scheme: one that actually queries the domain's server in an attempt to discern whether the domain, and even the user, exists. This mechanism is introduced in the next section.




