Regular expressions have been around for a long time. They come in very handy for text processing tasks. Some attribute the success of Perl to its superb ability to handle regular expressions. Third-party classes that support regular expressions have been available for some time, but starting with JDK 1.4, Java provides native support via the java.util.regex package.
Usage of regular expressions boils down to two components:
- Defining the regular expression. This is a pattern that describes what is to be matched.
- Applying the regular expression to a sequence of characters. Common operations include determining whether a match was found, finding successive matches, and performing replacements.
The Pattern class focuses on the first component. You use this class to define a regular expression. By representing the regular expression as an object, you can reuse the expression in your code. This is particularly useful when you need to apply the same expression to multiple strings (e.g., line-by-line processing of a text file). The following line creates a pattern based on the regular expression "[0-9]", which matches any single digit from 0 to 9. The same pattern can also be written as \d; in Java source code the backslash must be escaped, so the string literal is "\\d".
Pattern p = Pattern.compile("[0-9]");
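To illustrate the reuse point, here is a minimal sketch that compiles a pattern once and applies it to several lines. The class DigitLines and its methods are hypothetical names chosen for illustration. It assumes we want to know whether a whole line consists of digits, so it adds the + quantifier ("one or more") to the character class, and it compiles both the bracketed form and the \d shorthand to show that they behave the same on these inputs.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class DigitLines {
    // Compiled once, then reused for every line that is checked.
    private static final Pattern BRACKETED = Pattern.compile("[0-9]+");
    // Same check written with the \d shorthand; the backslash is doubled
    // because this is a Java string literal.
    private static final Pattern SHORTHAND = Pattern.compile("\\d+");

    // True if the entire line is one or more digits.
    static boolean isAllDigits(String line) {
        return BRACKETED.matcher(line).matches();
    }

    // Same test, using the shorthand pattern.
    static boolean isAllDigitsShorthand(String line) {
        return SHORTHAND.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = { "12345", "12a45", "", "007" };
        for (int i = 0; i < lines.length; i++) {
            // The same two Pattern objects are applied to each line in turn.
            System.out.println("\"" + lines[i] + "\": "
                    + isAllDigits(lines[i]) + " / " + isAllDigitsShorthand(lines[i]));
        }
    }
}
```

Because compiling a pattern does the parsing work up front, keeping the Pattern in a static final field avoids recompiling it for every line.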
The documentation for the Pattern class provides a useful summary of regular expression constructs. It also provides a comparison to Perl 5 regular expressions. The Pattern class defines several flag constants that let you control its behavior. They include CANON_EQ, CASE_INSENSITIVE, DOTALL, MULTILINE, and UNICODE_CASE. You can pass these flags to the compile() method to alter how the pattern is matched. For example, CASE_INSENSITIVE enables case-insensitive matching (by default, matching is case-sensitive). The pattern() method returns the String representation of the regular expression that was compiled. The matcher() method takes a CharSequence as input and creates a “matcher” object that applies the regular expression to that input. The static matches() method compiles a given regular expression, applies it to an input sequence, and returns a boolean indicating whether the entire input matched. The split() method returns an array of Strings obtained by splitting its input around matches of the pattern.
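The following sketch exercises the Pattern features just described: a compile() flag, pattern(), the static matches() method, and split(). The class name PatternMethodsDemo, its helper methods, and the sample inputs are hypothetical; the regular expression "\\s*,\\s*" (a comma with optional surrounding whitespace) is an assumption chosen to make split() easy to follow.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class PatternMethodsDemo {
    // With CASE_INSENSITIVE, the class [a-z] also accepts uppercase letters.
    static boolean lettersOnlyIgnoringCase(String input) {
        Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    // Static convenience method: compiles the regex and tests the whole input.
    static boolean lettersOnly(String input) {
        return Pattern.matches("[a-z]+", input);
    }

    // pattern() returns the source string the Pattern was compiled from.
    static String sourceOf(String regex) {
        return Pattern.compile(regex).pattern();
    }

    // split() breaks the input around every match of the pattern.
    static String[] splitOnCommas(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("Hello"));            // 'H' is not in [a-z]
        System.out.println(lettersOnlyIgnoringCase("Hello"));
        System.out.println(sourceOf("[0-9][0-9]"));
        System.out.println(Arrays.toString(splitOnCommas("red, green ,blue")));
    }
}
```

Note that the static matches() method recompiles its regular expression on every call, so for repeated use a compiled Pattern is the better choice.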
Once you have the pattern defined, you will use the Matcher class to apply that pattern to a character sequence (usually a String). The following two lines demonstrate this:
Matcher m = p.matcher("abcd55efg");
boolean matchFound = m.matches();
Listing 1 is a simple class that searches for a pattern consisting of zero or more lowercase letters, followed by two digits, followed by zero or more lowercase letters. The pattern is applied to the string "abcd55efg" and the result is printed. In this case, a match should be found. To experiment with various regular expressions, you can take the arguments to the compile() and matcher() methods from the command-line parameters.
Listing 1.
import java.util.regex.*;

class regexSample {
    public static void main(String args[]) {
        // Lowercase letters, then two digits, then more lowercase letters
        Pattern p = Pattern.compile("[a-z]*[0-9][0-9][a-z]*");
        Matcher m = p.matcher("abcd55efg");
        // matches() succeeds only if the entire input matches the pattern
        boolean matchFound = m.matches();
        if (matchFound)
            System.out.println("Match was found.");
        else
            System.out.println("No match.");
    }
}
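One way to follow the suggestion of taking the regular expression and input from the command line is sketched below. The class name RegexArgs and the helper wholeInputMatches() are hypothetical. The sketch also assumes you want a friendly message for a malformed expression, so it catches PatternSyntaxException, which compile() throws when the regex cannot be parsed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical command-line variant of Listing 1.
class RegexArgs {
    // True if the entire input matches the regular expression.
    static boolean wholeInputMatches(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java RegexArgs <regex> <input>");
            return;
        }
        try {
            if (wholeInputMatches(args[0], args[1]))
                System.out.println("Match was found.");
            else
                System.out.println("No match.");
        } catch (PatternSyntaxException e) {
            // Thrown by compile() for a malformed regular expression
            System.out.println("Invalid regular expression: " + e.getDescription());
        }
    }
}
```

For example, `java RegexArgs "[a-z]*[0-9][0-9][a-z]*" abcd55efg` reproduces Listing 1. Quoting the regular expression keeps the shell from interpreting characters such as `*` and `[`.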
Aside from a “match” operation, the Matcher class provides a number of other methods. Once a match is found, the start() method returns the index of the first character of the match, and the end() method returns the index just past its last character. These values are useful if successive matching is needed using the find() method. Unlike matches(), which requires the entire input to match, the find() method returns true when any portion of the input sequence matches the regular expression pattern. Listing 2 shows the same class but with the find() method. The regular expression has been changed to "[0-9][0-9]", which matches two successive digits.
Listing 2.
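import java.util.regex.*;

class regexSample2 {
    public static void main(String args[]) {
        // Two successive digits, anywhere in the input
        Pattern p = Pattern.compile("[0-9][0-9]");
        Matcher m = p.matcher("abcd55efg");
        // find() succeeds if any part of the input matches the pattern
        boolean matchFound = m.find();
        if (matchFound)
            System.out.println("Match was found.");
        else
            System.out.println("No match.");
    }
}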
We will get a match because two successive digits appear within the input (i.e., 55).
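To show how start(), end(), and repeated calls to find() work together, here is a small sketch that collects every match of the Listing 2 pattern in a longer input. The class SuccessiveFinds, the method findAll(), the "start-end:text" output format, and the sample input are hypothetical; the sketch uses Java 5+ generics, which JDK 1.4 does not have.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class SuccessiveFinds {
    // Returns "start-end:text" for each successive match of regex in input.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> results = new ArrayList<String>();
        // Each call to find() resumes searching where the previous match ended.
        while (m.find()) {
            // start(): index of the first matched character;
            // end(): index just past the last matched character.
            results.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        for (String r : findAll("[0-9][0-9]", "ab12cd345ef67")) {
            System.out.println(r);
        }
    }
}
```

For "ab12cd345ef67" this yields 2-4:12, 6-8:34, and 11-13:67. The lone 5 in 345 is not reported because the search resumes at the end of the previous match, so matches never overlap.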
The lookingAt() method attempts to match the pattern against the beginning of the input sequence. Unlike matches(), it does not require the entire input to match. If you change the input in Listing 2 to "55abcdefg", then the lookingAt() method will report a match because the input begins with two successive digits. There are also methods such as appendReplacement() and replaceAll(), which build a new string in which the portions of the input matched by the regular expression are replaced.
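The sketch below contrasts matches(), lookingAt(), and find() on the same pattern, then shows the two replacement styles. The class MatchModes, its methods, the "##" replacement text, and the "double each pair" rule are hypothetical. A fresh Matcher is created for each check so that one call's state cannot affect the next. StringBuffer is used with appendReplacement() because that overload has existed since JDK 1.4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class MatchModes {
    private static final Pattern TWO_DIGITS = Pattern.compile("[0-9][0-9]");

    // Returns "matches/lookingAt/find" results for one input.
    static String modes(String input) {
        boolean whole = TWO_DIGITS.matcher(input).matches();     // entire input
        boolean prefix = TWO_DIGITS.matcher(input).lookingAt();  // start of input
        boolean anywhere = TWO_DIGITS.matcher(input).find();     // any part
        return whole + "/" + prefix + "/" + anywhere;
    }

    // replaceAll(): every match is replaced with the same text.
    static String maskDigitPairs(String input) {
        return TWO_DIGITS.matcher(input).replaceAll("##");
    }

    // appendReplacement()/appendTail(): each replacement is computed separately.
    static String doubleDigitPairs(String input) {
        Matcher m = TWO_DIGITS.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text before this match, then the replacement
            m.appendReplacement(sb, String.valueOf(doubled));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(modes("55abcdefg"));
        System.out.println(modes("abcd55efg"));
        System.out.println(maskDigitPairs("abcd55efg"));
        System.out.println(doubleDigitPairs("a12b40c"));
    }
}
```

Neither replacement method modifies the original String, which is immutable; each returns a new string. Keep in mind that `$` and `\` are special in replacement text, so literal dollar signs or backslashes in a replacement must be escaped.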
Regular expressions are a useful programming tool. The fact that Java now natively supports them simplifies many programming tasks that used to require cumbersome code dealing with character arrays and StringTokenizer.
About the Author
Piroz Mohseni is a principal with Bita Technologies, focusing on business improvement through the effective use of technology. His areas of interest include enterprise Java, XML, and e-commerce applications.