Building a Regular Expression Stream Search with the .NET Framework

  • January 4, 2008
  • By Jeffrey Juday

As the methods above show, a byte stream reads and writes byte data. Converting bytes to a string is the job of the encoding classes in the System.Text namespace. Our solution uses the ASCIIEncoding class. The following example illustrates how you can convert bytes to a string with it.

// data is the byte array read from the Stream
ASCIIEncoding encoder = new ASCIIEncoding();
copyValue = encoder.GetString(data);
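
To see the conversion end to end, here is a minimal sketch that reads bytes from a Stream and decodes them as ASCII. The MemoryStream, the sample text, and the ReadAscii helper name are assumptions made for illustration and are not part of the article's solution; any readable Stream should behave the same way.

```csharp
using System;
using System.IO;
using System.Text;

public static class AsciiDecodeSketch
{
    // Hypothetical helper: reads up to 'count' bytes and decodes them as ASCII.
    public static string ReadAscii(Stream stream, int count)
    {
        byte[] data = new byte[count];
        int bytesRead = stream.Read(data, 0, count);

        // Decode only the bytes actually read; Read may return fewer
        // bytes than requested.
        return new ASCIIEncoding().GetString(data, 0, bytesRead);
    }

    public static void Main()
    {
        // Sample input (assumed): ASCII text held in memory.
        using (Stream stream = new MemoryStream(Encoding.ASCII.GetBytes("abcdef")))
        {
            Console.WriteLine(ReadAscii(stream, 4));
        }
    }
}
```

Passing the count returned by Read to GetString keeps unused, zero-filled bytes at the end of the array out of the resulting string.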

At this point, we have all the tools we need. One issue remains, however, before we can assemble the solution.

A Buffered Solution for Easing Ingestion

Streams can be large. Although we could have loaded an entire Stream into a string, we wanted to avoid the memory overhead of holding the whole Stream at once. That raises one last question: How do you load only portions of the Stream when you need to search the entire Stream for a pattern?

We chose the following approach to address the issue.

  • Create a buffer large enough to store the entire string pattern.
  • Add a portion of the Stream to the front of the buffer and trim from the back the same number of bytes you added to the front.

The approach works well if you know how large the target search pattern can be. This may not always be the case with all Regular Expressions, but because we were looking for simple patterns, it was a safe assumption.

We also needed to keep the number of characters we trim and add from growing too large. If the trim and add value is too large relative to the size of the buffer, consecutive buffers overlap too little, and a pattern that straddles two buffers is missed. In the example string below, a buffer of 7 with a trim and add value of 6 would miss the string pattern "zabcdef" embedded in the middle of the string.

Abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzhskfhsljdsflashjdsdfkllsdfjasdfnnn

An algorithm using the values above would split the target pattern across two consecutive buffers once it reached that part of the Stream, so neither buffer would ever contain the whole pattern.
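
The effect of the two values can be checked with a small sketch. It works on an in-memory string rather than a Stream, and the FoundInSomeWindow helper is a hypothetical name introduced here for illustration. The text is the first 52 characters of the example string above. As a rule of thumb, a literal pattern of length m is always caught when the buffer size minus the trim and add value is at least m - 1.

```csharp
using System;

public static class WindowStepSketch
{
    // Returns true if any window of 'bufferSize' characters, taken every
    // 'step' characters, contains 'pattern' in full.
    public static bool FoundInSomeWindow(string text, string pattern,
        int bufferSize, int step)
    {
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(bufferSize, text.Length - start);
            string window = text.Substring(start, length);

            if (window.Contains(pattern))
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        string text = "Abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";

        // Buffer 7, step 6: consecutive windows share only 1 character.
        Console.WriteLine(FoundInSomeWindow(text, "zabcdef", 7, 6));  // False

        // Buffer 7, step 1: consecutive windows share 6 characters.
        Console.WriteLine(FoundInSomeWindow(text, "zabcdef", 7, 1));  // True
    }
}
```

With a step of 6, the window that starts at index 24 holds "yzabcde" and the next one starts at index 30, so the pattern beginning at index 25 never fits inside a single window.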

Now, it's time to look at our complete solution embodied in a single class called StreamSearchExpression.

StreamSearchExpression

Earlier, you learned about the relationship between the buffer, trim/add, and the patterns you are matching. Rather than computing these values dynamically, the class requires its users to supply them through the constructor. The class constructor appears below.

public StreamSearchExpression(Stream stream, string[] patterns,
   int bufferSize, int trailLeadAdd)
{
   _stream = stream;
   _patterns = patterns;
   _bufferSize = bufferSize;
   _trailLeadAdd = trailLeadAdd;
}

The Check method on StreamSearchExpression initiates the searching process. The Check method appears below.

public bool Check(out string patternMatched, out long positionEnd)
{
   StringBuilder builder = new StringBuilder();

   patternMatched = "";
   positionEnd = -1;

   // An empty Stream cannot contain a match.
   if (_stream.Length == 0)
   {
      return false;
   }

   // Load the initial buffer and test it; the match may be
   // right at the beginning.
   InitBuffer(builder);

   bool patternPresent = IsMatchInBuffer(builder,
      out patternMatched, out positionEnd);

   // Advance the buffer until a pattern matches or the end of
   // the Stream is reached.
   while (!patternPresent && _stream.Position != _stream.Length)
   {
      MoveBuffer(builder, _trailLeadAdd);

      patternPresent = IsMatchInBuffer(builder,
         out patternMatched, out positionEnd);
   }

   return patternPresent;
}

As you can see, Check loops through the Stream, advancing the buffer until one of the patterns in the array is found or the end of the Stream is reached. Although a single Regular Expression can be written to work like an array of patterns (for example, by joining alternatives with the | operator), we opted for the array mostly to avoid writing a more complicated Regular Expression.
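
Because the helper methods that Check calls are not listed in this section, the following self-contained sketch shows one way the same loop could be assembled. It is not the article's implementation: the Search and Append names, the choice to drop characters from the start of the buffer and append new ones at the end, and the sample data are all assumptions. It also assumes a seekable Stream, because it relies on Length and Position.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class BufferedSearchSketch
{
    // Appends up to 'count' bytes from the stream to the builder as ASCII text.
    private static void Append(Stream stream, StringBuilder builder, int count)
    {
        byte[] data = new byte[count];
        int bytesRead = stream.Read(data, 0, count);
        builder.Append(Encoding.ASCII.GetString(data, 0, bytesRead));
    }

    // Slides a buffer of 'bufferSize' characters over the stream, 'step'
    // characters at a time, until one of the patterns matches.
    public static bool Search(Stream stream, string[] patterns,
        int bufferSize, int step, out string patternMatched)
    {
        patternMatched = "";
        StringBuilder builder = new StringBuilder();
        Append(stream, builder, bufferSize);

        while (builder.Length > 0)
        {
            string window = builder.ToString();
            foreach (string pattern in patterns)
            {
                if (Regex.IsMatch(window, pattern))
                {
                    patternMatched = pattern;
                    return true;
                }
            }

            if (stream.Position == stream.Length)
            {
                return false;   // nothing left to read
            }

            // Drop the oldest characters and read the same number of new ones.
            builder.Remove(0, Math.Min(step, builder.Length));
            Append(stream, builder, step);
        }
        return false;
    }

    public static void Main()
    {
        byte[] bytes = Encoding.ASCII.GetBytes("abcdefghijklmnopqrstuvwxyzabcdef");
        using (Stream stream = new MemoryStream(bytes))
        {
            string matched;
            bool found = Search(stream, new[] { "xyz\\d", "zabc" }, 8, 2, out matched);
            Console.WriteLine(found + " " + matched);
        }
    }
}
```

In this run the buffer holds 8 characters and moves 2 at a time, so consecutive buffers share 6 characters, which is enough overlap to catch the 4-character match "zabc".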




