
Managed Extensions: Parsing CSV Files with Regular Expressions

Welcome to this week’s installment of .NET Tips & Techniques! Each week, award-winning Architect and Lead Programmer Tom Archer demonstrates how to perform a practical .NET programming task using either C# or Managed C++ Extensions.

In my latest book, Extending MFC Applications with the .NET Framework, I devote an entire chapter to using the .NET Regular Expression classes. In that chapter, I even included a regular expression that can parse text for essentially any email address format. Since the book’s publication, many readers have requested my help with their regular expressions for parsing various types of data. Some of the most popular requests I receive have to do with reading comma-delimited text files (sometimes referred to as “CSV files”) and handling scenarios where the data contains quotes, commas, and blanks. Therefore, in this week’s installment of the .NET Tips & Techniques series, I present a very simple means of handling these cases.

Returning Comma-delimited Data in an Array

In the name of reusability, I've placed the text-parsing code into a class called Csv and provided a static method (LineToArray) that takes a line of comma-delimited text and returns an array of String objects, where each string represents one field (value) from that line. That way, the Csv class's client need only call this method and then use a for loop to enumerate the array. Here is that class/method:

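#using <mscorlib.dll>
#using <System.dll>
using namespace System;
using namespace System::Text::RegularExpressions;

...

__gc class Csv
{
public:
  // Splits one line of comma-delimited text into its fields. A comma
  // counts as a delimiter only if the text after it contains an even
  // number of double quotes, i.e. the comma is not inside a quoted field.
  static String* LineToArray(String* line) __gc[]
  {
    String* pattern = S",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
    Regex* r = new Regex(pattern);

    // Split removes only the delimiting commas, so quoted fields
    // are returned with their enclosing quotes intact.
    return r->Split(line);
  }
};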

Using the StreamReader and Csv Classes

At this point, the client can focus on opening and reading the text file, calling the Csv::LineToArray method (for each line of text read), and iterating through the returned array of String objects. Reading a text file can be accomplished in several ways. I typically use the StreamReader class because my language of choice is Visual C++/MFC and this class closely mimics the interface of the MFC CStdioFile class.

The two main StreamReader methods used for reading are ReadToEnd and ReadLine. ReadToEnd reads everything from the current position to the end of the stream into a single String object, while ReadLine reads one line of text at a time (a line ends with a line feed, a carriage return, or a carriage-return/line-feed pair) and returns it without the line terminator. When reading a text file where each record will be treated independently, you'll most likely use the ReadLine method.

The following code snippet simply opens and reads each line of text from a file (c:\data.txt):

using namespace System;
using namespace System::IO;

...

StreamReader* reader = NULL;

try
{
  // load data from text (csv) file
  reader = new StreamReader(S"c:\\data.txt");
  String* data;
  String* dataArray[];
  int currRec = 0;

  // Peek returns -1 once there is nothing left to read
  while (-1 != reader->Peek())
  {
    // get a single line of text
    data = reader->ReadLine();

    // call routine to place delimited 
    // text into an array
    dataArray = Csv::LineToArray(data);

    // print the array of text items
    Console::WriteLine(S"Record {0} : ", __box(currRec++));
    for (int i = 0; i < dataArray->Length; i++)
      Console::WriteLine(S"[{0}] = [{1}]", __box(i), dataArray[i]);
    Console::WriteLine();
  }
}
catch(Exception* e)
{
  Console::WriteLine(e->Message);
}
__finally
{
  // close the file whether or not an exception was thrown
  if (NULL != reader) reader->Close();
}

Note the use of the StreamReader::Peek method, which returns the next character to be read without consuming it, so the reader's position doesn't change. A return value of -1 indicates that there is no more data to be read. For each line of text read, the code then calls the Csv::LineToArray method and displays the returned string array's contents.

The following figure shows the output of this article's demo when run against the included sample text file, which exercises the scenarios mentioned at the outset (quotes, commas, and blanks within the data).

Download the Demo

To download the accompanying demo for this article, click here.

About the Author

The founder of the Archer Consulting Group (ACG), Tom Archer has been the project lead on three award-winning applications and is a best-selling author of 10 programming books as well as countless magazine and online articles.
