http://www.developer.com/


Using Memory-Mapped Files in .NET 4.0


July 14, 2009

Introduction

Assume you have the need to manipulate multi-gigabyte files and read and write data to them. One option would be to access the file using a sequential stream, which is fine if you need to access the file from the beginning to the end. However, things get more problematic when you need random access. Seeking the stream is naturally a solution, but unfortunately a slow one.

If you have a background in Windows API development, then you might be aware of an old technique called memory-mapped files (sometimes abbreviated MMF). The idea of memory-mapped files or file mapping is to load a file into memory so that it appears as a contiguous block in your application's address space. Then, reading and writing to the file is simply a matter of accessing the correct memory location. In fact, when the operating system loader fetches your application's EXE or DLL files to execute their code, file mapping is used behind the scenes.

Using memory-mapped files from .NET applications is not new in itself, as the underlying operating system APIs have been reachable through Platform Invoke (P/Invoke) ever since .NET 1.0. However, in .NET 4.0, memory-mapped files become available to all managed code developers without calling the Windows APIs directly.

Memory-mapped files and large files are often associated together in the minds of developers, but there's no practical limit to how large or small the files accessed through memory mapping can be. Although using memory mapping for large files makes programming easier, you might observe even better performance when using smaller files, as they can fit entirely in the file system cache.

The information and the code listings in this article are based on the .NET 4.0 Beta 1 release, available since May 2009. As is the case with pre-release software, technical details, class names and available methods might change once the final RTM version of .NET becomes available. This is worth keeping in mind while studying or developing against any beta library.

The New Namespace and its Classes

For .NET 4.0 developers, the interesting classes that work with memory-mapped files live in the new System.IO.MemoryMappedFiles namespace. Presently, this namespace contains four classes and several enumerations to help you access and secure your file mappings. The actual implementation is inside the assembly System.Core.dll.

The most important class for the developer is the MemoryMappedFile class. This class allows you to create a memory-mapped object, from which you can in turn create a view accessor object. You can then use this accessor to manipulate directly the memory block mapped from the file. Manipulation can be done using the convenient Read and Write methods.

Note that since direct pointers are not considered a sound programming practice in the managed world, such an access object is needed to keep things tidy. In traditional Windows API development in native code, you would simply get a pointer to the beginning of your memory block.

That said, to acquire a memory-mapped file and the necessary accessor object, you need to follow three simple steps. First, you need a file stream object that points to an existing file on disk. Second, you create the mapping object from this file, and as a final step, you create the accessor object. Here is a code example in C#:

  FileStream file = new FileStream(
    @"C:\Temp\MyFile.dat", FileMode.Open);
  MemoryMappedFile mmf =
    MemoryMappedFile.CreateFromFile(file);
  MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();

The code first opens a file with the System.IO.FileStream class, and then passes the stream object instance to the static CreateFromFile method of the MemoryMappedFile class. The third step is to call the CreateViewAccessor method of the MemoryMappedFile class.

In the above code, the CreateViewAccessor method is called without any parameters. In this case, the mapping begins from the start of the file (offset zero) and ends at the last byte of the file. You can, however, easily map in any portion of the file. For instance, if your file were one gigabyte in size, then you could map, say, a view at the one megabyte mark with a view size of 10,000 bytes. This could be done with the following call:

  MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor(1024 * 1024, 10000);

Later on, you will see more advanced uses for these mapped views. But first, you need to learn about reading from the view.

Reading from the Mapped File

To use a previously mapped memory location, you need to use the methods of the MemoryMappedViewAccessor class. For instance, to read ten bytes starting from the beginning of the file map, you could use the ReadByte method as follows:

  ...
  MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
  byte[] buffer = new byte[10];
  for (int index = 0; index < buffer.Length; index++)
  {
    buffer[index] = accessor.ReadByte(index);
  }

Besides type-specific methods such as ReadByte, the accessor offers the generic Read<T> and ReadArray<T> methods, which fill a structure or an array directly from the view. For instance, assume the view begins with the 16 bytes of a Guid (defined as a structure); the two approaches below would then have similar results:

  // method 1:
  byte[] buffer = new byte[16];
  accessor.ReadArray(0, buffer, 0, buffer.Length);
  Guid guid = new Guid(buffer);
  MessageBox.Show(guid.ToString());
  
  // method 2:
  Guid guid2 = new Guid();
  accessor.Read(0, out guid2);
  MessageBox.Show(guid2.ToString());

Note that in both Read method calls, you have to specify the location from which the reading is to begin. This zero-based offset is always relative to the mapped view, but not necessarily the original file. When you create the view, you specify a window of memory through which you want to manipulate the file (Figure 1). If you don't specify any offset, as in the code listings above, then the view is assumed to start from the beginning of the file.
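The generic Read call in method 2 is not tied to Guid. As a minimal sketch, assuming a hypothetical SampleRecord structure made up only of value-type fields (both SampleRecord and the StructAccessSketch helper are illustrative names, not part of the framework), the accessor's generic Write and Read methods could round-trip a whole record at a given view offset:

```csharp
using System.IO.MemoryMappedFiles;

// Hypothetical record type for illustration; it holds only value-type fields,
// which the generic accessor methods require.
public struct SampleRecord
{
    public int Id;
    public long Ticks;
}

public static class StructAccessSketch
{
    // Writes the record at the given view offset, then reads it back
    // from the same offset and returns the copy.
    public static SampleRecord RoundTrip(
        MemoryMappedViewAccessor accessor, long position, SampleRecord record)
    {
        accessor.Write(position, ref record);

        SampleRecord result;
        accessor.Read(position, out result);
        return result;
    }
}
```

For instance, RoundTrip(accessor, 8, record) would store the record eight bytes into the view and hand back a copy read from the same place.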




Figure 1. View offsets are always relative to the mapped view.

To provide flexibility, you can start from the offset of zero and run until the length of the file, or you can start from the middle and map only a portion of the file. Reading through the accessor object is done by offsets relative to the view. That is, the original file offset is the view's starting offset plus the offset within the view.
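The offset arithmetic can be sketched in code. The example below assumes the path-based CreateFromFile overload found in the released .NET 4.0 API instead of the Beta 1 FileStream overload used earlier; ViewOffsetSketch and its parameter names are made up for illustration.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class ViewOffsetSketch
{
    // Reads the byte at an absolute file offset through a view that starts
    // at viewStart and spans viewSize bytes. The file offset must fall
    // inside that window.
    public static byte ReadAtFileOffset(
        string path, long viewStart, long viewSize, long fileOffset)
    {
        // file offset = view start + offset in view, so:
        long offsetInView = fileOffset - viewStart;

        using (MemoryMappedFile mmf =
            MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor accessor =
            mmf.CreateViewAccessor(viewStart, viewSize))
        {
            return accessor.ReadByte(offsetInView);
        }
    }
}
```

With a view that starts at file offset 100, reading file offset 120 means asking the accessor for offset 20.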

Remember also that memory mapping objects and the files underneath hold operating system handles. Thus, it is important to dispose of the objects after you are done with them; otherwise they will remain open for an indefinite amount of time until garbage collection kicks in. A good practice is to use try-finally blocks or the C# using statement.
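As a minimal sketch of the using approach, the hypothetical helper below (DisposalSketch is not a framework class) releases both the accessor and the mapping as soon as the read completes, even if an exception is thrown. It assumes the path-based CreateFromFile overload of the released .NET 4.0 API and a file that exists and is not empty.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class DisposalSketch
{
    // Maps the whole file, returns its first byte, and disposes of the
    // accessor and the mapping (in that order) when the block exits.
    public static byte ReadFirstByte(string path)
    {
        using (MemoryMappedFile mmf =
            MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
        {
            return accessor.ReadByte(0);
        }
    }
}
```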

If you are happy working with .NET stream objects but would still like to benefit from memory-mapped files, then you are in luck. The MemoryMappedFile class contains a handy method called CreateViewStream, which returns a MemoryMappedViewStream object. This object exposes the mapped view through the familiar Stream interface, which is geared toward reading and writing sequentially from a current position; this is probably the biggest disadvantage of mapped view streams compared to the accessor objects, which take an offset on every call. But if you are happy with this limitation, then the CreateViewStream method is your friend.
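The stream route could look like the sketch below, which reads an ASCII text file through a view stream. ViewStreamSketch is a hypothetical helper, and the path-based CreateFromFile overload is assumed from the released .NET 4.0 API. The view size is passed explicitly because a view created with the default size can end up rounded to a memory page boundary, which would append zero bytes to the text.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class ViewStreamSketch
{
    // Reads the whole (non-empty) file as ASCII text through a view stream.
    public static string ReadAllAsciiText(string path)
    {
        // Use the real file length as the view size.
        long length = new FileInfo(path).Length;

        using (MemoryMappedFile mmf =
            MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewStream stream = mmf.CreateViewStream(0, length))
        using (StreamReader reader = new StreamReader(stream, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }
}
```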

Sharing Memory Between Objects and Processes

Previously, you saw how you can use memory-mapped files to ease access to a file's contents using simple memory operations. The next step is to learn how to use this knowledge to share memory inside your application and also between processes.

When mapping a file's contents into memory, you need to specify, among other things, which file on disk you want and which portion of it you are mapping. This is easy, but what's not as obvious is that you can map the same file multiple times, even if the mapped regions are the same or they overlap (Figure 2).


Figure 2. Portions of a file can be mapped multiple times.

By utilizing this knowledge, it is possible to let multiple threads access the contents of the file without each thread having to manage its own stream position or file-level locks; threads that write to the same bytes still need to coordinate among themselves. It is simply enough to know how to read and write from the memory block. And with the view accessor class, you already know how to do that. Here is an example of how you can map the beginning of a file into more than one accessor object. Naturally, each new view accessor must refer to the same file. This is easy to accomplish if you use the same memory mapping object twice:

  ...
  MemoryMappedViewAccessor accessor1 =
    mmf.CreateViewAccessor();
  MemoryMappedViewAccessor accessor2 =
    mmf.CreateViewAccessor();
      
  // write
  byte writeChr = Encoding.ASCII.GetBytes("Z")[0];
  accessor1.Write(0, writeChr);
  
  // read
  byte readChr = accessor2.ReadByte(0);
  string status = (readChr == writeChr) ? "Match!" : "No match!";
  MessageBox.Show(status); // match

Note that once the writing to the view's memory block has completed, the contents of the file have changed. The operating system might not immediately flush the changed data to disk, but usually this is near-instant. No separate commit or flush operation is needed; this is one of the beauties of memory-mapped files.
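The two views do not have to start at the same place for this to work. The sketch below, built around a hypothetical OverlapSketch helper and the path-based CreateFromFile overload of the released .NET 4.0 API, writes through one view and reads the same file byte back through a second, overlapping view that starts ten bytes later. It assumes a file of at least 100 bytes.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class OverlapSketch
{
    // Writes one byte at file offset 15 through a view starting at offset 0,
    // then reads file offset 15 through a view starting at offset 10.
    public static byte WriteThenReadThroughOverlap(string path, byte value)
    {
        using (MemoryMappedFile mmf =
            MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor first = mmf.CreateViewAccessor(0, 100))
        using (MemoryMappedViewAccessor second = mmf.CreateViewAccessor(10, 20))
        {
            first.Write(15, value);      // view offset 15 = file offset 15
            return second.ReadByte(5);   // view offset 5 = file offset 10 + 5
        }
    }
}
```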

To share the mapped file between processes, you must give your view a name. The name allows you to open a synchronized view in more than one process, and it goes without saying that the name must be unique among object names in the system. Assume that you want to send a string from one process into another. Here is the code to open a named memory-mapped view, and to write a simple string to the view:

  MemoryMappedFile mmf = MemoryMappedFile.CreateOrOpen(
    "my-mmf-map-name", 1000);
  MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
  string message = "Hello, Memory-Mapped World!";
  byte[] asciiBytes = Encoding.ASCII.GetBytes(message);
  accessor.WriteArray(0, asciiBytes, 0, asciiBytes.Length);
  MessageBox.Show("Message written.");

Note how in the above code there is no physical file to contain the data. Because of this, you need to specify a capacity parameter when calling the CreateOrOpen method. In the above code, this is set to 1,000 bytes. The capacity defines the size of the memory block. But more on this shortly. Returning to the example of sharing information between processes, the next step would be to use the similarly-named view in another process to read the string back:

  MemoryMappedFile mmf = MemoryMappedFile.CreateOrOpen(
    "my-mmf-map-name", 1000);
  MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
  byte byteValue;
  int index = 0;
  StringBuilder message = new StringBuilder();
  do
  {
    byteValue = accessor.ReadByte(index);
    if (byteValue != 0)
    {
      char asciiChar = (char)byteValue;
      message.Append(asciiChar);
    }
    index++;
  } while (byteValue != 0);
  MessageBox.Show("Found text: \""+message+"\".");

In the above code, the second process opens the same memory-mapped view by using the CreateOrOpen static method of the MemoryMappedFile class. Then, the accessor object is created just the same as before, and the data is read byte-by-byte until a zero terminator byte is found. Then, the message is processed, which in this case means showing it on the screen. A very easy way to do inter-process communication (IPC)!
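The zero-terminator scan relies on the memory block being zero-filled after the message. An alternative, shown here only as a sketch, is to store the message length in front of the text. The layout (a 4-byte length at offset 0 followed by the ASCII bytes) and the MessageSketch helper are assumptions made for illustration, not something the framework prescribes, and the two processes would still need their own way to signal that a message is ready.

```csharp
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MessageSketch
{
    // Assumed layout: 4-byte length at view offset 0, ASCII bytes from offset 4.
    public static void WriteMessage(
        MemoryMappedViewAccessor accessor, string message)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(message);
        accessor.Write(0, bytes.Length);                  // length prefix
        accessor.WriteArray(4, bytes, 0, bytes.Length);   // message body
    }

    // Reads the length prefix, then exactly that many bytes of text.
    public static string ReadMessage(MemoryMappedViewAccessor accessor)
    {
        int length = accessor.ReadInt32(0);
        byte[] bytes = new byte[length];
        accessor.ReadArray(4, bytes, 0, length);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

In the scenario above, the writing process would pass its accessor to WriteMessage and the reading process would pass its own accessor, obtained from the same named mapping, to ReadMessage.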

Creating, Expanding and Truncating files

So far, you have learned how to access memory-mapped files that already exist on disk or that are created on the fly for inter-process communication. What if you wanted to create a file from scratch, or expand or truncate a file mapped into memory? Luckily, all three scenarios are straightforward to implement.

First, if you wanted to create a new file to create a memory-mapped view on, you could execute the following code:

  FileStream file = new FileStream(
      @"C:\Temp\MyNewFile.dat", FileMode.CreateNew);
  MemoryMappedFile mmf =
      MemoryMappedFile.CreateFromFile(file, null, 1000);
  MemoryMappedViewAccessor accessor =
      mmf.CreateViewAccessor();

Here, a new file is created by specifying the CreateNew mode in the call to the FileStream's constructor. This will create a new, zero-length file on disk. Such empty files cannot be directly used to create views, and so the CreateFromFile method call must contain a capacity parameter. In the above example, the file will have a capacity of 1,000 bytes, and if nothing else is written to the file, then the file will contain values of zero, i.e. null characters.
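For comparison, here is a sketch of the same idea using the path-based CreateFromFile overload of the released .NET 4.0 API, which takes the file mode and capacity directly. CreateSketch is a hypothetical helper; it writes one byte so that the zero-filled remainder is easy to see, and it reports the resulting length on disk.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class CreateSketch
{
    // Creates a new file with the given capacity, sets its first byte to 1,
    // and returns the file length on disk. Throws if the file already exists.
    public static long CreateMappedFile(string path, long capacity)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.CreateNew, null, capacity))
        using (MemoryMappedViewAccessor accessor =
            mmf.CreateViewAccessor(0, capacity))
        {
            accessor.Write(0, (byte)1); // every other byte stays zero
        }
        return new FileInfo(path).Length;
    }
}
```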

Given the above situation of a file with length of 1,000 bytes, how would you continue writing past that limit? If you map a view and use the accessor object's Write method to try to write past the capacity (size) of the file, the operation will silently fail (it is possible that future .NET 4.0 releases will act differently). As such, you cannot simply expand a file by writing past the end of the file, as you could for instance with streams.

How then would you expand a file? The answer lies again in the capacity parameter of the CreateFromFile method call in the MemoryMappedFile class. If you specify a larger capacity than the actual file on disk, then Windows will extend the file to match the capacity given. Naturally, this can only succeed if there is enough free disk space, so capacity increases will not (always) work even if you had enough memory.

The following code listing shows how to expand the previously described 1,000 byte file to 2,000 bytes:

  FileStream file = new FileStream(
      @"C:\Temp\MyNewFile.dat", FileMode.Open);
  MemoryMappedFile mmf =
      MemoryMappedFile.CreateFromFile(file, null, 2000);
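A self-contained variant of the expansion step might look like the sketch below. It again assumes the path-based CreateFromFile overload of the released .NET 4.0 API, and ExpandSketch is a made-up helper name. No view is needed: creating the mapping with the larger capacity is what grows the file.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class ExpandSketch
{
    // Grows an existing file to newCapacity bytes and returns the new
    // length on disk. newCapacity must not be smaller than the file.
    public static long Expand(string path, long newCapacity)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, newCapacity))
        {
            // Nothing to do here; the file was extended when the
            // mapping was created.
        }
        return new FileInfo(path).Length;
    }
}
```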

The capacity parameter is defined as a C# long, which means that it is a signed 64-bit value (System.Int64). You are thus not limited to 2-gigabyte views at a time, but can instead use much larger views. Practically speaking, the only limit is the free virtual address space in your application, around 8 terabytes if you have a 64-bit Windows operating system and compile your .NET application to be a 64-bit application (the x64 platform target mode in Visual Studio). On a regular 32-bit system, the limit is usually less than 2 GB, depending on system setup and available memory.

The third common operation, truncation, is done a bit differently than the previous two scenarios: truncating must be done at the file level. If you try to specify a capacity parameter value smaller than the actual file on disk, you will get an error stating that the capacity value cannot be smaller than the file size. Thus, you must choose another approach, and one way is to use the FileStream's SetLength method.

To get the size of a file, you could read the Length property of the stream, or use the similarly named property of the System.IO.FileInfo class.
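Truncation through the stream could be sketched as follows. TruncateSketch is a hypothetical helper; the sketch assumes that all mapping and accessor objects for the file have already been disposed of, since Windows generally refuses to shrink a file that still has a mapping open.

```csharp
using System.IO;

public static class TruncateSketch
{
    // Shrinks the file to newLength bytes and returns the resulting length
    // as reported by the stream.
    public static long Truncate(string path, long newLength)
    {
        using (FileStream file = new FileStream(
            path, FileMode.Open, FileAccess.ReadWrite))
        {
            file.SetLength(newLength);
            return file.Length;
        }
    }
}
```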

Conclusion

In this article, you learned about memory-mapped files and the managed support classes for them in .NET Framework 4.0. Memory mapping is a useful technique that allows an easy way to read and write to files using simple memory operations. No stream seeking is required, and you do not have to worry about files being larger than can fit into memory: simply map a portion of the file as needed, and you are done.

Memory mapping is also useful for sharing data, both between an application's threads and between processes running on the same system. To share data between processes, all you need to do is give your mapped view object a unique name. If this name matches that in the other process(es), then the data is automatically shared.

With .NET 4.0, you can use managed classes to work with memory-mapped files. Reading and writing to a mapped view is done through an accessor object, which can also take the form of a more traditional stream. Getting the accessor object itself is usually a three-step process: first open the file using a FileStream, then create a memory mapping object, and finally get the accessor through the mapping object.

The accessor object allows you to easily read and write the most primitive data types, but generic data type support allows you to get to more complex types, including arrays. Strings can be read and written on a byte-by-byte basis; you need to remember to do proper encoding to read and write correctly.

Memory mapping is a valuable technique to access data in files, small or large. With .NET 4.0, managed code developers should learn this new method available to them, and use it whenever needed. It serves as a good alternative to the more traditional methods of accessing files. Happy memory-mapping!

About the Author

Jani Järvinen is a software development trainer and consultant in Finland. He is a Microsoft C# MVP and a frequent author and has published three books about software development. He is the group leader of a Finnish software development expert group at ITpro.fi and a board member of the Finnish Visual Studio Team System User Group. His blog can be found at http://www.saunalahti.fi/janij/. You can send him mail by clicking on his name at the top of the article.
