Using Memory-Mapped Files in .NET 4.0

Introduction


Assume you have the need to manipulate multi-gigabyte
files and read and write data to them. One option would be
to access the file using a sequential stream, which is fine
if you need to access the file from the beginning to the
end. However, things get more problematic when you need
random access. Seeking the stream is naturally a solution,
but unfortunately a slow one.


If you have a background in Windows API development, then
you might be aware of an old technique called memory-mapped
files (sometimes abbreviated MMF). The idea of memory-mapped
files, or file mapping, is to map a file into memory so that
it appears as a contiguous block in your application’s
address space. Then, reading and writing to the file is
simply a matter of accessing the correct memory location. In
fact, when the operating system loader fetches your
application’s EXE or DLL files to execute their code, file
mapping is used behind the scenes.


Using memory-mapped files from .NET applications is not
new in itself, as it has been possible to use the underlying
operating system APIs using Platform Invoke (P/Invoke)
available already in .NET 1.0. However, in .NET 4.0, using
memory-mapped files becomes available for all managed code
developers without using the Windows APIs directly.


Memory-mapped files and large files are often associated
in the minds of developers, but there’s no
practical limit to how large or small the files accessed
through memory mapping can be. Although using memory mapping
for large files makes programming easier, you might observe
even better performance when using smaller files, as they
can fit entirely in the file system cache.


The information and the code listings in this article are
based on the .NET 4.0 Beta 1 release, available since May
2009. As is the case with pre-release software, technical
details, class names and available methods might change once
the final RTM version of .NET becomes available. This is
worth keeping in mind while studying or developing
against any beta library.


The New Namespace and its Classes


For .NET 4.0 developers, the interesting classes that
work with memory-mapped files live in the new
System.IO.MemoryMappedFiles namespace.
Presently, this namespace contains four classes and several
enumerations to help you access and secure your file
mappings. The actual implementation is inside the assembly
System.Core.dll.


The most important class for the developer is the
MemoryMappedFile class. This class allows you to create a
memory-mapped object, from which you can in turn create a
view accessor object. You can then use this accessor to
directly manipulate the memory block mapped from the file,
using its convenient Read and Write methods.


Note that since direct pointers are not considered a
sound programming practice in the managed world, such an
access object is needed to keep things tidy. In traditional
Windows API development in native code, you would simply get
a pointer to the beginning of your memory block.


That said, to acquire a memory-mapped file
and the necessary accessor object, you need to follow three
simple steps. First, you need a file stream object that
points to an existing file on disk. Second, you
create the mapping object from this file, and as a final
step, you create the accessor object. Here is a code example
in C#:



FileStream file = new FileStream(
    @"C:\Temp\MyFile.dat", FileMode.Open);
MemoryMappedFile mmf =
    MemoryMappedFile.CreateFromFile(file);
MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();


The code first opens a file with the
System.IO.FileStream class, and then passes the
stream object instance to the static
CreateFromFile method of the
MemoryMappedFile class. The third step is to
call the CreateViewAccessor method of the
MemoryMappedFile class.


In the above code, the CreateViewAccessor
method is called without any parameters. In this case, the
mapping begins from the start of the file (offset zero) and
ends at the last byte of the file. You can, however, just as
easily map any portion of the file. For instance, if your file
were one gigabyte in size, you could map, say, a
view at the one megabyte mark with a view size of 10,000
bytes. This could be done with the following call:



MemoryMappedViewAccessor accessor =
mmf.CreateViewAccessor(1024 * 1024, 10000);


Later on, you will see more advanced uses for these
mapped views. But first, you need to learn about reading
from the view.

Reading from the Mapped File


To use a previously mapped memory location, you need to use the methods of the MemoryMappedViewAccessor class. For instance, to read ten bytes starting from the beginning of the file map, you could use the ReadByte method as follows:



MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
byte[] buffer = new byte[10];
for (int index = 0; index < buffer.Length; index++)
{
    buffer[index] = accessor.ReadByte(index);
}


Besides ReadByte and its siblings for the other primitive types, the accessor offers generic methods: Read<T> fills in a single structure, and ReadArray<T> fills in an array. For instance, assume you want to read a value of type Guid (defined as a structure); then the two approaches below would have similar results:


  // method 1:
byte[] buffer = new byte[16];
accessor.ReadArray(0, buffer, 0, buffer.Length);
Guid guid = new Guid(buffer);
MessageBox.Show(guid.ToString());

// method 2:
Guid guid2 = new Guid();
accessor.Read(0, out guid2);
MessageBox.Show(guid2.ToString());



Note that in both method calls, you have to specify the position from which the reading is to begin. This zero-based offset is always relative to the mapped view, not necessarily to the original file. When you create the view, you specify the window of the file through which you want to manipulate it (Figure 1). If you don’t specify any offset, as in the code listings above, then the view starts from the beginning of the file.
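
To make the view-relative offset concrete, here is a minimal sketch. It assumes the path-based CreateFromFile overload found in released versions of .NET rather than the Beta 1 stream-based call shown earlier, and the class name, helper names and temporary sample file are made up for illustration:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class ViewOffsetDemo
{
    // Creates a 4,096-byte temporary file in which byte i holds the value i % 256.
    public static string CreateSampleFile()
    {
        string path = Path.GetTempFileName();
        byte[] data = new byte[4096];
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = (byte)(i % 256);
        }
        File.WriteAllBytes(path, data);
        return path;
    }

    // Maps viewSize bytes starting at viewOffset and returns the byte at view position 0.
    public static byte ReadFirstByteOfView(string path, long viewOffset, long viewSize)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(viewOffset, viewSize))
        {
            // Position 0 is the first byte of the view, that is, file offset viewOffset.
            return accessor.ReadByte(0);
        }
    }

    static void Main()
    {
        string path = CreateSampleFile();
        try
        {
            // File offset 1000 holds 1000 % 256 = 232.
            Console.WriteLine(ReadFirstByteOfView(path, 1000, 100));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Reading position 0 of a view that starts at file offset 1000 returns the byte stored at offset 1000 of the file, not the first byte of the file.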

Remember also that memory mapping objects and the files underneath hold operating system handles. Thus, it is important to dispose of the objects when you are done with them; otherwise they will remain open for an indefinite amount of time until garbage collection kicks in. A good practice is to use try-finally blocks or the C# using statement.
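
As a sketch of the disposal pattern, the following example wraps both the mapping and the accessor in using statements. It assumes the path-based CreateFromFile overload of released .NET versions; the class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class DisposeDemo
{
    // Writes one byte through a mapped view and releases every handle on exit.
    public static void WriteFirstByte(string path, byte value)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
        {
            accessor.Write(0, value);
        } // the accessor is disposed first, then the mapping, even if an exception is thrown
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[16]);

        WriteFirstByte(path, 42);

        // The handles are closed, so the file can be reopened and deleted right away.
        Console.WriteLine(File.ReadAllBytes(path)[0]);
        File.Delete(path);
    }
}
```

Stacking the two using statements keeps the nesting flat while still disposing the objects in reverse order of creation.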


If you are happy working with .NET stream objects but would still like to benefit from memory-mapped files, then you are in luck. The MemoryMappedFile class contains a handy method called CreateViewStream, which returns a MemoryMappedViewStream object. This object exposes the mapped view through the familiar stream interface, which is geared toward sequential access; this is probably the biggest disadvantage of mapped view streams compared to accessor objects, which let you read and write at any position directly. But if you are happy with this limitation, then the CreateViewStream method is your friend.
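
Here is a small sketch of the stream-based approach, reading a whole text file through a view stream with a regular StreamReader. The class and method names are illustrative, and the path-based CreateFromFile overload is assumed from released .NET versions:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class ViewStreamDemo
{
    // Reads the whole (non-empty) file as ASCII text through a mapped view stream.
    public static string ReadAllText(string path)
    {
        // Size the view explicitly. A default-sized view may be rounded up to a
        // page boundary, in which case the stream could include trailing zero bytes.
        long length = new FileInfo(path).Length;

        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (MemoryMappedViewStream stream = mmf.CreateViewStream(0, length))
        using (StreamReader reader = new StreamReader(stream, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "Hello, view stream!", Encoding.ASCII);
        Console.WriteLine(ReadAllText(path));
        File.Delete(path);
    }
}
```

Because the view stream is an ordinary Stream, it can be handed to any existing reader or writer class without further changes.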

Sharing Memory Between Objects and Processes


Previously, you saw how you can use memory-mapped files
to ease access to a file’s contents using simple memory
operations. The next step is to learn how to use this
knowledge to share memory inside your application and also
between processes.


When mapping a file’s contents into memory, you need to
specify, among other things, which file on disk you want and
which portion of it you are mapping. This is easy, but
what’s not as obvious is that you can map the same file
multiple times, even if the mapped regions are the same or
they overlap (Figure 2).



Figure 2. Portions of a file can be mapped multiple times.

By utilizing this knowledge, it is possible to let
multiple threads access the contents of the file without
having to worry about file sharing modes or file locks. Views
of the same mapping are kept coherent, so it is enough to
know how to read and write the memory block, although
threads that write to the same bytes still need to
coordinate with each other. And with the view accessor
class, you already know how to read and write. Here is an
example of how you can map the beginning of a file into more
than one accessor object. Naturally, each view accessor must
refer to the same file. This is easy to accomplish if you
use the same memory mapping object twice:




MemoryMappedViewAccessor accessor1 =
    mmf.CreateViewAccessor();
MemoryMappedViewAccessor accessor2 =
    mmf.CreateViewAccessor();

// write
byte writeChr = Encoding.ASCII.GetBytes("Z")[0];
accessor1.Write(0, writeChr);

// read
byte readChr = accessor2.ReadByte(0);
string status = (readChr == writeChr) ? "Match!" : "No match!";
MessageBox.Show(status); // match



Note that once the write to the view’s memory block has
completed, the contents of the file have changed. The
operating system might not immediately flush the changed
data to disk, but this usually happens almost instantly. No
separate commit operation is required; this is one of the
beauties of memory-mapped files.


To share the mapped file between processes, you must give
your mapping a name. The name allows more than one process
to open views onto the same memory, and it goes without saying
that the name must be unique among object names in the
system. Assume that you want to send a string from one
process to another. Here is the code to open a named
memory-mapped file and to write a simple string to its
view:



MemoryMappedFile mmf = MemoryMappedFile.CreateOrOpen(
    "my-mmf-map-name", 1000);
MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
string message = "Hello, Memory-Mapped World!";
byte[] asciiBytes = Encoding.ASCII.GetBytes(message);
accessor.WriteArray(0, asciiBytes, 0, asciiBytes.Length);
MessageBox.Show("Message written.");


Note how in the above code there is no physical file to
contain the data. Because of this, you need to specify a
capacity parameter when calling the
CreateOrOpen method. In the above code, this is
set to 1,000 bytes. The capacity defines the size of the
memory block. But more on this shortly. Returning to the
example of sharing information between processes, the next
step would be to open the identically named mapping in
another process to read the string back:



MemoryMappedFile mmf = MemoryMappedFile.CreateOrOpen(
    "my-mmf-map-name", 1000);
MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();
byte byteValue;
int index = 0;
StringBuilder message = new StringBuilder();
do
{
    byteValue = accessor.ReadByte(index);
    if (byteValue != 0)
    {
        char asciiChar = (char)byteValue;
        message.Append(asciiChar);
    }
    index++;
    // stop at the zero terminator, or at the end of the view
} while (byteValue != 0 && index < accessor.Capacity);
MessageBox.Show("Found text: \"" + message + "\".");


In the above code, the second process opens the same
memory-mapped file by using the static CreateOrOpen
method of the MemoryMappedFile
class. Then, the accessor object is created just the same as
before, and the data is read byte-by-byte until a zero
terminator byte is found. Then, the message is processed,
which in this case means showing it on the screen. A very
easy way to do inter-process communication (IPC)!
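
As an alternative to scanning for a zero terminator, the writer can store the length of the message in front of it. The following is only a sketch under stated assumptions: the layout (a 4-byte length followed by ASCII bytes) and the SharedMessage helper are invented for this example, and a real application would still need some form of synchronization between the processes, for instance a named mutex, which is left out here:

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Text;

static class SharedMessage
{
    // Assumed layout: a 4-byte length at position 0, followed by the ASCII payload.
    public static void Write(MemoryMappedViewAccessor accessor, string message)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(message);
        if (bytes.Length > accessor.Capacity - sizeof(int))
        {
            throw new ArgumentException("Message does not fit in the view.", "message");
        }
        accessor.WriteArray(sizeof(int), bytes, 0, bytes.Length);
        accessor.Write(0, bytes.Length); // the length is written after the payload
    }

    public static string Read(MemoryMappedViewAccessor accessor)
    {
        int length = accessor.ReadInt32(0);
        byte[] bytes = new byte[length];
        accessor.ReadArray(sizeof(int), bytes, 0, length);
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        // Named mappings rely on Windows named objects; here both the writer
        // and the reader run in one process only to keep the sketch short.
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateOrOpen("my-mmf-map-name", 1000))
        using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
        {
            Write(accessor, "Hello, Memory-Mapped World!");
            Console.WriteLine(Read(accessor));
        }
    }
}
```

With a length prefix, the reader fetches the whole message in a single ReadArray call, and messages may contain zero bytes.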

Creating, Expanding and Truncating Files


So far, you have learned how to access memory-mapped
files that already exist on disk, or that are created on the
fly for inter-process communication. What if you wanted to
create a file from scratch, or expand or truncate a file mapped
into memory? Luckily, all three scenarios are
straightforward to implement.


First, if you wanted to create a new file and map a
view onto it, you could execute the following
code:



FileStream file = new FileStream(
    @"C:\Temp\MyNewFile.dat", FileMode.CreateNew);
MemoryMappedFile mmf =
    MemoryMappedFile.CreateFromFile(file, null, 1000);
MemoryMappedViewAccessor accessor =
    mmf.CreateViewAccessor();


Here, a new file is created by specifying the
CreateNew mode in the call to the FileStream’s
constructor. This will create a new, zero-length file on
disk. Such empty files cannot be directly used to create
views, and so the CreateFromFile method call
must contain a capacity parameter. In the above example, the
file will have a capacity of 1,000 bytes, and if nothing
else is written to the file, then the file will contain
values of zero, i.e. null characters.


Given the above situation of a file with length of 1,000
bytes, how would you continue writing past that limit? If
you map a view and use the accessor object’s Write method to
try to write past the capacity (size) of the file, the
operation will silently fail (it is possible that future
.NET 4.0 releases will act differently). As such, you cannot
simply expand a file by writing past the end of the file, as
you could for instance with streams.


How then would you expand a file? The answer lies again
in the capacity parameter of the CreateFromFile
method call in the MemoryMappedFile class. If
you specify a larger capacity than the actual file on disk,
then Windows will extend the file to match the capacity
given. Naturally, this can only succeed if there is enough
free disk space, so capacity increases will not (always)
work even if you had enough memory.


The following code listing shows how to expand the
previously described 1,000 byte file to 2,000 bytes:



FileStream file = new FileStream(
    @"C:\Temp\MyNewFile.dat", FileMode.Open);
MemoryMappedFile mmf =
    MemoryMappedFile.CreateFromFile(file, null, 2000);
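
The effect of a larger capacity can be checked with a short, self-contained sketch. It assumes the path-based CreateFromFile overload that takes a capacity, as found in released versions of .NET (the signature may differ from the Beta 1 call above), and the ExpandDemo class is made up for this example:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class ExpandDemo
{
    // Grows the file to newCapacity bytes by mapping it with a larger capacity,
    // then returns the resulting file size.
    public static long Expand(string path, long newCapacity)
    {
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, newCapacity))
        {
            // Creating the mapping is enough; no view is needed just to extend the file.
        }
        return new FileInfo(path).Length;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[1000]);

        Console.WriteLine(Expand(path, 2000)); // the file now holds 2,000 bytes

        File.Delete(path);
    }
}
```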


The capacity parameter is defined as a C# long, which
means that it is a signed 64-bit value (System.Int64). You
are therefore not limited to 2 gigabyte views at a time, but can
instead use much larger views. Practically speaking, the
only limit is the free virtual address space in your
application, around 8 terabytes if you have a 64-bit Windows
operating system and compile your .NET application to be a
64-bit application (the x64 platform target mode in Visual
Studio). On a regular 32-bit system, the limit is usually
less than 2 GB, depending on system setup and available
memory.


The third common operation, truncation, is done a
bit differently than the previous two scenarios: truncating
the file must be done at the file level. If you try to
specify a capacity parameter value smaller than the actual
file on disk, you will get an error stating that the
capacity value cannot be smaller than the file size. Thus,
you must choose another approach, and one way is to use the
FileStream’s SetLength method.


To get the size of a file, you could check the Length
property of the stream, or use the similarly named property
of the System.IO.FileInfo class.
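
Truncation at the file level could be sketched as follows. The TruncateDemo class is a made-up name, and the example assumes that no mapping of the file is open at the time, because Windows refuses to shrink a file while a mapped section of it exists:

```csharp
using System;
using System.IO;

static class TruncateDemo
{
    // Shrinks the file to newLength bytes and returns the stream's new length.
    public static long Truncate(string path, long newLength)
    {
        // Dispose of any memory-mapped file objects for this file before calling this.
        using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            file.SetLength(newLength);
            return file.Length;
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[2000]);

        Console.WriteLine(Truncate(path, 500));        // length as seen by the stream
        Console.WriteLine(new FileInfo(path).Length);  // length as seen by FileInfo

        File.Delete(path);
    }
}
```

After truncation, the file can be mapped again with CreateFromFile to continue working with the shorter file.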


Conclusion


In this article, you learned about memory-mapped files
and the managed support classes for them in .NET Framework
4.0. Memory mapping is a useful technique that allows an
easy way to read and write to files using simple memory
operations. No stream seeking is required, and you do not
have to worry about files being larger than can fit into
memory: simply map a portion of the file as needed, and you
are done.


Memory mapping is useful not only for sharing data between an
application’s threads, but also between processes running on
the same system. To share data between processes, all you
need to do is give your mapping object a unique name. If
this name matches that in the other process(es), then the
data is automatically shared.


With .NET 4.0, you can use managed classes to work with
memory-mapped files. Reading and writing to a mapped view is done
through an accessor object, which can also take the form of a
more traditional stream. Getting the accessor object itself
is usually a three-step process: first open the file using a
FileStream, then create a memory mapping object, and finally
get the accessor through the mapping object.


The accessor object allows you to easily read and write
the most primitive data types, but generic data type support
allows you to get to more complex types, including arrays.
Strings can be read and written on a byte-per-byte basis;
you need to remember to do proper encoding to read and write
correctly.


Memory mapping is a valuable technique to access data in
files, small or large. With .NET 4.0, managed code
developers should learn this new method available to them,
and use it whenever needed. It serves as a good alternative
to the more traditional methods of accessing files. Happy
memory-mapping!


About the Author


Jani Järvinen is a software development trainer and
consultant in Finland. He is a Microsoft C# MVP and a
frequent author and has published three books about software
development. He is the group leader of a Finnish software
development expert group at ITpro.fi and a board member of
the Finnish Visual Studio Team System User Group. His blog
can be found at http://www.saunalahti.fi/janij/.
