If the tutorials in this series had been ordered on the basis of their relative importance, this and the upcoming lessons dealing with memory use would have been first. Understanding how CE uses and manages memory is the key to your ultimate success on the CE platforms. This may seem like unnecessary detail, especially if your application doesn’t do any memory allocation directly. However, even if you don’t explicitly allocate memory, it is allocated on your behalf by the loader for your program’s stack, heap, and static data storage. Thus, you still have to make informed choices to create a memory-efficient application.
What does it mean to be memory efficient on a Win CE device? It means that your code, your data, and everything else in CE land should be able to gracefully coexist in 2 MB. No kidding. If this hasn’t got your attention, take a moment and look at the file sizes of a few of your desktop Win 32 applications.
In the early days of Windows, we used to speak of the need for applications to be “well behaved”. What we meant by this was that no application should acquire and hold so many resources that other applications couldn’t execute when they got a time slice. In essence, understanding the rules and limitations of CE’s memory environment is what makes your application “well behaved” in the CE world.
There are four areas we’ll examine in this and future lessons on memory management:
- How CE memory is laid out
- How to choose memory allocation strategies for your application
- How to optimize the placement of application data
- What happens when the system is in a very low memory state
An understanding of these issues will allow you to make a smooth transition to the memory constrained environment of Windows CE.
Porting Tip: To be an effective CE programmer, you must understand CE memory architecture. The rules are not difficult to understand, but they are completely different from what you are used to on the desktop.
Something New Under The Sun
Before we begin to dissect the architecture of CE, let’s step back and take in the panoramic view. Up to this point, we have framed the discussion in terms of porting Win32 code to Windows CE. This approach is informed by the bias that Windows CE is fundamentally nothing more than an “itty-bitty” Windows. While this has been a useful fiction up to this point, it is, in fact, not at all the case.
Windows CE is an entirely new computing model, a marriage between embedded systems technology and the larger Windows world. Its defining characteristic is mobility. The CE user doesn’t “go to” a computer or a network; rather they carry these with them. In the CE world view, the network is wherever you are. Up to now, we’ve often heard the term “seamless” used to describe interoperable software applications. CE brings seamlessness to the human / computer interface.
For us as software developers, the importance of the “Windows-ness” of CE is twofold: first, CE abstracts handheld embedded computing with the stable, proven Win32 API, giving us a truly cross-platform tool set for this genre of devices; and second, we have a large existing code base that is at least nominally portable to this API. Essentially, clothing personal embedded computing in the Win32 programming model provides developers with a huge productivity advantage. However, if we view this model too literally or too credulously, we lose sight of the real potentials of CE: inherent mobility and discretionary connectivity.
We need the Win32 abstraction for productivity reasons, but we also need to know how things really work “under the hood”.
Anatomy 101
In part, the durability, and hence mobility, of CE devices derives from the fact that they have no moving parts (disk drives, for example). All storage is either in ROM (Read Only Memory) or non-volatile RAM (Random Access Memory). ROM is used to store the operating system and any bundled applications a vendor distributes as part of the device.
RAM is divided into two regions: the object store, and program memory. The object store is functionally equivalent to the file system of a desktop computer. Files, application data, and third party executable code are stored here. All files in the object store are maintained in a special compressed format. This compression process isn’t visible to you, the programmer, but it’s one reason that CE programs appear to execute more slowly than desktop programs — everything, including executable code, must be uncompressed when you access it and recompressed when you store it. Users can adjust the proportions of memory devoted to the object store and to program memory.
Porting Tip: Writeable memory is divided into a persistent storage area, the object store, and an area in which code executes, program memory. Users can adjust the proportion of memory devoted to each of these areas.
All third party code and some ROM based code is stored in compressed format. Program memory is where code executes if it was stored in compressed form. This system is highly space efficient, because Windows CE uses demand paging to load pages of executables: the loader uncompresses requested pages and brings them into program memory a page at a time, as they are needed (page size is determined by device manufacturers, but in practice is either 1 KB or 4 KB). Older pages are swapped out as the execution point moves past them and they are no longer needed (a Least Recently Used algorithm determines which pages are swapped).
Some ROM resident code is not stored in compressed format. Obviously, it would be extremely wasteful to compress and uncompress parts of the kernel and OS that are almost constantly in use. This type of code executes in place, that is, in ROM, using only small amounts of program memory for stack and heap space. This means that it doesn’t incur the overhead of the compression cycle or the load process — providing a significant performance advantage and power savings, but at the cost of increased storage space.
Other, less frequently used ROM resident executables and DLLs are stored in compressed format. This code is treated exactly the same as RAM resident compressed code. It is uncompressed and loaded on demand, and it executes in RAM based program memory.
How CE Applications Allocate Memory
The Windows CE memory management API is sophisticated, but it is lean. It consists of three groups of API functions: the VirtualAlloc family, the HeapAlloc family and the LocalAlloc family. Don’t use the C runtime functions to allocate memory on CE — this will cause unpredictable behavior.
Porting Tip: Eliminate all calls to calloc(), malloc() and alloc(). Replace these with the Windows CE memory allocation APIs that best suit your needs.
Basically, the size, duration, and consistency of your application’s memory allocation will determine which type of allocation strategy you’ll want to use. Our two goals in allocating memory under CE are to use memory as sparingly as possible, and when we are done with it, to make sure that it is made available to other processes as quickly as possible. Here are three important differences between the families of allocation functions:
- When they withdraw memory blocks from the allocation pool, giving your application exclusive access to them
- How much memory they actually remove from the allocation pool
- When they return the memory to the allocation pool after you free it
Let’s look at the allocation APIs based on types of allocation scenarios.
Making Large Allocations for Fairly Limited Durations
Say you have an application that displays large bitmapped graphics. If you are using standard bitmap file formats, you must load the entire file in order to transfer it to the screen. Depending on the color depth of the images, this could amount to a substantial allocation of memory. However, you don’t need to hold the memory for very long — only long enough to do the raster drawing. The VirtualAlloc() / VirtualFree() family of functions might be what you need for this kind of job. Below is the declaration for VirtualAlloc():
LPVOID VirtualAlloc( LPVOID lpAddress, DWORD dwSize, DWORD flAllocationType, DWORD flProtect);
The parameters are, in the order shown, the desired base address of the requested allocation, the size in bytes of the allocation request, a flag that specifies whether to reserve or actually commit the allocation, and the access permissions for the allocation. If the first parameter is NULL, the block can be allocated anywhere space is available. The second parameter (dwSize) is rounded up to the next whole multiple of the page size. Pay careful attention to the value of dwSize: if it is one byte over the physical page size, you’ll end up allocating two pages. You can get the physical page size for a device, reported in the dwPageSize member of SYSTEM_INFO, using this function:
VOID GetSystemInfo( LPSYSTEM_INFO lpSystemInfo);
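The rounding rule is easy to check with a little arithmetic. What follows is a minimal sketch, not code from the original lesson: bytes_withdrawn() and query_page_size() are hypothetical helper names, and the 4 KB fallback is only an assumption for building on a non-Windows host. On a device, the page size comes from the dwPageSize member that GetSystemInfo() fills in.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: number of bytes actually withdrawn from the
   allocation pool for a request of `request` bytes, given that
   allocations are rounded up to whole pages. */
static size_t bytes_withdrawn(size_t request, size_t page_size)
{
    size_t pages;

    assert(page_size > 0);
    pages = (request + page_size - 1) / page_size; /* round up */
    return pages * page_size;
}

/* Page size reported by the system. Off-device, an assumed 4 KB
   value is returned purely so the arithmetic can be exercised. */
static size_t query_page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return (size_t)si.dwPageSize;
#else
    return 4096; /* assumption for illustration only */
#endif
}
```

With a 4 KB page, bytes_withdrawn(4096, 4096) is 4096, but bytes_withdrawn(4097, 4096) is 8192: the single extra byte costs a whole second page.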
For our purposes, there are two important values for the flAllocationType flags: MEM_COMMIT and MEM_RESERVE. MEM_RESERVE indicates your intention, at some future point in time, to actually use space. You don’t actually have physical access to the space after reserving it. To get access, you must call VirtualAlloc() on the page (or pages) with the MEM_COMMIT flag. Remember, we’re allocating whole pages at a time, which under CE is a very large amount of memory, so this two-step strategy has a pair of important advantages. First, reserving a block of pages allows you to ensure space will be available before you begin a memory intensive operation. Second, because VirtualAlloc() can commit single pages in the reserved block when you actually need them, physical memory isn’t withdrawn from the allocation pool until you need it.
To free space allocated by VirtualAlloc(), you call VirtualFree(). The key thing to know about this call is that it returns freed pages to the allocation pool immediately.
BOOL VirtualFree(LPVOID lpAddress, DWORD dwSize, DWORD dwFreeType);
The parameters, in the order shown, are the base address of the block being freed, the size to free, and a flag that specifies what change to make in the block’s allocation status. The flag parameter dwFreeType specifies whether to decommit a page (dwFreeType = MEM_DECOMMIT) or to completely free the page (dwFreeType = MEM_RELEASE). When a page is decommitted, it can be reallocated by the process that reserved it. When it is freed, it is returned to the system memory and can be allocated by anybody.
Porting Tip: VirtualAlloc() is best used for large allocations of fairly short duration.
- VirtualAlloc() allocates memory in whole page increments
- Any unused memory inside a page is wasted; it can’t be used to satisfy other allocation requests
- Memory can be reserved without withdrawing it from the physical allocation pool
- Memory can’t be accessed until it is committed, which removes it from the physical allocation pool
- When you call VirtualFree() to release memory, it is immediately available to other processes.
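The reserve, commit, decommit, and release steps described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: page_offset() and use_one_page() are hypothetical names, error handling is minimal, and the Win32 portion is guarded with _WIN32 so the small helper still compiles on other hosts.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: byte offset of page `index` within a block. */
static size_t page_offset(size_t index, size_t page_size)
{
    assert(page_size > 0);
    return index * page_size;
}

#ifdef _WIN32
/* Reserve `page_count` pages, commit and touch only page `index`,
   then give everything back. Returns 1 on success, 0 on failure. */
static int use_one_page(size_t page_count, size_t index)
{
    SYSTEM_INFO si;
    BYTE *base;
    BYTE *page;

    if (index >= page_count)
        return 0;

    GetSystemInfo(&si);

    /* Step 1: reserve a range of pages; no physical memory is
       withdrawn from the allocation pool yet. */
    base = (BYTE *)VirtualAlloc(NULL,
                                page_offset(page_count, si.dwPageSize),
                                MEM_RESERVE, PAGE_NOACCESS);
    if (base == NULL)
        return 0;

    /* Step 2: commit a single page only when it is needed. */
    page = (BYTE *)VirtualAlloc(base + page_offset(index, si.dwPageSize),
                                si.dwPageSize,
                                MEM_COMMIT, PAGE_READWRITE);
    if (page == NULL) {
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }

    page[0] = 0xFF; /* the committed page can now be read and written */

    /* Step 3: decommit the page; the range stays reserved for us. */
    VirtualFree(page, si.dwPageSize, MEM_DECOMMIT);

    /* Step 4: release the whole reservation. With MEM_RELEASE the
       size argument must be 0 and the address must be the base
       returned by the reserving call. */
    return VirtualFree(base, 0, MEM_RELEASE) ? 1 : 0;
}
#endif
```

A call such as use_one_page(4, 1) would reserve four pages but draw only one page of physical memory from the pool, and only for the time between the commit and the decommit.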
Looking Ahead
VirtualAlloc() is easy to use, but it allocates a great deal of memory. Because it’s syntactically similar to the C runtime family (calloc(), malloc(), etc.), it’s tempting to make a quick porting dash to VirtualAlloc(). However, you shouldn’t use this scheme unless you can productively exploit most of the page or pages being withdrawn from the allocation pool. If you anticipate allocation patterns that use small amounts of memory for variable durations, but more than you can put in the local program heap, you might be better off setting up a private heap. In the next installment, we’ll see how to allocate and use a private heap, and learn about the advantages and disadvantages of this approach to memory allocation.
About the Author
Nancy Nicolaisen is a software engineer who has designed and implemented highly modular Windows CE products that include features such as full remote diagnostics, CE-side data compression, dynamically constructed user interface, automatic screen size detection, and entry time data validation.
In addition to writing for Developer.com, she has written several books including Making Win 32 Applications Mobile.