Performance and Its Killers
Broadly speaking, performance is a set of characteristics that can be measured in some way. The term applies to devices as well as to applications during their execution; you can look at RAM usage, boot time, and so forth. Mobile applications in particular face quite tough requirements: device characteristics and features raise user expectations high, while the resources available on the device are very limited. Therefore, mobile applications should be designed carefully and should exploit every opportunity to improve their performance.
This article demonstrates a few simple but nevertheless useful techniques to make that happen. File I/O operations, heap usage, and heavy loops are just a few examples of the 'performance killers' worth noting. Most of them are easy to avoid once you keep an eye on them. Below, you will explore the most common of these mistakes; avoiding them can significantly boost your program's performance.
File I/O
Basic reads and writes
If you have ever moved an existing application from, for example, Windows Mobile 2003 SE to Windows Mobile 5.0, you could not help but notice that all file operations became dramatically slower. The same effect can be observed when you start working with some kind of flash card instead of the device's internal memory (when the latter is not flash itself, of course). The reason is simple enough: the cost of every read/write operation depends on the flash block size, regardless of how much data you actually want to read from or save to the flash card. Knowing this block size and adjusting the buffers in your application accordingly can magically increase the throughput of I/O operations.
In versions prior to Windows CE 5.0, you could gather such information only by communicating directly with the device driver via the DeviceIoControl API. In Windows CE 5.0, you luckily have the
WINBASEAPI BOOL CeGetVolumeInfo(
   LPCWSTR pszRootPath,
   CE_VOLUME_INFO_LEVEL InfoLevel,
   LPCE_VOLUME_INFO lpVolumeInfo);
function that fills in the CE_VOLUME_INFO structure:
typedef struct _CE_VOLUME_INFO {
   DWORD cbSize;
   DWORD dwAttributes;
   DWORD dwFlags;
   DWORD dwBlockSize;
   TCHAR szStoreName[STORENAMESIZE];
   TCHAR szPartitionName[PARTITIONNAMESIZE];
} CE_VOLUME_INFO, *PCE_VOLUME_INFO, *LPCE_VOLUME_INFO;
for the given pszRootPath of the file system, where dwBlockSize is the block size of your flash in bytes.
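A minimal sketch of querying the block size might look like the following (error handling is trimmed; CeVolumeInfoLevelStandard is the standard information level from the Windows CE 5.0 SDK, and the 512-byte fallback is just an assumption for illustration):

#include <windows.h>

// Query the flash block size for a given volume root.
DWORD GetFlashBlockSize(LPCWSTR pszRootPath)
{
   CE_VOLUME_INFO info = { 0 };
   info.cbSize = sizeof(info);
   if (CeGetVolumeInfo(pszRootPath, CeVolumeInfoLevelStandard, &info))
      return info.dwBlockSize;   // block size in bytes
   return 512;                   // illustrative fallback only
}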
Simple calculations show that if you write, for example, 4 bytes at a time until you have stored 1 KB in total, with a flash block size of 512 bytes (a typical value for flash), it will take 256 calls. But if you buffer the data in 512-byte chunks, you end up with only two 'write' operations. Every such read or write usually involves a call into the kernel. Each single I/O may be quick enough, but the accumulated cost of hundreds or thousands of read/write operations can easily make your application as slow as a snail.
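To illustrate, here is a rough sketch of a writer that coalesces small writes into block-sized chunks, so the kernel is entered once per block rather than once per item (the class and its interface are illustrative, not part of any SDK):

#include <windows.h>

// Illustrative buffered writer: accumulates small writes and
// hands them to WriteFile one full block at a time.
class CBlockWriter
{
public:
   CBlockWriter(HANDLE hFile, DWORD dwBlockSize)
      : m_hFile(hFile), m_dwBlockSize(dwBlockSize), m_dwUsed(0)
   {
      m_pBuffer = new BYTE[dwBlockSize];
   }
   ~CBlockWriter() { Flush(); delete [] m_pBuffer; }

   void Write(const void* pData, DWORD dwLen)
   {
      const BYTE* p = (const BYTE*)pData;
      while (dwLen > 0)
      {
         DWORD dwChunk = min(dwLen, m_dwBlockSize - m_dwUsed);
         memcpy(m_pBuffer + m_dwUsed, p, dwChunk);
         m_dwUsed += dwChunk;
         p        += dwChunk;
         dwLen    -= dwChunk;
         if (m_dwUsed == m_dwBlockSize)
            Flush();                  // one kernel call per full block
      }
   }

   void Flush()
   {
      DWORD dwWritten = 0;
      if (m_dwUsed > 0)
         WriteFile(m_hFile, m_pBuffer, m_dwUsed, &dwWritten, NULL);
      m_dwUsed = 0;
   }

private:
   HANDLE m_hFile;
   DWORD  m_dwBlockSize;
   DWORD  m_dwUsed;
   BYTE*  m_pBuffer;
};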
Another good example of where buffering pays off is combining it with compression or encryption. Many such algorithms consume and produce data in blocks, so buffering fits in naturally. In some situations, it may even be worth reading a whole block from flash, making the required changes, and then storing it back, to achieve better performance.
Hidden I/O operations
Inefficient file I/O may also be well hidden. This is especially true for C++ applications where the serialization of complex objects, such as lists or arrays, is implemented in terms of their items; for example, when operator >> is overloaded for a single item. This may result in many small, time-consuming reads or writes. Consider the following example:
CSampleList list;
stream >> list;
The problem lies in the container’s implementation of operator >>:
stream >> count;
TSomeObject obj;
for (int i = 0; i < count; i++)
{
   stream >> obj;
   Add(obj);
}
As you can see, it invokes operator >> again for every item in the container, which obviously costs too much in terms of performance. As a reasonable alternative, consider the following approach:
stream >> count;
stream.Read(pData, count * sizeof(TSomeObject));
This code will work much faster, but the price is that TSomeObject must be a simple type and its in-memory layout has to match the stored format exactly.
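For example, a bulk load along those lines might look like this. It is a sketch only: the TSomeObject layout and the stream class are illustrative, and the approach assumes a POD type with no padding or endianness mismatches between writer and reader:

struct TSomeObject      // plain, fixed-size item
{
   int nId;
   int nValue;
};

void LoadItems(CStream& stream, TSomeObject*& pData, int& count)
{
   stream >> count;
   pData = new TSomeObject[count];
   // One large read instead of 'count' small ones.
   stream.Read(pData, count * sizeof(TSomeObject));
}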
Using memory-mapped files
As a logical continuation of buffered I/O, another option for increasing the speed of I/O operations is to memory-map the file you are writing to. This can serve as a convenient cache mechanism, provided your application can survive some data loss in the event of a reset or power failure. You then benefit from large blocks of data being written to flash when the mapped file is finally flushed. For more details, refer to the following APIs:
LPVOID WINAPI MapViewOfFile(
   HANDLE hFileMappingObject,
   DWORD dwDesiredAccess,
   DWORD dwFileOffsetHigh,
   DWORD dwFileOffsetLow,
   DWORD dwNumberOfBytesToMap);

BOOL WINAPI UnmapViewOfFile(LPCVOID lpBaseAddress);

WINBASEAPI BOOL WINAPI FlushViewOfFile(
   LPCVOID lpBaseAddress,
   DWORD dwNumberOfBytesToFlush);

WINBASEAPI HANDLE WINAPI CreateFileMapping(
   HANDLE hFile,
   LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
   DWORD flProtect,
   DWORD dwMaximumSizeHigh,
   DWORD dwMaximumSizeLow,
   LPCTSTR lpName);

WINBASEAPI HANDLE WINAPI CreateFileForMapping(
   LPCTSTR lpFileName,
   DWORD dwDesiredAccess,
   DWORD dwShareMode,
   LPSECURITY_ATTRIBUTES lpSecurityAttributes,
   DWORD dwCreationDisposition,
   DWORD dwFlagsAndAttributes,
   HANDLE hTemplateFile);
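A rough usage sketch, under the assumption that the application tolerates data loss on power failure, might look like this (error handling omitted; the file name, size, and written value are illustrative, and on Windows CE the file is opened with CreateFileForMapping rather than CreateFile):

const DWORD dwFileSize = 64 * 1024;   // illustrative size

HANDLE hFile = CreateFileForMapping(L"\\Storage Card\\data.bin",
                  GENERIC_READ | GENERIC_WRITE, 0, NULL,
                  OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
HANDLE hMap  = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
                  0, dwFileSize, NULL);
BYTE*  pView = (BYTE*)MapViewOfFile(hMap, FILE_MAP_WRITE,
                  0, 0, dwFileSize);

// Many small in-memory updates; no kernel calls are involved here.
pView[0] = 0x42;

// A single flush writes the dirty pages back in large chunks;
// passing 0 flushes the whole mapped view.
FlushViewOfFile(pView, 0);

UnmapViewOfFile(pView);
CloseHandle(hMap);
CloseHandle(hFile);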
Heap Usage
Now, turn to more general areas. On embedded systems, the stack size is often limited, so the heap has to be used instead. Nevertheless, used carelessly, the heap may also cause performance problems. Consider the following code snippet:
while (expr)
{
   CSomeObject *pObj = new CSomeObject;
   DoSomething(pObj);
   delete pObj;
}
If such a loop runs for a large number of iterations, this number of heap calls is simply redundant and leads to heap fragmentation. Consider instead the following scenario, which reuses the temporary object where possible:
CSomeObject *pObj = new CSomeObject;
while (expr)
{
   DoSomething(pObj);
   pObj->Reset();
}
delete pObj;
Loops and Repeated Code
Look at the following code snippet:
void CSomeClass::SimpleOp(TSimple& aSimple1, TSimple& aSimple2)
{
   ...
   CComplex complex = CreateComplex(aSimple1);
   // use complex somehow but don't modify it
   ...
}

void CSomeClass::ComplexOp()
{
   TSimple s1, s2;
   while (expr)
   {
      SimpleOp(s1, s2);
      // do something but don't modify s1
      ...
   }
}
As you can see, a variable of type CComplex is created inside the SimpleOp function but never modified afterwards. Hence, re-creating it on every loop iteration is absolutely redundant; the creation can be hoisted out of the loop and the object passed in instead:
void CSomeClass::SimpleOp(CComplex& aComplex, TSimple& aSimple)
{
   ...
   // use aComplex somehow but don't modify it
   ...
}

void CSomeClass::ComplexOp()
{
   TSimple s;
   CComplex c = ...;   // create an object of CComplex type somehow
   while (expr)
   {
      SimpleOp(c, s);
      // do something but don't modify s
      ...
   }
}
In fact, constructing such a complex type may be quite an expensive operation, so dragging it out of the loop adds its own bit to the overall application performance.
Type Conversions
It may seem funny, but how many times have you faced a situation where you had to make many successive conversions just to transform data from one representation to another, as in the following snippet?
TIntType nInt;
TCharType cChar = GetCharFromSomeData();
ConvertCharToInt(cChar, nInt);
TMoreComplexType aObj = SomeFunc(nInt);
Sometimes a legacy system dictates the stored data format, but in many cases this is just bad design that wastes processor time unnecessarily. You should obviously avoid it where possible.
Another aspect of the 'type' problem appears when you need a temporary object to interface with some data; for example:
aObject.Func().DoSomething();
aObject.Func().DoSomethingElse();
aObject.Func().DoSomethingAgain();
...
Putting the unnecessary function calls aside, a new temporary object is created on the stack on each call to aObject.Func(), which is bad enough by itself. The situation may become even worse if you intended Func() to be inline but the compiler was instructed to produce the smallest possible code: for frequently used functions, it may decide to ignore the inline qualifier without telling you. That said, it is much better to refactor the code above as follows:
CSomeType aObj = aObject.Func();
aObj.DoSomething();
aObj.DoSomethingElse();
aObj.DoSomethingAgain();
...
You may justly say that all this is obvious, but... simply recall how many times you have seen it yourself.
Work with Your Compiler, not Against It
In all modern IDEs, compilers perform a lot of optimizations to produce either smaller code or a faster execution path. It is therefore always good practice to learn the basics of the compiler you are going to use in your development. Such knowledge allows you to predict the final output more accurately and relate it to the code you write, avoiding the generation of inefficient binaries in particular cases. Besides, learning a little assembler can also help you understand what's going on.
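For instance, with the Microsoft compilers used for Windows CE development, you can choose size (/O1) or speed (/O2) optimization per build and, where measurements justify it, enforce inlining with a vendor keyword. A tiny illustration (the function itself is hypothetical):

// MSVC/eVC++-specific sketch: 'inline' is only a hint, and when the
// compiler is told to optimize for size (/O1) it may silently drop it
// for frequently used functions. __forceinline overrides that decision;
// use it only where profiling shows a real win.
__forceinline int Clamp(int n, int lo, int hi)
{
   return n < lo ? lo : (n > hi ? hi : n);
}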
Conclusion
This article discussed several simple yet destructive 'performance killers' that may cause an unacceptable drop in performance. In many cases, you can avoid such 'bad practices' and then focus on the other components that affect the overall effectiveness of your particular application; for example, external legacy systems, databases, networking, or whatever else. Hopefully, this article will help you along this rocky road.
About the Author
Alex Gusev started to play with mainframes at the end of the 1980s, using Pascal and REXX, but soon switched to C/C++ and Java on different platforms. When mobile PDAs seriously raised their heads in the IT market, Alex did too. After working for almost a decade at an international retail software company as a team leader of the Windows Mobile R&D department, he decided to dive into Symbian OS (TM) Core development.