Moving Forward in Programming with Assembler

Prerequisite

This article assumes that the reader has installed MASM32. If you have not, it is available from http://www.masm32.com/.

Introduction

In the last article, you saw how to set up Visual Studio to compile an Assembler file with the Microsoft Assembler. In this article, I will begin to describe the language itself, and some of the instructions that it contains.

Variables

There are no variables in Assembler—or at least not in the C++ sense. In Assembler you have registers and memory addresses. Bear in mind, you’re talking the same language as the processor now.

For instance, the processor doesn’t know that you have an integer called ‘nMyInteger’. It doesn’t know you have a class called “CMyClass”. All it knows about is the ‘registers’ and that it can access memory given an address.

So what are registers? And how do I access memory?

Registers

Put simply, a register is like a variable, but just for the processor’s use. When I said that there are no variables in the sense of higher level languages, I meant what I said. There are only a set number of registers that exist on the processor chip. Think of it this way: A register is a hard-coded variable for the processor; it exists physically on the chip.

These registers can represent numbers the same size as the ‘bit’ count of the processor. In other words, in a 32-bit processor these numbers are 32 bits in size. In C++ terms, they are DWORDs.

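Registers hold raw bit patterns; whether a value is treated as signed or unsigned depends on the instructions that use it. Negative numbers are stored in two’s complement form, which for a 32-bit register works out to 0x100000000 + (negative number). So, -1 is represented by 0xFFFFFFFF, -2 by 0xFFFFFFFE, and so forth.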
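
If it helps to see the same arithmetic from the C++ side, here is a minimal sketch. The helper name as_register_bits is my own invention for illustration; it relies on the fact that converting a signed value to a 32-bit unsigned type in C++ wraps modulo 2^32, which should give the same bit pattern that a 32-bit register would hold.

```cpp
#include <cstdint>

// Hypothetical helper (not part of MASM or any library): returns the 32-bit
// pattern a register would hold for a signed value. Conversion to an unsigned
// type wraps modulo 2^32, so negatives become 0x100000000 + value.
constexpr std::uint32_t as_register_bits(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}

static_assert(as_register_bits(-1) == 0xFFFFFFFFu, "-1 is all bits set");
static_assert(as_register_bits(-2) == 0xFFFFFFFEu, "-2 is one less than that");
```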

There are quite a few registers in modern Intel processors, but there are only six you should be using in your applications:

eax - Accumulator Register
ebx - Base Register
ecx - Counter Register
edx - Data Register
esi - Source (for memory operations) register
edi - Destination (for memory operations) register

The registers eax, ebx, ecx, and edx can be split into their constituent bytes by changing the way that they are referred to. For instance, for the accumulator (in other words, the a register):

al    : First (lower) byte of the low word in the eax register
ah    : Second (higher) byte of the low word in the eax register
ax    : Lower word (i.e. 2-bytes) of the eax register
      ; (i.e. (ah << 8) + al)
eax   : The whole register (4-bytes)

The same naming convention goes for ebx, ecx, and edx. The esi and edi registers have 16-bit forms (si and di) but no byte-sized names.

The names come from the origins of the processor. The ‘e’ notation means ‘extended’ register; in other words, the 32-bit flavours of each of the registers when 16-bit processors gave way to 32-bit processors. 16-bit processors only had ax, bx, cx, dx, and so forth; so, when the 32-bit processors came along, the extra 16 bits available were denoted by a preceding ‘e’. There is no way of accessing the top 16 bits of the registers directly.
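
To make the relationship between al, ah, ax, and eax concrete, here is a rough C++ model that uses shifts and masks. The function names al_of, ah_of, and ax_of are hypothetical; they only illustrate which bits each register name refers to.

```cpp
#include <cstdint>

// Hypothetical helpers modelling which bits of eax each name selects.
constexpr std::uint8_t al_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFFu);          // bits 0-7
}
constexpr std::uint8_t ah_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFFu);   // bits 8-15
}
constexpr std::uint16_t ax_of(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);       // bits 0-15
}

// ax is always (ah << 8) + al, whatever eax holds.
static_assert(ax_of(0x12345678u) == (ah_of(0x12345678u) << 8) + al_of(0x12345678u),
              "ax is built from ah and al");
```

Note that there is no helper for the top 16 bits, which mirrors the fact that the processor gives them no name of their own.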

The Mov (Move) Instruction

Now, start with the simplest instruction: the mov (move) instruction. The mov instruction is how you ‘move’ values about inside of the processor. For instance:

mov eax, 100

This ‘moves’ 100 into the eax register. It’s the same as saying eax=100. To define the move instruction, think of it as this:

mov (destination), (source)

The source and destination have to be the same size (in bits). Here are some examples of ‘mov’ instructions:

mov al, bl         ; move the lower byte of ebx into the lower byte
                   ; of eax
mov al, 0ffh       ; move 0xFF into the lower byte of eax
mov ah, 0ffh       ; move 0xFF into the high byte of the low word
                   ; (2-bytes) of eax
mov ax, 0ffffh     ; move 0xFFFF into the low word of eax
mov eax, 0ffffh    ; move 0xFFFF into eax
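
One detail worth spelling out is that, in 32-bit code, a mov into al, ah, or ax leaves the rest of eax untouched. The following C++ sketch models that behaviour; the mov_* function names are made up for illustration and each returns the new value of eax.

```cpp
#include <cstdint>

// Hypothetical models of mov into each part of eax (32-bit mode).
constexpr std::uint32_t mov_al(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;                                 // replace bits 0-7
}
constexpr std::uint32_t mov_ah(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFF00FFu) | (static_cast<std::uint32_t>(value) << 8);  // bits 8-15
}
constexpr std::uint32_t mov_ax(std::uint32_t eax, std::uint16_t value) {
    return (eax & 0xFFFF0000u) | value;                                 // replace bits 0-15
}
constexpr std::uint32_t mov_eax(std::uint32_t /*old eax*/, std::uint32_t value) {
    return value;                                                       // replace everything
}

// Starting from eax = 0x12345678:
static_assert(mov_ax(0x12345678u, 0xFFFFu) == 0x1234FFFFu, "upper word survives");
static_assert(mov_eax(0x12345678u, 0xFFFFu) == 0x0000FFFFu, "upper word is cleared");
```

This is why mov ax, 0ffffh and mov eax, 0ffffh are not interchangeable: only the second one guarantees that the upper 16 bits end up as zero.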

You can move the contents of memory into a register, and vice versa, by using square brackets to indicate ‘contents of’. The number of bytes moved is determined by the size of the register named:

mov al, [esi]     ; move the byte contained in the memory address
                  ; in register esi into the lower byte of eax
mov [edi], bl     ; move the byte value in the lowest byte of ebx
                  ; into the memory address in register edi
mov cx, [esi]     ; move the word (2-byte) value contained in the
                  ; memory address of register esi into the lower
                  ; word of ecx
mov [edi], edx    ; move the dword (4-byte) value contained in edx
                  ; into the memory address contained in register edi

You also can include an offset when using the ‘contents of’ (square brackets) operator:

mov al, [esi + 3]    ; move the byte contained in the memory address
                     ; in register esi + 3 into the lower byte of eax
mov [edi + 2], dx    ; move the lower word (2-bytes) contained in
                     ; edx into the memory address contained in the
                     ; register edi + 2
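
As a rough illustration of what these memory moves do, here is a C++ sketch that treats memory as a small byte array. The Memory struct, the function names, and the sample contents are all assumptions made for this example. The one hardware fact it leans on is that Intel processors store multi-byte values with the lowest byte first (little-endian).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: a tiny block of memory addressed by byte offset.
struct Memory {
    std::uint8_t bytes[8];
};

// Models: mov al, [esi + offset]
constexpr std::uint8_t load_byte(const Memory& m, std::size_t esi, std::size_t offset) {
    return m.bytes[esi + offset];
}

// Models: mov cx, [esi]  (two bytes, low byte at the lower address)
constexpr std::uint16_t load_word(const Memory& m, std::size_t esi) {
    return static_cast<std::uint16_t>(m.bytes[esi] | (m.bytes[esi + 1] << 8));
}

// Models: mov [edi + offset], dx  (returns the updated memory)
constexpr Memory store_word(Memory m, std::size_t edi, std::size_t offset, std::uint16_t dx) {
    m.bytes[edi + offset]     = static_cast<std::uint8_t>(dx & 0xFFu);
    m.bytes[edi + offset + 1] = static_cast<std::uint8_t>(dx >> 8);
    return m;
}

// Sample contents, chosen only so that each byte is easy to recognise.
constexpr Memory kSample = {{0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17}};

static_assert(load_byte(kSample, 0, 3) == 0x13, "byte at esi + 3");
static_assert(load_word(kSample, 0) == 0x1110, "low byte comes first in memory");
```

The sketch does no bounds checking, just as the processor will not stop you from reading or writing an address you did not intend.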

Functions

A function is declared in the following form:

TestProc proc dwValue1:DWORD, wValue2:WORD, bValue3:BYTE

   ret

TestProc endp

The preceding code is an example of a blank function, but it shows the basics. The name of the function is given first, followed by proc. The parameters to the function are defined in the subsequent list in the form <name>:<type>. Some of the basic types available are DWORD, WORD, and BYTE.

The end of the function is marked by a line containing the name of the function followed by endp.

The ret statement is the return statement; in other words, it marks the places where the function is to be exited. A ret statement MUST be included at the end of the function.

If the code is called from C++, the registers ebx, esi, and edi must be restored to their original values before returning from the function. The usual way of doing this is by using push and pop, which will be covered later.

The return value of the function is in the eax register. The function parameters can be accessed by name in most instructions; for example:

TestProc proc dwValue1:DWORD, dwValue2:DWORD

   mov eax, dwValue1
   add eax, dwValue2
   ret

TestProc endp

This function adds dwValue1 to dwValue2 and returns the result.
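
For comparison, this is roughly what the function computes when written in C++. TestProcReference is a hypothetical name for this sketch, not something MASM generates; it can be handy as a reference to check the assembler version against.

```cpp
#include <cstdint>

// Hypothetical C++ reference for the assembler TestProc above.
// Unsigned addition wraps modulo 2^32, which matches what add does to eax.
constexpr std::uint32_t TestProcReference(std::uint32_t dwValue1,
                                          std::uint32_t dwValue2) {
    return dwValue1 + dwValue2;
}

static_assert(TestProcReference(2u, 3u) == 5u, "simple addition");
static_assert(TestProcReference(0xFFFFFFFFu, 1u) == 0u, "overflow wraps round to zero");
```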

To access functions in C++, you must declare a function with the same name and parameters. The size of each parameter in C++ must equal the size of the corresponding parameter defined in the Assembler code. The function must also be declared as extern “C” and use the stdcall calling convention. For example, the C++ declaration for the above assembler function is:

extern "C" unsigned int __stdcall TestProc(unsigned int dwValue1,
                                           unsigned int dwValue2);

If a pointer is to be passed in, it is declared as a DWORD parameter in the assembler function as pointers (in 32-bit operating systems) are 32 bits in size. Similarly, a ‘char’ would be passed as a BYTE, a ‘WCHAR’ as a WORD, and so on.

If an assembler function is designed to be exported from a static DLL, you do not need to define it; just include its name in the .def file of the DLL and then it can be used like any other C++ function declared this way.

Push, Pop, and the Stack

The processor maintains a stack, an area of memory tracked by the esp register, onto which registers, constants, and the contents of memory can be pushed and popped by using the push and pop instructions.

The stack is intended to overcome the small number of registers available. It gives an effective, quick way of saving and restoring the contents of registers.

The stack is a first-in-last-out queue of values. The push instruction adds a value to the head of the queue and the pop instruction removes the value from the head of the queue and places it in a register or memory address. For example:

TestFunction proc

   mov eax, 100
   push eax    ; Stack now contains { 100 }

   mov eax, 200
   push eax    ; Stack now contains { 200, 100 }

   mov eax, 300
   push eax    ; Stack now contains { 300, 200, 100 }

   pop eax     ; eax = 300, stack = { 200, 100 }
   pop eax     ; eax = 200, stack = { 100 }
   pop eax     ; eax = 100, stack = { }

   ret

TestFunction endp
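
The same sequence can be modelled in C++ to show the last-in, first-out ordering. This is only a sketch: the Stack and Popped types are invented for the example, the four-slot size is arbitrary, and the index named esp merely plays the role of the real stack pointer register (which counts in bytes, not slots).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the stack: a small array plus an index standing in
// for esp. As on Intel processors, it grows towards lower addresses.
struct Stack {
    std::uint32_t slots[4] = {0, 0, 0, 0};
    std::size_t   esp      = 4;   // empty: one past the last slot

    constexpr void push(std::uint32_t value) { slots[--esp] = value; }
    constexpr std::uint32_t pop() { return slots[esp++]; }
};

struct Popped {
    std::uint32_t first, second, third;
    std::size_t   esp_after;
};

// Mirrors TestFunction: push 100, 200, 300, then pop three times.
constexpr Popped run_test_function() {
    Stack s{};
    s.push(100);                           // { 100 }
    s.push(200);                           // { 200, 100 }
    s.push(300);                           // { 300, 200, 100 }
    const std::uint32_t first  = s.pop();  // 300
    const std::uint32_t second = s.pop();  // 200
    const std::uint32_t third  = s.pop();  // 100
    return Popped{first, second, third, s.esp};
}

static_assert(run_test_function().first == 300, "last value pushed comes off first");
static_assert(run_test_function().esp_after == 4, "balanced pushes and pops");
```

The final check is the one to remember: after three pushes and three pops, the stack pointer is back where it started.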

A typical use of the stack is to save the values of the registers ebx, esi, and edi on entry to a function and restore them before exiting. For example:

TestFunction proc

   push ebx
   push esi
   push edi

   ; code goes in here

   pop edi
   pop esi
   pop ebx
   ret

TestFunction endp

Obviously, only the values of the registers that are being used need to be saved, but this does demonstrate the use of the push and pop instructions.

An important point to note is that, when exiting a function, the stack should always be in the same state that it was in when entering a function. Another way of saying this is that for every push statement there needs to be a corresponding pop statement before the function returns.

Flags and the Instructions that Affect and Use Them

A flag is a setting in the processor that can either be true or false. The processor contains a set of flags to indicate end states after operations. There are a number of flags, but the one that I’m going to be dealing with in this article is the ‘Zero’ flag. This flag is set by certain operations to indicate that their result was zero. Other operations, such as comparisons, set this flag to indicate equality.

Consider the decrement (dec) instruction. This decrements the register or value specified. If the result is zero, the zero flag is set; otherwise, it is cleared. For example:

TestFunction proc

   mov eax, 2
   dec eax    ; eax == 1
   dec eax    ; eax == 0, zero flag is set
   ret

TestFunction endp
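
Here is a small C++ sketch of how dec and the zero flag interact. The Cpu struct and dec_eax function are hypothetical names, and the model tracks only the zero flag; the real instruction updates several other flags as well.

```cpp
#include <cstdint>

// Hypothetical model of the part of the processor state used here.
struct Cpu {
    std::uint32_t eax;
    bool          zero_flag;
};

// Models: dec eax. The zero flag reflects whether the result is zero.
constexpr Cpu dec_eax(Cpu cpu) {
    cpu.eax -= 1;                     // wraps from 0 to 0xFFFFFFFF
    cpu.zero_flag = (cpu.eax == 0);
    return cpu;
}

constexpr Cpu kStart = {2, false};    // after: mov eax, 2

static_assert(!dec_eax(kStart).zero_flag, "eax == 1, zero flag clear");
static_assert(dec_eax(dec_eax(kStart)).zero_flag, "eax == 0, zero flag set");
```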

There are other operations that behave differently depending on the states of a particular flag. One of these operations is the jump. Its raw form is jmp, which jumps the program execution to a location in memory (usually specified with a label, the same way as a goto statement in C++). It has various forms, two of which are jz (jump if zero) and jnz (jump if not zero).

By using these instructions and the knowledge of the zero flag, you now can write loops:

LoopFunction proc

   xor eax, eax    ; efficient way of saying eax = 0
   mov ecx, 5      ; ecx is the register generally used for counters

LoopStart:         ; this is a label, used for labelling code positions
   inc eax
   dec ecx
   jnz LoopStart

   ; eax now equals 5

   ret

LoopFunction endp
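
In C++ terms, a dec followed by jnz back to a label behaves like a do-while loop: the body always runs once before the counter is tested. A minimal sketch, assuming the hypothetical name loop_function and a count parameter in place of the hard-coded 5:

```cpp
#include <cstdint>

// Hypothetical C++ equivalent of LoopFunction, with the count as a parameter.
constexpr std::uint32_t loop_function(std::uint32_t count) {
    std::uint32_t eax = 0;       // xor eax, eax
    std::uint32_t ecx = count;   // mov ecx, count
    do {
        ++eax;                   // inc eax
        --ecx;                   // dec ecx (sets the zero flag when ecx reaches 0)
    } while (ecx != 0);          // jnz LoopStart
    return eax;
}

static_assert(loop_function(5) == 5, "five trips round the loop");
```

Because the test comes after the body, starting with a count of zero would not skip the loop. The counter would wrap round to 0xFFFFFFFF and the loop would run roughly four billion times, so check for zero beforehand if that can happen.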

Conclusion

I have covered some of the basic instructions involved in Assembler and demonstrated their use. I have also explained what a register is and which registers you should use, and shown how to define functions with parameters in Assembler and write declarations in C++ for them.

In the next installment of this tutorial, I will cover arithmetic operations, and some of the macros that MASM provides to ease the development of Assembler code.
