x86 ISA Notes

References:

  1. Programming from the Ground Up, by Jonathan Bartlett
  2. https://c9x.me/x86/

SAL/SAR/SHL/SHR

Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.

The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to 5 bits, which limits the count range to 0 to 31. A special opcode encoding is provided for a count of 1.

The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations).

Logical shift treats the number as a bunch of bits and shifts in zeros. This is what the >> operator does in C when the left operand is unsigned.

Arithmetic shift treats the number as a signed integer (in two's complement) and preserves the sign: a right shift fills the vacated top bits with copies of the original topmost bit, zeros if it was 0 and ones if it was 1. In C, right-shifting a negative value is implementation-defined, although most compilers emit an arithmetic shift.

What is the difference between arithmetic shift left and logical shift left?

They are the same operation. Both shift the bits to the left and shift a 0 into the LSB.
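
The three behaviors can be modelled in C. This is only a sketch: the helper names shl32, shr32 and sar32 are made up for these notes, and sar32 builds the arithmetic shift by hand on the raw bit pattern so that it does not depend on what the compiler does with >> on negative values. Each helper masks the count to 5 bits, as the 32-bit instructions do.

```c
#include <stdint.h>

/* SHL/SAL: shift left, zeros enter at the LSB. */
uint32_t shl32(uint32_t bits, unsigned count)
{
    return bits << (count & 31);    /* count masked to 5 bits, as on x86 */
}

/* SHR: logical right shift, zeros enter at the MSB. */
uint32_t shr32(uint32_t bits, unsigned count)
{
    return bits >> (count & 31);
}

/* SAR: arithmetic right shift on a 32-bit two's complement pattern.
 * Copies of the original top bit enter at the MSB. */
uint32_t sar32(uint32_t bits, unsigned count)
{
    uint32_t shifted;

    count &= 31;
    if (count == 0)
        return bits;

    shifted = bits >> count;
    if (bits & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> count);   /* fill vacated bits with ones */
    return shifted;
}
```

For the pattern 0xFFFFFFF8 (which is -8 as a signed value), shr32 by 1 gives 0x7FFFFFFC, while sar32 by 1 gives 0xFFFFFFFC (which is -4).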

MUL

Unsigned Multiply

Opcode    Mnemonic     Description
F6 /4     MUL r/m8     Unsigned multiply (AX = AL * r/m8).
F7 /4     MUL r/m16    Unsigned multiply (DX:AX = AX * r/m16).
F7 /4     MUL r/m32    Unsigned multiply (EDX:EAX = EAX * r/m32).

Description

Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location. The action of this instruction and the location of the result depend on the opcode and the operand size, as shown in the following table.

MUL Results
Operand Size    Source 1    Source 2    Destination
Byte            AL          r/m8        AX
Word            AX          r/m16       DX:AX
Doubleword      EAX         r/m32       EDX:EAX

The result is stored in register AX, register pair DX:AX, or register pair EDX:EAX (depending on the operand size), with the high-order bits of the product contained in register AH, DX, or EDX, respectively. If the high-order bits of the product are 0, the CF and OF flags are cleared; otherwise, the flags are set.

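Note: The first source operand and the low half of the result share the same register (AL, AX or EAX). For word and doubleword operands, the high-order half of the product goes to DX or EDX, so whatever was in that register before the multiply is overwritten. See the table above.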
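
A rough C model of the doubleword form may help. The struct and function names here (mul_result, mul32) are invented for these notes and are not part of any real API; the sketch only shows how the 64-bit product is split across EDX:EAX and when CF/OF end up set.

```c
#include <stdint.h>

/* Hypothetical model of the registers that MUL r/m32 writes. */
struct mul_result {
    uint32_t eax;     /* low 32 bits of the product  */
    uint32_t edx;     /* high 32 bits of the product */
    int carry;        /* CF and OF: 1 when edx is nonzero, else 0 */
};

struct mul_result mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * src;   /* full 64-bit product */
    struct mul_result r;

    r.eax = (uint32_t)product;
    r.edx = (uint32_t)(product >> 32);
    r.carry = (r.edx != 0);
    return r;
}
```

With these definitions, mul32(0xFFFFFFFF, 2) leaves 0xFFFFFFFE in eax and 1 in edx with the carry set, while mul32(3, 4) leaves 12 in eax, 0 in edx and the carry clear.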

DIV

The DIV instruction (and its counterpart IDIV for signed numbers) gives both the quotient and remainder. For unsigned, remainder and modulus are the same thing. For signed idiv, it gives you the remainder (not modulus) which can be negative:

e.g. -5 / 2 = -2 rem -1. x86 division semantics exactly match C99's % operator.
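
The difference between remainder and modulus only shows up with negative operands. The following C sketch contrasts the two; the function names are made up for illustration, and floored_modulus is included purely for comparison since it is not something idiv computes.

```c
/* Quotient as IDIV reports it: truncated toward zero (C99 semantics). */
int idiv_quotient(int dividend, int divisor)
{
    return dividend / divisor;
}

/* Remainder as IDIV reports it: takes the sign of the dividend. */
int idiv_remainder(int dividend, int divisor)
{
    return dividend % divisor;
}

/* Floored modulus, for contrast only: takes the sign of the divisor. */
int floored_modulus(int dividend, int divisor)
{
    int r = dividend % divisor;

    if (r != 0 && ((r < 0) != (divisor < 0)))
        r += divisor;
    return r;
}
```

For -5 and 2 the quotient is -2 and the remainder is -1, whereas the floored modulus is 1.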

DIV r32 divides a 64-bit number in EDX:EAX by a 32-bit operand (in any register or memory) and stores the quotient in EAX and the remainder in EDX. It raises a divide error (#DE) if the divisor is zero or if the quotient does not fit in 32 bits.

Unsigned 32-bit example (works in any mode)

mov eax, 1234          ; dividend low half
mov edx, 0             ; dividend high half = 0.  prefer  xor edx,edx

mov ebx, 10            ; divisor can be any register or memory

div ebx       ; Divides 1234 by 10.
        ; EDX =   4 = 1234 % 10  remainder
        ; EAX = 123 = 1234 / 10  quotient
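
The same operation can be sketched in C to show where the fault cases come from. The names div_result and div32 are hypothetical, and the fault field merely stands in for the divide error the CPU would raise (a zero divisor faults as well as an oversized quotient).

```c
#include <stdint.h>

/* Hypothetical model of the registers that DIV r/m32 writes. */
struct div_result {
    uint32_t eax;     /* quotient  */
    uint32_t edx;     /* remainder */
    int fault;        /* 1 where the CPU would raise a divide error */
};

struct div_result div32(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    struct div_result r = { eax, edx, 0 };   /* registers unchanged on fault */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    uint64_t quotient;

    if (divisor == 0) {
        r.fault = 1;                         /* divide by zero */
        return r;
    }

    quotient = dividend / divisor;
    if (quotient > UINT32_MAX) {
        r.fault = 1;                         /* quotient does not fit in EAX */
        return r;
    }

    r.eax = (uint32_t)quotient;
    r.edx = (uint32_t)(dividend % divisor);
    return r;
}
```

This is why EDX has to be zeroed before an ordinary 32-bit unsigned division: with EDX = 1 and EAX = 0 the dividend is 2^32, and dividing that by 1 no longer fits in EAX.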

LOOP

Refer to the Loops section below.

LEAL

Refer to the LEAL section below.

PUSH / PUSHL

The instruction pushl %eax is equivalent to the following pair (except that subl also updates the flags, which pushl leaves alone):

subl $4, %esp
movl %eax, (%esp)

POP / POPL

The instruction popl %eax is equivalent to the following pair (except that addl also updates the flags, which popl leaves alone):

movl (%esp), %eax
addl $4, %esp
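
The two expansions can be mimicked in C with a byte array standing in for stack memory. Everything here (struct machine, STACK_BYTES, the function names) is invented for illustration, and there is no bounds checking, so it is only a sketch of the pointer arithmetic.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical machine: a small block of memory plus a stack pointer. */
struct machine {
    uint8_t  memory[STACK_BYTES];
    uint32_t esp;                 /* offset of the current top of stack */
};

/* The stack grows downward, so an empty stack starts at the high end. */
void machine_init(struct machine *m)
{
    m->esp = STACK_BYTES;
}

void pushl(struct machine *m, uint32_t value)
{
    m->esp -= 4;                                /* subl $4, %esp     */
    memcpy(&m->memory[m->esp], &value, 4);      /* movl %eax, (%esp) */
}

uint32_t popl(struct machine *m)
{
    uint32_t value;

    memcpy(&value, &m->memory[m->esp], 4);      /* movl (%esp), %eax */
    m->esp += 4;                                /* addl $4, %esp     */
    return value;
}
```

Pushing two values and popping twice returns them in reverse order and leaves esp where it started.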

What does pushl $RECORD_FIRSTNAME + record_buffer do?

It looks like we are combining an add instruction with a push instruction, but we are not.

You see, both RECORD_FIRSTNAME and record_buffer are constants. The first is a direct constant, created through the use of a .equ directive, while the latter is defined automatically by the assembler through its use as a label (its value being the address at which the data that follows it starts).

Since they are both constants that the assembler knows, it is able to add them together while it is assembling your program, so the whole instruction is a single immediate-mode push of a single constant.
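
C has a close analogue: the sum of a static object's address and an integer constant is itself a constant, so it can be used as a static initializer and no addition runs when the program executes. In this sketch the offset value 8 and the buffer size are made up for illustration; they are not the values used by the original program.

```c
enum { RECORD_FIRSTNAME = 8 };      /* hypothetical offset, like a .equ */

static char record_buffer[64];      /* its address plays the role of the label */

/* Address constant: resolved by the compiler and linker, not at run time. */
static char *const firstname_field = record_buffer + RECORD_FIRSTNAME;

char *firstname_address(void)
{
    return firstname_field;
}
```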

C Idioms in Assembly Language

If Statement

In C, an if statement consists of three parts - the condition, the true branch, and the false branch.


if(a == b) {
        /* True Branch Code Here */
}

else {
        /* False Branch Code Here */
}

/* At This Point, Reconverge */

In assembly language, this can be rendered as:

#Move a and b into registers for comparison
movl a, %eax
movl b, %ebx

#Compare
cmpl %eax, %ebx

#If True, go to true branch
je true_branch

false_branch:
        #This label is unnecessary,
        #only here for documentation
        #False Branch Code Here

        #Jump to reconvergence point
        jmp reconverge

true_branch:
        #True Branch Code Here

reconverge:
        #Both branches reconverge to this point
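
The same control flow can be written in C with goto, which makes the correspondence with the labels easier to see. The function name branch_taken and its return value are invented for this sketch; real C code would simply use if/else.

```c
/* Returns 1 if the true branch ran, 0 if the false branch ran. */
int branch_taken(int a, int b)
{
    int result;

    if (a == b)               /* cmpl %eax, %ebx followed by je true_branch */
        goto true_branch;

    /* false_branch: reached by falling through when a != b */
    result = 0;
    goto reconverge;          /* jmp reconverge */

true_branch:
    result = 1;

reconverge:
    return result;
}
```

The jump over the true branch is what keeps the false branch from running both bodies.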

Variables and Assignment

Global and static variables are declared using .data or .bss entries. Local variables are declared by reserving space on the stack at the beginning of the function.

For example, consider the following C code:

int my_global_var;

int foo() {

        int my_local_var;
        my_local_var = 1;
        my_global_var = 2;
        return 0;
}

This would be rendered in assembly language as:

.section .bss
.lcomm my_global_var, 4              # Reserve 4 zeroed bytes for the global

.section .text
.type foo, @function

foo:
        pushl %ebp
        movl %esp, %ebp
        subl $4, %esp                # Make room for my_local_var
        .equ my_local_var, -4        # Can now use my_local_var to
                                     # find the local variable

        movl $1, my_local_var(%ebp)
        movl $2, my_global_var

        movl $0, %eax                # return 0

        movl %ebp, %esp
        popl %ebp
        ret

In the C programming language, after the compiler loads a value into a register, that value will likely stay in that register until that register is needed for something else.

Loops

In C, a while loop looks like this:

while(a < b)
{
        /* Do stuff here */
}
/* Finished Looping */

This can be rendered in assembly language like this:

loop_begin:
        movl a, %eax
        movl b, %ebx
        cmpl %eax, %ebx          # Computes b - a (AT&T operand order)
        jle loop_end             # Exit when b <= a, i.e. when a < b is false

loop_body:
        # Do stuff here
        jmp loop_begin

loop_end:
        # Finished looping
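
Written with goto in C, the structure is a test at the top that jumps out when the while condition is false, and an unconditional jump back at the bottom. The function below is only a sketch: count_iterations is a made-up name, and incrementing a stands in for whatever the loop body does.

```c
/* Counts how many times the body of while (a < b) runs when the body
 * increments a once per pass. */
int count_iterations(int a, int b)
{
    int iterations = 0;

loop_begin:
    if (a >= b)               /* exit test: the negation of (a < b) */
        goto loop_end;

    /* loop_body */
    iterations++;
    a++;                      /* the body must change a or b to terminate */
    goto loop_begin;          /* jmp loop_begin */

loop_end:
    return iterations;
}
```

Note that the exit test is the negation of the C condition: the jump leaves the loop when a >= b.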

The x86 assembly language has some direct support for looping as well. The %ecx register can be used as a counter that ends with zero. The loop instruction decrements %ecx and jumps to the specified address unless %ecx has reached zero.

For example, if you wanted to execute a statement 100 times, you would do this in C:

for(i=0; i < 100; i++) {
        /* Do process here */
}

In assembly language it would be written like this:

loop_initialize:
        movl $100, %ecx

loop_begin:
        #
        # Do Process Here
        #

        # Decrement %ecx and loop if not zero
        loop loop_begin

rest_of_program:
# Continues on to here

One thing to notice is that the loop instruction requires you to be counting backwards to zero.
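
In C terms the loop instruction behaves like a do/while: the body runs first, then the counter is decremented and tested. The sketch below uses a made-up function name, run_loop, and counts how often the body executes for a given starting %ecx.

```c
#include <stdint.h>

/* Returns how many times the body runs for a given starting count. */
uint64_t run_loop(uint32_t ecx)
{
    uint64_t executions = 0;

    do {
        executions++;         /* "Do Process Here" */
        ecx--;                /* loop: decrement %ecx ... */
    } while (ecx != 0);       /* ... and jump back unless it is now zero */

    return executions;
}
```

Because the decrement happens before the test, a starting count of 0 wraps around to 0xFFFFFFFF and the body runs 2^32 times rather than not at all, so the counter should be checked before entering the loop if it can be zero.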

Pointers

Pointers are very easy. Remember, pointers are simply the address that a value resides at. Let’s start by taking a look at global variables. For example:

int global_data = 30;

In assembly language, this would be:

.section .data
global_data:
        .long 30

Taking the address of this data in C:

a = &global_data;

Taking the address of this data in assembly language:

movl $global_data, %eax

Example code:

.section .data

secret_num:
    .long 0xBEEF

print_decorator:
    .ascii "%X\n\0"


.section .text

.globl _start

_start:

    # Print Value
    pushl secret_num
    pushl $print_decorator
    call printf

    addl $8, %esp


    # Print Address
    pushl $secret_num
    pushl $print_decorator
    call printf

    addl $8, %esp


    pushl $0
    call exit

Output:

BEEF
804B014

You see, with assembly language, you are almost always accessing memory through pointers. That’s what direct addressing is. To get the pointer itself, you just have to go with immediate mode addressing.

Local variables are a little more difficult, but not much. Here is how you take the address of a local variable in C:

void foo()
{
        int a;
        int *b;
        a = 30;
        b = &a;
        *b = 44;
}

The same code in assembly language:

foo:

#Standard opening
pushl %ebp
movl %esp, %ebp

#Reserve two words of memory
subl $8, %esp

.equ A_VAR, -4
.equ B_VAR, -8

#a = 30
movl $30, A_VAR(%ebp)

#b = &a
movl $A_VAR, B_VAR(%ebp)
addl %ebp, B_VAR(%ebp)

#*b = 44
movl B_VAR(%ebp), %eax
movl $44, (%eax)

#Standard closing
movl %ebp, %esp
popl %ebp

ret

LEAL

As you can see, to take the address of a local variable, the address has to be computed the same way the computer computes the addresses in base pointer addressing. There is an easier way: the processor provides the instruction leal, which stands for "load effective address". This lets the computer compute the address, and then load it wherever you want. So, we could just say:

# b = &a
leal A_VAR(%ebp), %eax
movl %eax, B_VAR(%ebp)

It’s the same number of lines, but a little cleaner. Then, to use this value, you simply have to move it to a general-purpose register and use indirect addressing, as shown in the example above.
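
What leal computes can be modelled in C as plain pointer arithmetic: base plus displacement, with no memory access. This is a loose sketch with invented names (lea, foo_model). The pointer b is kept in an ordinary C variable rather than in the 4-byte B_VAR slot, because a pointer on a 64-bit host would not fit there.

```c
#include <stdint.h>

enum { A_VAR = -4 };          /* same offset as the .equ line */

/* Model of leal disp(%base), %dst: compute the address, read nothing. */
static char *lea(char *base, int displacement)
{
    return base + displacement;
}

int32_t foo_model(void)
{
    int32_t frame[2] = { 0, 0 };         /* the 8 bytes from subl $8, %esp */
    char *ebp = (char *)(frame + 2);     /* %ebp points just above the locals */
    int32_t *b;

    frame[1] = 30;                       /* a = 30, stored at A_VAR(%ebp) */
    b = (int32_t *)lea(ebp, A_VAR);      /* b = &a */
    *b = 44;                             /* *b = 44 */
    return frame[1];                     /* value of a after the store */
}
```

Calling foo_model() returns 44, since the store through b lands in the slot that holds a.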