December 9, 2015
This article explains some of the more important syntactic and semantic differences between two assembler styles: Intel and AT&T. The AT&T style is used by the GNU Assembler (GAS), and the Intel style is used by the Netwide Assembler (NASM).
Although the goal of this article is specifically to show the differences in syntax, the source code provided here has been tested on a Linux machine using the corresponding assembler. The article is meant to help you convert more easily from one flavor of assembler to the other.
Building the Program
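Whenever an example includes source code, you can build it with the commands below: assemble it with the assembler that matches its syntax, then link the resulting object file.
ELF (Executable and Linkable Format) is the executable format used by Linux. The examples are 32-bit code (nasm -f elf produces a 32-bit ELF object), so on a 64-bit system also pass --32 to as and -m elf_i386 to ld.
Assembling Intel-syntax source with NASM:
nasm -f elf -o program.o program.asm
Assembling AT&T-syntax source with GAS:
as -o program.o program.s
Linking:
ld -o program program.o
Linking when an external C library is used: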
ld --dynamic-linker /lib/ld-linux.so.2 -lc -o program program.o
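At minimum, a program consists of a code section, called the text section, which contains the program entry point _start (the symbol ld uses as the entry point by default). Data sections such as .data and .bss are added when the program needs them, and the stack is set up by the operating system when the program starts, so it needs no section of its own.
; Intel format
; Text segment begins
section .text
global _start
; Program entry point
_start:
; code is written here
# AT&T format
# Text segment begins
.section .text
.globl _start
# Program entry point
_start:
# code is written here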
1. Operand Order
One of the most noticeable differences between the AT&T and Intel formats is the order in which source and destination operands appear within an instruction: the two formats swap them.
In Intel format, the mnemonic is followed by the destination operand, a comma, and then the source operand (the first operand is the destination). In AT&T format the roles are reversed: the source comes before the comma and the destination after it (the first operand is the source).
; Intel format
PUSH EBP
MOV EBP, ESP
SUB ESP, 48h
CMP DWORD PTR [EBP+4], 10
# AT&T format
pushl %ebp
movl %esp, %ebp
subl $0x48, %esp
cmpl $10, 4(%ebp)
At first glance, the AT&T order may seem more natural, since we read and write from left to right. However, it leads to some flaws and inconsistencies, which are not discussed here.
The AT&T source-first order comes from the VAX assembly format, for which AT&T syntax was originally designed. The Motorola 68000 and its descendants were heavily influenced by the VAX, and their assembly language format moves in the same direction.
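The operand swap described above can be sketched in a few lines of Python. This is a minimal illustration, assuming simple instructions written as a mnemonic followed by comma-separated operands; split_operands and swap_operands are hypothetical helper names, and they only reverse the operand order, leaving register prefixes, size suffixes, and addressing syntax untouched.

```python
# Minimal sketch: reverse operand order between Intel and AT&T style.
# Assumes "MNEMONIC op1, op2, ..." input; prefixes and suffixes are NOT converted.


def split_operands(text: str) -> list[str]:
    """Split on top-level commas, ignoring commas inside (...) or [...],
    so an AT&T operand such as 0x8(%edx,%ebx,4) stays in one piece."""
    operands, depth, current = [], 0, ""
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        operands.append(current.strip())
    return operands


def swap_operands(instruction: str) -> str:
    """Return the instruction with its operand list reversed."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    operands = split_operands(rest)
    if not operands:
        return mnemonic  # e.g. RET has nothing to swap
    return f"{mnemonic} {', '.join(reversed(operands))}"
```

For example, swap_operands("MOV EBP, ESP") returns "MOV ESP, EBP". A full conversion also has to apply the register, immediate, and size conventions described in the next sections.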
2.1 Value & Register Naming
AT&T syntax is known for its heavy use of prefixes: registers are prefixed with ‘%’ and immediate values with ‘$’. Intel syntax, on the other hand, writes registers and immediate values without any prefix.
Although Intel syntax uses no register prefix, it does use suffixes to distinguish hexadecimal and binary numbers from decimal ones: ‘h’ marks a hexadecimal number and ‘b’ a binary number. A hexadecimal number that would otherwise begin with a letter also needs a leading 0 (0FFh rather than FFh), while AT&T writes hexadecimal numbers with a 0x prefix (0xFF).
; Intel format
MOV EAX, 1
MOV EBX, 0FFh
INT 80h
# AT&T format
movl $1, %eax
movl $0xFF, %ebx
int $0x80
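To make these naming rules concrete, here is a small Python sketch that rewrites a single Intel register or numeric operand in AT&T notation. It assumes the operand is one of the general-purpose registers in the illustrative (and deliberately incomplete) REGISTERS set, or a number written with the Intel ‘h’ or ‘b’ suffix or in plain decimal. intel_to_att_operand is a hypothetical name; for binary output it relies on GAS accepting a 0b prefix.

```python
# Minimal sketch: Intel register/immediate operand -> AT&T notation.
# REGISTERS is an illustrative subset; memory operands are not handled.

REGISTERS = {
    "EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP",
    "AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP",
    "AL", "AH", "BL", "BH", "CL", "CH", "DL", "DH",
}


def intel_to_att_operand(token: str) -> str:
    token = token.strip()
    # Check registers first: names like BH or DH also end in 'h'.
    if token.upper() in REGISTERS:
        return "%" + token.lower()
    if token[-1] in "hH":
        # Hexadecimal: 0FFh -> $0xFF (the Intel leading 0 is not needed).
        return f"$0x{int(token[:-1], 16):X}"
    if token[-1] in "bB" and set(token[:-1]) <= {"0", "1"}:
        # Binary: 101b -> $0b101
        return f"$0b{int(token[:-1], 2):b}"
    # Plain decimal: 1 -> $1
    return f"${int(token, 10)}"
```

With these rules, the operands of MOV EBX, 0FFh become %ebx and $0xFF, and the operand of INT 80h becomes $0x80, matching the AT&T listing above.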
2.2 Instruction Naming
AT&T format uses slightly different instruction names than Intel format. In keeping with VAX and Motorola tradition, an AT&T mnemonic carries a suffix that describes the size of the data it operates on. Intel format instead states the size of a memory operand with the ‘BYTE PTR’, ‘WORD PTR’, and ‘DWORD PTR’ prefixes (NASM writes these without the PTR keyword, as BYTE [EBX], WORD [EBX], and so on).
; Intel format
MOVZX EAX, BYTE PTR [ESI+5]
SUB EAX, 30h
DEC WORD PTR [EBX]
INC CX
CMP AL, 5
# AT&T format
movzbl 0x5(%esi), %eax
subl $0x30, %eax
decw (%ebx)
incw %cx
cmpb $0x5, %al
In AT&T format, the instruction suffixes are “b” for byte operations (8 bits), “w” for word operations (16 bits), “l” for double-word operations (32 bits), and “q” for quad-word operations (64 bits). As the ‘movzbl’ (move with zero-extend) instruction shows, more than one suffix letter is used when an instruction’s source and destination operands differ in size: the first letter describes the source operand and the second describes the destination.
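The suffix rules fit naturally into two lookup tables. The following Python sketch is illustrative only: it assumes operand sizes are known in bits, and size_from_operand, att_mnemonic, and att_movzx are hypothetical helper names. It covers the size suffixes alone, not the translation of the operands themselves.

```python
from typing import Optional

# AT&T size suffix for each operand size, in bits.
SUFFIX = {8: "b", 16: "w", 32: "l", 64: "q"}

# Operand size named by the first word of a sized Intel memory operand
# (works with "WORD PTR [EBX]" as well as NASM-style "WORD [EBX]").
PTR_BITS = {"BYTE": 8, "WORD": 16, "DWORD": 32, "QWORD": 64}


def size_from_operand(operand: str) -> Optional[int]:
    """Return the size set by a BYTE/WORD/DWORD/QWORD directive, or None."""
    words = operand.split()
    return PTR_BITS.get(words[0].upper()) if words else None


def att_mnemonic(intel_mnemonic: str, bits: int) -> str:
    """One suffix for one operand size: DEC on a word -> 'decw'."""
    return intel_mnemonic.lower() + SUFFIX[bits]


def att_movzx(src_bits: int, dst_bits: int) -> str:
    """MOVZX needs two suffixes: source size first, destination size second."""
    return "movz" + SUFFIX[src_bits] + SUFFIX[dst_bits]
```

For the examples above, att_mnemonic("DEC", size_from_operand("WORD PTR [EBX]")) gives "decw", and att_movzx(8, 32) gives "movzbl".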
3. Memory Addressing
In Intel format, a memory operand is an address calculation enclosed in ‘[’ and ‘]’. AT&T format, on the other hand, encloses the registers in ‘(’ and ‘)’ and writes any offset in front of the parentheses.
3.1 Indirect Addressing Mode
Recall that x86 assembly language provides five indirect addressing modes for memory operands:
- immediate indirect
- register indirect
- base register + offset indirect
- index register * width + offset indirect
- base register + index register * width + offset indirect.
; Intel format
; Immediate
MOV EAX, [0100h]
; Register
MOV EAX, [ESI]
; Register + Offset
MOV EAX, [EBP-8]
; Register * Width + Offset
MOV EAX, [EBX*4 + 0100h]
; Base + Register*Width + Offset
MOV EAX, [EDX + EBX*4 + 8]
# AT&T format
# Immediate
movl 0x0100, %eax
# Register
movl (%esi), %eax
# Register + Offset
movl -8(%ebp), %eax
# Register * Width + Offset
movl 0x100(,%ebx,4), %eax
# Base + Register*Width + Offset
movl 0x8(%edx,%ebx,4), %eax
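All five modes compute the same kind of address, BASE + INDEX*WIDTH + OFFSET, with a missing BASE, INDEX, or OFFSET taken as zero. The Python sketch below evaluates the examples above for a set of made-up register values; the contents of regs are assumptions chosen only to make the arithmetic visible, and effective_address is a hypothetical helper that also wraps the sum to 32 bits, as 32-bit addressing does.

```python
# Hypothetical register contents, chosen only for illustration.
regs = {"ESI": 0x2000, "EBP": 0x3000, "EBX": 3, "EDX": 0x1000}


def effective_address(base: int = 0, index: int = 0, width: int = 1,
                      offset: int = 0) -> int:
    """BASE + INDEX*WIDTH + OFFSET; missing parts are 0 (WIDTH defaults to 1)."""
    if width not in (1, 2, 4, 8):
        raise ValueError("WIDTH must be 1, 2, 4 or 8")
    # 32-bit addressing: the sum wraps around at 2**32.
    return (base + index * width + offset) & 0xFFFFFFFF


# Immediate: [0100h] / 0x0100
print(hex(effective_address(offset=0x100)))                              # 0x100
# Register: [ESI] / (%esi)
print(hex(effective_address(base=regs["ESI"])))                          # 0x2000
# Register + Offset: [EBP-8] / -8(%ebp)
print(hex(effective_address(base=regs["EBP"], offset=-8)))               # 0x2ff8
# Register * Width + Offset: [EBX*4 + 0100h] / 0x100(,%ebx,4)
print(hex(effective_address(index=regs["EBX"], width=4, offset=0x100)))  # 0x10c
# Base + Register*Width + Offset: [EDX + EBX*4 + 8] / 0x8(%edx,%ebx,4)
print(hex(effective_address(base=regs["EDX"], index=regs["EBX"],
                            width=4, offset=8)))                         # 0x1014
```

Seeing every mode as one formula with optional parts is what lets both syntaxes share a single general form.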
In AT&T format, an indirect memory operand takes the general form “OFFSET(BASE, INDEX, WIDTH)”. OFFSET, if present, must be a constant. BASE and INDEX, if present, must be registers. WIDTH, if present, applies to the INDEX register and must be the constant 1, 2, 4, or 8; if it is omitted, it defaults to 1.
In Intel format, an indirect address is written as a plain arithmetic expression, “[BASE + INDEX*WIDTH + OFFSET]”, using the same terms as the AT&T form.
In AT&T format, an immediate (absolute) address is written simply as an OFFSET with no BASE, INDEX, or WIDTH. It carries no special prefix or suffix characters; because it lacks the ‘$’ prefix, the assembler treats it as a memory address rather than an immediate value.
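Putting the general form to work, the following Python sketch translates an AT&T memory operand into the equivalent Intel bracket expression. It is a minimal illustration under stated assumptions: 32-bit register names, and an OFFSET that is a plain decimal or 0x-prefixed hexadecimal number (symbolic offsets and GAS octal constants such as 010 are not handled). att_mem_to_intel and intel_number are hypothetical helper names.

```python
def intel_number(value: int, as_hex: bool) -> str:
    """Format a non-negative number Intel-style: 255 -> '0FFh' or '255'."""
    if not as_hex:
        return str(value)
    text = f"{value:X}h"
    # A hex constant must not start with a letter, or it reads as a name.
    return "0" + text if text[0] in "ABCDEF" else text


def att_mem_to_intel(operand: str) -> str:
    """Translate OFFSET(BASE,INDEX,WIDTH) into [BASE+INDEX*WIDTH+OFFSET]."""
    operand = operand.replace(" ", "")
    offset_text, paren, inner = operand.partition("(")
    base = index = ""
    width = 1
    if paren:
        if not inner.endswith(")"):
            raise ValueError(f"missing ')' in {operand!r}")
        parts = inner[:-1].split(",")
        if len(parts) > 3:
            raise ValueError(f"too many fields in {operand!r}")
        parts += [""] * (3 - len(parts))   # missing fields are empty
        base = parts[0].lstrip("%").upper()
        index = parts[1].lstrip("%").upper()
        if parts[2]:
            width = int(parts[2])
        if width not in (1, 2, 4, 8):
            raise ValueError("WIDTH must be 1, 2, 4 or 8")

    terms = []
    if base:
        terms.append(base)
    if index:
        terms.append(index if width == 1 else f"{index}*{width}")
    expr = "+".join(terms)

    if offset_text:
        value = int(offset_text, 0)  # accepts 8, -8, 0x100, -0x8
        as_hex = offset_text.lstrip("-").lower().startswith("0x")
        number = intel_number(abs(value), as_hex)
        if expr:
            expr += ("-" if value < 0 else "+") + number
        else:
            expr = ("-" if value < 0 else "") + number
    return f"[{expr}]"
```

For example, att_mem_to_intel("0x8(%edx,%ebx,4)") returns "[EDX+EBX*4+8h]", and the immediate address 0x0100 becomes "[100h]". The sketch keeps the hexadecimal or decimal form of the input offset so the constant reads the same way after translation.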