Tag Archive : assembly

Gentle Introduction to AVR Assembly

December 11, 2015 | Article | No Comments

What is Assembly Language?

Assembly language is a low-level programming language for programmable devices. It is an alphanumeric representation of machine code. In contrast to high-level programming languages, which are generally portable across multiple systems, assembly language is specific to one architecture: there is a very strong correspondence between the language and the architecture’s machine code instructions.

Assembly consists of a list of instructions that are in no way comparable to anything you might know from C, Basic or Pascal. It has no control structures such as for or while, as high-level languages do.

Let’s look at the example below, a short piece of AVR assembly code. Each line is an instruction telling the microcontroller to carry out a task.

            ADD     R16, R17       ; Add the value in R17 to R16 (result in R16)
            DEC     R17            ; Subtract 1 from the value in R17
            MOV     R18, R16       ; Copy the value in R16 to R18
END:        JMP     END            ; Jump to the label END (endless loop)

As stated before, the instructions (mnemonics) are not general but specific. Each company provides a set of instructions for its microcontrollers. Also note that not all instructions are available on all microcontrollers, so we should consult the datasheet of each AVR microcontroller.
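
To make the effect of the example above concrete, here is a minimal Python sketch that mimics its first three instructions on a toy register file. The starting values 5 and 3 are assumptions chosen for illustration, status flags are ignored, and the final JMP (an endless loop) is left out.

```python
# Toy model of three AVR instructions; not a full simulator.
regs = {"R16": 5, "R17": 3, "R18": 0}   # assumed starting values

def add(rd, rr):
    """ADD Rd, Rr: Rd <- Rd + Rr, wrapped to 8 bits."""
    regs[rd] = (regs[rd] + regs[rr]) & 0xFF

def dec(rd):
    """DEC Rd: Rd <- Rd - 1, wrapped to 8 bits."""
    regs[rd] = (regs[rd] - 1) & 0xFF

def mov(rd, rr):
    """MOV Rd, Rr: copy the value of Rr into Rd."""
    regs[rd] = regs[rr]

add("R16", "R17")   # R16 = 5 + 3 = 8
dec("R17")          # R17 = 3 - 1 = 2
mov("R18", "R16")   # R18 = 8
print(regs)         # {'R16': 8, 'R17': 2, 'R18': 8}
```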

Why Assembly?

Assembler or other languages, that is the question. Why should I learn another language if I already know other programming languages? The best argument: while living in France we can get by speaking English, but we will never feel at home, and life remains complicated. We can get by like this, but it is rather inappropriate. When things are urgent, we should use the country’s language.

Many people who are at home in higher-level languages start diving into AVR assembly. Their reasons are often similar to those of people who come to x86 assembly, such as:

  • to know the architecture better
  • to analyze bugs
  • to use hardware features that aren’t supported by higher-level languages
  • to write time-critical code
  • just for fun

Knowing assembly language is necessary, e.g. to understand what the compiler of a higher-level language produced. Without understanding assembly language we have no chance to proceed further in these cases.

High Level Languages and Assembly

High-level languages insert additional nontransparent separation levels between the CPU and the source code. An example of such a nontransparent concept is the variable. Variables are storage locations that can hold a number, a text string or a single Boolean value. In the source code, a variable name represents the place where the variable is located and, through its declaration, its type (numbers and their format, strings and their length, etc.).

For learning assembler, just forget the high level language concept of variables. Assembler only knows bits, bytes, registers and SRAM bytes. The term “variable” has no meaning in assembler. Also, related terms like “type” are useless and do not make any sense here.

High-level languages require us to declare variables prior to their first use in the source code, e.g. as byte (8-bit), word (16-bit) or integer (15-bit plus 1 sign bit). Compilers for such languages place the declared variables somewhere in the available storage space, including the 32 registers. Whether the compiler selects this placement rather blindly or applies some priority rule, as a careful assembler programmer would, depends mostly on the price of the compiler. The programmer can only try to understand what the compiler “thought” when it placed the variable. The power to decide has been given to the compiler. That “relieves” the programmer from the trouble of that decision, but makes him a slave of the compiler.

The instruction “A = A + B” is now type-checked: if A is defined as a character and B as a number (e.g. 2), the statement isn’t accepted because character codes cannot be added to numbers. Programmers in high-level languages believe that this type check prevents them from programming nonsense. The protection that the compiler provides by prohibiting this type error is rather useless: adding 2 to the character “F” should of course yield an “H” as the result, what else? Assembler allows us to do that, but a compiler does not.
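
The arithmetic behind that example is easy to check. A small Python sketch, assuming the character is stored as its ASCII code: Python itself rejects the mixed-type addition, while the same operation on the raw byte value gives the expected letter.

```python
# Python, like the compilers described above, refuses to add a number to a character.
try:
    "F" + 2
    type_checked = False
except TypeError:
    type_checked = True

# On the byte level there is nothing to refuse: 'F' is simply the value 70.
code = ord("F")          # 70 (0x46) in ASCII
result = chr(code + 2)   # 72 (0x48)
print(result)            # H
```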

Assembler allows us to add or subtract numbers like 7 or 48 to or from any byte of storage, no matter what kind of thing that byte holds. What is in that storage is decided by the programmer, not by a compiler. Whether an operation on that content makes sense is likewise decided by the programmer, not by the compiler. Whether four registers represent a 32-bit value or four ASCII characters, and whether those four bytes are placed low-to-high, high-to-low or completely mixed, is up to the programmer. He is the master of placement, no one else. Types are unknown; everything consists of bits and bytes somewhere in the available storage space. The programmer has the task of organizing, but also the chance of optimizing.
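
As an illustration of that freedom, the Python sketch below reads the same four bytes in three ways: as ASCII text, as a 32-bit value stored low-to-high, and as a 32-bit value stored high-to-low. The byte values are assumptions picked so that they also spell readable text.

```python
# Four byte-sized "registers" with assumed contents.
regs = bytes([0x41, 0x56, 0x52, 0x21])

as_text = regs.decode("ascii")               # interpreted as four characters
low_first = int.from_bytes(regs, "little")   # first byte is the least significant
high_first = int.from_bytes(regs, "big")     # first byte is the most significant

print(as_text)           # AVR!
print(hex(low_first))    # 0x21525641
print(hex(high_first))   # 0x41565221
```

Nothing in the bytes themselves says which reading is the right one; that knowledge lives only in the program that uses them.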

All the other rules that limit the high-level programmer have a similar effect. It is always claimed that it is safer and clearer to program everything in subroutines, not to jump around in the code, to hand over variables as parameters, and to return results from functions. Forget most of those rules in assembler; they don’t make much sense. Good assembler programming requires some rules, too, but very different ones. And, best of all, most of them have to be created by yourself to help yourself. So: welcome to the world of freedom to do what we want, not what the compiler decides for us or what theoretical professors think would be good programming rules.

High-level programmers are addicted to a number of concepts that stand in the way of learning assembler: separation into different access levels, into hardware, drivers and other interfaces. In assembler this separation is complete nonsense; it would force us into numerous workarounds if we want to solve our problem in an optimal way.

Because most of the high-level programming rules don’t make sense here, and because even purist high-level programmers break their own rules whenever appropriate, see those rules as a nice cage that prevents us from being creative. Those questions don’t play a role here. Everything is direct access, every storage location is available at any time, nothing prevents access to hardware, anything can be changed, and anything can be corrupted. Responsibility rests with the programmer alone, who has to use his brain to avoid conflicts when accessing hardware.

The other side of missing protection mechanisms is the freedom to do everything at any time. So, cast off your ties to start learning assembler. We will develop our own ties later on to keep ourselves from running into errors.

What is Really Easier in Assembly?

All the words and concepts that the assembler programmer needs are in the datasheet of the processor: the instruction set and the port table. Done! With the words found there anything can be constructed. No other documents are necessary. How the timer is started (is writing “Timer.Start(8)” somehow easier to understand than “LDI R16,0x02” and “OUT TCCR0,R16”?), how the timer is restarted at zero (“CLR R16” and “OUT TCNT0,R16”), it is all in the datasheet. No need to consult more or less good documentation on how a compiler defines this or that. No special, compiler-designed words and concepts have to be learned here; all is in the datasheet. If we want to use a certain timer in the processor for a certain purpose in a certain mode of the 15 different possible modes, nothing stands in the way of accessing the timer, stopping and starting it, etc.

Is “A = A * B” in a high-level language really easier to write than “MUL R16,R17”? Not much. If A and B aren’t defined as bytes, or if the processor is a tiny type that doesn’t understand MUL, the simple MUL has to be replaced with some other source code, designed by the assembler programmer or copied, pasted and adapted to the needs. There is no reason to import a nontransparent library instead, just because you’re too lazy to start your brain and learn.
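
One common replacement for a missing MUL is shift-and-add multiplication. The Python sketch below shows the idea for two unsigned 8-bit values with a 16-bit result; the function name is made up for this illustration, and a real AVR routine would use shift and add instructions on registers instead.

```python
def mul8_shift_add(a, b):
    """Multiply two unsigned 8-bit values using only shifts and adds."""
    a &= 0xFF
    b &= 0xFF
    result = 0
    for _ in range(8):       # one round per bit of the multiplier
        if b & 1:            # lowest multiplier bit set: add the shifted multiplicand
            result += a
        a <<= 1              # multiplicand moves one bit position up
        b >>= 1              # next multiplier bit moves into the lowest position
    return result & 0xFFFF   # an 8 x 8 bit product always fits in 16 bits

print(mul8_shift_add(13, 11))   # 143
```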

Assembler teaches us directly how the processor works. Because no compiler takes over our tasks, we are completely the master of the processor. As a reward for doing this work, we are granted full access to everything. If we want, we can program a baud rate of 45.45 bps on the UART, a speed setting that no Windows PC allows, because the operating system allows only multiples of 75 (Why? Because some historic mechanical teletype writers had special mechanical gear boxes allowing quick selection of either 75 or 300 bps.). If, in addition, we want one and a half stop bits instead of either 1 or 2, why not program our own serial device with assembler software? No reason to give things up.
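
How such an odd baud rate becomes a register value can be sketched in Python. This assumes the formula AVR datasheets give for the asynchronous normal-speed mode of devices with a baud rate register, UBRR = f_osc / (16 × baud) − 1, and an assumed clock of 1 MHz; both function names are made up for this illustration.

```python
def ubrr_for(baud, f_cpu):
    """Nearest baud rate register value for the requested baud rate."""
    return round(f_cpu / (16 * baud) - 1)

def actual_baud(ubrr, f_cpu):
    """Baud rate the hardware really produces for a given register value."""
    return f_cpu / (16 * (ubrr + 1))

F_CPU = 1_000_000                  # assumed clock frequency in Hz
ubrr = ubrr_for(45.45, F_CPU)
print(ubrr)                                  # 1374
print(round(actual_baud(ubrr, F_CPU), 2))    # 45.45
```

Whether the resulting value fits into the baud rate register depends on the device and the clock, so the datasheet remains the place to check.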

The Instruction Set

An instruction set, or instruction set architecture (ISA), is the part of a computer architecture related to programming: the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. In simple words, it is the set of instructions we use to program the device.

Every instruction is actually a number. These numbers are stored in flash memory; when the chip is powered, the CPU starts to fetch instructions from that storage and executes what each instruction asks for. The instructions are 16-bit numbers (a few, such as JMP, occupy two 16-bit words). There is no need for us to learn those binary codes and patterns, because in assembler they are mapped to human-readable instructions. These so-called mnemonics are machine instructions represented as human-readable text.

Still, the CPU only understands instructions expressed as 16-bit words. The mnemonics merely represent those instructions.
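
To show what “every instruction is a number” means, the Python sketch below builds the 16-bit word for ADD Rd, Rr. It assumes the bit pattern 0000 11rd dddd rrrr given for ADD in the AVR instruction set manual; the helper name is made up for this illustration.

```python
def encode_add(d, r):
    """Return the 16-bit opcode word for ADD Rd, Rr (pattern 0000 11rd dddd rrrr)."""
    if not (0 <= d <= 31 and 0 <= r <= 31):
        raise ValueError("register numbers must be in the range 0..31")
    return (
        0x0C00                  # fixed opcode bits 0000 11.. .... ....
        | ((r & 0x10) << 5)     # highest bit of r goes to bit 9
        | (d << 4)              # the five bits of d go to bits 8..4
        | (r & 0x0F)            # the low four bits of r go to bits 3..0
    )

print(f"{encode_add(16, 17):04X}")   # 0F01
```

So the mnemonic “ADD R16, R17” and the number 0x0F01 are two spellings of the same instruction.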

The instruction set can be divided into several categories:

  • Data handling and memory operations: set a register to a fixed value, move data from a memory location to a register or vice versa.
  • Arithmetic and logic operations: add, subtract, multiply, divide, bitwise operations, comparison.
  • Control flow operations: branch, conditional branch, function call.

You can see the complete AVR instruction set here: Download

The Directives

Apart from the microcontroller instruction set used in writing AVR assembly code, the AVR assembler supports assembler directives. Directives are instructions to the assembler itself and are not translated to machine code during the assembly process.

Programmers use assembler directives to adjust the location of the program in memory, to define macros and so forth.

Each assembler directive in an AVR assembly program must be preceded by a period “.”, as shown below where the directives INCLUDE and ORG are used.

.include "m8515def.inc"
.org $00

begin:     ADD R16, R17      ; Add R17 to R16
           MOV R17, R18      ; Copy R18 to R17

The previous article offered video tutorials for assembly programming on Linux; this one covers assembly on Windows.

Programming in assembly is not as easy as programming in a higher-level language. The main reason is that you won’t find syntax such as if, while, etc.

However, one of our kind-hearted friends from SecurityTube has made a very interesting video series for learning assembly. Here is the list of videos. Please bear in mind that I didn’t make these and I have no claim over them. As stated on another page, this site and NEST originally serve as personal documentation.

These videos are a good starting point for anyone who wants to dive deeper into the computer field, especially those interested in cracking, exploitation, etc.

The tutorial consists of nine modules. Some modules come with accompanying source code. You can either download each file individually or download everything as a pack. Note that this series is a continuation of the previous series, so please go through that one and know at least some assembly before continuing with this page.

Module 1 – Processor Mode

In this video, we will look at the different processor modes (Real, Protected, Virtual 8086, SMM, etc.), then we will understand the different memory models (Flat and Segmented) and how they apply to Real and Protected mode. We will then look at the key differences between the AT&T and Intel syntax for assembly. Once we have understood all these basics, we will code a “Hello World” program which will run in real mode, using 16-bit assembly, and assemble it using the Debug program which ships by default with Windows.

Download: EmbedUpload | MirrorCreator

Module 2 – Protected Mode Assembly

In this video, we will understand the basics of Protected mode operation and then look at Windows assembly basics. It is important to note that Windows runs in Protected mode. We then look at how to compile and link assembly language programs using MASM and LINK. We go on to create a HelloWorld program in 32-bit assembly.

Download: EmbedUpload | MirrorCreator

Module 3 – Win32 Assembly Using Masm32

In this video, you will be introduced to MASM32 and why its runtime library is a good choice. We will install and use MASM32 to program 3 examples in this video: Console mode HelloWorld, Windows HelloWorld and a simple program to print user input. These will teach you how to use the MASM32 library effectively.

Download: EmbedUpload | MirrorCreator

Files for this module: HelloMasm32.asm | HelloWindows.asm | Reflect.asm

Module 4 – Masm Data Types

In this video, we will look at different data types which can be defined in MASM – byte, word, dword, qword, fword, sbyte, sword etc. and also how to declare and initialize variables with them. Then we will learn about more complicated data types – arrays and strings. Finally, we will code a simple calculator program for addition and subtraction in assembly. The inputs will be read from the user.

Download: EmbedUpload | MirrorCreator

File for this Module: Numbers.asm

Module 5 – Procedures

In this video, we will understand how to write procedures in assembly language. The whole idea is to first create a prototype definition for the procedure, then define it and finally call it in code.

Download: EmbedUpload | MirrorCreator

Files for this Module: Concat.asm | InputProc.asm

Module 6 – Macros

In this video, we will understand how to write macros in assembly language. Most of you may have already used macros in high-level languages like C, and writing macros in assembly is very similar.

Download: EmbedUpload | MirrorCreator

File for this Module: MacroDemo.asm

Module 7 – Program Control Using Jmp

In this video, we will learn how to change the program flow using the JMP family of instructions, both conditional and unconditional.

Download: EmbedUpload | MirrorCreator

Files for this Module: StringReverse.asm | MultiPrint.asm

Module 8 – Decision Directives

In this video, we will learn how to use conditional statements such as If, Else and Elseif to write code in assembly.

Download: EmbedUpload | MirrorCreator

File for this Module: IfDemo.asm

Module 9 – Loops

In this video, we will learn how to use loops in assembly. We will touch upon the basic LOOP mnemonic, its variations such as LOOPE, LOOPZ etc. based on processor flags, along with WHILE and REPEAT loops. An interesting fact is the use of the IF statement with the BREAK and CONTINUE ones, to change how a loop executes.

Download: EmbedUpload | MirrorCreator

Files for this Module: LoopDemo.asm | WhileDemo.asm

This article explains some of the more important syntactic and semantic differences between two assembler styles: Intel and AT&T. The AT&T style is used by the GNU Assembler (GAS) and the Intel style by the Netwide Assembler (NASM).

Although the goal of this article is specifically to show the differences in syntax, the source code provided here has been tested on a Linux machine using the corresponding assembler. The article is written to help you convert more easily from one flavor of assembler to the other.

Building the Program

When source code is involved, you can use the following commands to build the example. Assemble and link it with the tools that match your assembler.

Assembling

ELF (Executable and Linkable Format) is the executable format used by Linux. The first command below is for NASM, the second for GAS.

nasm -f elf -o program.o program.asm
as -o program.o program.s

Linking

ld -o program program.o

Linking when an external C library is used

ld --dynamic-linker /lib/ld-linux.so.2 -lc -o program program.o

Basic Structure

A program needs at least a text (code) section with an entry point; data sections such as .data and .bss are added as required. The skeleton looks like this in both formats.

; Intel format

; Text segment begins
section .text

   global _start

; Program entry point
   _start:

; code is written here

# AT&T format

# Text segment begins
.section .text

   .globl _start

# Program entry point
   _start:

# code is written here

1. Operands Order

One of the most noticeable differences between the AT&T and Intel formats is the way they refer to source and destination operands within an instruction: the order of the source and destination operands is swapped.

Under the Intel format, the instruction is followed by the destination, a comma, and then the source operand (the first operand is the destination). Under the AT&T format these roles are reversed: the source comes before the comma and the destination after it (the first operand is the source).

; Intel format

PUSH EBP
MOV EBP, ESP
SUB ESP, 48h
CMP DWORD PTR [EBP+4], 10

# AT&T format

pushl %ebp
movl %esp, %ebp
subl $0x48, %esp
cmpl $10, 4(%ebp)

At first glance, the AT&T order may seem more natural, as we read and write from left to right. However, it leads to some flaws and inconsistencies (which won’t be discussed here).

The AT&T source-first operand order comes from the VAX assembly format, for which the AT&T syntax was originally invented. The Motorola 68000 and its descendants were heavily influenced by the VAX, so their assembly language format follows the same direction.
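
The operand swap described in this section can be sketched in a few lines of Python. This is a deliberately narrow illustration, not a converter: the hypothetical helper below only handles two-operand instructions whose operands are both 32-bit registers, which is why it can always append the “l” suffix.

```python
def intel_to_att_regs(line):
    """Rewrite 'MNEMONIC DEST, SRC' (two 32-bit registers) in AT&T operand order."""
    mnemonic, operands = line.split(None, 1)
    dest, src = [op.strip().lower() for op in operands.split(",")]
    # AT&T: size suffix on the mnemonic, '%' before registers, source first.
    return f"{mnemonic.lower()}l %{src}, %{dest}"

print(intel_to_att_regs("MOV EBP, ESP"))   # movl %esp, %ebp
```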

2. Naming

2.1 Value & Register Naming

AT&T is quite famous for its heavy use of prefixes. In AT&T, registers are prefixed with a ‘%’ and immediate values are prefixed with a ‘$’. Intel, on the other hand, uses plain names for registers and immediate values.

Intel syntax doesn’t use prefixes for registers. However, it uses a suffix to distinguish hexadecimal and binary numbers from decimal ones: the ‘h’ suffix marks a hexadecimal number and the ‘b’ suffix a binary number. Also, Intel puts a 0 in front of a hexadecimal number that starts with a letter (0FFh), while AT&T uses the 0x prefix (0xFF).

; Intel format

MOV EAX, 1
MOV EBX, 0FFh
INT 80h

# AT&T format

movl $1, %eax
movl $0xFF, %ebx
int $0x80
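
The literal rules of this section can be expressed as a small Python sketch. The helper name is made up for this illustration, and it only covers plain decimal, ‘h’-suffixed hexadecimal and ‘b’-suffixed binary literals used as immediate values.

```python
def intel_literal_to_att(text):
    """Convert an Intel-style numeric literal to an AT&T immediate operand."""
    t = text.strip().lower()
    if t.endswith("h"):                 # hexadecimal, e.g. 0FFh
        return f"$0x{int(t[:-1], 16):X}"
    if t.endswith("b"):                 # binary, e.g. 101b
        return f"$0b{int(t[:-1], 2):b}"
    return f"${int(t)}"                 # plain decimal

for literal in ("1", "0FFh", "80h", "101b"):
    print(literal, "->", intel_literal_to_att(literal))
# 1 -> $1
# 0FFh -> $0xFF
# 80h -> $0x80
# 101b -> $0b101
```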

2.2 Instruction Naming

The AT&T format uses slightly different instruction names than the Intel format. In keeping with VAX and Motorola traditions, instruction names include a suffix that describes the size of the data they operate on. Under the Intel format, the data size is normally given with the prefix phrases ‘BYTE PTR’, ‘WORD PTR’ and ‘DWORD PTR’.

; Intel format

MOVZX EAX, BYTE PTR [ESI+5]
SUB EAX, 30h
DEC WORD PTR [EBX]
INC CX
CMP AL, 5

# AT&T format

movzbl 0x5(%esi), %eax
subl $0x30, %eax
decw (%ebx)
incw %cx
cmpb $0x5, %al

Under the AT&T format, instruction suffixes are “b” for byte-size operations (8 bits), “w” for word-size operations (16 bits), “l” for double-word operations (32 bits), and “q” for quad-word operations (64 bits). As you may notice in the ‘movzbl’ (move with zero-extend) instruction, more than one suffix letter is used when an instruction’s source and destination operands differ in size. The first suffix letter describes the source operand while the second letter describes the destination.
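
The suffix rule can be written down as a small lookup in Python. The function name and its parameters are made up for this illustration; it builds the AT&T mnemonic from a base name and the operand sizes in bits.

```python
# Operand size in bits -> AT&T suffix letter, as listed above.
SUFFIX = {8: "b", 16: "w", 32: "l", 64: "q"}

def att_mnemonic(base, dest_bits, src_bits=None):
    """Append one suffix, or source then destination suffix if the sizes differ."""
    if src_bits is None or src_bits == dest_bits:
        return base + SUFFIX[dest_bits]
    return base + SUFFIX[src_bits] + SUFFIX[dest_bits]

print(att_mnemonic("dec", 16))        # decw
print(att_mnemonic("cmp", 8))         # cmpb
print(att_mnemonic("movz", 32, 8))    # movzbl
```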

3. Memory Addressing

Under the Intel format, a memory address is a simple address calculation enclosed in ‘[’ and ‘]’. The AT&T format, on the other hand, uses ‘(’ and ‘)’ to enclose it.

3.1 Indirect Addressing Mode

Recall that in x86 assembly language there are five indirect addressing modes available when writing an instruction:

  1. immediate indirect
  2. register indirect
  3. base register + offset indirect
  4. index register * width + offset indirect
  5. base register + index register * width + offset indirect

; Intel format

; Immediate
MOV EAX, [0100h]

; Register
MOV EAX, [ESI]

; Register + Offset
MOV EAX, [EBP-8]

; Register * Width + Offset
MOV EAX, [EBX*4 + 0100h]

; Base + Register*Width + Offset
MOV EAX, [EDX + EBX*4 + 8]

# AT&T format

# Immediate
movl 0x0100, %eax

# Register
movl (%esi), %eax

# Register + Offset
movl -8(%ebp), %eax

# Register * Width + Offset
movl 0x100(,%ebx,4), %eax

# Base + Register*Width + Offset
movl 0x8(%edx, %ebx,4), %eax

Under the AT&T format, indirect addressing modes are written in the general form “OFFSET(BASE, INDEX, WIDTH)”. OFFSET, if present, must be a constant integer. BASE and INDEX, if present, must be registers. WIDTH, if present, applies to the register named in INDEX and must be one of the constants 1, 2, 4 or 8. If WIDTH is not specified, the default of 1 is used.

Under Intel, indirect addressing is as simple as the math formula “[BASE + INDEX*WIDTH + OFFSET]”, using the same terminology as for the AT&T format.

Under AT&T, an immediate address is written simply as an OFFSET with the BASE, INDEX, and WIDTH parameters left out. It is written by itself, with no special prefix or suffix characters.
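
Both notations describe the same calculation, which the following Python sketch makes explicit. The two helper names and the register values used in the example are assumptions for illustration: the first function computes the address the CPU would access, the second formats the parts in the AT&T form described above.

```python
def effective_address(offset=0, base=0, index=0, width=1):
    """Address computed for OFFSET(BASE, INDEX, WIDTH) or [BASE + INDEX*WIDTH + OFFSET]."""
    if width not in (1, 2, 4, 8):
        raise ValueError("width must be 1, 2, 4 or 8")
    return offset + base + index * width

def att_operand(offset="", base="", index="", width=""):
    """Format the given parts as an AT&T memory operand, leaving out what is missing."""
    inner = base
    if index:
        inner += f",{index}"
        if width:
            inner += f",{width}"
    return f"{offset}({inner})" if inner else f"{offset}"

# Assumed register contents: EDX = 0x1000, EBX = 3
print(hex(effective_address(offset=8, base=0x1000, index=3, width=4)))   # 0x1014
print(att_operand("0x8", "%edx", "%ebx", 4))           # 0x8(%edx,%ebx,4)
print(att_operand("0x100", index="%ebx", width=4))     # 0x100(,%ebx,4)
print(att_operand("-8", "%ebp"))                       # -8(%ebp)
print(att_operand("0x0100"))                           # 0x0100
```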

Assembly Primer for Hackers – Video Tutorial

December 9, 2015 | Article | No Comments

Programming in assembly is not as easy as programming in a higher-level language. The main reason is that you won’t find syntax such as if, while, etc.

However, one of our kind-hearted friends from SecurityTube has made a very interesting video series for learning assembly. Here is the list of videos. Please bear in mind that I didn’t make these and I have no claim over them. As stated on another page, this site and NEST originally serve as personal documentation.

These videos are a good starting point for anyone who wants to dive deeper into the computer field, especially those interested in cracking, exploitation, etc.

The tutorial consists of eleven modules. Some modules come with accompanying source code. You can either download each file individually or download everything as a pack.

Module 1 – System Organization

Assembly language is probably the most important thing one needs to master if he desires to enter the world of code exploitation, virus writing and reverse engineering. In this multi-part video series I will try to provide a simple primer to Assembly language which will help you get started. These videos are in no way meant to be exhaustive but rather will only act as a guide on how to begin.

In this first part, I explain the basics of computer organization and the CPU registers: general purpose, segment and instruction pointer. Also covered are virtual memory organization, program memory organization, the program stack and stack operations.

Download: EmbedUpload | MirrorCreator

Module 2 – Virtual Memory Organization

In this video we take an in-depth look at virtual memory organization concepts. The entire discussion is explained by taking a live example using the SimpleDemo.c code. We look at how one can use /proc/PID/maps to peek into the layout of a program’s virtual memory and interpret useful things. Also, we show how Address Space Layout Randomization (ASLR) works in the latest 2.6 kernels and why this is significant from a security point of view. We also show how this can be disabled at runtime if need be. This video is very important from a code exploitation perspective as it teaches us how to check for the presence of ASLR on a given system.

Download: EmbedUpload | MirrorCreator

Module 3 – Gdb Usage

GDB (GNU Debugger) is probably one of the most important tools one needs to be familiar with in order to be a good assembly language programmer. In this video we go through a quick primer on how to use GDB to disassemble code, set breakpoints, trace through code, examine CPU registers and memory locations, examine the program stack and many other important use cases which will help us in later videos when we actually start coding in Assembly and want to debug our code.

Download: EmbedUpload | MirrorCreator

File for this Module: SimpleDemo.c

Module 4 – Hello World

In this video we will look at the structure of assembly language programs (the .data, .bss and .text segments), how to pass arguments to Linux system calls in assembly, and how to use GAS and LD to assemble and link code. Finally, we go through a step-by-step approach to create our first “Hello World” program.

Download: EmbedUpload | MirrorCreator

Files for this Module: JustExit.s | HelloWorldProgram.s

Module 5 – Data Types

In this video we will go through an in-depth primer on data types which are used in assembly. We do a live demo on how to look at data in memory using GDB for .ascii, .int, .short, .float (.data) and .comm, .lcomm (.bss) types.

Download: EmbedUpload | MirrorCreator

File for this Module: VariableDemo.s

Module 6 – Moving Data

In this video we look at how to transfer data between registers and memory locations using the MOV series of instructions. We discuss data transfer between registers, immediate values and registers, memory locations and registers, immediate values and memory locations, indexed memory addressing schemes, indirect addressing using registers and many other important concepts. It is important to note that all the above are explained in detail using example code in the video.

Download: EmbedUpload | MirrorCreator

File for this Module: MovDemo.s

Module 7 – Working with Strings

In this video we will look at how to work with strings in Assembly. We will demonstrate how we can move strings from one memory location to the other using the MOVS instruction set, discuss the concept of the Direction Flag (DF) and how to set and clear it using STD and CLD, how to execute multiple string copy instructions using the REP instruction, how to load strings from memory into the EAX register using the LODS instruction set, how to store strings from the EAX register back into memory using the STOS instruction set and finally we shall look at how to compare strings using the CMPS instruction set.

Download: EmbedUpload | MirrorCreator

Module 8 – Unconditional Branching

In this video we will look at how to alter the program execution flow using unconditional branching. We will look at how to use the JMP instruction to branch unconditionally to a new location in the code segment and how to use the CALL statement in conjunction with RET to save the program execution state. We will demonstrate all the concepts using very simple code snippets to aid understanding.

Download: EmbedUpload | MirrorCreator

File for this Module: UnconditionalBranching.s

Module 9 – Conditional Branching

In this video we will look at Conditional Branching in Assembly Language using the JXX family of instructions and the LOOP instruction.

The conditional jump instructions such as JA, JAE, JZ, JNZ etc. use various flags in the EFLAGS register such as the Zero Flag (ZF), the Parity Flag (PF), the Overflow Flag (OF), the Sign Flag (SF) etc. to determine which instruction path to take next. In this video we will look at the JZ conditional jump instruction in great detail. JZ uses the Zero Flag (ZF) to determine whether the last instruction produced a zero result, and jumps to a specified location if the flag is set. We will also look at the LOOP instruction, which uses the ECX register to loop over a set of instructions over and over again.

Download: EmbedUpload | MirrorCreator

File for this Module: ConditionalBranching.s

Module 10 – Functions

In this video we will look at how to write functions in Assembly Language.

The most important step in writing functions in assembly is to understand how to pass arguments to them and then read their return values. We will look at 2 techniques, using registers and using global memory locations, to understand how this can be done. In this demo we will use our familiar “Hello World” program to demonstrate how to code a simple function using the “write()” syscall.

We will use the Function.s program to demonstrate argument passing using the CPU registers and Function2.s to demo argument passing using a global memory location in the .BSS segment.

Download: EmbedUpload | MirrorCreator

Files for this Module: Function.s | Function2.s

Module 11 – Functions Stack

In this video, we will look at how to use the Stack to pass arguments to functions.

In the course of this video we will look into exactly how the Stack works, how to store arguments on the stack, how the “call” instruction stores the return address on the stack, the logic behind storing the EBP register on the stack, how and why EBP is used to reference function arguments and local variables in a function and how to adjust the ESP to accommodate all this. This video is very important as a lot of learning from this will be used in the Buffer overflow video series I plan to make next.

Download: EmbedUpload | MirrorCreator

File for this Module: Function3.s

Introduction to Reverse Engineering

December 5, 2015 | Article | No Comments

What is Reverse Engineering?

Some people ask me what reverse engineering means. Mostly, reverse engineering involves cracking a binary program, but it is not limited to that.

Reverse engineering is the process of taking a compiled binary and attempting to recreate (or at least understand) the original way the program works. Programs are written in higher-level languages such as C/C++, Visual Basic, Pascal, etc., which are understandable enough for humans (at least for programmers). But not for the machine: computers don’t speak these languages. They only know a language consisting of binary logic, 1 and 0: machine code. After programmers write their code, it is translated (compiled) into a machine-specific format. This code consists only of low-level instructions represented by hexadecimal numbers. Yes, it is not very human-friendly and often requires a great deal of brain power to figure out what the instructions mean.

So why do we do reverse engineering?

Reverse engineering is quite useful and can be applied to many areas of computer science. There are at least five categories:

    1. Making it possible to interface with legacy code (where we don’t have the original source code)
    2. Breaking protection
    3. Studying viruses and malware
    4. Evaluating software quality and robustness
    5. Adding functionality to existing software

The first category is reverse engineering code to interface with existing binaries when the source code is not available.

The second category (the most motivating reason) is breaking protection. This includes disabling time trials, disabling registration, and basically everything else done to get commercial software for free.

The third category is studying virus and malware code. Reverse engineering is required because few virus coders publish their source code or write instructions on how they wrote it. Information such as what the code is supposed to accomplish, and how, is hidden in the virus body.

The fourth category is evaluating software security and vulnerabilities. When creating a large application or system, reverse engineering is used to make sure that the system does not contain any major vulnerabilities or security flaws and, frankly, to make it as hard as possible for crackers to crack the software.

The final category is adding functionality to existing software. Don’t like the graphics used in your web design software? Change them. Want to add a menu item to encrypt your documents in your favorite word processor? Add it. Want to annoy your co-workers to no end by adding derogatory message boxes to Windows calculator?

So what knowledge do we require?

As you can probably guess, a great deal of knowledge is necessary to be an effective reverse engineer. Fortunately, a great deal of knowledge is not necessary to ‘begin’ reverse engineering. To have fun with reversing and to get something out of these tutorials, you should at least have a basic understanding of how program flow works (for example, you should know what a basic if…then statement does, what an array is, and have at least seen a hello world program). Secondly, becoming familiar with assembly language is highly recommended; you can get through the tutorials without it, but at some point you will want to become a master, or at least a guru, at ASM to really know what you are doing. In addition, a lot of your time will be devoted to learning how to use tools. These tools are invaluable to a reverse engineer, but each one’s shortcuts, flaws and idiosyncrasies also have to be learned. Finally, reverse engineering requires a significant amount of experimentation: playing with different packers/protectors/encryption schemes, learning about programs originally written in different programming languages (even Delphi), deciphering anti-reverse engineering tricks… the list goes on and on.

What I can say for sure is that a lot of reading and practice will help you.

What kinds of tools are used?

There are many different kinds of tools used in reversing. Many are specific to the types of protection that must be overcome to reverse a binary. There are also several that just make the reverser’s life easier. And then some are what I consider the ‘staple’ items, the ones you use regularly. For the most part, the tools fit into a few categories:

1. Disassemblers

Disassemblers attempt to take the machine language codes in the binary and display them in a friendlier format. They also extrapolate data such as function calls, passed variables and text strings. This makes the executable look more like human-readable code as opposed to a bunch of numbers strung together. There are many disassemblers out there, some of them specializing in certain things (such as binaries written in Delphi). Mostly it comes down to the one you’re most comfortable with. I invariably find myself working with IDA (a free version is available at http://www.hex-rays.com/), as well as a couple of lesser-known ones that help in specific cases.

2. Debuggers

Like a disassembler, a debugger shows the program’s instructions, but it also allows the reverse engineer to step through the code, running one instruction at a time and investigating the results. This is invaluable for discovering how a program works. Some debuggers also allow certain instructions in the code to be changed and then run again with these changes in place. Examples of debuggers are WinDbg and OllyDbg. I almost solely use OllyDbg (http://www.ollydbg.de/), unless debugging kernel-mode binaries, but we’ll get to that later.

3. Hex editors

Hex editors allow you to view the actual bytes in a binary, and change them. They also provide searching for specific bytes, saving sections of a binary to disk, and much more. There are many free hex editors out there, and most of them are fine. We won’t be using them a great deal in these tutorials, but sometimes they are invaluable.
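To make the idea concrete, here is a minimal sketch of the two core things a hex editor does: showing the raw bytes and searching for a byte pattern. The function names hex_dump and find_bytes and the sample bytes are made up for this illustration; a real hex editor offers far more.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a hex-editor style view: offset, hex bytes, ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


def find_bytes(data: bytes, pattern: bytes) -> list:
    """Return every offset at which pattern occurs in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


# Made-up sample bytes, only for demonstration.
sample = b"MZ\x90\x00Hello, reverser!\x00\xb0\x05"
print(hex_dump(sample))
print(find_bytes(sample, b"\xb0\x05"))
```

Changing a byte is then just a matter of copying the data into a bytearray, assigning to the offset that find_bytes returned, and writing the result back to disk.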

4. PE and resource viewers/editors

Every binary designed to run on a specific platform has a very specific section of data at the beginning that tells the operating system how to set up and initialize the program. It tells the OS how much memory the program will require, which support libraries it needs to borrow code from, information about dialog boxes, and so on. On Windows this is the header of the Portable Executable (PE) format, and every program designed to run on Windows needs to have one.

In the world of reverse engineering, this structure of bytes becomes very important, as it gives the reverser needed information about the binary. Eventually, you will want to (or need to) change this information, either to make the program do something different from what it was originally meant to do, or to change the program BACK into something it originally was (like before a protector made the code really hard to understand). There is a plethora of PE viewers and editors out there. I use CFF Explorer (http://www.ntcore.com/exsuite.php) and LordPE (http://www.woodmann.com/collaborative/tools/index.php/LordPE), but you can feel free to use whichever you’re comfortable with.
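As a rough sketch of what a PE viewer does first, the snippet below checks the two signatures of a PE file and reads two fields from its file header. The function name read_pe_summary and the hand-built test buffer are assumptions made for this example, and only two machine types are listed; real tools decode many more fields.

```python
import struct

# Machine values from the PE/COFF format; only two common ones are listed.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def read_pe_summary(data: bytes) -> dict:
    """Pull a few basic fields out of the start of a Windows PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The 4-byte little-endian value at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The file header follows the signature: machine type, then section count.
    machine, section_count = struct.unpack_from("<HH", data, pe_offset + 4)
    return {
        "pe_offset": pe_offset,
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
    }


# A hand-built, hypothetical buffer that contains just enough to be parsed.
fake = bytearray(0x80)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)          # PE signature is at 0x40
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", fake, 0x44, 0x014C, 3)    # 32-bit x86, 3 sections
print(read_pe_summary(bytes(fake)))
```

To try it on a real program, read the first few kilobytes of any .exe or .dll in binary mode and pass them to read_pe_summary.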

Most files also have resource sections. These include graphics, dialog items, menu items, icons and text strings. Sometimes you can have fun just by looking at (and altering 😛) the resource sections. I will show you an example at the end of this tutorial.

5. System Monitoring tools

When reversing programs, it is sometimes important (and when studying viruses and malware, of the utmost importance) to see what changes an application makes to the system. Are registry keys created or queried? Are .ini files created? Are separate processes created, perhaps to thwart reverse engineering of the application? Examples of system monitoring tools are Procmon, Regshot, and Process Hacker. We will discuss these later in the tutorial.
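One common approach to spotting such changes is to take a snapshot before running the program, take another one afterwards, and compare the two. The sketch below shows only the comparison step, assuming each snapshot is a plain dictionary that maps a name (a file path or a registry key) to its value. The function name diff_snapshots and the sample entries are hypothetical.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshots that map a name (path, registry key) to a value."""
    return {
        "created": sorted(k for k in after if k not in before),
        "deleted": sorted(k for k in before if k not in after),
        "changed": sorted(
            k for k in before if k in after and before[k] != after[k]
        ),
    }


# Hypothetical snapshots taken before and after running a program.
before = {r"HKCU\Software\Demo\Trial": "30", r"C:\demo\app.ini": "size=120"}
after = {
    r"HKCU\Software\Demo\Trial": "29",
    r"C:\demo\app.ini": "size=120",
    r"C:\demo\license.dat": "size=64",
}
print(diff_snapshots(before, after))
```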

6. Miscellaneous tools and information

There are tools we will pick up along the way, such as scripts, unpackers, packer identifiers etc. Also in this category is some sort of reference to the Windows API. This API is huge, and at times, complicated. It is extremely helpful in reverse engineering to know exactly what called functions are doing.

Interesting?

What is Assembly Programming Language?

December 3, 2015 | Article | No Comments

Assembly is considered one of the oldest programming languages. Learning assembly will improve our understanding of the machine. In this first article of the assembly tutorial we will discuss what assembly is and why we should learn it.

What is Assembly Language?

Strictly speaking, assembly language is a programming language whose statements correspond directly to the instructions of a computer. It is not a machine-independent language, because processors vary so much. In the following articles we will cover only the assembly language for the instruction set of IBM PC compatibles, known as the Intel x86 processor family.

An assembly language is a set of instructions specific to a particular computer system. To program in assembly language we need an assembler, a program that translates our source code into machine language. We won’t see as many human-language words as in C/C++, but rather mnemonics for basic instructions such as mov, mul, div, add, sub, and inc.

Assembly language is considered a low-level programming language, as it is the closest programming language to machine code. Because of its lack of abstraction, assembly is also considered a difficult language (but don’t worry, we will work through it).
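To give a feel for what an assembler does, here is a toy sketch that translates a tiny subset of x86 (nop, ret, and mov of an 8-bit constant into an 8-bit register) into machine code bytes. The function name assemble_line is made up for this example, and a real assembler handles vastly more than this.

```python
# Register numbers that x86 uses for its 8-bit registers.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

# Instructions without operands that encode to a single byte.
SIMPLE = {"nop": 0x90, "ret": 0xC3}


def assemble_line(line: str) -> bytes:
    """Translate one line of a tiny x86 subset into machine code bytes."""
    text = line.split(";")[0].strip().lower()   # drop the comment, if any
    parts = text.split(None, 1)
    if not parts:
        raise ValueError("empty line")
    mnemonic = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if mnemonic in SIMPLE and not rest:
        return bytes([SIMPLE[mnemonic]])
    if mnemonic == "mov":
        dest, _, value = rest.partition(",")
        dest, value = dest.strip(), value.strip()
        if dest in REG8:
            imm = int(value, 0)                 # accepts 5 as well as 0x05
            if 0 <= imm <= 0xFF:
                # "mov r8, imm8" is encoded as (0xB0 + register), then the value.
                return bytes([0xB0 + REG8[dest], imm])
    raise ValueError(f"unsupported instruction: {line!r}")


for source in ["mov al, 5", "mov bl, 0x10 ; load 16", "nop", "ret"]:
    print(f"{source:<24} -> {assemble_line(source).hex(' ')}")
```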

Why Assembly Language?

There are many programming languages out there, so why do we need to learn assembly? One reason is to learn computer architecture and gain a deeper understanding of what happens inside the machine. Another reason is that some programming languages have difficulty accessing certain machine-specific instructions. Assembly can also be used for better performance, as it is simple and the closest language to the machine.

Programming in assembly does not always mean using only assembly language. We can interface our assembly code with other programming languages. This approach is often used so that assembly handles the system-specific parts of a program while a higher-level language handles the rest. To do this we need to know how the higher-level language’s structures are translated into assembly (as every compiled language eventually is).

The Machine Language

An electronic computer only knows two values, 1 and 0, known as binary values. In machine language we instruct the CPU to work with registers or memory using commands in binary form. Machine language is built from sequences of 1s and 0s that are interpreted by the CPU. A CPU usually has a small program embedded on the chip, known as microcode, which translates machine instructions into hardware signals.

A machine language instruction has a defined layout. It contains an operation code, or opcode, which tells the CPU what the instruction will do, and it may be followed by operands. On x86 the length of an instruction is not fixed (it ranges from 1 to 15 bytes), but a simple instruction can take 16 bits: the first 8 bits are the opcode and the next 8 bits are the operand.

Let’s see this example:

1011000000000101

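The first 8 bits (10110000, or B0 in hexadecimal) are the opcode for a move instruction, which assigns a value to a specific place. Here the place is the register AL, which is selected by the lowest three bits of the opcode. The next 8 bits (00000101) are the operand: the value 5 that will fill AL. In assembly language this instruction is written as mov al, 5.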
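The decoding of this example can be sketched in a few lines. The function name decode_example is made up, and the sketch only understands this single 16-bit form of the x86 mov instruction (a constant moved into an 8-bit register); it is not a general disassembler.

```python
# 8-bit register names in the order of their x86 register numbers.
REG8_NAMES = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_example(bits: str) -> str:
    """Decode a 16-bit 'mov r8, imm8' instruction given as a bit string."""
    if len(bits) != 16:
        raise ValueError("expected exactly 16 bits")
    opcode = int(bits[:8], 2)     # first 8 bits
    operand = int(bits[8:], 2)    # last 8 bits
    # In this encoding the top five opcode bits are 10110 and the
    # bottom three bits select the destination register.
    if opcode >> 3 != 0b10110:
        raise ValueError("only the 'mov r8, imm8' form is handled")
    return f"mov {REG8_NAMES[opcode & 0b111]}, {operand}"


print(decode_example("1011000000000101"))  # prints "mov al, 5"
```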

A register is a small, high-speed storage location inside the CPU chip. In assembly language it is identified by a name of 2 or 3 characters, such as AH, AL, CX, or EBX.
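Some of these names refer to parts of the same register. On x86, AX is the low 16 bits of the 32-bit register EAX, and AH and AL are the high and low bytes of AX. The sketch below shows this overlap with plain bit operations; the function name sub_registers and the sample value are chosen only for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Show the smaller registers that overlap the 32-bit EAX register."""
    eax &= 0xFFFFFFFF                # keep only 32 bits
    return {
        "EAX": eax,
        "AX": eax & 0xFFFF,          # low 16 bits of EAX
        "AH": (eax >> 8) & 0xFF,     # high byte of AX
        "AL": eax & 0xFF,            # low byte of AX
    }


for name, value in sub_registers(0x12345678).items():
    print(f"{name:>3} = 0x{value:X}")
```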

An instruction set is the set of machine instructions that a processor can execute. The Intel processor family is backward compatible, which means a newer processor can understand instructions written for an earlier processor.

In the early days of computing, every program had to be written in machine language (that is, as a series of 1s and 0s). This was complicated and frustrating, so people invented assembly language to make the job easier.
