Diving Into Radare2

I’ve been improving my reverse engineering skills lately and decided to have a go at using radare2 after a recommendation on an IRC channel I frequent. After reading through some blog posts and the radare2 book (which is awesome, by the way) I decided to reverse a small shellcode using only r2 to see how easy it would be to get used to.

First I picked up a random shellcode from exploit-db and settled on this one which promised to contain some XOR encoding, which I figured would give me some semi-complicated operations to carry out using only r2. I would have to manually XOR some bytes and decompile the output to read the shellcode’s final payload, at least.

First lets compile the following payload:

char shellcode[]="\xeb\x1d\x5e\x48\x31\xc9\xb1\x31\x99\xb2\x90\x48\x31\xc0\x8a"
                 "\x06\x30\xd0\x48\xff\xcc\x88\x04\x24\x48\xff\xc6\xe2\xee\xff"
                 "\xd4\xe8\xde\xff\xff\xff\x95\x9f\xab\x50\x13\xd8\x76\x19\xd8"
                 "\xc7\xc6\xc2\x76\x19\xd8\xf9\xbd\xb4\x94\x57\xf6\xc0\xc0\x77"
                 "\x19\xd8\xf8\xe3\xbf\xfe\x94\xb4\xd4\x57\xf9\xf2\xbf\xbf\xb4"
                 "\x94\x57\xc0\xc0\x42\xa1\xd8\x50\xa1";
int main(int i,char *a[])
{
    (* (int(*)()) shellcode)();
}

With the following simple command:

➜  ~ gcc --version
gcc (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

➜  ~ gcc shellcode.c -fno-stack-protector -z execstack -o shellcode
➜  ~

Then open it with r2:

  ~ r2 shellcode
 -- See you in shell
[0x7fef42cacaf0]>

As can be seen by the r2 prompt, we are currently positioned at offset 0x7fef42cacaf0. Lets analyse the whole file and seek to the main subroutine:

[0x7fef42cacaf0]> aaa
[x] Analyse all flags starting with sym. and entry0 (aa)
[Cannot determine xref search boundariesr references (aar)
[x] Analyse len bytes of instructions for references (aar)
[Oops invalid rangen calls (aac)
[x] Analyse function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
[0x7fef42cacaf0]> s main
[0x004004ac]> pdf
            ;-- main:
/ (fcn) sym.main 29
|           ; var int local_10h @ rbp-0x10
|           ; var int local_4h @ rbp-0x4
|           ; DATA XREF from 0x004003bd (entry0)
|           0x004004ac      55             push rbp
|           0x004004ad      4889e5         mov rbp, rsp
|           0x004004b0      4883ec10       sub rsp, 0x10
|           0x004004b4      897dfc         mov dword [rbp - local_4h], edi
|           0x004004b7      488975f0       mov qword [rbp - local_10h], rsi
|           0x004004bb      baa0086000     mov edx, obj.shellcode
|           0x004004c0      b800000000     mov eax, 0
|           0x004004c5      ffd2           call rdx
|           0x004004c7      c9             leave
\           0x004004c8      c3             ret
[0x004004ac]>

We can see that r2 has analysed our bin and named the shellcode subroutine for us and that the program is putting the address of the shellcode into edx, then calling it. Lets take a look at the shellcode now.

[0x004004ac]> s obj.shellcode
[0x006008a0]> pdf
Cannot find function at 0x006008a0
[0x006008a0]>

Hmm.. looks like r2 hasn’t recognized this part of the code as a function, lets jump into visual disassembly mode by executing V to get into visual mode and pressing p to cycle to the disassembly view. You may need to jump back to this position by pressing o and then typing obj.shellcode, this is because when you first enter visual mode r2 will seek to the current instruction pointer.

[0x006008a0 256 ./shellcode]> pd $r @ obj.shellcode
       ,.-> ;-- shellcode:
            ; DATA XREF from 0x004004bb (sym.main)
       ,.-> 0x006008a0      eb1d           jmp 0x6008bf                ;[1]
       ||   ;-- str._H1__1:
       ||   0x006008a2     .string "^H1\x,9\x+11" ; len=7
       ||   0x006008a9      b290           mov dl, 0x90                ; 144
      .---> 0x006008ab      4831c0         xor rax, rax
      |||   0x006008ae      8a06           mov al, byte [rsi]
      |||   0x006008b0      30d0           xor al, dl
      |||   0x006008b2      48ffcc         dec rsp
      |||   0x006008b5      880424         mov byte [rsp], al
      |||   0x006008b8      48ffc6         inc rsi
      `===< 0x006008bb      e2ee           loop 0x6008ab               ;[2]
       ||   0x006008bd      ffd4           call rsp
       `--> 0x006008bf      e8deffffff     call str._H1__1             ;[3]
        |   0x006008c4      95             xchg eax, ebp
        |   0x006008c5      9f             lahf
        |   0x006008c6      ab             stosd dword [rdi], eax
        |   0x006008c7      50             push rax
        |   0x006008c8      13d8           adc ebx, eax
       ,==< 0x006008ca      7619           jbe 0x6008e5                ;[4]
       ||   0x006008cc      d8c7           fadd st(7)
       ||   0x006008ce      c6c276         mov dl, 0x76                ; 'v' ; 118
       ||   0x006008d1      19d8           sbb eax, ebx
       ||   0x006008d3      f9             stc
       ||   0x006008d4      bdb49457f6     mov ebp, 0xf65794b4
       ||   0x006008d9      c0c077         rol al, 0x77
       ||   0x006008dc      19d8           sbb eax, ebx
       ||   0x006008de      f8             clc
       |`=< 0x006008df      e3bf           jrcxz obj.shellcode         ;[5]
       |    0x006008e1      fe             invalid
       |    0x006008e2      94             xchg eax, esp
       |    0x006008e3      b4d4           mov ah, 0xd4                ; 212
       `--> 0x006008e5      57             push rdi
            0x006008e6      f9             stc
            0x006008e7      f2bfbfb49457   mov edi, 0x5794b4bf
            0x006008ed      c0c042         rol al, 0x42

After a quick look at this we can see that the shellcode jumps down to 0x6008bf which is an instruction to call.. a string? Looks like r2 has mistaken that line of code for data, but that’s okay because we can fix it. Lets scroll down to that section by pressing j once and mark it as code by pressing dc (a good mnemonic for this is define code). After doing that we get the following:

[0x006008a0 325 ./shellcode]> pd $r @ obj.shellcode
        ,=< ;-- shellcode:
            ; DATA XREF from 0x004004bb (sym.main)
        ,=< 0x006008a0      eb1d           jmp 0x6008bf                ;[1]
        |   ;-- str._H1__1:
        |   0x006008a2      5e             pop rsi
        |   0x006008a3      4831c9         xor rcx, rcx
        |   0x006008a6      b131           mov cl, 0x31                ; '1' ; 49
        |   0x006008a8      99             cdq
        |   0x006008a9      b290           mov dl, 0x90                ; 144
        |   0x006008ab      4831c0         xor rax, rax
        |   0x006008ae      8a06           mov al, byte [rsi]
        |   0x006008b0      30d0           xor al, dl
        |   0x006008b2      48ffcc         dec rsp
        |   0x006008b5      880424         mov byte [rsp], al
        |   0x006008b8      48ffc6         inc rsi
        |   0x006008bb      e2ee           loop 0x6008ab               ;[2]
        |   0x006008bd      ffd4           call rsp
        `-> 0x006008bf      e8deffffff     call str._H1__1             ;[3]
            0x006008c4      95             xchg eax, ebp
            0x006008c5      9f             lahf
            0x006008c6      ab             stosd dword [rdi], eax
            0x006008c7      50             push rax
            0x006008c8      13d8           adc ebx, eax
            0x006008ca      7619           jbe 0x6008e5                ;[4]
            0x006008cc      d8c7           fadd st(7)
            0x006008ce      c6c276         mov dl, 0x76                ; 'v' ; 118
            0x006008d1      19d8           sbb eax, ebx
            0x006008d3      f9             stc
            0x006008d4      bdb49457f6     mov ebp, 0xf65794b4
            0x006008d9      c0c077         rol al, 0x77
            0x006008dc      19d8           sbb eax, ebx
            0x006008de      f8             clc
            0x006008df      e3bf           jrcxz obj.shellcode         ;[5]
            0x006008e1      fe             invalid
            0x006008e2      94             xchg eax, esp
            0x006008e3      b4d4           mov ah, 0xd4                ; 212
            0x006008e5      57             push rdi
            0x006008e6      f9             stc
            0x006008e7      f2bfbfb49457   mov edi, 0x5794b4bf
            0x006008ed      c0c042         rol al, 0x42

Much better. Now lets take a look at what this code is doing, shall we?

We can see that after the call instruction from 0x6008bf is executed we are popping an address from the stack into rsi. The call instruction puts the next instruction’s address onto the stack, which means rsi is now pointing to 0x006008c4, which looks like a lot of junk code. Remember that this is an XOR’d shellcode so this is not surprising.

Next the code will zero out rcx by XORing it with itself and we set the counter (cl) to 0x31 and data (dl) to 0x90 and then zero out rax. This is all to set up a loop that will loop through 0x31 bytes of data starting at rsi, XORing each byte with the value 0x90 and pushing it onto the stack.

At 0x006008bd the execution is passed to the newly decoded instructions at rsp (the top of our stack). We need to somehow decode this ourselves so we can have a look at it, keeping in mind that the code was pushed onto the stack backwards.

We could take advantage of r2’s write mode by turning it on with e io.cache = true and then XOR the code with the wox command and analyse the output, but then we would also need to reverse the byte order (as the data is pushed onto the stack it will be backwards if we do it in the correct order) and we don’t really want to complicate things. For this we should take advantage of r2’s debugging abilities.

Lets quit r2 (press q until it closes completely) and reopen our shellcode in debug mode:

r2 -d shellcode

Then enter aaa to analyse the file again and then seek to our obj.shellcode flag and go through the same process of defining that string as code from visual mode. We should now be looking at the same screen as before.

Like in vim we can enter a command mode without exiting visual mode in r2 by pressing :. From here we can execute normal r2 commands without needing to jump back to the r2 shell. Because we are responsible people and we never run code that we haven’t read yet we will set a break point at the instruction to call out to the decoded instructions pointed to by rsp at address 0x006008bd by entering the following command:

db 0x006008bd

Now lets run the program by entering the dc command (debugger continue) and then seek to the location that rsp currently points to with s rsp. We should now be looking at the following:

[0x7ffcef34c437 325 ./shellcode]> pd $r @ rsp
            ;-- rsp:
            0x7ffcef34c437      90             nop
            0x7ffcef34c438      31c0           xor eax, eax
            0x7ffcef34c43a      4831d2         xor rdx, rdx
            0x7ffcef34c43d      50             push rax
            0x7ffcef34c43e      50             push rax
            0x7ffcef34c43f      c704242f2f62.  mov dword [rsp], 0x69622f2f
            0x7ffcef34c446      c74424046e2f.  mov dword [rsp + 4], 0x68732f6e
            0x7ffcef34c44e      4889e7         mov rdi, rsp
            0x7ffcef34c451      50             push rax
            0x7ffcef34c452      50             push rax
            0x7ffcef34c453      66c704242d69   mov word [rsp], 0x692d
            0x7ffcef34c459      4889e6         mov rsi, rsp
            0x7ffcef34c45c      52             push rdx
            0x7ffcef34c45d      56             push rsi
            0x7ffcef34c45e      57             push rdi
            0x7ffcef34c45f      4889e6         mov rsi, rsp
            0x7ffcef34c462      4883c03b       add rax, 0x3b
            0x7ffcef34c466      0f05           syscall
            0x7ffcef34c468      c70440000000.  mov dword [rax + rax*2], 0

Now this code is purposely confusing, first it zeros out both eax and rdx and pushes them onto the stack, growing it. We then move some values into the space on the stack, and move the current stack pointer into rdi. We then do the same thing again, making space as we go, via pushing a zero’d out rax and saving the new value of the stack pointer to rsi. Afterwards we can see that we are adding 0x3b to rax (which is 0) and executing a syscall. 0x3b is 49 in octal, so we are calling syscall 49, which has the following call signature:

int sys_execve(const char *filename, const char *argv[], const char *const envp[]);

The shellcode is calling sys_execve, which starts a process! This means that the code must be pointing rdi to the filename and rsi to the arguments for the call. Keep in mind that by convention the first element in the arguments to sys_execve should be the same as the filename. Lets place a break point before the syscall and take a look at the stack:

[0x7ffc1b7d3c7c 325 ./shellcode]> ?0;f tmp;s.. @ rdi+61 # 0x7ffc1b7d3c7c
- offset -       0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x7ffc1b7d3c17  3f3c 7d1b fc7f 0000 2f3c 7d1b fc7f 0000  ?<}...../<}.....
0x7ffc1b7d3c27  0000 0000 0000 0000 2d69 0000 0000 0000  ........-i......
0x7ffc1b7d3c37  0000 0000 0000 0000 2f2f 6269 6e2f 7368  ........//bin/sh
0x7ffc1b7d3c47  0000 0000 0000 0000 bf08 6000 0000 0000  ..........`.....
orax 0xffffffffffffffff   rax 0x0000003b           rbx 0x00000000
 rcx 0x00000000           rdx 0x00000000            r8 0x7f4b83a26300
  r9 0x7f4b83a392e0       r10 0x00000000           r11 0x7f4b836bcdb0
 r12 0x004003a0           r13 0x7ffc1b7d3d80       r14 0x00000000
 r15 0x00000000           rsi 0x7ffc1b7d3c17       rdi 0x7ffc1b7d3c3f
 rsp 0x7ffc1b7d3c17       rbp 0x7ffc1b7d3ca0       rip 0x7ffc1b7d3c86

Now lets take a closer look at the values themselves. Keep in mind that rsi is pointing to an array of strings and that we need to reverse the byte order to get the address it points to. First we will use p8 8 @ rsi to print 8 bytes at rsi, then we will reverse those bytes and print the value that they point to. Finally we will print the second argument.

[0x7ffc1b7d3c7c]> psz @ rdi
//bin/sh
[0x7ffc1b7d3c7c]> p8 8 @ rsi
3f3c7d1bfc7f0000
[0x7ffc1b7d3c7c]> psz @ 0x00007ffc1b7d3c3f
//bin/sh
[0x7ffc1b7d3c7c]> psz @ 0x00007ffc1b7d3c2f
-i
[0x7ffc1b7d3c7c]>

From this we can see that this shellcode will simply start a shell, by launching /bin/sh with the argument -i (which will force the shell to launch in interactive mode).

I hope this will be of some help to someone as a simple intro to using radare2 as a debugger. As someone who lives in the terminal as much as possible I am loving using r2, but hopefully this will convince others that it is actually not any harder to use than a visual decompiler / debugger.

PS: I would like some feedback as to what people would prefer to see used for the examples in my articles. Would you prefer the text be placed in a plaintext code block instead of images? I am aware that some people hate posts with too many images and I’ve been meaning to step up my game and actually start writing more posts. You can either let me know in the comment section or on twitter or email me (contact info can be found in the footer of this blog).


2016-06-24: After a comment emailed to me by Otto Ebeling I have made an edit to the article where the call to sys_execve is explained. I got a bit wrong where I was using rsi to address the parameters as a string, but it is actually pointing to an array of strings. Thanks for pointing that out :)

Written on June 22, 2016