Reverse engineering the Linux OS, a first approach
Courtesy of fravia's page of reverse engineering
Well, another VERY remarkable essay, that I am proud to present. SiuL+Hacky tackles
here NEW UNCOVERED ground, and teaches all of you the first elements of Linux reverse
engineering... you would have thought, as I did, that such reversing would have been
useless, since the main characteristic of Linux (and of the whole GNU initiative) was to
give freely the source code of any program. Yet the deficiencies of Windoze are today so
evident that more and more "commercial" programmers are turning to Linux despite
all efforts by Gates's lackeys. And if you say "commercial" you say of course
limited egotistical pusillanimous minds, that introduce their banal protection schemes
even into the Linux world, until yesterday uncontaminated.
Enjoy this GREAT essay/tutorial by SiuL+Hacky, let's hope that he will send us more essays
on this subject!
BTW, you'll find inside here dasm: a disassembler for Linux
*WRITTEN* by SiuL+Hacky himself!
I. Linux Introduction.
-------------------------
Probably all of you know about Linux, but I don't know how many people have Linux installed on their computers. I have (as many people do) both operating systems on different partitions of my hard disk. Sometimes people think of operating systems as religions (it happens with editors too), so I'm not gonna tell you: INSTALL IT if you want your soul to be saved! If you are not sure, after reading this document I think you will know what to do.

A friend of mine told me some time ago a joke comparing operating systems with airlines. When you travel with Microsoft Airlines, you may find beautiful women at the check-in desk, you may enjoy amazing entertainment shows before departure, and when you climb into the aeroplane it is really comfortable and full of charming stewardesses. OK, after taking off the aeroplane explodes and nobody knows why. When travelling with Unix Airlines you travel safely, but the passengers must carry the pieces of the aeroplane themselves.

Unix is for you if you feel right working with DOS boxes under Windows, if you are used to working in network environments, if you want speed and safety back (your brand-new Pentium acts like a Pentium, not like a 386) and if you find no excitement in configuring W95 programs. You may recover that bittersweet feeling of being in the middle of a deserted island when things go wrong. But if you hate command-line programs with thousands of switches, Unix is not for you.

One of the main characteristics of Linux is that it's a "free environment". The applications (and the kernel itself) are developed by people and offered to "the world" completely free. Most applications are developed (more or less) under the GNU license. Moreover, a lot of the programs come with the source code (and you compile it). Though it has been ported to several platforms, it is especially popular on x86 computers, and many users come from DOS.

II. A Cracker inside Linux world.
---------------------------------
Linux is cool for hacking, but I had never heard anything about cracking in Linux. As I told you, software is free and there's no "bunch of shareware programmers". Imagine ... protecting a program and giving you the source code: really nonsense. But wait, Linux is not perfect; programs are not beautiful and user-friendly. One of the problems I found with Linux from the start is multimedia. Multimedia is new in the DOS/Windows world, so the old Unix dinosaur, which hasn't changed in the last twenty years (though if you look inside "new" operating systems they are not that different), was not supposed to have a lot of multimedia support. I have a cheap Soundblaster clone, and I cannot make it "cry" through my speakers. I am not waiting for Dennis Ritchie to say "bye bye" when I log out, but I like to "play" with sound algorithms and other stuff.

Surprisingly, in just one day I downloaded two sound programs with the same nasty protections as their DOS brothers. It is really strange, and I don't know if it is going to be usual in the future; probably it will depend on Microsoft (once more), and on whether it finally gets into the Linux world (for now it is just a rumour). Anyway, I decided to crack them.

In Linux, people usually program in C (the Linux kernel is written in C) and I found practically no assembler references. I had no idea whether cracking in Linux was gonna be easy or not, but the fact was that I had to start practically from scratch. Most of the utilities I found are binary utilities that come with GCC (the GNU C compiler), and that every Linux user may find in the different distributions or elsewhere on the Web. I didn't know of their existence, but I had them on my computer. Well, this is for you.

III. Tools of the trade.
-------------------------
Here you'll find some tools that I have found or made myself, and that will make cracking easier. Most of them have "Windoze" brothers. First of all, slight differences: mnemonics are named in a different way.
I would say it's even better (sacrilegious!), but anyway you'll have no problem getting used to these changes. You just have to be careful with operands, especially in mov instructions, because they are reversed. I mean:

mov source, destination

instead of the usual DOS:

mov destination, source

1) GDB. GNU Debugger.
The GNU compiler has its own debugger; it's called gdb and it even has a front-end for X Windows. It is neither Softice nor DOS Debug: it is designed to work with source code and executables with debug information. You can debug a program at the assembler-instruction level, but it is not comfortable. For example, by default you see neither the current assembler instruction nor the registers. This is not meant to be a replacement for the gdb man page. There is lots of useful information in books or INFO documents, but here you'll get some useful clues for starting.

It has some features that you cannot find in Softice; for instance, you can debug a program that is already running! Use the "attach" command for that. Gdb runs in a virtual console, so you may run your favorite programs while debugging. Assembler instructions are executed with the "stepi" and "nexti" commands, but you cannot start the program with these commands. Programs are interrupted with Control-C, but you will not "surf" inside every instruction of kernel code. Usually you'll stop the program (for instance while it is waiting for a key) in a system call. Programs do not usually make system calls directly, because a kernel update could make them crash. They call C functions, and the C libraries (more or less like DLLs) make the system calls.

If you want to see a disassembled listing, use the "disassemble" command ("disas" will also do) plus an address (0xaddress), though that address is only used to find a function (the function that owns the instruction at the given address), and gdb shows you the whole listing of that function from its start. That's not cool, you know, life is tough.
At least you can see the current instruction with "display/i $eip". After breaking into the program, use "continue" to resume execution. The "display" command is also good for showing the value of a particular register (don't forget the $ sign), but if you want to show all registers use "info registers". Finally, if you want to change their value use "set $eax=3", for instance. There's a wide range of breakpoints. You can set ordinary breakpoints with "br *address", clear them, disable them, use conditional breakpoints (YES!), hardware breakpoints ... And finally the "backtrace" command is more or less like Softice's "stack", and "finish" should do a 'p ret', but do not trust it very much. Well, there are lots of commands; study them, but after realizing the power of the dead-listing approach, I'm sure you will not want gdb anymore.

2) STRACE
This is a really nice tool, especially for spying on a program and its behaviour. It logs every system call made by a program, WITH PARAMETERS and in a way you'll love, as I'll show you afterwards. I like to use it this way:

strace -oOUTPUT_FILE -i TARGET_FILE

where OUTPUT_FILE is the file where you want the log to be dumped.
-i: prints the value of eip at the time each call was made.
It seems like bliss, but be careful: LIBRARIES USUALLY MAKE THE SYSTEM CALLS, not programs.

3) STRINGS
It should be a great tool, because it shows you the strings inside a binary file, and then you can identify the evil program that is punishing you, but there's a simpler and easier way to do it using the amazing "grep" command. For example, if you are looking for strings such as "Register", run this:

grep Register *

and it'll show you all the files in the current directory containing the string "Register". But the first argument of this command is a general PATTERN, so it may be an exact match or a match as complicated as you want (learn REGULAR EXPRESSIONS for it).

4) HEX EDITORS
What is a crack without a hex editor? ("Mental" cracking is hard, for now.)
There are very few of them in Unix (that I know of). Get one of them at:

ftp://vieta.math.uni-sb.de/pub/misc/hexer-0.1.4c.tar.gz

It uses "VI" style. You know, vi is the "official" editor in Unix. It seems that every "cool-unix-guy" must love it, or he'll be an "amateur". I prefer JOE, which "looks like" old WordStar and old WordPerfect, and you'll know how to quit the first time you run it :-). Anyway, you may use, as I do, good DOS hex editors like Norton Diskedit (version 4 or 5). I'm not kidding: a DOS emulator (DOSEMU) is available in Linux, and works fine with real mode and DOS4GW programs. There's a Windows emulator, but it has long been in "an early alpha stage". Don't try it.

5) OBJDUMP
Well, at last a candle in the middle of the darkness. If it is difficult to find assembler references, finding disassembling references is like looking for Money 3.0 (perhaps FidoNet has the answer again :-). I found only one switch in this program that gives a "dump disassembly". This program gives you the information and data of the different sections (more about sections later) of a Linux object (executable) file. It is possible to get the assembler listing of a program you have written yourself (there's a switch in the compiler), but objdump is the only program I found that disassembles an arbitrary executable. It also gathers information about the different "Sections" of the file. But the problem is that there's no analysis information in the disassembled file.

Some switches of objdump:

-d: Displays the assembler mnemonics contained in the code sections. Note that mnemonics are displayed in the "Linux way". Something like this:

0804a37a repnz scasb %es:(%edi),%al
0804a37c notl %ecx
0804a37e movl %ecx,0xfffffc0c(%ebp)
0804a384 movb $0x0,0xfffffc16(%ebp,%ecx,1)

Download dasm.txt here! (If you want to save a web file and you don't know how, and all it does is display on the screen, try holding down the shift key when you click on it: it might solve your problem :-)
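As an illustration only (this is not part of dasm, which is a Perl script), here is a minimal Python sketch that splits lines shaped like the objdump excerpt above into address, mnemonic and operands. The regular expression and the function names are assumptions made for this example; other objdump versions may print a different layout, and prefixes such as repnz are not treated specially. Remember that in this syntax the source operand comes first and the destination last.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "0804a37e movl %ecx,0xfffffc0c(%ebp)"
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]{8})\s+(\S+)(?:\s+(.*?))?\s*$")


def split_operands(text):
    """Split on commas that are not inside parentheses, e.g. (%ebp,%ecx,1)."""
    operands, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            operands.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        operands.append(current.strip())
    return operands


def parse_line(line):
    """Return (address, mnemonic, operands), or None for non-instruction lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    address = int(match.group(1), 16)
    mnemonic = match.group(2)
    operands = split_operands(match.group(3) or "")
    return address, mnemonic, operands


if __name__ == "__main__":
    address, mnemonic, operands = parse_line("0804a37e movl %ecx,0xfffffc0c(%ebp)")
    # Source first, destination last.
    print("%08x %s: source=%s destination=%s"
          % (address, mnemonic, operands[0], operands[-1]))
```

Splitting on top-level commas only matters for memory operands such as 0xfffffc16(%ebp,%ecx,1), which carry commas of their own inside the parentheses.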
I programmed dasm in Perl. Why? Well, since my very first steps in Perl I realized it was perfect for processing text files (I knew nothing about sed, awk ...). The syntax is not very beautiful or high-level-looking; it's an interpreted language, so it is not the fastest. Anyway, it always has the tools you are looking for (or always dreamt of) and lets you do a lot of things at the same time. It's very popular for CGI scripts. I learnt Perl and CGI from a very good book by Eric Herrmann. Sorry, I tried not to make it very cryptic, but Perl is Perl, and if you don't know Perl you probably won't understand it. For this reason I'll explain how it works. BTW a Perl interpreter (perl 5.0) can be found in any Linux distribution, and interpreters for DOS are available too.

Well, let's start with jmp/call processing:

- The (DYNAMIC) SYMBOL TABLE is read and its elements are put into an associative array indexed by address. For instance:
  $st_element{"0xprint_address"}="print";
- Then all call / jmp instructions are processed into another associative array, in this way:
  $jumping{"jump_to_address"}="jump_from_address";
- After this, the addresses of the disassembled instructions (from the .text section) are checked against the $jumping elements, and if an entry exists, the reference is written.
- In the same pass, call instructions are processed, and if they call a function from the symbol table, that is written too.

For string processing, we need some further knowledge of how executables are built in Linux. The most common format is 32-bit ELF (Executable and Linkable Format). The structure of the object is:

* ELF HEADER
* PROGRAM HEADER TABLE
* SECTION 1
* ...
* SECTION N
* SECTION HEADER TABLE

These sections will be "segments" when the program is executed. Some important sections are .init (initialization code), .fini (termination code), .data (pretty obvious), .text (code), .rodata (read-only data), and so on. Do you remember lesson 8.1 and Win32 exe files?
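Returning to the jmp/call processing listed above, the bookkeeping can be sketched in a few lines. This is a hypothetical Python rendering, not the dasm source: the tuple layout (address, mnemonic, operand text) and the function names are assumptions, and each target maps to a list of source addresses here, where the description above uses one associative-array entry per target. The sample addresses are taken from the sndconf listing discussed in this essay.

```python
def static_target(operand):
    """Return the operand as an address, or None for indirect forms like *%eax."""
    try:
        return int(operand, 16)
    except ValueError:
        return None


def build_xrefs(instructions):
    """Map every jump/call target to the addresses that jump to or call it."""
    xrefs = {}
    for address, mnemonic, operand in instructions:
        # On x86 every jump mnemonic (jmp, je, jne, jl, ...) starts with "j".
        if mnemonic == "call" or mnemonic.startswith("j"):
            target = static_target(operand)
            if target is not None:
                xrefs.setdefault(target, []).append(address)
    return xrefs


def annotate(instructions, symbols):
    """Return listing lines with cross-references and function names added."""
    xrefs = build_xrefs(instructions)
    out = []
    for address, mnemonic, operand in instructions:
        if address in xrefs:
            sources = " ; ".join("%08x" % src for src in xrefs[address])
            out.append("Referenced from jump/call at " + sources)
        if mnemonic == "call":
            name = symbols.get(static_target(operand))
            if name is not None:
                out.append("Reference to function : " + name)
        out.append(("%08x %s %s" % (address, mnemonic, operand)).rstrip())
    return out


if __name__ == "__main__":
    listing = [
        (0x08052104, "jl", "08052110"),
        (0x08052106, "movl", "$0x0,0xfffffd84(%ebp)"),
        (0x08052110, "cmpl", "$0x0,0xfffffd84(%ebp)"),
        (0x08052120, "call", "08049138"),
    ]
    symbols = {0x08049138: "printf"}  # address -> name, as in the symbol table
    print("\n".join(annotate(listing, symbols)))
```

Keeping a list per target is a small design choice: one address can be reached from several places, as the "Referenced from jump/call at" lines in the listings show.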
Don't you think the ELF layout is pretty much the same as in those Win32 exe files? These are the ELF types:

Elf32_Addr    4 bytes unsigned
Elf32_Half    2 bytes unsigned
Elf32_Off     4 bytes unsigned
Elf32_Sword   4 bytes signed
Elf32_Word    4 bytes unsigned

And the ELF header is something like this:

typedef struct {
    unsigned char e_ident[16];
    Elf32_Half    e_type;
    Elf32_Half    e_machine;
    Elf32_Word    e_version;
    Elf32_Addr    e_entry;
    Elf32_Off     e_phoff;
    Elf32_Off     e_shoff;
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;
    Elf32_Half    e_phentsize;
    Elf32_Half    e_phnum;
    Elf32_Half    e_shentsize;
    Elf32_Half    e_shnum;
    Elf32_Half    e_shstrndx;
} Elf32_Ehdr;

For us, the important member is e_shoff, which holds the file offset of the Section Header Table. The SHT is an array of Elf32_Shdr structures. The member e_shnum gives the number of entries in the SHT, and e_shentsize gives the size in bytes of each entry. This is the Elf32_Shdr:

typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;

The offset of each section is taken from its sh_offset member. The name of each section is a little bit more complicated, because sh_name is an index into the section header string table section. Well, stop, I don't want you to get confused. Fortunately, objdump gives us that information. Strings are located in the .rodata section (for obvious reasons), and objdump gives the file offset of the section. If you want complete information on the ELF format, there's a PostScript document for you:

ftp://tsx-11.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz

There (or on any mirror), you'll find a lot of interesting things.

OK, then for string processing, dasm reads the .rodata section offset and gets its content from the binary file. We get the starting address and size of the .rodata section, so string processing goes like this:

- The whole .rodata section is read into a variable.
- Dasm looks for immediate operands (with the $ prefix) and checks whether they belong to the .rodata section.
- If so, the string (null-terminated) is extracted from the .rodata section, and the reference is written.

The rest is dirty details about format processing. The program calls objdump, and you just have to use it this way:

dasm exec_file processed_output_file

I've tested it with several programs, but if you find any bug or problem, or you have any question, suggestion or whatever, report it to me at: [email protected]

NOTE: In dasm, I don't use the hex values of the instructions (switch --show-raw-insn), because the output is not tabbed and it wastes disk space. When we need this data, I'll show you how to get it easily.

IV. THE CRACKS
---------------
To apply all this theory, we're gonna crack the couple of programs I told you about. I chose them because they are very different and appropriate for beginners, you'll see. The first one is a disabled program with password registration; the second one is a trial with 2 levels of time protection and the same nasty behaviour as its Windows brothers.

1) ftp://ftp.fhg.de/pub/layer3/l3v270.linux.tar.gz

What the hell is this? Well, it's an encoder/decoder for MPEG layer III. If you don't know about it, it's a standard for audio compression (a really exciting subject). Every time you run the decoder you're asked to enter a registration code, because sample rates and other features are restricted to "registered users". Let's have some fun with the new tools: "strace -oSalida l3dec" will dump the system calls into a file called Salida. Do it, answer that you don't want to enter a Reg. Code, and you get something like this (filtered by me):

write(2, "\n*** l3dec V2.70 ISO/MPEG Au"..., 71) = 71
write(2, "| "..., 71) = 71
write(2, "| copyright Fraunhofer"..., 71) = 71
write(2, "| "..., 71) = 71 <<<< Look! It is writing the file header
open("./l3dec", O_RDONLY) = 4 <<<< get current directory
close(4) = 0
open("./register.inf", O_RDONLY)=-1 ENOENT (No such file or directory) <<<FILE sndconf seconds of evaluation time left -> FILE modules/soundbase

The second file is not an executable; it is a "relocatable ELF file" (a module). No problem. It is logical: for a countdown, the protection must live in a resident program. This protection is a little bit more complicated than the first one, but it is not a tough protection at all. Dasm sndconf, and look for "License expired" (be patient with this long listing, trust me, it's easy):

08052101 cmpl %esi,0x10(%eax); <<<< some comparing
08052104 jl 08052110; <<<< if not less flag=0
08052106 movl $0x0,0xfffffd84(%ebp)
Referenced from jump/call at 080520f3 ; 08052104 ;
08052110 cmpl $0x0,0xfffffd84(%ebp); <<< flag=1 seems to be good
08052117 jne 08052150; <<< jump somewhere
08052119 pushl %ebx; <<< the game is over outlaw!
0805211a pushl %edi
Possible reference to string: "License expired: %02d/%04d"
0805211b pushl $0x806fc08
Reference to function : printf
08052120 call 08049138
Possible reference to string: "Please download a fresh version from http://www.4front-tech.com"
08052125 pushl $0x806fb97
Reference to function : printf
0805212a call 08049138
0805212f pushl %ebx
08052130 pushl %edi
Possible reference to string: "License expired: %02d/%04d"
08052131 pushl $0x806fc08; <<<< I love these formatted strings
08052136 pushl $0x807e6d0
Reference to function : fprintf
0805213b call 08049368
08052140 addl $0x20,%esp
08052143 pushl $0xffffffff
Reference to function : exit
08052145 call 08049598; <<<< bugger off
0805214a leal 0x0(%esi),%esi
Referenced from jump/call at 08052117 ; <<< Do you remember the flag ?
08052150 movl $0x1,0xfffffd84(%ebp); <<< jump here if above flag=1
0805215a movl 0xfffffd94(%ebp),%eax
08052160 movl %eax,0xfffffd80(%ebp)
08052166 decl %eax
08052167 movl %eax,0xfffffd94(%ebp)
0805216d movl 0xfffffd80(%ebp),%esi
08052173 decl %esi
08052174 jns 08052186
08052176 decl 0xfffffd90(%ebp)
0805217c movl $0xb,0xfffffd94(%ebp)
Referenced from jump/call at 08052174 ;
08052186 movl 0xfffffd7c(%ebp),%eax
0805218c movl 0x14(%eax),%edx
0805218f movl 0xfffffd90(%ebp),%ecx
08052195 cmpl %ecx,%edx
08052197 jle 080521a3; <<< jumping flag=0
08052199 movl $0x0,0xfffffd84(%ebp);<<< flag=0 BAD GUY !
Referenced from jump/call at 08052197 ;
080521a3 cmpl %edx,%ecx
080521a5 jne 080521c2
080521a7 movl 0xfffffd94(%ebp),%eax
080521ad movl 0xfffffd7c(%ebp),%esi
080521b3 cmpl %eax,0x10(%esi)
080521b6 jl 080521c2; <<< again
080521b8 movl $0x0,0xfffffd84(%ebp)
Referenced from jump/call at 080521a5 ; 080521b6 ;
080521c2 pushl %ebx
080521c3 pushl %edi
Possible reference to string: "License will expire after: %02d/%04d"
080521c4 pushl $0x806fc24

Ahem, if flag=1 your license doesn't expire, and then there are lots of ways of ending up with flag=0. Pretty obvious. Use your favorite DOS/Unix hex editor (or copy the file to your DOS partition, reboot and run the damned Windoze hex editor) and do a general Search/Replace: (...
objdump -d --show-raw-insn sndconf | grep 080521b)

Every

c7 85 84 fd ff ff 00 00 00 00 movl $0x0,0xfffffd84(%ebp)

changes to:

c7 85 84 fd ff ff 01 00 00 00 movl $0x1,0xfffffd84(%ebp);ALWAYS GOOD!

You'll notice that the message even disappears. But we must get rid of the countdown too. Dasm soundbase and look for "seconds" (you may see that this file has line information):

Possible reference to string: "OSS: The evaluation time has elapsed. Please reload the driver."
<<<< if you're executing this part
<<<< you are a really bad guy
00005901 <sound_open_sw+71> pushl $0x944
RELOC: 00005902 R_386_32 .rodata; << look! objdump sometimes helps
00005906 <sound_open_sw+76> call 00005907 <sound_open_sw+77> <<< movl $0xffffffed,%eax
Possible reference to string: "d: Driver partially removed. Can't open device" <<<< String references sometimes fail
00005910 <sound_open_sw+80> addl $0x4,%esp
00005913 <sound_open_sw+83> popl %ebx
00005914 <sound_open_sw+84> popl %esi
00005915 <sound_open_sw+85> ret
00005916 <sound_open_sw+86> leal 0x0(%esi),%esi
00005919 <sound_open_sw+89> leal 0x0(%esi,1),%esi
Referenced from jump/call at 000058ff ;
00005920 <sound_open_sw+90> movl 0x0,%eax
RELOC: 00005921 R_386_32 jiffies_R2f7c7437
00005925 <sound_open_sw+95> subl %eax,%edx
00005927 <sound_open_sw+97> movl %edx,%eax
Possible reference to string: "en configured"
00005929 <sound_open_sw+99> movl $0x64,%ecx
0000592e <sound_open_sw+9e> xorl %edx,%edx
00005930 <sound_open_sw+a0> divl %ecx,%eax
00005932 <sound_open_sw+a2> pushl %eax
Possible reference to string: "OSS: %d seconds of evaluation time left" <<< Here you are a not so good guy
00005933 <sound_open_sw+a3> pushl $0x99e
RELOC: 00005934 R_386_32 .rodata
00005938 <sound_open_sw+a8> call 00005939 <sound_open_sw+a9>
RELOC: 00005939 R_386_PC32 printk_Rad1148ba; << printing what?
Possible reference to string: "river partially removed. Can't open device"
0000593d <sound_open_sw+ad> addl $0x8,%esp
Referenced from jump/call at 000058e8 ; 000058ec ; 000058f6 ;
00005940 <sound_open_sw+b0> movl %ebx,%eax; <<<I want to jump here !

Look at this before seeing the rest of the code:

- If you are a not so good guy you come from 58ff
- You bypass the countdown message if you come from 58e8, 58ec or 58f6
- If you take none of these jumps you are a really bad guy.

It seems to be a REAL HOT AREA. OK, you cannot wait any more, I'll show you:

000058e0 <sound_open_sw+50> movl 0x1148,%edx
RELOC: 000058e2 R_386_32 .data
000058e6 <sound_open_sw+56> testl %edx,%edx
000058e8 <sound_open_sw+58> je 00005940; <<< FIRST OPPORTUNITY
000058ea <sound_open_sw+5a> testl %ebx,%ebx
000058ec <sound_open_sw+5c> je 00005940; <<< movl %ebx,%eax
Possible reference to string: "artially removed. Can't open device"
000058f0 <sound_open_sw+60> andl $0xf,%eax
Possible reference to string: " Driver partially removed. Can't open device"
000058f3 <sound_open_sw+63> cmpl $0x6,%eax
000058f6 <sound_open_sw+66> je 00005940; <<< THIRD ONE
000058f8 <sound_open_sw+68> movl 0x0,%eax
RELOC: 000058f9 R_386_32 jiffies_R2f7c7437
000058fd <sound_open_sw+6d> cmpl %edx,%eax
000058ff <sound_open_sw+6f> jbe 00005920; <<< LAST ONE EVEN BEING
<<< A NOT S.G. GUY

To be honest, I don't like this variety. If you look for hits on the FIRST key variable 0x1148 (apparently 0x1148=0 is a good thing), it is never (directly) set to 0. I don't like it; perhaps it works, but I prefer the other two options (which deal with the same thing).
Change:

000058f0 <sound_open_sw+60> 83 e0 0f andl $0xf,%eax
000058f3 <sound_open_sw+63> 83 f8 06 cmpl $0x6,%eax
000058f6 <sound_open_sw+66> 74 48 je 00005940

to:

000058f0 <sound_open_sw+60> 83 e0 0f andl $0xf,%eax
000058f3 <sound_open_sw+63> 83 f8 06 cmpl $0x6,%eax
000058f6 <sound_open_sw+66> eb 48 jmp 00005940

It apparently works, and I say apparently 'cause I told you before that this buggy module doesn't work anyhow :-) Well, easy cracks for a new area. Good linuxing!

SiuL+Hacky
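For readers who want to check the one-byte change above without a hex editor, here is a small Python sketch. It is only an illustration: the function names are invented for this example, and the sketch works on the eight raw bytes shown in the listing, not on the real file. Patching the actual module would also need the file offset of the instruction (the address printed by objdump plus the offset of its section), which is left to the reader.

```python
def short_jump_target(address, raw):
    """Target of a 2-byte short jump: opcode byte plus signed 8-bit displacement."""
    displacement = raw[1] - 256 if raw[1] >= 128 else raw[1]
    # The displacement is counted from the end of the 2-byte instruction.
    return address + 2 + displacement


def patch_je_to_jmp(code, offset):
    """Return a copy of `code` where the je (0x74) at `offset` becomes jmp (0xeb)."""
    if code[offset] != 0x74:
        raise ValueError("expected je (0x74) at offset %#x" % offset)
    patched = bytearray(code)
    patched[offset] = 0xEB
    return bytes(patched)


if __name__ == "__main__":
    # Raw bytes of the three instructions at 000058f0, from the listing above.
    original = bytes([0x83, 0xE0, 0x0F, 0x83, 0xF8, 0x06, 0x74, 0x48])
    patched = patch_je_to_jmp(original, 6)  # the je sits 6 bytes after 58f0
    print(" ".join("%02x" % b for b in patched))
    print("jump target: %08x" % short_jump_target(0x58F6, patched[6:8]))
```

The displacement byte (48) stays untouched, so the unconditional jmp lands on the same address, 00005940, as the original je did.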
(c) SiuL+Hacky 1997. All rights reversed