How HyperLock 386 Works and How to Crack It? - (C) by Dirk Gently



Such articles can't be read in usual papers, that's for sure :) I was ill at ease, as I couldn't decide whether I should allow a paper magazine to publish my article: I was afraid of the BSA and its comrades, of course. At last, in 1994, I published it as part of a big article of mine on Turbo Debugger and other debugging tools. Fortunately, nobody wanted to arrest me :)
By reading this article, you'll understand how these very effective anti-cracker protections work. I'll also show you how to foil crackers armed with Turbo Debugger (and other debuggers as well) by writing such top-notch protections.
You'll need the following programs to understand and try out the next part. Turbo Debugger is required in the first place. TDUMP, a utility that comes with TD, should be used to check the initial CS:IP address of the program we are going to debug. And, finally, you need a program that contains such a protection. I've chosen Mega-EM, the well-known MT-32 and General MIDI emulator for the GUS. The ZIP that accompanied this article contains both Mega-EM 2.00023b (1603.exe) and Mega-EM 2.02 (1622.exe). Before running their cracker programs (1603dec.exe and 1622dec.exe respectively), rename the Mega-EM executable to megaem.exe. Be warned: do NOT confuse the two versions, as they have different initial CS:IPs, and the right value must be supplied when assembling the cracker's source. All other 1.X-2.X versions have been protected with HyperLOCK 386, but the newest versions (the 3.X series) have no protection. The protected MegaEM versions are all about 90 KB in size (some old MegaEMs shipped with commercial games aren't protected; they are about 26 KB long).
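TDUMP gets the initial CS:IP from the EXE header. The following C sketch does the same lookup, assuming the standard DOS MZ header layout (offsets in the comments); the type and function names are hypothetical, not part of any tool. It also shows how a segment:offset pair maps to a file offset, which is what the cracker's `20h+CSWithAntidebugRtn*16+...` expressions compute with the two-paragraph (20h-byte) header hard-coded, and how the two size words at 02h/04h follow from a file size.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a DOS MZ header (all little-endian words) that TDUMP shows. */
typedef struct {
    uint16_t bytes_last_page; /* 02h: bytes used in the last 512-byte page */
    uint16_t pages;           /* 04h: 512-byte pages, header included      */
    uint16_t header_paras;    /* 08h: header size in 16-byte paragraphs    */
    uint16_t ss, sp;          /* 0Eh, 10h: initial SS:SP                   */
    uint16_t ip, cs;          /* 14h, 16h: initial CS:IP                   */
} mz_info;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Parses the fixed part of an EXE header; returns 0, or -1 if it isn't "MZ". */
int mz_parse(const uint8_t *hdr, size_t len, mz_info *out) {
    if (len < 0x1C || hdr[0] != 'M' || hdr[1] != 'Z') return -1;
    out->bytes_last_page = rd16(hdr + 0x02);
    out->pages           = rd16(hdr + 0x04);
    out->header_paras    = rd16(hdr + 0x08);
    out->ss              = rd16(hdr + 0x0E);
    out->sp              = rd16(hdr + 0x10);
    out->ip              = rd16(hdr + 0x14);
    out->cs              = rd16(hdr + 0x16);
    return 0;
}

/* File offset of seg:off, where seg is relative to the start of the image. */
uint32_t mz_file_offset(const mz_info *h, uint16_t seg, uint16_t off) {
    return (uint32_t)h->header_paras * 16u + (uint32_t)seg * 16u + off;
}

/* The two size words at 02h/04h for a file of file_size bytes:
   a partial last page still counts as a whole page. */
void mz_size_fields(uint32_t file_size, uint16_t *last, uint16_t *pages) {
    *last  = (uint16_t)(file_size % 512u);
    *pages = (uint16_t)(file_size / 512u + (file_size % 512u != 0));
}
```

For a two-paragraph header with CS:IP = 1623:0123, the entry point sits at file offset 20h + 16230h + 123h = 16373h.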
Of course, I have to tell you why I've chosen MegaEM in particular. Well, it was in 1993 that I bought my first GUS. I was very happy, of course, when I heard of MegaEM, which was a brand new product at the time. I wanted to know how it worked; I was the most important cracker around here in those days. I was only too pleased to hear that MegaEM's author refused to release its sources. And, in addition, I didn't want to pay 50 US$ to register MegaEM. So I didn't hesitate for long and started cracking it. And, to tell the truth, it took me quite a long time (about a day!) to crack it :)
Well, let's talk about the protection itself. It uses so-called layered encryption. What does that mean? We'll see that the protection wrapped around the original MegaEM has layers, which cause a lot of problems when you try to debug them: only the first layer contains executable code, and all the other, inner layers are decoded while the outer layers run. As you can guess, the lack of code segment write protection helps a lot when writing such protections: one doesn't have to reserve memory to decode encrypted code areas, as they can be decoded in place.
Let's load MEGAEM.EXE under TD. The PC (Instruction Pointer / Program Counter) will be set to 0123. There is a JMP 01CA instruction at this position, which looks normal. The same goes for the area from 01CA: the code seems logical and executable. Oops... What's that from 01EA? The cmp ax,3092 instruction looks OK, but instructions like enter 4321,5D can't be considered real instructions used in usual MS-LOSS-based programs. And we could go on listing the strange 'instructions' that follow. So one must assume that the code between 01CA and 01E9 decodes this second layer, and that, when the first layer (between the above positions) ends, execution continues at 01EA, which contains a REAL instruction by that time.
In the original article, I used bold to mark the code that changes in other links of the protection; judging from the cracker at the end of this article, these are the XOR seed at 01CF (the operand of mov al,11) and the four instructions from 01E0 to 01E7. Just compare this area of the protection of MegaEM 2.02 to that of some other versions and you'll see what I mean. This means, of course, that our program, which should crack ANY link of a particular version of HyperLOCK 386, must check which instructions are present in the current link of HLOCK 386, import them all and CHANGE its own instructions to them. This requires, of course, that the cracker routine we're going to write be written, at least partly, in assembly.

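cs:01CA cli
cs:01CB mov    di,0123    ; the sum starts at the entry point
cs:01CE mov    al,11      ; initial key; its operand byte (01CF) varies per link
cs:01D0 mov    cx,0648    ; number of bytes to process
cs:01D3 mov    ah,cs:[di] ; read a code byte (BEFORE it is decoded)
cs:01D6 cmp    di,01EA    ; 01ea= the starting address of the 2nd layer
cs:01DA jb     01DF       ; below 01ea: feed the key only, don't decode
cs:01DC xor    cs:[di],al ; decode one byte of layer 2 (and beyond) in place
cs:01DF inc    di
cs:01E0 sub    al,ah      ; 01E0-01E7: the key update, which
cs:01E2 shr    al,1       ; varies from link to link
cs:01E4 xor    al,ah
cs:01E6 rol    al,1
cs:01E8 loop   01D3
cs:01ea cmp    ax,3092 [...] ; this is the first, not yet decoded instruction
                             ; of the 2nd layer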
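To see why a single breakpoint ruins the decoding, here is a C sketch that replays the loop above on a copy of the code segment. It assumes the MegaEM 2.02 constants from the listing (seed 11h, CX = 0648h) and uses an all-zero dummy image instead of the real MegaEM bytes; `decode_layer1` and `first_layer2_byte` are illustrative names. The key is fed every byte the loop reads, including the loop's own code, so a CCh planted at cs:01DC changes every key value after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ROL r/m8,1 */
static uint8_t rol8(uint8_t v) { return (uint8_t)((v << 1) | (v >> 7)); }

/* Replays cs:01CA-01E8 on a copy of the code segment: image[x] is the byte
   at cs:x, seed is the operand at cs:01CF, count is the CX loaded at 01D0. */
void decode_layer1(uint8_t *image, size_t len, uint8_t seed, uint16_t count) {
    uint16_t di = 0x0123;                  /* mov di,0123 */
    uint8_t al = seed;                     /* mov al,11 */
    assert((size_t)di + count <= len);     /* keep the simulation in bounds */
    while (count--) {                      /* mov cx,0648 ... loop 01D3 */
        uint8_t ah = image[di];            /* mov ah,cs:[di]: read BEFORE decoding */
        if (di >= 0x01EA)                  /* cmp di,01EA / jb 01DF */
            image[di] ^= al;               /* xor cs:[di],al */
        di++;                              /* inc di */
        al = (uint8_t)(al - ah);           /* sub al,ah: key update, */
        al = (uint8_t)(al >> 1);           /* shr al,1   these four  */
        al ^= ah;                          /* xor al,ah  vary from   */
        al = rol8(al);                     /* rol al,1   link to link */
    }
}

/* Decodes an all-zero dummy image, optionally with an INT 3 opcode (CCh)
   planted at cs:01DC the way a breakpoint on the XOR would be, and returns
   the first decoded layer-2 byte (cs:01EA). */
uint8_t first_layer2_byte(int with_breakpoint) {
    static uint8_t image[0x800];
    memset(image, 0, sizeof image);
    if (with_breakpoint) image[0x1DC] = 0xCC;
    decode_layer1(image, sizeof image, 0x11, 0x0648);
    return image[0x1EA];
}
```

With the dummy image, the clean run decodes cs:01EA to 10h and the run with the planted CCh to DCh; with real code the values differ, but the effect is the same.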

Try to execute this 1st layer. Try both Go to cursor (F4) and single-stepping (F7). Of course, the latter requires a great deal of patience, as the loop set up by mov cx,0648 runs $648 times, executing 10 instructions per pass if DI is at or above 01EA and 9 if it is below. Anyway, it's very instructive to notice that pressing F8 on the LOOP at 01E8 causes the debugger to freeze.
It doesn't matter which way you execute this first layer: the code of the second layer won't work. Why?
One might assume that we didn't do anything wrong. The code of the first layer seems OK: it looks like a simple routine that computes an XOR sum... but let's stop for a moment and check more carefully WHAT code this routine computes its sum FROM! The point is that the routine computes this internal checksum not only from locations outside the loop (01D3-01E9), but from the loop's own bytes as well. And think about it: if you set a breakpoint in this loop, Turbo Debugger puts a CCh opcode (INT 3) there. When the cycle READS (at cs:01D3) the actual executable code, byte by byte, it reads this CC and NOT the original code. This is what the protection of the first layer is based on. Of course, swapping this CC opcode and the original opcode is transparent when debugging routines that do NOT read their own code area; but ANY routine can use such tricks to prevent the correct decoding of the inner layers if it computes an initial checksum, preferably at least 16 bits long. (The routine above uses only an 8-bit one, and that is its weak point: even if you can't find out how to execute it without getting the XOR sum wrong, you can RUN 256 TESTS with XOR values from 0 to FF to check which one decodes the 2nd layer. You'll find it very fast.)
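The 256-test attack can be sketched the same way. Whatever happened before DI reached 01EA, the only state carried into layer 2 is the 8-bit value in AL, so at most 256 candidates exist. This C sketch assumes the decoded layer 2 starts with push ds / push cs / pop ds / pop es (1Eh 0Eh 1Fh 07h, as in the layer-2 listing further down) and accepts the first key that reproduces those bytes; the function names are hypothetical, and `encrypt_from` exists only to build test data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t rol8(uint8_t v) { return (uint8_t)((v << 1) | (v >> 7)); }

/* One key step of the MegaEM 2.02 layer-1 loop (sub / shr / xor / rol). */
static uint8_t next_key(uint8_t al, uint8_t ah) {
    al = (uint8_t)(al - ah);
    al = (uint8_t)(al >> 1);
    al ^= ah;
    return rol8(al);
}

/* Decodes n bytes from cs:01EA on, given the key AL holds at that point. */
static void decrypt_from(const uint8_t *enc, uint8_t *out, size_t n, uint8_t key) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(enc[i] ^ key);   /* xor cs:[di],al */
        key = next_key(key, enc[i]);        /* the key is fed the ENCRYPTED byte */
    }
}

/* The inverse; only used to build test data. */
void encrypt_from(const uint8_t *plain, uint8_t *enc, size_t n, uint8_t key) {
    for (size_t i = 0; i < n; i++) {
        enc[i] = (uint8_t)(plain[i] ^ key);
        key = next_key(key, enc[i]);
    }
}

/* Tries all 256 values AL can hold when DI reaches 01EA; returns the first
   one whose output starts with the m expected bytes, or -1. */
int brute_force_key(const uint8_t *enc, size_t n, const uint8_t *expect, size_t m) {
    uint8_t out[64];
    if (m > n || m > sizeof out) return -1;
    for (int k = 0; k < 256; k++) {
        decrypt_from(enc, out, m, (uint8_t)k);
        if (memcmp(out, expect, m) == 0) return k;
    }
    return -1;
}
```

Since the first byte is XORed directly with the key, a single known byte already pins the key down; the loop simply mirrors the "run 256 tests" idea.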
So far, we've learnt how to recognise the presence of a debugger. You can see that without finding the RIGHT way to debug the above routine, you'll NEVER find out how to restore the second layer. Simply pressing F9, of course, doesn't put any CCs anywhere, but the machine executes all the instructions so fast that you won't have enough time to press Ctrl-Break to see what happened to layer 2 and the rest, because the inner layers clean up after themselves: in HLOCK 386, it's the third layer that erases the entire memory area of layers 1 and 2 before handing execution back to the real program being protected (that is, Mega-EM as it was before HLOCK 386 was linked to it). If you still don't see how such linking works, I recommend checking my IntroMaker Toolkits on my homepage. It's very instructive to check all of them, because, as I've noticed, most PC coders/crackers don't really know how to handle the EXE header, how to 'infect' EXE files, etc.
Still, how should we debug the above routine? Notice at once that the routine reads only one byte per pass, while the loop that reads and processes this byte is about 23 bytes long (01D3-01E9). So move the cursor to 01E8 and keep pressing F4 while the value of DI rises from its initial 0123 to 01D4. (DI must stay below 01E8 while the cursor is there: if DI reached 01E8, the next pass would read the opcode at the very point we're standing at, with the false CC.) Having reached this DI value, move the cursor to e.g. 01D3, the first instruction of the loop, which has already been read by now, and go on pressing F4. Before long, DI will pass 01EA (that is, reach 01EB or more), and we can see that layer 2 is being decoded without problems. NOW we can move the cursor to 01EA and press F4 there to execute the remaining cycles in one step.
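The breakpoint juggling above can be checked in the same kind of simulation. The C sketch below assumes, as the text does, that F4 plants a temporary CCh at the cursor, and models it at pass granularity: a read of the breakpoint's address returns CCh. `naive` keeps the cursor on the LOOP at 01E8; `careful` follows the order 01E8, then 01D3, then 01EA. The names and the all-zero image are illustrative.

```c
#include <stdint.h>

static uint8_t rol8(uint8_t v) { return (uint8_t)((v << 1) | (v >> 7)); }

/* Returns where the temporary CCh sits while the pass reading cs:[di] runs. */
typedef uint16_t (*bp_strategy)(uint16_t di);

/* Keep pressing F4 with the cursor on the LOOP at 01E8. */
static uint16_t naive(uint16_t di) { (void)di; return 0x01E8; }

/* The order described above. */
static uint16_t careful(uint16_t di) {
    if (di < 0x01D4) return 0x01E8;  /* cursor on 01E8: safe while DI < 01E8 */
    if (di < 0x01EB) return 0x01D3;  /* 01D3 has already been read */
    return 0x01EA;                   /* 01EA has already been read */
}

/* The MegaEM 2.02 layer-1 loop (seed 11h, CX = 0648h) in which reading the
   address holding the breakpoint returns CCh instead of the real byte.
   Returns the first decoded layer-2 byte (cs:01EA). */
uint8_t run_layer1(bp_strategy where, uint8_t image[0x800]) {
    uint16_t di = 0x0123, cx = 0x0648;
    uint8_t al = 0x11;
    while (cx--) {
        uint8_t ah = (di == where(di)) ? 0xCC : image[di]; /* mov ah,cs:[di] */
        if (di >= 0x01EA) image[di] ^= al;                 /* xor cs:[di],al */
        di++;
        al = (uint8_t)(al - ah);
        al = (uint8_t)(al >> 1);
        al ^= ah;
        al = rol8(al);
    }
    return image[0x1EA];
}
```

On the zero image, the careful order yields the same byte (10h) as an undisturbed run, while the naive one yields DCh.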
That was the first layer, and it was hard enough, wasn't it? Unfortunately, I have to announce that the second layer is much more complicated.
Layer 2 can't be run under TD at all: it not only detects INT 3 opcodes, but also rewrites fundamental interrupt vectors, which DOES freeze Turbo Debugger. This part is even more useful for those who want to write USABLE protections.
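Layer 2 reaches the vectors through ES = 0, so [0004], [0008], [000C] and [0018] in the listing are the vectors of INT 1, 2, 3 and 6. A tiny C sketch of the real-mode table layout (each vector is an offset word, then a segment word, at n*4) makes those addresses, and the 03C0-03CF range the cracker uses for its relocated vectors, easy to check; the helper names are hypothetical.

```c
#include <stdint.h>

/* Real-mode IVT: vector n is a far pointer at 0000:n*4, offset word first. */
static inline uint16_t ivt_offset_addr(uint8_t n)  { return (uint16_t)(n * 4u); }
static inline uint16_t ivt_segment_addr(uint8_t n) { return (uint16_t)(n * 4u + 2u); }

/* Stores seg:off as vector n in a simulated 1 KB table (little-endian). */
void ivt_set(uint8_t ivt[1024], uint8_t n, uint16_t seg, uint16_t off) {
    uint16_t a = ivt_offset_addr(n);
    ivt[a]     = (uint8_t)off;
    ivt[a + 1] = (uint8_t)(off >> 8);
    ivt[a + 2] = (uint8_t)seg;
    ivt[a + 3] = (uint8_t)(seg >> 8);
}
```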
The following listing shows layer 2 after it has been de-XOR-ed, followed by the decoded layer 3 from cs:0551.
cs:01EA push   ds
cs:01EB push   cs
cs:01EC pop    ds
cs:01ED pop    es
cs:01EE mov    [0110],es  ;we save ES in order to restore it (because in the
                          ; next step we clear ES)
cs:01F2 mov    es,[010A]  ;it's very instructive to check the way the
                          ; protection zeroes the register! No immediate
                          ; values, nothing: the word at 010A is still 0
                          ; here, as it's only relocated below. Loading 0
                          ; into a register directly would make most crackers
                          ; think that the program is going to rewrite the
                          ; interrupt table, which requires a segment register
                          ; to hold zero (of course, no one would be so naive
                          ; as to think that the author of a protection would
                          ; use 'usual' and 'official' ways of updating the
                          ; interrupt vector table :-)
cs:01F6 mov    ax,[0110]
cs:01F9 add    ax,0010
cs:01FC add    [010A],ax
cs:0200 add    [010C],ax
cs:0204 add    [010E],ax  ;AX = PSP segment + 10h, i.e. where DOS has
                          ; loaded the EXE image. We relocate the words at
                          ; 010a, 010c and 010e with this value.
cs:0208 push   es:word ptr [0000]
cs:020D push   es:word ptr [0002] ; if we haven't noticed yet that we were
                                  ; going to read/write intr. table, well,
                                  ; this is the point where one SHOULD
                                  ; notice it :) Saving Int0's original
                                  ; offset and segment.
cs:0212 push   cs
cs:0213 pop    es:word ptr [0002] ; and we point INT 0 to the ACTUAL CS.
                                  ; Of course, we don't load CS directly,
                                  ; so that crackers won't notice that
                                  ; we're going to point it to an OWN
                                  ; routine.
cs:0218 mov    ax,051F
cs:021B mov    es:[0000],ax       ; yes, Int0's new address is CS:051f

cs:021F push   es:word ptr [0004]
cs:0224 push   es:word ptr [0006]
cs:0229 push   cs
cs:022A pop    es:word ptr [0006]
cs:022F mov    ax,051F
cs:0232 mov    es:[0004],ax       ; and we do the same to INT 1. It'll point
                                  ; to CS:051f
cs:0236 push   es:word ptr [0008]
cs:023B push   es:word ptr [000A]
cs:0240 push   es:word ptr [000C]
cs:0245 push   es:word ptr [000E]  ;saving the original values of int 2 and
                                   ; int 3
cs:024A push   cs
cs:024B pop    es:word ptr [000E]
cs:0250 mov    ax,02C1
cs:0253 mov    es:[000C],ax        ;int 3 = cs:02c1
cs:0257 push   es:word ptr [0018]  ;int 6 = cs:0274
cs:025C push   es:word ptr [001A]
cs:0261 push   cs
cs:0262 pop    es:word ptr [001A]
cs:0267 mov    ax,0274
cs:026A mov    es:[0018],ax
cs:026E jmp    0279                ; we jump over some data and int 6's entry point

cs:0270 stc
cs:0271 lock dec sp                ; don't be afraid of such commands,
                                   ; this is data area
cs:0273 pop    es
cs:0274 xor    ax,ax               ;int 6 entry point
cs:0276 jmp    049E
cs:0279 xor    ax,ax
cs:027B mov    es,ax
cs:027D mov    ax,[0436]
cs:0280 mov    es:[0008],ax
cs:0284 mov    ax,[0471]
cs:0287 mov    es:[0004],ax
cs:028B mov    bh,00
cs:028D mov    si,0123
cs:0290 cld
ciklus: cs:0291 lodsb             ;yes, we're going to compute a new
                                  ; checksum, starting from CS:0123 again.
                                  ; The direction flag is 0, so SI
                                  ; walks upwards through the code.
cs:0292 xor    bh,al
cs:0294 push   si
cs:0295 mov    ax,0911
cs:0298 mov    si,4647
cs:029B mov    di,0000
cs:029E mov    dx,0497
cs:02A1 int    03                 ;we call INT 3, which is, of course,
                                  ; our OWN interrupt routine. The data we're
                                  ; going to pass to it are in the registers.
                                  ;Int 3 modifies the XOR sum we're computing,
                                  ; and, in addition, it also modifies both
                                  ; int1 and int2: it uses their vectors as
                                  ; temporary registers. Turbo Debugger freezes
                                  ; if anything modifies int1/int2, so it's
                                  ; impossible to debug this routine. And the
                                  ; usual debugging trick (using int f1
                                  ; instead of 01) would require a lot of
                                  ; effort: you'd have to check the code
                                  ; position being read and correct the byte
                                  ; read if you used e.g. an int f3 instruction
                                  ; instead of int 03, etc. Of course, I prefer
                                  ; moving these 'dangerous' interrupts to such
                                  ; a high, unused area, and this HLOCK 386
                                  ; was one of the very few protections that
                                  ; made simple interrupt moving almost
                                  ; impossible.

cs:02A2 add    bh,dh
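cs:02A4 push   sp               ;CPU check: an 8086/88 pushes the already
cs:02A5 pop    ax               ; decremented SP, a 286 or later the old one
cs:02A6 cmp    ax,sp
cs:02A8 jne    02BB             ;8086/88 -> error 2
cs:02AA pushf
cs:02AB pop    ax
cs:02AC mov    si,ax
cs:02AE xor    ax,7000          ;try to flip flag bits 12-14 (IOPL and NT)
cs:02B1 push   ax
cs:02B2 popf
cs:02B3 pushf
cs:02B4 pop    ax
cs:02B5 xor    ax,si
cs:02B7 je     02BB             ;bits unchanged (a 286 in real mode) -> error 2
cs:02B9 jmp    0326             ;on a 386 or better, it jumps at any rate
cs:02BB mov    ax,0002
cs:02BE jmp    049E             ;the code from 02a4 doesn't affect TD at all
                                ; (so, on a 386, we'll never jump to 049e from 02be)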

int3: cs:02C1 push   es         ;here begins int 3
cs:02C2 push   ax
cs:02C3 push   bx
cs:02C4 push   cx
cs:02C5 cmp    ax,0911
cs:02C8 jne    0320
cs:02CA xor    ax,ax
cs:02CC mov    es,ax
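cs:02CE mov    ax,es:[0008]     ;state word 1: the offset of the int 2 vector
cs:02D2 mov    bx,es:[0004]     ;state word 2: the offset of the int 1 vector
cs:02D7 mov    cx,ax
cs:02D9 mul    word ptr [03C5]  ;DX:AX = AX * the word at 03C5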
cs:02DD shl    cl,1
cs:02DF shl    cl,1
cs:02E1 shl    cl,1
cs:02E3 add    ch,cl
cs:02E5 add    dx,cx
cs:02E7 add    dx,bx
cs:02E9 shl    bx,1
cs:02EB shl    bx,1
cs:02ED add    si,di            ;decoy: the callers discard SI and DI
cs:02EF add    dx,bx
cs:02F1 add    dh,bl
cs:02F3 mov    cl,05
cs:02F5 shr    di,03            ;decoy
cs:02F8 shl    bx,cl
cs:02FA add    dh,bl
cs:02FC shl    si,04            ;decoy
cs:02FF add    ax,0001
cs:0302 adc    dx,0000
cs:0305 mov    es:[0008],ax  ; TD freezes here (rewriting int1/2)
cs:0309 mov    es:[0004],dx
cs:030E pop    cx
cs:030F pop    bx
cs:0310 pop    ax
cs:0311 push   bx
cs:0312 add    bh,dh
cs:0314 mov    bl,bh
cs:0316 xor    es:[0008],bx
cs:031B pop    bx
cs:031C pop    es
cs:031D ret    0004             ;pops IP and discards CS and FLAGS: an IRET
                                ; that doesn't restore the flags; end of int3
cs:0320 mov    ax,0003          ;'unstable system'.
cs:0323 jmp    049E             ;error, error code in ax, exiting (after deleting
                                ; layer 2, of course)

cs:0326 pop    si               ;we jump here from 02b9
cs:0327 xor    eax,eax
cs:032A push   ax
cs:032B push   bx
cs:032C push   dx
cs:032D mov    ax,FFFF
cs:0330 mov    dx,FFFF
cs:0333 mov    bx,0001
cs:0336 div    bx  ;FFFFFFFF/1 doesn't fit in 16 bits, so this divide
                   ;overflow calls Int 0. Of course, TD doesn't execute
                   ;int 0's, either. dh= output.

cs:0338 xor    bh,dh            ;bh is the xor value
cs:033A cmp    si,0551          ;have we reached the starting address of layer3?
                                ; if not, we read the next code byte from
                                ;layer 1/2; if yes, jmp to 052e to decode layer 3

cs:033E jne    0291
cs:0342 jmp    052E

; error handling routine: exiting, after deleting layer 1 and 2
cs:049E push   cs
cs:049F push   cs
cs:04A0 pop    ds
cs:04A1 pop    es
cs:04A2 mov    dx,0345
cs:04A5 cmp    ax,0001
cs:04A8 jne    04AD
cs:04AA mov    dx,03C7
cs:04AD cmp    ax,0002
cs:04B0 jne    04B5
cs:04B2 mov    dx,0438
cs:04B5 cmp    ax,0003
cs:04B8 jne    04BD
cs:04BA mov    dx,0473
cs:04BD mov    di,051C
cs:04C0 mov    si,sp
cs:04C2 sub    si,0040
cs:04C5 mov    dword ptr [di],00000000
cs:04CC add    di,0004
cs:04CF cmp    di,si
cs:04D1 jb     04C5
cs:04D3 std
cs:04D4 mov    di,0345
cs:04D7 dec    di
cs:04D8 mov    cx,di
cs:04DA dec    cx
cs:04DB rep stosb
cs:04DD mov    ah,09
cs:04DF int    21
cs:04E1 xor    ax,ax
cs:04E3 mov    es,ax
cs:04E5 pop    es:word ptr [001A] ; restoring int. vectors
cs:04EA pop    es:word ptr [0018]
cs:04EF pop    es:word ptr [000E]
cs:04F4 pop    es:word ptr [000C]
cs:04F9 pop    es:word ptr [000A]
cs:04FE pop    es:word ptr [0008]
cs:0503 pop    es:word ptr [0006]
cs:0508 pop    es:word ptr [0004]
cs:050D pop    es:word ptr [0002]
cs:0512 pop    es:word ptr [0000]
cs:0517 mov    ax,4C02
cs:051A int    21                ;exit to DOS
cs:051C jmp    0338

cs:051F add    sp,0004           ;int 0 (the divide overflow at 0336) lands
                                 ; here; int 1 was pointed here at 0232, too.
                                 ; IP and CS are thrown away: no return to 0336
cs:0522 popf
cs:0523 pop    dx
cs:0524 pop    bx
cs:0525 pop    ax
cs:0526 cmp    si,0551
cs:052A jne    0291              ;have we read all the code bytes from l1/2?
cs:052E mov    di,sp             ;we jump here from 0342, too
cs:0530 sub    di,0020
cs:0533 mov    bl,[si]
cs:0535 xor    [si],bh
cs:0537 mov    bh,bl
cs:0539 push   si
cs:053A push   di
cs:053B mov    ax,0911
cs:053E mov    si,4647
cs:0541 mov    di,0000
cs:0544 mov    dx,0497
cs:0547 int    03
cs:0548 pop    di
cs:0549 pop    si
cs:054A add    bh,dh
cs:054C inc    si
cs:054D cmp    si,di
cs:054F jb     0533  ; and we're decoding layer 3 up to SP-20h.
                     ; SP is given in the EXE header. Of course, its actual
                     ; value is somewhat smaller than given, but it's far
                     ; beyond the end of our executable code.



;3rd layer- this decodes the ORIGINAL program.
; Of course, decoding the original code is quite complicated,
; and is varied in each link.

cs:0551 xor    di,di
cs:0553 mov    es,di
cs:0555 mov    ax,[0752]
cs:0558 mov    es:[0008],ax
cs:055C mov    ax,[0754]
cs:055F mov    es:[0004],ax
cs:0563 mov    es,[010A]
cs:0567 cld
cs:0568 mov    ah,[0751]
cs:056C cmp    dword ptr [074D],00008000 ;if the code we have to decode is
                                         ; bigger than 32k, we cut it into
                                         ; 32k-blocks. The last block is
                                         ; decoded by the routine from 066f.

cs:0575 jbe    066F
cs:0579 mov    cx,8000
cs:057C mov    al,es:[di]
cs:057F xor    es:[di],ah
cs:0582 mov    ah,al
cs:0584 dec    byte ptr [075A]
cs:0588 je     05A5
cs:058A add    ah,0D
cs:058D inc    di
cs:058E loop   057C
cs:0590 sub    dword ptr [074D],00008000
cs:0599 mov    di,es
cs:059B add    di,0800
cs:059F mov    es,di
cs:05A1 xor    di,di
cs:05A3 jmp    056C
cs:05A5 push   ax
cs:05A6 push   di
cs:05A7 mov    bx,ax
cs:05A9 mov    ax,0911
cs:05AC mov    si,4647
cs:05AF mov    di,0000
cs:05B2 mov    dx,05C1
cs:05B5 int    03
cs:05B6 pop    di
cs:05B7 pop    ax
cs:05B8 add    ah,dh
cs:05BA mov    byte ptr [075A],20
cs:05BF jmp    058D
cs:05C1 dec    ax  ; 05C1-05C6 is data shown as code: the error-handling
                   ; routine of layer 3 starts at 05C7 (see jmp 05C7 at 06A9)
cs:05C2 inc    dx
cs:05C3 dec    di
cs:05C4 dec    di
cs:05C5 push   sp
cs:05C6 or     ax,0E0E
cs:05C9 pop    ds
cs:05CA pop    es
cs:05CB mov    dx,0345
cs:05CE cmp    ax,0001
cs:05D1 jne    05D6
cs:05D3 mov    dx,03C7
cs:05D6 cmp    ax,0002
cs:05D9 jne    05DE
cs:05DB mov    dx,0438
cs:05DE cmp    ax,0003
cs:05E1 jne    05E6
cs:05E3 mov    dx,0473
cs:05E6 mov    di,0653
cs:05E9 mov    si,sp
cs:05EB sub    si,0040
cs:05EE mov    dword ptr [di],00000000
cs:05F5 add    di,0004
cs:05F8 cmp    di,si
cs:05FA jb     05EE
cs:05FC std
cs:05FD mov    di,0608
cs:0600 mov    cx,di
cs:0602 sub    cx,051F
cs:0606 xor    ax,ax
cs:0608 dec    cx
cs:0609 rep stosb
cs:060B mov    di,0345
cs:060E dec    di
cs:060F mov    cx,di
cs:0611 dec    cx
cs:0612 rep stosb
cs:0614 mov    ah,09
cs:0616 int    21
cs:0618 xor    ax,ax
cs:061A mov    es,ax
cs:061C pop    es:word ptr [001A]
cs:0621 pop    es:word ptr [0018]
cs:0626 pop    es:word ptr [000E]
cs:062B pop    es:word ptr [000C]
cs:0630 pop    es:word ptr [000A]
cs:0635 pop    es:word ptr [0008]
cs:063A pop    es:word ptr [0006]
cs:063F pop    es:word ptr [0004]
cs:0644 pop    es:word ptr [0002]
cs:0649 pop    es:word ptr [0000]
cs:064E mov    ax,4C02
cs:0651 int    21
cs:0653 push   ax
cs:0654 push   di
cs:0655 mov    bx,ax
cs:0657 mov    ax,0911
cs:065A mov    si,4647
cs:065D mov    di,0000
cs:0660 mov    dx,05C1
cs:0663 int    03
cs:0664 pop    di
cs:0665 pop    ax
cs:0666 add    ah,dh
cs:0668 mov    byte ptr [075A],20
cs:066D jmp    068C
cs:066F cmp    dword ptr [074D],0000  ;decoding smaller blocks than 32k
cs:0675 je     068F
cs:0677 mov    cx,[074D]
cs:067B mov    al,es:[di]
cs:067E xor    es:[di],ah
cs:0681 mov    ah,al
cs:0683 dec    byte ptr [075A]
cs:0687 je     0653
cs:0689 add    ah,0B
cs:068C inc    di
cs:068D loop   067B
cs:068F xor    ax,ax
cs:0691 mov    es,ax
cs:0693 mov    ax,es:[0008]
cs:0697 cmp    ax,[0756]
cs:069B jne    06A7
cs:069D mov    ax,es:[0004]
cs:06A1 cmp    ax,[0758]
cs:06A5 je     06AC
cs:06A7 xor    ax,ax
cs:06A9 jmp    05C7
cs:06AC cmp    word ptr [074B],0000
cs:06B1 je     06D7
cs:06B3 mov    si,075B
cs:06B6 mov    cx,[074B]
cs:06BA mov    dx,[010A]
cs:06BE cld
cs:06BF mov    di,[si]
cs:06C1 mov    ax,[si+02]
cs:06C4 add    ax,dx
cs:06C6 mov    es,ax
cs:06C8 mov    dword ptr [si],00000000
cs:06CF add    si,0004
cs:06D2 add    es:[di],dx
cs:06D5 loop   06BF
cs:06D7 xor    al,al   ; erasing the entire code area of layer 1, 2 and 3
                       ; between 0123 and 06e4
cs:06D9 mov    di,0123
cs:06DC mov    cx,06E4
cs:06DF sub    cx,di
cs:06E1 push   ds
cs:06E2 pop    es
cs:06E3 cld             ;increasing addresses
cs:06E4 rep stosb
cs:06E6 xor    ax,ax
cs:06E8 mov    es,ax
cs:06EA pop    es:word ptr [001A]  ;restoring intr. vectors
cs:06EF pop    es:word ptr [0018]
cs:06F4 pop    es:word ptr [000E]
cs:06F9 pop    es:word ptr [000C]
cs:06FE pop    es:word ptr [000A]
cs:0703 pop    es:word ptr [0008]
cs:0708 pop    es:word ptr [0006]
cs:070D pop    es:word ptr [0004]
cs:0712 pop    es:word ptr [0002]
cs:0717 pop    es:word ptr [0000]
cs:071C mov    ss,[010E]  ;restoring SS
cs:0720 mov    sp,[0749]  ;restoring SP
cs:0724 xor    bx,bx
cs:0726 pushf
cs:0727 xor    cx,cx
cs:0729 mov    bp,sp
cs:072B or     word ptr [bp],0200 ;set IF in the saved flags (the CLI at
                                  ; 01CA cleared it)
cs:0730 xor    bp,bp
cs:0732 push   word ptr [010C]  ;we push the CS of the original program
                                ;(NOT the protection's!)

cs:0736 mov    di,ax
cs:0738 push   word ptr [0747]  ;and the IP of the 'infected', protected
                                ; program
cs:073C mov    si,bx
cs:073E mov    es,[0110]
cs:0742 push   es
cs:0743 pop    ds
cs:0744 mov    dx,cx
cs:0746 iret                    ;pops IP, CS and FLAGS: a FAR return that
                                ; starts the protected program
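The INT 3 routine at cs:02C1 is what makes layer 2 hard to trace: its state lives in the offset words of the INT 2 and INT 1 vectors. Below is a line-by-line C transcription of the routine's normal path (AX = 0911h) and of the checksum loop at cs:0291 that calls it, written from the listing above and not checked against a real MegaEM binary; `hl_state`, `hl_int3` and `hl_layer2_key` are hypothetical names. The SI/DI arithmetic is left out because every caller discards SI and DI, and the INT 0 detour is left out because it does not change BH.

```c
#include <stdint.h>

/* HyperLOCK's hidden state: the offset words of the INT 2 (0000:0008) and
   INT 1 (0000:0004) vectors. */
typedef struct { uint16_t int2_off, int1_off; } hl_state;

/* cs:02C1, normal path (AX = 0911h). mulconst = word at cs:03C5,
   caller_bh = the caller's BH. Returns DX; the caller does "add bh,dh". */
uint16_t hl_int3(hl_state *s, uint16_t mulconst, uint8_t caller_bh) {
    uint16_t ax = s->int2_off;                      /* mov ax,es:[0008] */
    uint16_t bx = s->int1_off;                      /* mov bx,es:[0004] */
    uint16_t cx = ax;                               /* mov cx,ax */
    uint32_t prod = (uint32_t)ax * mulconst;        /* mul word ptr [03C5] */
    uint16_t dx = (uint16_t)(prod >> 16);
    ax = (uint16_t)prod;
    uint8_t cl = (uint8_t)(cx << 3);                /* shl cl,1 (three times) */
    uint8_t ch = (uint8_t)((cx >> 8) + cl);         /* add ch,cl */
    cx = (uint16_t)((ch << 8) | cl);
    dx = (uint16_t)(dx + cx);                       /* add dx,cx */
    dx = (uint16_t)(dx + bx);                       /* add dx,bx */
    bx = (uint16_t)(bx << 2);                       /* shl bx,1 (twice) */
    dx = (uint16_t)(dx + bx);                       /* add dx,bx */
    dx = (uint16_t)(dx + ((bx & 0xFF) << 8));       /* add dh,bl */
    bx = (uint16_t)(bx << 5);                       /* mov cl,05 / shl bx,cl */
    dx = (uint16_t)(dx + ((bx & 0xFF) << 8));       /* add dh,bl */
    /* add si,di / shr di,3 / shl si,4 omitted: the callers discard SI, DI */
    uint32_t dxax = (((uint32_t)dx << 16) | ax) + 1u; /* add ax,1 / adc dx,0 */
    s->int2_off = (uint16_t)dxax;                   /* mov es:[0008],ax */
    s->int1_off = dx = (uint16_t)(dxax >> 16);      /* mov es:[0004],dx */
    uint8_t bh = (uint8_t)(caller_bh + (dx >> 8));  /* add bh,dh / mov bl,bh */
    s->int2_off ^= (uint16_t)((bh << 8) | bh);      /* xor es:[0008],bx */
    return dx;
}

/* cs:0291-0342: checksum over cs:0123-0550 (layer 1 as is, layer 2 already
   decoded). seed2/seed1 are the words at cs:0436 and cs:0471. */
uint8_t hl_layer2_key(const uint8_t *image, uint16_t mulconst,
                      uint16_t seed2, uint16_t seed1) {
    hl_state s = { seed2, seed1 };                  /* 027D-0287 */
    uint8_t bh = 0;                                 /* mov bh,00 */
    for (uint16_t si = 0x0123; si < 0x0551; si++) {
        bh ^= image[si];                            /* lodsb / xor bh,al */
        uint16_t dx = hl_int3(&s, mulconst, bh);    /* int 03 */
        bh = (uint8_t)(bh + (dx >> 8));             /* add bh,dh */
    }
    return bh;
}
```

The returned BH is the key the loop at cs:052E starts decoding layer 3 with.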

Debugging such a protection with TD, as we've seen, is impossible. The only way to decode a program protected this way is to SIMULATE the protection: write a program, preferably in assembly, that contains almost the same code, except for the anti-debug tricks, but reads from a FILE and writes the decoded code to a FILE, too. By comparing the code below to the code of the original protection, you can see how this works.
Of course, when writing a protection, do NOT forget that the original program you're protecting should be encoded in a much more complicated way: simple XORs should be avoided, and not even an incrementing XOR should be used.
;CopyRight (C) by DirkGent@iRC
;A *WORKING* c0de to crack HyperLOCK 386.
;1, it handles the code that varies from link to link
;2, it reads the initial decoding values from the file itself
;
;Input filename: megaem.exe
;and the output: decoded.exe
;
;there is only one thing you must do: TDUMP the file you want to
;free and write its INITIAL CS VALUE into the following EQU:
;        CSWithAntidebugRtn equ 1603h (it's right after the jmp start line)
; the value currently in the source is for MegaEm 2.02

; Fortunately, it IS possible to trace THIS routine! The interrupts it
;uses are moved to unused vectors (vector n lives at 0000:n*4):
; int0  -> int 0f0h     3c0-3c3
; (int1 -> int 0f1h     3c4-3c7)
; (int2 -> int 0f2h     3c8-3cb)
; int3  -> int 0f3h     3cc-3cf
; (int8 -> int 0f4h     ...)

.model small
.386
.code
jmp start

;!!!! one EQU, to be supplied BEFORE assembling this cracker routine !!!!!!
CSWithAntidebugRtn equ 1623h

FNameIn db 'MEGAEM.exe',0
FHandleIn dw 0 ;filehandle of the input file
FNameOut db 'decoded.exe',0
FHandleOut dw 0
Buff	db 0
TempDD  dd 0 ;for conversion between 32- and 16-bit registers (SEEK)
;the three parameters which are readable without decryption
OrigCS dw 0 ;the original CS in the EXE file
OrigSS dw 0 ;the original SS in the EXE file
_01cf db 0  ;the beginning XOR value at 01cf (the first decryptor rtn)

;the second set of parameters, grabbable after/under decoding the code from 01eah
_0298SIValue dw 0  ;used for int3 call in the XOR maker rtn
_029bDIValue dw 0
_03c5 dw 0  ;value to multiply in the int 3 rtn
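_0436 dw 0 ;initial INT2 offset for the layer-2 checksum (word at cs:0436)
_0471 dw 0 ;initial INT1 offset for the layer-2 checksum (word at cs:0471)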
_053eSIValue dw 0  ;used for int3 call in the decoder routine of the code from 0551
_0541DIValue dw 0

;the third set of parameters, grabbable after/under decoding the code from 0551h
OrigIP dw 0  ;the original IP in the EXE file
OrigSP dw 0  ;the original SP in the EXE file
_074d dd 0  ;the size of the program to be decompressed
FileSize dd 0 ;as above
_0751 db 0  ;XOR byte to begin with (ah)
_0752 dw 0  ;initial INT2 offset for layer 3
_0754 dw 0  ;initial INT1 offset for layer 3
_075a db 0  ;counter: when do we have to include an INT3 call while decompressing the main program?

SeekAndWordRead macro AddyOffset,Variable
mov bx,FHandleIn
mov ax,4200h
mov edx,20h+CSWithAntidebugRtn*16+100h+AddyOffset
mov TempDD,edx
mov cx, word ptr [TempDD+2]
int 21h
call _LodsbToBuffer
mov al,Buff ;reading the LOW byte
mov Variable,al
endm

SeekTo macro Addy
mov ax,4200h
mov edx,Addy
mov TempDD,edx
mov cx, word ptr [TempDD+2]
int 21h
endm

SeekAndWordReadFromTargetFile macro AddyOffset,Variable
mov bx,FHandleOut
mov ax,4200h
mov edx,20h+CSWithAntidebugRtn*16+100h+AddyOffset
mov TempDD,edx
mov cx, word ptr [TempDD+2]
int 21h
call _LodsbToBufferFromNewFile
mov al,Buff ;reading the LOW byte
call _LodsbToBufferFromNewFile
mov ah,Buff
mov Variable,ax
endm

start:
push cs
pop ds

mov	dx, offset FNameIn
mov	ax, 3d00h ;open
int	21h
mov	byte ptr [FHandleIn], al

mov	dx, offset FNameOut
mov	ax, 3c00h ;create
mov	cx,0
int	21h
mov	byte ptr [FHandleOut], al

;getting the first XOR value at 01cf before actually starting the decrunching
SeekAndWordRead 00cfh,_01cf

call GrabCode1

SeekTo 0 ;point back to the beginning

;now, step to cs:01ea, saving the word at cs:010c as well (orig. CS)
;and building up the first coded area from 01ea to 0551.

mov ecx,0
mov al,_01cf
ReadTheFirstSectionUntil0551:
call _LodsbToBuffer
inc ecx
 mov ah,Buff
cmp ecx,20h+CSWithAntidebugRtn*16+100h+0eah   ;above 01ea, we must XOR the actual code as well
jbe TheAddyIsBelow0eah
 xor Buff,al
TheAddyIsBelow0eah:
cmp ecx,20h+CSWithAntidebugRtn*16+100h+023h
jbe TheAddyIsBelow023h
ChangeableCode: sub al,ah ;these 4 instructions may change in every link: GrabCode1 overwrites them
 shr al,1
 xor al,ah
 rol al,1
 cmp ecx,20h+CSWithAntidebugRtn*16+100h+0548h+240+0023h ;!!!!!!!! decoding everything
 je _0550HasBeenWrittenOut
TheAddyIsBelow023h:
call _StosbFromBuffer
jmp ReadTheFirstSectionUntil0551

_0550HasBeenWrittenOut:
call _StosbFromBuffer
;now, read the following values from the target file, where layers 1 and 2 are decoded by now
SeekAndWordReadFromTargetFile 0ch,OrigCS
SeekAndWordReadFromTargetFile 0eh,OrigSS
SeekAndWordReadFromTargetFile 02c5h,_03c5
SeekAndWordReadFromTargetFile 0336h,_0436
SeekAndWordReadFromTargetFile 0371h,_0471
SeekAndWordReadFromTargetFile 043fh,_053eSIValue
SeekAndWordReadFromTargetFile 0442h,_0541DIValue
SeekAndWordReadFromTargetFile 0199h,_0298SIValue
SeekAndWordReadFromTargetFile 019Ch,_029bDIValue

;back to 0123h
SeekTo 20h+CSWithAntidebugRtn*16+100h+0023h

;so, we are going to write the third part out- let's compute the beginning XOR
; value!

mov ax,0
mov es,ax
push   cs
pop    es:word ptr [03c2h] ;int 0 segment addy
mov    ax,offset int0
mov    es:[03c0h],ax
push   cs
pop    es:word ptr [03ceh] ;int 3 segment addy
mov    ax,offset int3
mov    es:[03cch],ax

xor    ax,ax
mov    es,ax
mov    ax,_0436         ;!!!!!!!!!VAR!!!!!!!!!!!
mov    es:[03c8h],ax
mov    ax,_0471         ;!!!!!!!!!VAR!!!!!!!!!!!
mov    es:[03c4h],ax
mov    bh,00 ;lameness... It's FIXED to 0 :)
mov    si,0123h ;we read the OutFile to emulate the passed first XOR cycle
WeHaventReached551Yet:
call _LodsbToBufferFromNewFile ;to compute the checksum, we have sought back to
                               ; 0123 in the outfile and we are reading from it
inc si ;an additional one, of course: SI shows us whether we have already
       ; reached the point where we must actually CHANGE the code as well
mov al,Buff
xor    bh,al
push   si
mov    si,_0298SIValue
mov    di,_029bDIValue
MOV    DX,0497H ;permanent
int    0f3h
add    bh,dh
;0326!
pop    si
int 0f0h
xor    bh,dh
cmp    si,0551h
jne    WeHaventReached551Yet
jmp    WeHaveReached551

int3: push   es
;push   ax
push   bx
push   cx
xor    ax,ax
mov    es,ax
mov    ax,es:[03c8h]
mov    bx,es:[03c4h]
mov    cx,ax
mul    _03c5
shl cl,3
add    ch,cl
add    dx,cx
add    dx,bx
shl bx,2
add    si,di
add    dx,bx
add    dh,bl
mov    cl,05
shr    di,03
shl    bx,cl
add    dh,bl
shl    si,04
add    ax,0001
adc    dx,0000
mov    es:[03c8h],ax
mov    es:[03c4h],dx
pop    cx
pop    bx
;pop    ax
push   bx
add    bh,dh
mov    bl,bh
xor    es:[03c8h],bx
pop    bx
pop    es
ret    0004

int0:
add    sp,0004
popf
cmp    si,0551h
jne WeHaventReached551Yet
WeHaveReached551:
;now we have the correct XOR value in bh, let's do again some DEXORing
call _LodsbToBufferFromNewFile ;we are still reading from the TARGET file
;and step back!
push bx
mov bx,FHandleOut
mov ax,4201h ;relative seek (-1 byte)
mov dx,0ffffh
mov cx,0ffffh
int 21h
pop bx
mov bl,Buff
push ax
mov ah,bl
xor ah,bh
mov Buff,ah
pop ax
call _StosbFromBuffer
mov    bh,bl
push   si
push   di
mov    si,_053eSIValue
mov    di,_0541DIValue
MOV    DX,0497H
int    0f3h
pop    di
pop    si
add    bh,dh
inc    si
cmp    si,0a00h ;it's quite high, but at least we don't have to pay attention to SP-conversions
jb WeHaveReached551

; now the third phase- to decompress the main file itself
; First, we have to grab the begin XOR values:
SeekAndWordReadFromTargetFile 0647h,OrigIP
SeekAndWordReadFromTargetFile 0649h,OrigSP
SeekAndWordReadFromTargetFile 0654h,_0754
; now read the original EXEsize to be decrypted
SeekAndWordReadFromTargetFile 064dh,_0752
mov ax,_0752
mov word ptr _074d,ax ;the lower word is ready
mov word ptr FileSize,ax ;the lower word is ready
SeekAndWordReadFromTargetFile 064fh,_0752
mov ax,_0752
mov word ptr [_074d+2],ax ;the upper word is ready
mov word ptr [FileSize+2],ax ;the upper word is ready

call _LodsbToBufferFromNewFile
mov al, Buff
mov _0751,al  ;begin XOR value

SeekAndWordReadFromTargetFile 0652h,_0752
mov _075a,1 ;counter to jump to an INT3 (permanent 1 as well, even if it has an own data byte)

mov	bx, FHandleOut
mov	ax, 3e00h ;close outfile
int	21h


mov	dx, offset FNameOut ;now we can write the final version of the decoded
                            ; EXE! We simply re-create the file under the same
                            ; name as the previous temp file
mov	ax, 3c00h ;create
mov	cx,0
int	21h
mov	byte ptr [FHandleOut], al

mov	bx, FHandleIn
SeekTo 0

;COPYING the EXE header (2 paragraphs)
mov   cx,20h
ReadTheHeader:
call _LodsbToBuffer
call _StosbFromBuffer
dec cx
jne ReadTheHeader

xor    di,di
;mov    es,di ;not needed: ES is still 0 from the layer-2 emulation above
mov    ax,_0752
mov    es:[03c8h],ax
mov    ax,_0754
mov    es:[03c4h],ax
cld
mov    ah,_0751
LetsRestoreTheNextNext32k: cmp    dword ptr _074D,00008000h
jbe    LessThan32kLetsUnpackItInOneTurn
mov    cx,8000h
ReadTheNextByteFromTheINFECTEDFile: call _LodsbToBuffer
mov al,Buff
xor Buff,ah
mov    ah,al
dec    byte ptr _075A  ;decrementing the when-we-have-to-put-an-int03-routine-in counter
je     OkInsertAnInt3Call
add    ah,0Dh
ReturnFromCallingInt3:
call _StosbFromBuffer
loop ReadTheNextByteFromTheINFECTEDFile

sub    dword ptr _074D,00008000h
jmp    LetsRestoreTheNextNext32k

OkInsertAnInt3Call:
push   ax
mov    bx,ax
mov    si,4647h
mov    di,0000
mov    dx,05C1h
int    0f3h
pop    ax
add    ah,dh
mov    byte ptr _075A,20h ;we call int 3 quite rarely, due to speed problems
jmp    ReturnFromCallingInt3

LessThan32kLetsUnpackItInOneTurn:
mov    cx,word ptr _074D
ReadTheNextByteFromTheINFECTEDFile_2: call _LodsbToBuffer
mov al,Buff
xor Buff,ah
mov    ah,al
dec    byte ptr _075A  ;decrementing the when-we-have-to-put-an-int03-routine-in counter
je     OkInsertAnInt3Call_2
add    ah,0bh
ReturnFromCallingInt3_2:
call _StosbFromBuffer
loop ReadTheNextByteFromTheINFECTEDFile_2

;storing the original SS:SP in the final, uncoded file
mov bx, FHandleOut
SeekTo 14d
mov ax,OrigSS
mov Buff,al
call _StosbFromBuffer
mov Buff,ah
call _StosbFromBuffer ;send out SS
mov ax,OrigSP
mov Buff,al
call _StosbFromBuffer
mov Buff,ah
call _StosbFromBuffer ;send out SP

;write out the real CS:IP
SeekTo 20d
mov ax,OrigIP
mov Buff,al
call _StosbFromBuffer
mov Buff,ah
call _StosbFromBuffer
mov ax,OrigCS
mov Buff,al
call _StosbFromBuffer
mov Buff,ah
call _StosbFromBuffer

;computing and storing the new DOS size in the header
SeekTo 2 ;pointing to the MOD and DIV words in the header
mov ax,word ptr [FileSize]
mov dx,word ptr [FileSize+2] ;upper word
add ax,20h ;we'll include the header size as well
adc dx,0   ;carry into the upper word
mov bx,512d
div bx     ;AX = Exesize div 512, DX = Exesize mod 512
or dx,dx
jz NoPartialPage
inc ax     ;a partial last page counts as a whole page
NoPartialPage:
mov Buff,dl
call _StosbFromBuffer
mov Buff,dh
call _StosbFromBuffer ;sending out MOD (bytes used in the last page)
mov Buff,al
call _StosbFromBuffer
mov Buff,ah
call _StosbFromBuffer ;sending out the page count

mov	bx, FHandleOut
mov	ax, 3e00h ;close outfile
int	21h
mov ah,4ch
int 21h

OkInsertAnInt3Call_2:
push   ax
mov    bx,ax
mov    si,4647h
mov    di,0000
mov    dx,05C1h
int    0f3h
pop    ax
add    ah,dh
mov    byte ptr _075A,20h ;we call int 3 quite rarely, due to speed problems
jmp    ReturnFromCallingInt3_2

;getting the 8 bytes from 01e0 to 01e7 and patching them into our OWN decryptor routine!
GrabCode1: mov bp,0
mov bx,FHandleIn ;don't rely on BX left over by the caller
mov ax,4200h
mov edx,20h+CSWithAntidebugRtn*16+100h+00e0h
mov TempDD,edx
mov cx, word ptr [TempDD+2]
int 21h
GettingTheNextCodeByte: call _LodsbToBuffer
mov al,Buff ;reading the next code byte
mov byte ptr cs:[ChangeableCode+bp],al ;BP-based addressing defaults to SS, so force CS
inc bp
cmp bp,8
jne GettingTheNextCodeByte
ret

_LodsbToBuffer: ;read the next byte from the SOURCE file
pusha
mov	bx,FHandleIn
mov	ax,3f00h
mov	cx,0001
mov	dx,offset Buff
int	21h
popa
ret

_LodsbToBufferFromNewFile: ;read the next byte from the TARGET file
pusha
mov	bx,FHandleOut
mov	ax,3f00h
mov	cx,0001
mov	dx,offset Buff
int	21h
popa
ret

_StosbFromBuffer: ;write Buff to the TARGET file
pusha
mov	bx,FHandleOut
mov	ax,4000h
mov	cx,0001
mov	dx,offset Buff
int	21h
popa
ret

end
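Finally, the layer-3 loop that decodes the original program (cs:0568-068D in the protection's listing, LetsRestoreTheNextNext32k onward in the cracker) can be summarized in C. This is a sketch under stated assumptions: the INT 3 re-mixing step is passed in as a callback (`mix_fn`), `constant_mix` is a stand-in used only for testing and NOT the real routine, and all names are hypothetical. Full 32 KB blocks add 0Dh to the key, the last block adds 0Bh, and whenever the byte counter at 075A runs out, the key is re-mixed through INT 3 and the counter is reset to 20h.

```c
#include <stdint.h>

/* The INT 3 call made every 20h bytes: gets BH (= AH), returns DH.
   In the protection this is the IVT-state routine at cs:02C1. */
typedef uint8_t (*mix_fn)(void *ctx, uint8_t bh);

/* A stand-in for tests only (NOT the real routine): returns a fixed DH. */
static uint8_t constant_mix(void *ctx, uint8_t bh) {
    (void)bh;
    return *(const uint8_t *)ctx;
}

/* cs:0568-068D: decodes `size` bytes of the original program in place.
   key = byte at cs:0751, counter = byte at cs:075A. */
void hl_layer3_decode(uint8_t *buf, uint32_t size, uint8_t key,
                      uint8_t counter, mix_fn mix, void *ctx) {
    uint32_t pos = 0;
    while (size > 0) {
        int last = size <= 0x8000;              /* cmp dword [074D],8000 / jbe */
        uint32_t n = last ? size : 0x8000;
        uint8_t step = last ? 0x0B : 0x0D;      /* add ah,0B / add ah,0D */
        for (uint32_t i = 0; i < n; i++, pos++) {
            uint8_t enc = buf[pos];             /* mov al,es:[di] */
            buf[pos] = (uint8_t)(enc ^ key);    /* xor es:[di],ah */
            key = enc;                          /* mov ah,al */
            if (--counter == 0) {               /* dec byte [075A] / je */
                key = (uint8_t)(key + mix(ctx, key)); /* int 03 / add ah,dh */
                counter = 0x20;                 /* mov byte [075A],20 */
            } else {
                key = (uint8_t)(key + step);
            }
        }
        size -= n;                              /* sub dword [074D],8000 */
    }
}
```

As in both the protection and the cracker, the key is chained through the encrypted bytes, so a decoder must process the file strictly in order.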