List of StrongARM instruction execution cycles!              2000/06/05
-----------------------------------------------                 mrh/icb

Heyho!

Welcome to the first *valid* list of StrongARM instruction execution cycles!
This list was compiled entirely from test results - no information from
'official' ART or ARM announcements was used. And - gosh! - nearly all values
differ from the officially announced ones! In fact, most instructions execute
slower than stated by ART/ARM. So it seems to me that the official values are
meant to boost SA sales?! Well, I may be wrong...

tested and written by mrh of iCEBiRD
e-mail: bawa@thepentagon.com


instruction syntax              type

ADC   Rd,Rn,Op2.................1
ADD   Rd,Rn,Op2.................1
AND   Rd,Rn,Op2.................1
B     address...................6
BL    address...................6
BIC   Rd,Rn,Op2.................1
CMN   Rn,Op2....................1
CMP   Rn,Op2....................1
EOR   Rd,Rn,Op2.................1
LDM   Rn,{Rlist}<^>.............4
LDR   Rd,adr....................2
MLA   Rd,Rm,Rs,Rn...............8
MOV   Rd,Op2....................1
MUL   Rd,Rm,Rs..................8
MRS   Rd,.......................10
MSR   ,Rm.......................10
MVN   Rd,Op2....................1
ORR   Rd,Rn,Op2.................1
RSB   Rd,Rn,Op2.................1
RSC   Rd,Rn,Op2.................1
SBC   Rd,Rn,Op2.................1
SMLA  Rl,Rh,Rm,Rn...............11
SMUL  Rl,Rh,Rm,Rn...............11
STM   Rn,{Rlist}<^>.............5
STR   Rd,adr....................3
SUB   Rd,Rn,Op2.................1
SWI   number....................7
SWP   Rd,Rm,[Rn]................9
TEQ   Rn,Op2....................1
TST   Rn,Op2....................1
UMLA  Rl,Rh,Rm,Rn...............11
UMUL  Rl,Rh,Rm,Rn...............11


legend

Rx    : plain register (R0-R14, PC) without shift
Rd    : destination register for operations
Rh    : high word of 64 bit MUL result
Rl    : low word of 64 bit MUL result
Rlist : register list of LDM/STM instructions
Rs,Rn : second factor register in MUL instructions (-> MUL Rd,Rm,Rs)
Op2   : immediate constant, plain register or shifted register

- All execution cycles given are valid for cached instructions and data, only.
- All instructions with a 'false' condition code take 1 cycle.


type special cases                        examples              SA cycles         ARM250
----------------------------------------------------------------------------------------
 1   * s=1 if register controlled shift,  ADD R0,R0,R2,LSL #4   1+s               1+s
       s=0 otherwise
     * P condition used                   TEQP R0,#0            3+s               1
     * Rd=PC, S condition used            MOVS PC,R14           4+s               4+s
                                          ADDS PC,PC,R4,LSL #2
     * MOV PC,Rx (Rx=reg. without shift)  MOV PC,R14            2+p               4
         p=2: Rx changed in previous cycle
         p=1: Rx unchanged for 1 cycle
         p=0: Rx unchanged for >=2 cycles
     * Rd=PC and is calculated by this    MOV PC,R14,LSL #2     3+s               4
       instruction in some way            SUB PC,PC,#44
----------------------------------------------------------------------------------------
 2   * f=1 if Rd is needed in next instr. LDR R4,[R2,#32]!      1+f+e (cache)     4
       f=0 otherwise
       e=1 if LDRSB/LDRSH sign-extension
       e=0 otherwise
     * Rd=PC                              LDR PC,[R2,R4,LSL #2] 4 (cache)         7
----------------------------------------------------------------------------------------
 3   * -                                  STR R4,[R3,R2,LSL #2] 1 (writebuffer)   4
----------------------------------------------------------------------------------------
 4   * n=number of registers in Rlist     LDMIA R0,{R0-R4}      f+n (cache)       3+n
       f=1 if last register loaded is
       needed in next instruction
       f=0 otherwise
     * n=1 (only 1 register is loaded)    LDMDB R0,{R4}         2 (cache)         4
     * ^ condition for userbank register  LDMIA R0,{R13}^       2+n (cache)       3+n
       load is used
     * Rlist includes PC                  LDMFD R13!,{R10,PC}^  3+n (cache)       6+n
----------------------------------------------------------------------------------------
 5   * n=number of registers in Rlist     STMFD R13!,{R0-R1}    n+u (writebuffer) 3+n
       u=2 if ^ condition used for
       storing user bank registers
       u=0 otherwise
     * n=1 (only 1 register is stored)    STMDB R0,{R4}         2+u (writebuffer) 4
----------------------------------------------------------------------------------------
 6   * -                                  BL &80AC              2                 4
----------------------------------------------------------------------------------------
 7   * -                                  SWI &42               ?                 4
----------------------------------------------------------------------------------------
 8   * f=1 if Rd is needed in next instr. MLA R0,R1,R2,R0       x+f+s             1-17
       (exception: see note #1)
       or S condition is used
       or next instruction is any
       multiplication
       f=0 otherwise
       x=1 if ABS(Rs) in range &00000000-&000007FF
       x=2 if ABS(Rs) in range &00000800-&007FFFFF
       x=3 if ABS(Rs) in range &00800000-&7FFFFFFF
       s=2 if S condition used
       s=0 otherwise
     * Rd=PC                              MUL PC,R2,R2          4+x
----------------------------------------------------------------------------------------
 9   * SWP works, but does not use        SWP R0,R1,[R3]        ? (>100)          5
       write-back buffers, therefore it
       is extremely slow. :(
----------------------------------------------------------------------------------------
 10  * -                                  MRS R3,SPSR_all       1                 -
----------------------------------------------------------------------------------------
 11  * f=1 if Rh is needed in next instr. UMUL R3,R4,R5,R6      1+x+f+s           -
       (exception: see note #1 and #2)
       or S condition is used
       f=0 otherwise
       x=1 if ABS(Rn) in range &00000000-&000007FF
       x=2 if ABS(Rn) in range &00000800-&007FFFFF
       x=3 if ABS(Rn) in range &00800000-&7FFFFFFF
       s=2 if S condition used
       s=0 otherwise
----------------------------------------------------------------------------------------

- All execution cycles given are valid for cached instructions, only.
- All instructions with a 'false' condition code take 1 cycle.


note #1: If the next instruction is a type 1 instruction with register
         controlled shift and Rd/Rh is not the shift-control register,
         then f=0.

         Some code for better understanding:
         assuming ABS(R2) in range &0-&7FF -> fastest MUL execution

         case 1: MUL R0,R1,R2      ; 2 cycles (x=1,f=1,s=0)
                 MOV R2,R2,LSL R0  ; 2 cycles

         case 2: MUL R0,R1,R2      ; 1 cycle  (x=1,f=0,s=0)
                 MOV R2,R0,LSL R2  ; 2 cycles

note #2: If the next instruction is a 64 bit multiplication
         (SMUL,UMUL,SMLA,UMLA) and Rh is involved as multiplier,
         then f=0.

         assuming ABS(R2) in range &0-&7FF -> fastest MUL execution

         SMUL R0,R1,R2,R2          ; 2 cycles (x=1, f=0(!), s=0)
         SMLA R0,R1,R2,R2
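
The formulas for types 2, 4, 8 and 11 can be turned into a small calculator
for a quick sanity check. The Python sketch below is not part of the original
measurements: all function and parameter names are made up for illustration,
only the plain cases are covered (no Rd=PC, no ^ loads), and treating
ABS(&80000000) as x=3 is an assumption, because the list only covers
&00000000-&7FFFFFFF. The exceptions from note #1 and note #2 are left to the
caller: pass False for the "needed next" flag whenever one of them applies.

```python
def mul_x(multiplier: int) -> int:
    """x term of types 8 and 11, from ABS() of the 32-bit multiplier register."""
    value = multiplier & 0xFFFFFFFF
    if value & 0x80000000:              # negative when read as signed 32 bit
        value = (1 << 32) - value       # ABS of the signed value
    if value <= 0x000007FF:
        return 1
    if value <= 0x007FFFFF:
        return 2
    return 3                            # assumption: also used for &80000000


def mul_cycles(rs, result_needed_next=False, s_flag=False,
               next_is_multiply=False):
    """Type 8 (MUL/MLA, Rd not PC): x+f+s."""
    f = 1 if (result_needed_next or s_flag or next_is_multiply) else 0
    s = 2 if s_flag else 0
    return mul_x(rs) + f + s


def long_mul_cycles(rn, rh_needed_next=False, s_flag=False):
    """Type 11 (64 bit multiplies): 1+x+f+s."""
    f = 1 if (rh_needed_next or s_flag) else 0
    s = 2 if s_flag else 0
    return 1 + mul_x(rn) + f + s


def ldr_cycles(rd_needed_next=False, sign_extend=False):
    """Type 2 (Rd not PC, data in cache): 1+f+e."""
    return 1 + int(rd_needed_next) + int(sign_extend)


def ldm_cycles(n, last_needed_next=False):
    """Type 4 (plain LDM, data in cache): f+n, or 2 if only one register."""
    if n < 1:
        raise ValueError("Rlist must contain at least one register")
    if n == 1:
        return 2
    return n + int(last_needed_next)


if __name__ == "__main__":
    # note #1, case 1: result is the shift-control register of the next instr.
    print(mul_cycles(5, result_needed_next=True))   # 2
    # note #1, case 2: the exception applies, so f=0
    print(mul_cycles(5))                            # 1
    # note #2: SMUL followed by SMLA, f=0
    print(long_mul_cycles(5))                       # 2
```

The flags are kept as plain booleans on purpose: deciding whether a result is
"needed" by the next instruction requires looking at two instructions at once,
which this sketch does not attempt.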