2-bit transfer protocol in an IRQ-loader by Lasse Öörni
-------------------------------------------------------
This is a continuation of the previous IRQ-loader rant, which focused on a 1-bit
transfer protocol and almost seemed to be against 2-bit fastloaders :) The
loader discussed here is very much like the previous one, so only the changes
in the protocol are explained in detail.
2-bit transfer, as mentioned in the previous rant, requires good synchronization
between the C64 and the diskdrive. Once the transfer of a byte is fired up, the
1541 sends 2 bits at a time, one on each of the CLK & DATA lines of the serial
bus, without any waiting or handshaking.
The trouble is that synchronization between the C64 & diskdrive can't
be cycle-exact; usually the diskdrive is waiting in a loop for the C64 to reply
before firing up the strictly timed transfer, so the timing might be off by as
much as the length of this waiting loop.
The other problem is that the diskdrive CPU runs at 1 MHz, but the C64 runs
either slightly slower (PAL, about 0.985 MHz) or slightly faster (NTSC, about
1.023 MHz). This has to be taken into account as well.
In fact it seems that when the transfer routine is coded in a certain way, the
difference between correct PAL & NTSC timing is only 1 clock cycle on the C64
side; the disk drive code doesn't have to be adjusted at all.
The drawbacks with this kind of 2-bit transfer are:
- Interrupts will be disabled during the transfer of a byte, so any raster
effects might be displaced.
- Sprites are not allowed onscreen (would steal CPU cycles and mess with the
timing.)
- Hitting RESTORE while loading will cause an NMI and also mess up the
timing. There are two ways I can think of to handle this: extend the transfer
protocol to have a resend option (the NMI handler will set a resend flag), or
disable NMIs by triggering a one-shot CIA2 interrupt but never acknowledging
the NMI. Neither of these is in use in the loader; it will just load incorrectly
if NMIs occur.
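As a rough sketch of how the last two drawbacks could be worked around, a
routine like the following might be called before loading. It is not part of
the loader: the names prepareload and nmi_exit are made up for this example,
and it assumes the KERNAL ROM is banked in, so that NMIs go through the vector
at $0318.

```asm
;Hypothetical helper (not in the original loader): call before FASTLOAD
prepareload:    lda #$00
                sta $d015               ;Sprites off, so no sprite DMA steals cycles
                lda #<nmi_exit          ;Point the NMI vector at a bare RTI
                sta $0318               ;(assumes the KERNAL ROM is banked in)
                lda #>nmi_exit
                sta $0319
                lda #$00
                sta $dd0e               ;Stop CIA2 timer A
                sta $dd04               ;Timer A count = 0, so it underflows
                sta $dd05               ;right after being started
                lda #$81
                sta $dd0d               ;Enable timer A as an NMI source
                lda #$01
                sta $dd0e               ;Start timer A: one NMI fires here
                rts
nmi_exit:       rti                     ;$dd0d is never read, so the NMI stays
                                        ;unacknowledged and RESTORE is ignored
```

The NMI input is edge-triggered, so as long as CIA2 keeps holding the line low
RESTORE cannot produce a new edge.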
I thank Marko Mäkelä for his C64->diskdrive asynchronous protocol, drive init
code and original main drive loop, K.M/TABOO for inspiration on the badline
detection and MuOn for testing the transfer routines on a real NTSC machine.
The 2bit transfer code in this loader is coded by me but heavily inspired by
various game loadersystems; for example Technocop.
2-bit loader disk image and source
;COVERT BITOPS 2bit fastloader for Rant #9
;Based on:
;Marko Mäkelä's IRQ-loader
;- Drive init code
;- Drive main loop
;- Asynchronous C64->drive communication
;Technocop loader and other game loadersystems
;- Drive->C64 2-bit communication
;K.M/Taboo's 2bit loaders
;- Badline detection
status = $90 ;Kernal zeropage variables
messages = $9d
fa = $ba
bufferstatus = $02 ;Loader's own zeropage variables
stackptrstore = $03
ciout = $ffa8 ;Kernal routines
listen = $ffb1
second = $ff93
unlsn = $ffae
acptr = $ffa5
chkin = $ffc6
chkout = $ffc9
chrin = $ffcf
chrout = $ffd2
close = $ffc3
open = $ffc0
setmsg = $ff90
setnam = $ffbd
setlfs = $ffba
clrchn = $ffcc
getin = $ffe4
load = $ffd5
save = $ffd8
processor 6502
org 2049
;Example main program. Inits the fastloader and loads a file using it.
;Afterwards the drive can be used normally.
sys: dc.b $0b,$08 ;Address of next instruction
dc.b $0a,$00 ;Line number(10)
dc.b $9e ;SYS-token
dc.b $32,$30,$36,$31 ;2061 as ASCII
dc.b $00
dc.b $00,$00 ;Instruction address 0 terminates
;the basic program
start: jsr initfastload
jsr initmusicplayback ;Now that we can play music while
ldx #"D" ;loading, let's not forget it :-)
ldy #"A"
jsr fastload
jsr stopmusicplayback
rts
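FASTLOAD (documented further down) reports its result in the carry flag, which
the example program above ignores. A minimal sketch of checking it could look
like this; start_checked and load_failed are made-up labels, and the red
border is just one arbitrary way to show an error.

```asm
;Hypothetical variant of the example main program (labels are made up)
start_checked:  jsr initfastload
                ldx #"D"
                ldy #"A"
                jsr fastload
                bcs load_failed         ;C=1: the drive reported an error
                rts                     ;C=0: file is in memory
load_failed:    lda #$02
                sta $d020               ;Red border as a simple error indicator
                rts
```

The drive sends the error code both when the file isn't found in the directory
and when a sector can't be read after all retries, so the two cases can't be
told apart on the C64 side.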
initmusicplayback:
sei
lda #<raster
sta $0314
lda #>raster
sta $0315
lda #50 ;Set low bits of raster
sta $d012 ;position
lda $d011
and #$7f ;Set high bit of raster
sta $d011 ;position (0)
lda #$7f ;Set timer interrupt off
sta $dc0d
lda #$01 ;Set raster interrupt on
sta $d01a
lda $dc0d ;Acknowledge timer interrupt
lda #$00
jsr $1000 ;Initialize music
cli
rts
stopmusicplayback:
sei
lda #<$ea31
sta $0314
lda #>$ea31
sta $0315
lda #$00
sta $d01a
lda #$81
sta $dc0d
inc $d019
lda #$00
sta $d418
cli
rts
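raster: inc $d020 ;Show rastertime used by the music
jsr $1003 ;Music playroutine
dec $d020
dec $d019 ;Acknowledge raster interrupt
jmp $ea31 ;Continue to the KERNAL interrupt handler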
;INITFASTLOAD
;
;Uploads the fastloader to disk drive memory and starts it.
;This routine is Marko Mäkelä's work, except for the 2-bit transfer
;preparations.
;
;Parameters: -
;Returns: -
;Modifies: A,X,Y
AMOUNT = 32 ;Bytes in one M-W command
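The fastloader initialization code starts with PAL/NTSC detection. I didn't
want to rely on the value of the $02a6 memory location, so I implemented it with
raster-line based detection. This code measures the highest rasterline number
on the screen: a PAL machine has 312 lines, so the low byte of the raster
counter reaches $37 on lines 256 and above, while on NTSC it only gets to about
$06. Any threshold between the two tells them apart.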
initfastload: sei
lda #$00
il_detectntsc: ldx $d011 ;Get the biggest rasterline in the
bmi il_detectntsc ;area >= 256 to detect NTSC/PAL
il_detectntsc2: ldx $d011
bpl il_detectntsc2
il_detectntsc3: cmp $d012
bcs il_detectntsc4
lda $d012
il_detectntsc4: ldx $d011
bmi il_detectntsc3
cli
cmp #$20 ;PAL has 312 lines, but this check is
bcc il_isntsc ;somewhere in the middle :)
For a PAL machine, the BNE instruction in the getbyte delay code (3 cycles,
takes the branch) is replaced with a BEQ instruction (opcode $f0; 2 cycles,
doesn't take the branch).
lda #$f0 ;Adjust 2-bit fastload transfer
sta fastload_delay ;delay for PAL
The rest of the initialization is like in the 1-bit loader.
il_isntsc: lda #<drvprog ;Initialize selfmodifying code
sta il_mwbyte+1
lda #>drvprog
sta il_mwbyte+2
lda #<drive
sta mwcmd+2
lda #>drive
sta mwcmd+1
il_mwloop: jsr il_device ;Set drive to listen
ldx #lmwcmd - 1
il_sendmw: lda mwcmd,x ;Send M-W command
jsr ciout
dex
bpl il_sendmw
ldx #0
il_mwbyte: lda drvprog,x ;Send AMOUNT bytes of drive
jsr ciout ;code
inx
cpx #AMOUNT
bne il_mwbyte
jsr unlsn ;Unlisten starts the command
lda mwcmd+2
clc
adc #AMOUNT
sta mwcmd+2
bcc il_nohigh
inc mwcmd+1
il_nohigh: lda il_mwbyte+1
clc ;Move pointers
adc #AMOUNT
sta il_mwbyte+1
tax
bcc il_nohigh2
inc il_mwbyte+2
il_nohigh2: lda il_mwbyte+2
cpx #<drvprogend
sbc #>drvprogend
bcc il_mwloop
jsr il_device ;Set drive to listen again
ldx #lmecmd - 1
il_sendme: lda mecmd,x ;Send M-E command
jsr ciout
dex
bpl il_sendme
jmp unlsn ;Unlisten starts the command
il_device: lda fa
jsr listen
lda #$6f
jmp second
;FASTLOAD
;
;Loads a file with fastloader. INITFASTLOAD must have been called first.
;Any normal KERNAL disk operations will cause the fastloader drive code to
;exit (as ATN line goes low) and after that, INITFASTLOAD has to be called
;again.
;
;Parameters: X: First letter of filename, Y: Second letter of filename
;Returns: C=0 OK, C=1 error
;Modifies: A,X,Y
fastload: stx filename
sty filename+1
sta $d07a ;SCPU to slow mode
tsx ;Store stackpointer, needed when
stx stackptrstore ;finishing loading
ldx #$01 ;Byte counter.
fastload_sendouter:
ldy #$08 ;Bit counter
fastload_sendinner:
bit $dd00 ;Wait for CLK & DATA high
bvc fastload_sendinner
bpl fastload_sendinner
lsr filename,x ;Rotate byte to be sent
lda $dd00
and #$ff-$30
ora #$10
bcc fastload_zerobit
eor #$30
fastload_zerobit:
sta $dd00
lda #$c0 ;Wait for CLK & DATA low
fastload_sendack:
bit $dd00
bne fastload_sendack
lda $dd00
and #$ff-$30 ;Set DATA and CLK high
sta $dd00
dey
bne fastload_sendinner
dex ;All bytes sent?
bpl fastload_sendouter
Here something has changed. In this protocol the disk drive uses DATA=low to
signal that data is available; in idle state (reading a sector etc.) both CLK
and DATA lines are high. So, we don't need a delay. In the 1-bit loader the
disk drive pulled DATA low to signal that data was NOT available; it had to be
ensured that the disk drive had time to do this.
lda #$00 ;Initialize buffer counter
sta bufferstatus
jsr fastload_getbyte ;Get file start address
sta fastload_sta+1
jsr fastload_getbyte
sta fastload_sta+2
fastload_loop: jsr fastload_getbyte ;Then get bytes one by one. Getbyte
fastload_sta: sta $1000 ;routine exits when all have been
inc $d020 ;received
dec $d020 ;"Loading effect" :)
inc fastload_sta+1
bne fastload_loop
inc fastload_sta+2
jmp fastload_loop
fastload_getbyte:
ldx bufferstatus ;Bytes still in buffer?
beq fastload_fillbuffer
lda loadbuffer-1,x
dex
stx bufferstatus
rts
fastload_fillbuffer:
jsr fastload_get ;Get number of bytes to transfer
cmp #$01 ;$00 indicates successful end of load
bcc fastload_loadend ;and $01 an error
beq fastload_loadend ;Carry is set already (error sign)
sbc #$01 ;Carry is 1 here
sta bufferstatus ;Store buffer length to bytecounter
ldx #$00
fastload_gnbloop:
jsr fastload_get ;Get the buffer byte by byte
sta loadbuffer,x
inx
cpx bufferstatus
bcc fastload_gnbloop
bcs fastload_getbyte
fastload_loadend:
ldx stackptrstore ;Restore stackpointer & exit loader
txs
sta $d07b ;SCPU to fast mode
rts
Here is the new getbyte routine for 2-bit transfer. It starts with waiting for
the disk drive to pull DATA low.
fastload_get: bit $dd00 ;Wait for 1541 to signal data ready by
bmi fastload_get ;setting DATA low
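After that comes the badline wait. The loop spins for as long as the next
rasterline would be a bad line, i.e. while its low 3 bits match the vertical
scroll value in $d011.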
sei
fastload_waitbadline:
lda $d011
clc ;Wait until a badline won't distract
sbc $d012 ;the timing
and #$07
beq fastload_waitbadline
Now that we're certain that a bad line won't disturb us for a while, we can
begin the actual byte transfer. We pull CLK low to signal the disk drive that
we want to receive a byte. From here onwards timing is very important!
lda $dd00
ora #$10
sta $dd00 ;Set CLK low
After CLK has been pulled low, there has to be a delay of 14 clock cycles for
PAL and 15 cycles for NTSC (determined experimentally) before we start reading
the data bits. The instructions below add up to exactly that: 3 (BNE) or 2
(BEQ) + 2 (NOP) + 2 (AND) + 4 + 4 (the two STAs). At the end of this delay, we
set CLK back high so that we can "see" what the disk drive is putting on the
CLK line.
fastload_delay: bne fastload_delay2 ;3 cycles (patched to a 2-cycle BEQ on PAL)
fastload_delay2: nop
and #$03 ;Keep only the video bank bits
sta fastload_eor+1
sta $dd00 ;Set CLK high to be able to read the
;bits the diskdrive sends
And here comes the highly optimized :) byte receiving, 2 bits at a time.
The corresponding sending code on the disk drive side has the same number
of cycles, except...
lda $dd00
lsr
lsr
eor $dd00
lsr
lsr
eor $dd00
lsr
lsr
...for this EOR instruction. This is to ensure NTSC machines won't go ahead
of the disk drive. On the other hand, the disk drive will soon set CLK & DATA
back high, marking a return to idle state, so we can't be too slow either in
grabbing the last bits. The EOR is also necessary because the video bank bits
are present in the lowest 2 bits of $dd00: the first three reads get theirs
shifted out by the LSRs that follow, but the last read isn't shifted, so its
bank bits are cancelled in advance by this EOR.
fastload_eor: eor #$00
eor $dd00
cli
rts
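To see why the bits end up in the right places, here is the receive sequence
above once more, annotated with the accumulator contents after each step. This
is only a worked trace, not new code: dN and cN stand for the DATA and CLK
levels seen by the Nth read of $dd00, b1 b0 are the video bank bits, and it
assumes bits 2-5 of $dd00 read back as 0 (which should hold after the AND #$03
write with the default data direction setup).

```asm
;Annotated copy of the receive sequence; accumulator shown bit 7 first
                lda $dd00               ;d1 c1 0  0  0  0  b1 b0
                lsr
                lsr                     ;0  0  d1 c1 0  0  0  0
                eor $dd00               ;d2 c2 d1 c1 0  0  b1 b0
                lsr
                lsr                     ;0  0  d2 c2 d1 c1 0  0
                eor $dd00               ;d3 c3 d2 c2 d1 c1 b1 b0
                lsr
                lsr                     ;0  0  d3 c3 d2 c2 d1 c1
                eor #$00                ;Operand is patched to b1 b0, so the two
                                        ;lowest bits become d1^b1 and c1^b0
                eor $dd00               ;d4 c4 d3 c3 d2 c2 d1 c1
```

The first bit pair sent therefore lands in bits 0-1 of the received byte and
the last pair in bits 6-7.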
;DRVPROG - Code executed in the disk drive.
RETRIES = 5 ;Amount of retries when reading a sector
acsbf = $01 ;Buffer 1 command
trkbf = $08 ;Buffer 1 track
sctbf = $09 ;Buffer 1 sector
iddrv0 = $12 ;Disk drive ID
id = $16 ;Disk ID
datbf = $14 ;Temp variable
buf = $0400 ;Sector data buffer
drvprog: ;Address in C64's memory
rorg $0500 ;Address in diskdrive's memory
drive: cli ;Enable interrupts while waiting for the first
jsr getbyte ;byte (to allow motor to stop)
sta namecmp2+1
sei ;Disable interrupts now
jsr getbyte
sta namecmp1+1
Also, now the readsect subroutine takes the track & sector in X & Y registers,
instead of them having to be stored on the zeropage by the caller.
ldx #18
ldy #1 ;Read disk directory
dirloop: jsr readsect ;Read sector
bcc error ;If failed, return error code
ldy #$02
nextfile: lda buf,y ;File type must be PRG
and #$83
cmp #$82
bne notfound
lda buf+3,y ;Check first letter
namecmp1: cmp #$00
bne notfound
lda buf+4,y ;Check second letter
namecmp2: cmp #$00
beq found
notfound: tya
clc
adc #$20
tay
bcc nextfile
ldy buf+1 ;Go to next directory block, go on until no
ldx buf ;more directory blocks
bne dirloop
error: ldx #$01 ;Send $01 - error in loading file
loadend: txa
jsr sendbyte
jmp drive ;Go back to wait for the filename
found: iny
nextsect: ldx buf,y ;File found, get starting track & sector
beq loadend ;If at file's end, send byte $00
lda buf+1,y
tay
jsr readsect ;Read the data sector
bcc error
ldy #$ff ;Amount of bytes to send - assume $ff
lda buf
bne sendblk
ldy buf+1 ;Possibly less if it's the last block
sendblk: tya
sendloop: jsr sendbyte ;Send the amount of bytes that will be sent
lda buf,y ;Send the sector data in reverse order
dey
bne sendloop
beq nextsect
readsect: stx trkbf
sty sctbf
ldy #RETRIES ;Retry counter
jsr success ;Turn on led
retry: lda #$80
sta acsbf ;Command:read sector
Here is the key to getting good loading speeds. Interrupts must only be enabled
when the command is already waiting in the command register (an interrupt has
probably been pending while we've been sending the last sector and now it will
be executed right after the CLI instruction, so sector reading commences as
fast as it can.)
cli
poll1: lda acsbf ;Wait until ready
bmi poll1
sei
cmp #1
beq success ;Also sets carry flag to 1
lda id ;Check for disk ID change
sta iddrv0
lda id+1
sta iddrv0+1
dey ;Decrease retry counter
bne retry
failure: clc
success: lda $1c00
eor #$08
sta $1c00
rts
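And here's the disk drive side of the 2-bit transfer routine. It relies on a
table to convert 4 bits at a time to the CLK & DATA signals (a byte can be
shifted left to get the second bit pair of a nybble). Each table entry holds
the nybble's bits inverted, because writing a 1 to an output bit of $1800 pulls
the line low: bits 3 & 1 drive CLK & DATA for the first pair, and after the ASL
bits 2 & 0 have moved up to drive them for the second pair.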
sendbyte: sta datbf
lsr
lsr
lsr
lsr
DATA must not be pulled low until the disk drive really is ready to send a
byte, because the C64 will not wait after that.
ldx #$02 ;Set DATA=low
stx $1800
tax
lda sendtbl,x ;Get the CLK,DATA pairs for the high nybble (sent last)
pha
lda datbf
and #$0f
tax
Here, wait for CLK to go low.
lda #$04
sendbyte_wait: bit $1800 ;Wait for CLK=low
beq sendbyte_wait
And start the bit-pair sending. 8 clock cycles per pair, just like on the C64
side.
lda sendtbl,x ;Get the CLK,DATA pairs for the low nybble (sent first)
sta $1800
asl
and #$0f
sta $1800
pla
sta $1800
asl
and #$0f
sta $1800
Then, after some delay, set the CLK & DATA lines back to high state.
nop
nop
lda #$00 ;After a suitable delay, reset both CLK & DATA
sta $1800 ;high
rts
sendtbl: dc.b $0f,$07,$0d,$05
dc.b $0b,$03,$09,$01
dc.b $0e,$06,$0c,$04
dc.b $0a,$02,$08,$00
getbyte: ldy #8 ;Counter: receive 8 bits
recvbit:
lda #$85
and $1800 ;Wait for CLK==low || DATA==low
bmi gotatn ;Quit if ATN was asserted
beq recvbit
lsr ;Read the data bit
lda #2 ;Prepare for CLK=high, DATA=low
bcc rskip
lda #8 ;Prepare for CLK=low, DATA=high
rskip: sta $1800 ;Acknowledge the bit received
ror datbf ;and store it
rwait: lda $1800 ;Wait for CLK==high || DATA==high
and #5
eor #5
beq rwait
lda #0
sta $1800 ;Set CLK=DATA=high
dey
bne recvbit ;Loop until all bits have been received
lda datbf ;Return the data to A
rts
gotatn: pla ;If ATN gets asserted, exit to the operating
pla ;system. Discard the return address.
rts
rend
drvprogend:
mwcmd: dc.b AMOUNT,>drive,<drive,"W-M"
lmwcmd = . - mwcmd
mecmd: dc.b >drive,<drive,"E-M"
lmecmd = . - mecmd
Filename buffer, sector buffer and music data.
filename: dc.b 0,0
loadbuffer: ds.b 254,0 ;Reserve 254 bytes for one sector's data
org $1000
incbin music.bin
With the standard sector interleave of 10, this loader achieves about 5x
loading speed compared to the KERNAL routines. Going below that interleave
results in a drop in loading speed as the disk has to spin one more round. The
next step for more speed is rewriting the sector read routine at least
partially, but that is totally outside my knowledge :)
So, here ends the explanation of this 2-bit loader. Remember to do RESTORE
protection in actual production code :)
Lasse Öörni
loorni@student.oulu.fi