Category:6502 Assembly: Difference between revisions

Line 24:
The 6502 has far fewer registers than its contemporaries, so the zero page is useful as a set of "registers," albeit slower ones. The 6502 is also limited in its stack operations: it cannot push X or Y onto the stack directly, and must overwrite the accumulator in order to do so. This creates a problem when a function needs to preserve multiple registers yet takes its input from the accumulator. The easiest solution is to use a zero page memory address to preserve the accumulator and the stack for X and Y. (Or vice versa.)
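
As a rough sketch of that approach (the label <code>tempA</code> and its address are made up for illustration, and your assembler's <code>equ</code> syntax may differ):

<syntaxhighlight lang="6502asm">tempA equ $00 ;hypothetical zero page scratch address

MyRoutine:
STA tempA ;save the input that arrived in the accumulator
TXA
PHA ;push X (by way of the accumulator)
TYA
PHA ;push Y (by way of the accumulator)
LDA tempA ;get the input back so the routine can use it

;routine body goes here

PLA
TAY ;restore Y (pulled first, since it was pushed last)
PLA
TAX ;restore X
RTS</syntaxhighlight>

Note that the pulls happen in the reverse order of the pushes, and that <code>PLA</code>, <code>TAY</code> and <code>TAX</code> all change the zero and negative flags.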
 
On the 65816, the zero page is called the "direct page," and it can be relocated. The 65816's D register points to the direct page. The location of the direct page can be changed at runtime using the <code>TCD</code> command. This feature lets the programmer set up different pages of RAM for different tasks, and switch the direct page to that page temporarily to speed up that task. Unfortunately, this also makes it very difficult to read someone else's assembly and figure out what they're actually doing, as it's not clear what memory addresses they're actually loading from.
 
==Little-Endian==
Line 50:
 
 
* The contents of a port can be updated by the hardware. Reading a port will not always return the same value each time it is read, even if it is never written to, and even if the value is not altered by the read itself. It is not the 6502 changing the contents of these ports, but rather the connected hardware. For example, [http://www.6502asm.com/ 6502asm] and [https://skilldrick.github.io/easy6502/ Easy6502] have two memory-mapped ports in the zero page. <code>$FE</code> returns a random 8-bit value when read, and <code>$FF</code> returns the last key pressed when read, acting as a keyboard input buffer. These ports can be read from and written to, but their values can also change independently of any code the user writes.
 
 
Line 56:
 
==Interrupts==
The 6502 has two interrupt types: <code>NMI</code> (Non-Maskable Interrupt) and <code>IRQ</code> (Interrupt Request). 6502 machines use the last 6 bytes of their address space to hold a vector table containing (in order) the addresses of the NMI routine, the program's start, and the IRQ routine. On most computers this is defined by the firmware, but on the NES or other similar embedded hardware you will need to declare these locations yourself.
 
As the name implies, the Non-Maskable Interrupt is one that can occur regardless of whether the processor has interrupts disabled. In other words, the <code>SEI</code> and <code>CLI</code> commands <i>cannot enable or disable the NMI</i>. The name "Non-Maskable" is a bit of a misnomer; while it's true that the 6502 cannot prevent <code>NMI</code> from occurring, the source of the <code>NMI</code> signal can still be disconnected, effectively preventing its occurrence. For example, on the NES, the <code>NMI</code> occurs every 1/60th of a second and only if bit 7 of memory address $2000 is set. If this bit is clear, no <code>NMI</code>. For a given hardware, the <code>NMI</code> comes from exactly one source, since an <code>NMI</code> cannot be detected during an <code>NMI</code>.
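
A minimal sketch of switching the NES <code>NMI</code> source on and off, based on the bit described above. It assumes you don't care about the other seven bits of $2000, which this code overwrites with zeroes:

<syntaxhighlight lang="6502asm">LDA #%10000000 ;bit 7 set, all other bits clear
STA $2000 ;the NMI source is now connected

LDA #%00000000 ;bit 7 clear
STA $2000 ;the NMI source is disconnected again. SEI/CLI play no part in this.</syntaxhighlight>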
Line 70:
 
==A True 8-Bit Computer==
The 6502 is an 8-bit computer in the purest sense. Unlike the Z80, the 6502 is not capable of 16-bit operations within a single register. To work with a 16-bit number you will need to split it in two and work with each half individually. The carry flag is useful for this, as (like on other CPUs with a carry flag) it acts as a conditional addition, as in the example below.
 
<syntaxhighlight lang="C">unsigned short foo = 0x00C0;
foo = foo + 0x50;</syntaxhighlight>
 
Equivalent 6502 Assembly:
<syntaxhighlight lang="6502asm">LDA #$C0
STA $20 ;we'll use $20 as the memory location of foo, just to keep things simple. A real C compiler would use the stack.
LDA #$00
STA $21 ;low byte was #$C0, high byte was #$00
 
;now we add #$50
 
LDA $20 ;load #$C0
CLC
ADC #$50
STA $20
 
LDA $21
;this time we DON'T clear the carry before adding.
ADC #0 ;since there's a carry from the last addition, this actually adds 1! If there was no carry, it would add 0.
STA $21</syntaxhighlight>
 
==Processor Flags==
Line 165 ⟶ 184:
If an addition results in a wraparound from 255 to 0, the carry will be set. If the carry flag is set, the <code>ADC</code> instruction adds an additional 1 to the accumulator. In the example below, the labels <code>numLO</code> and <code>numHI</code> represent zero-page memory addresses, storing the 8-bit halves of a 16-bit variable. Also assume that <code>numLO</code> equals hexadecimal value F0 and <code>numHI</code> equals 03.
 
<syntaxhighlight lang="6502asm">
LDA numLO ;load #$F0 into the accumulator
CLC ;clear the carry
Line 173 ⟶ 192:
ADC #$00 ;add just the carry to the accumulator. If the carry flag is clear, the accumulator is unchanged.
;if the carry is set, the accumulator increases by 1.
STA numHI</syntaxhighlight>
 
The beauty of the above code is that it gives the correct result whether or not the first addition sets the carry. If adding <code>#$10</code> to <code>numLO</code> doesn't wrap around, the carry stays clear and the <code>ADC #$00</code> leaves <code>numHI</code> unchanged. This lets the programmer conditionally add 1 to the high byte based on the previous calculation, without having to branch.
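
The same idea works in the other direction. Here is a sketch of a 16-bit subtraction using the same <code>numLO</code> and <code>numHI</code> labels; the constant <code>#$10</code> is only an example. On the 6502 the carry is inverted for subtraction, so it is set beforehand rather than cleared:

<syntaxhighlight lang="6502asm">LDA numLO
SEC ;set the carry. For SBC, a set carry means "no borrow"
SBC #$10 ;subtract from the low byte. The carry is cleared if this wraps below zero
STA numLO

LDA numHI
;this time we DON'T set the carry before subtracting.
SBC #$00 ;if the low byte borrowed, this subtracts 1. Otherwise the accumulator is unchanged.
STA numHI</syntaxhighlight>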
Line 180 ⟶ 199:
==Decimal Mode==
The 8086, 68000, and Z80 have special commands for Binary Coded Decimal math, where hex values are used to represent decimal numbers (the base 10 system we use, not to be confused with floating point). The 6502 has a special Decimal Flag as part of its status register. If the Decimal Flag is set, instructions such as <code>ADC</code> and <code>SBC</code> will produce a result that is a valid decimal number (i.e. not containing digits A through F). The Decimal Flag is only affected by the two commands responsible for setting and clearing it, as well as interrupts on certain 6502 revisions.
<syntaxhighlight lang="6502asm">sed ;set the decimal flag, enabling decimal mode
lda #$19
clc
adc #$01 ;now the value in the accumulator equals #$20 rather than #$1A
cld ;resume normal operations</syntaxhighlight>
 
A few notes on Decimal Mode:
Line 196 ⟶ 215:
===Implied===
Some commands have no operands at all, or if none is given, the operand is assumed to be the accumulator.
<syntaxhighlight lang="6502asm">RTS ;return from subroutine, no operand needed.
ASL ;if no operand supplied, the accumulator is used. Some assemblers require you to type "ASL A" but others do not.</syntaxhighlight>
 
===Immediate===
A constant value is directly used as the argument for a command.
<syntaxhighlight lang="6502asm">LDA #3 ;load the number 3 into the accumulator
AND #%10000000 ;bitwise AND the binary value 1000 0000 with the value in the accumulator
SBC #$30 ;subtract hexadecimal 0x30 from the accumulator. If the carry flag is clear, also subtract 1 after that.</syntaxhighlight>
 
===Zero Page===
Line 209 ⟶ 228:
 
For these examples, assume that the zero page memory address $05 contains #$40 (hexadecimal 0x40).
<syntaxhighlight lang="6502asm">LDA $05 ;dereferences to whatever is stored at $05, in this case, #$40. #$40 is loaded into the accumulator.
ADC $05 ;add the value stored at address $05 to whatever is stored in the accumulator. If the carry flag is set, add 1 to the result.
ROR $05 ;rotate right the bits of the value stored at memory address $05. The value stored there changes from #$40 to #$20.</syntaxhighlight>
 
===Absolute===
A memory address stored outside the zero page is used as the argument for a command. This takes one more byte and is slower than zero page addressing. However, there are still certain things that absolute addressing is needed to do, such as jumping and reading/writing to or from memory-mapped ports.
 
<syntaxhighlight lang="6502asm">JMP $8000 ;move the program counter to address $8000. Execution resumes there.
STA $2007 ;store the value in the accumulator into address $2007 (this is the memory-mapped port on the NES for background graphics)</syntaxhighlight>
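
To make the cost difference concrete, here is the same load in both modes. The address $0305 is an arbitrary example:

<syntaxhighlight lang="6502asm">LDA $05 ;zero page: 2 bytes, 3 cycles
LDA $0305 ;absolute: 3 bytes, 4 cycles</syntaxhighlight>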
 
===Zero Page Offset By X/Y===
A zero page memory address offset by X or Y. The value in X or Y is added to the supplied address, and the resulting address is used as the operand. Only the commands that load or store the X register (<code>LDX</code> and <code>STX</code>) can use the "Zero Page Offset by Y" mode. If you want to store the accumulator in a zero page address offset by Y, you'll need to use the absolute address by padding the front of the address with 00. Some assemblers do this automatically, which is why I got this wrong!
 
<syntaxhighlight lang="6502asm">LDX #$05 ;load 5 into X
LDA $02,x ;load the value stored in $07 into the accumulator. (2 + 5 = 7)
LDY #$04 ;load 4 into Y
LDX $12,y ;load the value stored in $16 into X. ($12 + $4 = $16)</syntaxhighlight>
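
A sketch of the padding workaround for the accumulator. The values are arbitrary, and whether your assembler keeps the padded address as absolute (rather than shrinking it back to zero page) is something to check:

<syntaxhighlight lang="6502asm">LDY #$04 ;load 4 into Y
LDA #$FF
STA $0012,y ;absolute address offset by Y. Stores #$FF at $0016. ($0012 + $4 = $0016)</syntaxhighlight>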
 
===Absolute Offset By X/Y===
An absolute memory address offset by X or Y. This works similar to the zero page version. However, not all commands work with this mode. For example, the LDX and LDY commands work with this mode, but STX and STY do not. (LDA and STA work with all addressing modes except Zero Page Offset By Y.)
<syntaxhighlight lang="6502asm">LDX #$15
LDY #$20
LDA $4000,x ;evaluates to LDA $4015
SBC $7000,y ;the accumulator is reduced by the value stored at $7020. If the carry is clear, 1 is subtracted from the result</syntaxhighlight>
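
When you do need to store X at an indexed absolute address, one possible workaround is to go through the accumulator. This sketch assumes you can afford to lose whatever was in the accumulator:

<syntaxhighlight lang="6502asm">LDX #$15
LDY #$20
;"STX $4000,y" does not exist, so copy X to the accumulator first
TXA
STA $4000,y ;stores #$15 at $4020</syntaxhighlight>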
 
===Zero Page Indirect With Y===
This one's a bit confusing. The values at a pair of consecutive zero page memory addresses are dereferenced, their order is swapped, the two values are concatenated into a 16-bit memory address, THEN the value of y is added to that address, and <b><i>the value at that address</i></b> is used as the operand. Whew! Let's break it up into steps.
 
<syntaxhighlight lang="6502asm">LDA #$40
STA $02 ; $02 contains #$40

Line 245 ⟶ 264:
LDY #$06 ; Y contains #$06

LDA ($02),y ; load the value at address $2040+y = load the value at address $2046</syntaxhighlight>
 
Note that for this mode, you are <b>required</b> to offset by Y. If you really don't want to offset by Y, load #0 into Y first.
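
For instance, assuming $02 and $03 still hold the address $2040 from the example above:

<syntaxhighlight lang="6502asm">LDY #0 ;no offset wanted
LDA ($02),y ;load the value at address $2040+0 = load the value at address $2040
INY ;stepping Y moves to the next byte without touching the pointer
LDA ($02),y ;load the value at address $2041</syntaxhighlight>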
Line 252 ⟶ 271:
This is similar to the one above. In fact, the only difference besides the register we use is the order of operations. Rather than adding Y after the dereference and concatenation, X is added BEFORE that step. X is placed <i>inside</i> the parentheses to show this. This mode is useful for writing to non-consecutive memory addresses in quick succession, by storing the addresses at consecutive zero page locations. Once again, let's break it down:
 
<syntaxhighlight lang="6502asm">LDA #$40
STA $06
LDA #$20
Line 260 ⟶ 279:

LDA ($00,x) ;adds x to $00. Then the same thing happens as LDA ($06),y where y=0. This evaluates to LDA $2040, loading the accumulator
;with whatever value happens to be stored there.</syntaxhighlight>
 
Like before, you are <b>required</b> to use X in this mode. If you don't want to offset, just have X equal zero. In fact, when x and y both equal zero, <code>($HH,x) = ($HH),y</code> for all 8-bit hexadecimal values $HH.
 
===Zero Page Indirect, No X or Y===
This one isn't available on the original 6502, only on its revision, the 65C02. This behaves just like the two above, except it doesn't involve X or Y. Essentially this saves you the trouble of setting X or Y to zero temporarily just to do an indirect lookup without offsetting.
 
<syntaxhighlight lang="6502asm"> LDA ($00) ;same as "LDA ($00),y" when y = 0</syntaxhighlight>
 
==Quirks and Tricks For Efficient Coding==
===Looping Backwards Is Faster===
Looping is generally faster if the loop counter goes down rather than up. This is because <code>DEX</code> and <code>DEY</code> update the zero and negative flags: the zero flag is set if the result is zero, and the negative flag is set if the result is #$80 or greater. Generally speaking, this means that when your loop counter goes down, you don't have to use the <code>CMP</code> command to determine if the end of the loop is reached.
<syntaxhighlight lang="6502asm">LDX #3 ;set loop counter to 3.
loop:
;whatever you want to do in a loop goes here
DEX ;this statement basically has CPX #0 built-in at no additional cost
BNE loop</syntaxhighlight>
 
compared to:
<syntaxhighlight lang="6502asm">LDX #0 ;set loop counter to 0.
loop:
;whatever you want to do in a loop goes here
INX
CPX #3
BCC loop</syntaxhighlight>
 
The second version takes an additional command per loop for no added benefit. Sometimes you may need X to represent something else in addition to the loop counter, or you may have a large amount of data from an external source, which would take a lot of time to manually reverse the order of the entries. In those cases it may be better to take the "branch penalty" as-is.
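
If the loop body uses X as an array index, you usually want index 0 to be processed too. One way to sketch that is to branch on the negative flag instead of the zero flag. The labels <code>source</code> and <code>dest</code> are hypothetical, and this relies on the starting index being below #$80:

<syntaxhighlight lang="6502asm">LDX #3 ;start at the last index of a 4-byte table
loop:
LDA source,x ;copy one byte
STA dest,x
DEX
BPL loop ;keep going while X is still positive. After index 0, X wraps to #$FF and the loop ends.</syntaxhighlight>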
Line 286 ⟶ 310:
This concept is related to the one above. If you are implementing your own flags variable in software for controlling the execution of some function, bits 7 and 6 (the leftmost two bits) are the easiest to check. The 6502 does not have the same "bit test" command that is seen on the 68000, z80, 8086, or ARM. The 6502's <code>BIT</code> command can quickly check the value of bits 7 or 6 of a number stored in memory, but the other bits take longer since you have no choice but to load that variable into the accumulator and <code>AND</code> it with a bit mask.
 
<syntaxhighlight lang="6502asm">softwareFlags equ $00

;check bit 7
Line 306 ⟶ 330:
BNE bit4set

;etc</syntaxhighlight>
 
The moral of the story is, since two of the flags are easier to check than the rest, the ones that need to be checked the fastest or most frequently should be flags 7 or 6.
Line 313 ⟶ 337:
Many of the best practices and "no-nos" you've been taught in computer science courses should be taken with a grain (or rather metric ton) of salt when programming on the 6502. For modern computers, with their blazing processor speeds and massive memory pools, neither the programmer nor the end user will notice that a few bytes here and there were wasted. For example, the rule that "every function can only have one exit point" can result in several wasted bytes and CPU cycles. While these are good principles for maintaining readability, there is a nonzero cost to performance, and this adds up on the 6502 far more than it would on any 32-bit architecture. Unfortunately, just like speed and bytecode, readability and efficiency are a trade-off you'll have to make in the world of assembly programming. It comes down to knowing the byte size and execution time of each CPU instruction (while each opcode is 1 byte, many take operands of 1 or 2 bytes).
 
<syntaxhighlight lang="6502asm">myRoutine:
lda testVariable ;2 bytes, 3 cycles
bne continue ;2 bytes, 2 cycles, 3 if branch taken
Line 320 ⟶ 344:
; rest of code goes here
end:
rts ;exit subroutine ;1 byte, 6 cycles</syntaxhighlight>
<b>Total: 8 bytes, 12 cycles if branch taken, 14 cycles if not.</b>
 
This version saves 1 byte that the <code>JMP</code> instruction wastes.
<syntaxhighlight lang="6502asm">myRoutine:
lda testVariable ;2 bytes, 3 cycles
bne continue ;2 bytes, 2 cycles, 3 if branch taken
Line 331 ⟶ 355:
; rest of code goes here
end:
rts ;exit subroutine ;1 byte, 6 cycles</syntaxhighlight>
<b>Total: 7 bytes, 12 cycles if branch taken, 14 cycles if not.</b>
And this version saves you even more:
<syntaxhighlight lang="6502asm">myRoutine:
lda testVariable ;2 bytes, 3 cycles
bne continue ;2 bytes, 2 cycles, 3 if branch taken
Line 341 ⟶ 365:
; rest of code goes here
end:
rts ;exit subroutine ;1 byte, 6 cycles</syntaxhighlight>
<b>Total: 6 bytes, 12 cycles if branch taken, 11 cycles if not.</b>
 
 
Here's another example of the trade-off between readability and efficient code.
<syntaxhighlight lang="6502asm">; compares the accumulator to a constant range of values.
; If the accumulator is within the bounds stored in the temp variables "lowerbound" and "upperbound" then y = 1, otherwise y = 0.
CompareRange_Constant:
Line 362 ⟶ 386:
LDY #0
end:
rts</syntaxhighlight>
 
The more efficient way is to do this, which yields the same result:
<syntaxhighlight lang="6502asm">; compares the accumulator to a constant range of values.
; If the accumulator is within the bounds stored in the temp variables "lowerbound" and "upperbound" then y = 1, otherwise y = 0.
CompareRange_Constant:
Line 380 ⟶ 404:

outOfBounds:
RTS</syntaxhighlight>
 
Often, 6502 Assembly will feel like hacking, and you'll be using some "shady" techniques to get things done. Most of the taboos of modern programming are valuable tools in the 6502 programmer's toolbox, but as always you should use them not for the sake of being a rebel, but when they are the best solution. Diligent commenting is a must, as these tools are not easy to understand when someone else is reading your code. For the most part, shaving off a few bytes really doesn't matter (unless you're programming for the Atari 2600 or something time-critical like vBlank or a scanline IRQ) so it's not a huge deal if you have a few wasted bytes here and there. The 6502 can still operate faster than you can blink. But it's important to know that there will be occasions where the "proper" methods of programming need to be tossed aside.
 
===Arrays and Structs===
Structs are a little strange in 6502 compared to other languages, and this is probably the reason why C is often considered a poor fit for the processor. The biggest problem is that the 6502 effectively has a hardware limit of 255 for pointer arithmetic, because the indexed/offset addressing modes use an unsigned 8-bit offset. If you're using the <code>($??),y</code> indirect addressing mode, you CAN do pointer arithmetic the way other processors would and increment the pointer stored at $?? directly, but that's very slow.
 
We'll consider the following C struct (here, an int is 32-bit, a short is 16-bit, and a char is 8-bit. I'm not sure what cc65 uses)
 
<syntaxhighlight lang="C">struct foo
{
unsigned short spam;
unsigned char eggs;
};
 
struct foo bar[4]; //create an array of four "foo" structs</syntaxhighlight>
 
And we'll pretend that some values have been assigned to the elements of the array (I can't remember the syntax at the moment, sorry!)
 
Normally, you would expect the structs to be laid out in memory like so:
<syntaxhighlight lang="asm">word 0x1234 ;bar [0]
byte 0xAA
word 0x5678 ;bar [1]
byte 0xBB
word 0x9999 ;bar [2]
byte 0xCC
word 0xABCD ;bar [3]
byte 0xDD</syntaxhighlight>
 
However, this doesn't scale well with the 6502 since you're limited to an 8-bit offset. It's much more efficient to flip things "sideways" so to speak and create a structure of arrays. Doing things this way has a few advantages:
* Each element of the array can be searched directly by using X or Y as the index, without the need for complex pointer arithmetic.
* You do not modify the base address, so you can get it back just by setting X or Y to zero again. You don't need to back it up on the stack or in memory.
* In our example, if you stored this array of structs the way that C would when compiling to x86, you would only be able to make it about 85 elements wide before you needed to adjust your base address. With the method below, the size of each struct does not affect your total maximum size of the array.
 
<syntaxhighlight lang="6502asm">bar_spam_lo:
byte $34,$78,$99,$CD
bar_spam_hi:
byte $12,$56,$99,$AB
bar_eggs:
byte $AA,$BB,$CC,$DD</syntaxhighlight>
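
Reading one element back out might then look something like this. The zero page addresses $20 to $22 are arbitrary scratch locations chosen for the example:

<syntaxhighlight lang="6502asm">LDX #2 ;we want bar[2]
LDA bar_spam_lo,x ;low byte of bar[2].spam, which is #$99
STA $20
LDA bar_spam_hi,x ;high byte of bar[2].spam, which is also #$99
STA $21
LDA bar_eggs,x ;bar[2].eggs, which is #$CC
STA $22</syntaxhighlight>

The same value of X indexes all three tables, so moving to the next element is a single <code>INX</code> no matter how many fields the struct has.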
 
==Assembler Syntax==
Line 394 ⟶ 455:
 
Example:
<syntaxhighlight lang="6502asm">;typical skeleton for an NES ROM

.org $8000
Line 409 ⟶ 470:
dw NMI
dw RESET
dw IRQ ;you can use whatever names you want as long as they match. This is just for clarity.</syntaxhighlight>
 
===Value Labels===
Line 419 ⟶ 480:
 
Labeled values can be defined with a <code>define</code>, <code>=</code> or <code>equ</code> directive. This is useful for communicating the purpose of a zero page variable or constant. However, you must still place a <code>#</code> in front of the label if you wish for it to be interpreted as a constant value rather than a memory address.
<syntaxhighlight lang="6502asm">tempStorage equ $00 ;intended as a zero page memory address
maxScreenWidth equ $40 ;intended as a constant

LDA #maxScreenWidth
STA tempStorage</syntaxhighlight>
All labels <i>must</i> be uniquely named. However, you may assign any number of differently named labels to the same value. Labels cannot begin with a number.
 
Line 430 ⟶ 491:
 
Like value labels, code labels must be unique. Some assemblers allow the use of local labels, which do not have to be unique throughout the entire program. Local labels often begin with a period or an @, depending on the assembler. A branch or jump to a local label is interpreted as a branch or jump to the closest local label with that name. Often these labels and any code that references them must all be contained between two global labels.
<syntaxhighlight lang="6502asm">
MyRoutine: ;this label is global. You cannot use the label "MyRoutine" anywhere else in your program
lda tempData
Line 437 ⟶ 498:
.skip: ;this label is local. You can use ".skip" multiple times, but not in the same function.
sta tempStorage
rts</syntaxhighlight>
 
===Defining Data===
Data can be defined with a <code>db</code> or <code>byte</code> directive for byte-length data or a <code>dw</code> or <code>word</code> directive for word-length (16-bit) data. (Some assemblers require a period before the directive name.) Entries can be separated by commas or placed on separate lines, each beginning with the appropriate directive. Entries are placed in memory in the order they are typed, from left to right, top to bottom. For example, the following data blocks are identical (apart from their labels and memory location), though they look different:
 
<syntaxhighlight lang="6502asm">
MyData:
db $00,$01,$02,$03
Line 456 ⟶ 517:
MyData4:
db $00,$01
db $02,$03</syntaxhighlight>
 
Unlike immediate operands, data does not get a # in front of the value. Each entry is stored as a literal value regardless.
<syntaxhighlight lang="6502asm">LDA MyData ;load the value #$00 into the accumulator</syntaxhighlight>
 
Word data is a little different than byte data. Since the 6502 is little-endian, the bytes are stored in reverse order.
<syntaxhighlight lang="6502asm">WordData:
dw $2000,$3010,$4020

Line 468 ⟶ 529:
db $00,$20 ;each pair of bytes was stored on its own row for clarity. It makes no difference to the assembler.
db $10,$30
db $20,$40</syntaxhighlight>
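
In practice this means the low byte of each word sits at the lower address. A sketch of copying the first entry of <code>WordData</code> into a zero page pointer follows; the name <code>pointer</code> and its address are assumptions made for the example:

<syntaxhighlight lang="6502asm">pointer equ $20 ;hypothetical zero page location for a 16-bit pointer

LDA WordData ;low byte of the first entry, #$00
STA pointer
LDA WordData+1 ;high byte of the first entry, #$20
STA pointer+1

LDY #0
LDA (pointer),y ;load the value stored at address $2000</syntaxhighlight>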
 
===Label Arithmetic===
Assemblers offer varying degrees of label arithmetic. The operators +, -, *, and / that are typical in other programming languages can be applied to constants or labels. In addition, most 6502 assemblers offer special operators that are specific to the language. Some assemblers allow the [[C]] standard operators for bitwise operations, bit shifts, etc.
<syntaxhighlight lang="6502asm">pointer equ $20

LDA #$20+3 ;load #$23 into the accumulator
Line 496 ⟶ 556:
myTable:
db $10,$20,$30,$40
db $50,$60,$70,$80</syntaxhighlight>
 
 
 
==Citations==