Gotchas: Difference between revisions

{{draft task}}
 
;Definition
=={{header|6502 Assembly}}==
===Numeric Literals===
Integer literals used in instruction operands need to begin with a #; otherwise, the CPU considers them to be a pointer to memory, which will be dereferenced. This applies to any integer representation, even ASCII.
<syntaxhighlight lang="6502asm">LDA 'J' ;load the 8-bit value stored at memory address 0x004A into the accumulator.
ORA 3 ;bitwise OR the accumulator with the 8-bit value stored at memory address 0x0003


LDA #'7' ;load the ASCII code for the numeral 7 (which is 0x37) into the accumulator.</syntaxhighlight>
 
However, data blocks do ''not'' get the # treatment:
 
<syntaxhighlight lang="6502asm">byte $45 ;this is the literal constant value $45, not "the value stored at memory address 0x0045"</syntaxhighlight>
 
 
===Memory-Mapped Hardware===
* Certain instructions, such as <code>INC</code> and <code>DEC</code>, read from a value and write back to it. This can count as two accesses to a memory-mapped port (if the port cares about that, not all do.) A simple <code>LDA</code> or <code>STA</code> represents a single access.
* Generally speaking, you'll only be able to use <code>STA</code>,<code>STX</code>, or <code>STY</code> to write to ports. You'll need to read the documentation for your hardware.
 
 
Some examples of bad memory-mapped port I/O for various hardware:
<syntaxhighlight lang="6502asm">INC $2005 ;the intent was to scroll the NES's screen to the right one pixel. That's not gonna happen.
;What actually happens? Who knows! (I made this mistake once long ago.)</syntaxhighlight>
 
===Inverted Carry===
The carry flag is "backwards" compared to most languages with regard to comparisons and subtractions.
On most CPUs, carry set is used to mean "less than", and carry clear is used to mean "greater than or equal." The 6502 is the opposite!
<syntaxhighlight lang="6502asm">LDA #$20
CMP #$19
BCS foo ;this branch is always taken, since #$20 >= #$19. On a CPU where carry set means "less than", the equivalent branch would never be taken!</syntaxhighlight>
 
Not only that, carry clear indicates a borrow when subtracting. This is also the opposite to most CPUs. To do a normal subtraction you need to ''set'' the carry before subtracting.
<syntaxhighlight lang="6502asm">LDA #8
SEC
SBC #4 ;eight minus four</syntaxhighlight>
 
===Adding/Subtracting With Carry===
The 6502 (and even its revisions) has ''no way to add or subtract without involving the carry flag.'' The "carry" is essentially the same as "carrying the one" that we all learned in elementary school arithmetic. <code>ADC</code> adds an extra 1 if the carry flag was set when the <code>ADC</code> was executed. <code>SBC</code> subtracts an extra 1 if the carry was ''clear'' when the <code>SBC</code> was executed (as stated before, on most other CPUs the equivalent of <code>SBC</code> behaves the opposite to the 6502).
 
Therefore, any time you want to add two numbers without involving the carry flag you have to do this:
<syntaxhighlight lang="6502asm">CLC
ADC ___ ;your operand/addressing mode of choice goes here</syntaxhighlight>
 
Failure to correctly use the carry flag can often result in unexpected "off-by-one" errors.
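
The carry rules above can be modelled in a few lines of C. This is only a sketch of 8-bit binary-mode arithmetic: decimal mode and the other status flags are ignored, and the names <code>adc</code> and <code>sbc</code> are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of ADC in binary mode: A + operand + carry.
   *carry is both the incoming and the outgoing carry flag (0 or 1). */
uint8_t adc(uint8_t a, uint8_t operand, int *carry)
{
    assert(carry != NULL && (*carry == 0 || *carry == 1));
    unsigned sum = (unsigned)a + operand + (unsigned)*carry;
    *carry = sum > 0xFF;              /* carry out of bit 7 */
    return (uint8_t)sum;
}

/* Model of SBC in binary mode: A - operand - (1 - carry).
   This is the same as adding the one's complement of the operand,
   which is why a clear carry costs an extra 1. */
uint8_t sbc(uint8_t a, uint8_t operand, int *carry)
{
    return adc(a, (uint8_t)~operand, carry);
}
```

With the carry set beforehand, <code>sbc(8, 4, &c)</code> gives 4. With the carry clear it gives 3, which is the off-by-one error mentioned above.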
 
===Bit Rotates===
Bit rotates are always performed "through carry." This chart will illustrate the concept:
 
<pre>Before     Carry Before        After      Carry After
%11000000  0             ROL   %10000000  1
%10000000  1             ROL   %00000001  1
%00000001  1             ROL   %00000011  0</pre>
 
If you were expecting <code>ROL</code> to immediately transform <code>%10000000</code> into <code>%00000001</code>, there is no single 6502 instruction that can do this. However, it can be achieved using macros.
 
As the above chart implied, all bit rotates ''depend on the value of the carry before the rotate'', so make sure you take that into account.
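
The difference between the two kinds of rotate can be sketched in C. The function names are hypothetical; <code>rol_through_carry</code> models what the chart shows, while <code>rol8</code> is the plain 8-bit rotate that the 6502 does not offer as a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 6502-style ROL: effectively a 9-bit rotate through the carry flag. */
uint8_t rol_through_carry(uint8_t value, int *carry)
{
    assert(carry != NULL && (*carry == 0 || *carry == 1));
    int old_carry = *carry;
    *carry = (value >> 7) & 1;                    /* bit 7 moves into the carry */
    return (uint8_t)((value << 1) | old_carry);   /* old carry moves into bit 0 */
}

/* Plain 8-bit rotate: bit 7 wraps straight around to bit 0. */
uint8_t rol8(uint8_t value)
{
    return (uint8_t)((value << 1) | (value >> 7));
}
```

Feeding the three rows of the chart through <code>rol_through_carry</code> reproduces the "After" columns, whereas <code>rol8(0x80)</code> returns <code>0x01</code> regardless of any carry.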
 
===Page Boundaries===
A "page" is a 256-byte region of memory, spanning from <tt>$xx00</tt> to <tt>$xxFF</tt>. This will often be referred to as "page <tt>xx</tt>." (e.g. page 03 = the <tt>$0300</tt> to <tt>$03FF</tt> memory range.)
 
An indexed memory access whose base address lies on one page and whose final address (base plus index) lies on the next is said to "cross a page boundary." This often leads to a minor performance hit, in that the instruction may take an extra clock cycle that it wouldn't need if both addresses were on the same page. Typically this only applies to the complex addressing modes that use index registers, and to taken branches whose target lies on a different page.
 
However, the 6502 has a couple of bugs regarding the following addressing modes, and they are very similar in how they operate.
* When using <code>$nn,x</code> or <code>$nn,y</code>, if <code>$nn + x</code> or <code>$nn + y</code> exceeds 255, the instruction will wrap around back to $00 rather than advancing to $0100. For a more concrete example:
 
<syntaxhighlight lang="6502asm">LDX #$FF
LDA $80,X ;loads from address $007F, not $017F</syntaxhighlight>
 
In other words, ''an indexed zero page addressing mode cannot exit the zero page.''
 
It should be noted that the above bug does '''not''' apply to indexed absolute addressing.
<syntaxhighlight lang="6502asm">LDA $2080,X ;loads from the correct address regardless of the value of X.</syntaxhighlight>
 
A similar wraparound affects the indirect <code>JMP</code> instruction, which jumps to the 16-bit address stored at the specified address. In the ordinary case it behaves as expected:
<syntaxhighlight lang="6502asm">LDA #$20
STA $3000
LDA #$40
STA $3001
JMP ($3000) ;evaluates to JMP $4020</syntaxhighlight>
 
The following will '''not''' execute in the way you expect, because this instruction has a similar bug where it doesn't advance to the next page when calculating the address.
<syntaxhighlight lang="6502asm">LDA #$20
STA $30FF
LDA #$40
STA $3100
JMP ($30FF) ;rather than take the high byte from $3100, the high byte is taken from $3000 instead.</syntaxhighlight>
 
As long as you don't put a number that looks like <code>$nnFF</code> in the parentheses as shown above, you can avoid this bug entirely.
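
Both wraparounds come down to an address calculation that is truncated to 8 bits. A rough C model, assuming a flat 64 KiB memory array and using invented function names, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Effective address of a zero-page indexed access such as LDA $80,X. */
uint16_t zero_page_indexed(uint8_t base, uint8_t index)
{
    return (uint8_t)(base + index);   /* the sum is truncated to 8 bits */
}

/* Target of JMP (ptr) with the page-wrap behaviour described above.
   mem must point to a full 64 KiB address space. */
uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr)
{
    assert(mem != NULL);
    /* Only the low byte of the pointer is incremented, so the address
       of the high byte stays on the same page as the low byte. */
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```

For <code>ptr = 0x30FF</code> the high byte is fetched from <code>0x3000</code> rather than <code>0x3100</code>, matching the example above.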
 
=={{header|68000 Assembly}}==
===Numeric Literals===
Integer literals used in instruction operands need to begin with a #; otherwise, the CPU considers them to be a pointer to memory that will be dereferenced. This applies to any integer representation, even ASCII.
<syntaxhighlight lang="68000devpac">MOVE.L $12345678,D0 ;move the 32-bit value stored at memory address $12345678 into D0
MOVE.L #$12345678,D0 ;load the D0 register with the constant value $12345678</syntaxhighlight>
 
However, data blocks do ''not'' get the # treatment:
 
<syntaxhighlight lang="68000devpac">DC.B $45 ;this is the literal constant value $45, not "the value stored at memory address 0x0045"</syntaxhighlight>
 
===LEA Does Not Dereference===
When dereferencing a pointer using an address register, it is necessary to use parentheses. For example:
<syntaxhighlight lang="68000devpac">MOVEA.L #$A04000,A0 ;load the address $A04000 into A0
MOVE.L A0,D0 ;move the quantity $A04000 into D0
MOVE.B (A0),D0 ;get the 8-bit value stored at memory address $A04000, and store it into the lowest byte of D0.
MOVE.W (4,A0),D1 ;get the 16-bit value stored at memory address $A04004, and store it into the low word of D1.</syntaxhighlight>
 
However, the <code>LEA</code> instruction (load effective address) uses this parentheses syntax, but ''does not dereference!'' For extra weirdness, you don't put a # in front of a literal operand either.
 
<syntaxhighlight lang="68000devpac">LEA $A04000,A0 ;effectively MOVEA.L #$A04000,A0
LEA (4,A0),A0 ;effectively ADDA.L #4,A0</syntaxhighlight>
 
Also, as we have seen before, constant pointers do not need parentheses and are automatically dereferenced.

===Partitioned Registers===
One key feature of the 68000 is that its instructions can operate at different "lengths" (8-bit, 16-bit, or 32-bit). When performing an operation at the "word length" (16-bit), only the least significant 16 bits are affected, and the rest of the register is ignored. The flags also only reflect the result of the calculation with respect to the instruction's "length", not the entire register. For example:
<syntaxhighlight lang="68000devpac">MOVE.L #$12345678,D0 ;set the entire register to a known value for demonstration purposes.

MOVE.W #$7FFF,D0 ;D0 = $12347FFF
ADD.W #1,D0 ;D0 = $12348000
TRAPV ;the above operation set the overflow flag, so this instruction will call the signed overflow handler.
;Even though the entire register didn't overflow, the portion we were operating on did, so that counts.</syntaxhighlight>

As implied by the previous example, loading a value at a length less than 32 bits into a register will leave the "high bits" the same. This can often cause subtle errors that lead to your program failing unexpectedly.

<syntaxhighlight lang="68000devpac">MOVE.B #16-1,D0 ;loop 16 times
forloop:
; loop body goes here
DBRA D0,forloop</syntaxhighlight>

The above code is flawed in that <code>DBRA</code> (and its cousins) ''operate at word length.'' Given that, and the fact that we only loaded the loop counter at byte length, the loop will execute <code>$nn10</code> times instead of the intended <code>$10</code> times, where <code>$nn</code> is the prior value of the register being used as the loop counter.

On most RISC architectures, loading a value less than 32 bits will clear the rest of the register. This is '''NOT''' the case on the 68000. Often, you'll need to "sanitize" the register you're using by clearing its upper bits yourself, using <code>AND.W #$FF</code> or <code>AND.L #$FFFF</code>.

===Automatic Sign-Extension===
When moving values into ''address registers at word length'', the value is sign-extended first.
<syntaxhighlight lang="68000devpac">MOVEA.W #$8000,A0 ;MOVEA.L #$FF8000,A0
MOVEA.W #$7FFF,A0 ;MOVEA.L #$007FFF,A0</syntaxhighlight>
 
There is no sign-extension when moving values into data registers.
<syntaxhighlight lang="68000devpac">MOVE.W #$FF,D0 ;MOVE.W #$00FF,D0
MOVE.L #$8000,D2 ;MOVE.L #$00008000,D2</syntaxhighlight>
 
If you want sign-extension on data registers, you'll need to do it manually:
<syntaxhighlight lang="68000devpac">MOVE.W #$FF,D0
EXT.W D0 ;D0 = $????FFFF
 
MOVE.L #$8000,D1
EXT.L D1 ;D1 = $FFFF8000</syntaxhighlight>
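
The partial-width moves and the word-length sign extension described for the 68000 can be imitated with plain 32-bit integers in C. This is a sketch only: each function stands in for one instruction, the names are invented, and flags are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* MOVE.B #imm,Dn: only bits 0-7 of the data register change. */
uint32_t move_b(uint32_t dn, uint8_t imm)
{
    return (dn & 0xFFFFFF00u) | imm;
}

/* MOVE.W #imm,Dn: only bits 0-15 change. */
uint32_t move_w(uint32_t dn, uint16_t imm)
{
    return (dn & 0xFFFF0000u) | imm;
}

/* MOVEA.W #imm,An: the word is sign-extended to the full register. */
uint32_t movea_w(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}

/* DBRA counts down the low word only and runs the body (low word + 1) times. */
uint32_t dbra_iterations(uint32_t dn)
{
    return (dn & 0xFFFFu) + 1u;
}

/* Self-check against the values used in the examples above. */
void check_68000_model(void)
{
    assert(move_w(0x12345678u, 0x7FFF) == 0x12347FFFu);
    assert(movea_w(0x8000) == 0xFFFF8000u);
    assert(movea_w(0x7FFF) == 0x00007FFFu);
    /* A stale upper byte of $56 turns the intended $10 iterations into $5610. */
    assert(dbra_iterations(move_b(0x12345678u, 16 - 1)) == 0x5610u);
}
```

Clearing the register first, for instance <code>move_b(0, 16 - 1)</code>, brings the iteration count back to the intended 16.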
 
=={{header|C}}==
===if(a=b)===
This is an easy mistake to make if you're in a hurry. The ''assignment operator'' <code>=</code> is different from the ''equality comparison operator'' <code>==</code>. Therefore you have the following similar looking but vastly different statements, both of which are valid.
 
<syntaxhighlight lang="C">if(a=b){} //assigns to "a" the value of "b". Then, if "a" is nonzero, the code in the curly braces is run.
 
if(a==b){} //runs the code in the curly braces if and only if the value of "a" equals the value of "b".</syntaxhighlight>
 
===Array Indexing===
Arrays are declared with the total number of elements in an array.
<syntaxhighlight lang="C">int foo[4] = {4,8,12,16};</syntaxhighlight>
 
However, arrays are zero-indexed, so in your code you'll never actually want to read from/write to the same subscript used when declaring the array. Doing so indexes the array out of bounds, reading from or writing to whatever happens to be stored in memory after it. The C standard calls this undefined behavior.
 
Chances are this will compile even though it's not something you really want to have happen.
<syntaxhighlight lang="C">int foo[4] = {4,8,12,16};
int x = foo[0]; //x = 4
int y = foo[3]; //y = 16
int z = foo[4]; //z = ?????????</syntaxhighlight>
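
One common way to stay inside the bounds is to derive the element count from the array itself and loop with <code>i < count</code>. The sketch below assumes a hypothetical <code>ARRAY_LEN</code> macro and <code>sum_ints</code> helper; neither is part of the standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (this does not work on a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Sums values[0] .. values[count - 1] and never touches values[count]. */
int sum_ints(const int *values, size_t count)
{
    assert(values != NULL);
    int total = 0;
    for (size_t i = 0; i < count; i++) {   /* i < count, not i <= count */
        total += values[i];
    }
    return total;
}
```

For the array above, <code>sum_ints(foo, ARRAY_LEN(foo))</code> visits subscripts 0 through 3 only.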
 
===The sizeof operator===
One of the most common misunderstandings about C arrays concerns the <code>sizeof</code> operator.
<syntaxhighlight lang="C">int foo()
{
char bar[20];
return sizeof(bar);
}</syntaxhighlight>
 
The gotcha is that <code>sizeof</code> doesn't calculate anything at runtime (C99 variable-length arrays are the one exception). The <code>sizeof</code> operator merely provides a compile-time constant using the information you provided to the compiler about that variable. For arrays, the size of the element type and the number of elements are multiplied together to give the size. The compiler uses this information to know how many bytes to reserve on the stack for the array, but the size isn't stored anywhere with the array. As far as the CPU knows, the fact that the above function's return value is the size of the array is merely a coincidence.
 
 
Put another way, <code>sizeof</code> is just a <code>#define</code> that doesn't pollute your namespace:
<syntaxhighlight lang="C">int foo()
{
#define size_of_bar 20 //the sizeof operator is the same as doing this essentially.
 
char bar[size_of_bar];
return size_of_bar;
}</syntaxhighlight>
 
As a result of this subtle nuance, many new C programmers will write the code below thinking that it will return a value corresponding to the number of elements the array was originally declared with.
<syntaxhighlight lang="C">int gotcha(char bar[])
{
return sizeof(bar);
}</syntaxhighlight>
 
This is not the case, as when passing an array to a function, you're actually passing <i> a pointer to element 0 of that array</i>. Therefore, any size information about that array is lost. In <code>main</code> we might write
<syntaxhighlight lang="C">char myArray[40];
int x = gotcha(myArray);</syntaxhighlight>

which is actually the same as
<syntaxhighlight lang="C">char myArray[40];
int x = gotcha(&myArray[0]);</syntaxhighlight>
 
Applying <code>sizeof</code> to a pointer yields the size of the pointer itself (typically 4 bytes on a 32-bit platform and 8 bytes on a 64-bit one), so you'll get the same return value from <code>gotcha</code> regardless of how many elements the array has. When C programmers talk about "arrays decaying into pointers" this is what they're referring to.
 
In order for a function to use an array's intended size, it must be passed in as a separate argument.
<syntaxhighlight lang="C">int foo(char buf[],int length){}
 
int main()
{
char myArray[30];
int j = foo(myArray,sizeof(myArray)); //passes 30 as the length parameter.
}</syntaxhighlight>
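
The decay can be observed directly by comparing what the caller and the callee see. The function names below are invented for the illustration, and the parameter is written as <code>char *buf</code> because that is what a <code>char buf[]</code> parameter is adjusted to.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof sees a pointer here, whatever array the caller passed. */
size_t size_seen_by_callee(char *buf)
{
    assert(buf != NULL);
    return sizeof buf;   /* same as sizeof(char *) */
}

/* The function that declares the array still knows its real size. */
size_t size_seen_by_caller(void)
{
    char bar[20];
    return sizeof bar;   /* 20, a compile-time constant */
}
```

Passing a 4-element array and a 400-element array to <code>size_seen_by_callee</code> gives the same answer both times, which is why the length has to travel as a separate argument.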
 
===gets()===
Computer science teachers (and even most compilers) will tell you to never use <code>gets()</code>, and the function was removed from the language entirely in C11. It takes a pointer as an argument, reads a line of user input, and stores it at that memory address. What makes it infamous and unsafe relates to the above section: <code>gets()</code> only takes a pointer to a <code>char</code> array, and has no information about the size of the array it's writing to. The function continues to copy the user's input until the line ends (i.e. until the Enter key is pressed). Since the user can type without limit, and the function doesn't know the size of the array, any input in excess of the array's size will overwrite whatever is in memory after the array. Since local arrays are normally allocated on the hardware stack, this can overwrite a function's return address. This is the classic buffer overflow exploit, which lets an attacker use <code>gets()</code> as a means to call any function in the program, provided the attacker knows the function's address and the endianness of the CPU architecture.
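
The usual replacement is <code>fgets()</code>, which takes the buffer size as an argument. The wrapper below is a minimal sketch; the name <code>read_line</code> is hypothetical, and it simply leaves any excess input in the stream for the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from stream into buf, writing at most size bytes
   including the terminating '\0'.
   Returns buf on success, NULL on end-of-file or error. */
char *read_line(char *buf, size_t size, FILE *stream)
{
    assert(buf != NULL && size > 0 && stream != NULL);
    if (fgets(buf, (int)size, stream) == NULL) {
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return buf;
}
```

A call such as <code>read_line(name, sizeof name, stdin)</code> cannot write past the end of <code>name</code>, however much the user types.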
 
===printf()===
The first function every C programmer learns (besides <code>main</code>), <code>printf</code> can be exploited in a similar fashion as <code>gets()</code>, but only if the programmer is irresponsible. <code>printf</code> can theoretically take any number of arguments; however there is no CPU that can actually support variadic functions in hardware (in the sense that the CPU knows how many arguments are passed into it without cheating, e.g. using a variable that holds the number of arguments as in <code>int argc, char** argv</code>).
 
The ability for <code>printf()</code> to take any number of arguments was pulled off with a dirty trick: the format string. Every time a <code>%</code> is encountered in the format string, <code>printf()</code> will accomplish the substitution using the next function argument, which depending on the calling convention starts off using registers and then pulls the rest from the stack. This isn't a problem as long as you <i><b>never let the user write the format string.</b></i> If the format string has more unescaped <code>"%"</code>s than there are arguments, <code>printf()</code> will read from the stack and treat whatever it finds there as the "missing" arguments. This lets a malicious user see the program's function call history, which can be useful in figuring out other ways of exploiting the program.
 
<syntaxhighlight lang="C">#include <stdio.h>

int main()
{
    int x = 3;
    int y = 5;
    int z = 7;
    printf("%d %d %d %x %x",x,y,z); //two arguments are missing: on a 32-bit x86 build the first %x may reveal %ebp and the second the return address.
}</syntaxhighlight>
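
The defence is to keep the format string constant and pass untrusted text as an argument. The helper below is a sketch with an invented name; it uses <code>snprintf</code> so the result can be inspected, but the same rule applies to <code>printf</code>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies untrusted text into out as data. The format string is a
   constant, so any '%' in the input is reproduced literally.
   Returns the length the full text needs, as snprintf does. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && out_size > 0 && untrusted != NULL);
    /* NOT snprintf(out, out_size, untrusted): that would let the
       input act as the format string. */
    return snprintf(out, out_size, "%s", untrusted);
}
```

An input such as <code>"%x %x %n"</code> is then copied through unchanged instead of being interpreted.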
 
=={{Header|Insitux}}==
 
A common "gotcha" in Insitux concerns one of its greatest strengths: ''any'' built-in operations under-loaded by exactly one parameter become closures. Under-loading by accidental omission is a common mistake, sometimes purely typographical. Below is an example of accidental under-loading due to misuse of <code>max</code>.
 
<syntaxhighlight lang="insitux">
(let numbers [1 2 3 4]
maximum (max numbers)) ;should be (... max numbers)
(+ maximum 5)
</syntaxhighlight>
 
{{out}}
 
<pre>
3:2 (+ maximum 5)
Type Error: fast+ takes numeric arguments only, not closure.
In entry.ix
</pre>
 
This leads onto another "gotcha": Insitux substitutes any obvious instances of arithmetic operations used with two parameters with a faster alternative which, internally, does not need to iterate over ''N'' parameters. The reason this is a gotcha is two-fold: it can be a bit confusing to see e.g. <code>fast+</code> when you used <code>+</code>, and it complicates [https://www.rosettacode.org/wiki/Test_a_function#Insitux mocking] arithmetic operations.
 
<syntaxhighlight lang="insitux">
(mock + (fn a b ((unmocked +) a b 1)))
(+ 2 3)
</syntaxhighlight>
 
{{out}}
 
<pre>
5
</pre>
 
The example above "should" have returned <code>6</code>, but to achieve this we would need to mock <code>fast+</code> instead.
 
=={{header|J}}==
Issues with array rank and type should perhaps be classified as gotchas. J's display forms are not serialized forms and thus different results can look the same.
 
<syntaxhighlight lang="j"> ex1=: 1 2 3 4 5
ex2=: '1 2 3 4 5'
ex1
NB. ...
11 12 13 14 15
10+ex2
|domain error</syntaxhighlight>
 
Also, constant values with a single element are "rank 0" arrays (they have zero dimensions) while constant values with some other count of elements are "rank 1" arrays (they have one dimension -- the count of their elements -- they are lists).
Another gotcha with J has to do with function composition and J's concept of "[[wp:Rank_(J_programming_language)|rank]]". Many operations, such as <code>+</code> are defined on individual numbers and J automatically maps these over larger collections of numbers. And, this is significant when composing functions. So, a variety of J's function composition operators come in pairs. One member of the pair composes at the rank of the initial function, the other member of the pair composes at the rank of the entire collection. Picking the wrong compose operation can be confusing for beginners.
 
For example:<syntaxhighlight lang="j"> 1 2 3 + 4 5 6
5 7 9
+/ 1 2 3 + 4 5 6
NB. ...
21
1 2 3 +/@+ 4 5 6
5 7 9</syntaxhighlight>
 
Here, we are adding two lists and then (after the first sentence) summing the result. But as you can see in the last sentence, summing the individual numbers by themselves doesn't accomplish anything useful.
 
J also has gotchas with its token formation and syntax rules.
 
# The character '.' is also "token forming" in non-numeric words and tokens. Here, ':' also follows this rule. (In both cases, this was to allow the language to use strict ascii while representing characters which were originally conceived of as having an [[wikipedia:accent|accent]] or [[wikipedia:diacrit|diacrit]]. Though, once the word forming rule was established it was applied in ways where that original analogy was no longer relevant.)
# A sequence of numbers separated by spaces is a single token. Many languages allow spaces inside a token only in quoted strings, but J's numbers are not quoted.
# While most people are familiar with the character 'e' being used in numbers, J's token forming rules allow any letter to be used in numbers (even when the result is not a legal number -- also 'x' has a different significance than in C).
# J uses a different character to represent negative numbers than it uses for subtraction.
# J does not have precedence rules distinguishing between <code>+</code> and <code>*</code>. All such verbs have the same precedence and are processed strictly right to left (same as assignment statements and arabic numerals).
# Also, (like many languages) J uses index origin 0 and there are some people who think in terms of index origin 1.
 
Here's an example of the token forming gotcha:
<syntaxhighlight lang=J> 10 100 1000 +/ .*1 2 3 NB. vector multiplication
3210
10 100 1000 +/.*1 2 3 NB. complex conjugate of signum of right argument grouped by the left argument
1
1
1</syntaxhighlight>
 
Here's an example of using spaces in numbers, and how it can mess with people:
<syntaxhighlight lang=J>
A=: 2 3 5 7 11 NB. legal
B=: 999 NB. legal
B 0} A NB. functional array update (does not change A)
NB. the use of a single brace to denote indexing might also confuse people
999 3 5 7 11
0} A NB. legal
2
999 0} A NB. not legal
|rank error
(999)0} A NB. what was intended
999 3 5 7 11</syntaxhighlight>
 
Here are a few other examples of how J's [[j:Vocabulary/Constants|constant notation]] can mess with people:
<syntaxhighlight lang=J>
0x100 NB. 0*e^100 where e is the natural logarithm base 2.71828...
0
0xff
|ill-formed number
16b100 16bff
256 255
8ad90 NB. 8 on the complex plane at an angle of 90 degrees from the real axis
0j8
-100 NB. two tokens
_100
_100+10 NB. three tokens
_90
-100+10 NB. four tokens
_110</syntaxhighlight>
 
The precedence and right-to-left approach give J a syntactic simplicity similar to LISP or Forth (even simpler than some implementations of those languages) while remaining an [[wikipedia:Infix|infix]] language. However, this can be confusing for people who have trained themselves on languages with more complex syntax:
 
<syntaxhighlight lang=J>
1+2*3 NB. processed right-to-left
7
2*3+1 NB. processed right-to-left
8</syntaxhighlight>
 
Finally, J is very expressive (in the formal sense), which has the effect that many
meaningless sentences are legal, so you do get less immediate feedback on your
mistakes.
 
=={{header|Java}}==
<syntaxhighlight lang="java">
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class Gotchas {

    public static void main(String[] aArgs) {
        // Gotcha 1: An integer argument to a Collection constructor, such as that of an ArrayList,
        // sets the capacity of the Collection, but does not fill the Collection with elements.
        List<Integer> numbers = new ArrayList<Integer>(100);
        // The above list has the capacity to hold 100 elements, but is currently empty.
        // Setting the element with index 3 to a value of 42 will create a runtime exception,
        // because the list has length 0, and this element does not yet exist.
        try {
            numbers.set(3, 42);
        } catch (IndexOutOfBoundsException exception) {
            // The gotcha is only revealed when a runtime exception is thrown.
            // It is caught here so that the second gotcha can still be demonstrated.
            System.out.println(exception);
            // java.lang.IndexOutOfBoundsException: Index 3 out of bounds for length 0
        }
        System.out.println(numbers);

        // Gotcha 2: Assigning a Collection to a second variable compiles and runs,
        // but both names then refer to the same object, so changes to the original
        // Collection are reflected in the "copy", which is not normally the desired outcome.
        Set<String> letters = new HashSet<String>();
        letters.add("a"); letters.add("b"); letters.add("c");
        // Attempt to create a copy of the set.
        Set<String> copy = letters;
        // The two sets are identical.
        System.out.println(letters + " :: " + copy);
        // Add an element to the set 'letters'.
        letters.add("d");
        // The same letter has been added to the set 'copy'.
        // Both sets now contain the same 4 letters.
        System.out.println(letters + " :: " + copy);

        // In a program this can cause mysterious results which can be difficult to debug.
        // The correct way to copy a Collection is to use a copy constructor as shown below.
        Set<String> correctCopy = new HashSet<String>(letters);
        letters.add("e");
        // The set 'correctCopy' only contains its original 4 letters.
        System.out.println(letters + " :: " + correctCopy);
    }

}
</syntaxhighlight>
 
=={{header|jq}}==
 
jq supports multi-arity functions, e.g. `def f(a; b):` would be the preamble for `f/2`, a two-argument version of `f`; an invocation of such a function would also use a semi-colon, e.g. `f(1; 2)`. In other programming languages, commas rather than semi-colons would be used as argument separators.
 
The "gotcha" here is that even if `f` has been defined as a two-argument function, the expression `f(a,b)` would be valid if `f/1` were also defined. To add to the potential for confusion, `f(1,2)` might or might not be equivalent to `f(1), f(2)`, depending on how `f/1` has been defined. Sometimes, therefore, the inadvertent use of a comma instead of a semi-colon in the invocation of a function can result in difficult-to-diagnose problems.
 
As with most such problems involving jq programs, the `debug` filter is likely to be helpful.
 
One way to avoid falling into the trap in the first place is to spend some time understanding how the invocation `f(1,2)` might NOT be equivalent to `f(1),f(2)`. (Hint: `def f(s): reduce s as $x (0; .+1);`)
 
=={{header|Javascript}}==
===Equality Comparisons===
The normal equality operator (<code>==</code>) is infamous for its strange results when comparing two values of different types. JavaScript has a complicated set of type coercion rules, which were intended to simplify comparisons between integers and floating point, integers and their string representations (e.g. <code>2 == '2'</code>), etc. However, this is often frustrating for programmers who want to know whether two variables are equal and also have the same data type. The strict equality (<code>===</code>) operator always returns false if the operands are of two different types. Many new programmers are taught to always use <code>===</code> to avoid subtle bugs in their programs.
 
=={{header|Julia}}==
end
</pre>
 
=={{header|MIPS Assembly}}==
===Delay Slots===
Due to the way MIPS's instruction pipeline works, an instruction placed after a branch instruction is executed during the branch, ''even if the branch is taken.''
 
<syntaxhighlight lang="mips">move $t0,$zero ;load 0 into $t0
beqz $t0,myLabel ;branch if $t0 equals 0
addiu $t0,1 ;add 1 to $t0. This happens during the branch, even though the program counter never reaches this instruction.</syntaxhighlight>
 
Now, you may think that the 1 gets added first and therefore the branch doesn't take place since the conditions are no longer true. However, this is '''not the case.''' The condition is already "locked in" by the time <code>addiu $t0,1</code> finishes. If you compared again immediately upon arriving at <code>myLabel</code>, the condition would be false.
 
The easiest way to fix this is to put a <code>NOP</code> (which does nothing) after every branch.
 
On earlier versions of MIPS, this also happened when loading from memory. The register you loaded into wouldn't have the new value during the instruction after the load. This "load delay slot" doesn't exist on MIPS III (which the Nintendo 64 uses) but it does exist on the PlayStation 1.
 
<syntaxhighlight lang="mips">la $a0,0xDEADBEEF
lw $t0,($a0) ;load the 32-bit value at memory address 0xDEADBEEF
addiu $t0,5 ;5 is actually added BEFORE the register has gotten its new value from the memory load above. It will be clobbered.</syntaxhighlight>
 
Like with branches, putting a <code>NOP</code> after a load will solve this problem.
 
=={{header|Perl}}==
Perl has lists (which are data, and ephemeral) and arrays (which are data structures, and persistent), distinct entities but tending to be thought of as interchangeable. Combine this with the idea of <i>context</i>, which can be 'scalar' or 'list', and the results might not be as expected. Consider the handling of results from a subroutine, in a scalar context:
 
<syntaxhighlight lang="perl">sub array1 { return @{ [ 1, 2, 3 ] } }
sub list1 { return qw{ 1 2 3 } }

# ...

say scalar array2(); # prints '3', number of elements in array
say scalar list2(); # prints '1', last item in list</syntaxhighlight>
 
The behavior is documented, but does provide an evergreen topic for SO questions and blog posts.
<br>
Forward calls may thwart constant setup, eg:
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">forward</span> <span style="color: #008080;">procedure</span> <span style="color: #000000;">p</span><span style="color: #0000FF;">()</span>
<span style="color: #000000;">p</span><span style="color: #0000FF;">()</span>
<span style="color: #000080;font-style:italic;">-- ...</span>
<span style="color: #0000FF;">?</span><span style="color: #000000;">hello</span> <span style="color: #000080;font-style:italic;">-- fatal error: hello has not been assigned a value</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<!--</syntaxhighlight>-->
Not a problem if the first executed statement in your program is a final main(), or more accurately not a problem after such a last statement has been reached.<br>
Quite a few of the standard builtins avoid a similar issue, at the cost of things not officially being "constant" anymore, using a simple flag and setup routine.
<small> (It surprises me to read Andrew Koenig in ctraps.pdf saying
"In most languages, an array with n elements normally has subscripts ranging from 1 to n inclusive.")</small>
 
=={{header|Quackery}}==
 
My two most frequent whoopsies are using <code>if</code> instead of <code>iff</code>, and forgetting to put <code>again</code> at the end of a <code>while</code> loop. Both are hard to guard against as pretty much any sequence of words could be a reasonable construct in Quackery.
 
One gotcha specific to programmers with experience of Forth could be that the outputs of <code>/mod</code> are the opposite way round in Quackery.
 
=={{header|Raku}}==
strings may be used almost interchangeably in most cases.
 
<syntaxhighlight lang="raku" line>say 123 ~ 456; # join two integers together
say "12" + "5.7"; # add two numeric strings together
say .sqrt for <4 6 8>; # take the square root of several allomorphic numerics</syntaxhighlight>
 
<pre>123456
Line 309 ⟶ 657:
each object are within. Works great for strings.
 
<syntaxhighlight lang="raku" line>say my $bag = <a b a c a b d>.Bag;
say $bag{'a'}; # a count?
say $bag< a >; # another way</syntaxhighlight>
 
<pre>Bag(a(3) b(2) c d)
Line 319 ⟶ 667:
But numerics can present unobvious problems.
 
<syntaxhighlight lang="raku" line>say my $bag = (1, '1', '1', <1 1 1>).Bag;
say $bag{ 1 }; # how many 1s?
say $bag{'1'}; # wait, how many?
say $bag< 1 >; # WAT
dd $bag; # The different numeric types LOOK the same, but are different types behind the scenes</syntaxhighlight>
 
<pre>Bag(1 1(2) 1(3))
Line 350 ⟶ 698:
and we want to print the contents to the console.
 
<syntaxhighlight lang="raku" line>.say for (1,2,3), (4,5,6);</syntaxhighlight>
 
<pre>(1,2,3)
Line 358 ⟶ 706:
flatten the object due to the single argument rule.
 
<syntaxhighlight lang="raku" line>.say for (1,2,3);</syntaxhighlight>
 
<pre>1
Line 368 ⟶ 716:
like multiple object parameters even if there is only one. (Note the trailing comma after the list.)
 
<syntaxhighlight lang="raku" line>.say for (1,2,3),;</syntaxhighlight>
 
<pre>(1,2,3)</pre>
Line 374 ⟶ 722:
Conversely, if we want the flattening behavior when passing multiple objects, we need to manually, explicitly flatten the objects.
 
<syntaxhighlight lang="raku" line>.say for flat (1,2,3), (4,5,6);</syntaxhighlight>
 
<pre>1
Line 406 ⟶ 754:
 
I've tried to construct an example below which illustrates these pitfalls.
<syntaxhighlight lang="wren">class Rectangle {
    construct new(width, height) {
        // Create two fields.
Line 438 ⟶ 786:
System.print(rect.area) // runtime error: Null does not implement *(_)
System.print(rect.diagonal) // runtime error (if previous line commented out)
// Rectangle does not implement 'sqrt(_)'</syntaxhighlight>
 
=={{header|X86 Assembly}}==
===Don't use LOOP===
This doesn't affect the 16-bit 8086, but on later x86 processors <code>LOOP</code> is slower than the following:
<syntaxhighlight lang="asm">label:
;loop body goes here
DEC ECX
JNZ label</syntaxhighlight>
 
This is ironic, considering that <code>LOOP</code> was originally designed to be a more efficient version of the above construct. (It was more efficient on the original 8086, but not on today's processors.) Thankfully, compilers are "aware" of this and don't use <code>LOOP</code>.
 
=={{header|Z80 Assembly}}==
===JP (HL) Does Not Dereference===
This is a bit of misleading syntax. For every other instruction, parentheses indicate a dereference of a pointer.
<syntaxhighlight lang="z80">LD HL,(&C000) ;load the word at address &C000 into HL
LD A,(HL) ;treating the value in HL as a memory address, load the byte at that address into A.
EX (SP),HL ;exchange HL with the top two bytes of the stack.

JP (HL) ;set the program counter equal to HL. Nothing is loaded from memory pointed to by HL.</syntaxhighlight>
 
This syntax is misleading since nothing is being dereferenced. For example, if HL equals &8000, then <code>JP (HL)</code> effectively becomes <code>JP &8000</code>. It doesn't matter what bytes are stored at &8000-&8001. In other words, <code>JP (HL)</code> ''does not dereference.''
 
Strangely enough, [[8080 Assembly]] uses a more sensible <code>PCHL</code> (set Program Counter equal to HL) to describe this function. So this gotcha is actually exclusive to Z80.
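
As a rough illustration of the difference, here is a small Python model (not Z80 code) comparing what <code>JP (HL)</code> does with what the parentheses might lead you to expect. The <code>memory</code> contents and both function names are hypothetical, invented for this sketch.
<syntaxhighlight lang="python"># Toy model: memory maps an address to a byte (hypothetical contents).
memory = {0x8000: 0x34, 0x8001: 0x12}

def jp_hl(hl):
    """What JP (HL) really does: the program counter becomes HL itself."""
    return hl

def jp_if_it_dereferenced(hl, mem):
    """What the parentheses suggest: fetch a little-endian word at HL and jump there."""
    return mem[hl] | (mem[hl + 1] << 8)

hl = 0x8000
actual_pc = jp_hl(hl)                            # memory is never consulted
imagined_pc = jp_if_it_dereferenced(hl, memory)  # would read the bytes at &8000-&8001
print(f"actual PC = {actual_pc:#06x}, imagined PC = {imagined_pc:#06x}")</syntaxhighlight>

Only the first function matches the real instruction; the second is the behavior the syntax hints at but the CPU does not perform.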
 
===IN/OUT===
Depending on how the Z80 is wired, ports can be either 8-bit or 16-bit. This makes the syntax of the <code>IN</code> and <code>OUT</code> commands somewhat confusing: a system with 16-bit ports uses all of <code>BC</code> as the port address, even though the assembler syntax still names only <code>(C)</code>. Luckily, this isn't something that's going to change at runtime. The documentation for your hardware will tell you how to use your ports.
 
<syntaxhighlight lang="z80">ld a,&46
ld bc,&0734
out (C),a ;write &46 to port &0734 if the ports are 16-bit. Otherwise, it writes to port &34.</syntaxhighlight>
 
Unfortunately, this means that instructions like <code>OTIR</code> and <code>INIR</code> aren't always useful, since the B register is doing double duty as the high byte of the port and the loop counter. On systems with 16-bit ports, your port destination therefore moves on every iteration! Not good!
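
To make the moving port address concrete, the following Python sketch models the sequence of port addresses an <code>OTIR</code> would drive on a system that decodes all 16 address bits. It is a model, not Z80 code: the function name is made up for illustration, and it assumes B is decremented before BC is placed on the address bus, which is how the Zilog manual describes <code>OUTI</code>.
<syntaxhighlight lang="python">def otir_port_addresses(b, c):
    """Return the 16-bit port address used by each iteration of OTIR."""
    addresses = []
    while True:
        b = (b - 1) & 0xFF              # B is the loop counter...
        addresses.append((b << 8) | c)  # ...and also the high byte of the port
        if b == 0:                      # OTIR stops once B reaches zero
            break
    return addresses

# Three bytes sent with B=3, C=&34 land on three different 16-bit ports.
print([hex(port) for port in otir_port_addresses(0x03, 0x34)])</syntaxhighlight>

On a system with 8-bit ports only the low byte matters, so every iteration would hit port &34 as intended.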
 
===RET/RETI/RETN===
Here's one I didn't learn until recently. Depending on the wiring, <code>RETI</code> (return from interrupt) and <code>RETN</code> (return from non-maskable interrupt) may end up functioning the same as a normal <code>RET</code>. This means that sometimes you have to use the following, which just makes anyone else reading your code think that you don't know what <code>RETI</code> does.
<syntaxhighlight lang="z80">EI ;RETI doesn't enable interrupts on this Z80.
RET</syntaxhighlight>
 
Fortunately, there's a bit of a "reverse gotcha" that helps us out. When interrupts are enabled with <code>EI</code>, there is no chance that an interrupt will occur during the next instruction. <code>EI</code> doesn't actually enable interrupts until the instruction ''after'' it is finished.
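
The delayed effect of <code>EI</code> can be sketched with a tiny Python model. This is a deliberate simplification rather than an emulator: the function name and the instruction strings are hypothetical, and an interrupt is assumed to be pending the whole time.
<syntaxhighlight lang="python">def run(program, interrupt_pending=True):
    """Return the order in which instructions run and the interrupt is accepted."""
    interrupts_enabled = False
    ei_just_executed = False
    trace = []
    for instruction in program:
        # An interrupt can be accepted between instructions,
        # but never directly after EI.
        if interrupt_pending and interrupts_enabled and not ei_just_executed:
            trace.append("INTERRUPT")
            interrupts_enabled = False  # accepting an interrupt disables further ones
            interrupt_pending = False
        ei_just_executed = (instruction == "EI")
        if instruction == "EI":
            interrupts_enabled = True
        elif instruction == "DI":
            interrupts_enabled = False
        trace.append(instruction)
    return trace

# "NOP" stands in for whatever instruction sits at the return address.
print(run(["EI", "RET", "NOP"]))</syntaxhighlight>

In this model the <code>RET</code> always completes before the pending interrupt is accepted, which is why the <code>EI</code> / <code>RET</code> pair is safe to use at the end of a handler.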