Segmentation fault protection

From Rosetta Code
Segmentation fault protection is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

A segmentation fault or segfault is an error that your computer will raise if a program attempts to access memory that it's not supposed to: for example, writing to executable code or read-only memory, or dereferencing a null pointer. Often this happens when an array is indexed out of bounds, such as trying to read element 500 of an array with only 6 entries (see the pseudocode example below):

myArray = {1,2,3,4,5,6};
myVariable = myArray[500];        //causes a segfault

Although this seems like a silly example, it's a lot more common than those who are new to programming may think, and different languages have different safeguards in place for preventing this kind of thing from happening. Some languages would refuse to compile the above example, some will give you a warning and run anyway, and others may not even care at all.
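As one illustration of a run-time safeguard, here is a minimal Python sketch of the same access. Python is used only as an example of a bounds-checked language; the helper name read_element is hypothetical and not part of the task.

```python
# Python checks every list index at run time and raises IndexError
# instead of reading memory outside the list.
my_array = [1, 2, 3, 4, 5, 6]


def read_element(values, index):
    """Return values[index], or None if the index is out of bounds."""
    try:
        return values[index]
    except IndexError:
        # The interpreter caught the bad index; no stray memory was read.
        return None


my_variable = read_element(my_array, 500)
print(my_variable)  # prints None
```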

Task

Showcase what built-in protections your language has for segmentation faults, if any. If your language doesn't have any, show what happens when your program commits a segmentation fault. (Be safe and don't destroy your computer!)


6502 Assembly

The 6502 doesn't have any handlers for access violations; the entire address space of the CPU is fair game. As you can imagine, this isn't good for computer security.

However, because the stack is fixed at $0100-$01FF, it will never overwrite the heap. It can still wrap around, to the point where new values pushed onto the stack overwrite the old ones, which can cause a crash if you try to unwind the stack back to the beginning.

The 6502 uses memory-mapping to interact with external hardware. Reading or writing a memory location you normally shouldn't, because an array was indexed out of bounds, can cause undefined behavior in that hardware, and on some systems can even result in a killer poke, which can damage certain machines such as the Commodore PET. Memory-mapped ports don't work like normal memory: even reading a memory-mapped port can affect its related hardware, or affect the contents of other ports that are related to that hardware. (This isn't a property of the 6502 itself, but of the hardware connected to it. For example, changing the video display settings of the Apple II can be done by either reading from or writing to the port associated with that setting.)

It's very unlikely that you'll index an array far out of bounds, however, as you can only index up to 255 bytes forward from the base address. What's more likely to happen is forgetting to pop everything you pushed before returning from a subroutine, so that the program counter gets loaded with some unknown value and execution continues from there. Again, the 6502 won't stop you from trying to execute RAM or memory-mapped ports, and there's no guarantee what will happen. (Most bytes with a low nibble of 2 will crash the CPU if it tries to execute them.)
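The 255-byte limit follows from the index registers being 8 bits wide. The following is a rough Python model of indexed addressing, not 6502 code; the function name effective_address and the base address $0400 are assumptions chosen for illustration.

```python
def effective_address(base, index):
    """Model base+X addressing: a 16-bit base plus an 8-bit index register."""
    index &= 0xFF                    # the index register holds only 8 bits
    return (base + index) & 0xFFFF   # the address bus is 16 bits wide


base = 0x0400  # hypothetical start of an array
print(hex(effective_address(base, 255)))  # 0x4ff, the furthest reachable byte
print(hex(effective_address(base, 500)))  # 0x4f4, because 500 wraps to 244
```

In this model an index of 500 cannot reach 500 bytes past the base; it silently wraps to a different, still nearby, address.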

68000 Assembly

The closest thing the 68000 has to a segfault is an "alignment fault." MOVE.W (An),Dn or MOVE.L (An),Dn where An is any address register that contains an odd value will crash the CPU. More specifically, it will trigger one of the CPU's traps, forcing the program counter to jump to an error handler that is looked up from the trap table in low memory (on some machines such as the Neo Geo this just reboots the CPU). Note that byte-length commands are not subject to this alignment rule, which can lead to problems when working with structs that contain mixed data types. The programmer can avoid this very easily by padding all their data to word length, using the EVEN assembler directive after a sequence of byte-length data, or carefully arranging structs/unions so that all byte data is in pairs. An example is below:

myStruct:          ;by default your assembler will ensure that labels always point to even memory locations.
dc.b $20
dc.b $40
dc.w $2222
dc.b $60
dc.b $80
dc.l $55555555

LEA myStruct,A0
MOVE.B (A0)+,D0
MOVE.B (A0)+,D1
MOVE.W (A0)+,D2
MOVE.B (A0)+,D3
MOVE.B (A0)+,D4
MOVE.L (A0)+,D5    ;none of these MOVE commands will cause an alignment fault.

As for executing RAM, or writing to code areas, there is no protection for either on the 68000. Failing to "balance" the stack will likely cause the alignment problem mentioned earlier, as the program counter needs to be even-aligned as well as any address registers you use.

(Note: Simply having an odd number stored in an address register will not crash the CPU by itself. As long as you never attempt to dereference the address register's value with a word or long instruction, no crash will occur. Otherwise you'd never be able to load byte data! As mentioned before, dereferencing an odd memory address is fine if you use a .B instruction to do so.)
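The alignment rule can be modelled in a few lines of Python. This is only a sketch of the rule as described above, not an emulator; the names AlignmentFault, check_access and walk, and the starting address $1000, are hypothetical.

```python
class AlignmentFault(Exception):
    """Raised by this model when a word or long access uses an odd address."""


def check_access(address, size):
    """Validate one access of `size` bytes (1 = .B, 2 = .W, 4 = .L).

    Returns the address after a post-increment, as with (A0)+.
    """
    if size > 1 and address % 2 != 0:
        raise AlignmentFault(f"{size}-byte access at odd address {address:#x}")
    return address + size


def walk(start, sizes):
    """Apply a sequence of post-incrementing accesses starting at `start`."""
    address = start
    for size in sizes:
        address = check_access(address, size)
    return address


# Same layout as myStruct: byte, byte, word, byte, byte, long.
print(hex(walk(0x1000, [1, 1, 2, 1, 1, 4])))  # 0x100a, no fault

# A single byte followed by a word leaves the pointer odd, so the model faults.
try:
    walk(0x1000, [1, 2])
except AlignmentFault as err:
    print("fault:", err)
```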

Julia

Most commonly, segfaults in Julia are indications of a problem in a called C library. However, the functions in Julia prefixed with unsafe_, such as unsafe_load(), can be used to dereference a null pointer as in C, and can thus cause a segfault in the LLVM compiler runtime used to run Julia:

julia> unsafe_load(convert(Ptr{Int}, C_NULL))

Exception: EXCEPTION_ACCESS_VIOLATION at 0x511008e4 -- unsafe_load at .\pointer.jl:105 [inlined]
unsafe_load at .\pointer.jl:105
...

Nim

Nim generates native code (using C as an intermediate language), so it is possible for a program to cause a segmentation violation. The language has therefore been designed to prevent this from happening, at least if the user doesn’t use unsafe features:

  1. Nim is a statically and strongly typed language. That means that it is possible to detect a lot of errors at compile time.
  2. Nim provides a memory management model which ensures memory integrity.
  3. At run time Nim ensures that all variables are initialized, which prevents some problems.
  4. Nim uses copy semantics, which means that an assignment copies the value instead of copying a reference. This prevents many errors related to aliasing.
  5. Nim generates code to check that indices are valid, that values are in a valid range, that operations do not overflow, etc. However, checking that a reference or a pointer is not nil is not done, for performance reasons. This isn't really a problem, since accessing a null address instantly terminates the program (with a segmentation violation, of course). So doing a check to raise an uncatchable exception would not be an improvement.

So far, so good, but Nim provides pragmas to deactivate the checks in some parts of the code, and it is also possible to compile programs in “danger” mode. This improves performance, but at a high price, as it is no longer possible to ensure memory integrity.

As a systems programming language, Nim also provides unsafe features such as accessing the address of a variable, using type casting, deactivating the initialization of a variable, using unmanaged memory via pointers, etc. Of course, such constructions may cause memory corruption.

So, provided you don't use unsafe features, Nim generates code that greatly reduces the risk of a segmentation violation.

Perl

It's fairly hard to get into this kind of trouble with Perl, as it manages memory for you, etc, etc. But it can happen, and the #1 technique for keeping yourself on the straight and narrow is to start your programs off with:

use strict; use warnings;

This is so important, I'm going to say it again, louder:

use strict; use warnings;

For example, this code will segfault:

unpack p,1x8

But with the safety features enabled you get compilation errors:

use strict;
use warnings;
unpack p,1x8
Output:
Bareword "p" not allowed while "strict subs" in use at exp/SegV line 6.
Execution aborted due to compilation errors.

But if you absolutely feel the need to court disaster, at least run your code in an external process.

qx/perl -e 'unpack p,1x8' 2>&1/ or die "Couldn't execute command\n";
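The same isolation idea can be sketched in Python for comparison. This assumes a POSIX system, where a child killed by a signal reports a negative return code; the child here only sends itself SIGSEGV as a stand-in for code that really faults, and the name run_isolated is hypothetical.

```python
import signal
import subprocess
import sys

# Stand-in for crashing code: the child sends SIGSEGV to itself (POSIX only).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_isolated(code):
    """Run `code` in a separate Python process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode


status = run_isolated(CHILD)
if status < 0:
    # On POSIX a negative status is the number of the signal that killed the child.
    print("child was killed by", signal.Signals(-status).name)
else:
    print("child exited with status", status)
```

Only the child process dies; the parent keeps running and can decide what to do about the failure.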

Phix

Phix has extensive compilation and runtime error handling, including bounds checking, unassigned variables, invalid assignments, and more, as well as exception handling. It strives to always deliver clear, human-readable and actually useful error messages, along with the file name and line number where they occurred. This is of course within reason: as the following example shows, messages triggered by inline assembly (or within a prebuilt dll/so) are inherently always going to be a bit more cryptic than the norm.

try
   #ilASM{ xor eax,eax
           mov eax,[eax]
         }
catch e
    ?e
end try
Output:
{30,10043489,3,21,"-1","test.exw",`C:\Program Files (x86)\Phix\`,"fatal exception [MEMORY VIOLATION] at #00994061"}

You could, of course, make that a bit prettier, as the non-caught errors do. In fact, here is what you get without try/catch:

C:\Program Files (x86)\Phix\test.exw:3
fatal exception [MEMORY VIOLATION] at #008A9042

Global & Local Variables

--> see C:\Program Files (x86)\Phix\ex.err
Press Enter...

The generated ex.err contains a full callstack and itemises in painstaking detail all variables and their values at the point of failure.

Note that p2js does not support exception handling (yet) or inline assembly (ever) and of course relies entirely on the browser development tools for any and all error handling and debugging capabilities - you are expected to get things working on desktop/Phix before trying to run them in a browser.

Raku

Barring bugs in the compiler implementation, it should be impossible to generate a segfault in standard Raku code. Memory is automatically managed, allocated and garbage collected by the virtual machine. Arrays are automatically managed, with storage allocation autovivified on demand. Uninitialized variables default to a Nil value, which decomposes to the base type of the variable. (If a base type was not specified, it could be Any type.)

my @array = [1, 2, 3, 4, 5, 6];
my $var = @array[500];

say $var;
Output:
(Any)

If you invoke the NativeCall interface and call out into C code or libraries, however, all such protections are forfeit. With great power comes great responsibility.


Wren

In theory, and barring bugs in the language's implementation, Wren code should never segfault. Array bounds are always checked, memory allocation and reclamation are automatic, and there is no support for pointers at all.

So the example code below compiles:

var myArray = [1, 2, 3, 4, 5, 6]
var myVariable = myArray[500]

but if you try to run it you get a "Subscript out of bounds" error.

However, when Wren is embedded in a C host, segfaults are certainly possible and there is no protection against them whatsoever. As Wren's virtual machine is not re-entrant, a common source of segfaults is trying to treat it as if it were!

Z80 Assembly

The Z80, like the 6502, considers the entire address space of the CPU as a valid place to read, write, or execute. As a result, the hardware itself will not deny your program access to anything in the 64k address space. Writes to ROM will be at best silently ignored, or, in the case of certain Game Boy cartridges, be interpreted as a command to the bankswitching hardware.

Overwriting executable RAM and executing the stack/heap are both perfectly legal in the eyes of the CPU, and many Z80 programs took advantage of this to save memory or increase the speed of a subroutine. (It takes fewer clock cycles to store the accumulator as the operand of a future LD A,# instruction than it does to execute PUSH AF and POP AF.) The Z80 has no pipeline, branch prediction, or speculative execution so performing such self-modifying code tricks won't result in problems if done correctly.

Dereferencing a null pointer will return one or two arbitrary bytes that aren't particularly useful. It won't cause a crash, however. The first 8 bytes of the Z80's address space are used for a small subroutine that can be called with RST 0 or CALL &0000, which is such a small section of memory that most programmers will place a jump to the actual code they want to run at that address instead.
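To make the null-pointer case concrete, here is a small Python model of a flat 64K address space. It is a sketch, not an emulator: the jump target &8000 and the helper names read_byte and read_word are assumptions for illustration. Address 0 holds a jump instruction, as described above, and reading through a null pointer simply returns its bytes.

```python
# Flat 64K address space; every address is readable in this model.
memory = bytearray(0x10000)

# Hypothetical contents of address &0000: JP &8000 (opcode &C3, then low and high byte).
memory[0:3] = bytes([0xC3, 0x00, 0x80])


def read_byte(address):
    """Read one byte; nothing in the model can refuse the access."""
    return memory[address & 0xFFFF]


def read_word(address):
    """Read a little-endian 16-bit value, as the Z80 does."""
    return read_byte(address) | (read_byte(address + 1) << 8)


null_pointer = 0x0000
print(hex(read_byte(null_pointer)))  # 0xc3, the JP opcode
print(hex(read_word(null_pointer)))  # 0xc3, the opcode plus the low byte of the target
```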