Array: Difference between revisions

← Older edit

Array (view source)

Revision as of 02:13, 18 July 2022

8,275 bytes added , 1 year ago

m

→‎Indexing

Puppydrum64

1,489

edits

Revision as of 18:16, 7 September 2021 (view source) Puppydrum64 (talk \| contribs) m (→‎Encoding an Array's End) ← Older edit		Latest revision as of 02:13, 18 July 2022 (view source) Puppydrum64 (talk \| contribs) m (→‎Indexing)
(23 intermediate revisions by 3 users not shown)
Line 41: * [[letter frequency]] ===[[Assembly]]=== An array is simply a sequence of values stored in consecutive memory locations. Its beginning is typically defined with some sort of label that points to the address where that array is stored. ~~Arrays~~Whether ~~are~~an array is mutable ~~unless~~or ~~they~~immutable ~~are~~depends on the hardware; in older assembly languages, an array is only typically immutable if it's stored in ROM,. ~~such~~Home ascomputer software that is stored on adisk ~~video~~or ~~game~~tape can define an array at compile time that can be mutable; ROM cartridge programs cannot. The syntax is the same for both, however. ROM cartridge programs will need to construct their array in ROM, copy it to RAM, and alter the copy. ~~<lang~~'''Example ~~6502asm>;~~using [[6502 Assembly ~~example~~]]:''' <lang 6502asm>ArrayRAM equ $00 ~~ArrayRAM equ $00 ;the beginning of an array, stored in zero page RAM~~ ~~ArrayROM:~~;the dbbeginning ~~0,5,10,15,20,25,30,35,40,45,50~~of ;an array, stored in ~~ROM</lang>~~zero page RAM ArrayROM: db 0,5,10,15,20,25,30,35,40,45,50 ;on Commodore 64 (for example) these values can be modified at runtime, but on the NES they are read-only.</lang> Almost all assembly languages have a method of loading from a memory address offset by some sort of variable amount. That offset is the index into the array. Depending on the size of each element that index is multiplied by the number of bytes each element takes up. What constitutes an "element," "row," or "column" of the array is entirely decided by the programmer. Arrays in assembly are always zero-indexed. ====Indexing==== Almost all assembly languages have a method of loading from a memory address offset by some sort of variable amount. That offset is the index into the array. Depending on the size of each element that index is multiplied by the number of bytes each element takes up. What constitutes an "element," "row," or "column" of the array is entirely decided by the programmer. Arrays in assembly are always zero-indexed. ~~<lang 68000devpac>;68000 Assembly example~~ ~~LEA myArray,A0 ;loading a labeled array name like this loads the address of the zeroth element~~ '''Example using [[68000 Assembly]]:''' <lang 68000devpac>LEA myArray,A0 ;loading a labeled array name like this loads the address of the zeroth element MOVE.W #451,D1 ;five elements per row, so to get the 4th row we multiply the row number by the elements per row, ;times the number of bytes per element Line 59 ⟶ 63: myArray: DC.B 10,11,12,13,14 ;typing the array in rows like this is only for the viewer's convenience - it means nothing to the CPU. ~~DC.B 10,11,12,13,14~~ DC.B 20,21,22,23,24 ;all 25 entries could have been on one line in the same order and it wouldn't change how they are stored. ~~DC.B 20,21,22,23,24~~ DC.B 30,31,32,33,34 DC.B 40,41,42,43,44 DC.B 50,51,52,53,54 Line 67 ⟶ 71: </lang> It is <b>much</b> easier to work with arrays in assembly if all rows are the same length. The best way to deal with a ragged or jagged array, such as an array of strings of different lengths, is to ~~pad~~store ''pointers to the ~~ends~~array ofentries ~~each~~rather ~~shorter~~than ~~row~~the ~~with~~intended ~~arbitrary~~entries ~~bytes~~themselves.'' This ~~lets~~equalizes ~~you~~the ~~select~~size anof ~~arbitrary~~all ~~row~~the elements of the array which makes it much easier to index. Another way to handle such arrays is with apadding, ~~simple~~by ~~bit~~adding ~~shift~~extra ~~applied~~null bytes to the ~~index~~ends until all rows are the same length, but for many data types it's better to construct an array of pointers. <lang 68000devpac>main: ;goal: Print "Orange" to stdout LEA Strings,A0 LEA (4,A0),A0 ;this is NOT a dereference operation, it merely adds 4 to A0. MOVE.L (A0),A0 ;now we dereference the pointer so that we have the address of "Orange" in A0. ;If we did MOVE.B (A0),D0 now, we'd load the "O" in "Orange" into D0. JSR PrintString ;or whatever you use to write to stdout RTS ;return Strings: DC.L AppleAddress DC.L OrangeAddress DC.L GrapeAddress AppleAddress: DC.B "Apple",0 even OrangeAddress: DC.B "Orange",0 even GrapeAddress: DC.B "Grape",0 even</lang> In assembly, there is '''nothing''' stopping your program from indexing an array out of bounds! The computer doesn't have an understanding of where your array "ends," so asking it to return (for example) the 500th element of an array with only 25 elements is perfectly legal. This typically won't happen for arrays that aren't indexed based on user input, but be careful nonetheless. Indexing an array out of bounds and trying to read from it can cause segmentation faults on more modern machines, which is the operating system's way of preventing your code from reading memory that is outside of the program that is trying to index the array. Older CPUs don't have this sort of protection, and indexing out of bounds will read whatever is stored in memory at that location, which is why it falls into the realm of undefined behavior- the result entirely depends on how the code in your program is arranged. ====Iteration==== Iteration over the elements of an array is fairly straightforward. ~~<lang 68000devpac>;68000 Assembly example~~ '''Example using [[68000 Assembly]]:''' ~~LEA myArray,A0~~ <lang 68000devpac>LEA myArray,A0 loop: MOVE.B (A0)+,D0 ; As is, this example code will inevitably index out of bounds. ; In practice there will be some way to end the loop, typically a byte count or a null terminator. JMP loop</lang> ~~<lang~~'''Example ~~z80>;~~using [[z80 Assembly ~~example~~]]:''' <lang z80>ld hl,myArray ;load the address of myArray into hl ld de,userRam ;load the address of work RAM into de ld bc,myArrayEnd-myArray ;assembler directive that auto-calculates the array size using labels placed at the beginning and end. ldir ;copy the entire contents of the array to work RAM</lang> ====Skip-Counting==== Skipping elements can be easily done with a "dummy read," whereby an auto-incrementing/decrementing addressing mode is used solely for updating the pointer, or by incrementing/decrementing a loop counter multiple times per loop. ~~<lang~~'''Example ~~asm>;~~using [[8086 Assembly ~~example~~]]:''' <lang asm>iterate: movsb ;store [ds:si] into [es:di], increment both pointers, and decrement cx. lodsb ;dummy read to increment the pointer and decrement cx. The value loaded into AL gets discarded. Line 108 ⟶ 143: ;8086 Assembly mov ax,[bx+di] ;load the BXth element of the array whose beginning is stored in DI, into AX. ;alternatively, load the DIth element of the array whose beginning is stored in BX, into AX. ;ARM Assembly ldr r0,[r1,#4] ;load the 1st (zero-indexed) element of the array whose pointer to its 0th element is stored in r1, into r0.</lang> ====Encoding an Array's End==== ~~=====~~'''Null Terminator~~=====~~''' This method is most commonly used with strings. An ASCII value that is not associated with any keyboard key, typically 0, is placed at the end of a string. In a typical PrintString assembly routine, the routine is given a pointer to the 0th entry of the string as its only parameter. The routine reads from the pointer, prints that letter, increments the pointer, and repeats until the terminator is read, at which point the routine ends. A string variable in [[C]] will place a 0 at the end of a string without you having to define it yourself. This method works well for strings and other arrays where the terminator's value is not a possible value for actual data. On more general arrays where the entries represent non-ASCII data, this causes problems where you have a datum that just so happens to equal the terminator. This method is most commonly used with strings. An ASCII value that is not associated with any keyboard key, typically 0, is placed at the end of a string. In a typical PrintString assembly routine, the routine is given a pointer to the 0th entry of the string as its only parameter. The routine reads from the pointer, prints that letter, increments the pointer, and repeats until the terminator is read, at which point the routine ends. Without the terminator, the program would not know when to stop reading and eventually crash. A string variable in [[C]] will place a 0 at the end of a string without you having to define it yourself. This method works well for strings and other arrays where the terminator's value is not a possible value for actual data. On more general arrays where the entries represent non-ASCII data, this causes problems where you have a datum that just so happens to equal the terminator. '''Example using [[8086 Assembly]]:''' <lang asm>PrintString: ; input: [DS:SI] = string pointer mov al,[ds:si] jz Terminated ;we've reached the terminator inc si call PrintChar ;call to hardware-specific printing routine jmp PrintString Terminated: ret HelloText: db "Hello World",0 ;a value in quotes is its ASCII equivalent, anything else is a numeric value.</lang> In cases like this, high-level languages often implement <i>escape characters</i> which when encountered in a string, result in a branch to a section that reads the next character without checking if it's a terminator or other control code. Effectively this removes the special meaning of that character but only if an escape character is before it. This concept is often reversed to allow the programmer to easily implement ASCII control characters, such as <code>\n</code> for new line (in ASCII this is represented by a 13 followed by a 10, for carriage return + line feed.) In this case the backslash signals to <code>printf()</code> that the next letter is associated with a particular ASCII control code. If the next character read is an "n" then ASCII 13 gets printed, followed by ASCII 10. After this, normal reading and printing resumes. The example below shows a simple implementation of an escape character. The first instance of the null terminator gets printed as-is rather than used to end the routine. '''Example using [[8086 Assembly]]:''' <lang asm>PrintString: ; modified to use the \ as an escape character. ; input: [DS:SI] = string pointer mov al,[ds:si] inc si cmp al,5Ch ;ascii for backslash, this is the escape character ;notice that the check for the escape character happens before the check for the terminator. jz EscapeNextChar cmp al,0h ;check the terminator jz Terminated ;we've reached the terminator call PrintChar ;call to hardware-specific printing routine jmp PrintString EscapeNextChar: mov al,[ds:si] ;perform an additional read, except this read doesn't compare the fetched character to anything. inc si call PrintChar ;print that character as-is jmp PrintString ;go back to the loop's beginning, skipping the terminator check entirely. Terminated: ret HelloText: db "Hello World\",0,13,10,0</lang> ~~<lang asm>HelloText: ;6502, z80, 8086~~ ~~db "Hello World",0~~ ~~HelloText:~~ ~~dc.b "Hello World",0 ;68000~~ ~~even ;aligns to the next 2 byte boundary if not already aligned.~~ '''End Label''' ~~HelloText~~ ~~.byte "Hello World",0 ;ARM~~ ~~.align 4 ;aligns to the next 4 byte boundary if not already aligned.</lang>~~ ~~=====End Label=====~~ This method is best for pre-defined arrays. Its usage was shown in earlier examples. Most assemblers can create a constant based off simple arithmetic using labels. A label's numeric value is the address it gets assembled to, decided at assemble time. Placing another label immediately after the end of an array will point it to the next byte after that array. The assembler subtracts the array's beginning label from this value to create a size constant that you don't have to manually adjust if you change the number of entries the array has. This method can be used with array variables in conjunction with a null terminator and padding the distance between the terminator and the end of the range dedicated to storing array variables. The major Achilles' heel of this method is that it only works on arrays that are of known length prior to assembling. In other words, you <b>cannot</b> make the array larger at runtime. ~~=====Size Variable=====~~ This method is best for arrays that are not pre-defined. Placed before the 0th element of the array is its maximum size. This value's location relative to the 0th element is always the same, regardless of how long the array actually is. As such, the pointer to the 0th element can be offset by a fixed negative amount to get the size. Alternatively, rather than storing the size of the array, a pointer to the last element can also be stored in front of the array. '''Example using [[68000 Assembly]]:''' <lang 68000devpac>main: LEA myArray,a0 MOVE.W #(MyArray_End-MyArray)-1,D7 ;the minus 1 corrects for DBRA, the subtraction of the two labels gives us the total byte count ;for .W or .L sized data you may need to right-shift D7 by 1 or 2 respectively, depending on what you're doing loop: MOVE.B (A0)+,D0 JSR PrintChar ;or whatever your implementation uses to write to stdout. DBRA D7,loop RTS ;exit program MyArray: DC.B "Hello World" MyArray_End: EVEN ;to get the correct byte count, you'll need the EVEN directive AFTER the end label.</lang> '''Size Value''' This method is best for arrays that are not pre-defined. Placed before the 0th element of the array is its maximum size. This value's location relative to the 0th element is always the same, regardless of how long the array actually is. As such, the pointer to the 0th element can be offset by a fixed negative amount to get the size. Alternatively, rather than storing the size of the array, a pointer to the last element can also be stored in front of the array. This lets you use pointer arithmetic to calculate the array's size, by subtracting the pointer of the 0th element from the last. This method is commonly used in picture file formats to mark where the picture data ends. '''Example using [[68000 Assembly]]:''' <lang 68000devpac>MyArray: DC.W 3,5 ;three rows, five columns ;it would have also been sufficient to have "DC.W 15" instead of "DC.W 3,5". DC.B 1,2,3,4,5 DC.B 6,7,8,9,10 DC.B 11,12,13,14,15</lang> ===[[ATS]]=== The array types built into ATS are an extremely complicated topic, and I shall mention only three points of interest. # ATS wants to prevent you from going outside the bounds of the array; and it wants to do so without runtime checks. There are numerous ways to get around that "desire", but the capability for strictness is there. # ATS wants to distinguish between the initialized and uninitialized parts of an array. Again, you can get around this "desire", but the capability to be rigorous is there. # ATS arrays are basically C arrays. An array is a contiguous block of memory at a particular machine address, with no other runtime structure. An array is represented by a pointer to the beginning of the memory block, as in C; but, furthermore, there is a ''view''. The view exists only as a typechecking entity; there is no runtime code associated with it. Suppose you have a fully initialized array, called '''a''', of fourteen '''int''' starting at address '''p'''. Then the view for the array may be written <lang ATS>@[int][14] @ p</lang> or equivalently as <lang ATS>array_v (int, p, 14)</lang> To access an element of the array, that view needs to be available to the typechecker. Let us suppose the view is in fact available, and that there are operator overrides to use the square brackets as (in this case one-dimensional) array indices. Then a program that says <lang ATS>let val x = a[3] in a[4] := x end</lang> should be compilable, but one that says <lang ATS>let val x = a[20] in a[30] := x end</lang> will trigger a type error, due to the out-of-bounds indices. Above we assumed '''a''' was fully initialized. Working with an array that is not fully initialized can become quite complex, and I shall mention only that the view for '''a''' before it is initialized would be <lang ATS>@[int?][14] @ p</lang> with a question mark. Fortunately there are routines in the prelude to fully initialize an array, for instance with a fill value such as zero or an empty string. There are also ways to fully deinitialize an array; to allocate an array in the heap, and to free one; to place an array in a stack frame; to make an array as a global variable. The curious reader is encouraged to go to [http://www.ats-lang.org/ the ATS website] for more information. ===[[Fortran]]=== Line 223 ⟶ 340: reduce range(2;n+1) as $i (1; . * $i) # => n!</lang> ==={{header\|ReScript}}=== <lang ReScript>let arr1 = [4, 2, 8, 14, 3, 6, 22, 17] let _ = Js.Array2.push(arr1, 5) arr1[3] = 9 let isEven = x => mod(x, 2) == 0 let square = x => x * x let arr2 = Js.Array2.filter(arr1, isEven) let arr3 = Js.Array2.map(arr2, square) let total = Js.Array2.reduce(arr3, \"+", 0) Js.log2("total: ", Js.Int.toString(total)) let arr4 = Js.Array2.sortInPlaceWith(arr3, (a, b) => a - b) let arr5 = Js.Array2.slice(arr4, ~start=2, ~end_=4) Js.Array2.forEach(arr5, x => Js.log(x)) switch Js.Array2.find(arr1, x => x < 0) { \| Some(x) => Js.log2("found: ", x) \| None => Js.log("no negative element found") }</lang> {{out}} <pre>$ bsc arr.res > arr.js $ node arr.js total: 604 36 64 no negative element found </pre> ===[[REXX]]===