Array

From Rosetta Code
Revision as of 22:27, 20 November 2008 by rosettacode>JimD

An array is a composite data type, meaning it can store multiple values, and so is in the collection category. The stored values are called elements of the array and are accessed by a sequence of indices. In a program, indices may be a mixture of constant values or variables which allow the elements accessed to vary under program control. The indices of an array are totally ordered.


The implementation details of an array gives rise to an important distinction between arrays and associative arrays.

The implementation of arrays is based on setting the bounds of indices of the array, the size of the array, normally by allocating a contiguous region of memory to hold the elements of the array, and using simple offset calculations on the indices from the origin of the memory to access memory elements. Some languages support extensions to allow such arrays to be resized, or re-shaped, in which the memory area is adjusted, but extent elements are retained.
By contrast an associative array maps the association between index "keys" and their associated values, generally using more complex hash functions on the keys of the array to map them to their corresponding elements (by pointers, references or memory addresses of some sort). Associative arrays are referred to variously as "hashes" (Perl), "maps" or "mappings" (Lua), "dictionaries" (Python) as well as "associative arrays" (Awk, ksh and others). The keys into associative arrays are normally not constrained to be integers (unlike arrays which generally required contiguous integer ranges). Different languages may impose various constraints on these keys. For example in Perl keys are evaluated as Perl "scalar" values such that keys of 1 (a literal integer), "1" (a string representing the same), 1.0 (a literal floating point which is numerically equivalent) and "1.0" (another string representing a numerically equivalent value) will all map to the same key/value in a hash. Other languages (such as Python) may treat each type of object as distinct. (See associative array for further discussion).
Non-associative arrays may have speed and memory consumption advantages. Associative arrays have greater flexibility in types used for keys and generally obviate the need to implement searches through the collection (Each component on which one would search can be implemented as a different associative array of references to their corresponding values or records).

Arrays with more than one index are called multidimensional arrays. For example, a matrix is a two-dimensional array.

Some languages (such as awk) do not support true arrays and merely emulate them through their associative arrays. Similarly some languages emulate multi-dimensional arrays by concatenation of dimensional indices into keys (perhaps a peculiarity of awk).

Common operations defined on arrays include:

  • Indexing: accessing an array element by its indices. (There is a one to one mapping between an index and its corresponding element).
  • Slicing: producing a subarray by putting some constraint on the indices. For example, PL/1 provides extracting of a row or a column of an array. In Ada any range of the index can be used in order to extract a subarray from a single-dimensional array. In Python slices can extract any contiguous subset of an array and extended slice notation can extract elements in reversed order and/or by traversing in a given "stride" --- for example a[100:0:-2] would return every odd element from 100 to the beginning of the list: a[99], a[97], ... a[1].
  • Iteration over the array's elements. Some languages have a foreach loop construct for array iteration, in others this must be done with conventional looping and arithmetic.
  • Iteration over the indices of an associative array.
  • Querying the bounds of array indices (determining the maximum element index of offset)
  • Querying the indices of an associative array (determining if the collection contains a value for any given key).
  • Operations on indices (next, previous, range etc)
  • Array programming languages provide operations applied to entire arrays, so programs in such languages often lack specific index reference (for example APL).

Multidimensional arrays in which the valid range of one index depends on the value of another are called ragged (also jagged). This term comes from a typical example of a ragged array, when a two-dimensional array is used to store strings of different length in its rows. When put on paper the right margin of the output become ragged.

The lower bound of non-associative arrays in many programming languages is commonly fixed at either 0 (C and relatives) or 1 (Old Fortran and relatives); or an arbitrary integer (Pascal and relatives, modern Fortran). In Ada any discrete type can used as an index. Zero-based indexing is best thought of in terms of the index being an offset from the beginning of the array. Thus the first element is located zero elements from this starting point. The alternative can be thought of as ordinal indexes referring to the first, second, ... and nth elements of the array.

In most programming languages, arrays are accessed by using the array brackets [ and ], e.g. in A[i], but exceptions exist, including Rexx which instead uses the dot operator ., such as in A.i; Fortran, Ada and BASIC which use round parentheses A(i), and in lisp dialects which use constructions like (ELT A n) for accessing and (SETA A n new_val) for setting (Interlisp) or (vector-ref A n) for accessing and (vector-set! A n new_val) for setting (Scheme). No bracket indexing occurs in J, an array language; instead, the normal syntax of function creation and function calling applies.

C

We wish to open a text file and compute letter frequency

FILE *any_text;
/* declare array */
int frequency[26]; 
/* declare a computed index */
int ch; 
 
any_text = fopen ("a_text_file.txt", "rt");
 
/* init the freq table: */
for (ch = 0; ch < 26; ch++) 
    frequency[ch] = 0;
 
ch = fgetc(any_text);
while (!feof(any_text)) {
    if (is_a_letter(ch))
        /* if the char is a letter, then increase character slot in the freq table: */
        frequency[ch-'A'] += 1;
    ch = fgetc(any_text);
}

J

The example task is the same: open a text file and compute letter frequency.
Written in this array programming language, no loops are specified.
Input is a directory-path with filename. Output is a 26-element single-axis integer array.

load 'files'     NB. fread is among these standard file utilities
ltrfreq=: 3 : 0
 letters=. (65+(i.26)) { a.                  NB. We'll work with minimal alphabet.
 reduced=. (#~ e.&letters) toupper fread y   NB. Omit non-letters. (y is the input.)
 sums   =. +/"_1 = reduced                   NB. Count how often each letter occurs.
 sums (letters I. (~. reduced))} 26 # 0      NB. Alphabetize the sums, then return.
)

OCaml

same task, open a text file and compute letter frequency

<ocaml>let () =

 let ic = open_in Sys.argv.(1) in
 let base = int_of_char 'a' in
 let arr = Array.make 26 0 in
 try while true do
   let c = Char.lowercase(input_char ic) in
   let ndx = int_of_char c - base in
   if ndx < 26 && ndx >= 0 then
     arr.(ndx) <- succ arr.(ndx)
 done
 with End_of_file ->
   close_in ic;
   for i=0 to 25 do
     Printf.printf "%c -> %d\n" (char_of_int(i + base)) arr.(i)
   done</ocaml>

Here is the documentation of the module Array. (there is also a Bigarray module)

Pascal

This defines an array suitable to hold a 64x64 truecolor image (i.e. red, green and blue RGB values all can go from 0 to 255) and then sets the color of a single pixel <pascal> type

 color = red, green, blue;
 rgbvalue = 0 .. 255;

var

 picture: array[0 .. 63, 0 .. 63, color] of rgbvalue

begin

 { set pixel (4,7) to yellow }
 picture[4, 7, red]   := 255;
 picture[4, 7, green] := 255;
 picture[4, 7, blue]  := 0

end. </pascal>

Python

Example: open a text file and compute letter frequency. <python>

  import string
  if hasattr(string, ascii_lowercase):
      letters = string.ascii_lowercase       # Python 2.6 and later
  else:
      letters = string.lowercase             # Earlier versions
  offset = ord('a')
  
  def countletters(file_handle):
      """Traverse a file and compute the number of occurences of each letter
      """return results as a simple 26 element list of integers.
      results = [0] * len(letters)
      for line in file_handle:
          for char in line:
              char = char.lower()
              if char in letters:
                  results[offset - ord(char)] += 1
                  # Ordinal of 'a' minus ordinal of any lowercase ASCII letter -> 0..25
      return results
  if __name__ == "__main__":
      sourcedata = open(sys.argv[1])
      lettercounts = countletters(sourcedata)
      for i in range(len(lettercounts)):
          print "%s=%d" % (chr(i + ord('a'), lettercounts[i]),  

</python>

This example defines the function and provides a sample usage. The if ... __main__... line allows it to be cleanly imported into any other Python code while also allowing it to function as a standalone script. (A very common Python idiom).

Using a numerically indexed array (list) for this is artificial and clutters the code somewhat. The more Pythonic approach would be:

<python>

  ...
  def countletters(file_handle):
      """Count occurences of letters and return a dictionary of them
      """
      results = dict()
      for line in file_handle:
          for char in line:
              if char.lower() in letters:
                  c = char.lower()
                  results[c] = results.get(c,0) + 1
      return results

</python>

Which eliminates the ungainly fiddling with ordinal values and offsets. More importantly it allows the results to be more simply printed using:

<python>

  lettercounts = countletters(sourcedata)
  for letter,count in lettercounts.items():
      print "%s=%s" % (letter, count),

</python>

Again eliminating all fussing with the details of converting letters into list indices.


Computational metrics

Access is O(1), appending is O(1), and insertion is O(n).