Array

From Rosetta Code
Revision as of 21:36, 6 June 2008 by rosettacode>Dmitry-kazakov (Multidimensional arrays added, operations on arrays added. Definitions extended and fixed)

An array is a composite data type, in the collection category, that stores multiple values all of the same declared type. The stored values are called elements and are accessed by a tuple of indices. By using a variable indices, the array may be accessed for a not predetermined set of indices, during the run of the program. All indices of an array are totally ordered.

An array of which some of the indices have infinite bounded subsets is known as an associative array. For example, arrays indexed by lexicographically ordered strings are associative.

Arrays with more than one index are called multidimensional arrays. For example, matrix is a two-dimensional array.

Basic operations defined on arrays are:

  • Indexing, accessing an array element by its tuple of indices;
  • Slicing, producing a subarray by putting some constraint on the indices. For example, PL/1 provides extracting of a rows or a columns of an array. In Ada any range of the index can be used in order to extract a subarray from a single-dimensional array;
  • Iteration of the arrays elements. Some languages have Loop/Foreach construct for array iteration;
  • Querying the bounds of array indices;
  • Operations on indices (next, previous, range etc);
  • Array programming languages provide operation applied to entire arrays, so programs in such languages often lack specific index reference.

Values of arrays are called array aggregates.

Multidimensional arrays with some of indices varying are called ragged. This term comes from a typical example of a ragged array, when a two-dimensional array is used to store strings of different length in its rows. When put on paper the right margin of the output become ragged.

Each array index is bounded at run time. Further when

  1. array is not associative
  2. each index is within the bounds (including the bounds)

then it is legal to access the array at the tuple of these indices.

The lower bound is in many programming languages fixed to either 0 (C and relatives) or 1 (Old Fortran and relatives), or arbitrary (Pascal and relatives, modern Fortran). In Ada any discrete type can used as an index.

For an empty array the lower bound of an index is greater than the upper bound. This causes the famous problem of declaring an empty array when the index type is a singleton (has only one value).

When bounds of all array indices are fixed, the array is said to be constrained. When some bounds are unknown until run time, the array is unconstrained. Non-associative unconstrained arrays are usually passed to the subprograms with a "dope" attached to the array body. The dope contains the actual bounds of unconstrained array indices. Associative arrays carry some indexing structure with them, like a hash table etc.

The minimal size of a non-associative array is determined by the current bounds of its indices. In all regular programming languages, the size of such array can be set by the programmer at compile time or after. In many modern programming languages the size of the array may be computed and allocated at run time.

In most programming languages, arrays are accessed by using the array brackets [ and ], e.g. in A[i], but exceptions are Rexx, instead using the dot operator ., such as in A.i, Fortran, Ada and BASIC which use round parentheses A(i), and of course in the lisps which uses constructions like (ELT A n) for accessing and (SETA A n new_val) for setting (Interlisp) or (vector-ref A n) for accessing and (vector-set! A n new_val) for setting (Scheme). No bracket indexing occurs in J, an array language; instead, the normal syntax of function creation and function calling applies.

C

We wish to open a text file and compute letter frequency

FILE *any_text;
/* declare array */
int frequency[26]; 
/* declare a computed index */
int ch; 
 
any_text = fopen ("a_text_file.txt", "rt");
 
/* init the freq table: */
for (ch = 0; ch < 26; ch++) 
    frequency[ch] = 0;
 
ch = fgetc(any_text);
while (!feof(any_text)) {
    if (is_a_letter(ch))
        /* if the char is a letter, then increase character slot in the freq table: */
        frequency[ch-'A'] += 1;
    ch = fgetc(any_text);
}

J

The example task is the same: open a text file and compute letter frequency.
Written in this array programming language, no loops are specified.
Input is a directory-path with filename. Output is a 26-element, single-axis integer array.

load 'files'     NB. fread is among these standard file utilities
ltrfreq=: 3 : 0
 letters=. (65+(i.26)) { a.                  NB. We'll work with minimal alphabet.
 reduced=. (#~ e.&letters) toupper fread y   NB. Omit non-letters. (y is the input.)
 sums   =. +/"_1 = reduced                   NB. Count how often each letter occurs.
 sums (letters I. (~. reduced))} 26 # 0      NB. Alphabetize the sums, then return.
)

Computational metrics

Access is O(1), appending is O(1), and insertion is O(n).