Introduction to Icon and Unicon

Purpose

The purpose of this page is to provide Rosetta Code users with enough supporting detail about Icon and Unicon to facilitate understanding and appreciation of these languages. The level of detail is expected to be significantly less than that of any of the online books and reference materials listed on the Icon and Unicon pages.

Some of the sections should be referenceable from tasks.

Icon and Unicon Differences

The Icon Programming Language was the successor of a series of non-numeric languages including COMIT, SNOBOL, SNOBOL4, and SL5. Icon provided several innovations and improvements over its predecessors, including: integrating the powerful pattern matching capabilities of SNOBOL4 into a procedural language, banishing a number of unfortunate foibles, retaining the flexibility of a typeless language, reining in some programming side effects, and keeping platform independence. Icon was one of the first bytecode languages and has undergone continuous improvement since its inception in the late 1970s.

Over the years various improvements, extensions, and experimental variants were added including a platform independent graphics interface, IDOL (an object oriented pre-processor), MT Icon (a Multi-Threaded variant), Jcon (an implementation in Java), and others. And while the graphics interface was integrated into Icon, many of these variants were not.

The Unicon Programming Language integrated a number of these extensions into a single variant of Icon. Unicon includes object-oriented support, improved system interfaces, and messaging and database interfaces. Additionally, a number of syntactic improvements to the language and semantic extensions to some functions have been implemented. As a result, Unicon is not strictly a superset of Icon.

Variables, Data Types, and Structures

un-Declarations

Icon and Unicon do not require strict static typing of variables as do languages like Pascal, nor do they require type definitions to reserve space as in languages such as C. In fact, variables may happily change type from one moment to the next. Knowing this, you might expect that declarations are non-existent.

Declarations are optional, and any undeclared variables are either (a) parameters to procedures or (b) local to procedures. This design decision ensures that Icon/Unicon are not susceptible to the kind of side effects that the global nature of variables in SNOBOL4 led to.

Still, declarations are desirable for clarity and needed for a number of special cases:

local - variables local to a procedure
static - permanent variables that transcend individual calls of procedures. While not visible outside of a procedure they are visible between different instances and across recursion of a procedure.
global - variables with global visibility
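A minimal Icon sketch of the three variable declarations above. The procedure Counter and the variables calls, n and step are hypothetical names chosen for illustration; the initial clause runs only on the first call, which is how the static n is given a starting value.

<lang Icon>global calls                         # visible in every procedure

procedure Counter()
   static n                          # keeps its value between calls of Counter
   local step                        # an ordinary local, &null again on every call
   initial n := 0                    # executed only on the first call
   step := 1
   calls := (\calls | 0) + 1         # \calls fails while calls is &null, so 0 is used
   return n +:= step
end

procedure main()
   every 1 to 3 do write(Counter())  # writes 1, 2 and 3 on separate lines
   write(calls)                      # 3
end</lang>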

Additionally, the following declarations apply to non-variables:

record - used to define a structure with named fields
procedure - used to define a procedure and its parameters
invocable - used to control program linking to ensure procedures are included and available if they are called (e.g. through string invocation)
class - used to define an object class (Unicon)

Self-Descriptive Safe Types

Icon/Unicon data types are safe because they are implemented internally with descriptors, which package each value together with its type. All operations on data know not only the value of the data but also its type. Programmers cannot incorrectly interpret the data as is possible in languages without type enforcement such as C or assembler. Similarly, programmers are not constrained as they are in languages like Pascal that have strong static typing. The result is strong dynamic typing at run time: all operations on these self-descriptive data types are consistent and correct.
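Because every value carries its type, the built-in function type() can report it for any value. A small Icon sketch; the variable x and the sample values are illustrative only:

<lang Icon>procedure main()
   # x takes on a value of a different type on each pass of the loop
   every x := 1 | 2.5 | "abc" | 'abc' | [] | table() | &null do
      write(image(x), " has type ", type(x))
end</lang>

Each line pairs a value's image with its type name: integer, real, string, cset, list, table and null.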

Mutable and Immutable Types

Icon/Unicon has both mutable (changeable) and immutable (unchangeable) data types. All operations deal with these consistently and it isn't possible to directly discern which are which (say by returning the address of a value).

There are operations which can create separate copies of values and distinguish between different copies of mutable types. These operations can also be applied to immutable types in an intuitive manner, and they behave consistently in those contexts, as shown in the following code snippet:

<lang icon>procedure main()
   # copy vs. assignment
   mutable := []         # lists are mutable
   immutable := "abc"    # strings are not
   m2 := mutable         # assignment copies the reference
   m3 := copy(mutable)   # creates a (1 level) copy
   i2 := immutable       # assignment copies the reference
   i3 := copy(immutable) # same as assignment

   # value equal ( === )
   if mutable === m2 then write("mutable === m2 succeeds")
   if not (mutable === m3) then write("mutable === m3 fails")
   if immutable === i2 then write("immutable === i2 succeeds")
   if immutable === i3 then write("immutable === i3 also succeeds")
end</lang>

Furthermore, even though strings are immutable, the same string value may be constructed in different memory locations; a program simply cannot tell whether two equal strings share storage or not.

Data Types

The following summarizes the data types of Icon and Unicon. Each type is described and its characteristics are noted:

Mutable vs. immutable - changeable or not
coercible or not - coercion is implicit type conversion by operations
convertible or not - convertible through explicit type conversion

null

null (immutable, uncoercible, unconvertible) is unique: it is a data type with exactly one value, &null. While this may sound odd at first, &null is at the core of Icon/Unicon and contributes to the power and robustness of the language. Specifically, null arose to overcome a shortcoming that any SNOBOL4 programmer will be familiar with. Consider the following SNOBOL4 code:

<lang Snobol>   numb1 = 2
   numb2 = 3
   output = "numb1+numb2=" numb1 + nmub2</lang>

In SNOBOL4 an undefined variable defaults to the null string "", and the null string is coerced into zero if needed for arithmetic. Because of the misspelled nmub2, the surprising output above is numb1+numb2=2 and not numb1+numb2=5 as expected. While on close inspection the error is apparent, you can see how it could be missed in a large program dominated by global variables.

In Icon/Unicon &null is not coerced by operators, and so:

<lang Icon>   write( 2 + &null )         # run time error 102 (numeric expected)
   write( "abc" || &null )    # run time error 103 (string expected)
   write( "abc", &null)       # no error, as write ignores &null</lang>

The power comes from the simple null/non-null tests of which more anon.
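The null/non-null tests are the prefix operators /x, which succeeds if x is &null, and \x, which succeeds if x is not. A common idiom uses /x := value to supply a default. A short Icon sketch; the procedure Greet is hypothetical:

<lang Icon>procedure Greet(name)
   /name := "world"              # assigns only if name is &null (argument omitted)
   return "Hello, " || name
end

procedure main()
   write(Greet())                # Hello, world
   write(Greet("Icon"))          # Hello, Icon
   x := &null
   if \x then write("x has a value") else write("x is null")   # x is null
end</lang>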

integer

Integers (immutable, coercible, convertible) come in two forms: regular integers and large integers; however, these are handled transparently for the programmer. Integer operations will coerce strings (or csets) of digits into integers. Otherwise, integers are much like integers in many other languages. Operations on two integers result in an integer.

real

Reals (immutable, coercible, convertible) are available in one size (double precision). Strings (and csets) can be coerced into reals by operations. Otherwise, operations on mixed reals and integers coerce the integers into reals.
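A short Icon sketch of the integer and real coercions described above; the values are arbitrary, and the last line assumes an implementation built with large-integer support, as standard builds are:

<lang Icon>procedure main()
   write("12" + 3)          # 15: the string "12" is coerced to an integer
   write(7 / 2)             # 3: division of two integers yields an integer
   write(7 / 2.0)           # 3.5: the integer 7 is coerced to a real
   write(type("1.5" * 2))   # real: the string "1.5" is coerced to a real
   write(2 ^ 70)            # 1180591620717411303424: a large integer, handled transparently
end</lang>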

string

Strings (immutable, coercible, convertible) are variable length and may contain any character value within the platform's character set, including the NUL character. String operations may coerce other data types such as integers or reals into strings, and strings are likewise coerced by operations on other types. Syntactically, strings are delimited by double quotes (") and special characters are escaped with a backslash, as in this example: "\"".

At the current time there is no support for Unicode.
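A few string operations in a small Icon sketch; the values are arbitrary. Note that string positions fall between characters, so s[2:4] selects the second and third characters.

<lang Icon>procedure main()
   s := "abc" || 123            # the integer 123 is coerced to the string "123"
   write(s)                     # abc123
   write(*s)                    # 6: the number of characters
   write(s[2:4])                # bc: the characters between positions 2 and 4
   write("a \"quoted\" word")   # a "quoted" word
end</lang>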

cset

Character sets, or csets (immutable, coercible, convertible), are a special type used in conjunction with string scanning and string matching operations. Syntactically, they look similar to strings but are delimited with single quotes ('). Semantically, csets are sets of characters in a universe consisting of the platform's full character set. Operations on csets include union and intersection. Csets may be coerced to or from strings and numeric values.
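A small Icon sketch of csets; the variable vowels and the sample strings are illustrative. When a cset is written it is converted to a string with its characters in collating order, and upto() is one of the string scanning functions that takes a cset argument.

<lang Icon>procedure main()
   vowels := 'aeiou'
   write('hello')                    # ehlo: duplicates removed, characters in collating order
   write(vowels ** 'hello')          # eo: intersection
   write('abc' ++ 'cde')             # abcde: union
   write(upto(vowels, "rosetta"))    # 2: position of the first vowel in the string
end</lang>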

Lists

Lists (mutable) are variable length structures containing arbitrary values. Lists are indexed and accessed with integers ranging from 1 to the size of the structure.

Operations exist to work with lists as arrays, queues, and stacks.
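A short Icon sketch of a list used as an array, a queue and a stack with the built-in functions put, push, pop, get and pull; the values are arbitrary:

<lang Icon>procedure main()
   L := [10, 20, 30]      # a list literal
   L[2] := 25             # array-style indexing, starting at 1
   put(L, 40)             # add at the right end (queue)
   push(L, 5)             # add at the left end (stack)
   write(pop(L))          # 5: removes from the left end
   write(get(L))          # 10: get is a synonym for pop
   write(pull(L))         # 40: removes from the right end
   write(*L)              # 2: L is now [25, 30]
   every write(!L)        # 25 then 30
end</lang>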

Records

Records (mutable) provide fixed size data groupings that are accessible through field names.

Dynamic records can be constructed in Unicon with the constructor procedure. This is useful in conjunction with database access.
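A minimal Icon sketch of a record declaration and its use; the record type point is a hypothetical example (Unicon's dynamic constructor procedure is not shown):

<lang Icon>record point(x, y)            # a record type with two named fields

procedure main()
   p := point(3, 4)           # the record name doubles as its constructor
   p.x +:= 1                  # fields are accessed by name
   write(p.x, ",", p.y)       # 4,4
   write(type(p))             # point
   write(p[2])                # 4: fields can also be subscripted by position
end</lang>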

Sets

Sets (mutable) are unordered structures that contain unique instances of arbitrary values. Operations such as insertion, deletion, testing membership, intersection, and union are provided. Lists can be explicitly converted into sets.

Please note that copies of mutable types are considered distinct values.

<lang Icon>procedure main()
   L1 := []                 # a list
   L2 := copy(L1)           # a distinct copy of that list
   S1 := set()              # a set
   S2 := set()              # another
   every insert(S1, L1|L2)  # S1 will have 2 members
   every insert(S2, L1|L1)  # S2 will have just 1
   write(*S1, " ", *S2)     # writes 2 1
end</lang>
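The other set operations listed above, including the explicit conversion of a list into a set, in a further Icon sketch (written against Icon's set(L) conversion; the values are arbitrary):

<lang Icon>procedure main()
   S := set([1, 2, 3, 2])      # a list converted to a set: the duplicate 2 is dropped
   write(*S)                   # 3
   if member(S, 2) then write("2 is a member")
   delete(S, 2)                # S is now {1, 3}
   write(*(S ++ set([9])))     # 3: union
   write(*(S ** set([1, 9])))  # 1: intersection
end</lang>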

Tables

Tables (mutable) are structures where values are indexed by unique keys. Syntactically they look a lot like lists indexed by arbitrary values. Tables are one of the most commonly used and powerful features of Icon/Unicon.

<lang Icon>   T := table(0)            # table with default 0 values
   every T[!words] +:= 1    # counts words, where words is a list of strings</lang>
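A complete version of the word count above as an Icon sketch; the sample words are made up. sort(T, 1) converts the table into a list of [key, value] pairs ordered by key.

<lang Icon>procedure main()
   words := ["the", "cat", "saw", "the", "dog"]   # sample data
   T := table(0)                  # keys not yet present read as 0
   every T[!words] +:= 1          # count each word
   every pair := !sort(T, 1) do   # [key, value] pairs ordered by key
      write(pair[1], " ", pair[2])
   # writes: cat 1, dog 1, saw 1, the 2 (one pair per line)
end</lang>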

co-expressions

Co-expressions are a way of encapsulating code and state outside the bounds of the normal program flow and scope. They can be used to create coroutines. Co-expressions are created in a dormant state and can be passed around, copied and refreshed (rewound), and activated anywhere within a program.

One of the most powerful aspects of co-expressions is the ability to use them to produce programmer defined control operations (PDCO).
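A minimal Icon sketch of the co-expression life cycle: create makes a dormant co-expression, @ activates it to obtain its next result, and ^ produces a refreshed copy that starts over. A PDCO is not shown; the names C and R are illustrative.

<lang Icon>procedure main()
   C := create ((1 to 3) * 10)    # a dormant co-expression: nothing runs yet
   write(@C)                      # 10: activation produces the next result
   write(@C)                      # 20
   R := ^C                        # a refreshed copy that starts over
   write(@R)                      # 10
   write(@C)                      # 30
   if not @C then write("C is exhausted")   # activation fails once no results remain
   write(*C)                      # 3: the number of results C has produced
end</lang>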

procedures

Procedures are also data types that can be assigned both to and from. This allows for interesting capabilities such as tracing built-in functions, inserting a processing wedge around a procedure, creating procedures that can apply arbitrary functions, and modifying the behavior of procedures in more radical ways.

Consider the following code snippet, which sets the variable verbose to the procedure write if any argument is "--verbose", or to 1 otherwise. Invoking an integer i selects its i-th argument, so calling 1 is effectively a no-operation:

<lang Icon>   if !arglist == "--verbose" then verbose := write else verbose := 1
   ...
   verbose("Some verbose mode diagnostic message :", var1, var2)</lang>
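A sketch of a processing wedge around a procedure, one of the capabilities mentioned above. The wrapper Traced is hypothetical; it relies on a variable-length parameter list (args[]) and on list invocation (p ! L), which calls p with the elements of L as its arguments.

<lang Icon>procedure Traced(p, args[])          # hypothetical wedge: report the call, then apply p
   write(&errout, "calling ", image(p))
   return p ! args                    # invoke p with the elements of args as arguments
end

procedure main()
   write(Traced(right, "abc", 6, "*"))   # ***abc (the trace line goes to &errout)
   f := repl                          # a built-in function stored in a variable
   write(f("ab", 3))                  # ababab
end</lang>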

Examples of parametrized procedures can be found in Sorting algorithms/Bubble sort#Icon and Unicon and Apply a callback to an array#Icon and Unicon.

classes and objects (unicon)

Unicon allows for user defined class and method definitions. Individual instances of these objects (mutable) can be created by constructor functions.
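A minimal Unicon sketch of a class; the class Tally, its attribute count and its method increment are hypothetical. The initially section runs when an instance is constructed, and calling the class name acts as the constructor.

<lang Unicon>class Tally(count)                   # count is an attribute of each instance
   method increment(n)
      /n := 1                        # default step of 1
      return count +:= n
   end
initially
   /count := 0                       # runs when an instance is created
end

procedure main()
   t := Tally()                      # construct an instance
   t.increment()
   t.increment(5)
   write(t.count)                    # 6
end</lang>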

files and windows

Files and windows (mutable) allow Icon/Unicon to interface with the operating system. Operations on files and windows have side effects and they are considered mutable. Sockets and pipes are also implemented as "files".
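A short Icon sketch of file I/O with the built-in functions open, write, close and remove; the file name example.txt is a hypothetical scratch file in the current directory. Note that !f generates the lines of a file, in keeping with the generalized ! operator.

<lang Icon>procedure main()
   f := open("example.txt", "w") | stop("cannot open example.txt for writing")
   write(f, "first line")          # a file as write's first argument directs the output
   write(f, "second line")
   close(f)

   f := open("example.txt") | stop("cannot open example.txt")   # the default mode is "r"
   every write(!f)                 # !f generates the lines of the file
   close(f)
   remove("example.txt")           # delete the scratch file
end</lang>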

Operators and Procedures

Intuitive Generalizations

One of the strengths of Icon/Unicon is that, for the most part, operators work on an intuitive level across types, yielding results that make sense. Examples include:

*x returns the size of x: the number of characters in a string or cset; the number of elements in a table, list, or set; the number of fields in a record; and the number of results a co-expression has produced.

!x generates the elements of x: characters for strings and csets; elements for tables, lists, sets, and records.

?x returns a random element or character from x.

x[a:b], x[a+:b], x[a-:b] return subsections of lists and strings.

This philosophy is continued in many of the built-in functions.
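The same operators applied to different types in a small Icon sketch; the values are arbitrary, and the line using ?L prints a random element, so its output varies.

<lang Icon>procedure main()
   s := "Rosetta"
   L := [3, 1, 4, 1, 5]
   write(*s, " ", *L, " ", *'Rosetta')   # 7 5 6: sizes of a string, a list and a cset
   every writes(!s, " ")                 # R o s e t t a
   write()
   write(s[2:5], " ", s[2+:3])           # ose ose: the same section written two ways
   write(*L[2:4])                        # 2: a list section is itself a list
   write(?L)                             # a randomly chosen element of L
end</lang>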

Strong Typing through Operators

Icon/Unicon has variously been described as an untyped or strongly typed language depending upon the perspective of the observer. Consider the following examples:

The lack of declaration and reassignment of x to different types of data suggest the language is untyped or loosely typed.

<lang Icon>   x := 1
   x := "Abc123"
   x := table()</lang>

Having specific operators for ambiguous operations, such as comparisons, means the intent of the program should be clear. While this can sometimes be annoying (see Sorting algorithms/Bubble sort), it is necessary because in mixed-type comparisons the intent of a generalized comparison can't be determined and the operator would not know which operand to coerce.

<lang Icon>   a := "11"
   b := 2
   if a << b then write("a is lexically less than b.")     # "11" << "2" is true
   if a > b then write("a is numerically greater than b.")  # 11 > 2 is also true</lang>

Additionally, the strong typing is supported by safe data types.

Coercion: Implicit Type Conversions

Icon/Unicon performs implicit type conversions (called coercion) where it makes sense to do so. At the same time dangerous coercions are disallowed. Details of these coercions are covered under the topic of Data Types. Where a coercion is not possible, a run-time error is generated.

Program Flow and Control

At the core of the design of Icon/Unicon are a series of features that contribute to the power of the language. These features set Icon/Unicon apart from many traditional procedural languages.

Failure is an Option

Expression failure is a signal that cannot be ignored and is used to control program flow whether with operators, control structures, or procedures. Failure effectively short circuits the evaluation of the expression and forces processing into an alternate path if any exists.

This differs from the approach of many traditional procedural languages, in which comparison operators return a true/false value. No value is associated with failure. It also means that evaluating an expression forces the program onto the correct logic path.

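The following are equivalent (each is an alternative definition, so only one of them can appear in a program):

<lang Icon>procedure AlwaysFail()   # signal failure explicitly
   fail
end

procedure AlwaysFail()   # also signals failure: &fail produces no result, then control flows off the end
   &fail
end

procedure AlwaysFail()   # signal failure implicitly by flowing off the end
end</lang>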

Everything Returns a Value Except when it Doesn't

In Icon/Unicon just about every expression returns a value (unless it fails).

For example:

<lang Icon>   d := if a > b then a - b else b - a      # returns the positive difference of a and b
   d := a - (a > b) | b - a                 # is equivalent</lang>

In the above example, the expression a > b either succeeds, returning b, or fails. The alternative paths are provided by else in the first case and by | in the second.

A second consequence of comparison operators returning a value (their right operand) rather than true/false is that you can write expressions like these:

<lang Icon>   i < j < k        # succeeds returning k, if i < j < k
   (i < j) < k      # shows more clearly how this works. Note that if i < j fails, the expression fails and nothing further is evaluated</lang>
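A range check written with a chained comparison, as a small Icon sketch; the procedure InRange is hypothetical. Because each comparison returns its right operand, a successful check returns hi.

<lang Icon>procedure InRange(x, lo, hi)       # hypothetical helper
   return lo <= x <= hi            # (lo <= x) yields x, which is then compared with hi
end

procedure main()
   if InRange(5, 1, 10) then write("5 is in range")
   if not InRange(15, 1, 10) then write("15 is out of range")
   write(InRange(5, 1, 10))        # 10: the result of the final comparison
end</lang>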

Goal-Directed Evaluation and Generators

A central feature of Icon and Unicon is what is known as Goal-Directed Evaluation, along with the intimately related concept of Generators. Briefly, the idea is that expressions can yield more than one result (Generators), and if a later part of the expression fails, the earlier Generators are driven to yield more results. These features implement Logic Programming paradigms not unlike the backtracking found in Prolog or Regular Expressions, and they are built into the very core of the language. Prolog programmers will find this very familiar, with the difference that Icon and Unicon do not use Prolog's unification-based pattern matching.

As noted previously, when an Icon/Unicon expression fails it will take an alternate path such as an else. In the case of generators, the alternate path is a resumption of the generator in order to produce another result. Thus expressions can consume the results of generators until they achieve their desired goal. If no result is acceptable then the overall expression fails.

Icon and Unicon provide a variety of operators, control structures, and procedures that work with generators. Examples include:

<lang Icon>   every expression do expression    # a looping control that forces the generation of every result
   X[1 to 10 by 2]                   # 'to'/'to by' is a generator that yields successive numerical results
   !X                                # generate every element of X
   suspend expression                # used instead of return inside a generator to set up resumption of the procedure
   |expression                       # turns non-generators, like read(), into generators that repeat until the expression fails</lang>

Another way of looking at it is to understand that every expression can yield a result sequence and any code using this expression may choose to ask for more results, gather them into a container or aggregate, or choose to use one value and then move on without asking for all possible results.
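A small Icon sketch of generators and goal-directed evaluation using the built-in find, which generates every position of a substring; the strings and bounds are arbitrary.

<lang Icon>procedure main()
   every writes(find("a", "banana"), " ")       # 2 4 6: find generates every position
   write()
   # goal-directed evaluation: 1 to 10 is resumed until the comparison succeeds
   if (i := 1 to 10) * i > 50 then write(i)     # 8: the first i whose square exceeds 50
   L := []
   every put(L, (1 to 3) * (1 to 2))            # every combination of the two generators
   every writes(!L, " ")                        # 1 2 2 4 3 6
   write()
end</lang>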

No Spaghetti

The clue's in the title. There is no 'goto' in Icon/Unicon.

Procedure Controls

fail

Causes the enclosing procedure to terminate without returning a value. This is different from returning void or a null value, as many other languages do when the code does not return an actual value. Failure of an ordinary expression behaves in the same spirit:

<lang Icon>   x := "unassigned"
   x := arglist[i]    # succeeds, assigning x the value of arglist[i], or fails (leaving x unchanged) if i is out of bounds
   write(x)</lang>
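A procedure that uses fail when it has nothing to return, as a small Icon sketch; FirstNegative is a hypothetical name. The caller handles the failure with alternation instead of testing a sentinel value.

<lang Icon>procedure FirstNegative(L)          # hypothetical: the first negative element of L
   every x := !L do
      if x < 0 then return x
   fail                             # nothing found: the call fails rather than returning a value
end

procedure main()
   write(FirstNegative([3, -2, 5]))              # -2
   write(FirstNegative([1, 2]) | "none found")   # none found: failure selects the alternative
end</lang>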

return expr

Returns the result of 'expr' (the default value of 'expr' is &null). Apart from the usual meaning of return, if expr fails then the procedure fails as well; see Failure is an Option above. If expr is capable of yielding more than one result, only the first result is used.

suspend expr

Suspend is semantically similar to 'return', except that it sets up the possibility of producing additional results if needed. Rather than terminating the procedure as return does, suspend returns a result while leaving the procedure suspended in case additional results are needed. When resumed, a suspended procedure continues from where it was suspended. This capability is built directly into the run-time system rather than being a separately constructed mechanism like the 'yield' keyword of Python or C#. Any expression may suspend, or be part of a suspending expression, without extra effort. Behaviorally this is closer to Prolog, which also supports backtracking as a core part of the language. If expr is capable of yielding more than one result, then suspend (if driven) will progressively yield all of those values.

A procedure can contain several uses of suspend and it's quite reasonable for the procedure to execute many of them in any chosen order.
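Two small generator procedures as an Icon sketch; Evens and Pair are hypothetical names. Evens suspends inside a loop, while Pair executes two suspends in order. Since a comparison returns its right operand, 5 < Evens(10) yields the first even number greater than 5.

<lang Icon>procedure Evens(limit)            # hypothetical generator of the even numbers up to limit
   every i := 2 to limit by 2 do
      suspend i                   # produce i, then continue here if another result is wanted
end

procedure Pair()                  # several suspends executed in order
   suspend "first"
   suspend "second"
end

procedure main()
   every writes(Evens(10), " ")   # 2 4 6 8 10
   write()
   write(5 < Evens(10))           # 6: Evens is resumed until the comparison succeeds
   every write(Pair())            # first, then second
end</lang>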

Looping Controls

repeat

Repeat will loop endlessly and must be explicitly broken out of with a break, return, or suspend.

while

While will loop as long as the expression succeeds. An optional expression can be evaluated after each success via the do clause.

until

Until will loop as long as the expression fails, that is, until it succeeds. An optional expression can be evaluated after each failure via the do clause.

every

Every will produce all the results of a generator. An optional expression can be evaluated for each result via the do clause.
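The four looping controls side by side in a small Icon sketch; the counter i is illustrative. Note that until keeps going while its control expression fails.

<lang Icon>procedure main()
   i := 0
   while (i +:= 1) < 4 do writes(i, " ")    # 1 2 3
   write()
   until (i -:= 1) = 0 do writes(i, " ")    # 3 2 1: loops while the comparison fails
   write()
   every writes(1 to 3, " ")                # 1 2 3: every drives the generator to exhaustion
   write()
   repeat {
      i +:= 1
      if i > 2 then break                   # repeat ends only when explicitly broken out of
   }
   write(i)                                 # 3
end</lang>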

break expr

Break is used to break out of, or exit from, one or more enclosing loops. The default value of expr is &null. Although loops normally produce no result, it is possible to write code such as this:

<lang Icon>   x := while expression1 do {
      ...
      if expression2 then break "1"
      }    # x will be the string "1" if the loop is left through the break</lang>

Break can be used consecutively to break out of nested loops. For example, the following exits two enclosing loops and then goes on to the next iteration of the third:

<lang Icon> break break next</lang>
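A runnable Icon sketch of leaving two nested loops at once; the bounds are arbitrary. The first break exits the inner loop and then evaluates its argument, the second break, in the context of the outer loop.

<lang Icon>procedure main()
   every i := 1 to 3 do {
      every j := 1 to 3 do {
         if i * j = 4 then break break   # leaves the j loop, then the i loop
         writes(i, "*", j, " ")
      }
   }
   write()                               # the line written is 1*1 1*2 1*3 2*1
end</lang>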

Signals and Exceptions

stop

Terminates the current program and writes its arguments to a file (&errout by default). For example:

<lang icon>   stop(&output, expr1, expr2)   # writes expr1 and expr2 to standard output and terminates the program</lang>

error trapping

The keyword &error is normally zero, but if it is set to a positive value, it sets the number of fatal errors that are tolerated (i.e. converted to expression failure). The value of &error is decremented each time this happens. Therefore the now-common TRY-CATCH behaviour can be written as:

<lang Icon>   &error := 1
   mayErrorOut()
   if &error = 1 then
      &error := 0     # no error occurred: clear the trap
   else {
      # deal with the fault
      handleError(&errornumber, &errortext, &errorvalue)   # keyword values containing facts about the failure
   }</lang>

error throwing

Errors can be thrown like this:

<lang icon>   runerr(errnumber, errorvalue)   # choose an error number and supply the offending value</lang>
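A runnable Icon sketch combining error throwing with the &error trap described above. CheckedSqrt is a hypothetical helper, and the error number 500 is an arbitrary choice for illustration; &errornumber and &errorvalue are the keywords mentioned above.

<lang Icon>procedure CheckedSqrt(x)               # hypothetical helper that throws on bad input
   if x < 0 then runerr(500, x)        # 500 is an arbitrary, illustrative error number
   return sqrt(x)
end

procedure main()
   &error := 1                         # convert the next run-time error into failure
   r := CheckedSqrt(-4)                # the error becomes failure, so r remains &null
   if &error = 0 then {                # the allowance was used up: an error occurred
      write("caught error ", &errornumber, " on value ", image(&errorvalue))
      &error := 0                      # stop trapping errors
   }
   write(CheckedSqrt(16))              # 4.0
end</lang>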

Contractions

Icon/Unicon have a rich set of operators which, combined with the fact that all successful expressions produce values, make contractions possible. The appeal of these is somewhat a question of style, and if taken to extremes there is also the possibility of being overtaken by one-liners. Having said that, Icon/Unicon contractions are hardly in the same league as those of APL or J. A number of examples are presented below.

These examples initialize sum and add all the elements of arglist to it.

<lang Icon>sum := 0                                          # initialize
every sum +:= !arglist do something()             # and loop, in two statements

sum := 0; every sum +:= !arglist do something()   # ; splice

every (sum := 0) +:= !arglist do something()      # a common contraction

while (sum := 0) +:= !arglist do something()      # an error: while re-evaluates the whole expression each time, so sum never accumulates</lang>

Examples of using program control structures in expressions:

<lang Icon>   (if i > j then i else j) := 0     # sets the larger of i and j to 0
   d := if a > b then a-b else b-a   # sets d to the positive difference of a and b
   x := case expr of {
      1: "Text 1"
      2: "Text 2"
      default: "Undefined text"
   }                                 # sets x to a string based on the value of expr</lang>

Appendix A - Icon and Unicon Differences (Details)

The purpose of this section is to capture some of the language differences at a detailed level.

Major Differences

Classes and Objects

<lang Unicon>package packagename

class classname : superclass(attributes)

  method meth1(att1)
      ...
      end

  method meth2(att1,att2)
     ...
     end

  initially (att1,att3 )
     ...

end</lang>

...

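<lang Unicon>   object.name(parameters)                   # invoke method name of object
   object$superclassname.name(parameters)    # invoke the version of name defined in superclassname</lang>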

Minor Differences

Co-expression calling. With co-expression ce, the following pairs of activations are equivalent in Unicon:

<lang Unicon>   []@ce          # transmits a list to co-expression ce (Icon and Unicon)
   [x,y,z]@ce
   ce()           # the equivalent calls (Unicon only)
   ce(x,y,z)</lang>

Procedure parameter declarations may specify type coercion functions and default values:

<lang Unicon> procedure f1(i:integer:1,r:real,s:string,L:list,x:mycoercionproc)</lang>
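A small sketch of this parameter syntax, assuming that the default is applied when the corresponding argument is omitted (&null); the procedure Pad is hypothetical.

<lang Unicon>procedure Pad(s:string, n:integer:10)   # s is converted to a string; n defaults to 10
   return right(s, n)
end

procedure main()
   write(Pad(42))          # 42 becomes "42", right-justified in 10 columns
   write(Pad("x", 3))      # "  x"
end</lang>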