Introduction to Icon and Unicon

Purpose

The purpose of this page is to provide Rosetta Code users with enough supporting detail about Icon and Unicon to facilitate understanding and appreciation of these languages. The level of detail is expected to be significantly less than that of any of the online books and reference materials listed on the Icon and Unicon pages.

Some of the sections should be referenceable from tasks.

Icon and Unicon Differences

The Icon Programming Language was the successor of a series of non-numeric languages including COMIT, SNOBOL, SNOBOL4, and SL5. Icon provided several innovations and improvements over its predecessors, including: integrating the powerful pattern matching capabilities of SNOBOL4 into a procedural language, banishing a number of unfortunate foibles, retaining the flexibility of a typeless language, reining in some programming side effects, and keeping platform independence. Icon was one of the first bytecode languages, and it has undergone continuous improvement since its inception in the late 1970s.

Over the years various improvements, extensions, and experimental variants were added, including a platform-independent graphics interface, IDOL (an object-oriented preprocessor), MT Icon (a multitasking variant), Jcon (an implementation in Java), and others. While the graphics interface was integrated into Icon, many of these variants were not.

The Unicon Programming Language integrated a number of these extensions into a single variant of Icon. Unicon includes object-oriented support, improved system interfaces, and messaging. Additionally, a number of syntactic improvements to the language and semantic extensions to some functions have been implemented. Because of this Unicon isn't completely a superset of Icon.

Variables, Data Types, and Structures

un-Declarations

Icon and Unicon do not require strict static typing of variables, as languages like Pascal do. Nor do they require type definitions to reserve space, as languages such as C do. In fact, variables may happily change type from one moment to the next. Knowing this, you might expect that declarations are non-existent.

Declarations are optional, and any undeclared variables are either (a) parameters to procedures or (b) local to procedures. This design decision ensures that Icon/Unicon are not susceptible to the kind of side effects caused by the global nature of variables in SNOBOL4.

Still, declarations are desirable for clarity and needed for a number of special cases:

local - variables local to a procedure
static - variables whose values persist across calls of a procedure. Although they are not visible outside the procedure, they are shared by all of its invocations, including recursive ones.
global - variables with global visibility
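As a minimal sketch of the three variable declarations above, the program below uses a global, a local and a static variable. The names total, tick and count are illustrative only.

<lang icon>global total                   # visible to every procedure

procedure main()
   local i                     # explicitly local to main
   every i := 1 to 3 do
      write("call ", tick())   # writes call 1, call 2, call 3
   write("total = ", total)    # writes total = 3
end

procedure tick()
   static count                # retains its value between calls
   initial count := 0          # the initial clause runs only on the first call
   total := count +:= 1        # also update the global variable
   return count
end</lang>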

Additionally, the following declarations apply to non-variables:

record - used to define a structure with named fields
procedure - used to define a procedure and its parameters
invocable - used to control program linking to ensure procedures are included and available if they are called (e.g. through string invocation)
class - used to define an object class (Unicon)
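The invocable declaration matters mainly for string invocation, where a procedure is called through a string that holds its name. Here is a minimal sketch, assuming a hypothetical procedure double that the source never calls directly by name. Without the declaration, the linker may discard it.

<lang icon>invocable "double"             # keep double linked for calls by name

procedure main()
   write("double"(21))         # string invocation: writes 42
   p := proc("double")         # proc() looks up a procedure by name
   write(p(5))                 # writes 10
end

procedure double(n)            # hypothetical helper
   return 2 * n
end</lang>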

Self-Descriptive Safe Types

Icon/Unicon data types are safe because values are represented internally by descriptors, which carry each value's type along with the value itself. Every operation therefore knows not only the value of its data but also its type. Programmers cannot misinterpret data, as is possible in languages without type enforcement such as C or assembler. At the same time, programmers are not constrained as they are in languages like Pascal that have strong static typing. Because every operation on these self-descriptive values is checked at run time, Icon/Unicon are strongly but dynamically typed.
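The following small sketch shows self-descriptive values at work. The built-in function type() reads the type recorded with each value and returns its name as a string.

<lang icon>procedure main()
   # type() reports the type carried by each value's descriptor
   every x := 1 | 2.5 | "abc" | 'aeiou' | [] | table() | &null do
      write(type(x))           # integer, real, string, cset, list, table, null
end</lang>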

Mutable and Immutable Types

Icon/Unicon has both mutable (changeable) and immutable (unchangeable) data types. All operations deal with these consistently and it isn't possible to directly discern which are which (say by returning the address of a value).

There are operations that can create separate copies of values and that can distinguish between different copies of mutable values. These operations can also be applied to immutable values in an intuitive manner, and they behave consistently in those contexts, as shown in the following code snippet:

<lang icon># copy vs. assignment
procedure main()
   mutable := []            # lists are mutable
   immutable := "abc"       # strings are not
   m2 := mutable            # assignment copies the reference
   m3 := copy(mutable)      # creates a (one level deep) copy
   i2 := immutable          # assignment copies the reference
   i3 := copy(immutable)    # same as assignment

   # value equivalence ( === )
   write(if mutable === m2 then "succeeds" else "fails")     # succeeds
   write(if mutable === m3 then "succeeds" else "fails")     # fails
   write(if immutable === i2 then "succeeds" else "fails")   # succeeds
   write(if immutable === i3 then "succeeds" else "fails")   # also succeeds
end</lang>

Furthermore, even though strings are immutable, the same string can be constructed in different memory locations. A program simply cannot tell whether they are different or not.
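For example, a string assembled at run time cannot be distinguished from a literal that contains the same characters. Whether the two share storage is an implementation detail, so the comment below only says "possibly".

<lang icon>procedure main()
   s1 := "abc"
   s2 := "ab" || "c"           # built at run time, possibly in a different location
   if s1 === s2 then write("identical")   # strings compare by value: writes identical
end</lang>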

Data Types

The following summarizes the data types of Icon and Unicon. Each type is described and its characteristics are noted:

mutable or immutable - changeable or not
coercible or not - coercion is implicit type conversion by operations
convertible or not - convertible through explicit type conversion

&null

&null (immutable, uncoercible, unconvertible) is unique: it is the only value of the null data type. While this may sound odd at first, &null is at the core of Icon/Unicon and contributes to the power and robustness of the language. Specifically, &null arose to overcome a shortcoming that any SNOBOL4 programmer will be familiar with. Consider the following SNOBOL4 code:

<lang Snobol>        numb1 = 2
        numb2 = 3
        output = "numb1+numb2=" numb1 + nmub2
end</lang>

In SNOBOL4 an uninitialized variable has the null string "" as its value, and the null string is coerced to zero when it is used in arithmetic. Because nmub2 is misspelled, the sum printed above is 2 and not 5 as expected. On close inspection the error is apparent, but you can see how it could be missed in a large program dominated by global variables.

In Icon/Unicon &null is not coerced by operators, and so:

<lang Icon>  write( 2 + &null )         # run time error 102 (numeric expected)
  write( "abc" || &null )    # run time error 103 (string expected)
  write( "abc", &null )      # no error, as write ignores &null</lang>

The power of &null comes from the simple null/non-null tests of which more anon.
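The null tests are the prefix operators /x and \x. /x succeeds, producing x, only when x is &null. \x succeeds only when x is not &null. Here is a minimal sketch of the common initialization idiom that they enable:

<lang icon>procedure main()
   local count                 # local variables start out as &null
   /count := 0                 # /count succeeds because count is &null: assigns 0
   /count := 99                # count is no longer &null: the test fails, no assignment
   write(count)                # writes 0
   if \count then write("count is non-null")   # \count succeeds for a non-null value
end</lang>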

integer

Integers (immutable, coercible, convertible) come in two forms: regular integers and long (arbitrary-precision) integers. These are handled transparently for the programmer. Integer operations will coerce strings of digits into integers. Otherwise, integers are much like integers in many other languages, and an operation on two integers results in an integer.
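The following sketch shows a few integer operations. It assumes an implementation built with large integer support, which is what makes 2 ^ 100 print exactly.

<lang icon>procedure main()
   write(2 ^ 100)              # a large integer, handled transparently
   write("12" + 30)            # "12" is coerced to an integer: writes 42
   write(7 / 2)                # two integers give an integer: writes 3
   write(integer(3.7))         # explicit conversion truncates: writes 3
end</lang>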

real

Reals (immutable, coercible, convertible) are available in only one size (double precision). Strings can be coerced into reals by operations, and operations on a mix of reals and integers coerce the integers into reals.
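The sketch below shows mixed-mode arithmetic and string coercion with reals. The exact printed format of whole-number reals is left unstated because it may vary by implementation.

<lang icon>procedure main()
   write(7 / 2.0)              # 7 is coerced to a real: writes 3.5
   write("2.5" * 2)            # "2.5" is coerced to a real, so the result is real
   write(real(3))              # explicit conversion to a real
end</lang>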

string

Strings (immutable, coercible, convertible) are variable length and may contain any character in the platform's character set, including the NUL character. String operations may coerce other data types, such as integers or reals, into strings, and strings are in turn coerced by operations on other types. Syntactically, strings are delimited by double quotes ("), and special characters are escaped with a backslash, as in "\"" (a string containing a single double quote).

At the current time there is no support for Unicode.
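The following short sketch covers escapes, string size, coercion and substrings.

<lang icon>procedure main()
   s := "say \"hi\"\n"         # escaped double quotes and a newline
   writes(s)                   # writes: say "hi" followed by a newline
   write(*s)                   # size of s in characters: writes 9
   write(3 || 4)               # integers coerced to strings: writes 34
   write(s[1:4])               # characters 1 through 3: writes say
end</lang>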

cset

Csets (immutable,coercible,convertible)
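A cset is an unordered set of characters, written between single quotes. The following short sketch shows typical cset behaviour.

<lang icon>procedure main()
   vowels := 'aeiou'           # cset literals use single quotes
   write(*vowels)              # number of members: writes 5
   write('hello')              # duplicates removed, members in order: writes ehlo
   write(upto(vowels, "rhythm and blues"))   # position of the first vowel: writes 8
end</lang>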

Records

Records (mutable)
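The following minimal sketch shows record mutability. The record type point is illustrative.

<lang icon>record point(x, y)             # a record type with two named fields

procedure main()
   p := point(1, 2)
   q := p                      # q and p refer to the same record
   q.x := 10                   # records are mutable
   write(p.x)                  # writes 10
   write(type(p))              # the record name is its type: writes point
end</lang>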

Sets

Sets (mutable, coercible?, convertible)
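The following short sketch shows set operations. Here set() is called with a list argument, as in Icon, and sort() converts the set into a sorted list.

<lang icon>procedure main()
   S := set(["apple", "pear", "apple"])   # the duplicate "apple" is discarded
   insert(S, "plum")
   write(*S)                   # writes 3
   every write(!sort(S))       # writes apple, pear, plum
   if member(S, "pear") then write("pear is a member")
end</lang>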

Tables

Tables (mutable, coercible?, convertible)
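The following short sketch counts occurrences with a table that has a default value. sort() converts the table into a list of [key, value] lists.

<lang icon>procedure main()
   T := table(0)                      # 0 is the default value for missing keys
   every T[!["a", "b", "a"]] +:= 1    # count how often each word occurs
   every pair := !sort(T) do
      write(pair[1], " ", pair[2])    # writes a 2, then b 1
end</lang>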

co-expression
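A co-expression captures an expression so that the @ activation operator can produce its results one at a time, on demand. The following is a minimal sketch:

<lang icon>procedure main()
   C := create (1 to 3)        # a co-expression wrapping a generator
   write(@C)                   # activation produces the next result: writes 1
   write(@C)                   # writes 2
   write(@C)                   # writes 3
   write(@C | "exhausted")     # no results remain, so @C fails: writes exhausted
end</lang>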

procedure
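Procedures are values. They can be assigned, passed around, and even selected by alternation. The following sketch uses the illustrative procedures square and negate:

<lang icon>procedure main()
   f := square                 # a procedure assigned to a variable
   write(type(f))              # writes procedure
   write(f(7))                 # writes 49
   every write((square | negate)(3))   # alternation over procedures: writes 9, then -3
end

procedure square(x)
   return x * x
end

procedure negate(x)
   return -x
end</lang>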

objects (unicon)

Objects (mutable)

class (unicon)
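The following sketch of a class and its mutable objects requires Unicon. The class Counter and its method bump are hypothetical. The initially section supplies a default when no field value is given to the constructor.

<lang unicon>class Counter(count)           # count is a field; Counter(n) may initialize it
   method bump()
      return count +:= 1
   end
initially
   /count := 0                 # default when no initial value is supplied
end

procedure main()
   c := Counter()
   d := c                      # d and c refer to the same (mutable) object
   d.bump()
   write(c.bump())             # writes 2
end</lang>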

file and windows

Files (mutable)
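The following minimal sketch writes a file and reads it back. The file name example.txt is hypothetical.

<lang icon>procedure main()
   f := open("example.txt", "w") | stop("cannot open example.txt for writing")
   write(f, "hello")           # write() sends output to f instead of &output
   close(f)
   f := open("example.txt") | stop("cannot open example.txt")   # default mode is read
   while line := read(f) do
      write(line)              # writes hello
   close(f)
end</lang>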

Operators and Procedures

Intuitive Generalizations

Strong Typing through Operators
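One way to read this heading is that each operator implies the type of its operands. So + always adds numbers and || always concatenates strings, whatever types the operands started as. The following is a minimal sketch of that reading:

<lang icon>procedure main()
   write(1 + "2")              # + is always numeric addition: writes 3
   write(1 || "2")             # || is always string concatenation: writes 12
   if "10" = 10.0 then write("numerically equal")       # = compares numbers
   if "10" ~== "10.0" then write("different strings")   # ~== compares strings
end</lang>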

Coercion: Implicit Type Conversions
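The following short sketch contrasts implicit coercion with explicit conversion. Note that a failed explicit conversion simply fails rather than raising a run-time error.

<lang icon>procedure main()
   write("5" * "4")            # both strings coerced to integers: writes 20
   write(left(42, 5), "|")     # 42 coerced to a string and padded: writes 42   |
   write(integer("abc") | "not a number")   # explicit conversion fails quietly
end</lang>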

Program Flow

Everything Returns a Value Except When it doesn't

Text from Flow Control Task

Prelude about Goal-Directed Evaluation and Generators

Some of this should likely belong under Category:Programming_paradigm/Logic_Programming

A central feature of Icon and Unicon is what is known as Goal-Directed Evaluation, and the intimately related concept of Generators. Without trying to be a tutorial, the idea is that expressions can yield more than one result (Generators), and if a later part of the expression fails, the earlier Generators are driven to yield more results. The effect is not unlike the backtracking found in Prolog or regular expressions; however, the feature is built into the very core of the language. Prolog programmers will find it very familiar, though with differences, because Icon and Unicon do not use Prolog's unification-based pattern matching.
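The following small sketch shows goal-directed evaluation. The built-in find() is a generator, and a failing comparison drives it to produce its next result.

<lang icon>procedure main()
   s := "one two three two"
   # find() generates 5 and then 15; the comparison fails for 5,
   # so evaluation backtracks into find() for its next result
   if (i := find("two", s)) > 5 then write(i)   # writes 15
end</lang>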

To cut a long story short, every expression and statement can fail, or generate one or more results that can be consumed or applied in further parts of the statement. There are also a few keyword statements which force the generation of all possible outcomes for a statement or expression.

Another way of looking at it is to understand that every expression can yield a result sequence that can be empty; any code using this expression may choose to ask for more results, gather them into a container or aggregate, or choose to use one value and then move on without asking for all possible results.
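The following sketch shows several ways to consume the same result sequence. It uses every to exhaust the sequence, gathers the results into a list, takes only the first result, and handles an empty sequence.

<lang icon>procedure main()
   s := "one two three two"
   every write(find("two", s))       # every drives find() to exhaustion: writes 5, then 15
   L := []
   every put(L, find("two", s))      # gather all the results into a list
   write(*L)                         # writes 2
   write(find("two", s))             # only the first result is used: writes 5
   write(find("four", s) | "none")   # an empty result sequence fails: writes none
end</lang>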

goto

Does not exist in the Icon or Unicon language.

break expr

The default value of expr is the null value &null. This operator breaks out of the enclosing loop, yielding the expression as the result of the loop. Normally loops fail, i.e. they yield no result, so break is the way to make a loop produce a value that the surrounding expression can use.

The expression given to break can be another break, which effectively lets you break out of two levels of loop. Finally, the expression given to break can be next: break next leaves the innermost loop and then starts the next iteration of the loop that encloses it.
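The following sketch shows the three uses of break described above: a loop that yields a value, break next, and break break. The data are illustrative.

<lang icon>procedure main()
   L := [3, 8, 1, 9]
   # break y makes the loop produce y; a loop that ends normally fails,
   # so the alternative "none" is used in that case
   x := (every y := !L do if y > 5 then break y) | "none"
   write(x)                          # writes 8

   every i := 1 to 3 do {
      every j := 1 to 3 do {
         if j = 2 then break next    # leave the inner loop, continue the outer one
         write(i, ",", j)            # writes 1,1 then 2,1 then 3,1
      }
      write("not reached")           # always skipped by break next
   }

   every i := 1 to 3 do
      every j := 1 to 3 do
         if i * j = 4 then break break write("left both loops at ", i, ",", j)
end</lang>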

return expr

The default value of expr is &null. Apart from the usual meaning of return, if expr fails, then the procedure fails too, i.e. it does not yield a value. See the description of the fail keyword. If expr is capable of yielding more than one result, only the first result is asked for and used.
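The following sketch shows both properties of return. The hypothetical procedure first_over returns only the first result of its generator, and it fails when that expression fails. The comparison limit < !L produces its right operand, so the list element itself is returned.

<lang icon>procedure first_over(limit, L)
   return limit < !L           # the first element of L that exceeds limit
end

procedure main()
   write(first_over(5, [3, 8, 1, 9]))             # writes 8 (never 9: return takes one result)
   write(first_over(10, [3, 8, 1, 9]) | "none")   # expression fails, procedure fails: writes none
end</lang>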

fail

Causes the enclosing procedure to terminate without returning a value. This is different from returning void or a null value, as many other languages do when the code does not return an actual value.

Given an assignment such as x := ftn(), the value of x will not be replaced if ftn() executes fail. If ftn fails, then Goal-Directed Evaluation also fails the assignment, so x is not assigned a new value. If the flow of control through a procedure falls off the end, the procedure implicitly fails.
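The following minimal sketch shows the behaviour of x := ftn(). The body given to ftn here is hypothetical.

<lang icon>procedure ftn(n)
   if n > 0 then return n
   fail                        # explicit failure; falling off the end would also fail
end

procedure main()
   x := 1
   x := ftn(5)                 # succeeds: x becomes 5
   x := ftn(-2)                # ftn fails, so the assignment fails and x is unchanged
   write(x)                    # writes 5
end</lang>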

suspend expr

The default value of expr is &null. When a procedure executes suspend, it yields a value to the calling code. However, the procedure remains in a state of suspended animation, ready to be reactivated if the calling code demands another result due to Goal-Directed Evaluation. Note that this capability is built directly into the runtime, rather than being an artificially constructed behaviour such as the one provided by the yield keyword in Python or C#. Any expression may suspend or be part of a suspending expression without extra effort. Behaviourally, this is much closer to Prolog, which also supports backtracking as a core part of the language. If expr is capable of yielding more than one result, then suspend (if driven) will progressively yield all of those values.

A procedure can contain several uses of suspend and it's quite reasonable for the procedure to execute many of them in any chosen order.
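The following sketch shows a hypothetical generator procedure with two suspends. The first suspend yields every result of a generator in turn. The second suspend is reached only after the first is exhausted.

<lang icon>procedure countdown(n)
   suspend n to 1 by -1        # yields n, n-1, ..., 1 on demand
   suspend "liftoff"           # reached after the first suspend is exhausted
end                            # falling off the end: no more results

procedure main()
   every write(countdown(3))   # writes 3, 2, 1, liftoff
   write(countdown(3))         # only the first result is needed: writes 3
end</lang>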

stop(expr)

Terminates the program with prejudice, after writing expr to standard error.

error trapping

The keyword &error is normally zero. If it is set to a positive value, that value is the number of fatal run-time errors that are tolerated and converted to expression failure, and &error is decremented each time this happens. The now-common try-catch behaviour can therefore be emulated by setting &error, attempting an expression, and examining keywords such as &errornumber if the expression fails.
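The following is a minimal sketch of one such try-catch idiom, not the only way to write it. &errornumber, &errortext and errorclear() are standard Icon features. The variable names are illustrative.

<lang icon>procedure main()
   local v                           # v is &null
   &error := 1                       # convert the next run-time error into failure
   if x := v + 1 then                # arithmetic on &null is normally fatal
      write("result: ", x)
   else {
      write("caught error ", &errornumber, ": ", &errortext)
      errorclear()                   # reset &errornumber, &errortext and &errorvalue
      }
   &error := 0                       # make run-time errors fatal again
end</lang>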

Various idiomatic simplifications can be applied depending on your needs.

error throwing

Errors can be thrown using the function