String comparison: Difference between revisions

From Rosetta Code
Revision as of 01:14, 28 March 2013

Task
String comparison
You are encouraged to solve this task according to the task description, using any language you may know.


The task is to demonstrate how to compare two strings from within the language and how to achieve a lexical comparison. The task should demonstrate:

  • Comparing two strings for exact equality
  • Comparing two strings for inequality (i.e., the inverse of exact equality)
  • Comparing two strings to see if one is lexically ordered before the other
  • Comparing two strings to see if one is lexically ordered after the other
  • How to achieve both case sensitive comparisons and case insensitive comparisons within the language
  • How the language handles comparison of numeric strings if these are not treated lexically
  • Demonstrate any other kinds of string comparisons that the language provides, particularly as it relates to your type system. For example, you might demonstrate the difference between generic/polymorphic comparison and coercive/allomorphic comparison if your language supports such a distinction.

Here "generic/polymorphic" comparison means that the function or operator you're using doesn't always do string comparison, but bends the actual semantics of the comparison depending on the types of one or both arguments; with such an operator, you achieve string comparison only if the arguments are sufficiently string-like in type or appearance. In contrast, a "coercive/allomorphic" comparison function or operator has fixed string-comparison semantics regardless of the argument type; instead of the operator bending, it's the arguments that are forced to bend and behave like strings if they can, and the operator simply fails if the arguments cannot be viewed somehow as strings. A language may have one or both of these kinds of operators; see the Perl 6 entry for an example of a language with both kinds of operators.
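The distinction can be sketched in Python. This is only an illustration under stated assumptions: the helper names generic_equal and coercive_equal are hypothetical, and Python's str() almost never fails, so the "operator simply fails" case described above is not modelled.

```python
def generic_equal(a, b):
    """Polymorphic comparison: what == means depends on the operand types."""
    return a == b


def coercive_equal(a, b):
    """Coercive comparison: both operands are forced to strings first."""
    return str(a) == str(b)


# 5 and 5.0 are numerically equal, but their string forms differ.
print(generic_equal(5, 5.0))
print(coercive_equal(5, 5.0))

# '24' and 24 have different types, but the same string form.
print(generic_equal("24", 24))
print(coercive_equal("24", 24))
```

With the generic helper the result follows the types of the arguments; with the coercive helper the arguments are bent into strings and the comparison itself never changes.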


AWK

In awk, the string matching operators are case sensitive, and the behaviour of the comparative operators depends on the locale being used. Be very careful with numeric strings, because whether they will be treated as numeric values or strings depends on how the values were obtained and on which awk interpreter is being used. Numeric strings obtained from the input source will be treated as numeric values when compared with other strings containing numeric values. String values defined as constants using double-quote enclosures will be treated as strings of characters and compared lexically. The behaviour of the operators when one value is considered to be numeric (e.g. from the input source) but the other value has been defined explicitly as a numeric string using double-quote enclosures may also vary depending on which awk interpreter is being used.

<lang awk>BEGIN {

 a="BALL"
 b="BELL"
 if (a == b) { print "The strings are equal" }
 if (a != b) { print "The strings are not equal" }
 if (a > b) { print "The first string is lexically after the second" }
 if (a < b) { print "The first string is lexically before the second" }
 if (a >= b) { print "The first string is not lexically before the second" }
 if (a <= b) { print "The first string is not lexically after the second" }
 # to make a case insensitive comparison convert both strings to the same lettercase:
 a="BALL"
 b="ball"
 if (tolower(a) == tolower(b)) { print "The first and second string are the same disregarding letter case" }

}</lang>

BASIC

<lang basic>10 LET A$="BELL"
20 LET B$="BELT"
30 IF A$ = B$ THEN PRINT "THE STRINGS ARE EQUAL": REM TEST FOR EQUALITY
40 IF A$ <> B$ THEN PRINT "THE STRINGS ARE NOT EQUAL": REM TEST FOR INEQUALITY
50 IF A$ > B$ THEN PRINT A$;" IS LEXICALLY HIGHER THAN ";B$: REM TEST FOR LEXICALLY HIGHER
60 IF A$ < B$ THEN PRINT A$;" IS LEXICALLY LOWER THAN ";B$: REM TEST FOR LEXICALLY LOWER
70 IF A$ <= B$ THEN PRINT A$;" IS NOT LEXICALLY HIGHER THAN ";B$
80 IF A$ >= B$ THEN PRINT A$;" IS NOT LEXICALLY LOWER THAN ";B$
90 END</lang>

On a platform that supports both uppercase and lowercase characters, the string comparative operators are case sensitive. To perform case insensitive matching, make sure both strings are converted to the same lettercase. Here we assume that the BASIC has the UPPER$ and LOWER$ keyword pair for case conversion. If not, then some number crunching based on the character codes is required. (In ASCII, add 32 to an uppercase letter code to get the lowercase equivalent.) Note that any whitespace within the strings must also match exactly for the strings to be considered equal.

<lang basic>10 LET A$="BELT"
20 LET B$="belt"
30 IF UPPER$(A$)=UPPER$(B$) THEN PRINT "Disregarding lettercase, the strings are the same."</lang>
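For a BASIC without UPPER$ and LOWER$, the character-code arithmetic mentioned above (add 32 to an uppercase ASCII code) might look like the following. It is sketched in Python rather than BASIC, the helper names ascii_lower and equal_ignoring_case are hypothetical, and only the ASCII letters A to Z are handled.

```python
def ascii_lower(text):
    """Lowercase ASCII letters by adding 32 to the codes of 'A'..'Z'."""
    out = []
    for ch in text:
        code = ord(ch)
        if 65 <= code <= 90:   # 'A' is 65, 'Z' is 90
            code += 32         # 'a' is 97, i.e. 65 + 32
        out.append(chr(code))
    return "".join(out)


def equal_ignoring_case(a, b):
    """Compare two strings after folding both to lowercase."""
    return ascii_lower(a) == ascii_lower(b)


print(equal_ignoring_case("BELT", "belt"))
print(equal_ignoring_case("BELT", "belt "))  # trailing space still matters
```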

Burlesque

<lang burlesque>blsq ) "abc""abc"==
1
blsq ) "abc""abc"!=
0
blsq ) "abc""Abc"cm
1
blsq ) "ABC""Abc"cm
-1
</lang>

cm is used for comparison and returns 1, 0 or -1, like C's strcmp. == is Equal and != is NotEqual.
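A three-way comparison of this kind can be sketched in Python as follows. The function name three_way is hypothetical, and the ordering is Python's own code point ordering, which is assumed here to agree with Burlesque's for these ASCII inputs.

```python
def three_way(a, b):
    """Return 1, 0 or -1 as a sorts after, equal to, or before b."""
    # Booleans are integers in Python, so this subtraction yields 1, 0 or -1.
    return (a > b) - (a < b)


print(three_way("abc", "abc"))
print(three_way("abc", "Abc"))
print(three_way("ABC", "Abc"))
```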

D

See also Empty_string <lang d>import std.stdio, std.string, std.algorithm;

void main() {

   auto s = "abcd";
   /* Comparing two strings for exact equality */
   assert(s == "abcd"); // same contents
   /* Comparing two strings for inequality */
   assert(s != "ABCD"); // different contents
   /* Comparing the lexical order of two strings;
   a negative result means smaller, 0 means equal, a positive result means larger */

   assert(s.icmp("Bcde") < 0); // case insensitive
   assert(s.cmp("Bcde") > 0); // case sensitive
   assert(s.icmp("Aabc") > 0); // case insensitive
   assert(s.cmp("Aabc") > 0); // case sensitive

   assert(s.icmp("ABCD") == 0); // case insensitive
   assert(s.cmp("ABCD") > 0); // case sensitive

}</lang>


J

Solution: The primitive -: can be used to determine whether two strings are equivalent, but J doesn't have other inbuilt lexical comparison operators. They can be defined as follows: <lang j>eq=: -:                          NB. equal
ne=: -.@-:                       NB. not equal
gt=: {.@/:@,&boxopen *. ne       NB. lexically greater than
lt=: -.@{.@/:@,&boxopen *. ne    NB. lexically less than
ge=: {.@/:@,&boxopen +. eq       NB. lexically greater than or equal to
le=: -.@{.@/:@,&boxopen          NB. lexically less than or equal to</lang>

Usage: <lang j>   'ball' (eq , ne , gt , lt , ge , le) 'bell'
0 1 0 1 0 1
   'ball' (eq , ne , gt , lt , ge , le) 'ball'
1 0 0 0 1 1
   'YUP' (eq , ne , gt , lt , ge , le) 'YEP'
0 1 1 0 1 0</lang>

NetRexx

Translation of: REXX

The only change to the REXX program to make this work in NetRexx was to change "!=" to "\=" for the NOT EQUAL comparison. (Incidentally, the form shown here will function equally well as a REXX program: "\=" is valid REXX syntax for NOT EQUAL in most dialects.)

Works with: NetRexx
Works with: ooRexx
Works with: Regina

<lang NetRexx>animal = 'dog'
if animal = 'cat' then
  say animal "is lexically equal to cat"
if animal \= 'cat' then
  say animal "is not lexically equal to cat"
if animal > 'cat' then
  say animal "is lexically higher than cat"
if animal < 'cat' then
  say animal "is lexically lower than cat"
if animal >= 'cat' then
  say animal "is not lexically lower than cat"
if animal <= 'cat' then
  say animal "is not lexically higher than cat"

/* The above comparative operators do not consider
   leading and trailing whitespace when making comparisons. */
if ' cat ' = 'cat' then
  say "this will print because whitespace is stripped"

/* To consider all whitespace in a comparison
   we need to use strict comparative operators */
if ' cat ' == 'cat' then
  say "this will not print because comparison is strict"
</lang> The list of strict comparison operators described in the REXX sample applies to NetRexx too.

ooRexx

See the NetRexx and/or the REXX implementation.

PARI/GP

Strings are compared for equality and inequality with == and != and are compared with cmp or with the usual < > <= >=. Case-insensitive comparison is not built in.

Perl 6

Perl 6 uses strong typing dynamically (and gradual typing statically), but normal string and numeric comparisons are coercive. (You may use generic comparison operators if you want polymorphic comparison—but usually you don't. :)

String comparisons never do case folding because that's a very complicated subject in the modern world of Unicode. (You can explicitly apply an appropriate case-folding function to the arguments before doing the comparison, or for "equality" testing you can do matching with a case-insensitive regex, assuming Unicode's language-neutral case-folding rules are okay.) <lang perl6>sub compare($a,$b) {

   my $A = "{$a.WHAT.^name} '$a'";
   my $B = "{$b.WHAT.^name} '$b'";
   if $a eq $b { say "$A and $B are lexically equal" }
   if $a ne $b { say "$A and $B are not lexically equal" }
   if $a gt $b { say "$A is lexically after $B" }
   if $a lt $b { say "$A is lexically before $B" }
   if $a ge $b { say "$A is not lexically before $B" }
   if $a le $b { say "$A is not lexically after $B" }
   if $a === $b { say "$A and $B are identical objects" }
   if $a !=== $b { say "$A and $B are not identical objects" }
   if $a eqv $b { say "$A and $B are generically equal" }
   if $a !eqv $b { say "$A and $B are not generically equal" }
   if $a before $b { say "$A is generically before $B" }
   if $a after $b { say "$A is generically after $B" }
   if $a !after $b { say "$A is not generically after $B" }
   if $a !before $b { say "$A is not generically before $B" }
   say "The lexical relationship of $A and $B is { $a leg $b }" if $a ~~ Stringy;
   say "The generic relationship of $A and $B is { $a cmp $b }";
   say "The numeric relationship of $A and $B is { $a <=> $b }" if $a ~~ Numeric;
   say ;
}

compare 'YUP', 'YUP';
compare 'BALL', 'BELL';
compare 24, 123;
compare 5.1, 5;
compare 5.1e0, 5 + 1/10;</lang>

Output:
Str 'YUP' and Str 'YUP' are lexically equal
Str 'YUP' is not lexically before Str 'YUP'
Str 'YUP' is not lexically after Str 'YUP'
Str 'YUP' and Str 'YUP' are identical objects
Str 'YUP' and Str 'YUP' are generically equal
Str 'YUP' is not generically after Str 'YUP'
Str 'YUP' is not generically before Str 'YUP'
The lexical relationship of Str 'YUP' and Str 'YUP' is Same
The generic relationship of Str 'YUP' and Str 'YUP' is Same

Str 'BALL' and Str 'BELL' are not lexically equal
Str 'BALL' is lexically before Str 'BELL'
Str 'BALL' is not lexically after Str 'BELL'
Str 'BALL' and Str 'BELL' are not identical objects
Str 'BALL' and Str 'BELL' are not generically equal
Str 'BALL' is generically before Str 'BELL'
Str 'BALL' is not generically after Str 'BELL'
The lexical relationship of Str 'BALL' and Str 'BELL' is Increase
The generic relationship of Str 'BALL' and Str 'BELL' is Increase

Int '24' and Int '123' are not lexically equal
Int '24' is lexically after Int '123'
Int '24' is not lexically before Int '123'
Int '24' and Int '123' are not identical objects
Int '24' and Int '123' are not generically equal
Int '24' is generically before Int '123'
Int '24' is not generically after Int '123'
The generic relationship of Int '24' and Int '123' is Increase
The numeric relationship of Int '24' and Int '123' is Increase

Rat '5.1' and Int '5' are not lexically equal
Rat '5.1' is lexically after Int '5'
Rat '5.1' is not lexically before Int '5'
Rat '5.1' and Int '5' are not identical objects
Rat '5.1' and Int '5' are not generically equal
Rat '5.1' is generically after Int '5'
Rat '5.1' is not generically before Int '5'
The generic relationship of Rat '5.1' and Int '5' is Decrease
The numeric relationship of Rat '5.1' and Int '5' is Decrease

Num '5.1' and Rat '5.1' are lexically equal
Num '5.1' is not lexically before Rat '5.1'
Num '5.1' is not lexically after Rat '5.1'
Num '5.1' and Rat '5.1' are not identical objects
Num '5.1' and Rat '5.1' are not generically equal
Num '5.1' is not generically after Rat '5.1'
Num '5.1' is not generically before Rat '5.1'
The generic relationship of Num '5.1' and Rat '5.1' is Same
The numeric relationship of Num '5.1' and Rat '5.1' is Same

Python

Notes:

  • Python is strongly typed. The string '24' is never coerced to a number (or vice versa).
  • Python does not have case-insensitive string comparison operators; instead, use name.upper() or name.lower() to coerce strings to the same case and compare the results.
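As a minimal sketch of the two notes above (the helper names are hypothetical, not part of any library), case-insensitive comparison can be layered on top of the ordinary operators, and numeric strings must be converted explicitly if a numeric ordering is wanted:

```python
def equal_ignoring_case(a, b):
    """Equality after coercing both strings to lowercase."""
    return a.lower() == b.lower()


def before_ignoring_case(a, b):
    """Lexical 'less than' after coercing both strings to lowercase."""
    return a.lower() < b.lower()


print(equal_ignoring_case("YUP", "yup"))
print("ball" < "BELL", before_ignoring_case("ball", "BELL"))

# Numeric strings are compared lexically unless converted by hand.
print("24" < "123", int("24") < int("123"))
```

For text outside ASCII, str.casefold() (available since Python 3.3) performs a more aggressive folding than lower() and may be the better choice.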

<lang python>def compare(a, b):

   print("\n%r is of type %r and %r is of type %r"
         % (a, type(a), b, type(b)))
   if a <  b:      print('%r is strictly less than  %r' % (a, b))
   if a <= b:      print('%r is less than or equal to %r' % (a, b))
   if a >  b:      print('%r is strictly greater than  %r' % (a, b))
   if a >= b:      print('%r is greater than or equal to %r' % (a, b))
   if a == b:      print('%r is equal to %r' % (a, b))
   if a != b:      print('%r is not equal to %r' % (a, b))
   if a is b:      print('%r has object identity with %r' % (a, b))
   if a is not b:  print('%r has negated object identity with %r' % (a, b))

compare('YUP', 'YUP')
compare('BALL', 'BELL')
compare('24', '123')
compare(24, 123)
compare(5.0, 5)</lang>

Output:
'YUP' is of type <class 'str'> and 'YUP' is of type <class 'str'>
'YUP' is less than or equal to 'YUP'
'YUP' is greater than or equal to 'YUP'
'YUP' is equal to 'YUP'
'YUP' has object identity with 'YUP'

'BALL' is of type <class 'str'> and 'BELL' is of type <class 'str'>
'BALL' is strictly less than  'BELL'
'BALL' is less than or equal to 'BELL'
'BALL' is not equal to 'BELL'
'BALL' has negated object identity with 'BELL'

'24' is of type <class 'str'> and '123' is of type <class 'str'>
'24' is strictly greater than  '123'
'24' is greater than or equal to '123'
'24' is not equal to '123'
'24' has negated object identity with '123'

24 is of type <class 'int'> and 123 is of type <class 'int'>
24 is strictly less than  123
24 is less than or equal to 123
24 is not equal to 123
24 has negated object identity with 123

5.0 is of type <class 'float'> and 5 is of type <class 'int'>
5.0 is less than or equal to 5
5.0 is greater than or equal to 5
5.0 is equal to 5
5.0 has negated object identity with 5

Racket

<lang racket>
#lang racket

;; Comparing two strings for exact equality
(string=? "foo" "foo")

;; Comparing two strings for inequality
(not (string=? "foo" "bar"))

;; Comparing two strings to see if one is lexically ordered before the other
(string<? "abc" "def")

;; Comparing two strings to see if one is lexically ordered after the other
(string>? "def" "abc")

;; How to achieve both case sensitive comparisons and case insensitive comparisons within the language
(string-ci=? "foo" "FOO")
</lang>

REXX

<lang rexx>animal = 'dog'
if animal = 'cat' then
  say animal "is lexically equal to cat"
if animal != 'cat' then
  say animal "is not lexically equal to cat"
if animal > 'cat' then
  say animal "is lexically higher than cat"
if animal < 'cat' then
  say animal "is lexically lower than cat"
if animal >= 'cat' then
  say animal "is not lexically lower than cat"
if animal <= 'cat' then
  say animal "is not lexically higher than cat"

/* The above comparative operators do not consider
   leading and trailing whitespace when making comparisons. */
if ' cat ' = 'cat' then
  say "this will print because whitespace is stripped"

/* To consider all whitespace in a comparison
   we need to use strict comparative operators */
if ' cat ' == 'cat' then
  say "this will not print because comparison is strict"</lang>

Here is a list of the strict comparative operators and their meaning:

  • == Strictly Equal To
  • \== Strictly Not Equal To
  • << Strictly Less Than
  • >> Strictly Greater Than
  • <<= Strictly Less Than or Equal To
  • >>= Strictly Greater Than or Equal To
  • \<< Strictly Not Less Than
  • \>> Strictly Not Greater Than
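The difference between the ordinary and the strict equality operators can be approximated in Python as below. This is a rough analogue only: the helper names are hypothetical, and the sketch deliberately ignores other aspects of REXX's non-strict comparison, such as how operands that both look like numbers are treated.

```python
def loose_equal(a, b):
    """Rough analogue of the non-strict '=': leading/trailing blanks are ignored."""
    return a.strip(" ") == b.strip(" ")


def strict_equal(a, b):
    """Rough analogue of the strict '==': every character must match."""
    return a == b


print(loose_equal(" cat ", "cat"))
print(strict_equal(" cat ", "cat"))
```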

Run BASIC

<lang runbasic>a$ = "dog"
b$ = "cat"
if a$ =  b$ then print "the strings are equal"                 ' test for equality
if a$ <> b$ then print "the strings are not equal"             ' test for inequality
if a$ >  b$ then print a$;" is lexically higher than ";b$      ' test for lexically higher
if a$ <  b$ then print a$;" is lexically lower than ";b$       ' test for lexically lower
if a$ <= b$ then print a$;" is not lexically higher than ";b$
if a$ >= b$ then print a$;" is not lexically lower than ";b$
end</lang>

Tcl

The best way to compare two strings in Tcl for equality is with the eq and ne expression operators: <lang tcl>if {$a eq $b} {
    puts "the strings are equal"
}
if {$a ne $b} {
    puts "the strings are not equal"
}</lang> The numeric == and != operators also mostly work, but can give somewhat unexpected results when both values look numeric. The string equal command is equally suited to equality-testing (and generates the same bytecode).

For ordering, the < and > operators may be used, but again they are principally numeric operators. For guaranteed string ordering, the result of the string compare command should be used instead (which uses the Unicode codepoints of the string): <lang tcl>if {[string compare $a $b] < 0} {
    puts "first string lower than second"
}
if {[string compare $a $b] > 0} {
    puts "first string higher than second"
}</lang> Greater-or-equal and less-or-equal operations can be done by changing which comparison is applied to the result of string compare.

Tcl can also do a prefix-equal (approximately the same as strncmp() in C) through the use of the -length option: <lang tcl>if {[string equal -length 3 $x "abc123"]} {
    puts "first three characters are equal"
}</lang> Case-insensitive equality is (orthogonally) enabled through the -nocase option. These options are supported by both string equal and string compare, but not by the expression operators.
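To make the two options concrete, here is a Python sketch of a prefix-limited, optionally case-insensitive equality test. The function string_equal and its keyword parameters are hypothetical stand-ins chosen to mirror the option names; this is not how Tcl implements the command, and lower() is assumed to be an adequate stand-in for Tcl's case folding.

```python
def string_equal(a, b, length=None, nocase=False):
    """Compare a and b, optionally only their first `length` characters
    and optionally ignoring letter case."""
    if nocase:
        a, b = a.lower(), b.lower()
    if length is not None:
        a, b = a[:length], b[:length]
    return a == b


print(string_equal("abcXYZ", "abc123", length=3))
print(string_equal("abcXYZ", "abc123"))
print(string_equal("ABC", "abc", nocase=True))
```

Keeping the two options independent mirrors the point made above: the prefix limit and the case folding can be combined freely.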

UNIX Shell

<lang sh>#!/bin/sh

A=Bell
B=Ball

# Traditional test command implementations test for equality and inequality
# but do not have a lexical comparison facility
if [ "$A" = "$B" ] ; then
  echo 'The strings are equal'
fi
if [ "$A" != "$B" ] ; then
  echo 'The strings are not equal'
fi

# All variables in the shell are strings, so numeric content causes no lexical problems
# 0 , -0 , 0.0 and 00 are all lexically different if tested using the above methods.
# However this may not be the case if other tools, such as awk, are used instead of test.</lang>
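The closing remark about 0, -0, 0.0 and 00 can be illustrated outside the shell. The Python sketch below (helper names hypothetical) shows that the four spellings are all different as strings, yet all equal once converted to numbers, which is the kind of behaviour a numerically aware tool may exhibit.

```python
values = ["0", "-0", "0.0", "00"]


def distinct_as_strings(items):
    """True if no two items are the same string."""
    return len(set(items)) == len(items)


def equal_as_numbers(items):
    """True if every item converts to the same numeric value."""
    return len({float(item) for item in items}) == 1


print(distinct_as_strings(values))
print(equal_as_numbers(values))
```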