Natural sorting: Difference between revisions

m
(added Scheme example)
m (→‎{{header|Wren}}: Minor tidy)
 
(45 intermediate revisions by 15 users not shown)
Line 1:
{{task|Sorting Algorithms}}
[[Category:Sorting]]
Natural sorting is the sorting of text that does more than rely on the
{{Sorting Algorithm}}
 
Natural sorting is the sorting of text that does more than rely on the
order of individual characters codes to make the finding of
individual strings easier for a ''human'' reader.
Line 8 ⟶ 11:
:2. Make all whitespace characters equivalent.
:3. Sorting without regard to case.
:4. Sorting numeric portions of strings in numeric order. That is split the string into fields on numeric boundaries, then sort on each field, with the rightmost fields being the most significant, and numeric fields of integers treated as numbers.
:: That is split the string into fields on numeric boundaries, then sort on each field, with the rightmost fields being the most significant, and numeric fields of integers treated as numbers.
:: foo9.txt before foo10.txt
:: As well as ... x9y99 before x9y100, before x10y0
Line 17 ⟶ 21:
:7. Sort ligatures as separate letters.
:8. Replacements:
:: Sort germanGerman eszett or scharfes S (ß)       as   ss
:: Sort ſ, LATIN SMALL LETTER LONG S     as   s
:: Sort ʒ, LATIN SMALL LETTER EZH           as   s
:: <big><big> ∙∙∙ </big></big>
:: ...
 
;Task Description
* '''Implement the first four''' of the eight given features in a natural sorting routine/function/method...
* Test each feature implemented separately with an ordered list of test strings from the &nbsp; '''Sample inputs''' &nbsp; section below, &nbsp; and make sure your naturally sorted output is in the same order as other language outputs such as &nbsp; <CODE> Python</CODE>.
* Print and display your output.
 
* '''For extra credit''' implement more than the first four.
 
Note: &nbsp; it is not necessary to have individual control of which features are active in the natural sorting routine at any time.
 
Note: It is not necessary to have individual control of which features are active in the natural sorting routine at any time.
 
;Sample input:
<pre>
&bull; Ignoring leading spaces. Text strings: ['ignore leading spaces: 2-2',
'ignore leading spaces: 2-1',
'ignore leading spaces: 2+0',
'ignore leading spaces: 2+1']
 
&bull; Ignoring multiple adjacent spaces (MAS). Text strings: ['ignore MAS spaces: 2-2',
'ignore MAS spaces: 2-1',
'ignore MAS spaces: 2+0',
'ignore MAS spaces: 2+1']
 
&bull; Equivalent whitespace characters. Text strings: ['Equiv. spaces: 3-3',
'Equiv. \rspaces: 3-2',
'Equiv. \x0cspaces: 3-1',
'Equiv. \x0bspaces: 3+0',
'Equiv. \nspaces: 3+1',
'Equiv. \tspaces: 3+2']
 
&bull; Case Independent sort. Text strings: ['cASE INDEPENDENT: 3-2',
'caSE INDEPENDENT: 3-1',
'casE INDEPENDENT: 3+0',
'case INDEPENDENT: 3+1']
&bull; Numeric fields as numerics. Text strings: ['foo100bar99baz0.txt',
'foo100bar10baz0.txt',
'foo1000bar99baz10.txt',
'foo1000bar99baz9.txt']
 
&bull; Title sorts. Text strings: ['The Wind in the Willows',
'The 40th step more',
'The 39 steps',
'Wanda']
 
&bull; Equivalent accented characters (and case). Text strings: [u'Equiv. \xfd accents: 2-2',
u'Equiv. \xdd accents: 2-1',
u'Equiv. y accents: 2+0',
u'Equiv. Y accents: 2+1']
 
&bull; Separated ligatures. Text strings: [u'\u0132 ligatured ij',
'no ligature']
 
&bull; Character replacements. Text strings: [u'Start with an \u0292: 2-2',
u'Start with an \u017f: 2-1',
u'Start with an \xdf: 2+0',
u'Start with an s: 2+1']
</pre><br><br>
 
=={{header|11l}}==
{{trans|Nim}}
 
<syntaxhighlight lang="11l">T.enum Kind
STRING
NUMBER
 
T KeyItem
Kind kind
String s
Int num
 
F natOrderKey(=s)
// Remove leading and trailing white spaces.
s = s.trim((‘ ’, "\t", "\r", "\n"))
 
// Make all whitespace characters equivalent and remove adjacent spaces.
s = s.replace(re:‘\s+’, ‘ ’)
 
// Switch to lower case.
s = s.lowercase()
 
// Remove leading "the ".
I s.starts_with(‘the ’) & s.len > 4
s = s[4..]
 
// Split into fields.
[KeyItem] result
V idx = 0
L idx < s.len
V e = idx
L e < s.len & !s[e].is_digit()
e++
I e > idx
V ki = KeyItem()
ki.kind = Kind.STRING
ki.s = s[idx .< e]
result.append(ki)
idx = e
L e < s.len & s[e].is_digit()
e++
I e > idx
V ki = KeyItem()
ki.kind = Kind.NUMBER
ki.num = Int(s[idx .< e])
result.append(ki)
idx = e
R result
 
F scmp(s1, s2)
I s1 < s2 {R -1}
I s1 > s2 {R 1}
R 0
 
F naturalCmp(String sa, String sb)
V a = natOrderKey(sa)
V b = natOrderKey(sb)
 
L(i) 0 .< min(a.len, b.len)
V ai = a[i]
V bi = b[i]
I ai.kind == bi.kind
V result = I ai.kind == STRING {scmp(ai.s, bi.s)} E ai.num - bi.num
I result != 0
R result
E
R I ai.kind == STRING {1} E -1
 
R I a.len < b.len {-1} E (I a.len == b.len {0} E 1)
 
F test(title, list)
print(title)
print(sorted(list, key' cmp_to_key(naturalCmp)).map(s -> ‘'’s‘'’).join("\n"))
print()
 
test(‘Ignoring leading spaces.’,
[‘ignore leading spaces: 2-2’,
‘ ignore leading spaces: 2-1’,
‘ ignore leading spaces: 2+0’,
‘ ignore leading spaces: 2+1’])
 
test(‘Ignoring multiple adjacent spaces (MAS).’,
[‘ignore MAS spaces: 2-2’,
‘ignore MAS spaces: 2-1’,
‘ignore MAS spaces: 2+0’,
‘ignore MAS spaces: 2+1’])
 
test(‘Equivalent whitespace characters.’,
[‘Equiv. spaces: 3-3’,
"Equiv. \rspaces: 3-2",
"Equiv. \x0cspaces: 3-1",
"Equiv. \x0bspaces: 3+0",
"Equiv. \nspaces: 3+1",
"Equiv. \tspaces: 3+2"])
 
test(‘Case Independent sort.’,
[‘cASE INDEPENDENT: 3-2’,
‘caSE INDEPENDENT: 3-1’,
‘casE INDEPENDENT: 3+0’,
‘case INDEPENDENT: 3+1’])
 
test(‘Numeric fields as numerics.’,
[‘foo100bar99baz0.txt’,
‘foo100bar10baz0.txt’,
‘foo1000bar99baz10.txt’,
‘foo1000bar99baz9.txt’])
 
test(‘Title sorts.’,
[‘The Wind in the Willows’,
‘The 40th step more’,
‘The 39 steps’,
‘Wanda’])</syntaxhighlight>
 
{{out}}
<pre>
# Ignoring leading spaces.
' ignore leading spaces: 2+0'
Text strings:
['ignore leading spaces: 2-2', ' ignore leading spaces: 2-1', ' ignore leading spaces: 2+0', ' ignore leading spaces: 2+1']
' ignore leading spaces: 2-1'
'ignore leading spaces: 2-2'
 
# Ignoring multiple adjacent spaces (mMAS).a.s)
'ignore MAS spaces: 2+0'
Text strings:
['ignore m.a.sMAS spaces: 2-2', 'ignore m.a.s spaces: 2-1', 'ignore m.a.s spaces: 2+0', 'ignore m.a.s spaces: 2+1']
'ignore MAS spaces: 2-1'
'ignore MAS spaces: 2-2'
 
Equivalent whitespace characters.
'Equiv. �spaces: 3+0'
'Equiv.
spaces: 3+1'
'Equiv. spaces: 3+2'
'Equiv. �spaces: 3-1'
'Equiv.
spaces: 3-2'
'Equiv. spaces: 3-3'
 
Case Independent sort.
# Equivalent whitespace characters
'casE INDEPENDENT: 3+0'
Text strings:
'case INDEPENDENT: 3+1'
['Equiv. spaces: 3-3', 'Equiv.\rspaces: 3-2', 'Equiv.\x0cspaces: 3-1', 'Equiv.\x0bspaces: 3+0', 'Equiv.\nspaces: 3+1', 'Equiv.\tspaces: 3+2']
'caSE INDEPENDENT: 3-1'
'cASE INDEPENDENT: 3-2'
 
Numeric fields as numerics.
# Case Indepenent sort
'foo100bar10baz0.txt'
Text strings:
'foo100bar99baz0.txt'
['cASE INDEPENENT: 3-2', 'caSE INDEPENENT: 3-1', 'casE INDEPENENT: 3+0', 'case INDEPENENT: 3+1']
'foo1000bar99baz9.txt'
'foo1000bar99baz10.txt'
 
Title sorts.
# Numeric fields as numerics
'The 39 steps'
Text strings:
'The 40th step more'
['foo100bar99baz0.txt', 'foo100bar10baz0.txt', 'foo1000bar99baz10.txt', 'foo1000bar99baz9.txt']
'Wanda'
'The Wind in the Willows'
 
</pre>
# Title sorts
Text strings:
['The Wind in the Willows', 'The 40th step more', 'The 39 steps', 'Wanda']
 
=={{header|AppleScript}}==
# Equivalent accented characters (and case)
AppleScript doesn't have a built-in sort facility, but its string comparisons are normalised and several attributes can be specifically either considered or ignored, making it fairly simple for a sort coded in the language to produce the results required here.
Text strings:
[u'Equiv. \xfd accents: 2-2', u'Equiv. \xdd accents: 2-1', u'Equiv. y accents: 2+0', u'Equiv. Y accents: 2+1']
 
<syntaxhighlight lang="applescript">use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use sorter : script ¬
"Custom Iterative Ternary Merge Sort" -- <www.macscripter.net/t/timsort-and-nigsort/71383/3>
 
on naturalSort(listOfText)
# Separated ligatures
-- Get doctored copies of the strings in order to get around situations that
Text strings:
-- AppleScript's text comparison attributes don't handle naturally. ie. Reduce any
[u'\u0132 ligatured ij', 'no ligature']
-- run of white space to a single character, zap any leading/trailing space, move
-- any article word at the beginning to the end, and replace any "ſ" or "ʒ" with "s".
script o
property doctored : listOfText's items
end script
set regex to current application's NSRegularExpressionSearch
set substitutions to {{"\\s++", space}, {"^ | $", ""}, ¬
{"^(?i)(The|An?) (.++)$", "$2 $1"}, {"[\\u0292\\u017f]", "s"}}
repeat with i from 1 to (count o's doctored)
set mutableString to (current application's class "NSMutableString"'s ¬
stringWithString:(o's doctored's item i))
repeat with thisSub in substitutions
set {searchStr, replacement} to thisSub
tell mutableString to replaceOccurrencesOfString:(searchStr) ¬
withString:(replacement) options:(regex) range:({0, its |length|()})
end repeat
set o's doctored's item i to mutableString as text
end repeat
-- Sort the doctored strings with the relevant AppleScript comparison attributes
-- explicitly either set or not, echoing the moves in the original list.
considering numeric strings, white space and hyphens but ignoring diacriticals, punctuation and case
tell sorter to sort(o's doctored, 1, -1, {slave:{listOfText}})
end considering
return listOfText
end naturalSort
 
on join(lst, delim)
# Character replacements
set astid to AppleScript's text item delimiters
Text strings:
set AppleScript's text item delimiters to delim
[u'Start with an \u0292: 2-2', u'Start with an \u017f: 2-1', u'Start with an \xdf: 2+0', u'Start with an s: 2+1']</pre>
set txt to lst as text
set AppleScript's text item delimiters to astid
return txt
end join
 
on tests()
set output to {"(* Leading, trailing, and multiple white spaces ignored *)"}
set output's end to ¬
naturalSort({" ignore superfluous spaces: 1-3", "ignore superfluous spaces: 1-1", ¬
" ignore superfluous spaces: 1-2", " ignore superfluous spaces: 1-4", ¬
"ignore superfluous spaces: 1-7", "ignore superfluous spaces: 1-5 ", ¬
"ignore superfluous spaces: 1-6", " ignore superfluous spaces: 1-8"})
set output's end to linefeed & "(* All white space characters treated as equivalent *)"
set output's end to naturalSort({"Equiv. spaces: 2-6", "Equiv." & return & "spaces: 2-5", ¬
"Equiv." & (character id 12) & "spaces: 2-4", ¬
"Equiv." & (character id 11) & "spaces: 2-3", ¬
"Equiv." & linefeed & "spaces: 2-2", "Equiv." & tab & "spaces: 2-1"})
set output's end to linefeed & ¬
"(* Case ignored. (The sort order would actually be the same with case considered,
since case only decides the issue when the strings are otherwise identical.) *)"
set output's end to naturalSort({"cASE INDEPENDENT: 3-1", "caSE INDEPENDENT: 3-2", ¬
"CASE independent: 3-3", "casE INDEPENDENT: 3-4", "case INDEPENDENT: 3-5"})
set output's end to linefeed & "(* Numerics considered by number value *)"
set output's end to naturalSort({"foo1000bar99baz10.txt", "foo100bar99baz0.txt", ¬
"foo100bar10baz0.txt", "foo1000bar99baz9.txt"})
set output's end to linefeed & "(* Title sort *)"
set output's end to ¬
naturalSort({"The Wind in the Willows", "The 40th Step More", ¬
"A Matter of Life and Death", "The 39 steps", ¬
"An Inspector Calls", "Wanda"})
set output's end to linefeed & "(* Diacriticals (and case) ignored *)"
set output's end to naturalSort({"Equiv. " & (character id 253) & " accents: 6-1", ¬
"Equiv. " & (character id 221) & " accents: 6-3", ¬
"Equiv. y accents: 6-4", "Equiv. Y accents: 6-2"})
set output's end to linefeed & "(* Ligatures *)"
set output's end to naturalSort({(character id 306) & " ligatured", ¬
"of", "ij no ligature", (character id 339), "od"})
set output's end to linefeed & ¬
"(* Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") *)"
set output's end to naturalSort({"Start with an " & (character id 658) & ": 8-1", ¬
"Start with an " & (character id 383) & ": 8-2", ¬
"Start with an " & (character id 223) & ": 8-3", ¬
"Start with an s: 8-4", "Start with an ss: 8-5"})
return join(output, linefeed)
end tests
 
tests()</syntaxhighlight>
 
{{output}}
<syntaxhighlight lang="applescript">"(* Leading, trailing, and multiple white spaces ignored *)
ignore superfluous spaces: 1-1
ignore superfluous spaces: 1-2
ignore superfluous spaces: 1-3
ignore superfluous spaces: 1-4
ignore superfluous spaces: 1-5
ignore superfluous spaces: 1-6
ignore superfluous spaces: 1-7
ignore superfluous spaces: 1-8
 
(* All white space characters treated as equivalent *)
Equiv. spaces: 2-1
Equiv.
spaces: 2-2
Equiv.�spaces: 2-3
Equiv.�spaces: 2-4
Equiv.
spaces: 2-5
Equiv. spaces: 2-6
 
(* Case ignored. (The sort order would actually be the same with case considered,
since case only decides the issue when strings are otherwise identical.) *)
cASE INDEPENDENT: 3-1
caSE INDEPENDENT: 3-2
CASE independent: 3-3
casE INDEPENDENT: 3-4
case INDEPENDENT: 3-5
 
(* Numerics considered by number value *)
foo100bar10baz0.txt
foo100bar99baz0.txt
foo1000bar99baz9.txt
foo1000bar99baz10.txt
 
(* Title sort *)
The 39 steps
The 40th Step More
An Inspector Calls
A Matter of Life and Death
Wanda
The Wind in the Willows
 
(* Diacriticals (and case) ignored *)
Equiv. ý accents: 6-1
Equiv. Y accents: 6-2
Equiv. Ý accents: 6-3
Equiv. y accents: 6-4
 
(* Ligatures *)
IJ ligatured
ij no ligature
od
œ
of
 
(* Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") *)
Start with an ʒ: 8-1
Start with an ſ: 8-2
Start with an s: 8-4
Start with an ß: 8-3
Start with an ss: 8-5"</syntaxhighlight>
 
=={{header|ATS}}==
 
<syntaxhighlight lang="ats">
(*------------------------------------------------------------------*)
(* Natural sorting. *)
(*------------------------------------------------------------------*)
 
(* I deal only with ASCII here and solve only the first four
problems. For Unicode, I would most likely use GNU libunistring (or
Gnulib) and UTF-32. Handling Unicode properly is complicated.
 
There are other matters that make "natural sorting" a tricky
thing. For instance, which "accented letters" are actually
"accented"--rather than distinct letters in their own
right--depends on the language and the purpose. In Esperanto, for
example, 'ĉ' is a distinct letter that goes just after the letter
'c'. *)
 
#include "share/atspre_staload.hats"
staload UN = "prelude/SATS/unsafe.sats"
 
#define NIL list_nil ()
#define :: list_cons
 
(*------------------------------------------------------------------*)
(* Types and interfaces. *)
 
typedef char_skipper =
{n : int}
{i : nat | i <= n}
(string n,
size_t n,
size_t i) -<cloref>
[j : int | i <= j; j <= n]
size_t j
 
typedef char_skipper_backwards =
{n : int}
{i : nat | i <= n}
(string n,
size_t n,
size_t i) -<cloref>
[j : int | 0 <= j; j <= i]
size_t j
 
typedef char_translator =
{n : int}
string n -<cloref,!wrt> string n
 
extern fn
make_char_skipper :
(char -<cloref> bool) -<> char_skipper
 
extern fn
make_char_skipper_backwards :
(char -<cloref> bool) -<> char_skipper_backwards
 
extern fn
make_char_translator :
(char -<cloref> bool,
char -<cloref> [c : int | 1 <= c; c <= 127] char c) -<>
char_translator
 
extern fn
remove_leading_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
 
extern fn
remove_trailing_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
 
extern fn
combine_adjacent_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
 
extern fn
compare_strings_containing_numbers :
{m, n : int}
(string m, string n) -<> int
 
extern fn
evaluate_natural :
{m : int}
{i, n : nat | i + n <= m}
(string m, size_t i, size_t n) -<> ullint
 
extern fn
compare_strings_naturally :
{m, n : int}
(string m, string n) -< !wrt > int
 
extern fn
list_sort_strings_naturally :
{n : int}
list (String, n) -< !wrt > list (String, n)
 
(*------------------------------------------------------------------*)
(* Global closures. *)
 
val skip_spaces =
make_char_skipper (lam c => c = ' ')
 
val skip_spaces_backwards =
make_char_skipper_backwards (lam c => c = ' ')
 
val skip_digits =
make_char_skipper (lam c => isdigit c)
 
val translate_whitespace_to_spaces =
make_char_translator (lam c => isspace c, lam c => ' ')
 
(* A little unit test. *)
val- "1 2 3 " = translate_whitespace_to_spaces "1\t2\v3\n"
 
val string_tolower =
make_char_translator
(lam c => isupper c,
lam c =>
let
typedef ascii_lowercase =
[c : int | 'a' <= c; c <= 'z'] char c
val c = tolower c
in
$UN.cast{ascii_lowercase} c
end)
 
(* A little unit test. *)
val- "abcdef" = string_tolower "ABCdef"
 
(*------------------------------------------------------------------*)
(* Implementations. *)
 
implement
make_char_skipper match =
let
fun
skipper {n : int}
{i : nat | i <= n}
.<n - i>.
(s : string n,
n : size_t n,
i : size_t i)
:<cloref> [j : int | i <= j; j <= n]
size_t j =
if i = n then
i
else if ~match s[i] then
i
else
skipper (s, n, succ i)
in
skipper
end
 
implement
make_char_skipper_backwards match =
let
fun
skipper {n : int}
{i : nat | i <= n}
.<i>.
(s : string n,
n : size_t n,
i : size_t i)
:<cloref> [j : int | 0 <= j; j <= i]
size_t j =
if i = i2sz 0 then
i
else if ~match s[pred i] then
i
else
skipper (s, n, pred i)
in
skipper
end
 
implement
make_char_translator (match, replace) =
let
fn
translator {n : int}
(s : string n)
:<cloref,!wrt> string n =
let
prval () = lemma_string_param s
val n = strlen s
 
fun
loop {i : nat | i <= n}
.<n - i>.
(t : strnptr n, (* strnptr so it is mutable. *)
i : size_t i)
:<!wrt> strnptr n =
if i = n then
t
else
let
val c = t[i]
in
if match c then
t[i] := replace c;
loop (t, succ i)
end
in
strnptr2string (loop (string1_copy s, i2sz 0))
end
in
translator
end
 
implement
remove_leading_spaces s =
let
prval () = lemma_string_param s
val n = strlen s
val j = skip_spaces (s, n, i2sz 0)
in
strnptr2string (string_make_substring (s, j, n - j))
end
 
(* A little unit test. *)
val- "1 " = remove_leading_spaces " 1 "
 
implement
remove_trailing_spaces s =
let
prval () = lemma_string_param s
val n = strlen s
val j = skip_spaces_backwards (s, n, n)
in
strnptr2string (string_make_substring (s, i2sz 0, j))
end
 
(* A little unit test. *)
val- " 1" = remove_trailing_spaces " 1 "
 
fn
_combine_adjacent_spaces
{n : int}
(s : string n)
:<!refwrt> [m : nat | m <= n]
string m =
let
prval () = lemma_string_param s
val n = strlen s
in
if n = i2sz 0 then
s
else
let
val buf = arrayref_make_elt<char> (succ n, '\0')
val () = buf[0] := s[0]
 
fun
loop {i : pos | i <= n}
{j : pos | j <= i}
.<n - i>.
(i : size_t i,
j : size_t j)
:<!refwrt> [k : pos | k <= n]
size_t k =
if i = n then
j
else
begin
if (s[i] = ' ') * (s[pred i] = ' ') then
loop (succ i, j)
else
begin
buf[j] := s[i];
loop (succ i, succ j)
end
end
 
val [k : int] k = loop (i2sz 1, i2sz 1)
val s1 = $UN.cast{string k} (ptrcast buf)
in
strnptr2string (string_make_substring (s1, i2sz 0, k))
end
end
 
implement
combine_adjacent_spaces s =
$effmask_ref _combine_adjacent_spaces s
 
(* A little unit test. *)
val- " 1 2 3 4 " = combine_adjacent_spaces " 1 2 3 4 "
 
implement
compare_strings_containing_numbers {m, n} (sm, sn) =
let
prval () = lemma_string_param sm
prval () = lemma_string_param sn
val m = strlen sm
and n = strlen sn
 
fun
compare_them
{i, j : nat | i <= m; j <= n}
.<m - i, n - j>.
(i : size_t i,
j : size_t j)
:<> int =
if i = m then
(if j = n then 0 else ~1)
else if j = n then
1
else if (isdigit sm[i]) * (isdigit sn[j]) then
let
val i1 = skip_digits (sm, m, succ i)
and j1 = skip_digits (sn, n, succ j)
val vm = evaluate_natural (sm, i, i1 - i)
and vn = evaluate_natural (sn, j, j1 - j)
in
if vm = vn then
compare_them (i1, j1)
else if vm < vn then
~1
else
1
end
else
let
val cmp = compare (sm[i], sn[j])
in
if cmp = 0 then
compare_them (succ i, succ j)
else
cmp
end
in
compare_them (i2sz 0, i2sz 0)
end
 
implement
evaluate_natural {m} {i, n} (s, i, n) =
let
fun
loop {k : int | i <= k; k <= i + n}
.<(i + n) - k>.
(k : size_t k,
accum : ullint)
:<> ullint =
if k = i + n then
accum
else
let
val digit = (char2int0 s[k] - char2int0 '0')
val accum = (10ULL * accum) + g0i2u digit
in
loop (succ k, accum)
end
in
loop (i, 0ULL)
end
 
(* A little unit test. *)
val- 1234ULL = evaluate_natural ("xy1234z", i2sz 2, i2sz 4)
 
implement
compare_strings_naturally (sm, sn) =
let
prval () = lemma_string_param sm
prval () = lemma_string_param sn
 
fn
adjust_string (s : String0)
:<!wrt> String0 =
let
val s = translate_whitespace_to_spaces s
val s = string_tolower s
val s = remove_leading_spaces s
val s = remove_trailing_spaces s
val s = combine_adjacent_spaces s
in
s
end
in
compare_strings_containing_numbers (adjust_string sm,
adjust_string sn)
end
 
implement
list_sort_strings_naturally lst =
let
implement
list_mergesort$cmp<String> (x, y) =
$effmask_wrt compare_strings_naturally (x, y)
in
list_vt2t (list_mergesort<String> lst)
end
 
(*------------------------------------------------------------------*)
(* Now see if we pass the required tests. *)
 
implement
gequal_val_val<String> (x, y) =
g0ofg1 x = g0ofg1 y
 
fn
nicefy_string (s : string)
: string =
let
val s = g1ofg0 s
prval () = lemma_string_param s
val n = strlen s
prval [n : int] EQINT () = eqint_make_guint n
 
var t : string = ""
var i : [i : nat | i <= n] size_t i
in
for (i := i2sz 0; i <> n; i := succ i)
let
val c = char2int0 s[i]
in
if c < 32 then
let
val numstr = strptr2string (g0int2string c)
val u =
strptr2string (string0_append4 (t, "[", numstr, "]"))
in
t := u
end
else
let
val chrstr = strnptr2string (string_sing s[i])
val u = strptr2string (string0_append (t, chrstr))
in
t := u
end
end;
t
end
 
fn
make_list_printable {n : int}
(lst : list (String, n))
: list (string, n) =
let
prval () = lemma_list_param lst
 
fun
loop {i : nat | i <= n}
.<n - i>.
(lst : list (String, n - i),
accum : list (string, i))
: list (string, n) =
case+ lst of
| NIL => list_vt2t (list_reverse accum)
| head :: tail =>
let
val s = strptr2string (string0_append3 ("|", head, "|"))
in
loop (tail, nicefy_string s :: accum)
end
in
loop (lst, NIL)
end
 
fn
test_ignoring_leading_spaces () : void =
let
val givenlst = $list ("ignore leading spaces: 2-2",
" ignore leading spaces: 2-1",
" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1")
val expected = $list (" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1",
" ignore leading spaces: 2-1",
"ignore leading spaces: 2-2")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Ignoring leading spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
fn
test_ignoring_trailing_spaces () : void =
(* I added this test, myself. *)
let
val givenlst = $list ("ignore trailing spaces: 2b2",
"ignore trailing spaces: 2b1 ",
"ignore trailing spaces: 2b+0 ",
"ignore trailing spaces: 2b+1 ")
val expected = $list ("ignore trailing spaces: 2b+0 ",
"ignore trailing spaces: 2b+1 ",
"ignore trailing spaces: 2b1 ",
"ignore trailing spaces: 2b2")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Ignoring trailing spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
fn
test_combining_adjacent_spaces () : void =
let
val givenlst = $list ("ignore m.a.s spaces: 2-2",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1")
val expected = $list ("ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2-2")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Combining adjacent spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
fn
test_whitespace_equivalence () : void =
let
val givenlst = $list ("Equiv. spaces: 3-3",
"Equiv.\rspaces: 3-2",
"Equiv.\x0cspaces: 3-1",
"Equiv.\x0bspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2")
val expected = $list ("Equiv.\x0bspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2",
"Equiv.\x0cspaces: 3-1",
"Equiv.\rspaces: 3-2",
"Equiv. spaces: 3-3")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("All whitespace characters equivalent:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
fn
test_case_independence () : void =
let
val givenlst = $list ("cASE INDEPENENT: 3-2",
"caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1")
val expected = $list ("casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1",
"caSE INDEPENENT: 3-1",
"cASE INDEPENENT: 3-2")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Case independence:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
fn
test_numeric_fields () : void =
let
val givenlst = $list ("foo100bar99baz0.txt",
"foo100bar10baz0.txt",
"foo1000bar99baz10.txt",
"foo1000bar99baz9.txt")
val expected = $list ("foo100bar10baz0.txt",
"foo100bar99baz0.txt",
"foo1000bar99baz9.txt",
"foo1000bar99baz10.txt")
 
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Numeric fields:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
 
implement
main0 () =
begin
test_ignoring_leading_spaces ();
test_ignoring_trailing_spaces ();
test_combining_adjacent_spaces ();
test_whitespace_equivalence ();
test_case_independence ();
test_numeric_fields ();
println! ("success")
end
 
(*------------------------------------------------------------------*)
</syntaxhighlight>
 
{{out}}
<pre>$ patscc -DATS_MEMALLOC_GCBDW natural_sorting.dats -lgc && ./a.out
Ignoring leading spaces:
given: |ignore leading spaces: 2-2|, | ignore leading spaces: 2-1|, | ignore leading spaces: 2+0|, | ignore leading spaces: 2+1|
result: | ignore leading spaces: 2+0|, | ignore leading spaces: 2+1|, | ignore leading spaces: 2-1|, |ignore leading spaces: 2-2|
Ignoring trailing spaces:
given: |ignore trailing spaces: 2b2|, |ignore trailing spaces: 2b1 |, |ignore trailing spaces: 2b+0 |, |ignore trailing spaces: 2b+1 |
result: |ignore trailing spaces: 2b+0 |, |ignore trailing spaces: 2b+1 |, |ignore trailing spaces: 2b1 |, |ignore trailing spaces: 2b2|
Combining adjacent spaces:
given: |ignore m.a.s spaces: 2-2|, |ignore m.a.s spaces: 2-1|, |ignore m.a.s spaces: 2+0|, |ignore m.a.s spaces: 2+1|
result: |ignore m.a.s spaces: 2+0|, |ignore m.a.s spaces: 2+1|, |ignore m.a.s spaces: 2-1|, |ignore m.a.s spaces: 2-2|
All whitespace characters equivalent:
given: |Equiv. spaces: 3-3|, |Equiv.[13]spaces: 3-2|, |Equiv.[12]spaces: 3-1|, |Equiv.[11]spaces: 3+0|, |Equiv [10]spaces: 3+1|, |Equiv.[9]spaces: 3+2|
result: |Equiv.[11]spaces: 3+0|, |Equiv.[10]spaces: 3+1|, |Equiv.[9]spaces: 3+2|, |Equiv.[12]spaces: 3-1|, |Equiv.[13]spaces: 3-2|, |Equiv. spaces: 3-3|
Case independence:
given: |cASE INDEPENENT: 3-2|, |caSE INDEPENENT: 3-1|, |casE INDEPENENT: 3+0|, |case INDEPENENT: 3+1|
result: |casE INDEPENENT: 3+0|, |case INDEPENENT: 3+1|, |caSE INDEPENENT: 3-1|, |cASE INDEPENENT: 3-2|
Numeric fields:
given: |foo100bar99baz0.txt|, |foo100bar10baz0.txt|, |foo1000bar99baz10.txt|, |foo1000bar99baz9.txt|
result: |foo100bar10baz0.txt|, |foo100bar99baz0.txt|, |foo1000bar99baz9.txt|, |foo1000bar99baz10.txt|
success
</pre>
 
=={{header|C}}==
Line 80 ⟶ 1,041:
 
Besides the numeric part, everything else was done in a uniform way by transforming input strings into some normalized format and comparing those instead. All sort options flags can be freely mixed together. C source is written in UTF-8 for easier reading here: don't do this for serious code.
<langsyntaxhighlight lang="c">#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
Line 152 ⟶ 1,113:
}
 
static const wchar_t *const tbl_accent[] = { /* copied from Perl6Raku code */
L"Þ", L"TH", L"þ", L"th", L"Ð", L"TH", L"ð", L"th", L"À", L"A",
L"Á", L"A", L"Â", L"A", L"Ã", L"A", L"Ä", L"A", L"Å", L"A", L"à",
Line 336 ⟶ 1,297:
 
return 0;
}</langsyntaxhighlight>output<syntaxhighlight lang="text">Sort flags: (collapse spaces)
0000098 nina
100 NINA
Line 388 ⟶ 1,349:
99 Ninja
100 NINA
100 niño</langsyntaxhighlight>
 
=={{header|D}}==
Implements requests 1-5.
<langsyntaxhighlight lang="d">import std.stdio, std.string, std.algorithm, std.array, std.conv,
std.ascii, std.range;
 
Line 458 ⟶ 1,419:
}
}
</syntaxhighlight>
</lang>
{{out}}
<pre>Test strings:
Line 583 ⟶ 1,544:
=={{header|Elixir}}==
Implements requests 1-5.
<langsyntaxhighlight lang="elixir">defmodule Natural do
def sorting(texts) do
Enum.sort_by(texts, fn text -> compare_value(text) end)
Line 638 ⟶ 1,599:
["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]}
]
|> Enum.each(fn {title, input} -> Natural.task(title, input) end)</langsyntaxhighlight>
 
{{out}}
Line 760 ⟶ 1,721:
 
Objectives six to eight are attainable, except that the character encodements available are not portable. Code page 850 doesn't offer the same accented character codes as for other systems, such as code page 437. But the auxiliary sort key approach easily accommodates substitute characters (and could also swap + and -, for example!), and could recognise ligatures as well. One might be prodded into escalating to 16-bit or even 32-bit character codes to maintain the same ease of manipulation.
<syntaxhighlight lang="fortran">
<lang Fortran>
MODULE STASHTEXTS !Using COMMON is rather more tedious.
INTEGER MSG,KBD !I/O unit numbers.
Line 1,126 ⟶ 2,087:
END DO !On to the next.
END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!"
</syntaxhighlight>
</lang>
Example output, in two columns:
Entry|Text Character Order Entry|Text 'Natural' Order
Line 1,157 ⟶ 2,118:
 
===Via a complex comparison procedure.===
Here, all previous worry over storage consumption is abandoned. An array of CHARACTER*66 is deemed long enough, and too bad over trailing spaces. An auxiliary array holds the length of each entry. The source texts are left alone, at the cost of having the "natural" compare process engage in a lot of cogitation. The plan for the comparison is to have subroutine ANOTHER yield two characters for comparison, one from each string being compared. Subroutine CRUSH deals with spaceish and character case simplification, and each scan can scoot ahead through whatever multiple spaces it encounters. Subroutine LIBRARIAN advances past antany initial spaceish stuff then detects articles at the start of a text and advances the initial scan past them while preparing the appropriate addendum for the end of the text, signified in TAIL, should a scan reach that far.
 
Each scan maintains a STATE, and if both achieve NUMERIC then both scans are looking at a digit in their respective texts. If so, there is a forwards probe to determine how many digits follow in each text, the results of which drive the interpolation of leading zero characters. That is, suppose the second text has fewer digits (looking at "007" versus "7", say) - then a "0" is created for the second text, its finger is backspaced one and the first text's digit count is decremented. The next time around, the digit counts are known so will not be re-scanned and again a "0" will be generated. The third time, the digit counts will be equal, so both texts will yield up their characters for comparison.
Line 1,163 ⟶ 2,124:
When (if) a scan reaches the end of its text, the TAIL will be considered for the extraction of further characters.
 
<syntaxhighlight lang="fortran">
<lang Fortran>
MODULE ASSISTANCE
INTEGER MSG,KBD !I/O unit numbers.
Line 1,489 ⟶ 2,450:
END DO !On to the next.
END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!"
</syntaxhighlight>
</lang>
This time, because the texts are no longer being parsed into pieces, the book titles are not ordered together though they are in the required sequence disregarding the other entries. "The 39" and "The 40" have the "The " part converted into a TAIL, and so their first comparison characters are their digits, and in ASCII, digits precede letters. This is revealed in the third column, where the comparison characters are revealed, in an ad-hoc manner: what appears are the characters placed by every comparison so there may be contention.
Output:
Line 1,526 ⟶ 2,487:
 
First four rules, no extra credit:
<langsyntaxhighlight lang="go">package main
 
import (
Line 1,659 ⟶ 2,620:
}
return false
}</langsyntaxhighlight>
{{out}}
<pre>
Line 1,725 ⟶ 2,686:
"foo1000bar99baz9.txt"
"foo1000bar99baz10.txt"
</pre>
 
=={{header|Haskell}}==
 
Implements requests 1-5.
<syntaxhighlight lang="haskell">
import Data.List
import Data.Char
import Data.String.Utils
import Data.List.Utils
import Data.Function (on)
 
 
printOutput = do
putStrLn "# Ignoring leading spaces \n"
printBlockOfMessages sample1Rule ignoringStartEndSpaces
putStrLn "\n # Ignoring multiple adjacent spaces (m.a.s) \n"
printBlockOfMessages sample2Rule ignoringMultipleAdjacentSpaces
putStrLn "\n # Equivalent whitespace characters \n"
printBlockOfMessages sample3Rule ignoringMultipleAdjacentSpaces
putStrLn "\n # Case Indepenent sorts \n"
printBlockOfMessages sample4Rule caseIndependent
putStrLn "\n # Numeric fields as numerics \n"
printBlockOfMessages sample5Rule numericFieldsAsNumbers
putStrLn "\n # Title sorts \n"
printBlockOfMessages sample6Rule removeLeadCommonWords
 
printMessage message content = do
putStrLn message
mapM_ print content
 
printBlockOfMessages list function = do
printMessage "Text strings:" list
printMessage "Normally sorted:" (sort list)
printMessage "Naturally sorted:" (sortListWith list function)
 
 
-- samples
sample1Rule = ["ignore leading spaces: 2-2", " ignore leading spaces: 2-1", " ignore leading spaces: 2+0", " ignore leading spaces: 2+1"]
sample2Rule = ["ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1"]
sample3Rule = ["Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1", "Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2"]
sample4Rule = ["cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1", "casE INDEPENENT: 3+0", "case INDEPENENT: 3+1"]
sample5Rule = ["foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt"]
sample6Rule = ["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]
 
 
-- function to execute all sorts
sortListWith l f = sort $ f l
 
-- 1. Ignore leading, trailing and multiple adjacent spaces
 
-- Ignoring leading spaces
 
-- receive a String and remove all spaces from the start and end of that String, a String is considered an List os Char
-- ex: " a string " = "a string"
ignoringStartEndSpaces :: [String] -> [String]
ignoringStartEndSpaces = map strip
 
-- Ignoring multiple adjacent spaces and Equivalent whitespace characters
 
ignoringMultipleAdjacentSpaces :: [String] -> [String]
ignoringMultipleAdjacentSpaces = map (unwords . words)
 
-- 2. Equivalent whitespace characters
-- 3. Case independent sort
-- lower case of an entire String
-- ex "SomeCAse" = "somecase"
caseIndependent :: [String] -> [String]
caseIndependent = map (map toLower)
 
-- 4. Numeric fields as numerics (deals with up to 20 digits)
numericFieldsAsNumbers :: [String] -> [[Int]]
numericFieldsAsNumbers = map findOnlyNumerics
 
findOnlyNumerics :: String -> [Int]
findOnlyNumerics s = convertDigitAsStringToInt $ makeListOfDigitsAsString $ extractDigitsAsString s
extractDigitsAsString :: String -> [String]
extractDigitsAsString s = map (filter isNumber) $ groupBy ((==) `on` isNumber ) s
makeListOfDigitsAsString :: [String] -> [String]
makeListOfDigitsAsString l = tail $ nub l
convertDigitAsStringToInt :: [String] -> [Int]
convertDigitAsStringToInt = map (joiner . map digitToInt)
 
-- join a list of numbers into a single number
-- ex [4,2] = 42
joiner :: [Int] -> Int
joiner = read . concatMap show
 
-- 5. Title sort
removeLeadCommonWords l = map removeLeadCommonWord $ splitList l
 
splitList = map words
removeLeadCommonWord a = unwords $ if f a commonWords then tail a else a
where f l1 = elem (map toLower (head l1))
commonWords = ["the","a","an","of"]
 
</syntaxhighlight>
 
{{out}}
<pre style="height: 80ex; overflow: scroll">
# Ignoring leading spaces
 
Text strings:
"ignore leading spaces: 2-2"
" ignore leading spaces: 2-1"
" ignore leading spaces: 2+0"
" ignore leading spaces: 2+1"
Normally sorted:
" ignore leading spaces: 2+1"
" ignore leading spaces: 2+0"
" ignore leading spaces: 2-1"
"ignore leading spaces: 2-2"
Naturally sorted:
"ignore leading spaces: 2+0"
"ignore leading spaces: 2+1"
"ignore leading spaces: 2-1"
"ignore leading spaces: 2-2"
 
# Ignoring multiple adjacent spaces (m.a.s)
 
Text strings:
"ignore m.a.s spaces: 2-2"
"ignore m.a.s spaces: 2-1"
"ignore m.a.s spaces: 2+0"
"ignore m.a.s spaces: 2+1"
Normally sorted:
"ignore m.a.s spaces: 2+1"
"ignore m.a.s spaces: 2+0"
"ignore m.a.s spaces: 2-1"
"ignore m.a.s spaces: 2-2"
Naturally sorted:
"ignore m.a.s spaces: 2+0"
"ignore m.a.s spaces: 2+1"
"ignore m.a.s spaces: 2-1"
"ignore m.a.s spaces: 2-2"
 
# Equivalent whitespace characters
 
Text strings:
"Equiv. spaces: 3-3"
"Equiv.\rspaces: 3-2"
"Equiv.\fspaces: 3-1"
"Equiv.\vspaces: 3+0"
"Equiv.\nspaces: 3+1"
"Equiv.\tspaces: 3+2"
Normally sorted:
"Equiv.\tspaces: 3+2"
"Equiv.\nspaces: 3+1"
"Equiv.\vspaces: 3+0"
"Equiv.\fspaces: 3-1"
"Equiv.\rspaces: 3-2"
"Equiv. spaces: 3-3"
Naturally sorted:
"Equiv. spaces: 3+0"
"Equiv. spaces: 3+1"
"Equiv. spaces: 3+2"
"Equiv. spaces: 3-1"
"Equiv. spaces: 3-2"
"Equiv. spaces: 3-3"
 
# Case Indepenent sorts
 
Text strings:
"cASE INDEPENENT: 3-2"
"caSE INDEPENENT: 3-1"
"casE INDEPENENT: 3+0"
"case INDEPENENT: 3+1"
Normally sorted:
"cASE INDEPENENT: 3-2"
"caSE INDEPENENT: 3-1"
"casE INDEPENENT: 3+0"
"case INDEPENENT: 3+1"
Naturally sorted:
"case indepenent: 3+0"
"case indepenent: 3+1"
"case indepenent: 3-1"
"case indepenent: 3-2"
 
# Numeric fields as numerics
 
Text strings:
"foo100bar99baz0.txt"
"foo100bar10baz0.txt"
"foo1000bar99baz10.txt"
"foo1000bar99baz9.txt"
Normally sorted:
"foo1000bar99baz10.txt"
"foo1000bar99baz9.txt"
"foo100bar10baz0.txt"
"foo100bar99baz0.txt"
Naturally sorted:
[100,10,0]
[100,99,0]
[1000,99,9]
[1000,99,10]
 
# Title sorts
 
Text strings:
"The Wind in the Willows"
"The 40th step more"
"The 39 steps"
"Wanda"
Normally sorted:
"The 39 steps"
"The 40th step more"
"The Wind in the Willows"
"Wanda"
Naturally sorted:
"39 steps"
"40th step more"
"Wanda"
"Wind in the Willows"
</pre>
 
Line 1,731 ⟶ 2,905:
The natural way of approaching this task in J is to normalize the text based on the rules desired for sorting. Here, we limit ourselves to ascii, for portability, and decide that our domain shall be terminated strings (where each string ends with the same character - typically a newline):
 
<langsyntaxhighlight lang="j">require'strings regex'
lines=: <;.2
Line 1,740 ⟶ 2,914:
norm=: [: split (32 9 12 13 14 15{a.) -.~ [: titleFix tolower
natSor=: lines ;@/: norm&.>@lines</langsyntaxhighlight>
 
Example data:
 
<langsyntaxhighlight lang="j">IgnoringLeadingSpaces=:0 :0
ignore leading spaces: 2-2
ignore leading spaces: 2-1
Line 1,788 ⟶ 2,962:
The 39 steps
Wanda
)</langsyntaxhighlight>
 
Note that the required example which contains equivalent whitespace characters includes a '\n' in the data. So, for that example, we use a backslash as our terminator.
Line 1,794 ⟶ 2,968:
Example use:
 
<langsyntaxhighlight lang="j"> natSor IgnoringLeadingSpaces
ignore leading spaces: 2+0
ignore leading spaces: 2+1
Line 1,828 ⟶ 3,002:
The Wind in the Willows
 
</syntaxhighlight>
</lang>
 
=={{header|JavaScript}}==
Line 1,834 ⟶ 3,008:
Implements the first four rules. Rule 4 works for digits up to 20 characters.
 
<syntaxhighlight lang="javascript">
<lang JavaScript>
var nsort = function(input) {
var e = function(s) {
Line 1,854 ⟶ 3,028:
]));
// -> ['\nfile9.txt', 'file10.txt', 'File11.TXT', 'file12.txt']
</syntaxhighlight>
</lang>
 
=={{header|jq}}==
Line 1,871 ⟶ 3,045:
matter therefore comes down to the filter named "splitup", which for
clarity, we define here as a top-level function, as follows:
<langsyntaxhighlight lang="jq">def splitup:
def tidy: if .[0] == "" then .[1:] else . end | if .[length-1] == "" then .[0:length-1] else . end ;
 
Line 1,928 ⟶ 3,102:
| splitup # embedded integers
;
sort_by(naturally);</langsyntaxhighlight>
'''Testing'''
For brevity, we use the input as given above, but modified slightly so that it can be read
in as valid JSON. For example, the comments have been quoted. With these adjustments, the test driver can be written as a one-liner:
<langsyntaxhighlight lang="jq">if type == "string" then "", . else natural_sort end</langsyntaxhighlight>
{{out}} (scrollable)
<div style="overflow:scroll; height:400px;">
<langsyntaxhighlight lang="sh">jq -r -f Natural_sorting.jq Natural_sorting.json
 
# Ignoring leading spaces
Line 1,985 ⟶ 3,159:
"Wanda",
"The Wind in the Willows"
]</langsyntaxhighlight>
</div>
 
=={{header|Julia}}==
The functional programming principle used was to customize the "lt" comparison option of Julia's basic sort() to the "natural" sort features required.
<syntaxhighlight lang="julia">#1
natural1(x, y) = strip(x) < strip(y)
 
#2
natural2(x, y) = replace(x, r"\s+" => " ") < replace(y, r"\s+" => " ")
 
#3
natural3(x, y) = lowercase(x) < lowercase(y)
 
#4
splitbynum(x) = split(x, r"(?<=\D)(?=\d)|(?<=\d)(?=\D)")
numstringtonum(arr) = [(n = tryparse(Float32, e)) != nothing ? n : e for e in arr]
function natural4(x, y)
xarr = numstringtonum(splitbynum(x))
yarr = numstringtonum(splitbynum(y))
for i in 1:min(length(xarr), length(yarr))
if typeof(xarr[i]) != typeof(yarr[i])
a = string(xarr[i]); b = string(yarr[i])
else
a = xarr[i]; b = yarr[i]
end
if a == b
continue
else
return a < b
end
end
return length(xarr) < length(yarr)
end
 
#5
deart(x) = replace(x, r"^The\s+|^An\s+|^A\s+"i => "")
natural5(x, y) = deart(x) < deart(y)
 
#6
const accentdict = Dict(
'À'=> 'A', 'Á'=> 'A', 'Â'=> 'A', 'Ã'=> 'A', 'Ä'=> 'A',
'Å'=> 'A', 'Ç'=> 'C', 'È'=> 'E', 'É'=> 'E',
'Ê'=> 'E', 'Ë'=> 'E', 'Ì'=> 'I', 'Í'=> 'I', 'Î'=> 'I',
'Ï'=> 'I', 'Ñ'=> 'N', 'Ò'=> 'O', 'Ó'=> 'O',
'Ô'=> 'O', 'Õ'=> 'O', 'Ö'=> 'O', 'Ù'=> 'U', 'Ú'=> 'U',
'Û'=> 'U', 'Ü'=> 'U', 'Ý'=> 'Y', 'à'=> 'a', 'á'=> 'a',
'â'=> 'a', 'ã'=> 'a', 'ä'=> 'a', 'å'=> 'a', 'è'=> 'e',
'é'=> 'e', 'ê'=> 'e', 'ë'=> 'e', 'ì'=> 'i', 'í'=> 'i',
'î'=> 'i', 'ï'=> 'i', 'ð'=> 'd', 'ñ'=> 'n', 'ò'=> 'o',
'ó'=> 'o', 'ô'=> 'o', 'õ'=> 'o', 'ö'=> 'o', 'ù'=> 'u',
'ú'=> 'u', 'û'=> 'u', 'ü'=> 'u', 'ý'=> 'y', 'ÿ'=> 'y')
function tr(str, dict=accentdict)
for (i, ch) in enumerate(str)
if haskey(dict, ch)
arr = split(str, "")
arr[i] = string(dict[ch])
str = join(arr)
end
end
str
end
 
natural6(x, y) = tr(x) < tr(y)
 
#7
const ligaturedict = Dict('œ' => "oe", 'Œ' => "OE", 'æ' => "ae", 'Æ' => "AE", 'IJ' => "IJ")
natural7(x, y) = tr(x, ligaturedict) < tr(y, ligaturedict)
 
#8
const altsdict = Dict('ß' => "ss", 'ſ' => 's', 'ʒ' => 's')
natural8(x, y) = tr(x, altsdict) < tr(y, altsdict)
 
preprocessors = [natural1, natural2, natural2, natural3, natural4, natural5, natural6, natural7, natural8]
 
const testarrays = Vector{Vector{String}}([
["ignore leading spaces: 2-2", " ignore leading spaces: 2-1", " ignore leading spaces: 2+0", " ignore leading spaces: 2+1"],
["ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1"],
["Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1", "Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2"],
["cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1", "casE INDEPENENT: 3+0", "case INDEPENENT: 3+1"],
["foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt"],
["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"],
["Equiv. ý accents: 2-2", "Equiv. Ý accents: 2-1", "Equiv. y accents: 2+0", "Equiv. Y accents: 2+1"],
["IJ ligatured ij", "no ligature"],
["Start with an ʒ: 2-2", "Start with an ſ: 2-1", "Start with an ß: 2+0", "Start with an s: 2+1"]])
 
for (i, ltfunction) in enumerate(preprocessors)
println("Testing sorting mod number $i. Sorted is: $(sort(testarrays[i], lt=ltfunction)).")
end
</syntaxhighlight>{{output}}<pre>
Testing sorting mod number 1. Sorted is: [" ignore leading spaces: 2+0", " ignore leading spaces: 2+1", " ignore leading spaces: 2-1", "ignore leading spaces: 2-2"].
Testing sorting mod number 2. Sorted is: ["ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2-2"].
Testing sorting mod number 3. Sorted is: ["Equiv.\vspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2", "Equiv.\fspaces: 3-1", "Equiv.\rspaces: 3-2", "Equiv. spaces: 3-3"].
Testing sorting mod number 4. Sorted is: ["casE INDEPENENT: 3+0", "case INDEPENENT: 3+1", "caSE INDEPENENT: 3-1", "cASE INDEPENENT: 3-2"].
Testing sorting mod number 5. Sorted is: ["foo100bar10baz0.txt", "foo100bar99baz0.txt", "foo1000bar99baz9.txt", "foo1000bar99baz10.txt"].
Testing sorting mod number 6. Sorted is: ["The 39 steps", "The 40th step more", "Wanda", "The Wind in the Willows"].
Testing sorting mod number 7. Sorted is: ["Equiv. Y accents: 2+1", "Equiv. Ý accents: 2-1", "Equiv. y accents: 2+0", "Equiv. ý accents: 2- 2"].
Testing sorting mod number 8. Sorted is: ["IJ ligatured ij", "no ligature"].
Testing sorting mod number 9. Sorted is: ["Start with an s: 2+1", "Start with an ſ: 2-1", "Start with an ʒ: 2-2", "Start with an ß: 2+0"].
</pre>
 
=={{header|Kotlin}}==
<syntaxhighlight lang="scala">// version 1.1.4-3
 
val r2 = Regex("""[ ]{2,}""")
val r3 = Regex("""\s""") // \s represents any whitespace character
val r5 = Regex("""\d+""")
 
/** Only covers ISO-8859-1 accented characters plus (for consistency) Ÿ */
val ucAccented = arrayOf("ÀÁÂÃÄÅ", "Ç", "ÈÉÊË", "ÌÍÎÏ", "Ñ", "ÒÓÔÕÖØ", "ÙÚÛÜ", "ÝŸ")
val lcAccented = arrayOf("àáâãäå", "ç", "èéêë", "ìíîï", "ñ", "òóôõöø", "ùúûü", "ýÿ")
val ucNormal = "ACEINOUY"
val lcNormal = "aceinouy"
 
/** Only the commoner ligatures */
val ucLigatures = "ÆIJŒ"
val lcLigatures = "æijœ"
val ucSeparated = arrayOf("AE", "IJ", "OE")
val lcSeparated = arrayOf("ae", "ij", "oe")
 
/** Miscellaneous replacements */
val miscLetters = "ßſʒ"
val miscReplacements = arrayOf("ss", "s", "s")
 
/** Displays strings including whitespace as if the latter were literal characters */
fun String.toDisplayString(): String {
val whitespace = arrayOf("\t", "\n", "\u000b", "\u000c", "\r")
val whitespace2 = arrayOf("\\t", "\\n", "\\u000b", "\\u000c", "\\r")
var s = this
for (i in 0..4) s = s.replace(whitespace[i], whitespace2[i])
return s
}
 
/** Ignoring leading space(s) */
fun selector1(s: String) = s.trimStart(' ')
 
/** Ignoring multiple adjacent spaces i.e. condensing to a single space */
fun selector2(s: String) = s.replace(r2, " ")
 
/** Equivalent whitespace characters (equivalent to a space say) */
fun selector3(s: String) = s.replace(r3, " ")
 
/** Case independent sort */
fun selector4(s: String) = s.toLowerCase()
 
/** Numeric fields as numerics (deals with up to 20 digits) */
fun selector5(s: String) = r5.replace(s) { it.value.padStart(20, '0') }
 
/** Title sort */
fun selector6(s: String): String {
if (s.startsWith("the ", true)) return s.drop(4)
if (s.startsWith("an ", true)) return s.drop(3)
if (s.startsWith("a ", true)) return s.drop(2)
return s
}
 
/** Equivalent accented characters (and case) */
fun selector7(s: String): String {
val sb = StringBuilder()
outer@ for (c in s) {
for ((i, ucs) in ucAccented.withIndex()) {
if (c in ucs) {
sb.append(ucNormal[i])
continue@outer
}
}
for ((i, lcs) in lcAccented.withIndex()) {
if (c in lcs) {
sb.append(lcNormal[i])
continue@outer
}
}
sb.append(c)
}
return sb.toString().toLowerCase()
}
 
/** Separated ligatures */
fun selector8(s: String): String {
var ss = s
for ((i, c) in ucLigatures.withIndex()) ss = ss.replace(c.toString(), ucSeparated[i])
for ((i, c) in lcLigatures.withIndex()) ss = ss.replace(c.toString(), lcSeparated[i])
return ss
}
 
/** Character replacements */
fun selector9(s: String): String {
var ss = s
for ((i, c) in miscLetters.withIndex()) ss = ss.replace(c.toString(), miscReplacements[i])
return ss
}
 
fun main(args: Array<String>) {
println("The 9 string lists, sorted 'naturally':\n")
val s1 = arrayOf(
"ignore leading spaces: 2-2",
" ignore leading spaces: 2-1",
" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1"
)
s1.sortBy(::selector1)
println(s1.map { "'$it'" }.joinToString("\n"))
 
val s2 = arrayOf(
"ignore m.a.s spaces: 2-2",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1"
)
println()
s2.sortBy(::selector2)
println(s2.map { "'$it'" }.joinToString("\n"))
val s3 = arrayOf(
"Equiv. spaces: 3-3",
"Equiv.\rspaces: 3-2",
"Equiv.\u000cspaces: 3-1",
"Equiv.\u000bspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2"
)
println()
s3.sortBy(::selector3)
println(s3.map { "'$it'".toDisplayString() }.joinToString("\n"))
 
val s4 = arrayOf(
"cASE INDEPENENT: 3-2",
"caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1"
)
println()
s4.sortBy(::selector4)
println(s4.map { "'$it'" }.joinToString("\n"))
 
val s5 = arrayOf(
"foo100bar99baz0.txt",
"foo100bar10baz0.txt",
"foo1000bar99baz10.txt",
"foo1000bar99baz9.txt"
)
println()
s5.sortBy(::selector5)
println(s5.map { "'$it'" }.joinToString("\n"))
 
val s6 = arrayOf(
"The Wind in the Willows",
"The 40th step more",
"The 39 steps",
"Wanda"
)
println()
s6.sortBy(::selector6)
println(s6.map { "'$it'" }.joinToString("\n"))
 
val s7 = arrayOf(
"Equiv. ý accents: 2-2",
"Equiv. Ý accents: 2-1",
"Equiv. y accents: 2+0",
"Equiv. Y accents: 2+1"
)
println()
s7.sortBy(::selector7)
println(s7.map { "'$it'" }.joinToString("\n"))
 
val s8 = arrayOf(
"IJ ligatured ij",
"no ligature"
)
println()
s8.sortBy(::selector8)
println(s8.map { "'$it'" }.joinToString("\n"))
 
val s9 = arrayOf(
"Start with an ʒ: 2-2",
"Start with an ſ: 2-1",
"Start with an ß: 2+0",
"Start with an s: 2+1"
)
println()
s9.sortBy(::selector9)
println(s9.map { "'$it'" }.joinToString("\n"))
}</syntaxhighlight>
 
{{out}}
<pre>
The 9 string lists, sorted 'naturally':
 
' ignore leading spaces: 2+0'
' ignore leading spaces: 2+1'
' ignore leading spaces: 2-1'
'ignore leading spaces: 2-2'
 
'ignore m.a.s spaces: 2+0'
'ignore m.a.s spaces: 2+1'
'ignore m.a.s spaces: 2-1'
'ignore m.a.s spaces: 2-2'
 
'Equiv.\u000bspaces: 3+0'
'Equiv.\nspaces: 3+1'
'Equiv.\tspaces: 3+2'
'Equiv.\u000cspaces: 3-1'
'Equiv.\rspaces: 3-2'
'Equiv. spaces: 3-3'
 
'casE INDEPENENT: 3+0'
'case INDEPENENT: 3+1'
'caSE INDEPENENT: 3-1'
'cASE INDEPENENT: 3-2'
 
'foo100bar10baz0.txt'
'foo100bar99baz0.txt'
'foo1000bar99baz9.txt'
'foo1000bar99baz10.txt'
 
'The 39 steps'
'The 40th step more'
'Wanda'
'The Wind in the Willows'
 
'Equiv. y accents: 2+0'
'Equiv. Y accents: 2+1'
'Equiv. Ý accents: 2-1'
'Equiv. ý accents: 2-2'
 
'IJ ligatured ij'
'no ligature'
 
'Start with an s: 2+1'
'Start with an ſ: 2-1'
'Start with an ʒ: 2-2'
'Start with an ß: 2+0'
</pre>
 
=={{header|Nim}}==
All the rules are implemented.
 
To sort a sequence of strings, we can use the function “sort” which accepts a comparison function or the template “sortByIt” which accepts an expression. In order to be able to use both methods, we have defined a function to build a “natural order key” which can be compared. The comparison function “naturalCmp” uses these keys.
 
To build a key, we use the procedure “unidecode” which convert the UTF-8 string into an ASCII string, simplifying the implementation of the other rules. Unfortunately, the letter 'ʒ' is translated as 'z', so we have to convert this letter before calling “unidecode”.
 
<syntaxhighlight lang="nim">import algorithm, parseutils, pegs, strutils, unidecode
 
type
 
Kind = enum fString, fNumber
 
KeyItem= object
case kind: Kind
of fString: str: string
of fNumber: num: Natural
 
Key = seq[KeyItem]
 
 
func cmp(a, b: Key): int =
## Compare two keys.
 
for i in 0..<min(a.len, b.len):
let ai = a[i]
let bi = b[i]
if ai.kind == bi.kind:
result = if ai.kind == fString: cmp(ai.str, bi.str) else: cmp(ai.num, bi.num)
if result != 0: return
else:
return if ai.kind == fString: 1 else: -1
result = if a.len < b.len: -1 else: (if a.len == b.len: 0 else: 1)
 
 
proc natOrderKey(s: string): Key =
## Return the natural order key for a string.
 
# Process 'ʒ' separately as "unidecode" converts it to 'z'.
var s = s.replace("ʒ", "s")
 
# Transform UTF-8 text into ASCII text.
s = s.unidecode()
 
# Remove leading and trailing white spaces.
s = s.strip()
 
# Make all whitespace characters equivalent and remove adjacent spaces.
s = s.replace(peg"\s+", " ")
 
# Switch to lower case.
s = s.toLowerAscii()
 
# Remove leading "the ".
if s.startsWith("the ") and s.len > 4: s = s[4..^1]
 
# Split into fields.
var idx = 0
var val: int
while idx < s.len:
var n = s.skipUntil(Digits, start = idx)
if n != 0:
result.add KeyItem(kind: fString, str: s[idx..<(idx + n)])
inc idx, n
n = s.parseInt(val, start = idx)
if n != 0:
result.add KeyItem(kind: fNumber, num: val)
inc idx, n
 
 
proc naturalCmp(a, b: string): int =
## Natural order comparison function.
cmp(a.natOrderKey, b.natOrderKey)
 
 
when isMainModule:
 
proc test(title: string; list: openArray[string]) =
echo title
echo sorted(list, naturalCmp)
echo ""
 
test("Ignoring leading spaces.",
["ignore leading spaces: 2-2",
"ignore leading spaces: 2-1",
"ignore leading spaces: 2+0",
"ignore leading spaces: 2+1"])
 
test("Ignoring multiple adjacent spaces (MAS).",
["ignore MAS spaces: 2-2",
"ignore MAS spaces: 2-1",
"ignore MAS spaces: 2+0",
"ignore MAS spaces: 2+1"])
 
test("Equivalent whitespace characters.",
["Equiv. spaces: 3-3",
"Equiv. \rspaces: 3-2",
"Equiv. \x0cspaces: 3-1",
"Equiv. \x0bspaces: 3+0",
"Equiv. \nspaces: 3+1",
"Equiv. \tspaces: 3+2"])
 
test("Case Independent sort.",
["cASE INDEPENDENT: 3-2",
"caSE INDEPENDENT: 3-1",
"casE INDEPENDENT: 3+0",
"case INDEPENDENT: 3+1"])
 
test("Numeric fields as numerics.",
["foo100bar99baz0.txt",
"foo100bar10baz0.txt",
"foo1000bar99baz10.txt",
"foo1000bar99baz9.txt"])
 
test("Title sorts.",
["The Wind in the Willows",
"The 40th step more",
"The 39 steps",
"Wanda"])
 
test("Equivalent accented characters (and case).",
["Equiv. ý accents: 2-2",
"Equiv. Ý accents: 2-1",
"Equiv. y accents: 2+0",
"Equiv. Y accents: 2+1"])
 
test("Separated ligatures.",
["IJ ligatured ij",
"no ligature"])
 
test("Character replacements.",
["Start with an ʒ: 2-2",
"Start with an ſ: 2-1",
"Start with an ß: 2+0",
"Start with an s: 2+1"])</syntaxhighlight>
 
{{out}}
<pre>Ignoring leading spaces.
@["ignore leading spaces: 2+0", "ignore leading spaces: 2+1", "ignore leading spaces: 2-1", "ignore leading spaces: 2-2"]
 
Ignoring multiple adjacent spaces (MAS).
@["ignore MAS spaces: 2+0", "ignore MAS spaces: 2+1", "ignore MAS spaces: 2-1", "ignore MAS spaces: 2-2"]
 
Equivalent whitespace characters.
@["Equiv. \vspaces: 3+0", "Equiv. \nspaces: 3+1", "Equiv. \tspaces: 3+2", "Equiv. \fspaces: 3-1", "Equiv. \cspaces: 3-2", "Equiv. spaces: 3-3"]
 
Case Independent sort.
@["casE INDEPENDENT: 3+0", "case INDEPENDENT: 3+1", "caSE INDEPENDENT: 3-1", "cASE INDEPENDENT: 3-2"]
 
Numeric fields as numerics.
@["foo100bar10baz0.txt", "foo100bar99baz0.txt", "foo1000bar99baz9.txt", "foo1000bar99baz10.txt"]
 
Title sorts.
@["The 39 steps", "The 40th step more", "Wanda", "The Wind in the Willows"]
 
Equivalent accented characters (and case).
@["Equiv. y accents: 2+0", "Equiv. Y accents: 2+1", "Equiv. Ý accents: 2-1", "Equiv. ý accents: 2-2"]
 
Separated ligatures.
@["IJ ligatured ij", "no ligature"]
 
Character replacements.
@["Start with an s: 2+1", "Start with an ſ: 2-1", "Start with an ʒ: 2-2", "Start with an ß: 2+0"]
</pre>
 
=={{header|Pascal}}==
Line 1,996 ⟶ 3,666:
 
The "structured" features of Pascal do not facilitate escape from loops, so, ... some <code>goto</code> atavisms appear in what follows...
<syntaxhighlight lang="pascal">
<lang Pascal>
Program Natural; Uses DOS, crt; {Simple selection.}
{Demonstrates a "natural" order of sorting text with nameish parts.}
Line 2,241 ⟶ 3,911:
 
END.
</syntaxhighlight>
</lang>
Output, with "!" instead of a backslash to prevent context confusions here:
Text order Natural order
Line 2,275 ⟶ 3,945:
This implements all 8 requirements<sup>*</sup>:
 
<langsyntaxhighlight lang="perl">
use feature 'fc';
use Unicode::Normalize;
Line 2,298 ⟶ 3,968:
} @items;
}
</syntaxhighlight>
</lang>
 
: <sup>*)</sup> Note that decomposing the strings to the NFKD normalization form and subsequently stripping off all code points of the <code>Nonspacing_Mark</code> category, removes differences caused by accents / ligatures / alternate character forms / etc. in a standards-compliant way. This coincides with all the examples given in the task description, with the exception that it does ''not'' replace "ʒ" with "s" &mdash; one could add <pre style="display:inline;padding:0.3em">$str =~ tr/ʒ/s/;</pre> for that but it seems a bit [[wp:International_Phonetic_Alphabet_chart_for_English_dialects#Chart|whimsical]].)
Line 2,304 ⟶ 3,974:
'''Testing:'''
 
<langsyntaxhighlight lang="perl">
use utf8; # interpret this script's source code as UTF8
use Test::More; # for plan(), is_deeply()
Line 2,335 ⟶ 4,005:
print "\n";
}
</syntaxhighlight>
</lang>
 
{{out}}
Line 2,409 ⟶ 4,079:
</pre>
 
=={{header|Perl 6Phix}}==
As per C, common words anywhere in the string are omitted. All eight features. <br>
{{works with|rakudo|2015-11-02}}
Needs chcp 65001 (or 28591) to get this to work on Windows, be sure to save as utf8.
 
<!--<syntaxhighlight lang="phix">-->
In Perl6 it is very easy to modify the default sorting order by passing in a
<span style="color: #000080;font-style:italic;">--
transform routine to the sort function. If the transform routines are arity one,
-- demo/rosetta/Natural_sorting2.exw
the sort function will apply a Schwartzian Transform so it only needs to calculate
--</span>
the transform once. Note that the transforms are non-destructive; The sort function
<span style="color: #008080;">function</span> <span style="color: #000000;">utf32ch</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
returns the original strings.
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf8_to_utf32</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">s</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">common</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"the"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"it"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"to"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"a"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"of"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"is"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #000000;">al</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ac_replacements</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">columnize</span><span style="color: #0000FF;">({</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Æ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"AE"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"æ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ae"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Þ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"TH"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"þ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"th"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ð"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"TH"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ð"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"th"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ß"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ss"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"fi"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"fl"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'s'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"’"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'z'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"À"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Á"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Â"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ã"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ä"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Å"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"à"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"á"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"â"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ã"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ä"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"å"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ç"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'C'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ç"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'c'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"È"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"É"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ê"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ë"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"è"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"é"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"ê"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ë"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ì"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Í"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Î"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ï"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ì"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"í"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"î"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ï"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ò"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ó"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ô"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Õ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ö"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ø"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"ò"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ó"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ô"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"õ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"ö"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ø"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ñ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'N'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ñ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'n'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ù"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ú"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Û"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ü"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"ù"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ú"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"û"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ü"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ý"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'Y'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ÿ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'y'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ý"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'y'</span><span style="color: #0000FF;">}}),</span>
<span style="color: #000000;">accents_and_ligatures</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf32ch</span><span style="color: #0000FF;">(</span><span style="color: #000000;">al</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">normalise</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">utf32</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf8_to_utf32</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">i</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">prev</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">utf32</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf32</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" \t\r\n\x0b\x0c"</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)></span><span style="color: #000000;">0</span> <span style="color: #008080;">and</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">!=</span><span style="color: #008000;">' '</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #0000FF;">-</span><span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">' '</span>
<span style="color: #008080;">elsif</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"0123456789"</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)=</span><span style="color: #000000;">0</span> <span style="color: #008080;">or</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">!=</span><span style="color: #008000;">'0'</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">-</span><span style="color: #008000;">'0'</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">res</span><span style="color: #0000FF;">[$]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[$]*</span><span style="color: #000000;">10</span><span style="color: #0000FF;">+</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">-</span><span style="color: #008000;">'0'</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">'0'</span>
<span style="color: #008080;">else</span>
<span style="color: #004080;">object</span> <span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #000000;">accents_and_ligatures</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">rep</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ac_replacements</span><span style="color: #0000FF;">[</span><span style="color: #000000;">rep</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #004080;">sequence</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[$])</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span><span style="color: #0000FF;">[$]</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">rep</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #008000;">""</span><span style="color: #0000FF;">&</span><span style="color: #000000;">rep</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">ch</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">common</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">while</span> <span style="color: #000000;">1</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">k</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">common</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> <span style="color: #008080;">exit</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">..</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]=-</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">..$]</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">=</span><span style="color: #008000;">' '</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..$-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">tests</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">" leading spaces: 4"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">" leading spaces: 3"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"leading spaces: 2"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">" leading spaces: 1"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"adjacent spaces: 3"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"adjacent spaces: 4"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"adjacent spaces: 1"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"adjacent spaces: 2"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"white space: 3-2"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"white\r space: 3-3"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"white\x0cspace: 3-1"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"white\x0bspace: 3+0"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"white\n space: 3+1"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"white\t space: 3+2"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"caSE independent: 3-1"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"cASE independent: 3-2"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"casE independent: 3+0"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"case independent: 3+1"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"foo1000bar99baz9.txt"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo100bar99baz0.txt"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo100bar10baz0.txt"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo1000bar99baz10.txt"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"foo1bar"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo100bar"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo bar"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"foo1000bar"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"The Wind in the Willows"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"The 40th step more"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"The 39 steps"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"Wanda"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"ignore ý accents: 2-2"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"ignore Ý accents: 2-1"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"ignore y accents: 2+0"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"ignore Y accents: 2+1"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"Ball"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Card"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"above"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"aether"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"apple"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"autumn"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"außen"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"bald"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"car"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"e-mail"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"evoke"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"nina"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"niño"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Æon"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Évian"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"æon"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">}</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">n</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">tags</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">natural</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">compare</span><span style="color: #0000FF;">(</span><span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tests</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">s</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">tests</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #000000;">n</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">t</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">))</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">normalise</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #000000;">tags</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">custom_sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">routine_id</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"natural"</span><span style="color: #0000FF;">),</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)))</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">3</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- clean up the whitespace mess</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">substitute_all</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">],{</span><span style="color: #008000;">"\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\t"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"\\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\t"</span><span style="color: #0000FF;">})</span>
<span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">substitute_all</span><span style="color: #0000FF;">(</span><span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">],{</span><span style="color: #008000;">"\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\t"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"\\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\t"</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s %-30s %-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">"original"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"normal"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"natural"</span><span style="color: #0000FF;">})</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s %-30s %-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">"========"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"======"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"======="</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tags</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s|%-30s|%-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">],</span><span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">],</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">tags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]]})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</syntaxhighlight>-->
 
{{Out}}
The following are a series of subroutines to perform the various natural
sorting transforms. They may be applied individually or mixed and matched
to get the particular result desired. When more than one is strung
together, they apply left to right. Some combinations may yield
different results depending on the order they are applied.
 
<lang perl6># Sort groups of digits in number order. Sort by order of magnitude then lexically.
sub naturally ($a) { $a.lc.subst(/(\d+)/, ->$/ {0~$0.chars.chr~$0},:g) ~"\x0"~$a }
 
# Collapse multiple ws characters to a single.
sub collapse ($a) { $a.subst( / ( \s ) $0+ /, -> $/ { $0 }, :g ) }
 
# Convert all ws characters to a space.
sub normalize ($a) { $a.subst( / ( \s ) /, ' ', :g ) }
 
# Ignore common leading articles for title sorts
sub title ($a) { $a.subst( / :i ^ ( a | an | the ) >> \s* /, '' ) }
 
# Decompose ISO-Latin1 glyphs to their base character.
sub latin1_decompose ($a) {
$a.trans: <
Æ AE æ ae Þ TH þ th Ð TH ð th ß ss À A Á A Â A Ã A Ä A Å A à a á a
â a ã a ä a å a Ç C ç c È E É E Ê E Ë E è e é e ê e ë e Ì I Í I Î
I Ï I ì i í i î i ï i Ò O Ó O Ô O Õ O Ö O Ø O ò o ó o ô o õ o ö o
ø o Ñ N ñ n Ù U Ú U Û U Ü U ù u ú u û u ü u Ý Y ÿ y ý y
>.hash;
}
 
# Used as:
 
my @tests = (
[
"Task 1a\nSort while ignoring leading spaces.",
[
'ignore leading spaces: 1', ' ignore leading spaces: 4',
' ignore leading spaces: 3', ' ignore leading spaces: 2'
],
{.trim} # builtin method.
],
[
"Task 1b\nSort while ignoring multiple adjacent spaces.",
[
'ignore m.a.s spaces: 3', 'ignore m.a.s spaces: 1',
'ignore m.a.s spaces: 4', 'ignore m.a.s spaces: 2'
],
{.&collapse}
],
[
"Task 2\nSort with all white space normalized to regular spaces.",
[
"Normalized\tspaces: 4", "Normalized\xa0spaces: 1",
"Normalized\x20spaces: 2", "Normalized\nspaces: 3"
],
{.&normalize}
],
[
"Task 3\nSort case independently.",
[
'caSE INDEPENDENT: 3', 'casE INDEPENDENT: 2',
'cASE INDEPENDENT: 4', 'case INDEPENDENT: 1'
],
{.lc} # builtin method
],
[
"Task 4\nSort groups of digits in natural number order.",
[
<Foo100bar99baz0.txt foo100bar10baz0.txt foo1000bar99baz10.txt
foo1000bar99baz9.txt 201st 32nd 3rd 144th 17th 2 95>
],
{.&naturally}
],
[
"Task 5 ( mixed with 1, 2, 3 & 4 )\n"
~ "Sort titles, normalize white space, collapse multiple spaces to\n"
~ "single, trim leading white space, ignore common leading articles\n"
~ 'and sort digit groups in natural order.',
[
'The Wind in the Willows 8', ' The 39 Steps 3',
'The 7th Seal 1', 'Wanda 6',
'A Fish Called Wanda 5', ' The Wind and the Lion 7',
'Any Which Way But Loose 4', '12 Monkeys 2'
],
{.&normalize.&collapse.trim.&title.&naturally}
],
[
"Task 6, 7, 8\nMap letters in Latin1 that have accents or decompose to two\n"
~ 'characters to their base characters for sorting.',
[
<apple Ball bald car Card above Æon æon aether
niño nina e-mail Évian evoke außen autumn>
],
{.&latin1_decompose.&naturally}
]
);
 
 
for @tests -> $case {
my $code_ref = $case.pop;
my @array = $case.pop.list;
say $case.pop, "\n";
 
say "Standard Sort:\n";
.say for @array.sort;
 
say "\nNatural Sort:\n";
.say for @array.sort: {.$code_ref};
 
say "\n" ~ '*' x 40 ~ "\n";
}</lang>
 
Sample output:
<pre>
Task 1a
Sort while ignoring leading spaces.
 
C:\Program Files (x86)\Phix>p demo\rosetta\Natural_sorting.exw
Standard Sort:
original normal natural
======== ====== =======
leading spaces: 4 | leading spaces: 3 | leading spaces: 1
leading spaces: 3 | leading spaces: 4 |leading spaces: 2
leading spaces: 2 | leading spaces: 1 | leading spaces: 3
leading spaces: 1 |leading spaces: 2 | leading spaces: 4
 
original normal natural
ignore leading spaces: 4
======== ====== =======
ignore leading spaces: 3
adjacent spaces: 3 |adjacent spaces: 2 |adjacent spaces: 1
ignore leading spaces: 2
adjacent spaces: 4 |adjacent spaces: 1 |adjacent spaces: 2
ignore leading spaces: 1
adjacent spaces: 1 |adjacent spaces: 4 |adjacent spaces: 3
adjacent spaces: 2 |adjacent spaces: 3 |adjacent spaces: 4
 
original normal natural
Natural Sort:
======== ====== =======
white space: 3-2 |white\t space: 3+2 |white\x0bspace: 3+0
white\r space: 3-3 |white\n space: 3+1 |white\n space: 3+1
white\x0cspace: 3-1 |white\x0bspace: 3+0 |white\t space: 3+2
white\x0bspace: 3+0 |white\x0cspace: 3-1 |white\x0cspace: 3-1
white\n space: 3+1 |white\r space: 3-3 |white space: 3-2
white\t space: 3+2 |white space: 3-2 |white\r space: 3-3
 
original normal natural
ignore leading spaces: 1
======== ====== =======
ignore leading spaces: 2
caSE independent: 3-1 |cASE independent: 3-2 |casE independent: 3+0
ignore leading spaces: 3
cASE independent: 3-2 |caSE independent: 3-1 |case independent: 3+1
ignore leading spaces: 4
casE independent: 3+0 |casE independent: 3+0 |caSE independent: 3-1
case independent: 3+1 |case independent: 3+1 |cASE independent: 3-2
 
original normal natural
****************************************
======== ====== =======
foo1000bar99baz9.txt |foo1000bar99baz10.txt |foo100bar10baz0.txt
foo100bar99baz0.txt |foo1000bar99baz9.txt |foo100bar99baz0.txt
foo100bar10baz0.txt |foo100bar10baz0.txt |foo1000bar99baz9.txt
foo1000bar99baz10.txt |foo100bar99baz0.txt |foo1000bar99baz10.txt
 
original normal natural
Task 1b
======== ====== =======
Sort while ignoring multiple adjacent spaces.
foo1bar |foo bar |foo bar
foo100bar |foo1000bar |foo1bar
foo bar |foo100bar |foo100bar
foo1000bar |foo1bar |foo1000bar
 
original normal natural
Standard Sort:
======== ====== =======
The Wind in the Willows |The 39 steps |The 39 steps
The 40th step more |The 40th step more |The 40th step more
The 39 steps |The Wind in the Willows |Wanda
Wanda |Wanda |The Wind in the Willows
 
original normal natural
ignore m.a.s spaces: 4
======== ====== =======
ignore m.a.s spaces: 3
ignore ý accents: 2-2 |ignore Ý accents: 2-1 |ignore y accents: 2+0
ignore m.a.s spaces: 2
ignore Ý accents: 2-1 |ignore ý accents: 2-2 |ignore Y accents: 2+1
ignore m.a.s spaces: 1
ignore y accents: 2+0 |ignore Y accents: 2+1 |ignore Ý accents: 2-1
ignore Y accents: 2+1 |ignore y accents: 2+0 |ignore ý accents: 2-2
 
original normal natural
Natural Sort:
======== ====== =======
 
Ball |Æon |above
ignore m.a.s spaces: 1
Card |Évian |Æon
ignore m.a.s spaces: 2
above |æon |æon
ignore m.a.s spaces: 3
aether |Ball |aether
ignore m.a.s spaces: 4
apple |Card |apple
 
autumn |above |außen
****************************************
außen |aether |autumn
 
bald |apple |bald
Task 2
car |außen |Ball
Sort with all white space normalized to regular spaces.
e-mail |autumn |car
 
evoke |bald |Card
Standard Sort:
nina |car |e-mail
 
niño |e-mail |Évian
Normalized spaces: 4
Æon |evoke |evoke
Normalized
Évian |niño |nina
spaces: 3
æon |nina |niño
Normalized spaces: 2
Normalized spaces: 1
 
Natural Sort:
 
Normalized spaces: 1
Normalized spaces: 2
Normalized
spaces: 3
Normalized spaces: 4
 
****************************************
 
Task 3
Sort case independently.
 
Standard Sort:
 
cASE INDEPENDENT: 4
caSE INDEPENDENT: 3
casE INDEPENDENT: 2
case INDEPENDENT: 1
 
Natural Sort:
 
case INDEPENDENT: 1
casE INDEPENDENT: 2
caSE INDEPENDENT: 3
cASE INDEPENDENT: 4
 
****************************************
 
Task 4
Sort groups of digits in natural number order.
 
Standard Sort:
 
144th
17th
2
201st
32nd
3rd
95
Foo100bar99baz0.txt
foo1000bar99baz10.txt
foo1000bar99baz9.txt
foo100bar10baz0.txt
 
Natural Sort:
 
2
3rd
17th
32nd
95
144th
201st
foo100bar10baz0.txt
Foo100bar99baz0.txt
foo1000bar99baz9.txt
foo1000bar99baz10.txt
 
****************************************
 
Task 5 ( mixed with 1, 2, 3 & 4 )
Sort titles, normalize white space, collapse multiple spaces to
single, trim leading white space, ignore common leading articles
and sort digit groups in natural order.
 
Standard Sort:
 
The 39 Steps 3
The Wind and the Lion 7
12 Monkeys 2
A Fish Called Wanda 5
Any Which Way But Loose 4
The 7th Seal 1
The Wind in the Willows 8
Wanda 6
 
Natural Sort:
 
The 7th Seal 1
12 Monkeys 2
The 39 Steps 3
Any Which Way But Loose 4
A Fish Called Wanda 5
Wanda 6
The Wind and the Lion 7
The Wind in the Willows 8
 
****************************************
 
Task 6, 7, 8
Map letters in Latin1 that have accents or decompose to two
characters to their base characters for sorting.
 
Standard Sort:
 
Ball
Card
above
aether
apple
autumn
außen
bald
car
e-mail
evoke
nina
niño
Æon
Évian
æon
 
Natural Sort:
 
above
Æon
æon
aether
apple
außen
autumn
bald
Ball
car
Card
e-mail
Évian
evoke
nina
niño
 
****************************************
</pre>
 
=={{header|PicoLisp}}==
This parser takes care of features 1,2,3,4,5 and 8:
<langsyntaxhighlight PicoLisplang="picolisp">(de parseNatural (Str)
(clip
(make
Line 2,745 ⟶ 4,346:
"ß" "ss" "ſ" "s" "ʒ" "s" ) )
(unless (member Word '(the it to))
(link Word) ) ) ) ) ) ) ) )</langsyntaxhighlight>
Test:
<langsyntaxhighlight PicoLisplang="picolisp">: (parseNatural " ^MThe abc123Defß ^I Ghi ")
-> ("abc" 123 "defss" " " "ghi")</langsyntaxhighlight>
Sorting is trivial then:
<langsyntaxhighlight PicoLisplang="picolisp">(de naturalSort (Lst)
(by parseNatural sort Lst) )</langsyntaxhighlight>
Test:
<langsyntaxhighlight PicoLisplang="picolisp">(de *TestData
"# Ignoring leading spaces"
("ignore leading spaces: 2-2" " ignore leading spaces: 2-1" " ignore leading spaces: 2+0" " ignore leading spaces: 2+1")
Line 2,794 ⟶ 4,395:
(pythonOut "Normally sorted :" (sort (copy X)))
(pythonOut "Naturally sorted:" (naturalSort X))
(prinl) ) )</langsyntaxhighlight>
Output:
<pre># Ignoring leading spaces
Line 2,943 ⟶ 4,544:
'Start with an ʒ: 2-2',
'Start with an ß: 2+0']</pre>
 
=={{header|PowerShell}}==
<syntaxhighlight lang="powershell">
# six sorting
$Discard = '^a ', '^an ', '^the '
$List =
'ignore leading spaces: 2-2<==',
' ignore leading spaces: 2-1 <==',
' ignore leading spaces: 2+0 <==',
' ignore leading spaces: 2+1 <==',
'ignore m.a.s spaces: 2-2<==',
'ignore m.a.s spaces: 2-1<==',
'ignore m.a.s spaces: 2+0<==',
'ignore m.a.s spaces: 2+1<==',
'Equiv. spaces: 3-3<==',
"Equiv.`rspaces: 3-2<==",
"Equiv.`fspaces: 3-1<==",
"Equiv.`vspaces: 3+0<==",
"Equiv.`nspaces: 3+1<==",
"Equiv.`tspaces: 3+2<==",
'cASE INDEPENDENT: 3-2<==',
'caSE INDEPENDENT: 3-1<==',
'casE INDEPENDENT: 3+0<==',
'case INDEPENDENT: 3+1<==',
'Lâmpada accented characters<==',
'Lúdico accented characters<==',
' Lula Free !!! accented characters<==',
'Amanda accented characters<==',
'Ágata accented characters<==',
'Ångström accented characters<==',
'Ângela accented characters<==',
'À toa accented characters<==',
'ânsia accented characters<==',
'álibi accented characters<==',
'foo100bar99baz0.txt<==',
'foo100bar10baz0.txt<==',
'foo1000bar99baz10.txt<==',
'foo1000bar99baz9.txt<==',
'The Wind in the Willows<==',
'The 40th step more<==',
'The 39 steps<==',
'Wanda<=='
 
'List index sorting'
$List
' '
'Lexicographically sorting'
$List | Sort-Object
' '
'Natural sorting'
$List | Sort-Object -Property {
[Regex]::Replace(
(
(
& {
If ($_.Trim() -match ($Discard -join '|')) {
$_ -replace '^\s*[^\s]+\s*'
} Else {
$_.Trim()
}
}
) -replace '\s+'
), '\d+', { $args[0].Value.PadLeft(20) }
)
}
</syntaxhighlight>
{{Out}}
<pre>
List index sorting
ignore leading spaces: 2-2<==
ignore leading spaces: 2-1 <==
ignore leading spaces: 2+0 <==
ignore leading spaces: 2+1 <==
ignore m.a.s spaces: 2-2<==
ignore m.a.s spaces: 2-1<==
ignore m.a.s spaces: 2+0<==
ignore m.a.s spaces: 2+1<==
Equiv. spaces: 3-3<==
spaces: 3-2<==
Equiv.`fspaces: 3-1<==
Equiv.`vspaces: 3+0<==
Equiv.
spaces: 3+1<==
Equiv. spaces: 3+2<==
cASE INDEPENDENT: 3-2<==
caSE INDEPENDENT: 3-1<==
casE INDEPENDENT: 3+0<==
case INDEPENDENT: 3+1<==
Lâmpada accented characters<==
Lúdico accented characters<==
Lula Free !!! accented characters<==
Amanda accented characters<==
Ágata accented characters<==
Ångström accented characters<==
Ângela accented characters<==
À toa accented characters<==
ânsia accented characters<==
álibi accented characters<==
foo100bar99baz0.txt<==
foo100bar10baz0.txt<==
foo1000bar99baz10.txt<==
foo1000bar99baz9.txt<==
The Wind in the Willows<==
The 40th step more<==
The 39 steps<==
Wanda<==
 
Lexicographically sorting
Lula Free !!! accented characters<==
ignore leading spaces: 2+1 <==
ignore leading spaces: 2+0 <==
ignore leading spaces: 2-1 <==
À toa accented characters<==
Ágata accented characters<==
álibi accented characters<==
Amanda accented characters<==
Ângela accented characters<==
Ångström accented characters<==
ânsia accented characters<==
casE INDEPENDENT: 3+0<==
case INDEPENDENT: 3+1<==
caSE INDEPENDENT: 3-1<==
cASE INDEPENDENT: 3-2<==
Equiv. spaces: 3-3<==
Equiv. spaces: 3+2<==
Equiv.
spaces: 3+1<==
Equiv.`vspaces: 3+0<==
Equiv.`fspaces: 3-1<==
spaces: 3-2<==
foo1000bar99baz10.txt<==
foo1000bar99baz9.txt<==
foo100bar10baz0.txt<==
foo100bar99baz0.txt<==
ignore leading spaces: 2-2<==
ignore m.a.s spaces: 2+1<==
ignore m.a.s spaces: 2+0<==
ignore m.a.s spaces: 2-1<==
ignore m.a.s spaces: 2-2<==
Lâmpada accented characters<==
Lúdico accented characters<==
The 39 steps<==
The 40th step more<==
The Wind in the Willows<==
Wanda<==
 
Natural sorting
The 39 steps<==
The 40th step more<==
Ágata accented characters<==
álibi accented characters<==
Amanda accented characters<==
Ângela accented characters<==
Ångström accented characters<==
ânsia accented characters<==
À toa accented characters<==
caSE INDEPENDENT: 3-1<==
cASE INDEPENDENT: 3-2<==
casE INDEPENDENT: 3+0<==
case INDEPENDENT: 3+1<==
Equiv.`fspaces: 3-1<==
spaces: 3-2<==
Equiv. spaces: 3-3<==
Equiv.`vspaces: 3+0<==
Equiv.
spaces: 3+1<==
Equiv. spaces: 3+2<==
foo100bar10baz0.txt<==
foo100bar99baz0.txt<==
foo1000bar99baz9.txt<==
foo1000bar99baz10.txt<==
ignore leading spaces: 2-1 <==
ignore leading spaces: 2-2<==
ignore leading spaces: 2+0 <==
ignore leading spaces: 2+1 <==
ignore m.a.s spaces: 2-1<==
ignore m.a.s spaces: 2-2<==
ignore m.a.s spaces: 2+0<==
ignore m.a.s spaces: 2+1<==
Lâmpada accented characters<==
Lúdico accented characters<==
Lula Free !!! accented characters<==
Wanda<==
The Wind in the Willows<==
</pre>
 
=={{header|Python}}==
All eight features:
<langsyntaxhighlight lang="python"># -*- coding: utf-8 -*-
# Not Python 3.x (Can't compare str and int)
 
Line 3,100 ⟶ 4,886:
print 'Text strings:'; pp(txt)
print 'Normally sorted :'; print '\n'.join(sorted(txt))
print 'Naturally sorted:'; print '\n'.join(ns(txt))</langsyntaxhighlight>
 
===Sample Python output===
Line 3,250 ⟶ 5,036:
beginning/end, easy to implement, but sounds wrong).
 
<langsyntaxhighlight lang="racket">
#lang racket
(define (natural-sort l)
Line 3,272 ⟶ 5,058:
(shuffle '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x z" "x y")))
;; => '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x y" "x z")
</syntaxhighlight>
</lang>
 
=={{header|Raku}}==
(formerly Perl 6)
{{works with|rakudo|2015-11-02}}
 
In Raku it is very easy to modify the default sorting order by passing in a
transform routine to the sort function. If the transform routines are arity one,
the sort function will apply a Schwartzian Transform so it only needs to calculate
the transform once. Note that the transforms are non-destructive; The sort function
returns the original strings.
 
The following are a series of subroutines to perform the various natural
sorting transforms. They may be applied individually or mixed and matched
to get the particular result desired. When more than one is strung
together, they apply left to right. Some combinations may yield
different results depending on the order they are applied.
 
<syntaxhighlight lang="raku" line># Sort groups of digits in number order. Sort by order of magnitude then lexically.
sub naturally ($a) { $a.lc.subst(/(\d+)/, ->$/ {0~$0.chars.chr~$0},:g) ~"\x0"~$a }
 
# Collapse multiple ws characters to a single.
sub collapse ($a) { $a.subst( / ( \s ) $0+ /, -> $/ { $0 }, :g ) }
 
# Convert all ws characters to a space.
sub normalize ($a) { $a.subst( / ( \s ) /, ' ', :g ) }
 
# Ignore common leading articles for title sorts
sub title ($a) { $a.subst( / :i ^ ( a | an | the ) >> \s* /, '' ) }
 
# Decompose ISO-Latin1 glyphs to their base character.
sub latin1_decompose ($a) {
$a.trans: <
Æ AE æ ae Þ TH þ th Ð TH ð th ß ss À A Á A Â A Ã A Ä A Å A à a á a
â a ã a ä a å a Ç C ç c È E É E Ê E Ë E è e é e ê e ë e Ì I Í I Î
I Ï I ì i í i î i ï i Ò O Ó O Ô O Õ O Ö O Ø O ò o ó o ô o õ o ö o
ø o Ñ N ñ n Ù U Ú U Û U Ü U ù u ú u û u ü u Ý Y ÿ y ý y
>.hash;
}
 
# Used as:
 
my @tests = (
[
"Task 1a\nSort while ignoring leading spaces.",
[
'ignore leading spaces: 1', ' ignore leading spaces: 4',
' ignore leading spaces: 3', ' ignore leading spaces: 2'
],
{.trim} # builtin method.
],
[
"Task 1b\nSort while ignoring multiple adjacent spaces.",
[
'ignore m.a.s spaces: 3', 'ignore m.a.s spaces: 1',
'ignore m.a.s spaces: 4', 'ignore m.a.s spaces: 2'
],
{.&collapse}
],
[
"Task 2\nSort with all white space normalized to regular spaces.",
[
"Normalized\tspaces: 4", "Normalized\xa0spaces: 1",
"Normalized\x20spaces: 2", "Normalized\nspaces: 3"
],
{.&normalize}
],
[
"Task 3\nSort case independently.",
[
'caSE INDEPENDENT: 3', 'casE INDEPENDENT: 2',
'cASE INDEPENDENT: 4', 'case INDEPENDENT: 1'
],
{.lc} # builtin method
],
[
"Task 4\nSort groups of digits in natural number order.",
[
<Foo100bar99baz0.txt foo100bar10baz0.txt foo1000bar99baz10.txt
foo1000bar99baz9.txt 201st 32nd 3rd 144th 17th 2 95>
],
{.&naturally}
],
[
"Task 5 ( mixed with 1, 2, 3 & 4 )\n"
~ "Sort titles, normalize white space, collapse multiple spaces to\n"
~ "single, trim leading white space, ignore common leading articles\n"
~ 'and sort digit groups in natural order.',
[
'The Wind in the Willows 8', ' The 39 Steps 3',
'The 7th Seal 1', 'Wanda 6',
'A Fish Called Wanda 5', ' The Wind and the Lion 7',
'Any Which Way But Loose 4', '12 Monkeys 2'
],
{.&normalize.&collapse.trim.&title.&naturally}
],
[
"Task 6, 7, 8\nMap letters in Latin1 that have accents or decompose to two\n"
~ 'characters to their base characters for sorting.',
[
<apple Ball bald car Card above Æon æon aether
niño nina e-mail Évian evoke außen autumn>
],
{.&latin1_decompose.&naturally}
]
);
 
 
for @tests -> $case {
my $code_ref = $case.pop;
my @array = $case.pop.list;
say $case.pop, "\n";
 
say "Standard Sort:\n";
.say for @array.sort;
 
say "\nNatural Sort:\n";
.say for @array.sort: {.$code_ref};
 
say "\n" ~ '*' x 40 ~ "\n";
}</syntaxhighlight>
 
Sample output:
<pre>
Task 1a
Sort while ignoring leading spaces.
 
Standard Sort:
 
ignore leading spaces: 4
ignore leading spaces: 3
ignore leading spaces: 2
ignore leading spaces: 1
 
Natural Sort:
 
ignore leading spaces: 1
ignore leading spaces: 2
ignore leading spaces: 3
ignore leading spaces: 4
 
****************************************
 
Task 1b
Sort while ignoring multiple adjacent spaces.
 
Standard Sort:
 
ignore m.a.s spaces: 4
ignore m.a.s spaces: 3
ignore m.a.s spaces: 2
ignore m.a.s spaces: 1
 
Natural Sort:
 
ignore m.a.s spaces: 1
ignore m.a.s spaces: 2
ignore m.a.s spaces: 3
ignore m.a.s spaces: 4
 
****************************************
 
Task 2
Sort with all white space normalized to regular spaces.
 
Standard Sort:
 
Normalized spaces: 4
Normalized
spaces: 3
Normalized spaces: 2
Normalized spaces: 1
 
Natural Sort:
 
Normalized spaces: 1
Normalized spaces: 2
Normalized
spaces: 3
Normalized spaces: 4
 
****************************************
 
Task 3
Sort case independently.
 
Standard Sort:
 
cASE INDEPENDENT: 4
caSE INDEPENDENT: 3
casE INDEPENDENT: 2
case INDEPENDENT: 1
 
Natural Sort:
 
case INDEPENDENT: 1
casE INDEPENDENT: 2
caSE INDEPENDENT: 3
cASE INDEPENDENT: 4
 
****************************************
 
Task 4
Sort groups of digits in natural number order.
 
Standard Sort:
 
144th
17th
2
201st
32nd
3rd
95
Foo100bar99baz0.txt
foo1000bar99baz10.txt
foo1000bar99baz9.txt
foo100bar10baz0.txt
 
Natural Sort:
 
2
3rd
17th
32nd
95
144th
201st
foo100bar10baz0.txt
Foo100bar99baz0.txt
foo1000bar99baz9.txt
foo1000bar99baz10.txt
 
****************************************
 
Task 5 ( mixed with 1, 2, 3 & 4 )
Sort titles, normalize white space, collapse multiple spaces to
single, trim leading white space, ignore common leading articles
and sort digit groups in natural order.
 
Standard Sort:
 
The 39 Steps 3
The Wind and the Lion 7
12 Monkeys 2
A Fish Called Wanda 5
Any Which Way But Loose 4
The 7th Seal 1
The Wind in the Willows 8
Wanda 6
 
Natural Sort:
 
The 7th Seal 1
12 Monkeys 2
The 39 Steps 3
Any Which Way But Loose 4
A Fish Called Wanda 5
Wanda 6
The Wind and the Lion 7
The Wind in the Willows 8
 
****************************************
 
Task 6, 7, 8
Map letters in Latin1 that have accents or decompose to two
characters to their base characters for sorting.
 
Standard Sort:
 
Ball
Card
above
aether
apple
autumn
außen
bald
car
e-mail
evoke
nina
niño
Æon
Évian
æon
 
Natural Sort:
 
above
Æon
æon
aether
apple
außen
autumn
bald
Ball
car
Card
e-mail
Évian
evoke
nina
niño
 
****************************************
</pre>
 
=={{header|Ruby}}==
Requirements 1,2,3 and 5 are met in one line of code:
<langsyntaxhighlight lang="ruby">ar.sort_by{|str| str.downcase.gsub(/\Athe |\Aa |\Aan /, "").lstrip.gsub(/\s+/, " ")}</langsyntaxhighlight>
Almost all of the code below is handling requirement 4. The problem is that Ruby will happily sort ["a",1] against ["a",2] or even ["b"], but it does not know how to handle [1, "a"] against ["a", 2] and raises an ArgumentError. The code below does not define a new sort method, it defines a new class which is sortable by the existing method (falling back on string comparison).
<langsyntaxhighlight lang="ruby">class NatSortString
include Comparable
attr_reader :scrubbed, :ints_and_strings, :i_s_pattern
Line 3,302 ⟶ 5,395:
 
end
</syntaxhighlight>
</lang>
Demo:
<langsyntaxhighlight lang="ruby">tests =
{"Ignoring leading spaces" =>
[ "ignore leading spaces: 2-2 ", " ignore leading spaces: 2-1 ", " ignore leading spaces: 2+0 ", " ignore leading spaces: 2+1 "],
Line 3,322 ⟶ 5,415:
puts [title,"--input--", ar, "--normal sort--", ar.sort, "--natural sort--", nat_sorts.sort, "\n"]
end
</syntaxhighlight>
</lang>
{{out}}
<pre>
Line 3,445 ⟶ 5,538:
=={{header|Scala}}==
All 8:
<langsyntaxhighlight Scalalang="scala">object NaturalSorting {
implicit object ArrayOrdering extends Ordering[Array[String]] { // 4
val INT = "([0-9]+)".r
Line 3,497 ⟶ 5,590:
okay
})
}</langsyntaxhighlight>
Output:
<pre>PASS: 1 IGNORING LEADING SPACES
Line 3,551 ⟶ 5,644:
Tasks 1-5 are completed.
 
<langsyntaxhighlight lang="scheme">
(import (scheme base)
(scheme char)
Line 3,634 ⟶ 5,727:
'("# Ignoring leading spaces" "# Ignoring multiple adjacent spaces (m.a.s.)"
"# Equivalent whitespace characters" "# Case Independent sort"
"# Numeric fields as numerics" "# TitleNumeric sortsfields as numerics - shows sorting from right")
"# Title sorts")
'(("ignore leading spaces: 2-2" " ignore leading spaces: 2-1" " ignore leading spaces: 2+0" " ignore leading spaces: 2+1")
("ignore m.a.s spaces: 2-2" "ignore m.a.s spaces: 2-1" "ignore m.a.s spaces: 2+0" "ignore m.a.s spaces: 2+1")
Line 3,640 ⟶ 5,734:
("cASE INDEPENDENT: 3-2" "caSE INDEPENDENT: 3-1" "casE INDEPENDENT: 3+0" "case INDEPENDENT: 3+1")
("foo100bar99baz0.txt" "foo100bar10baz0.txt" "foo1000bar99baz10.txt" "foo1000bar99baz9.txt")
("foo1bar99baz4.txt" "foo2bar99baz3.txt" "foo4bar99baz1.txt" "foo3bar99baz2.txt")
("The Wind in the Willows" "The 40th step more" "The 39 steps" "Wanda")))
</syntaxhighlight>
</lang>
 
{{out}}
 
Note that "Equivalent whitespace characters" shows the effect of the input character.
Also for "Equivalent whitespace characters", Isorting believeincludes thenumeric Python output is incorrectfields, aswhich Isorts have described onfrom the discussion pageright.
 
<pre>
Line 3,777 ⟶ 5,872:
foo1000bar99baz9.txt
foo1000bar99baz10.txt
]
 
# Numeric fields as numerics - shows sorting from right
Text strings:
[
foo1bar99baz4.txt
foo2bar99baz3.txt
foo4bar99baz1.txt
foo3bar99baz2.txt
]
Normally sorted:
[
foo1bar99baz4.txt
foo2bar99baz3.txt
foo3bar99baz2.txt
foo4bar99baz1.txt
]
Naturally sorted:
[
foo4bar99baz1.txt
foo3bar99baz2.txt
foo2bar99baz3.txt
foo1bar99baz4.txt
]
 
Line 3,804 ⟶ 5,922:
 
=={{header|Sidef}}==
{{trans|Perl 6Raku}}
<langsyntaxhighlight lang="ruby">class String {
# Sort groups of digits in number order. Sort by order of magnitude then lexically.
-> naturally { self.lc.gsub(/(\d+)/, {|s1| "0" + s1.len.chr + s1 }) + "\x0" + self };
Line 3,830 ⟶ 5,948:
self.gsub(re, {|s1| tr{s1} });
}
}</langsyntaxhighlight>
 
Tests:
<langsyntaxhighlight lang="ruby">var tests = [
[
"Task 1a\nSort while ignoring leading spaces.",
Line 3,906 ⟶ 6,024:
 
say "\n#{'*' * 40}\n";
}</langsyntaxhighlight>
 
{{out}}
Line 4,099 ⟶ 6,217:
 
Note also that Tcl supports case-insensitive sorting and “treat digit sequences as numbers” as native sorting options. (The latter is particularly useful for handling filenames.)
<langsyntaxhighlight lang="tcl">package require Tcl 8.5
 
proc sortWithCollationKey {keyBuilder list} {
Line 4,157 ⟶ 6,275:
foo100bar10baz0.txt
foo1000bar99baz10.txt
foo1000bar99baz9.txt}</langsyntaxhighlight>
Output:
<pre>
Line 4,250 ⟶ 6,368:
foo1000bar99baz9.txt
foo1000bar99baz10.txt
</pre>
 
=={{header|Wren}}==
{{trans|Kotlin}}
{{libheader|Wren-pattern}}
{{libheader|Wren-str}}
{{libheader|Wren-fmt}}
{{libheader|Wren-sort}}
<syntaxhighlight lang="wren">import "./pattern" for Pattern
import "./str" for Str
import "./fmt" for Fmt
import "./sort" for Cmp, Sort
 
var p2 = Pattern.new("+2 ")
var p3 = Pattern.new("/s") // any whitespace character
var p5 = Pattern.new("+1/d")
 
/** Only covers ISO-8859-1 accented characters plus (for consistency) Ÿ */
var ucAccented = ["ÀÁÂÃÄÅ", "Ç", "ÈÉÊË", "ÌÍÎÏ", "Ñ", "ÒÓÔÕÖØ", "ÙÚÛÜ", "ÝŸ"]
var lcAccented = ["àáâãäå", "ç", "èéêë", "ìíîï", "ñ", "òóôõöø", "ùúûü", "ýÿ"]
var ucNormal = "ACEINOUY"
var lcNormal = "aceinouy"
 
/** Only the commoner ligatures */
var ucLigatures = "ÆIJŒ"
var lcLigatures = "æijœ"
var ucSeparated = ["AE", "IJ", "OE"]
var lcSeparated = ["ae", "ij", "oe"]
 
/** Miscellaneous replacements */
var miscLetters = "ßſʒ"
var miscReplacements = ["ss", "s", "s"]
 
/** Displays strings including whitespace as if the latter were literal characters */
var toDisplayString = Fn.new { |s|
var whitespace = ["\t", "\n", "\v", "\f", "\r"]
var whitespace2 = ["\\t", "\\n", "\\v", "\\f", "\\r"]
for (i in 0..4) s = s.replace(whitespace[i], whitespace2[i])
return s
}
 
/** Ignoring leading space(s) */
var selector1 = Fn.new { |s| s.trimStart(" ") }
 
/** Ignoring multiple adjacent spaces i.e. condensing to a single space */
var selector2 = Fn.new { |s| p2.replaceAll(s, " ") }
 
/** Equivalent whitespace characters (equivalent to a space say) */
var selector3 = Fn.new { |s| p3.replaceAll(s, " ") }
 
/** Case independent sort */
var selector4 = Fn.new { |s| Str.lower(s) }
 
/** Numeric fields as numerics (deals with up to 20 digits) */
var selector5 = Fn.new { |s|
for (m in p5.findAll(s)) {
var ix = m.index
var repl = Fmt.swrite("$020d", Num.fromString(m.text))
s = s[0...ix] + repl + s[ix + m.length..-1]
}
return s
}
 
/** Title sort */
var selector6 = Fn.new { |s|
var t = Str.lower(s)
if (t.startsWith("the ")) return s.skip(4).join()
if (t.startsWith("an ")) return s.skip(3).join()
if (t.startsWith("a ")) return s.skip(2).join()
return s
}
 
/** Equivalent accented characters (and case) */
var selector7 = Fn.new { |s|
var sb = ""
for (c in s) {
var outer = false
var i = 0
for (ucs in ucAccented) {
if (ucs.contains(c)) {
sb = sb + ucNormal[i]
outer = true
break
}
i = i + 1
}
if (outer) continue
i = 0
for (lcs in lcAccented) {
if (lcs.contains(c)) {
sb = sb + lcNormal[i]
outer = true
break
}
i = i + 1
}
if (outer) continue
sb = sb + c
}
return Str.lower(sb)
}
 
/** Separated ligatures */
var selector8 = Fn.new { |s|
var i = 0
for (c in ucLigatures) {
s = s.replace(c, ucSeparated[i])
i = i + 1
}
i = 0
for (c in lcLigatures) {
s = s.replace(c, lcSeparated[i])
i = i + 1
}
return s
}
 
/** Character replacements */
var selector9 = Fn.new { |s|
var i = 0
for (c in miscLetters) {
s = s.replace(c, miscReplacements[i])
i = i + 1
}
return s
}
 
System.print("The 9 string lists, sorted 'naturally':\n")
var selectors = [selector1, selector2, selector3, selector4, selector5,
selector6, selector7, selector8, selector9]
var ss = [
[
"ignore leading spaces: 2-2",
" ignore leading spaces: 2-1",
" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1"
],
[
"ignore m.a.s spaces: 2-2",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1"
],
[
"Equiv. spaces: 3-3",
"Equiv.\rspaces: 3-2",
"Equiv.\vspaces: 3-1",
"Equiv.\fspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2"
],
[
"cASE INDEPENENT: 3-2",
"caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1"
],
[
"foo100bar99baz0.txt",
"foo100bar10baz0.txt",
"foo1000bar99baz10.txt",
"foo1000bar99baz9.txt"
],
[
"The Wind in the Willows",
"The 40th step more",
"The 39 steps",
"Wanda"
],
[
"Equiv. ý accents: 2-2",
"Equiv. Ý accents: 2-1",
"Equiv. y accents: 2+0",
"Equiv. Y accents: 2+1"
],
[
"IJ ligatured ij",
"no ligature"
],
[
"Start with an ʒ: 2-2",
"Start with an ſ: 2-1",
"Start with an ß: 2+0",
"Start with an s: 2+1"
]
]
for (i in 0..8) {
var sel = selectors[i]
var cmp = Fn.new { |s1, s2| Cmp.string.call(sel.call(s1), sel.call(s2)) }
Sort.insertion(ss[i], cmp)
System.print(ss[i].map { |s| "'%(s)'" }.join("\n"))
if (i < 8) System.print()
}</syntaxhighlight>
 
{{out}}
<pre>
The 9 string lists, sorted 'naturally':
 
' ignore leading spaces: 2+0'
' ignore leading spaces: 2+1'
' ignore leading spaces: 2-1'
'ignore leading spaces: 2-2'
 
'ignore m.a.s spaces: 2+0'
'ignore m.a.s spaces: 2+1'
'ignore m.a.s spaces: 2-1'
'ignore m.a.s spaces: 2-2'
 
'Equiv.
spaces: 3+0'
'Equiv.
spaces: 3+1'
'Equiv. spaces: 3+2'
'Equiv.
spaces: 3-1'
spaces: 3-2'
'Equiv. spaces: 3-3'
 
'casE INDEPENENT: 3+0'
'case INDEPENENT: 3+1'
'caSE INDEPENENT: 3-1'
'cASE INDEPENENT: 3-2'
 
'foo100bar10baz0.txt'
'foo100bar99baz0.txt'
'foo1000bar99baz9.txt'
'foo1000bar99baz10.txt'
 
'The 39 steps'
'The 40th step more'
'Wanda'
'The Wind in the Willows'
 
'Equiv. y accents: 2+0'
'Equiv. Y accents: 2+1'
'Equiv. Ý accents: 2-1'
'Equiv. ý accents: 2-2'
 
'IJ ligatured ij'
'no ligature'
 
'Start with an s: 2+1'
'Start with an ſ: 2-1'
'Start with an ʒ: 2-2'
'Start with an ß: 2+0'
</pre>
 
Line 4,256 ⟶ 6,619:
 
First, a tag-mangled-fields sort, use tags to pull from the original list in sorted order function:
<langsyntaxhighlight lang="zkl">fcn dsuSort(x,orig){ // decorate-sort-undecorate sort
x.enumerate().sort(fcn([(_,a)],[(_,b)]){a<b})
.apply('wrap([(n,_)]){orig[n]});
}</langsyntaxhighlight>
Now, mangle a copy of the original list to "normalize" it to the task requirements and sort.
<langsyntaxhighlight lang="zkl"># Ignoring leading spaces
ts1:=T("ignore leading spaces: 2-2", " ignore leading spaces: 2-1",
" ignore leading spaces: 2+0", " ignore leading spaces: 2+1");
dsuSort(ts1.apply("strip"),ts1).println();</langsyntaxhighlight>
{{out}}
<pre>
Line 4,273 ⟶ 6,636:
</pre>
 
<langsyntaxhighlight lang="zkl"># Ignoring multiple adjacent spaces (m.a.s)
ts2:=T("ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1");
dsuSort(ts2.apply('-(" ")),ts2).println();</langsyntaxhighlight>
{{out}}
<pre>
Line 4,285 ⟶ 6,648:
</pre>
 
<langsyntaxhighlight lang="zkl"># Equivalent whitespace characters
ts3:=T("Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1",
"Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2");
dsuSort(ts3.apply('-.fp1("\n\r\t\f\b\x0b ")),ts3).println();</langsyntaxhighlight>
{{out}}
<pre>
Line 4,299 ⟶ 6,662:
</pre>
 
<langsyntaxhighlight lang="zkl"># Case Indepenent sort
ts4:=T("cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0", "case INDEPENENT: 3+1");
dsuSort(ts4.apply("toLower"),ts4).println();</langsyntaxhighlight>
{{out}}
<pre>
Line 4,312 ⟶ 6,675:
 
And the nasty one: multiple numeric fields. Each key is blown apart into a list of ints, where each int is consecutive numeric characters. To compare a pair of fields (ie two lists of ints), walk the two in parallel, stopping when a difference is found. Ugly but it works. Another tactic would be to normalize each field to the same length, concat the fields and use that as the mangled key. But that might have issues if the number of fields differ, the former doesn't care.
<langsyntaxhighlight lang="zkl"># Numeric fields as numerics
fcn fieldize(s){
s.apply(fcn(c){"1234567890".holds(c) and c or "."}).split(".")
Line 4,326 ⟶ 6,689:
list.apply(fieldize).enumerate().sort(fcmp)
.apply('wrap([(n,_)]){list[n]});
}</langsyntaxhighlight>
<langsyntaxhighlight lang="zkl">ts5:=T("foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt",
"foo1000bar99baz9.txt");
fsort(ts5).println();
 
x:=T("x9y99","foo10.txt","x10y0","foo9.txt","x9y100");
fsort(x).println();</langsyntaxhighlight>
{{out}}
<pre>
Line 4,342 ⟶ 6,705:
L("foo9.txt","x9y99","x9y100","x10y0","foo10.txt")
</pre>
 
 
[[Category:Sorting]]
9,476

edits