Natural sorting: Difference between revisions

← Older edit

Natural sorting (view source)

Revision as of 09:35, 6 January 2024

72,712 bytes added , 4 months ago

m

→‎{{header|Wren}}: Minor tidy

PureFox

9,476

edits

Revision as of 19:34, 28 June 2020 (view source) Nig (talk \| contribs) (Added appleScript.) ← Older edit		Latest revision as of 09:35, 6 January 2024 (view source) PureFox (talk \| contribs) m (→‎{{header\|Wren}}: Minor tidy)
(20 intermediate revisions by 9 users not shown)
Line 3: {{Sorting Algorithm}} Natural sorting is the sorting of text that does more than rely on the order of individual characters codes to make the finding of individual strings easier for a ''human'' reader. Line 11: :2. Make all whitespace characters equivalent. :3. Sorting without regard to case. :4. Sorting numeric portions of strings in numeric order. ~~That is split the string into fields on numeric boundaries, then sort on each field, with the rightmost fields being the most significant, and numeric fields of integers treated as numbers.~~ :: That is split the string into fields on numeric boundaries, then sort on each field, with the rightmost fields being the most significant, and numeric fields of integers treated as numbers. :: foo9.txt before foo10.txt :: As well as ... x9y99 before x9y100, before x10y0 Line 20 ⟶ 21: :7. Sort ligatures as separate letters. :8. Replacements: :: Sort ~~german~~German eszett or scharfes S (ß)       as   ss :: Sort ſ, LATIN SMALL LETTER LONG S     as   s :: Sort ʒ, LATIN SMALL LETTER EZH           as   s :: <big><big> ∙∙∙ </big></big> ~~:: ...~~ ;Task Description * '''Implement the first four''' of the eight given features in a natural sorting routine/function/method... * Test each feature implemented separately with an ordered list of test strings from the   '''Sample inputs'''   section below,   and make sure your naturally sorted output is in the same order as other language outputs such as   <CODE> Python</CODE>. * Print and display your output. * '''For extra credit''' implement more than the first four. Note:   it is not necessary to have individual control of which features are active in the natural sorting routine at any time. ~~Note: It is not necessary to have individual control of which features are active in the natural sorting routine at any time.~~ ;Sample input: <pre> #• Ignoring leading spaces. Text strings: ['ignore leading spaces: 2-2', 'ignore leading spaces: 2-1', ~~Text strings:~~ ~~['ignore~~ ~~leading~~ ~~spaces:~~ ~~2-2',~~ ' ~~ignore~~ ~~leading~~ ~~spaces:~~ ~~2-1',~~ ' ~~ignore~~ ~~leading~~ ~~spaces:~~ ~~2+0',~~ ' 'ignore leading spaces: 2+10'], 'ignore leading spaces: 2+1'] #• Ignoring multiple adjacent spaces (~~m.a.s~~MAS). Text strings: ['ignore MAS spaces: 2-2', 'ignore MAS spaces: 2-1', ~~Text strings:~~ 'ignore MAS spaces: 2+0', ~~['ignore m.a.s spaces: 2-2', 'ignore m.a.s spaces: 2-1', 'ignore m.a.s spaces: 2+0', 'ignore m.a.s spaces: 2+1']~~ 'ignore MAS spaces: 2+1'] • Equivalent whitespace characters. Text strings: ['Equiv. spaces: 3-3', 'Equiv. \rspaces: 3-2', 'Equiv. \x0cspaces: 3-1', 'Equiv. \x0bspaces: 3+0', 'Equiv. \nspaces: 3+1', 'Equiv. \tspaces: 3+2'] • Case Independent sort. Text strings: ['cASE INDEPENDENT: 3-2', ~~# Equivalent whitespace characters~~ 'caSE INDEPENDENT: 3-1', ~~Text strings:~~ 'casE INDEPENDENT: 3+0', ~~['Equiv. spaces: 3-3', 'Equiv.\rspaces: 3-2', 'Equiv.\x0cspaces: 3-1', 'Equiv.\x0bspaces: 3+0', 'Equiv.\nspaces: 3+1', 'Equiv.\tspaces: 3+2']~~ 'case INDEPENDENT: 3+1'] • Numeric fields as numerics. Text strings: ['foo100bar99baz0.txt', 'foo100bar10baz0.txt', 'foo1000bar99baz10.txt', 'foo1000bar99baz9.txt'] • Title sorts. Text strings: ['The Wind in the Willows', ~~# Case Indepenent sort~~ 'The 40th step more', ~~Text strings:~~ 'The 39 steps', ~~['cASE INDEPENENT: 3-2', 'caSE INDEPENENT: 3-1', 'casE INDEPENENT: 3+0', 'case INDEPENENT: 3+1']~~ 'Wanda'] • Equivalent accented characters (and case). Text strings: [u'Equiv. \xfd accents: 2-2', ~~# Numeric fields as numerics~~ u'Equiv. \xdd accents: 2-1', ~~Text strings:~~ u'Equiv. y accents: 2+0', ~~['foo100bar99baz0.txt', 'foo100bar10baz0.txt', 'foo1000bar99baz10.txt', 'foo1000bar99baz9.txt']~~ u'Equiv. Y accents: 2+1'] • Separated ligatures. Text strings: [u'\u0132 ligatured ij', ~~# Title sorts~~ 'no ligature'] ~~Text strings:~~ ~~['The Wind in the Willows', 'The 40th step more', 'The 39 steps', 'Wanda']~~ • Character replacements. Text strings: [u'Start with an \u0292: 2-2', ~~# Equivalent accented characters (and case)~~ u'Start with an \u017f: 2-1', ~~Text strings:~~ u'Start with an \xdf: 2+0', ~~[u'Equiv. \xfd accents: 2-2', u'Equiv. \xdd accents: 2-1', u'Equiv. y accents: 2+0', u'Equiv. Y accents: 2+1']~~ u'Start with an s: 2+1'] </pre><br><br> =={{header\|11l}}== {{trans\|Nim}} <syntaxhighlight lang="11l">T.enum Kind ~~# Separated ligatures~~ STRING ~~Text strings:~~ NUMBER ~~[u'\u0132 ligatured ij', 'no ligature']~~ T KeyItem ~~# Character replacements~~ Kind kind ~~Text strings:~~ String s ~~[u'Start with an \u0292: 2-2', u'Start with an \u017f: 2-1', u'Start with an \xdf: 2+0', u'Start with an s: 2+1']</pre>~~ Int num F natOrderKey(=s) ~~=={{header\|AppleScript}}==~~ // Remove leading and trailing white spaces. s = s.trim((‘ ’, "\t", "\r", "\n")) // Make all whitespace characters equivalent and remove adjacent spaces. s = s.replace(re:‘\s+’, ‘ ’) // Switch to lower case. s = s.lowercase() // Remove leading "the ". I s.starts_with(‘the ’) & s.len > 4 s = s[4..] // Split into fields. [KeyItem] result V idx = 0 L idx < s.len V e = idx L e < s.len & !s[e].is_digit() e++ I e > idx V ki = KeyItem() ki.kind = Kind.STRING ki.s = s[idx .< e] result.append(ki) idx = e L e < s.len & s[e].is_digit() e++ I e > idx V ki = KeyItem() ki.kind = Kind.NUMBER ki.num = Int(s[idx .< e]) result.append(ki) idx = e R result F scmp(s1, s2) I s1 < s2 {R -1} I s1 > s2 {R 1} R 0 F naturalCmp(String sa, String sb) V a = natOrderKey(sa) V b = natOrderKey(sb) L(i) 0 .< min(a.len, b.len) V ai = a[i] V bi = b[i] I ai.kind == bi.kind V result = I ai.kind == STRING {scmp(ai.s, bi.s)} E ai.num - bi.num I result != 0 R result E R I ai.kind == STRING {1} E -1 R I a.len < b.len {-1} E (I a.len == b.len {0} E 1) F test(title, list) print(title) print(sorted(list, key' cmp_to_key(naturalCmp)).map(s -> ‘'’s‘'’).join("\n")) print() test(‘Ignoring leading spaces.’, [‘ignore leading spaces: 2-2’, ‘ ignore leading spaces: 2-1’, ‘ ignore leading spaces: 2+0’, ‘ ignore leading spaces: 2+1’]) test(‘Ignoring multiple adjacent spaces (MAS).’, [‘ignore MAS spaces: 2-2’, ‘ignore MAS spaces: 2-1’, ‘ignore MAS spaces: 2+0’, ‘ignore MAS spaces: 2+1’]) test(‘Equivalent whitespace characters.’, [‘Equiv. spaces: 3-3’, "Equiv. \rspaces: 3-2", "Equiv. \x0cspaces: 3-1", "Equiv. \x0bspaces: 3+0", "Equiv. \nspaces: 3+1", "Equiv. \tspaces: 3+2"]) test(‘Case Independent sort.’, [‘cASE INDEPENDENT: 3-2’, ‘caSE INDEPENDENT: 3-1’, ‘casE INDEPENDENT: 3+0’, ‘case INDEPENDENT: 3+1’]) test(‘Numeric fields as numerics.’, [‘foo100bar99baz0.txt’, ‘foo100bar10baz0.txt’, ‘foo1000bar99baz10.txt’, ‘foo1000bar99baz9.txt’]) test(‘Title sorts.’, [‘The Wind in the Willows’, ‘The 40th step more’, ‘The 39 steps’, ‘Wanda’])</syntaxhighlight> {{out}} <pre> Ignoring leading spaces. ' ignore leading spaces: 2+0' ' ignore leading spaces: 2+1' ' ignore leading spaces: 2-1' 'ignore leading spaces: 2-2' Ignoring multiple adjacent spaces (MAS). 'ignore MAS spaces: 2+0' 'ignore MAS spaces: 2+1' 'ignore MAS spaces: 2-1' 'ignore MAS spaces: 2-2' Equivalent whitespace characters. 'Equiv. �spaces: 3+0' 'Equiv. spaces: 3+1' 'Equiv. spaces: 3+2' 'Equiv. �spaces: 3-1' 'Equiv. spaces: 3-2' 'Equiv. spaces: 3-3' Case Independent sort. 'casE INDEPENDENT: 3+0' 'case INDEPENDENT: 3+1' 'caSE INDEPENDENT: 3-1' 'cASE INDEPENDENT: 3-2' Numeric fields as numerics. 'foo100bar10baz0.txt' 'foo100bar99baz0.txt' 'foo1000bar99baz9.txt' 'foo1000bar99baz10.txt' Title sorts. 'The 39 steps' 'The 40th step more' 'Wanda' 'The Wind in the Willows' </pre> =={{header\|AppleScript}}== AppleScript doesn't have a built-in sort facility, but its string comparisons are normalised and several attributes can be specifically either considered or ignored, making it fairly simple for a sort coded in the language to produce the results required here. <~~lang~~syntaxhighlight lang="applescript">use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later use framework "Foundation" use sorter : script ¬ ~~use sorter : script "Custom Iterative Ternary Merge Sort" -- <https://macscripter.net/viewtopic.php?pid=194430#p194430>~~ "Custom Iterative Ternary Merge Sort" -- <www.macscripter.net/t/timsort-and-nigsort/71383/3> on naturalSort(listOfText) -- ~~Produce~~Get ~~versions~~doctored copies of the strings ~~for~~in ~~sorting purposes, doctored~~order to get ~~round~~around situations that -- ~~the situations~~ AppleScript's text comparison attributes don't handle naturally. ie. Reduce any -- run of white space to a single character, zap any leading/trailing space, move -- any article word at the beginning to the end, and replace any "ſ" or "ʒ" with "s". script o property ~~input~~doctored : listOfText's items ~~property doctored : {}~~ end script set regex to current application's NSRegularExpressionSearch set substitutions to {{"\\s++", space}, {"^ \| $", ""}, ¬ ~~repeat with i from 1 to (count listOfText)~~ {"^(?i)(The\|An?) (.++)$", "$2 $1"}, {"[\\u0292\\u017f]", "s"}} ~~set thisString to (current application's class "NSMutableString"'s stringWithString:(item i of o's input))~~ repeat with i from 1 to (count o's doctored) ~~-- AppleScript's 'ignoring white space' setting ignores ALL white space. So, since the existence of~~ set mutableString to (current application's class "NSMutableString"'s ¬ ~~-- white space between words has to be considered, physically remove or alter any to be ignored.~~ stringWithString:(o's doctored's item i)) ~~-- Firstly reduce runs of any type of white space to single 'space' characters.~~ repeat with thisSub in substitutions ~~tell thisString to replaceOccurrencesOfString:("\\s++") withString:(space) options:(regex) range:({0, its \|length\|()})~~ -- ~~Then~~ ~~remove~~ ~~any~~ ~~leading~~set ~~and/or~~{searchStr, ~~trailing~~replacement} ~~spaces.~~to thisSub tell mutableString to replaceOccurrencesOfString:(searchStr) ¬ ~~tell thisString to replaceOccurrencesOfString:("^ \| $") withString:("") options:(regex) range:({0, its \|length\|()})~~ withString:(replacement) options:(regex) range:({0, its \|length\|()}) ~~-- Move any instance of "The ", "A ", or "An " at the front of a string to the end assuming the string to be a title.~~ end repeat ~~-- This allows the article to act as a tie-breaker if necessary.~~ set o's doctored's item i to mutableString as text ~~tell thisString to replaceOccurrencesOfString:("^(?i)(The\|An?) (.++)$") withString:("$2 $1") options:(regex) ¬~~ ~~range:({0, its \|length\|()})~~ ~~-- For the sake of this task, replace any instances of "ſ" or "ʒ" with "s".~~ ~~tell thisString to replaceOccurrencesOfString:("[\\u0292\\u017f]") withString:("s") options:(regex) ¬~~ ~~range:({0, its \|length\|()})~~ ~~set end of o's doctored to thisString as text~~ end repeat -- Sort the doctored strings with the relevant AppleScript comparison attributes ~~-- Set AppleScript's string comparison attributes for the sort.~~ -- explicitly either set or not, echoing the moves in the original list. ~~-- Ligatures are alway compared by their component characters and AppleScript has no setting to change this.~~ ~~-- 'Numeric strings' are runs of digit characters only.~~ ~~-- The white space, hyphens, and case settings here are the defaults,~~ ~~-- but are set explicitly in case this handler's called from a different setting.~~ considering numeric strings, white space and hyphens but ignoring diacriticals, punctuation and case ~~-- Sort items 1 thru -1 of the doctored strings, rearranging the original list in parallel.~~ tell sorter to sort(o's doctored, 1, -1, {slave:{listOfText}}) end considering Line 122 ⟶ 281: end naturalSort on join(lst, delim) ~~(* Tests: )~~ set astid to AppleScript's text item delimiters ~~-- Leading, trailing, and multiple white spaces ignored:~~ set AppleScript's text item delimiters to delim ~~naturalSort({" ignore superfluous spaces: 1-3", "ignore superfluous spaces: 1-1", " ignore superfluous spaces: 1-2", ¬~~ set txt to lst as text ~~" ignore superfluous spaces: 1-4", "ignore superfluous spaces: 1-7", "ignore superfluous spaces: 1-5 ", ¬~~ set AppleScript's text item delimiters to astid ~~"ignore superfluous spaces: 1-6", " ignore superfluous spaces: 1-8"})~~ return txt --> {"ignore superfluous spaces: 1-1", " ignore superfluous spaces: 1-2", " ignore superfluous spaces: 1-3", " ignore superfluous spaces: 1-4", "ignore superfluous spaces: 1-5 ", "ignore superfluous spaces: 1-6", "ignore superfluous spaces: 1-7", " ignore superfluous spaces: 1-8"} end join on tests() ~~-- All white space characters treated as equivalent:~~ set output to {"( Leading, trailing, and multiple white spaces ignored )"} ~~naturalSort({"Equiv. spaces: 2-6", "Equiv." & return & "spaces: 2-5", "Equiv." & (character id 12) & "spaces: 2-4", ¬~~ set output's end to ¬ ~~"Equiv." & (character id 11) & "spaces: 2-3", "Equiv." & linefeed & "spaces: 2-2", "Equiv." & tab & "spaces: 2-1"})~~ naturalSort({" ignore superfluous spaces: 1-3", "ignore superfluous spaces: 1-1", ¬ ~~( -->~~ " ignore superfluous spaces: 1-2", " ignore superfluous spaces: 1-4", ¬ ~~{"Equiv. spaces: 2-1", "Equiv.~~ "ignore superfluous spaces: 1-7", "ignore superfluous spaces: 1-5 ", ¬ ~~spaces: 2-2", "Equiv.�spaces: 2-3", "Equiv.�spaces: 2-4", "Equiv.~~ "ignore superfluous spaces: 21-56", "~~Equiv.~~ ignore superfluous spaces: 21-68"}) ) set output's end to linefeed & "( All white space characters treated as equivalent )" set output's end to naturalSort({"Equiv. spaces: 2-6", "Equiv." & return & "spaces: 2-5", ¬ "Equiv." & (character id 12) & "spaces: 2-4", ¬ "Equiv." & (character id 11) & "spaces: 2-3", ¬ "Equiv." & linefeed & "spaces: 2-2", "Equiv." & tab & "spaces: 2-1"}) set output's end to linefeed & ¬ "( Case ignored. (The sort order would actually be the same with case considered, since case only decides the issue when the strings are otherwise identical.) )" set output's end to naturalSort({"cASE INDEPENDENT: 3-1", "caSE INDEPENDENT: 3-2", ¬ "CASE independent: 3-3", "casE INDEPENDENT: 3-4", "case INDEPENDENT: 3-5"}) set output's end to linefeed & "( Numerics considered by number value )" set output's end to naturalSort({"foo1000bar99baz10.txt", "foo100bar99baz0.txt", ¬ "foo100bar10baz0.txt", "foo1000bar99baz9.txt"}) set output's end to linefeed & "( Title sort )" set output's end to ¬ naturalSort({"The Wind in the Willows", "The 40th Step More", ¬ "A Matter of Life and Death", "The 39 steps", ¬ "An Inspector Calls", "Wanda"}) set output's end to linefeed & "( Diacriticals (and case) ignored )" set output's end to naturalSort({"Equiv. " & (character id 253) & " accents: 6-1", ¬ "Equiv. " & (character id 221) & " accents: 6-3", ¬ "Equiv. y accents: 6-4", "Equiv. Y accents: 6-2"}) set output's end to linefeed & "( Ligatures )" set output's end to naturalSort({(character id 306) & " ligatured", ¬ "of", "ij no ligature", (character id 339), "od"}) set output's end to linefeed & ¬ "( Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") )" set output's end to naturalSort({"Start with an " & (character id 658) & ": 8-1", ¬ "Start with an " & (character id 383) & ": 8-2", ¬ "Start with an " & (character id 223) & ": 8-3", ¬ "Start with an s: 8-4", "Start with an ss: 8-5"}) return join(output, linefeed) end tests tests()</syntaxhighlight> ~~-- Case ignored. (The order would actually be the same with case considered,~~ ~~-- because case only decides the issue when strings are otherwise identical.)~~ ~~naturalSort({"cASE INDEPENDENT: 3-1", "caSE INDEPENDENT: 3-2", "CASE independent: 3-3", "casE INDEPENDENT: 3-4", ¬~~ ~~"case INDEPENDENT: 3-5"})~~ ~~--> {"cASE INDEPENDENT: 3-1", "caSE INDEPENDENT: 3-2", "CASE independent: 3-3", "casE INDEPENDENT: 3-4", "case INDEPENDENT: 3-5"}~~ {{output}} ~~-- Numerics considered by number value:~~ <syntaxhighlight lang="applescript">"( Leading, trailing, and multiple white spaces ignored ) ~~naturalSort({"foo1000bar99baz10.txt", "foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz9.txt"})~~ ignore superfluous spaces: 1-1 ~~--> {"foo100bar10baz0.txt", "foo100bar99baz0.txt", "foo1000bar99baz9.txt", "foo1000bar99baz10.txt"}~~ ignore superfluous spaces: 1-2 ignore superfluous spaces: 1-3 ignore superfluous spaces: 1-4 ignore superfluous spaces: 1-5 ignore superfluous spaces: 1-6 ignore superfluous spaces: 1-7 ignore superfluous spaces: 1-8 ( All white space characters treated as equivalent ) ~~-- Title sort:~~ Equiv. spaces: 2-1 ~~naturalSort({"The Wind in the Willows", "The 40th Step More", "A Matter of Life and Death", "The 39 steps", ¬~~ Equiv. ~~"An Inspector Calls", "Wanda"})~~ spaces: 2-2 ~~--> {"The 39 steps", "The 40th Step More", "An Inspector Calls", "A Matter of Life and Death", "Wanda", "The Wind in the Willows"}~~ Equiv.�spaces: 2-3 Equiv.�spaces: 2-4 Equiv. spaces: 2-5 Equiv. spaces: 2-6 ( Case ignored. (The sort order would actually be the same with case considered, ~~--> Diacriticals (and case) ignored:~~ since case only decides the issue when strings are otherwise identical.) ) ~~naturalSort({"Equiv. " & (character id 253) & " accents: 6-1", "Equiv. " & (character id 221) & " accents: 6-3", ¬~~ cASE INDEPENDENT: 3-1 ~~"Equiv. y accents: 6-4", "Equiv. Y accents: 6-2"})~~ caSE INDEPENDENT: 3-2 ~~--> {"Equiv. ý accents: 6-1", "Equiv. Y accents: 6-2", "Equiv. Ý accents: 6-3", "Equiv. y accents: 6-4"}~~ CASE independent: 3-3 casE INDEPENDENT: 3-4 case INDEPENDENT: 3-5 ( Numerics considered by number value ) ~~-- Ligatures:~~ foo100bar10baz0.txt ~~naturalSort({(character id 306) & " ligatured", "of", "ij no ligature", (character id 339), "od"})~~ foo100bar99baz0.txt ~~--> {"Ĳ ligatured", "ij no ligature", "od", "œ", "of"}~~ foo1000bar99baz9.txt foo1000bar99baz10.txt ( Title sort ) ~~-- Custom "s" equivalents and Esszet (Esszet normalises to ss):"~~ The 39 steps ~~naturalSort({"Start with an " & (character id 658) & ": 8-1", "Start with an " & (character id 383) & ": 8-2", ¬~~ The 40th Step More ~~"Start with an " & (character id 223) & ": 8-3", "Start with an s: 8-4", "Start with an ss: 8-5"})~~ An Inspector Calls ~~--> {"Start with an ʒ: 8-1", "Start with an ſ: 8-2", "Start with an s: 8-4", "Start with an ß: 8-3", "Start with an ss: 8-5"}</lang>~~ A Matter of Life and Death Wanda The Wind in the Willows ( Diacriticals (and case) ignored ) Equiv. ý accents: 6-1 Equiv. Y accents: 6-2 Equiv. Ý accents: 6-3 Equiv. y accents: 6-4 ( Ligatures ) Ĳ ligatured ij no ligature od œ of ( Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") ) Start with an ʒ: 8-1 Start with an ſ: 8-2 Start with an s: 8-4 Start with an ß: 8-3 Start with an ss: 8-5"</syntaxhighlight> =={{header\|ATS}}== <syntaxhighlight lang="ats"> (------------------------------------------------------------------) ( Natural sorting. ) (------------------------------------------------------------------) ( I deal only with ASCII here and solve only the first four problems. For Unicode, I would most likely use GNU libunistring (or Gnulib) and UTF-32. Handling Unicode properly is complicated. There are other matters that make "natural sorting" a tricky thing. For instance, which "accented letters" are actually "accented"--rather than distinct letters in their own right--depends on the language and the purpose. In Esperanto, for example, 'ĉ' is a distinct letter that goes just after the letter 'c'. ) #include "share/atspre_staload.hats" staload UN = "prelude/SATS/unsafe.sats" #define NIL list_nil () #define :: list_cons (------------------------------------------------------------------) ( Types and interfaces. ) typedef char_skipper = {n : int} {i : nat \| i <= n} (string n, size_t n, size_t i) -<cloref> [j : int \| i <= j; j <= n] size_t j typedef char_skipper_backwards = {n : int} {i : nat \| i <= n} (string n, size_t n, size_t i) -<cloref> [j : int \| 0 <= j; j <= i] size_t j typedef char_translator = {n : int} string n -<cloref,!wrt> string n extern fn make_char_skipper : (char -<cloref> bool) -<> char_skipper extern fn make_char_skipper_backwards : (char -<cloref> bool) -<> char_skipper_backwards extern fn make_char_translator : (char -<cloref> bool, char -<cloref> [c : int \| 1 <= c; c <= 127] char c) -<> char_translator extern fn remove_leading_spaces : {n : int} string n -< !wrt > [m : nat \| m <= n] string m extern fn remove_trailing_spaces : {n : int} string n -< !wrt > [m : nat \| m <= n] string m extern fn combine_adjacent_spaces : {n : int} string n -< !wrt > [m : nat \| m <= n] string m extern fn compare_strings_containing_numbers : {m, n : int} (string m, string n) -<> int extern fn evaluate_natural : {m : int} {i, n : nat \| i + n <= m} (string m, size_t i, size_t n) -<> ullint extern fn compare_strings_naturally : {m, n : int} (string m, string n) -< !wrt > int extern fn list_sort_strings_naturally : {n : int} list (String, n) -< !wrt > list (String, n) (------------------------------------------------------------------) ( Global closures. ) val skip_spaces = make_char_skipper (lam c => c = ' ') val skip_spaces_backwards = make_char_skipper_backwards (lam c => c = ' ') val skip_digits = make_char_skipper (lam c => isdigit c) val translate_whitespace_to_spaces = make_char_translator (lam c => isspace c, lam c => ' ') ( A little unit test. ) val- "1 2 3 " = translate_whitespace_to_spaces "1\t2\v3\n" val string_tolower = make_char_translator (lam c => isupper c, lam c => let typedef ascii_lowercase = [c : int \| 'a' <= c; c <= 'z'] char c val c = tolower c in $UN.cast{ascii_lowercase} c end) ( A little unit test. ) val- "abcdef" = string_tolower "ABCdef" (------------------------------------------------------------------) ( Implementations. ) implement make_char_skipper match = let fun skipper {n : int} {i : nat \| i <= n} .<n - i>. (s : string n, n : size_t n, i : size_t i) :<cloref> [j : int \| i <= j; j <= n] size_t j = if i = n then i else if ~match s[i] then i else skipper (s, n, succ i) in skipper end implement make_char_skipper_backwards match = let fun skipper {n : int} {i : nat \| i <= n} .<i>. (s : string n, n : size_t n, i : size_t i) :<cloref> [j : int \| 0 <= j; j <= i] size_t j = if i = i2sz 0 then i else if ~match s[pred i] then i else skipper (s, n, pred i) in skipper end implement make_char_translator (match, replace) = let fn translator {n : int} (s : string n) :<cloref,!wrt> string n = let prval () = lemma_string_param s val n = strlen s fun loop {i : nat \| i <= n} .<n - i>. (t : strnptr n, ( strnptr so it is mutable. ) i : size_t i) :<!wrt> strnptr n = if i = n then t else let val c = t[i] in if match c then t[i] := replace c; loop (t, succ i) end in strnptr2string (loop (string1_copy s, i2sz 0)) end in translator end implement remove_leading_spaces s = let prval () = lemma_string_param s val n = strlen s val j = skip_spaces (s, n, i2sz 0) in strnptr2string (string_make_substring (s, j, n - j)) end ( A little unit test. ) val- "1 " = remove_leading_spaces " 1 " implement remove_trailing_spaces s = let prval () = lemma_string_param s val n = strlen s val j = skip_spaces_backwards (s, n, n) in strnptr2string (string_make_substring (s, i2sz 0, j)) end ( A little unit test. ) val- " 1" = remove_trailing_spaces " 1 " fn _combine_adjacent_spaces {n : int} (s : string n) :<!refwrt> [m : nat \| m <= n] string m = let prval () = lemma_string_param s val n = strlen s in if n = i2sz 0 then s else let val buf = arrayref_make_elt<char> (succ n, '\0') val () = buf[0] := s[0] fun loop {i : pos \| i <= n} {j : pos \| j <= i} .<n - i>. (i : size_t i, j : size_t j) :<!refwrt> [k : pos \| k <= n] size_t k = if i = n then j else begin if (s[i] = ' ') (s[pred i] = ' ') then loop (succ i, j) else begin buf[j] := s[i]; loop (succ i, succ j) end end val [k : int] k = loop (i2sz 1, i2sz 1) val s1 = $UN.cast{string k} (ptrcast buf) in strnptr2string (string_make_substring (s1, i2sz 0, k)) end end implement combine_adjacent_spaces s = $effmask_ref _combine_adjacent_spaces s (* A little unit test. ) val- " 1 2 3 4 " = combine_adjacent_spaces " 1 2 3 4 " implement compare_strings_containing_numbers {m, n} (sm, sn) = let prval () = lemma_string_param sm prval () = lemma_string_param sn val m = strlen sm and n = strlen sn fun compare_them {i, j : nat \| i <= m; j <= n} .<m - i, n - j>. (i : size_t i, j : size_t j) :<> int = if i = m then (if j = n then 0 else ~1) else if j = n then 1 else if (isdigit sm[i]) (isdigit sn[j]) then let val i1 = skip_digits (sm, m, succ i) and j1 = skip_digits (sn, n, succ j) val vm = evaluate_natural (sm, i, i1 - i) and vn = evaluate_natural (sn, j, j1 - j) in if vm = vn then compare_them (i1, j1) else if vm < vn then ~1 else 1 end else let val cmp = compare (sm[i], sn[j]) in if cmp = 0 then compare_them (succ i, succ j) else cmp end in compare_them (i2sz 0, i2sz 0) end implement evaluate_natural {m} {i, n} (s, i, n) = let fun loop {k : int \| i <= k; k <= i + n} .<(i + n) - k>. (k : size_t k, accum : ullint) :<> ullint = if k = i + n then accum else let val digit = (char2int0 s[k] - char2int0 '0') val accum = (10ULL * accum) + g0i2u digit in loop (succ k, accum) end in loop (i, 0ULL) end (* A little unit test. ) val- 1234ULL = evaluate_natural ("xy1234z", i2sz 2, i2sz 4) implement compare_strings_naturally (sm, sn) = let prval () = lemma_string_param sm prval () = lemma_string_param sn fn adjust_string (s : String0) :<!wrt> String0 = let val s = translate_whitespace_to_spaces s val s = string_tolower s val s = remove_leading_spaces s val s = remove_trailing_spaces s val s = combine_adjacent_spaces s in s end in compare_strings_containing_numbers (adjust_string sm, adjust_string sn) end implement list_sort_strings_naturally lst = let implement list_mergesort$cmp<String> (x, y) = $effmask_wrt compare_strings_naturally (x, y) in list_vt2t (list_mergesort<String> lst) end (------------------------------------------------------------------) ( Now see if we pass the required tests. ) implement gequal_val_val<String> (x, y) = g0ofg1 x = g0ofg1 y fn nicefy_string (s : string) : string = let val s = g1ofg0 s prval () = lemma_string_param s val n = strlen s prval [n : int] EQINT () = eqint_make_guint n var t : string = "" var i : [i : nat \| i <= n] size_t i in for (i := i2sz 0; i <> n; i := succ i) let val c = char2int0 s[i] in if c < 32 then let val numstr = strptr2string (g0int2string c) val u = strptr2string (string0_append4 (t, "[", numstr, "]")) in t := u end else let val chrstr = strnptr2string (string_sing s[i]) val u = strptr2string (string0_append (t, chrstr)) in t := u end end; t end fn make_list_printable {n : int} (lst : list (String, n)) : list (string, n) = let prval () = lemma_list_param lst fun loop {i : nat \| i <= n} .<n - i>. (lst : list (String, n - i), accum : list (string, i)) : list (string, n) = case+ lst of \| NIL => list_vt2t (list_reverse accum) \| head :: tail => let val s = strptr2string (string0_append3 ("\|", head, "\|")) in loop (tail, nicefy_string s :: accum) end in loop (lst, NIL) end fn test_ignoring_leading_spaces () : void = let val givenlst = $list ("ignore leading spaces: 2-2", " ignore leading spaces: 2-1", " ignore leading spaces: 2+0", " ignore leading spaces: 2+1") val expected = $list (" ignore leading spaces: 2+0", " ignore leading spaces: 2+1", " ignore leading spaces: 2-1", "ignore leading spaces: 2-2") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("Ignoring leading spaces:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end fn test_ignoring_trailing_spaces () : void = ( I added this test, myself. ) let val givenlst = $list ("ignore trailing spaces: 2b2", "ignore trailing spaces: 2b1 ", "ignore trailing spaces: 2b+0 ", "ignore trailing spaces: 2b+1 ") val expected = $list ("ignore trailing spaces: 2b+0 ", "ignore trailing spaces: 2b+1 ", "ignore trailing spaces: 2b1 ", "ignore trailing spaces: 2b2") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("Ignoring trailing spaces:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end fn test_combining_adjacent_spaces () : void = let val givenlst = $list ("ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1") val expected = $list ("ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2-2") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("Combining adjacent spaces:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end fn test_whitespace_equivalence () : void = let val givenlst = $list ("Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1", "Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2") val expected = $list ("Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2", "Equiv.\x0cspaces: 3-1", "Equiv.\rspaces: 3-2", "Equiv. spaces: 3-3") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("All whitespace characters equivalent:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end fn test_case_independence () : void = let val givenlst = $list ("cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1", "casE INDEPENENT: 3+0", "case INDEPENENT: 3+1") val expected = $list ("casE INDEPENENT: 3+0", "case INDEPENENT: 3+1", "caSE INDEPENENT: 3-1", "cASE INDEPENENT: 3-2") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("Case independence:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end fn test_numeric_fields () : void = let val givenlst = $list ("foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt") val expected = $list ("foo100bar10baz0.txt", "foo100bar99baz0.txt", "foo1000bar99baz9.txt", "foo1000bar99baz10.txt") val sortedlst = list_sort_strings_naturally givenlst in assertloc (sortedlst = expected); println! ("Numeric fields:"); println! (" given: ", make_list_printable givenlst); println! (" result: ", make_list_printable sortedlst) end implement main0 () = begin test_ignoring_leading_spaces (); test_ignoring_trailing_spaces (); test_combining_adjacent_spaces (); test_whitespace_equivalence (); test_case_independence (); test_numeric_fields (); println! ("success") end (------------------------------------------------------------------) </syntaxhighlight> {{out}} <pre>$ patscc -DATS_MEMALLOC_GCBDW natural_sorting.dats -lgc && ./a.out Ignoring leading spaces: given: \|ignore leading spaces: 2-2\|, \| ignore leading spaces: 2-1\|, \| ignore leading spaces: 2+0\|, \| ignore leading spaces: 2+1\| result: \| ignore leading spaces: 2+0\|, \| ignore leading spaces: 2+1\|, \| ignore leading spaces: 2-1\|, \|ignore leading spaces: 2-2\| Ignoring trailing spaces: given: \|ignore trailing spaces: 2b2\|, \|ignore trailing spaces: 2b1 \|, \|ignore trailing spaces: 2b+0 \|, \|ignore trailing spaces: 2b+1 \| result: \|ignore trailing spaces: 2b+0 \|, \|ignore trailing spaces: 2b+1 \|, \|ignore trailing spaces: 2b1 \|, \|ignore trailing spaces: 2b2\| Combining adjacent spaces: given: \|ignore m.a.s spaces: 2-2\|, \|ignore m.a.s spaces: 2-1\|, \|ignore m.a.s spaces: 2+0\|, \|ignore m.a.s spaces: 2+1\| result: \|ignore m.a.s spaces: 2+0\|, \|ignore m.a.s spaces: 2+1\|, \|ignore m.a.s spaces: 2-1\|, \|ignore m.a.s spaces: 2-2\| All whitespace characters equivalent: given: \|Equiv. spaces: 3-3\|, \|Equiv.[13]spaces: 3-2\|, \|Equiv.[12]spaces: 3-1\|, \|Equiv.[11]spaces: 3+0\|, \|Equiv [10]spaces: 3+1\|, \|Equiv.[9]spaces: 3+2\| result: \|Equiv.[11]spaces: 3+0\|, \|Equiv.[10]spaces: 3+1\|, \|Equiv.[9]spaces: 3+2\|, \|Equiv.[12]spaces: 3-1\|, \|Equiv.[13]spaces: 3-2\|, \|Equiv. spaces: 3-3\| Case independence: given: \|cASE INDEPENENT: 3-2\|, \|caSE INDEPENENT: 3-1\|, \|casE INDEPENENT: 3+0\|, \|case INDEPENENT: 3+1\| result: \|casE INDEPENENT: 3+0\|, \|case INDEPENENT: 3+1\|, \|caSE INDEPENENT: 3-1\|, \|cASE INDEPENENT: 3-2\| Numeric fields: given: \|foo100bar99baz0.txt\|, \|foo100bar10baz0.txt\|, \|foo1000bar99baz10.txt\|, \|foo1000bar99baz9.txt\| result: \|foo100bar10baz0.txt\|, \|foo100bar99baz0.txt\|, \|foo1000bar99baz9.txt\|, \|foo1000bar99baz10.txt\| success </pre> =={{header\|C}}== Line 174 ⟶ 1,041: Besides the numeric part, everything else was done in a uniform way by transforming input strings into some normalized format and comparing those instead. All sort options flags can be freely mixed together. C source is written in UTF-8 for easier reading here: don't do this for serious code. <~~lang~~syntaxhighlight lang="c">#include <stdio.h> #include <stdlib.h> #include <wchar.h> Line 430 ⟶ 1,297: return 0; }</~~lang~~syntaxhighlight>output<syntaxhighlight lang="text">Sort flags: (collapse spaces) 0000098 nina 100 NINA Line 482 ⟶ 1,349: 99 Ninja 100 NINA 100 niño</~~lang~~syntaxhighlight> =={{header\|D}}== Implements requests 1-5. <~~lang~~syntaxhighlight lang="d">import std.stdio, std.string, std.algorithm, std.array, std.conv, std.ascii, std.range; Line 552 ⟶ 1,419: } } </syntaxhighlight> ~~</lang>~~ {{out}} <pre>Test strings: Line 677 ⟶ 1,544: =={{header\|Elixir}}== Implements requests 1-5. <~~lang~~syntaxhighlight lang="elixir">defmodule Natural do def sorting(texts) do Enum.sort_by(texts, fn text -> compare_value(text) end) Line 732 ⟶ 1,599: ["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]} ] \|> Enum.each(fn {title, input} -> Natural.task(title, input) end)</~~lang~~syntaxhighlight> {{out}} Line 854 ⟶ 1,721: Objectives six to eight are attainable, except that the character encodements available are not portable. Code page 850 doesn't offer the same accented character codes as for other systems, such as code page 437. But the auxiliary sort key approach easily accommodates substitute characters (and could also swap + and -, for example!), and could recognise ligatures as well. One might be prodded into escalating to 16-bit or even 32-bit character codes to maintain the same ease of manipulation. <syntaxhighlight lang="fortran"> ~~<lang Fortran>~~ MODULE STASHTEXTS !Using COMMON is rather more tedious. INTEGER MSG,KBD !I/O unit numbers. Line 1,220 ⟶ 2,087: END DO !On to the next. END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!" </syntaxhighlight> ~~</lang>~~ Example output, in two columns: Entry\|Text Character Order Entry\|Text 'Natural' Order Line 1,257 ⟶ 2,124: When (if) a scan reaches the end of its text, the TAIL will be considered for the extraction of further characters. <syntaxhighlight lang="fortran"> ~~<lang Fortran>~~ MODULE ASSISTANCE INTEGER MSG,KBD !I/O unit numbers. Line 1,583 ⟶ 2,450: END DO !On to the next. END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!" </syntaxhighlight> ~~</lang>~~ This time, because the texts are no longer being parsed into pieces, the book titles are not ordered together though they are in the required sequence disregarding the other entries. "The 39" and "The 40" have the "The " part converted into a TAIL, and so their first comparison characters are their digits, and in ASCII, digits precede letters. This is revealed in the third column, where the comparison characters are revealed, in an ad-hoc manner: what appears are the characters placed by every comparison so there may be contention. Output: Line 1,620 ⟶ 2,487: First four rules, no extra credit: <~~lang~~syntaxhighlight lang="go">package main import ( Line 1,753 ⟶ 2,620: } return false }</~~lang~~syntaxhighlight> {{out}} <pre> Line 1,824 ⟶ 2,691: Implements requests 1-5. <~~lang~~syntaxhighlight lang="haskell"> import Data.List import Data.Char Line 1,915 ⟶ 2,782: commonWords = ["the","a","an","of"] </syntaxhighlight> ~~</lang>~~ {{out}} Line 2,038 ⟶ 2,905: The natural way of approaching this task in J is to normalize the text based on the rules desired for sorting. Here, we limit ourselves to ascii, for portability, and decide that our domain shall be terminated strings (where each string ends with the same character - typically a newline): <~~lang~~syntaxhighlight lang="j">require'strings regex' lines=: <;.2 Line 2,047 ⟶ 2,914: norm=: [: split (32 9 12 13 14 15{a.) -.~ [: titleFix tolower natSor=: lines ;@/: norm&.>@lines</~~lang~~syntaxhighlight> Example data: <~~lang~~syntaxhighlight lang="j">IgnoringLeadingSpaces=:0 :0 ignore leading spaces: 2-2 ignore leading spaces: 2-1 Line 2,095 ⟶ 2,962: The 39 steps Wanda )</~~lang~~syntaxhighlight> Note that the required example which contains equivalent whitespace characters includes a '\n' in the data. So, for that example, we use a backslash as our terminator. Line 2,101 ⟶ 2,968: Example use: <~~lang~~syntaxhighlight lang="j"> natSor IgnoringLeadingSpaces ignore leading spaces: 2+0 ignore leading spaces: 2+1 Line 2,135 ⟶ 3,002: The Wind in the Willows </syntaxhighlight> ~~</lang>~~ =={{header\|JavaScript}}== Line 2,141 ⟶ 3,008: Implements the first four rules. Rule 4 works for digits up to 20 characters. <syntaxhighlight lang="javascript"> ~~<lang JavaScript>~~ var nsort = function(input) { var e = function(s) { Line 2,161 ⟶ 3,028: ])); // -> ['\nfile9.txt', 'file10.txt', 'File11.TXT', 'file12.txt'] </syntaxhighlight> ~~</lang>~~ =={{header\|jq}}== Line 2,178 ⟶ 3,045: matter therefore comes down to the filter named "splitup", which for clarity, we define here as a top-level function, as follows: <~~lang~~syntaxhighlight lang="jq">def splitup: def tidy: if .[0] == "" then .[1:] else . end \| if .[length-1] == "" then .[0:length-1] else . end ; Line 2,235 ⟶ 3,102: \| splitup # embedded integers ; sort_by(naturally);</~~lang~~syntaxhighlight> '''Testing''' For brevity, we use the input as given above, but modified slightly so that it can be read in as valid JSON. For example, the comments have been quoted. With these adjustments, the test driver can be written as a one-liner: <~~lang~~syntaxhighlight lang="jq">if type == "string" then "", . else natural_sort end</~~lang~~syntaxhighlight> {{out}} (scrollable) <div style="overflow:scroll; height:400px;"> <~~lang~~syntaxhighlight lang="sh">jq -r -f Natural_sorting.jq Natural_sorting.json # Ignoring leading spaces Line 2,292 ⟶ 3,159: "Wanda", "The Wind in the Willows" ]</~~lang~~syntaxhighlight> </div> =={{header\|Julia}}== The functional programming principle used was to customize the "lt" comparison option of Julia's basic sort() to the "natural" sort features required. <~~lang~~syntaxhighlight lang="julia">#1 natural1(x, y) = strip(x) < strip(y) Line 2,381 ⟶ 3,248: println("Testing sorting mod number $i. Sorted is: $(sort(testarrays[i], lt=ltfunction)).") end </~~lang~~syntaxhighlight>{{output}}<pre> Testing sorting mod number 1. Sorted is: [" ignore leading spaces: 2+0", " ignore leading spaces: 2+1", " ignore leading spaces: 2-1", "ignore leading spaces: 2-2"]. Testing sorting mod number 2. Sorted is: ["ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2-2"]. Line 2,394 ⟶ 3,261: =={{header\|Kotlin}}== <~~lang~~syntaxhighlight lang="scala">// version 1.1.4-3 val r2 = Regex("""[ ]{2,}""") Line 2,574 ⟶ 3,441: s9.sortBy(::selector9) println(s9.map { "'$it'" }.joinToString("\n")) }</~~lang~~syntaxhighlight> {{out}} Line 2,624 ⟶ 3,491: 'Start with an ʒ: 2-2' 'Start with an ß: 2+0' </pre> =={{header\|Nim}}== All the rules are implemented. To sort a sequence of strings, we can use the function “sort” which accepts a comparison function or the template “sortByIt” which accepts an expression. In order to be able to use both methods, we have defined a function to build a “natural order key” which can be compared. The comparison function “naturalCmp” uses these keys. To build a key, we use the procedure “unidecode” which convert the UTF-8 string into an ASCII string, simplifying the implementation of the other rules. Unfortunately, the letter 'ʒ' is translated as 'z', so we have to convert this letter before calling “unidecode”. <syntaxhighlight lang="nim">import algorithm, parseutils, pegs, strutils, unidecode type Kind = enum fString, fNumber KeyItem= object case kind: Kind of fString: str: string of fNumber: num: Natural Key = seq[KeyItem] func cmp(a, b: Key): int = ## Compare two keys. for i in 0..<min(a.len, b.len): let ai = a[i] let bi = b[i] if ai.kind == bi.kind: result = if ai.kind == fString: cmp(ai.str, bi.str) else: cmp(ai.num, bi.num) if result != 0: return else: return if ai.kind == fString: 1 else: -1 result = if a.len < b.len: -1 else: (if a.len == b.len: 0 else: 1) proc natOrderKey(s: string): Key = ## Return the natural order key for a string. # Process 'ʒ' separately as "unidecode" converts it to 'z'. var s = s.replace("ʒ", "s") # Transform UTF-8 text into ASCII text. s = s.unidecode() # Remove leading and trailing white spaces. s = s.strip() # Make all whitespace characters equivalent and remove adjacent spaces. s = s.replace(peg"\s+", " ") # Switch to lower case. s = s.toLowerAscii() # Remove leading "the ". if s.startsWith("the ") and s.len > 4: s = s[4..^1] # Split into fields. var idx = 0 var val: int while idx < s.len: var n = s.skipUntil(Digits, start = idx) if n != 0: result.add KeyItem(kind: fString, str: s[idx..<(idx + n)]) inc idx, n n = s.parseInt(val, start = idx) if n != 0: result.add KeyItem(kind: fNumber, num: val) inc idx, n proc naturalCmp(a, b: string): int = ## Natural order comparison function. cmp(a.natOrderKey, b.natOrderKey) when isMainModule: proc test(title: string; list: openArray[string]) = echo title echo sorted(list, naturalCmp) echo "" test("Ignoring leading spaces.", ["ignore leading spaces: 2-2", "ignore leading spaces: 2-1", "ignore leading spaces: 2+0", "ignore leading spaces: 2+1"]) test("Ignoring multiple adjacent spaces (MAS).", ["ignore MAS spaces: 2-2", "ignore MAS spaces: 2-1", "ignore MAS spaces: 2+0", "ignore MAS spaces: 2+1"]) test("Equivalent whitespace characters.", ["Equiv. spaces: 3-3", "Equiv. \rspaces: 3-2", "Equiv. \x0cspaces: 3-1", "Equiv. \x0bspaces: 3+0", "Equiv. \nspaces: 3+1", "Equiv. \tspaces: 3+2"]) test("Case Independent sort.", ["cASE INDEPENDENT: 3-2", "caSE INDEPENDENT: 3-1", "casE INDEPENDENT: 3+0", "case INDEPENDENT: 3+1"]) test("Numeric fields as numerics.", ["foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt"]) test("Title sorts.", ["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]) test("Equivalent accented characters (and case).", ["Equiv. ý accents: 2-2", "Equiv. Ý accents: 2-1", "Equiv. y accents: 2+0", "Equiv. Y accents: 2+1"]) test("Separated ligatures.", ["Ĳ ligatured ij", "no ligature"]) test("Character replacements.", ["Start with an ʒ: 2-2", "Start with an ſ: 2-1", "Start with an ß: 2+0", "Start with an s: 2+1"])</syntaxhighlight> {{out}} <pre>Ignoring leading spaces. @["ignore leading spaces: 2+0", "ignore leading spaces: 2+1", "ignore leading spaces: 2-1", "ignore leading spaces: 2-2"] Ignoring multiple adjacent spaces (MAS). @["ignore MAS spaces: 2+0", "ignore MAS spaces: 2+1", "ignore MAS spaces: 2-1", "ignore MAS spaces: 2-2"] Equivalent whitespace characters. @["Equiv. \vspaces: 3+0", "Equiv. \nspaces: 3+1", "Equiv. \tspaces: 3+2", "Equiv. \fspaces: 3-1", "Equiv. \cspaces: 3-2", "Equiv. spaces: 3-3"] Case Independent sort. @["casE INDEPENDENT: 3+0", "case INDEPENDENT: 3+1", "caSE INDEPENDENT: 3-1", "cASE INDEPENDENT: 3-2"] Numeric fields as numerics. @["foo100bar10baz0.txt", "foo100bar99baz0.txt", "foo1000bar99baz9.txt", "foo1000bar99baz10.txt"] Title sorts. @["The 39 steps", "The 40th step more", "Wanda", "The Wind in the Willows"] Equivalent accented characters (and case). @["Equiv. y accents: 2+0", "Equiv. Y accents: 2+1", "Equiv. Ý accents: 2-1", "Equiv. ý accents: 2-2"] Separated ligatures. @["Ĳ ligatured ij", "no ligature"] Character replacements. @["Start with an s: 2+1", "Start with an ſ: 2-1", "Start with an ʒ: 2-2", "Start with an ß: 2+0"] </pre> Line 2,634 ⟶ 3,666: The "structured" features of Pascal do not facilitate escape from loops, so, ... some <code>goto</code> atavisms appear in what follows... <syntaxhighlight lang="pascal"> ~~<lang Pascal>~~ Program Natural; Uses DOS, crt; {Simple selection.} {Demonstrates a "natural" order of sorting text with nameish parts.} Line 2,879 ⟶ 3,911: END. </syntaxhighlight> ~~</lang>~~ Output, with "!" instead of a backslash to prevent context confusions here: Text order Natural order Line 2,913 ⟶ 3,945: This implements all 8 requirements<sup></sup>: <~~lang~~syntaxhighlight lang="perl"> use feature 'fc'; use Unicode::Normalize; Line 2,936 ⟶ 3,968: } @items; } </syntaxhighlight> ~~</lang>~~ : <sup>)</sup> Note that decomposing the strings to the NFKD normalization form and subsequently stripping off all code points of the <code>Nonspacing_Mark</code> category, removes differences caused by accents / ligatures / alternate character forms / etc. in a standards-compliant way. This coincides with all the examples given in the task description, with the exception that it does ''not'' replace "ʒ" with "s" — one could add <pre style="display:inline;padding:0.3em">$str =~ tr/ʒ/s/;</pre> for that but it seems a bit [[wp:International_Phonetic_Alphabet_chart_for_English_dialects#Chart\|whimsical]].) Line 2,942 ⟶ 3,974: '''Testing:''' <~~lang~~syntaxhighlight lang="perl"> use utf8; # interpret this script's source code as UTF8 use Test::More; # for plan(), is_deeply() Line 2,973 ⟶ 4,005: print "\n"; } </syntaxhighlight> ~~</lang>~~ {{out}} Line 3,050 ⟶ 4,082: As per C, common words anywhere in the string are omitted. All eight features. <br> Needs chcp 65001 (or 28591) to get this to work on Windows, be sure to save as utf8. ~~<lang Phix>--~~ ~~-- demo/rosetta/Natural_sorting2.exw~~ -- ~~function utf32ch(sequence s)~~ ~~for i=1 to length(s) do~~ ~~s[i] = utf8_to_utf32(s[i])[1]~~ ~~end for~~ ~~return s~~ ~~end function~~ <!--<syntaxhighlight lang="phix">--> ~~constant common = {"the","it","to","a","of","is"},~~ <span style="color: #000080;font-style:italic;">-- ~~{al,ac_replacements} = columnize({~~ -- demo/rosetta/Natural_sorting2.exw ~~{"Æ","AE"},{"æ","ae"},{"Þ","TH"},{"þ","th"},~~ --</span> ~~{"Ð","TH"},{"ð","th"},{"ß","ss"},{"�","fi"},~~ <span style="color: #008080;">function</span> <span style="color: #000000;">utf32ch</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> ~~{"�","fl"},{"�",'s'},{"’",'z'},~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~{"À",'A'},{"Á",'A'},{"Â",'A'},{"Ã",'A'},~~ <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf8_to_utf32</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> ~~{"Ä",'A'},{"Å",'A'},{"à",'a'},{"á",'a'},~~ <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> ~~{"â",'a'},{"ã",'a'},{"ä",'a'},{"å",'a'},~~ <span style="color: #008080;">return</span> <span style="color: #000000;">s</span> ~~{"Ç",'C'},{"ç",'c'},{"È",'E'},{"É",'E'},~~ <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~{"Ê",'E'},{"Ë",'E'},{"è",'e'},{"é",'e'},~~ ~~{"ê",'e'},{"ë",'e'},{"Ì",'I'},{"Í",'I'},~~ <span style="color: #008080;">constant</span> <span style="color: #000000;">common</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"the"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"it"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"to"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"a"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"of"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"is"</span><span style="color: #0000FF;">},</span> ~~{"Î",'I'},{"Ï",'I'},{"ì",'i'},{"í",'i'},~~ <span style="color: #0000FF;">{</span><span style="color: #000000;">al</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ac_replacements</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">columnize</span><span style="color: #0000FF;">({</span> ~~{"î",'i'},{"ï",'i'},{"Ò",'O'},{"Ó",'O'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Æ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"AE"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"æ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ae"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Þ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"TH"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"þ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"th"</span><span style="color: #0000FF;">},</span> ~~{"Ô",'O'},{"Õ",'O'},{"Ö",'O'},{"Ø",'O'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ð"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"TH"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ð"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"th"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ß"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"ss"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"fi"</span><span style="color: #0000FF;">},</span> ~~{"ò",'o'},{"ó",'o'},{"ô",'o'},{"õ",'o'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"fl"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"�"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'s'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"’"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'z'</span><span style="color: #0000FF;">},</span> ~~{"ö",'o'},{"ø",'o'},{"Ñ",'N'},{"ñ",'n'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"À"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Á"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Â"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ã"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},</span> ~~{"Ù",'U'},{"Ú",'U'},{"Û",'U'},{"Ü",'U'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ä"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Å"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'A'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"à"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"á"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},</span> ~~{"ù",'u'},{"ú",'u'},{"û",'u'},{"ü",'u'},~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"â"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ã"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ä"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"å"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'a'</span><span style="color: #0000FF;">},</span> ~~{"Ý",'Y'},{"ÿ",'y'},{"ý",'y'}}),~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ç"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'C'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ç"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'c'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"È"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"É"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},</span> ~~accents_and_ligatures = utf32ch(al)~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ê"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ë"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'E'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"è"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"é"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"ê"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ë"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'e'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ì"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Í"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},</span> ~~function normalise(string s)~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Î"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ï"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'I'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ì"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"í"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},</span> ~~sequence utf32 = utf8_to_utf32(s)~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"î"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ï"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'i'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ò"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ó"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},</span> ~~sequence res = {}~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ô"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Õ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ö"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ø"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'O'</span><span style="color: #0000FF;">},</span> ~~integer i = 1, ch, prev~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"ò"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ó"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ô"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"õ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},</span> ~~for i=1 to length(utf32) do~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"ö"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ø"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'o'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ñ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'N'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ñ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'n'</span><span style="color: #0000FF;">},</span> ~~ch = utf32[i]~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ù"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ú"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Û"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"Ü"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'U'</span><span style="color: #0000FF;">},</span> ~~if find(ch," \t\r\n\x0b\x0c") then~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"ù"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ú"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"û"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ü"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'u'</span><span style="color: #0000FF;">},</span> ~~if length(res)>0 and prev!=' ' then~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ý"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'Y'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ÿ"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'y'</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"ý"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'y'</span><span style="color: #0000FF;">}}),</span> ~~res &= -1~~ <span style="color: #000000;">accents_and_ligatures</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf32ch</span><span style="color: #0000FF;">(</span><span style="color: #000000;">al</span><span style="color: #0000FF;">)</span> ~~end if~~ ~~prev = ' '~~ <span style="color: #008080;">function</span> <span style="color: #000000;">normalise</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> ~~elsif find(ch,"0123456789") then~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">utf32</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf8_to_utf32</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> ~~if length(res)=0 or prev!='0' then~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span> ~~res &= ch-'0'~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">i</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">prev</span> ~~else~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">utf32</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~res[$] = res[$]10+ch-'0'~~ <span style="color: #000000;">ch</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">utf32</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> ~~end if~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" \t\r\n\x0b\x0c"</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> ~~prev = '0'~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)></span><span style="color: #000000;">0</span> <span style="color: #008080;">and</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">!=</span><span style="color: #008000;">' '</span> <span style="color: #008080;">then</span> ~~else~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #0000FF;">-</span><span style="color: #000000;">1</span> ~~object rep = find(ch,accents_and_ligatures)~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~if rep then~~ <span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">' '</span> ~~rep = lower(ac_replacements[rep])~~ <span style="color: #008080;">elsif</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"0123456789"</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> ~~else~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)=</span><span style="color: #000000;">0</span> <span style="color: #008080;">or</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">!=</span><span style="color: #008000;">'0'</span> <span style="color: #008080;">then</span> ~~rep = lower(ch)~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">ch</span><span style="color: #0000FF;">-</span><span style="color: #008000;">'0'</span> ~~end if~~ if ~~length(res)~~<span ~~and~~style="color: ~~sequence(res[$]) then~~#008080;">else</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[$]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[$]</span><span style="color: #000000;">10</span><span style="color: #0000FF;">+</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">-</span><span style="color: #008000;">'0'</span> ~~res[$] &= rep~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~else~~ <span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">'0'</span> ~~res = append(res,""&rep)~~ <span style="color: ~~end if~~#008080;">else</span> <span style="color: #004080;">object</span> <span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">,</span><span style="color: #000000;">accents_and_ligatures</span><span style="color: #0000FF;">)</span> ~~prev = ch~~ <span style="color: #008080;">if</span> <span style="color: #000000;">rep</span> <span style="color: #008080;">then</span> ~~end if~~ <span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ac_replacements</span><span style="color: #0000FF;">[</span><span style="color: #000000;">rep</span><span style="color: #0000FF;">])</span> ~~end for~~ <span style="color: #008080;">else</span> ~~for i=1 to length(common) do~~ <span style="color: #000000;">rep</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ch</span><span style="color: #0000FF;">)</span> ~~while 1 do~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~integer k = find(common[i],res)~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #004080;">sequence</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[$])</span> <span style="color: #008080;">then</span> ~~if k=0 then exit end if~~ <span style="color: #000000;">res</span><span style="color: #0000FF;">[$]</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">rep</span> ~~res[k..k] = {}~~ if ~~length(res)~~<span ~~and res[1]~~style=-1"color: ~~then~~#008080;">else</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #008000;">""</span><span style="color: #0000FF;">&</span><span style="color: #000000;">rep</span><span style="color: #0000FF;">)</span> ~~res = res[2..$]~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end if~~ <span style="color: #000000;">prev</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">ch</span> ~~end while~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end for~~ <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> ~~if length(res) and prev=' ' then~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">common</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~res = res[1..$-1]~~ <span style="color: #008080;">while</span> <span style="color: #000000;">1</span> <span style="color: #008080;">do</span> ~~end if~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">k</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">common</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> ~~return res~~ <span style="color: #008080;">if</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> <span style="color: #008080;">exit</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end function~~ <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">..</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span> <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]=-</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span> ~~sequence tests = {~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">2</span><span style="color: #0000FF;">..$]</span> ~~{" leading spaces: 4",~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~" leading spaces: 3",~~ <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~"leading spaces: 2",~~ <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> ~~" leading spaces: 1"},~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">and</span> <span style="color: #000000;">prev</span><span style="color: #0000FF;">=</span><span style="color: #008000;">' '</span> <span style="color: #008080;">then</span> ~~{"adjacent spaces: 3",~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..$-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> ~~"adjacent spaces: 4",~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~"adjacent spaces: 1",~~ <span style="color: #008080;">return</span> <span style="color: #000000;">res</span> ~~"adjacent spaces: 2"},~~ <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> ~~{"white space: 3-2",~~ ~~"white\r space: 3-3",~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">tests</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span> ~~"white\x0cspace: 3-1",~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">" leading spaces: 4"</span><span style="color: #0000FF;">,</span> ~~"white\x0bspace: 3+0",~~ <span style="~~white\n~~color: #008000;">" ~~space~~ leading spaces: 3+1"</span><span style="color: #0000FF;">,</span> <span style="~~white\t~~color: #008000;">"leading ~~space~~spaces: 3+2"}</span><span style="color: #0000FF;">,</span> { <span style="~~caSE~~color: ~~independent~~#008000;">" leading spaces: 3-1"</span><span style="color: #0000FF;">},</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"adjacent spaces: 3"</span><span style="color: #0000FF;">,</span> ~~"cASE independent: 3-2",~~ <span style="~~casE~~color: ~~independent~~#008000;">"adjacent spaces: 4"</span><span style="color: ~~3+0~~#0000FF;">,</span> <span style="~~case~~color: ~~independent~~#008000;">"adjacent spaces: 3+1"}</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"adjacent spaces: 2"</span><span style="color: #0000FF;">},</span> ~~{"foo1000bar99baz9.txt",~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"white space: 3-2"</span><span style="color: #0000FF;">,</span> ~~"foo100bar99baz0.txt",~~ <span style="color: #008000;">"white\r space: 3-3"</span><span style="color: #0000FF;">,</span> ~~"foo100bar10baz0.txt",~~ <span style="color: #008000;">"white\x0cspace: 3-1"</span><span style="color: #0000FF;">,</span> ~~"foo1000bar99baz10.txt"},~~ <span style="color: #008000;">"white\x0bspace: 3+0"</span><span style="color: #0000FF;">,</span> ~~{"foo1bar",~~ <span style="color: #008000;">"white\n space: 3+1"</span><span style="color: #0000FF;">,</span> ~~"foo100bar",~~ <span style="color: #008000;">"white\t space: 3+2"</span><span style="color: #0000FF;">},</span> ~~"foo bar",~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"caSE independent: 3-1"</span><span style="color: #0000FF;">,</span> ~~"foo1000bar"},~~ { <span style="~~The~~color: ~~Wind~~#008000;">"cASE inindependent: ~~the~~3-2"</span><span style="color: ~~Willows~~#0000FF;">,</span> <span style="~~The~~color: ~~40th~~#008000;">"casE ~~step~~independent: 3+0"</span><span style="color: ~~more~~#0000FF;">,</span> <span style="~~The~~color: 39#008000;">"case independent: 3+1"</span><span style="color: ~~steps~~#0000FF;">},</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"foo1000bar99baz9.txt"</span><span style="color: #0000FF;">,</span> ~~"Wanda"},~~ { <span style="~~ignore~~color: ý#008000;">"foo100bar99baz0.txt"</span><span ~~accents~~style="color: ~~2-2~~#0000FF;">,</span> <span style="~~ignore~~color: Ý#008000;">"foo100bar10baz0.txt"</span><span ~~accents~~style="color: ~~2-1~~#0000FF;">,</span> <span style="~~ignore~~color: y#008000;">"foo1000bar99baz10.txt"</span><span ~~accents~~style="color: ~~2+0~~#0000FF;">},</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"foo1bar"</span><span style="color: #0000FF;">,</span> ~~"ignore Y accents: 2+1"},~~ { <span style="~~Ball~~color: #008000;",>"~~Card~~foo100bar"~~,"above",~~</span><span style="~~aether~~color: #0000FF;">,</span> <span style="~~apple~~color: #008000;",>"~~autumn~~foo bar"~~,"außen",~~</span><span style="~~bald~~color: #0000FF;">,</span> <span style="~~car~~color: #008000;",>"~~e-mail~~foo1000bar"~~,"evoke",~~</span><span style="~~nina~~color: #0000FF;">},</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"The Wind in the Willows"</span><span style="color: #0000FF;">,</span> ~~"niño","Æon","Évian","æon"},~~ <span style="color: #008000;">"The 40th step more"</span><span style="color: #0000FF;">,</span> } <span style="color: #008000;">"The 39 steps"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"Wanda"</span><span style="color: #0000FF;">},</span> ~~sequence s, n, t, tags~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"ignore ý accents: 2-2"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"ignore Ý accents: 2-1"</span><span style="color: #0000FF;">,</span> ~~function natural(integer i, integer j)~~ <span style="color: #008000;">"ignore y accents: 2+0"</span><span style="color: #0000FF;">,</span> ~~return compare(t[i],t[j])~~ <span style="color: #008000;">"ignore Y accents: 2+1"</span><span style="color: #0000FF;">},</span> ~~end function~~ <span style="color: #0000FF;">{</span><span style="color: #008000;">"Ball"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Card"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"above"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"aether"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"apple"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"autumn"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"außen"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"bald"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"car"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"e-mail"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"evoke"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"nina"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"niño"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Æon"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Évian"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"æon"</span><span style="color: #0000FF;">},</span> <span style="color: #0000FF;">}</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">n</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">tags</span> <span style="color: #008080;">function</span> <span style="color: #000000;">natural</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">return</span> <span style="color: #000000;">compare</span><span style="color: #0000FF;">(</span><span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">end</span> <span style="color: #008080;">function</span> <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tests</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> <span style="color: #000000;">s</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">tests</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #000000;">n</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #000000;">t</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">))</span> <span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">normalise</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <span style="color: #000000;">tags</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">custom_sort</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">routine_id</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"natural"</span><span style="color: #0000FF;">),</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)))</span> <span style="color: #008080;">if</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">3</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- clean up the whitespace mess</span> <span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">substitute_all</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">],{</span><span style="color: #008000;">"\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\t"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"\\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\t"</span><span style="color: #0000FF;">})</span> <span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">substitute_all</span><span style="color: #0000FF;">(</span><span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">],{</span><span style="color: #008000;">"\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\t"</span><span style="color: #0000FF;">},{</span><span style="color: #008000;">"\\r"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0c"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\x0b"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\n"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\\t"</span><span style="color: #0000FF;">})</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s %-30s %-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">"original"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"normal"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"natural"</span><span style="color: #0000FF;">})</span> <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s %-30s %-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #008000;">"========"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"======"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"======="</span><span style="color: #0000FF;">})</span> <span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tags</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%-30s\|%-30s\|%-30s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">],</span><span style="color: #000000;">n</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">],</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">tags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]]})</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <!--</syntaxhighlight>--> ~~for i=1 to length(tests) do~~ ~~s = tests[i]~~ ~~n = sort(s)~~ ~~t = repeat(0,length(s))~~ ~~for j=1 to length(s) do~~ ~~t[j] = normalise(s[j])~~ ~~end for~~ ~~tags = custom_sort(routine_id("natural"),tagset(length(s)))~~ ~~if i=3 then -- clean up the whitespace mess~~ ~~for j=1 to length(s) do~~ ~~s[j] = substitute_all(s[j],{"\r","\x0c","\x0b","\n","\t"},{"\\r","\\x0c","\\x0b","\\n","\\t"})~~ ~~n[j] = substitute_all(n[j],{"\r","\x0c","\x0b","\n","\t"},{"\\r","\\x0c","\\x0b","\\n","\\t"})~~ ~~end for~~ ~~end if~~ ~~printf(1,"%-30s %-30s %-30s\n",{"original","normal","natural"})~~ ~~printf(1,"%-30s %-30s %-30s\n",{"========","======","======="})~~ ~~for k=1 to length(tags) do~~ ~~printf(1,"%-30s\|%-30s\|%-30s\n",{s[k],n[k],s[tags[k]]})~~ ~~end for~~ ~~puts(1,"\n")~~ ~~end for</lang>~~ {{Out}} <pre> Line 3,282 ⟶ 4,318: =={{header\|PicoLisp}}== This parser takes care of features 1,2,3,4,5 and 8: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">(de parseNatural (Str) (clip (make Line 3,310 ⟶ 4,346: "ß" "ss" "ſ" "s" "ʒ" "s" ) ) (unless (member Word '(the it to)) (link Word) ) ) ) ) ) ) ) )</~~lang~~syntaxhighlight> Test: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">: (parseNatural " ^MThe abc123Defß ^I Ghi ") -> ("abc" 123 "defss" " " "ghi")</~~lang~~syntaxhighlight> Sorting is trivial then: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">(de naturalSort (Lst) (by parseNatural sort Lst) )</~~lang~~syntaxhighlight> Test: <~~lang~~syntaxhighlight ~~PicoLisp~~lang="picolisp">(de TestData "# Ignoring leading spaces" ("ignore leading spaces: 2-2" " ignore leading spaces: 2-1" " ignore leading spaces: 2+0" " ignore leading spaces: 2+1") Line 3,359 ⟶ 4,395: (pythonOut "Normally sorted :" (sort (copy X))) (pythonOut "Naturally sorted:" (naturalSort X)) (prinl) ) )</~~lang~~syntaxhighlight> Output: <pre># Ignoring leading spaces Line 3,510 ⟶ 4,546: =={{header\|PowerShell}}== <~~lang~~syntaxhighlight lang="powershell"> # six sorting $Discard = '^a ', '^an ', '^the ' Line 3,573 ⟶ 4,609: ) } </syntaxhighlight> ~~</lang>~~ {{Out}} <pre> Line 3,696 ⟶ 4,732: =={{header\|Python}}== All eight features: <~~lang~~syntaxhighlight lang="python"># -- coding: utf-8 -- # Not Python 3.x (Can't compare str and int) Line 3,850 ⟶ 4,886: print 'Text strings:'; pp(txt) print 'Normally sorted :'; print '\n'.join(sorted(txt)) print 'Naturally sorted:'; print '\n'.join(ns(txt))</~~lang~~syntaxhighlight> ===Sample Python output=== Line 4,000 ⟶ 5,036: beginning/end, easy to implement, but sounds wrong). <~~lang~~syntaxhighlight lang="racket"> #lang racket (define (natural-sort l) Line 4,022 ⟶ 5,058: (shuffle '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x z" "x y"))) ;; => '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x y" "x z") </syntaxhighlight> ~~</lang>~~ =={{header\|Raku}}== Line 4,040 ⟶ 5,076: different results depending on the order they are applied. <syntaxhighlight lang="raku" ~~perl6~~line># Sort groups of digits in number order. Sort by order of magnitude then lexically. sub naturally ($a) { $a.lc.subst(/(\d+)/, ->$/ {0~$0.chars.chr~$0},:g) ~"\x0"~$a } Line 4,142 ⟶ 5,178: say "\n" ~ '' x 40 ~ "\n"; }</~~lang~~syntaxhighlight> Sample output: Line 4,333 ⟶ 5,369: =={{header\|Ruby}}== Requirements 1,2,3 and 5 are met in one line of code: <~~lang~~syntaxhighlight lang="ruby">ar.sort_by{\|str\| str.downcase.gsub(/\Athe \|\Aa \|\Aan /, "").lstrip.gsub(/\s+/, " ")}</~~lang~~syntaxhighlight> Almost all of the code below is handling requirement 4. The problem is that Ruby will happily sort ["a",1] against ["a",2] or even ["b"], but it does not know how to handle [1, "a"] against ["a", 2] and raises an ArgumentError. The code below does not define a new sort method, it defines a new class which is sortable by the existing method (falling back on string comparison). <~~lang~~syntaxhighlight lang="ruby">class NatSortString include Comparable attr_reader :scrubbed, :ints_and_strings, :i_s_pattern Line 4,359 ⟶ 5,395: end </syntaxhighlight> ~~</lang>~~ Demo: <~~lang~~syntaxhighlight lang="ruby">tests = {"Ignoring leading spaces" => [ "ignore leading spaces: 2-2 ", " ignore leading spaces: 2-1 ", " ignore leading spaces: 2+0 ", " ignore leading spaces: 2+1 "], Line 4,379 ⟶ 5,415: puts [title,"--input--", ar, "--normal sort--", ar.sort, "--natural sort--", nat_sorts.sort, "\n"] end </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 4,502 ⟶ 5,538: =={{header\|Scala}}== All 8: <~~lang~~syntaxhighlight ~~Scala~~lang="scala">object NaturalSorting { implicit object ArrayOrdering extends Ordering[Array[String]] { // 4 val INT = "([0-9]+)".r Line 4,554 ⟶ 5,590: okay }) }</~~lang~~syntaxhighlight> Output: <pre>PASS: 1 IGNORING LEADING SPACES Line 4,608 ⟶ 5,644: Tasks 1-5 are completed. <~~lang~~syntaxhighlight lang="scheme"> (import (scheme base) (scheme char) Line 4,700 ⟶ 5,736: ("foo1bar99baz4.txt" "foo2bar99baz3.txt" "foo4bar99baz1.txt" "foo3bar99baz2.txt") ("The Wind in the Willows" "The 40th step more" "The 39 steps" "Wanda"))) </syntaxhighlight> ~~</lang>~~ {{out}} Line 4,887 ⟶ 5,923: =={{header\|Sidef}}== {{trans\|Raku}} <~~lang~~syntaxhighlight lang="ruby">class String { # Sort groups of digits in number order. Sort by order of magnitude then lexically. -> naturally { self.lc.gsub(/(\d+)/, {\|s1\| "0" + s1.len.chr + s1 }) + "\x0" + self }; Line 4,912 ⟶ 5,948: self.gsub(re, {\|s1\| tr{s1} }); } }</~~lang~~syntaxhighlight> Tests: <~~lang~~syntaxhighlight lang="ruby">var tests = [ [ "Task 1a\nSort while ignoring leading spaces.", Line 4,988 ⟶ 6,024: say "\n#{'' * 40}\n"; }</~~lang~~syntaxhighlight> {{out}} Line 5,181 ⟶ 6,217: Note also that Tcl supports case-insensitive sorting and “treat digit sequences as numbers” as native sorting options. (The latter is particularly useful for handling filenames.) <~~lang~~syntaxhighlight lang="tcl">package require Tcl 8.5 proc sortWithCollationKey {keyBuilder list} { Line 5,239 ⟶ 6,275: foo100bar10baz0.txt foo1000bar99baz10.txt foo1000bar99baz9.txt}</~~lang~~syntaxhighlight> Output: <pre> Line 5,332 ⟶ 6,368: foo1000bar99baz9.txt foo1000bar99baz10.txt </pre> =={{header\|Wren}}== {{trans\|Kotlin}} {{libheader\|Wren-pattern}} {{libheader\|Wren-str}} {{libheader\|Wren-fmt}} {{libheader\|Wren-sort}} <syntaxhighlight lang="wren">import "./pattern" for Pattern import "./str" for Str import "./fmt" for Fmt import "./sort" for Cmp, Sort var p2 = Pattern.new("+2 ") var p3 = Pattern.new("/s") // any whitespace character var p5 = Pattern.new("+1/d") /** Only covers ISO-8859-1 accented characters plus (for consistency) Ÿ / var ucAccented = ["ÀÁÂÃÄÅ", "Ç", "ÈÉÊË", "ÌÍÎÏ", "Ñ", "ÒÓÔÕÖØ", "ÙÚÛÜ", "ÝŸ"] var lcAccented = ["àáâãäå", "ç", "èéêë", "ìíîï", "ñ", "òóôõöø", "ùúûü", "ýÿ"] var ucNormal = "ACEINOUY" var lcNormal = "aceinouy" /* Only the commoner ligatures / var ucLigatures = "ÆĲŒ" var lcLigatures = "æĳœ" var ucSeparated = ["AE", "IJ", "OE"] var lcSeparated = ["ae", "ij", "oe"] /* Miscellaneous replacements / var miscLetters = "ßſʒ" var miscReplacements = ["ss", "s", "s"] /* Displays strings including whitespace as if the latter were literal characters / var toDisplayString = Fn.new { \|s\| var whitespace = ["\t", "\n", "\v", "\f", "\r"] var whitespace2 = ["\\t", "\\n", "\\v", "\\f", "\\r"] for (i in 0..4) s = s.replace(whitespace[i], whitespace2[i]) return s } /* Ignoring leading space(s) / var selector1 = Fn.new { \|s\| s.trimStart(" ") } /* Ignoring multiple adjacent spaces i.e. condensing to a single space / var selector2 = Fn.new { \|s\| p2.replaceAll(s, " ") } /* Equivalent whitespace characters (equivalent to a space say) / var selector3 = Fn.new { \|s\| p3.replaceAll(s, " ") } /* Case independent sort / var selector4 = Fn.new { \|s\| Str.lower(s) } /* Numeric fields as numerics (deals with up to 20 digits) / var selector5 = Fn.new { \|s\| for (m in p5.findAll(s)) { var ix = m.index var repl = Fmt.swrite("$020d", Num.fromString(m.text)) s = s[0...ix] + repl + s[ix + m.length..-1] } return s } /* Title sort / var selector6 = Fn.new { \|s\| var t = Str.lower(s) if (t.startsWith("the ")) return s.skip(4).join() if (t.startsWith("an ")) return s.skip(3).join() if (t.startsWith("a ")) return s.skip(2).join() return s } /* Equivalent accented characters (and case) / var selector7 = Fn.new { \|s\| var sb = "" for (c in s) { var outer = false var i = 0 for (ucs in ucAccented) { if (ucs.contains(c)) { sb = sb + ucNormal[i] outer = true break } i = i + 1 } if (outer) continue i = 0 for (lcs in lcAccented) { if (lcs.contains(c)) { sb = sb + lcNormal[i] outer = true break } i = i + 1 } if (outer) continue sb = sb + c } return Str.lower(sb) } /* Separated ligatures / var selector8 = Fn.new { \|s\| var i = 0 for (c in ucLigatures) { s = s.replace(c, ucSeparated[i]) i = i + 1 } i = 0 for (c in lcLigatures) { s = s.replace(c, lcSeparated[i]) i = i + 1 } return s } /* Character replacements */ var selector9 = Fn.new { \|s\| var i = 0 for (c in miscLetters) { s = s.replace(c, miscReplacements[i]) i = i + 1 } return s } System.print("The 9 string lists, sorted 'naturally':\n") var selectors = [selector1, selector2, selector3, selector4, selector5, selector6, selector7, selector8, selector9] var ss = [ [ "ignore leading spaces: 2-2", " ignore leading spaces: 2-1", " ignore leading spaces: 2+0", " ignore leading spaces: 2+1" ], [ "ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1" ], [ "Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\vspaces: 3-1", "Equiv.\fspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2" ], [ "cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1", "casE INDEPENENT: 3+0", "case INDEPENENT: 3+1" ], [ "foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt" ], [ "The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda" ], [ "Equiv. ý accents: 2-2", "Equiv. Ý accents: 2-1", "Equiv. y accents: 2+0", "Equiv. Y accents: 2+1" ], [ "Ĳ ligatured ij", "no ligature" ], [ "Start with an ʒ: 2-2", "Start with an ſ: 2-1", "Start with an ß: 2+0", "Start with an s: 2+1" ] ] for (i in 0..8) { var sel = selectors[i] var cmp = Fn.new { \|s1, s2\| Cmp.string.call(sel.call(s1), sel.call(s2)) } Sort.insertion(ss[i], cmp) System.print(ss[i].map { \|s\| "'%(s)'" }.join("\n")) if (i < 8) System.print() }</syntaxhighlight> {{out}} <pre> The 9 string lists, sorted 'naturally': ' ignore leading spaces: 2+0' ' ignore leading spaces: 2+1' ' ignore leading spaces: 2-1' 'ignore leading spaces: 2-2' 'ignore m.a.s spaces: 2+0' 'ignore m.a.s spaces: 2+1' 'ignore m.a.s spaces: 2-1' 'ignore m.a.s spaces: 2-2' 'Equiv. spaces: 3+0' 'Equiv. spaces: 3+1' 'Equiv. spaces: 3+2' 'Equiv. spaces: 3-1' spaces: 3-2' 'Equiv. spaces: 3-3' 'casE INDEPENENT: 3+0' 'case INDEPENENT: 3+1' 'caSE INDEPENENT: 3-1' 'cASE INDEPENENT: 3-2' 'foo100bar10baz0.txt' 'foo100bar99baz0.txt' 'foo1000bar99baz9.txt' 'foo1000bar99baz10.txt' 'The 39 steps' 'The 40th step more' 'Wanda' 'The Wind in the Willows' 'Equiv. y accents: 2+0' 'Equiv. Y accents: 2+1' 'Equiv. Ý accents: 2-1' 'Equiv. ý accents: 2-2' 'Ĳ ligatured ij' 'no ligature' 'Start with an s: 2+1' 'Start with an ſ: 2-1' 'Start with an ʒ: 2-2' 'Start with an ß: 2+0' </pre> Line 5,338 ⟶ 6,619: First, a tag-mangled-fields sort, use tags to pull from the original list in sorted order function: <~~lang~~syntaxhighlight lang="zkl">fcn dsuSort(x,orig){ // decorate-sort-undecorate sort x.enumerate().sort(fcn([(_,a)],[(_,b)]){a<b}) .apply('wrap([(n,_)]){orig[n]}); }</~~lang~~syntaxhighlight> Now, mangle a copy of the original list to "normalize" it to the task requirements and sort. <~~lang~~syntaxhighlight lang="zkl"># Ignoring leading spaces ts1:=T("ignore leading spaces: 2-2", " ignore leading spaces: 2-1", " ignore leading spaces: 2+0", " ignore leading spaces: 2+1"); dsuSort(ts1.apply("strip"),ts1).println();</~~lang~~syntaxhighlight> {{out}} <pre> Line 5,355 ⟶ 6,636: </pre> <~~lang~~syntaxhighlight lang="zkl"># Ignoring multiple adjacent spaces (m.a.s) ts2:=T("ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1"); dsuSort(ts2.apply('-(" ")),ts2).println();</~~lang~~syntaxhighlight> {{out}} <pre> Line 5,367 ⟶ 6,648: </pre> <~~lang~~syntaxhighlight lang="zkl"># Equivalent whitespace characters ts3:=T("Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1", "Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2"); dsuSort(ts3.apply('-.fp1("\n\r\t\f\b\x0b ")),ts3).println();</~~lang~~syntaxhighlight> {{out}} <pre> Line 5,381 ⟶ 6,662: </pre> <~~lang~~syntaxhighlight lang="zkl"># Case Indepenent sort ts4:=T("cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1", "casE INDEPENENT: 3+0", "case INDEPENENT: 3+1"); dsuSort(ts4.apply("toLower"),ts4).println();</~~lang~~syntaxhighlight> {{out}} <pre> Line 5,394 ⟶ 6,675: And the nasty one: multiple numeric fields. Each key is blown apart into a list of ints, where each int is consecutive numeric characters. To compare a pair of fields (ie two lists of ints), walk the two in parallel, stopping when a difference is found. Ugly but it works. Another tactic would be to normalize each field to the same length, concat the fields and use that as the mangled key. But that might have issues if the number of fields differ, the former doesn't care. <~~lang~~syntaxhighlight lang="zkl"># Numeric fields as numerics fcn fieldize(s){ s.apply(fcn(c){"1234567890".holds(c) and c or "."}).split(".") Line 5,408 ⟶ 6,689: list.apply(fieldize).enumerate().sort(fcmp) .apply('wrap([(n,_)]){list[n]}); }</~~lang~~syntaxhighlight> <~~lang~~syntaxhighlight lang="zkl">ts5:=T("foo100bar99baz0.txt", "foo100bar10baz0.txt", "foo1000bar99baz10.txt", "foo1000bar99baz9.txt"); fsort(ts5).println(); x:=T("x9y99","foo10.txt","x10y0","foo9.txt","x9y100"); fsort(x).println();</~~lang~~syntaxhighlight> {{out}} <pre>