Talk:Natural sorting: Difference between revisions

 
::: That's what <code>[http://www.tcl.tk/man/tcl8.5/TclCmd/lsort.htm#M6 lsort -dictionary]</code> does. (Well, it also handles case a bit differently, treating it as a second-order difference rather than the usual first-order difference.) It was added because it is the mode which “puts filenames in the order that the user expects”, making it much easier to produce a nice GUI to use. (FWIW, I'm not doing the ligature normalization stuff because that's a lot of work to do right and it's an area where Tcl needs more work; I forget the Request For Enhancement number. :-)) –[[User:Dkf|Donal Fellows]] 11:17, 2 May 2011 (UTC)
:::: Thanks. --[[User:Paddy3118|Paddy3118]] 23:26, 2 May 2011 (UTC)
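A rough Python approximation of the behaviour described above, in which embedded numbers compare as integers and case only breaks ties. This is a sketch, not Tcl's actual implementation: the name `dictionary_key` is made up for illustration, and details such as leading zeros are not handled the way <code>lsort -dictionary</code> handles them.

```python
import re

def dictionary_key(s):
    """Sort key: case-insensitive, digit runs as integers, case as tie-breaker.

    re.split with a capturing group returns text at even indices and
    digit runs at odd indices, so compared elements always share a type.
    """
    parts = re.split(r"(\d+)", s)
    primary = [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    # The original string is only consulted when the primary keys are equal.
    return (primary, s)

names = ["bigboy", "x11y", "bigBoy", "x9y", "bigbang", "x10y"]
print(sorted(names, key=dictionary_key))
```

With this key, "bigBoy" lands between "bigbang" and "bigboy", and "x10y" lands between "x9y" and "x11y".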
 
I don't understand this algorithm. Copied from the Python results:
<pre>
Naturally sorted:
['Equiv.\x0bspaces: 3+0',
'Equiv.\nspaces: 3+1',
'Equiv.\tspaces: 3+2',
'Equiv.\x0cspaces: 3-1',
'Equiv.\rspaces: 3-2',
'Equiv. spaces: 3-3']
</pre>
Each of these strings ends in a sub-field of digits (actually a single digit), and the sub-field before that is either "+" or "-". Starting with the right-most sub-fields, the strings will be ordered by their integer values: 0, 1, 2, or 3. Then, for strings with the same value, "+" comes before "-" by the usual string comparison. That's enough to discriminate between these strings; the remaining left-most sub-fields don't need to be considered. So shouldn't the correct order be:
<pre>
'Equiv.\x0bspaces: 3+0',
'Equiv.\nspaces: 3+1',
'Equiv.\x0cspaces: 3-1',
'Equiv.\tspaces: 3+2',
'Equiv.\rspaces: 3-2',
'Equiv. spaces: 3-3'
</pre>
I'm missing something &mdash;[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 22:32, 25 July 2014 (UTC)
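A minimal Python sketch of the two readings in question, assuming that whitespace runs are collapsed to a single space first and that digit runs compare as integers. The helper names (`fields`, `leftmost_key`, `rightmost_key`) are illustrative only and are not taken from the task's Python entry.

```python
import re

def fields(s):
    """Collapse whitespace, then split into alternating text / integer sub-fields."""
    s = " ".join(s.split())          # str.split() treats \t \n \r \x0b \x0c as whitespace
    parts = re.split(r"(\d+)", s)    # even indices: text, odd indices: digit runs
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def leftmost_key(s):
    """Left-most sub-field is the most significant."""
    return fields(s)

def rightmost_key(s):
    """Right-most sub-field is the most significant."""
    return fields(s)[::-1]

samples = ['Equiv. spaces: 3-3', 'Equiv.\rspaces: 3-2', 'Equiv.\x0cspaces: 3-1',
           'Equiv.\x0bspaces: 3+0', 'Equiv.\nspaces: 3+1', 'Equiv.\tspaces: 3+2']

print(sorted(samples, key=leftmost_key))
print(sorted(samples, key=rightmost_key))
```

Under these assumptions, the left-most reading reproduces the posted Python output (3+0, 3+1, 3+2, 3-1, 3-2, 3-3), while the right-most reading gives the order proposed above (3+0, 3+1, 3-1, 3+2, 3-2, 3-3).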
 
:(I see this. I won't answer now as I am (pleasantly) hung over after the work summer party yesterday.) (Hic) --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 07:27, 26 July 2014 (UTC)
 
::Looking at this more, it seems examples are interpreting "rightmost" as "leftmost", or perhaps "most significant" as "least significant." Maybe the task description could be clarified? &mdash;[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 21:40, 29 July 2014 (UTC)
 
I agree with [[User:Sonia|Sonia]]: the Python treatment of numeric fields does not agree with the algorithm above, which is why the above example comes out wrong. Consider this example:
<pre>
# TEST Numeric fields as numerics
Text strings:
['foo3bar99baz2.txt',
'foo2bar99baz3.txt',
'foo1bar99baz4.txt',
'foo4bar99baz1.txt']
Normally sorted :
['foo1bar99baz4.txt',
'foo2bar99baz3.txt',
'foo3bar99baz2.txt',
'foo4bar99baz1.txt']
Naturally sorted:
['foo1bar99baz4.txt',
'foo2bar99baz3.txt',
'foo3bar99baz2.txt',
'foo4bar99baz1.txt']
</pre>
 
I would expect the natural sort to give the reverse order, based on evaluating the groups from the right. -- [[User:Peter|Peter]] 00:20, 17 Feb 2017 (UTC)
 
Task point four is self-contradictory. It says "with the rightmost fields being the most significant" and later on "x9y99 before x9y100, before x10y0", which makes the leftmost fields the most significant. [[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 19:18, 7 April 2017 (UTC)
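The contradiction can be checked with a short Python sketch. The function `natural_key` and its `rightmost` flag are hypothetical names introduced here for illustration; the sketch assumes only that digit runs compare as integers and everything else compares as plain text.

```python
import re

def natural_key(s, rightmost=False):
    """Split into text / integer sub-fields; optionally make the right-most one most significant."""
    parts = re.split(r"(\d+)", s)   # even indices: text, odd indices: digit runs
    key = [int(p) if i % 2 else p for i, p in enumerate(parts)]
    # The list always has odd length, so reversing keeps text and numbers aligned.
    return key[::-1] if rightmost else key

xy = ["x10y0", "x9y100", "x9y99"]
foo = ["foo3bar99baz2.txt", "foo2bar99baz3.txt",
       "foo1bar99baz4.txt", "foo4bar99baz1.txt"]

for data in (xy, foo):
    print(sorted(data, key=natural_key))
    print(sorted(data, key=lambda s: natural_key(s, rightmost=True)))
```

With the left-most field most significant, the result is x9y99, x9y100, x10y0, as the task text requires. With the right-most field most significant, x10y0 comes first and the foo files come out in reverse order (foo4bar99baz1.txt first), so the two statements in the task cannot both hold.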
 
==Criticisms please==
:I may be blinded by the only way I can think of to do it in Java, but it seems to me that the task is a super-complex version of [[Sort using a custom comparator]]. I'm not sure that linking to a particular example's output is the best way to define the task. We ran into problems with that in [[Multisplit]]. Beyond all of that, it does seem like a lot of work. Do umlauts count as accents? Should they be sorted as their expansion, like a scharfes S? What about circumflexes (circumfleces? circumfli?)? There is a whole list of marks [[wp:Diacritic|here]]. Some of them represent condensations of letters (like ss condensed to a scharfes S), some represent accents, and some represent different pronunciations (like a cedilla in French or a tilde in Spanish). As you can see, this can get pretty complicated quickly. --[[User:Mwn3d|Mwn3d]] 15:25, 27 April 2011 (UTC)
:: I'll only answer the point about accents at the moment. Unicode is a pig for me too, so you only ''need'' to handle the accents mentioned in the particular test for that section if you don't have a convenient Unicode class to make it more generic. You might implement parts in an expandable way, for example with a table that can be extended to handle more than the task examples require. --[[User:Paddy3118|Paddy3118]] 18:01, 27 April 2011 (UTC)
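One possible shape for such a table, sketched in Python. The entries in `ACCENT_TABLE` are illustrative examples only, not the task's official test set, and the names `strip_accents` and `accent_key` are made up for this sketch.

```python
# Illustrative entries only; extend the table as more characters are needed.
ACCENT_TABLE = {
    "à": "a", "é": "e", "ê": "e", "ü": "u",
    "ç": "c", "ñ": "n",
    "ß": "ss",   # one character expanding to two
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TRANSLATION = str.maketrans(ACCENT_TABLE)

def strip_accents(s):
    """Replace every character found in the table; leave the rest untouched."""
    return s.translate(_TRANSLATION)

def accent_key(s):
    """Sort key that ignores case and the accents listed in the table."""
    return strip_accents(s.lower())

words = ["zebra", "éclair", "apple", "Straße"]
print(sorted(words))                  # plain code-point order
print(sorted(words, key=accent_key))  # table-driven order
```

Because the table is plain data, handling another mark is a one-line change and no sorting code has to be touched.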
 
:To me it seems that it is not so much the task, it is the test cases. Can't you simply provide a file with lines to be sorted?--[[User:Abu|Abu]] 08:05, 1 September 2011 (UTC)
::Hi Abu, I have extracted the strings used for testing from the original Python and created a sample inputs section. --[[User:Paddy3118|Paddy3118]] 05:19, 2 September 2011 (UTC)
:::Great! Thanks Paddy3118, I'll give it a try. --[[User:Abu|Abu]] 19:06, 3 September 2011 (UTC)
 
=== Lexical ordering system ===
 
[[User:Markhobley|Markhobley]] 19:46, 27 April 2011 (UTC)
 
== Ʒ ==
 
Wasn't 'Ʒ' derived from 'Z' rather than 'S'? And IIRC 'ß' may be interpreted as either 'ss' or 'sz', but I admit that may be incorrect since I don't speak German. --[[User:Ledrug|Ledrug]] 23:39, 31 July 2011 (UTC)
:All I can find is [[wp:Ʒ]] and [[wp:ß]]. --[[User:Paddy3118|Paddy3118]] 04:57, 2 August 2011 (UTC)