Talk:Natural sorting: Difference between revisions

Revision as of 19:19, 7 April 2017

Revision as of 07:48, 14 July 2011 by rosettacode>Markhobley (→‎Lexical ordering system); latest revision as of 19:19, 7 April 2017 by Petelomax (→‎Numeric sub-fields)
(13 intermediate revisions by 6 users not shown)
::: That's what <code>[http://www.tcl.tk/man/tcl8.5/TclCmd/lsort.htm#M6 lsort -dictionary]</code> does. (Well, it also handles case a bit differently, treating it as a second-order difference rather than the usual first-order difference.) It was added because it is the mode which “puts filenames in the order that the user expects”, making it much easier to produce a nice GUI to use. (FWIW, I'm not doing the ligature normalization stuff because that's a lot of work to do right and it's an area where Tcl needs more work; I forget the Request For Enhancement number. :-)) –[[User:Dkf|Donal Fellows]] 11:17, 2 May 2011 (UTC)
:::: Thanks. --[[User:Paddy3118|Paddy3118]] 23:26, 2 May 2011 (UTC)

I'm not understanding this algorithm. Copied from the Python results,
<pre>
Naturally sorted:
['Equiv.\x0bspaces: 3+0',
 'Equiv.\nspaces: 3+1',
 'Equiv.\tspaces: 3+2',
 'Equiv.\x0cspaces: 3-1',
 'Equiv.\rspaces: 3-2',
 'Equiv. spaces: 3-3']
</pre>
Each of these strings ends in a sub-field of digits, actually a single digit, and the sub-field before that is either "+" or "-". Starting with the right-most sub-fields, they will be ordered by their integer values, 0, 1, 2, or 3. Then for cases with the same value, "+" comes before "-" by usual string comparison. That's enough to discriminate these strings. The remaining left-most sub-fields don't need to be considered. So, shouldn't the correct order be,
<pre>
'Equiv.\x0bspaces: 3+0',
'Equiv.\nspaces: 3+1',
'Equiv.\x0cspaces: 3-1',
'Equiv.\tspaces: 3+2',
'Equiv.\rspaces: 3-2',
'Equiv. spaces: 3-3'
</pre>
I'm missing something. --[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 22:32, 25 July 2014 (UTC)
:(I see this. I won't answer now as I am (pleasantly) hung over after the works summer party yesterday). (Hic) --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 07:27, 26 July 2014 (UTC)
::Looking at this more, it seems examples are interpreting "rightmost" as "leftmost", or perhaps "most significant" as "least significant." Maybe the task description could be clarified? --[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 21:40, 29 July 2014 (UTC)

I agree with [[User:Sonia|Sonia]]: the Python treatment of numeric fields does not agree with the algorithm above, which makes the above example come out wrong. Consider this example:
<pre>
# TEST Numeric fields as numerics
Text strings:
['foo3bar99baz2.txt', 'foo2bar99baz3.txt', 'foo1bar99baz4.txt', 'foo4bar99baz1.txt']
Normally sorted :
['foo1bar99baz4.txt', 'foo2bar99baz3.txt', 'foo3bar99baz2.txt', 'foo4bar99baz1.txt']
Naturally sorted:
['foo1bar99baz4.txt', 'foo2bar99baz3.txt', 'foo3bar99baz2.txt', 'foo4bar99baz1.txt']
</pre>
I would expect the natural sort to give the reverse order, based on evaluating the groups from the right. -- [[User:Peter|Peter]] 00:20, 17 Feb 2017 (UTC)

Task point four is self-contradictory. It says "with the rightmost fields being the most significant" and later on "x9y99 before x9y100, before x10y0", which is the leftmost fields being the most significant. [[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 19:18, 7 April 2017 (UTC)

==Criticisms please==
:I may be blinded by the only way I can think of to do it in Java, but it seems to me that the task is a super-complex version of [[Sort using a custom comparator]]. I'm not sure that linking to a particular example's output is the best way to define the task. We ran into problems with that in [[Multisplit]]. Beyond all of that it does seem like a lot of work. Do umlauts count as accents? Should they be sorted as their expansion like scharfes? What about circumflexes (circumfleces? circumfli?)? There is a whole list of marks [[wp:Diacritic|here]]. Some of them represent condensations of letters (like ss condensed to a scharfe), some represent accents, and some represent different pronunciations (like a cedilla in French or tilde in Spanish). As you can see, this can get pretty complicated quickly. --[[User:Mwn3d|Mwn3d]] 15:25, 27 April 2011 (UTC)
:: I'll only answer the point about accents at the moment. Unicode is a pig for me too, so you only ''need'' to handle the accents mentioned in the particular test for that section if you don't have a convenient Unicode class to make it more generic. You might implement parts via expandable means, for example a table: expand the table to handle more than what the task examples require. --[[User:Paddy3118|Paddy3118]] 18:01, 27 April 2011 (UTC)
:To me it seems that it is not so much the task, it is the test cases. Can't you simply provide a file with lines to be sorted? --[[User:Abu|Abu]] 08:05, 1 September 2011 (UTC)
::Hi Abu, I have extracted the strings used for testing from the original Python and created a sample inputs section. --[[User:Paddy3118|Paddy3118]] 05:19, 2 September 2011 (UTC)
:::Great! Thanks Paddy3118, I'll give it a try. --[[User:Abu|Abu]] 19:06, 3 September 2011 (UTC)

=== Lexical ordering system ===
[[User:Markhobley|Markhobley]] 19:46, 27 April 2011 (UTC)

== Ʒ ==
Wasn't 'Ʒ' derived from 'Z' rather than 'S'? And IIRC 'ß' may be interpreted as either 'ss' or 'sz', but I admit that may be incorrect since I don't speak German. --[[User:Ledrug|Ledrug]] 23:39, 31 July 2011 (UTC)
:All I can find is [[wp:Ʒ]] and [[wp:ß]]. --[[User:Paddy3118|Paddy3118]] 04:57, 2 August 2011 (UTC)
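
== Sketch: the two readings of numeric sub-fields ==
To make the numeric sub-fields disagreement above concrete, here is a minimal Python sketch of both readings. It is not taken from the task's Python entry; the names <code>split_fields</code>, <code>leftmost_key</code> and <code>rightmost_key</code> are hypothetical, and the sketch assumes a sub-field is simply a maximal run of ASCII digits or of non-digits, ignoring the task's other rules (case, spaces, accents).

```python
import re

def split_fields(s):
    """Split s into alternating text and numeric sub-fields.

    Each field becomes a tagged tuple, (0, int) for digits and (1, str)
    for text, so that fields of different kinds can still be compared.
    """
    fields = []
    for f in re.findall(r'[0-9]+|[^0-9]+', s):
        if f[0] in '0123456789':
            fields.append((0, int(f)))   # numeric sub-field, compared as an integer
        else:
            fields.append((1, f))        # text sub-field, compared as a string
    return fields

def leftmost_key(s):
    """Sort key with the leftmost sub-field as the most significant."""
    return split_fields(s)

def rightmost_key(s):
    """Sort key with the rightmost sub-field as the most significant."""
    return split_fields(s)[::-1]

names = ['x10y0', 'x9y100', 'x9y99']
print(sorted(names, key=leftmost_key))
print(sorted(names, key=rightmost_key))
```

Under these assumptions only <code>leftmost_key</code> gives the "x9y99 before x9y100, before x10y0" order quoted from the task, while <code>rightmost_key</code> reverses the <code>foo1bar99baz4.txt</code> … <code>foo4bar99baz1.txt</code> example, as the literal "rightmost is most significant" wording would suggest.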