Talk:Ordered words: Difference between revisions

From Rosetta Code

Revision as of 06:13, 16 July 2012

Lexicographical order

Note: This task should probably be modified to be aware (and tolerant) of lexicographical order. --Michael Mol 22:15, 9 November 2010 (UTC)

I think it's probably not important on that particular dictionary; it's all lower case and well-behaved. –Donal Fellows 22:37, 9 November 2010 (UTC)
Yes, I thought I had programmed (REXX) for a dictionary that wasn't in lexicographical order, but it erroneously gave me the correct result (because the list of words was in order). Yes, that does sound strange. The program has been fixed. -- Gerard Schildberger 06:28, 14 July 2012 (UTC)
Indeed the task description should state either
  • any lexicographically ordered dictionary such as ... (instead of 'this dictionary'); tasks should not be geared to one input, should they?
  • or (better yet??) drop the requirement for it to be ordered.

Being a sloppy reader, I used an unordered input for my test with ooRexx and detected/reported the bug (which wasn't one according to the current task description). Anyway: I think the program was improved and now also works for an unordered dictionary and (slightly modified) for ooRexx. --Walterpachl 08:05, 14 July 2012 (UTC)

longest word length

By "longest word length", do you mean "equal to the length of the longest ordered word"? --Michael Mol 22:16, 9 November 2010 (UTC)

Yes. I find the ordered words, find the maximum length of any ordered word, then find all ordered words of that maximum length. --Paddy3118 13:09, 10 November 2010 (UTC)
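The three steps described above could be sketched in Python roughly as follows. This is only an illustration: the names `is_ordered` and `longest_ordered` and the sample word list are made up here and are not taken from any solution on the task page.

```python
def is_ordered(word):
    """True when the letters never decrease from left to right."""
    return all(a <= b for a, b in zip(word, word[1:]))


def longest_ordered(words):
    """Find the ordered words, then the maximum length, then filter by it."""
    ordered = [w for w in words if is_ordered(w)]
    if not ordered:
        return []
    longest = max(len(w) for w in ordered)
    return [w for w in ordered if len(w) == longest]


# Hypothetical sample input; the real task reads a dictionary file.
sample = ["abbey", "knotty", "zebra", "accept", "apple"]
print(longest_ordered(sample))
```

Nothing here relies on the input list itself being sorted, only on the letters inside each word.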
For contrast, the Tcl code does it in a single pass. (It happened to be more natural to express it that way.) The result is the same though; the words in the result list are such that they are all of the same length, all ordered words, and there is no other ordered word (in the originating dictionary) such that its length is greater than the length of any result word. (Also, every ordered word of that length in the originating dictionary is present in the result.) I can't be bothered to write that mathematically. :-) –Donal Fellows 16:24, 10 November 2010 (UTC)
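A single-pass variant can be sketched the same way. This is a Python approximation of the idea, not a transcription of the Tcl entry, and the function name is invented for the example. It keeps a running list of the best words seen so far and resets it whenever a longer ordered word turns up.

```python
def longest_ordered_single_pass(words):
    """One pass: track the longest ordered words seen so far."""
    best, best_len = [], 0
    for w in words:
        # Skip any word with a letter smaller than its predecessor.
        if any(a > b for a, b in zip(w, w[1:])):
            continue
        if len(w) > best_len:
            best, best_len = [w], len(w)   # new maximum: start over
        elif len(w) == best_len:
            best.append(w)                 # ties with the current maximum
    return best


print(longest_ordered_single_pass(["abbey", "knotty", "zebra", "accept", "apple"]))
```

The result satisfies the same properties listed above: every word in it is ordered, all have the same length, and no ordered word in the input is longer.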

Knotty problem

I noticed that three of the examples written by User talk:Ulrie (C++, Perl and Perl 6) don't have the word knotty mentioned. I don't know if this is due to a faulty copy of the dictionary or a faulty algorithm (knotty is still in the dictionary). I will give it a day; then I think I should mark them incorrect? --Paddy3118 12:05, 27 November 2010 (UTC)

Or could their algorithms not cope with the double-t in knotty? (But note, the task description does state that the word 'abbey' with a double-b is to be considered ordered.) --Paddy3118 16:58, 27 November 2010 (UTC)
Is there a public-domain (or, GFDL1.2-compatible, at least) wordlist I could host and have dictionary-dependent tasks draw from? URL preferred; places I'm looking have weird or tricky copyright restrictions. --Michael Mol 18:18, 27 November 2010 (UTC)
This? --Paddy3118 06:02, 28 November 2010 (UTC)
In the Perl code it was a faulty algorithm that hardwired the number of expected results!?! I fixed the Perl and Perl 6, but someone else can fix the C++. --TimToady 18:32, 27 November 2010 (UTC)
Thanks TimToady. --Paddy3118 05:42, 28 November 2010 (UTC)
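On the double-letter question: whether knotty or abbey counts as ordered comes down to whether adjacent letters are compared with <= or with <. A small Python sketch of the difference follows. The function names are illustrative only and are not from any posted solution.

```python
def ordered_allowing_repeats(word):
    """Non-decreasing letters: 'abbey' and 'knotty' qualify (task's reading)."""
    return all(a <= b for a, b in zip(word, word[1:]))


def ordered_strictly_increasing(word):
    """Strictly increasing letters: any doubled letter disqualifies the word."""
    return all(a < b for a, b in zip(word, word[1:]))


for w in ("abbey", "knotty", "first"):
    print(w, ordered_allowing_repeats(w), ordered_strictly_increasing(w))
```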

Output

Generally, tasks which are about writing programs (not describing language features) make requirements of the program’s output, not the form of the example itself. If the output of the program is required to be included in the example,

  • then please clarify the task wording.
  • why is it required?

A few output examples are useful in checking one's own work when adding implementations, but it seems silly to insist that every single program come with its output. I would like to understand the motivation here. —Kevin Reid 18:36, 1 February 2011 (UTC)

Hi Kevin, on other occasions, having output available has helped find and debug problems where output has been given and it was easier to spot something missing from the output and then look more closely at the implementation. I, and others, do try not to insist on output if the output will always be large. An easy case for insisting on output is something like Conway's Game of Life, where output (of some sort) has been given. Come to think of it, it was Wireworld where odd output led to the fixing of an example.
In other cases it aids consistency. If output is needed from one, to allow others to complete the task, then asking for output from all is consistent. The request for output in this task should be straightforward to fulfil.
Different tasks get patrolled and "policed" to different degrees, but I'm sure there is no underlying malice in a request to add output. --Paddy3118 19:03, 1 February 2011 (UTC)

E and Factor

Someone marked the E and Factor examples as incomplete/incorrect because they print no output. Yet both examples seem to have print statements: println(" ".rjoin(best.snapshot())) and filter-longest-words [ print ] each. Did someone add the print statements and forget to remove the incomplete/incorrect marks? --Kernigh 20:45, 9 March 2011 (UTC)

If you compare those to others you should see that the output of those programs is not given, even though the task asks for it. One might argue that running the programs would give the display, but a quick reading of other examples should show that the output is required in addition to the program source. --Paddy3118 21:06, 9 March 2011 (UTC)

Not sure about E, but I installed Factor and confirmed that the Factor example really outputs the answer. The task asks for the program to "display" the answer, but the task does not require anyone to post the answer on this page. --Kernigh 23:18, 9 March 2011 (UTC)

Hi Kernigh, you seem to be ignoring both comments and the other examples - most of which give their output on the task page. (not this talk page), ... ...How should the task page be altered to ensure most people put the results on the page? Most of the current results comply; the task states that it should be displayed and displaying it elsewhere would not be of much help. --Paddy3118 00:25, 10 March 2011 (UTC)

I am guessing that the program would display it on standard output, or in a GUI window, or something like that. That would allow the person who runs the program to see the answer. Or am I misunderstanding? --Kernigh 01:09, 10 March 2011 (UTC)

We want it shown on this page. Just copy the output and paste it in a pre block under the example. --Mwn3d 01:34, 10 March 2011 (UTC)
I don't think that the 'on this page' comment is needed in the task description; most people would infer that. --Paddy3118 17:49, 10 March 2011 (UTC)

Now that we fixed the task description, I agree that the output must be on the page. As of now, every example on the page has output on the page, except for E. Can someone run the E program and post the output? --Kernigh 02:24, 11 March 2011 (UTC)

A bug (which was not really a bug) in Rexx solution

I suggest changing the last two lines of the Rexx program to
<lang rexx>
say words(i.m) 'words found (of length' m")"; say      /*show & tell time*/
 do n=1 for words(i.m);  say word(i.m,n);  end         /*list the words. */
</lang>
I added abcdefgh to the unixdict.txt and got 0 words of length 8. And please add to i.= for compatibility.

--Walterpachl 18:26, 13 July 2012 (UTC)

Well, yes, you used the wrong index. If I changed the code to what you want, the REXX program will no longer work correctly nor solve the task at hand. -- Gerard Schildberger 21:18, 13 July 2012 (UTC)
And also, these comments shouldn't be placed under the E and Factor section. -- Gerard Schildberger 21:18, 13 July 2012 (UTC)
Also, note that this program entry was entered for classic REXX, and is not meant for other object-oriented languages. -- Gerard Schildberger 21:18, 13 July 2012 (UTC)


Please try your program with this test file:

abcdef
abcde
abcd  

Expected result: 1 word of length 5.

My expectation is: 1 words of length 6. [I didn't think it would be necessary to add a pluralizer to this program, but in hindsight, I should have.] -- Gerard Schildberger 06:42, 14 July 2012 (UTC)

Suggested code change under wrong section. sorry.

"and is not meant for other object-oriented languages"
How can a program be meant to be...
When corrected as shown above it runs perfectly well with ooRexx

--Walterpachl 05:34, 14 July 2012 (UTC)

I do not have an ooREXX (other than ROO!) available to check whether it runs under an object-oriented language (nor do I have an interest in doing so, as I have often repeated), so I can't answer your query regarding ooRexx. -- Gerard Schildberger 06:42, 14 July 2012 (UTC)

Looking at the program again I noticed two typos: EBCDICI should be EBCDIC, and then should be than (2 places). I dare not change your program.

I had programmed the REXX example to expect a lexicographical ordered word list. I corrected the error. -- Gerard Schildberger 06:42, 14 July 2012 (UTC)
Thank you
You don't have to have ooRexx. You COULD write i.='' instead of i.= to avoid the incompatibility
(and the use of $#@ in symbols - which aren't used in this program)
and you didn't correct the 'thens' (is less then -> is less than) :-(

--Walterpachl 07:49, 14 July 2012 (UTC)

Yes, I corrected then to than several updates ago. You may have to do a RELOAD or REFRESH to see the updated version. -- Gerard Schildberger 07:57, 14 July 2012 (UTC)

I think you have missed this one:
In ASCII, A is less then a, while in EBCDIC, it's the other way around.

--Walterpachl 08:15, 14 July 2012 (UTC)

Hell's bells, I must've read that line a half dozen times, and I still missed it! It's been corrected, finally. -- Gerard Schildberger 08:21, 14 July 2012 (UTC)

And I'd appreciate i.= instead of i= (as you adapted some other programs)

Second thoughts about the task description: your mentioning of uppercase versus lowercase and
letters of the English alphabet. Things get tricky with German Umlaute (äöüÄÖÜß) not to speak of Scandinavian alphabets
of which I know nothing except that they are different.

One could ask for a list of characters showing the desired sort order (ol) and turn a>=b into pos(a,ol)>=pos(b,ol).

--Walterpachl 08:43, 14 July 2012 (UTC)
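The pos(a,ol)>=pos(b,ol) idea can be sketched in Python as follows. The alphabet string `GERMAN_LOWER`, which places each umlaut right after its base vowel and ß after s, is only one assumed ordering chosen for this example. The function name is also invented.

```python
# Assumed ordering for illustration only; other collations are possible.
GERMAN_LOWER = "aäbcdefghijklmnoöpqrsßtuüvwxyz"


def is_ordered_by(word, alphabet):
    """Ordered check using positions in `alphabet` instead of code points."""
    # str.index raises ValueError for a letter that is not in the alphabet.
    positions = [alphabet.index(ch) for ch in word]
    return all(a <= b for a, b in zip(positions, positions[1:]))


word = "bös"  # sample string
print(is_ordered_by(word, GERMAN_LOWER))               # by the custom order
print(all(a <= b for a, b in zip(word, word[1:])))     # by raw code points
```

With the custom order the sample string counts as ordered, whereas a plain code-point comparison rejects it because ö sorts above s there.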

The REXX language was always English-centric (well, Latin letter centric, at least). The UPPER statement, function, and option were designed just for the Latin alphabet, and porting REXX for use with other alphabets becomes problematic and a subject worthy of a full discussion. [Note that some REXXes support other languages for error messages. The names of the weekdays and months and various time suffixes are still in English, however, as well as the options of all functions.] The order of sorting characters is, in itself, a field of study. The order in which various hardware sorts characters (or puts them in order) is also of interest. Some sort packages and other computer software allow for specifying (for instance) how to sort/order numeric digits: ASCII has them below letters, BCDIC and EBCDIC have them above. Also, ASCII has the uppercase Latin letters lower in a list, and lowercase letters are higher. BCDIC and EBCDIC are the other way around. The German eszett character (ß) has two problems: it has no uppercase equivalent [other than SS], and in the German alphabet, the ß is normally listed after z. [This is due to the way it's pronounced.] Using the translate and/or upper instruction/option/function to uppercase (verb) non-Latin letters quickly degenerates into one heck of a mess; how would a computer language know a priori how (or what) to uppercase (or to lowercase)? It's a question that I'm not equipped to address. -- Gerard Schildberger 17:51, 14 July 2012 (UTC)
actually I know all of that. Whereas the Rexx language is English-centric one can still process German input albeit not as simply as English. For example to uppercase a string.
<lang rexx>
uppercase: Procedure
  Parse Arg s
  a2z='abcdefghijklmnopqrstuvwxyz'
  r=translate(s,translate(a2z)'ÄÖÜ',a2z'äöü')
  r=changestr('ß',r,'SS')
  Return r
</lang>
I know that lowercase ain't that easy! By the way, did the Romans have lowercase letters (you mention that in Roman Number decoding)? --Walterpachl 19:01, 14 July 2012 (UTC)
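The ASCII versus EBCDIC ordering can be checked with a few lines of Python. Python's standard codecs include cp037, which is one EBCDIC code page among several, so this is a sketch for that code page only. The helper name `code` is made up for the example.

```python
def code(ch, encoding):
    """Numeric code of a single character in the given single-byte encoding."""
    return ch.encode(encoding)[0]


for enc in ("ascii", "cp037"):          # cp037: an EBCDIC code page
    in_order = sorted("aA9", key=lambda c: code(c, enc))
    print(enc, in_order)
```

Under ASCII the digit sorts first and the lowercase letter last. Under cp037 the order of all three is reversed.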

Yes, the Romans had lowercase letters. A papyrus written in Latin from Herculaneum dating before 79 CE was found using lowercase letters a, b, d, h, p, and r (as per Wikipedia). At one point, when using lowercase Roman numerals, "they" started using a lowercase j instead of an i so the reader could more easily distinguish the end of the Roman numeral. Think what ci, cii, vi (Latin ablative singular of vis), di (Latin irregular masculine plural of deus [deity]), lii, mi (as in the 7-note musical scale, from the 1st verse of the Latin hymn Ut queant laxis), xi (name of a Greek letter) ... would look like in a Roman text. Boy, this paragraph is a hodge-podge. It needs a kitchen sink thrown in. Maybe a tune-a-fish. Perhaps some (Roman) scholar could add a few words here --- and I don't mean any long-dead scholars. -- Gerard Schildberger 20:09, 14 July 2012 (UTC)


By the way, the above REXX example could be re-written without the need for a PROCEDURE, a Latin alphabet literal string, or temporary variables (along with some appropriate comments explaining the three steps performed):

<lang rexx>uppercase: return translate(changestr("ß",translate(arg(1),'ÄÖÜ',"äöü"),'SS'))</lang>
Of course, if one wanted to break up the complex instruction, then a local (temporary) variable would be needed, along with a PROCEDURE statement. But the Latin alphabet literal isn't needed. I wonder how those German characters would translate on various codepages. -- Gerard Schildberger 20:09, 14 July 2012 (UTC)
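For comparison, the same three steps (umlauts, then ß, then the remaining letters) can be written out in Python. This is only a sketch mirroring the REXX logic. The name `uppercase_de` and the table `UMLAUTS` are invented here, and Python's built-in str.upper() already handles these characters by itself.

```python
# Translation table for step 1; the name is hypothetical.
UMLAUTS = str.maketrans("äöü", "ÄÖÜ")


def uppercase_de(s):
    s = s.translate(UMLAUTS)     # step 1: lowercase umlauts to uppercase
    s = s.replace("ß", "SS")     # step 2: replace ß with SS
    return s.upper()             # step 3: uppercase the remaining letters


print(uppercase_de("straße über"))
print("straße über".upper())     # the built-in gives the same result
```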

thanks for the Roman explanation
not only could, but can as you did. But there's nothing wrong with Procedure.
It makes it readable without comments, I think.
other code pages: AWFUL (even on my DOS Window or command prompt).

--Walterpachl 03:58, 15 July 2012 (UTC)

thanks for the line. I'd settle for
<lang rexx>
uppercase: Procedure
 Parse Arg a
 a=translate(arg(1),'ÄÖÜ',"äöü")     /* translate lowercase umlaute */
 a=changestr("ß",a,'SS')             /* replace ß with SS           */
 return translate(a)                 /* translate lowercase letters */
</lang>

--Walterpachl 19:26, 15 July 2012 (UTC)

The above REXX program could be shortened to: <lang rexx>uppercase: Procedure

 a=translate(arg(1),'ÄÖÜ',"äöü")     /* translate lowercase umlaute */
 a=changestr("ß",a,'SS')             /* replace ß with SS           */
 return translate(a)                 /* translate lowercase letters */</lang>

(which removes a line of dead code.)


As for using Procedure, it does come with some overhead.
I ran a benchmark with the original uppercase subroutine, and the second version, along with the above version and the one-liner version.

The original version was faster than version two (but only slightly), and the one-liner was about four times faster. An in-line version was twice as fast as the one-liner.

Now, in this day and age of fast computers, some people don't care about speed that much. I have two REXX applications, one that processes over 708,000 records (actually, words in an English word list which need to be uppercased while doing a search), and that number of invocations adds up. Another application reads over 58 million records, and one can see that inefficient subroutines can really slow up the works. I'd like to think that Rosetta Code is a place to show well-written routines that are applicable to any size/amount of use; one can never know where people will use such a routine (or the scale of use). A REXX procedure has to build an environment which has its own NUMERIC DIGITS, FORM, and FUZZ, its own timing (elapsed and reset timers), local REXX variables (RC, SIGL, RESULT), and whatnot. In the above case, one local variable (a) also has to be DROPed. There is always something to be paid (as far as overhead). Once the uppercase becomes a one-liner, then it becomes available to be used as an in-line algorithm, bypassing the overhead of calling a subroutine (whether a procedure or not). If anyone wants to see the benchmark REXX program, I can post it here. -- Gerard Schildberger 02:04, 16 July 2012 (UTC)
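The shape of such a subroutine-versus-in-line comparison can be sketched with Python's timeit module. This is not the REXX benchmark mentioned above, and Python timings say nothing about REXX procedure overhead. The word list and all names below are invented for the illustration.

```python
import timeit

UMLAUTS = str.maketrans("äöü", "ÄÖÜ")
WORDS = ["straße", "über", "knotty"] * 1000   # hypothetical workload


def uppercase_de(s):
    """Subroutine version: one call per word."""
    return s.translate(UMLAUTS).replace("ß", "SS").upper()


def via_function():
    return [uppercase_de(w) for w in WORDS]


def inline():
    # Same expression written in-line, with no call per word.
    return [w.translate(UMLAUTS).replace("ß", "SS").upper() for w in WORDS]


if __name__ == "__main__":
    for fn in (via_function, inline):
        best = min(timeit.repeat(fn, number=20, repeat=3))
        print(f"{fn.__name__}: {best:.4f} s")
```

Both variants must produce identical results for the comparison to mean anything, so that is worth asserting before looking at the timings.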

Then how about
<lang rexx>
uppercase:
 return translate(changestr("ß",translate(arg(1),'ÄÖÜ',"äöü"),'SS'))
/**********************************************************************
 a=translate(arg(1),'ÄÖÜ',"äöü")     /* translate lowercase umlaute */
 a=changestr("ß",a,'SS')             /* replace ß with SS           */
 return translate(a)                 /* translate lowercase letters */
**********************************************************************/
</lang>
As for me, I like to program for people (reading) with a limited line length.

now I see what you mean by your recurring elimination of dead code
which is actually avoiding empty lines created by the formatter and
makes copy/paste harder for me.

And finally, can you provide your benchmark results for the strict comparison case?

--Walterpachl 06:11, 16 July 2012 (UTC)