Talk:Word wrap: Difference between revisions

Line 45:
::: Oh yes. I just took the question as asking "in a perfect world ...". --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 10:24, 21 August 2013 (UTC)
 
== REXX timings ==
 
I created a file containing one line of about 1000000 characters containing words of 1 to 90 characters, randomly distributed such as
Line 95:
End
</lang>
<lang rexx>/*REXX pgm ww0 reads a file and displays it (with word wrap to the screen). */
Call time 'R'
parse arg iFID width /*get optional arguments from CL.*/
Line 119:
o: Return lineout(oid,arg(1))</lang>
<lang rexx>/*REXX pgm ww1 reads a file and displays it (with word wrap to the screen). */
Call time 'R'
parse arg iFID width justify _ . /*get optional CL args.*/
Line 172:
 
<lang rexx>
/* REXX ww2 ***************************************************************
* 20.08.2013 Walter Pachl "my way"
**********************************************************************/
Line 206:
:: Translated version 2 to PL/I. Since PL/I has a limit of 32767 for character strings I had to cut the input into chunks of 20000 bytes and add extra reads. Output is identical to REXX. --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 19:38, 21 August 2013 (UTC)
 
The last shown REXX program has a problem with classic REXX: '''fn''' is an unknown function. &nbsp; Also, that REXX program only reads the first record of the file (does exactly one read) instead of doing a loop until done. &nbsp; It would make more sense to exclude the time to read the file as well as bypassing the writing of the records to the file, as the I/O would be unvarying and slightly dependent on other I/O activity in the system, not to mention caching. &nbsp; Whoever does the first reading pays for all the I/O, the 2nd reading would be from cache. &nbsp; I would benchmark for a paragraph of text as the task says, not a million bytes. &nbsp; Scale up the number of executions to make the timings meaningful. &nbsp; Also, I took the liberty of breaking up the listing of the REXX programs into separate sections, perhaps it would be a good idea to label/identify them, not to mention to bring version 0 and 1 up to date. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:01, 21 August 2013 (UTC)
 
-----
 
I seem to have found a discrepancy. &nbsp; For an input of:
<pre>
────────── Computer programming laws ──────────
The Primal Scenario -or- Basic Datum of Experience:
∙ Systems in general work poorly or not at all.
∙ Nothing complicated works.
∙ Complicated systems seldom exceed 5% efficiency.
∙ There is always a fly in the ointment.
</pre>
The REXX versions 0 and 1 produce:
<pre>
────────── Computer programming laws
────────── The Primal Scenario -or-
Basic Datum of Experience: ∙ Systems in
general work poorly or not at all. ∙
Nothing complicated works. ∙ Complicated
systems seldom exceed 5% efficiency. ∙
There is always a fly in the ointment.
</pre>
The REXX version 2 (modified for my timings) produces:
<pre>
────────── Computer programming
laws ────────── The Primal Scenario
-or- Basic Datum of Experience: ∙
Systems in general work poorly or not at
all. ∙ Nothing complicated works.
∙ Complicated systems seldom exceed 5%
efficiency. ∙ There is always a fly
in the ointment.
</pre>
It seems that the REXX version 2 isn't handling leading or imbedded blanks. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:40, 21 August 2013 (UTC)
 
:: correct. pls try to live without that "feature". for testing, pls replace fn(fid) with "long"--[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 21:56, 21 August 2013 (UTC)
 
::: I don't understand. &nbsp; That isn't a "feature" (failure by design?), that is a bug. &nbsp; The output (the word wrapping) isn't what I expect, although it might be the design goal of the coder of the REXX version 2 program to not ignore those blanks. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 22:06, 21 August 2013 (UTC)
 
:: version 0 and 1 remove them and reduce multiple blanks to one blank. --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 21:59, 21 August 2013 (UTC)
 
::: What about version zero ??? &nbsp; REXX versions 0 and 1 already remove leading and multiple embedded blanks (as well as trailing blanks). -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 22:06, 21 August 2013 (UTC)
 
:::: that's what I tried to say. the '1' got lost.--[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 22:15, 21 August 2013 (UTC)
 
adding s=space(s) to ww2 should fix that!?! --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 22:49, 21 August 2013 (UTC)
 
: Yes, it should. &nbsp; The proof is in the tasting of the pudding. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 22:52, 21 August 2013 (UTC)
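
For illustration, a minimal sketch (not part of either posted program; the sample string is made up): with its default arguments the '''space''' BIF strips leading and trailing blanks and reduces each run of embedded blanks to a single blank, which is the normalization that versions 0 and 1 already perform.

<lang rexx>/*REXX sketch: SPACE with default arguments normalizes the blanks.      */
s = '   Systems in   general work  poorly.  '   /*hypothetical sample text.*/
say '['s']'                                     /*the text as it was read. */
say '['space(s)']'                              /*[Systems in general work poorly.]*/</lang>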
 
-----
 
This was just performed with:
::* the &nbsp; newer &nbsp; REXX version 1 &nbsp; using a ''stemmed array'' instead of a ''char string''
::* the updated REXX version 2 &nbsp; using &nbsp; '''s=space(s)'''
::* using an appropriate number of repetitions to elongate the elapsed time
::* using modified programs to suppress the writing/display of the output
::* bypassing the timing of the reading of the input file
::* both REXX programs producing the exact same output
::* using many trials and variations (under Windows/XP)
::* using the REXX Regina 3.7 interpreter
::* using my coal-fired steam-driven Frankenbox &nbsp; (built last century)
the timings are:
:::::* REXX version 1 &nbsp; &nbsp; 2.49 seconds
:::::* REXX version 2 &nbsp; &nbsp; 2.45 seconds
:::::* REXX version 2 &nbsp; &nbsp; 2.29 seconds &nbsp; (optimized with exact comparisons)
:::::* REXX version 2 &nbsp; &nbsp; 1.27 seconds &nbsp; (optimized with &nbsp; '''lastpos''' &nbsp; BIF)
:::::* REXX version 2 &nbsp; &nbsp; 1.06 seconds &nbsp; (optimized with &nbsp; '''parse''' &nbsp; statement)
:::::* REXX version 2 &nbsp; &nbsp; 1.05 seconds &nbsp; (optimized by making the '''ow''' subroutine non-destructive)
:::::* REXX version 2 &nbsp; &nbsp; 1.01 seconds &nbsp; (optimized by making the '''ow''' subroutine in-line)
:::::* REXX version 2 &nbsp; &nbsp; 0.96 seconds &nbsp; (optimized the inner DO loop, eliminated an &nbsp; '''if''' &nbsp; statement)
The &nbsp; '''lastpos''' &nbsp; BIF was used to find the last blank (within a field of '''W''' characters instead of searching for the last blank character by character).
<br>Further optimization was done using &nbsp; '''parse''' &nbsp; instead of &nbsp; '''substr''' &nbsp; and other such thingys. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 23:39, 21 August 2013 (UTC)
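
The two ways of finding that last blank can be compared in a small sketch (the sample string and width are made up for illustration; this is not taken from either posted program). Both look for the last blank at or before column '''W+1''', and both yield 0 when there is no such blank:

<lang rexx>/*REXX sketch: find the last blank at or before column W+1, two ways.   */
x = 'Nothing complicated works. '        /*hypothetical sample text.     */
w = 12                                   /*hypothetical wrap width.      */

i = lastpos(' ', x, w+1)                 /*one BIF call, scans backwards.*/

do j = w+1  to 1  by -1                  /*same search, char by char.    */
  if substr(x, j, 1) == ' '  then leave
end                                      /*J is 0 if the loop ran out.   */

say 'lastpos:' i '  loop:' j             /*both are 8 for this sample.   */</lang>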
 
<br>I really have to stop optimizing that REXX program, I'm running out of coal. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 00:11, 22 August 2013 (UTC)
<br>Well, I ran out of coal ... can't stoke the fires anymore. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 02:31, 22 August 2013 (UTC)
 
:: I refrained from using lastpos since the (classic?) Rexx on the host does not have it. Is version 2 that you refer to "my way" modified as noted above? Are your final versions 1 & 2 available somewhere? I had to look up vestigual (limited English) - it should have been vestigial:-) --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 06:35, 22 August 2013 (UTC)
 
::: Yes, the REXX version 2 (as mentioned above) is a cumulative modification of the original, that is, the 3rd optimization is the changed 2nd optimization. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:19, 22 August 2013 (UTC)
 
::: (about the misspelling of ''vestigial''): &nbsp; I did a <strike> quite </strike> quick web check and found many hits on ''vestigual'', but I saw the answers to a question and thought that was the correct spelling. &nbsp; At least, I'm not alone in misspelling that word: &nbsp; Vestigial Vsetigial Vesitgial Veetigial Veatigial Vedtigial Vewtigial Vextigial Vesrigial Vesgigial Vesyigial Vestogial Vestugial Vestkgial Vestirial Vestinial Vestitial Vestihial Vestibial Vestifial Vestigoal Vestigual Vestigkal Vestigisl Vestigizl Vestigiql Vestigiap Vestigiam Vestigiak --- that word must hold some kind of record in the number of ways to misspell a word. &nbsp; But I got almost all of the letters right. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:19, 22 August 2013 (UTC)
 
::::: There is no need to strike misspellings, just correct them. &nbsp; The reason I did a strike-out for the "quite" misspelling is that the misspelling was discussed later, so I just couldn't correct it without making the comment invalid. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 07:22, 23 August 2013 (UTC)
 
:: In my IBM time I learned that American colleagues are less spelling-conscious than we Europeans (or Austrians). It's a matter of emphasis on spelling in school. Did you do quite a web check or a quiet web check --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 05:36, 23 August 2013 (UTC)
 
::: Rosetta Code isn't the place to publish such observations. &nbsp; I know a snub when I hear (or read) one. &nbsp; Best to just remove such comments, even from a discussion page. &nbsp; Even if it were true, it's still not an unbiased opinion or possibly not even a valid observation (too limited and narrow), and it might appear that it could be based on a limited sampling group (and by one person at that). &nbsp; Not everybody has a spell-checker available. &nbsp; Without spell-checkers, typos are more common. &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 03:24, 17 August 2018 (UTC)
::: Are you sure about the &nbsp; '''lastpos''' &nbsp; BIF not being available in (your) host's version of REXX? &nbsp; It's been around in REXX at least since 1984 (according to a VM System Product Interpreter Reference Summary), long before it was ported to MVS (or whatever it's being called now). &nbsp; Which host (and release) are you using? &nbsp; I didn't post any of REXX version 2 programs since you signed your name to it, and I didn't want to publish various versions of it, as it would appear that you were the author, and it didn't seem worth all the bother to include disclaimers and whatnot, and I had so many versions. &nbsp; I was just fooling around and was squeezing blood from a turnip trying to get more performance out of the program. &nbsp; I probably could get more performance out of it, but I got tired shoveling all that coal, and I had to add more code to handle a special case of long words. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 07:28, 22 August 2013 (UTC)
 
:::: I'd suggest to leave the header intact and add change lines such as * yyyyddmm GS this and that. But I really don't care. I put my names into my programs because I like to be known. Your programs are easily recognizable by @ and $ :-) AND your unique indentation rules! --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 05:36, 23 August 2013 (UTC)
 
::::: I guess some people like to be known. &nbsp; However, Rosetta Code has a policy against vanity badges; they are strongly discouraged, and most have been removed. &nbsp; &nbsp; People can look at the ''history'' file and see who entered the computer program and/or made the changes. &nbsp; I have learned later (after I did the tuning and timings) that timings are also discouraged, especially between languages. &nbsp; This whole discussion on the REXX timings should probably be deleted. &nbsp;
 
 
<br>Here is the latest revision &nbsp; (with not much commenting, but better than nothing):
<lang rexx>/*rexx*/     parse arg ifid w                   /*get required options from CL */
/*{timer}*/  parse arg ifid w times .           /*a good try is 10k ──► 100k.  */
/*{timer}*/  if times==''  then times=1         /*use a default if omitted.    */
s=''
     do  while lines(ifid)\==0
     s=s linein(ifid)
     end   /*DO while*/
s=space(s)                                      /*remove superfluous blanks.   */
say 'length of input string:' length(s)         /*display the length of input. */
say
call time 'Reset'                               /*reset the REXX elapsed timer.*/
/*{timer}*/  do jj=1  for times                 /*the repetitions thingy.      */
x=s' '                                          /*var  X  is destroyed (below).*/
 
     do  while x\==''                           /*1 chunk at a time.*/
     i=lastpos(' ',x,w+1)                       /*look for blank <W.*/
     if i==0  then do                           /*...no blank found.*/
                   call o left(x,w)
                   parse var x +(w) x           /*drop the W chars just shown; =(w) would keep char W.*/
                   end
              else do                           /*... a blank found.*/
                   call o left(x,i)
                   parse var x =(i) +1 x
                   end
     end   /*DO while*/
/*{timer}*/  end   /*jj*/
say
say format(time('Elapsed'),,2) "seconds for" times 'times.'
call lineout ifid
exit
 
/*{timer}*/  o: if jj==times  then say arg(1);  return    /*show last text*/
o: say arg(1);  return</lang>
Here is the input file:
<pre>
────────── Computer programming laws ──────────
The Primal Scenario -or- Basic Datum of Experience:
∙ Systems in general work poorly or not at all.
∙ Nothing complicated works.
∙ Complicated systems seldom exceed 5% efficiency.
∙ There is always a fly in the ointment.
</pre>
-- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 07:28, 22 August 2013 (UTC)
 
: lastpos: no I'm not sure and I have alas no longer a host (pun intended). I wonder where I missed it. Thanks for massaging my program. I shall study it later and test my 1MB file. Your input, by the way, is not exactly a "paragraph", is it? --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 07:44, 22 August 2013 (UTC)
 
:: As mentioned earlier, it was bigger than a paragraph; I hated to cut it down (as the file above). &nbsp; I was using a 100x200 character wide console window and I needed something with some heft to it. &nbsp; Plus, with almost all of us (readers of Rosetta Code) being computer programmers of one sort or another, I thought a by-product would be some people perusing the text and reflecting on the wisdom of the laws ... if not only in a Murphy's Law sort of way. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 07:58, 22 August 2013 (UTC)
 
My results from testing your program:
<pre>
with i=lastpos(' ',x,w+1) /*look for blank <W.*/
rexx gs text.txt 72 1000000 -> 7.09 seconds for 1000000 times.
 
with Do i=w+1 to 1 by -1
If substr(x,i,1)=' ' Then Leave
End
rexx gs2 text.txt 72 1000000 -> 10.88 seconds for 1000000 times.
</pre>
 
: I got a 45% improvement (using Regina REXX), you got a 35% improvement (using ooRexx) --- Are my assumptions correct? &nbsp; How many engines does your laptop have? &nbsp; How much memory? &nbsp; What other processes are running? &nbsp; When I run benchmarks, the computer is running pretty much naked (as possible). &nbsp; No matter what the improvement (35% or 45%), that's nothing to sneeze at. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:29, 22 August 2013 (UTC)
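
The two percentages can be reproduced from the posted figures. This is only a sketch, and it assumes that the 45% refers to the step from 2.29 seconds to 1.27 seconds in the timings list above (that pairing isn't stated anywhere), and that the 35% refers to 10.88 seconds versus 7.09 seconds. The function name '''pct''' is made up for the example.

<lang rexx>/*REXX sketch: percent reduction in elapsed time between two timings.   */
say pct(10.88, 7.09)'%'                  /*loop search versus lastpos.   */
say pct( 2.29, 1.27)'%'                  /*assumed pairing (see above).  */
exit

pct: procedure;  parse arg old, new      /*hypothetical helper function. */
return format((old-new)/old*100, , 1)    /*round to one decimal place.   */</lang>

This displays 34.8% and 44.5%, which round to the 35% and 45% quoted.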
 
:: Nobody sneezes. Can't answer your questions. Will use lastpos from now on. thanks. --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 05:36, 23 August 2013 (UTC)
 
Unfortunately I cannot verify a similar performance difference with my 1MB file. --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 19:58, 22 August 2013 (UTC)
 
With o: Return (to avoid output to screen)
<pre>
rexx gs long.txt 72 -> 2.31 seconds for 1 times
rexx gs2 long.txt 72 -> 2.36 seconds for 1 times
</pre>
--[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 20:11, 22 August 2013 (UTC)
 
: With a one megabyte file, you may be measuring the effects of paging in your laptop (as for elapsed time) as well as competition/interference with other processes. &nbsp; That was one reason why I used a multiplier for the '''do''' loop instead of increasing the amount of text read. &nbsp; The drawback is that (the multiplier) increases the locality of reference, and I don't know enough about the Microsoft Windows paging sub-system to know how much of an effect that is. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 20:36, 22 August 2013 (UTC)
 
:: Let's leave it at that. I shall be using lastpos in the future. thanks. Nevertheless version 2 seems to be undoubtedly better than !?! --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 05:40, 23 August 2013 (UTC)
 
::: I wouldn't agree that your version is &nbsp; ''undoubtedly'' &nbsp; better. &nbsp; I do have a few doubts. &nbsp; Version one doesn't erase existing files, it also has more options (the ''kind'' of text justifications, giving the user a choice), it has a lot more documentation (comments) to explain what is happening and why, has error checking and error messages to handle bad command line options, checks for file-not-found and file-is-empty conditions, etc. &nbsp; I assume you must be using a different or unknown metric(s) for ''undoubtedly better''. &nbsp; Rosetta Code is not the place to crow about one's version being better than another, '''unless''' you wrote both versions and you're pointing out the value &nbsp; (however one judges ''value'') &nbsp; of one program entry versus another. &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 03:24, 17 August 2018 (UTC)