Talk:Sparkline in unicode: Difference between revisions

 
* <code>0 999 4000 4999 7000 7999</code> detects the half-width bug and some smaller errors (see Tcl). Output should have three heights; the half-width bug looks like: ▁▂▅▅▇█
 
: '''Addendum:''' ''the second test case assumes that each of the 8 heights should represent 1/8<sup>th</sup> of the range, as closely as possible. Not everyone agrees. See [[#Counterpoint]] and [[#Deeper_root_of_the_.27bug.27_.3F|Deeper root of the bug?]] below for discussion.''
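A minimal sketch of what the second test case distinguishes, assuming the half-width bug comes from rounding onto 7 intervals (that formula is an assumption; it merely reproduces the output quoted above). The names <code>sparkline_floor</code> and <code>sparkline_round</code> are hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline_floor(values):
    """Eight equal-width bins: floor onto 8 intervals, top edge clamped."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for constant input
    return "".join(BARS[min(7, int(8 * (v - lo) / span))] for v in values)

def sparkline_round(values):
    """Assumed buggy variant: rounding onto 7 intervals, so the first
    and last bins end up half as wide as the others."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return "".join(BARS[round(7 * (v - lo) / span)] for v in values)

data = [0, 999, 4000, 4999, 7000, 7999]
print(sparkline_floor(data))  # ▁▁▅▅██ (three heights)
print(sparkline_round(data))  # ▁▂▅▅▇█ (the pattern quoted above)
```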
 
 
 
;sparktest.pl
* Elixir: ▁▂▅▅▇█
* Groovy: one-wide; didn't run
* Haskell: looks like half-width bug; didn't run (both versions normalized)
* Java: one-wide; didn't run
* Javascript: ▁▂▅▅▇█ (normalized)
* jq: one-wide and neglects to check bounds: ▁▃▷►
* Nim: Python translation
* fixed! <s>Perl 6</s>: ▁▁▇█
* PicoLisp: ▁▂▅▅▇█
* Python: (both versions normalized)
* (half fixed) Python: ▁▁▇█
* Ruby: ▁▁▇█
* Rust: thread 'main' panicked at 'attempt to subtract with overflow', sl.rust:8:40
 
::Thanks Oopsiedaisy. I started the task off with an initial buggy Python solution. Now fixed and with examples extended to show your problem cases. Thanks again. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:35, 24 February 2019 (UTC)
 
 
====Counterpoint====
:: A very helpful intervention and discussion, and I agree absolutely about the first test example.
::
:: Perhaps our interpretation of the '''second''' test example depends on some unclarified assumptions about the optimal width (and alignment) of the bins ?
:: The Haskell '''Statistics.Sample.Histogram''' library, for example, returns the following allocation of the sample <code>0 999 4000 4999 7000 7999</code> to 8 evenly sized bins:
:: <code>[1,1,0,0,2,0,1,1]</code>
:: which would, I think, correspond to 5 different sparkline heights, unless I am confusing myself.
:: The set of lower bounds suggested by '''Statistics.Sample.Histogram''' for a division of this sample between 8 bins is:
:: <code>[-571.3571428571429,571.3571428571429,1714.0714285714287,2856.7857142857147,3999.5,5142.214285714286,6284.928571428572,7427.642857142857]</code>
:: The assumption they are making is that any given sample is likely to be drawn from a slightly larger range of possible sample values, and that some margin can usefully be allowed.
:: The margin which that library adopts is <code>margin = (hi - lo) / (fromIntegral (intBins - 1) * 2)</code>
:: (yielding fractionally larger bins and a total range that starts a little below the minimum observed value, and ends a little above the maximum observed value)
:: Arguably reasonable for us to do something comparable ? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:26, 26 February 2019 (UTC)
::: PS the dependence of edge cases on mutable assumptions (e.g. the relationship between the range of the sample and the range of possible/graphed values) may be underscored by the result given by the '''Mathematica 11 Histogram function''', which (if we specify only a target number of bins) allocates the same sample as follows (different pattern again, but still, I think, 5 sparkline levels):
:::: <code>Histogram[{0, 999, 4000, 4999, 7000, 7999}, {"Raw", 8}] --> </code>
:::: [2, 0, 0, 1, 1, 0, 1, 1]
::::
:::: And similarly the '''R language hist() function''' expression <code>hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: Returns the distribution [2, 0, 0, 1, 1, 0, 1, 1], again using 5 (rather than 3) of 8 available bins.
:::: The breaks which it derives from that data set can be listed:
:::: <code> > histinfo<-hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: <code> > histinfo</code>
:::: <code>$breaks</code>
:::: <code>[1] 0 1000 2000 3000 4000 5000 6000 7000 8000</code>
::::[[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:33, 26 February 2019 (UTC)
 
::::: "fractionally larger bins" is the Tcl approach I discussed in the section above. It's fine but requires careful selection of the denominator. Too big, and the bins are wider than they need to be (Tcl's mistake); too small, and it can be erased by fp errors.
 
::::: edit: the relationship between the value of <code>breaks</code> and the number of bins in R is completely opaque and does not match the documentation. For example, <code>hist(0:9, breaks=x)</code> gives 2 bins for x=3; 5 bins for x=4,5,6; 9 bins for x=7.
 
::::: edit2: I should clarify that Haskell's solution exhibits the half-width bug. I don't believe this is defensible. Much better choices of denominator are available. --Oopsiedaisy, 26 February 2019
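The margin scheme quoted above can be checked with a short sketch. It applies the quoted margin formula in Python rather than calling the Haskell library; the function name <code>margin_bins</code> is hypothetical, and the sketch assumes at least 2 bins and a sample that is not constant.

```python
import math

def margin_bins(values, n_bins=8):
    """Widen the observed range by a margin on each side, then cut it
    into n_bins equal bins. Returns (lower_bounds, counts)."""
    lo, hi = min(values), max(values)
    margin = (hi - lo) / ((n_bins - 1) * 2)  # the margin formula quoted above
    start = lo - margin                      # range starts a little below the minimum
    width = (hi - lo + 2 * margin) / n_bins  # fractionally larger bins
    lower_bounds = [start + i * width for i in range(n_bins)]
    counts = [0] * n_bins
    for v in values:
        # Clamp defensively in case rounding pushes a value past the last bin.
        counts[min(n_bins - 1, math.floor((v - start) / width))] += 1
    return lower_bounds, counts

bounds, counts = margin_bins([0, 999, 4000, 4999, 7000, 7999])
print(counts)  # [1, 1, 0, 0, 2, 0, 1, 1]
```

Five of the eight bins are occupied, which is where the 5 sparkline heights come from.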
 
==Deeper root of the 'bug' ?==
> ''derivation of <code>fencepost_size</code>, the task description leaves this entirely open''
 
The task description is not a contract or rigorous specification. Properties like usability, fidelity, proportionality are implied. You're free to disagree but you won't convince me or most other people.
The sole purpose of <code>fencepost_size</code> is to prevent the formula from returning 8. In IEEE754 64-bit floats (used by JavaScript, Perl, and many others) it can be about 15 orders of magnitude smaller than <code>max-min</code> before it fails. In 32-bit floats, about 7 orders of magnitude. Using larger values provides no benefits and (eventual) visible drawbacks, therefore larger values should be avoided.

For comparison, the half-width bug is the rough equivalent of <code>fencepost_size==(max-min)/8</code>. It's '''14 orders of magnitude''' larger than needed, and it causes visible deformation of the graph.
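A rough sketch of the effect of different <code>fencepost_size</code> values. The formula itself is not quoted in this part of the discussion, so the one below (widening the range by <code>fencepost_size</code> before flooring) is an assumption, as is the name <code>bin_index</code>. The relative size 1e-9 is only an illustrative small value, not the limit discussed above.

```python
def bin_index(x, lo, hi, fencepost_size, n_bins=8):
    """Hypothetical formula: widen the range by fencepost_size, then floor."""
    return int(n_bins * (x - lo) / (hi - lo + fencepost_size))

data = [0, 999, 4000, 4999, 7000, 7999]
lo, hi = min(data), max(data)

no_fence = [bin_index(x, lo, hi, 0.0) for x in data]
tiny = [bin_index(x, lo, hi, (hi - lo) * 1e-9) for x in data]
wide = [bin_index(x, lo, hi, (hi - lo) / 8) for x in data]

print(no_fence)  # [0, 0, 4, 4, 7, 8]  the maximum lands on index 8
print(tiny)      # [0, 0, 4, 4, 7, 7]  three heights, maximum stays in range
print(wide)      # [0, 0, 3, 4, 6, 7]  bins visibly stretched
```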
 
> ''as the over-lexicalising tone of your '''"absolutely definitely not [the half-width bug]"''' XYZ... inadvertently confirms :-)''
 
::::Ha - I noticed it was quietly removed in the last bug fix, so I assume it was unnecessary/meaningless. --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 05:46, 26 February 2019 (UTC)
 
==A Point Is That Which Has No Part==
Due to fence posts it is claimed above that it is impossible to distribute xMax and xMin into uniform buckets. I don't consider a day worth living if the impossible has not been achieved before breakfast. The title above is widely accepted as the first line of The Elements. I would also draw attention to the Platonic view that a point is a monad with position attached, because it is often coded thus, and to the Aristotelian view that a point is not the thing but produces the thing by motion, because this rules out some of the interpretations above. Sticking with The Elements, the second line is 'A line is a breadthless length'. I define the line as the length xMax-xMin. I shall use nB as the number of buckets (8 in this task's case). I want to distribute these buckets equally along the line, so I find the width of each bucket using division as dX=(xMax-xMin)/nB. I now need a function which allocates each of the numbers in the test sequence to a bucket. I number the buckets placed along the line from 0 to nB-1 and assign each number x to bucket n=floor((x-xMin)/nB). For the first test xMin(=1)->0/8 (=0). xMax(=8)->56/8 (=7). For the second test xMin(=0.5)->0/3 (=0) and xMax(=6.5)->24/3 (=8). So the impossible, devising a scheme that includes xMin and xMax, is achieved and today is a day worth living (assuming a late breakfast or a different time zone).--[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 12:19, 1 March 2019 (UTC)
<br>
For those who want to look at the logic more closely, the proof of the above comes (better if I could draw a picture) by finding the center (say C or 0) of the line and distributing the buckets either side of C. The above comes by finding the center of which bucket is closest to x from C. If we adjust the scale so that C is 0 then xMin must be negative and xMax must be positive. Some of the confusion above can now be seen to be trying to answer the question: is 0 positive or negative? This question has been answered: it is neither positive nor negative, nor either odd or even. What does the suggested algorithm do? C(=4.5)->28/7 (=4) so positive, which is usually a good answer.--[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 12:36, 1 March 2019 (UTC)
: Does the point turn on whether we are filling classical or quantum buckets ? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:03, 1 March 2019 (UTC)
:: Not for this task. Classical is fine. The Aristotelian view is sufficient to explain now (as a point whose motion creates time), so I can say "now if I want to explain the line at the edge of a shadow ..."--[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 13:33, 1 March 2019 (UTC)
::: Thank you for the solution and explanation. (And that's a relief – I was worried that we couldn't machine a fence-width finer than the Planck length, and that a special case at one end or other of the scale was going to be inescapable) [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:49, 1 March 2019 (UTC)
 
==Intervals and binning==
This uses the [[wp:Interval_(mathematics)#Notations_for_intervals|notation]] where a round parenthesis, <code>( or )</code>, is used to '''exclude''' an endpoint and a square bracket, <code>[ or ]</code>, is used to '''include''' it.
 
We can start with a range of numbers, <code>[min, max]</code>; all the numbers fall in this interval.
 
We split it into several contiguous bins, and for Python at least, there is the tradition of including the minimum of ranges and excluding the maximum. This naively leads to
:<code>[min<sub>i</sub>, max<sub>i</sub>)</code> bin intervals for the <code>i<sup>th</sup></code> bin<br>
:Where <code>max<sub>i</sub> == min<sub>i + 1</sub></code>.<br>
 
Numbers falling on any ''interior'' boundary will automatically be counted in the ''higher'' bin, but what happens to the highest number? It is '''excluded'''.
 
To fix this you could:
# Make the upper range of the last bin inclusive.
# Or add an extra bin at the high end for this one, maximal, value.
<br>
--[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 08:26, 12 March 2019 (UTC)
:Where is this extra bin coming from? Surely there are 8 bins. Only 8 bins. Not 7 and never 9. Oh, and no fence-posts--[[User:Nigel Galloway|Nigel Galloway]] ([[User talk:Nigel Galloway|talk]]) 14:34, 14 March 2019 (UTC)
 
::I wrote to highlight the boundary issues. I wouldn't choose the second option. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 18:30, 14 March 2019 (UTC)
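The boundary issue in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, the data <code>1..8</code> is assumed, and both functions assume <code>lo <= x <= hi</code> with <code>hi > lo</code>.

```python
def half_open_bin(x, lo, hi, n_bins=8):
    """Bins [min_i, max_i): returns None when x falls in no bin."""
    i = int(n_bins * (x - lo) / (hi - lo))
    return i if 0 <= i < n_bins else None

def closed_last_bin(x, lo, hi, n_bins=8):
    """Same bins, but the last one is [min_i, max_i] (the first fix above)."""
    return min(n_bins - 1, int(n_bins * (x - lo) / (hi - lo)))

data = [1, 2, 3, 4, 5, 6, 7, 8]
lo, hi = min(data), max(data)

naive = [half_open_bin(x, lo, hi) for x in data]
fixed = [closed_last_bin(x, lo, hi) for x in data]

print(naive)  # [0, 1, 2, 3, 4, 5, 6, None]  the maximum is excluded
print(fixed)  # [0, 1, 2, 3, 4, 5, 6, 7]     still exactly 8 bins
```

Making the last bin inclusive keeps the count at 8 bins, so no extra bin is needed.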