Talk:Sparkline in unicode: Difference between revisions

→‎Most of these are buggy: this section is intended to concisely explain the bug and its detection concisely. Moved discussion about whether it's a bug at all to bottom of section; added link from the test case.
(→‎Most of these are buggy: this section is intended to concisely explain the bug and its detection concisely. Moved discussion about whether it's a bug at all to bottom of section; added link from the test case.)
Line 19:
* <code>0 999 4000 4999 7000 7999</code> detects the half-width bug and some smaller errors (see Tcl). Output should have three heights; the half-width bug looks like: ▁▂▅▅▇█
 
: '''Addendum:''' ''the second test case assumes that each of the 8 heights should represent 1/8<sup>th</sup> of the range, as closely as possible. Not everyone agrees. See [[#counterpoint]] and [[#Deeper_root_of_the_.27bug.27_.3F|Deeper root of the bug?]] below for discussion.''
:: A very helpful intervention and discussion, and I agree absolutely about the first test example.
::
:: Perhaps our interpretation of the '''second''' test example depends on some unclarified assumptions about the optimal width (and alignment) of the bins ?
:: The Haskell '''Statistics.Sample.Histogram''' library, for example, returns the following allocation of the sample <code>0 999 4000 4999 7000 7999</code> to 8 evenly sized bins:
:: <code>[1,1,0,0,2,0,1,1]</code>
:: which would, I think, correspond to 5 different sparkline heights, unless I am confusing myself.
:: The set of lower bounds suggested by '''Statistics.Sample.Histogram''' for a division of this sample between 8 bins is:
:: <code>[-571.3571428571429,571.3571428571429,1714.0714285714287,2856.7857142857147,3999.5,5142.214285714286,6284.928571428572,7427.642857142857]</code>
:: The assumption they are making is that any given sample is likely to be drawn from a slightly larger range of possible sample values, and that some margin can usefully be allowed.
:: The margin which that library adopts is <code>margin = (hi - lo) / (fromIntegral (intBins - 1) * 2))</code>
:: (yielding fractionally larger bins and a total range that starts a little below the minimum observed value, and ends a little above the maximum observed value)
:: Arguably reasonable for us to do something comparable ? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:26, 26 February 2019 (UTC)
::: PS the dependence of edge cases on mutable assumptions (e.g. the relationship between the range of the sample and the range of possible/graphed values) may be underscored by the result given by the '''Mathematica 11 Histogram function''', which (if we specify only a target number of bins) allocates the same sample as follows (different pattern again, but still, I think, 5 sparkline levels):
:::: <code>Histogram[{0, 999, 4000, 4999, 7000, 7999}, {"Raw", 8}] --> </code>
:::: [2, 0, 0, 1, 1, 0, 1, 1]
::::
:::: And similarly the '''R language hist() function''' expression <code>hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: Returns a distribution of 5 [2, 0, 0, 1, 1, 0, 1, 1], again using 5 (rather than 3) of 8 available bins.
:::: The breaks which it derives from that data set can be listed:
:::: <code> > histinfo<-hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: <code> > histinfo</code>
:::: <code>$breaks</code>
:::: <code>[1] 0 1000 2000 3000 4000 5000 6000 7000 8000</code>
::::[[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:33, 26 February 2019 (UTC)
 
::::: "fractionally larger bins" is the Tcl approach I discussed in the section above. It's fine but requires careful selection of the denominator. Too big, and the bins are wider than they need to be (Tcl's mistake); too small, and it can be erased by fp errors.
 
::::: edit: the relationship between the value of <code>breaks</code> and the number of bins in R is completely opaque and does not match the documentation. For example, <code>hist(0:9, breaks=x)</code> gives 2 bins for x=3; 5 bins for x=4,5,6; 9 bins for x=7.
 
::::: edit2: I should clarify that Haskell's solution exhibits the half-width bug. I don't believe this is defensible. Much better choices of denominator are available. --Oopsiedaisy, 26 February 2019
 
 
 
;sparktest.pl
Line 100 ⟶ 69:
 
::Thanks Oopsiedaisy. I started the task off with an initial buggy Python solution. Now fixed and with examples extended to show your problem cases. Thanks again. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:35, 24 February 2019 (UTC)
 
 
====Counterpoint====
:: A very helpful intervention and discussion, and I agree absolutely about the first test example.
::
:: Perhaps our interpretation of the '''second''' test example depends on some unclarified assumptions about the optimal width (and alignment) of the bins ?
:: The Haskell '''Statistics.Sample.Histogram''' library, for example, returns the following allocation of the sample <code>0 999 4000 4999 7000 7999</code> to 8 evenly sized bins:
:: <code>[1,1,0,0,2,0,1,1]</code>
:: which would, I think, correspond to 5 different sparkline heights, unless I am confusing myself.
:: The set of lower bounds suggested by '''Statistics.Sample.Histogram''' for a division of this sample between 8 bins is:
:: <code>[-571.3571428571429,571.3571428571429,1714.0714285714287,2856.7857142857147,3999.5,5142.214285714286,6284.928571428572,7427.642857142857]</code>
:: The assumption they are making is that any given sample is likely to be drawn from a slightly larger range of possible sample values, and that some margin can usefully be allowed.
:: The margin which that library adopts is <code>margin = (hi - lo) / (fromIntegral (intBins - 1) * 2))</code>
:: (yielding fractionally larger bins and a total range that starts a little below the minimum observed value, and ends a little above the maximum observed value)
:: Arguably reasonable for us to do something comparable ? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:26, 26 February 2019 (UTC)
::: PS the dependence of edge cases on mutable assumptions (e.g. the relationship between the range of the sample and the range of possible/graphed values) may be underscored by the result given by the '''Mathematica 11 Histogram function''', which (if we specify only a target number of bins) allocates the same sample as follows (different pattern again, but still, I think, 5 sparkline levels):
:::: <code>Histogram[{0, 999, 4000, 4999, 7000, 7999}, {"Raw", 8}] --> </code>
:::: [2, 0, 0, 1, 1, 0, 1, 1]
::::
:::: And similarly the '''R language hist() function''' expression <code>hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: Returns a distribution of 5 [2, 0, 0, 1, 1, 0, 1, 1], again using 5 (rather than 3) of 8 available bins.
:::: The breaks which it derives from that data set can be listed:
:::: <code> > histinfo<-hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: <code> > histinfo</code>
:::: <code>$breaks</code>
:::: <code>[1] 0 1000 2000 3000 4000 5000 6000 7000 8000</code>
::::[[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:33, 26 February 2019 (UTC)
 
::::: "fractionally larger bins" is the Tcl approach I discussed in the section above. It's fine but requires careful selection of the denominator. Too big, and the bins are wider than they need to be (Tcl's mistake); too small, and it can be erased by fp errors.
 
::::: edit: the relationship between the value of <code>breaks</code> and the number of bins in R is completely opaque and does not match the documentation. For example, <code>hist(0:9, breaks=x)</code> gives 2 bins for x=3; 5 bins for x=4,5,6; 9 bins for x=7.
 
::::: edit2: I should clarify that Haskell's solution exhibits the half-width bug. I don't believe this is defensible. Much better choices of denominator are available. --Oopsiedaisy, 26 February 2019
 
==Deeper root of the 'bug' ?==