Talk:Sparkline in unicode: Difference between revisions


Revision as of 05:37, 28 February 2019

304 bytes added, 5 years ago

→Most of these are buggy: this section is intended to explain the bug and its detection concisely. Moved the discussion about whether it is a bug at all to the bottom of the section; added a link from the test case.

Anonymous user

rosettacode>Oopsiedaisy

Compared revisions: 05:11, 28 February 2019 by rosettacode>Oopsiedaisy (minor edit to "Deeper root of the 'bug' ?") and 05:37, 28 February 2019 by rosettacode>Oopsiedaisy (edit to "Most of these are buggy").
Line 19:
* <code>0 999 4000 4999 7000 7999</code> detects the half-width bug and some smaller errors (see Tcl). Output should have three heights; the half-width bug looks like: ▁▂▅▅▇█
: '''Addendum:''' ''the second test case assumes that each of the 8 heights should represent 1/8<sup>th</sup> of the range, as closely as possible. Not everyone agrees. See [[#Counterpoint]] and [[#Deeper_root_of_the_.27bug.27_.3F|Deeper root of the bug?]] below for discussion.''
;sparktest.pl
Line 100 ⟶ 69:
::Thanks Oopsiedaisy. I started the task off with an initial buggy Python solution. Now fixed and with examples extended to show your problem cases. Thanks again. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:35, 24 February 2019 (UTC)
====Counterpoint====
:: A very helpful intervention and discussion, and I agree absolutely about the first test example.
::
:: Perhaps our interpretation of the '''second''' test example depends on some unstated assumptions about the optimal width (and alignment) of the bins?
:: The Haskell '''Statistics.Sample.Histogram''' library, for example, returns the following allocation of the sample <code>0 999 4000 4999 7000 7999</code> to 8 evenly sized bins:
:: <code>[1,1,0,0,2,0,1,1]</code>
:: which would, I think, correspond to 5 different sparkline heights, unless I am confusing myself.
:: The set of lower bounds suggested by '''Statistics.Sample.Histogram''' for a division of this sample between 8 bins is:
:: <code>[-571.3571428571429,571.3571428571429,1714.0714285714287,2856.7857142857147,3999.5,5142.214285714286,6284.928571428572,7427.642857142857]</code>
:: The assumption they are making is that any given sample is likely to be drawn from a slightly larger range of possible sample values, and that some margin can usefully be allowed.
:: The margin which that library adopts is <code>margin = (hi - lo) / (fromIntegral (intBins - 1) * 2)</code>
:: (yielding fractionally larger bins and a total range that starts a little below the minimum observed value, and ends a little above the maximum observed value)
:: Arguably reasonable for us to do something comparable? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:26, 26 February 2019 (UTC)
::: PS the dependence of edge cases on mutable assumptions (e.g. the relationship between the range of the sample and the range of possible/graphed values) may be underscored by the result given by the '''Mathematica 11 Histogram function''', which (if we specify only a target number of bins) allocates the same sample as follows (a different pattern again, but still, I think, 5 sparkline levels):
:::: <code>Histogram[{0, 999, 4000, 4999, 7000, 7999}, {"Raw", 8}] --> </code>
:::: <code>[2, 0, 0, 1, 1, 0, 1, 1]</code>
::::
:::: And similarly the '''R language hist() function''' expression <code>hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: returns the distribution <code>[2, 0, 0, 1, 1, 0, 1, 1]</code>, again using 5 (rather than 3) of the 8 available bins.
:::: The breaks which it derives from that data set can be listed:
:::: <code> > histinfo<-hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: <code> > histinfo</code>
:::: <code>$breaks</code>
:::: <code>[1] 0 1000 2000 3000 4000 5000 6000 7000 8000</code>
::::[[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:33, 26 February 2019 (UTC)
::::: "Fractionally larger bins" is the Tcl approach I discussed in the section above. It's fine but requires careful selection of the denominator. Too big, and the bins are wider than they need to be (Tcl's mistake); too small, and it can be erased by floating-point errors.
::::: edit: the relationship between the value of <code>breaks</code> and the number of bins in R is completely opaque and does not match the documentation. For example, <code>hist(0:9, breaks=x)</code> gives 2 bins for x=3; 5 bins for x=4,5,6; 9 bins for x=7.
::::: edit2: I should clarify that Haskell's solution exhibits the half-width bug. I don't believe this is defensible. Much better choices of denominator are available. --Oopsiedaisy, 26 February 2019
==Deeper root of the 'bug' ?==
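: To make the disagreement over the second test case concrete, here is a minimal Python sketch. It is not taken from any solution on the task page; the function names <code>levels_rounded</code> and <code>levels_even</code> are made up for illustration. The first function scales values to 0..7 and rounds, which is one common way the half-width behaviour arises (the first and last bins end up half as wide as the others). The second splits the observed range into 8 equal bins and clamps only the maximum into the top bin, which is the assumption behind the test case.

```python
BARS = "▁▂▃▄▅▆▇█"


def levels_rounded(values, n=8):
    """Scale to 0..n-1 and round half up: the two end bins get half the width."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # assumption: a constant series maps to the lowest bar
    return [int((v - lo) / span * (n - 1) + 0.5) for v in values]


def levels_even(values, n=8):
    """Split [lo, hi] into n equal bins; only the maximum is clamped into the top bin."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return [min(n - 1, int((v - lo) / span * n)) for v in values]


def sparkline(levels):
    """Map bin indices 0..7 to the eight block characters."""
    return "".join(BARS[i] for i in levels)


data = [0, 999, 4000, 4999, 7000, 7999]
print(sparkline(levels_rounded(data)))  # ▁▂▅▅▇█  (five heights)
print(sparkline(levels_even(data)))     # ▁▁▅▅██  (three heights)
```

: On this sample the rounded version assigns the same bins as the margin-based lower bounds quoted in the Counterpoint section (counts <code>[1,1,0,0,2,0,1,1]</code>), so the two positions differ only in whether the end bins should be centred on the minimum and maximum or start and end there.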