Cumulative standard deviation: Difference between revisions

→‎{{header|R}}: 5 solutions, all wrong or hilariously slow: repeating the sum for each subvector leads to a O(n^2) algo
(Emacs Lisp: Improve solutions)
(→‎{{header|R}}: 5 solutions, all wrong or hilariously slow: repeating the sum for each subvector leads to a O(n^2) algo)
Line 3,569:
 
=={{header|R}}==
===Built-in Std Dev fn===
<lang rsplus>#The built-in standard deviation function applies the Bessel correction. To reverse this, we can apply an uncorrection.
#If na.rm is true, missing data points (NA values) are removed.
reverseBesselCorrection <- function(x, na.rm=FALSE)
{
if(na.rm) x <- x[!is.na(x)]
len <- length(x)
if(len < 2) stop("2 or more data points required")
sqrt((len-1)/len)
}
testdata <- c(2,4,4,4,5,5,7,9)
reverseBesselCorrection(testdata)*sd(testdata) #2</lang>
===From scratch===
<lang rsplus>#Again, if na.rm is true, missing data points (NA values) are removed.
uncorrectedsd <- function(x, na.rm=FALSE)
{
len <- length(x)
if(len < 2) stop("2 or more data points required")
mu <- mean(x, na.rm=na.rm)
ssq <- sum((x - mu)^2, na.rm=na.rm)
usd <- sqrt(ssq/len)
usd
}
uncorrectedsd(testdata) #2</lang>
==="Running" SD===
If we desire a solution that gives every "running" standard deviation for each input, rather than only giving one number as our final output, we can do the following. To make this differ from previous solutions, we will not have our code make any mention of missing values, and we will show off R's Reduce and sapply.
<lang rsplus>#Once again, we have to make a standard deviation function from scratch.
biasedSd <- function(data) sqrt(mean((data - mean(data))^2))
cumSd <- function(data) sapply(Reduce(c, data, accumulate = TRUE), biasedSd)</lang>
{{out}}
<pre>> cumSd(c(2, 4, 4, 4, 5, 5, 7, 9))
[1] 0.0000000 1.0000000 0.9428090 0.8660254 0.9797959 1.0000000 1.3997084 2.0000000</pre>
 
To compute the running sum, one must keep track of the number of items, the sum of values, and the sum of squares.
===Stateful SD===
 
If we want a function that remembers and uses the previous inputs, letting us be very strict about the "one at a time" requirement, then we can lift biasedSd from the previous solution and make good use of the distinction between R's <- and <<- methods of assignment.
If the goal is to get a vector of running standard deviations, the simplest is to do it with cumsum:
<lang rsplus>cumSDStateful <- function()
 
{
<lang datar>cumsd <- numericfunction(0x) {
lenn <- lengthseq_along(x)
function(oneNumber)
sqrt(cumsum(x^2) / n - (cumsum(x) / n)^2)
{
}</lang>
data <<- c(data, oneNumber)
 
biasedSd(data)
set.seed(12345L)
x <- rnorm(10)
 
cumsd(x)
# [1] 0.0000000 0.3380816 0.8752973 1.1783628 1.2345538 1.3757142 1.2867220 1.2229056 1.1665168 1.1096814
 
# Compare to the naive implementation, i.e. compute sd on each sublist:
Vectorize(function(k) sd(x[1:k]) * sqrt((k - 1) / k))(seq_along(x))
# [1] NA 0.3380816 0.8752973 1.1783628 1.2345538 1.3757142 1.2867220 1.2229056 1.1665168 1.1096814
# Note that the first is NA because sd is unbiased formula, hence there is a division by n-1, which is 0 for n=1.</lang>
 
The task requires an accumulator solution:
 
<lang rsplusr>cumSDStatefulaccumsd <- function() {
n <- 0
m <- 0
s <- 0
{
function(oneNumberx) {
n <<- n + 1
m <<- m + x
s <<- s + x * x
sqrt((lens / n -1) (m /len n)^2)
}
}
state <- cumSDStateful()</lang>
{{out}}
<pre>> state(2);state(4);state(4);state(4);state(5);state(5);state(7);state(9)
[1] 0
[1] 1
[1] 0.942809
[1] 0.8660254
[1] 0.9797959
[1] 1
[1] 1.399708
[1] 2</pre>
 
f <- accumsd()
===Environment solution===
sapply(x, f)
R gives us some control over what environment expressions are evaluated in. This lets us shorten the previous solution and get identical output.
# [1] 0.0000000 0.3380816 0.8752973 1.1783628 1.2345538 1.3757142 1.2867220 1.2229056 1.1665168 1.1096814</lang>
<lang rsplus>localCumSD <- local({
data <- numeric(0)
function(oneNumber)
{
data <<- c(data, oneNumber)
biasedSd(data)#Again, lifted from the ""Running" SD" solution.
}
})
localCumSD(2);localCumSD(4);localCumSD(4);localCumSD(4);localCumSD(5);localCumSD(5);localCumSD(7);localCumSD(9)</lang>
 
=={{header|Racket}}==
1,336

edits