File size distribution: Difference between revisions

(New post.)
 
(2 intermediate revisions by 2 users not shown)
Line 1,010:
10000000 4</syntaxhighlight>
 
=={{header|Java}}==
<syntaxhighlight lang="java">
 
import java.io.IOException;
Line 1,061:
10^6 to 10^7 98 20.7%
10^7 to 10^8 9 1.9%
</pre>
 
=={{header|jq}}==
'''Works with jq, the C implementation of jq'''
 
'''Works with gojq, the Go implementation of jq'''
 
'''Works with jaq, the Rust implementation of jq'''
 
This entry illustrates how jq plays nicely with other command-line
tools; in this case jc (https://kellyjonbrazil.github.io/jc) is used to JSONify the output of `ls -Rl`.
 
(jq could also parse the raw output of `ls` directly, but making that
portable across platforms would likely be tricky.)
 
The invocation of jc and jq would be along the following lines:
<pre>
jc --ls -lR | jq -c -f file-size-distribution.jq
</pre>
 
In the present case, the output from the call to `histogram` is a stream of [category, count] pairs
beginning with [0, _], which gives the number of files of size 0; thereafter, the boundaries
of the categories are defined logarithmically, i.e. a file of size $n is assigned to
the category `1 + ($n | log10 | trunc)`, so that category k covers sizes from 10^(k-1) up to 10^k - 1.
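
As a cross-check of the bucketing rule only, here is a minimal Python sketch. It is not part of the jq entry; the names `category` and `histogram` and the sample sizes are chosen purely for illustration.

```python
import math
from collections import Counter

def category(size):
    """Return 0 for an empty file, otherwise 1 + trunc(log10(size))."""
    if size == 0:
        return 0
    return 1 + math.trunc(math.log10(size))

def histogram(sizes):
    """Return (category, count) pairs sorted by category."""
    counts = Counter(category(s) for s in sizes)
    return sorted(counts.items())

# Illustrative sizes in bytes (made up, not taken from a real directory).
for pair in histogram([0, 5, 42, 4096, 5000]):
    print(pair)
```

For non-empty files the category is simply the number of decimal digits in the size, which is why sizes 1 to 9 fall in category 1 and sizes 10 to 99 in category 2.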
 
The output shown below for an actual directory tree suggests a
unimodal distribution of file sizes.
 
<syntaxhighlight lang="jq">
# bag of words
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# `stream` is expected to be a stream of non-negative numbers or numeric strings.
# The output is a stream of [bucket, count] pairs, sorted by the value of `bucket`.
# No sorting except for the sorting of these bucket boundaries takes place.
def histogram(stream):
  bow(stream)
  | to_entries
  | map( [(.key | tonumber), .value] )
  | sort_by(.[0])
  | .[];

histogram(.[] | .size | if . == 0 then 0 else 1 + (log10 | trunc) end)
</syntaxhighlight>
{{output}}
<pre>
[0,9]
[1,67]
[2,616]
[3,6239]
[4,3679]
[5,213]
[6,56]
[7,40]
[8,20]
[9,4]
[10,1]
</pre>
 
Line 2,021 ⟶ 2,079:
{{libheader|Wren-math}}
{{libheader|Wren-fmt}}
<syntaxhighlight lang="wren">import "io" for Directory, File, Stat
import "os" for Process
import "./math" for Math
import "./fmt" for Fmt
 
var sizes = List.filled(12, 0)