File size distribution
Line 16:
DOS 2.5 reports file sizes as a number of sectors.
{{libheader|Action! Tool Kit}}
<syntaxhighlight lang="action!">
PROC SizeDistribution(CHAR ARRAY filter INT ARRAY limits,counts BYTE count)
Line 116:
SizeDistribution(filter,limits,counts,LIMITCOUNT)
PrintResult(filter,limits,counts,LIMITCOUNT)
RETURN</syntaxhighlight>
{{out}}
[https://gitlab.com/amarok8bit/action-rosetta-code/-/raw/master/images/File_size_distribution.png Screenshot from Atari 8-bit computer]
Line 138:
=={{header|Ada}}==
{{libheader|Dir_Iterators}}
<syntaxhighlight lang="ada">
with Ada.Directories; use Ada.Directories;
with Ada.Strings.Fixed; use Ada.Strings;
Line 196:
New_Line;
end loop;
end File_Size_Distribution;</syntaxhighlight>
{{out}}
<pre>Less than 10**0: 8
Line 210:
The platform-independent way to get a file's size in C is to open each file and read its size. The implementation below is Windows-only: it uses command scripts to gather size information quickly, even for a large number of files spread recursively across many directories. Both textual and graphical (ASCII) outputs are shown. The same could be done on Linux with a combination of the find, ls and stat commands; my plan was to support both operating systems, but I don't currently have access to a Linux system. Supporting Linux would also mean either dropping the scaling of the graphical output to fit the console buffer or porting that code as well, which would require including windows.h conditionally.
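For comparison, the platform-independent idea of opening each file and measuring it can be sketched in Python (an illustration only, not part of the C entry). The helper name `size_by_seeking` is hypothetical; the sketch assumes `path` names a regular file and relies on the fact that, after seeking to the end of a file opened in binary mode, the current position equals the file's length in bytes.

```python
import os


def size_by_seeking(path):
    """Return a regular file's size in bytes by opening it and seeking to its end.

    This mirrors the portable C idiom fopen / fseek(SEEK_END) / ftell.
    It costs one open() call per file.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)  # move to the end of the file
        return f.tell()         # the end offset equals the size in bytes
```

On a regular file the result should agree with `os.path.getsize(path)`, which asks the operating system for the size without opening the file.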
===Windows===
<syntaxhighlight lang="c">
#include<windows.h>
#include<string.h>
Line 284:
}
}
</syntaxhighlight>
Invocation and textual output:
<pre>
Line 350:
{{libheader|POSIX}}
This works on macOS 10.15 and should also work on Linux.
<syntaxhighlight lang="c">
#include <locale.h>
#include <stdint.h>
Line 397:
printf("Total file size: %'lu\n", total_size);
return EXIT_SUCCESS;
}</syntaxhighlight>
{{out}}
Line 417:
=={{header|C++}}==
<syntaxhighlight lang="cpp">
#include <array>
#include <filesystem>
Line 468:
}
return EXIT_SUCCESS;
}</syntaxhighlight>
{{out}}
Line 491:
{{libheader| Winapi.Windows}}
{{Trans|Go}}
<syntaxhighlight lang="delphi">
program File_size_distribution;
Line 598:
fileSizeDistribution('.');
readln;
end.</syntaxhighlight>
=={{header|Factor}}==
{{works with|Factor|0.99 2020-03-02}}
<syntaxhighlight lang="factor">
io.files.types io.pathnames kernel math math.functions
math.statistics namespaces sequences ;
Line 615:
current-directory get file-size-histogram dup
[ "Count of files < 10^%d bytes: %4d\n" printf ] assoc-each
nl values sum "Total files: %d\n" printf</syntaxhighlight>
{{out}}
<pre>
Line 634:
=={{header|Go}}==
{{trans|Kotlin}}
<syntaxhighlight lang="go">
import (
Line 705:
func main() {
fileSizeDistribution("./")
}</syntaxhighlight>
{{out}}
Line 733:
Uses a grouped frequency distribution. Program arguments are optional: the starting directory and the initial group size of the frequency distribution. After the first frequency distribution is computed, any group that exceeds 25% of the total file count is broken down further, when possible.
</p>
<syntaxhighlight lang="haskell">
import Control.Concurrent (forkIO, setNumCapabilities)
Line 913:
mapM_ (displayFrequency fileCount) $ Map.assocs results
where
groupThreshold = round . (*0.25) . realToFrac</syntaxhighlight>
{{out}}
<pre style="height: 50rem;">$ filedist ~/Music
Line 992:
16.00MB <-> 18.67MB = 3 0.436%: ▍
24.00MB <-> 26.66MB = 1 0.145%: ▍
</pre>
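The refinement step described for the Haskell entry (breaking down any group that holds more than 25% of all files) can be sketched in Python. The Haskell code for this step is not shown here, so the splitting rule is an assumption: each oversized group is divided into `parts` equal-width subgroups (4 by default), recursively, until the width would fall below one byte. `refine` and `grouped_distribution` are hypothetical names.

```python
import math
from collections import Counter


def refine(sizes, lo, hi, width, total, threshold=0.25, parts=4):
    """Group sizes lying in [lo, hi) into bins of `width` bytes starting at lo.

    Any bin holding more than threshold * total files is regrouped into bins
    of width // parts, recursively, until the bin width would drop below 1.
    Returns a list of (bin_low, bin_high, count) tuples in ascending order.
    """
    result = []
    bins = Counter((s - lo) // width for s in sizes)
    for k in sorted(bins):
        b_lo = lo + k * width
        b_hi = min(b_lo + width, hi)
        count = bins[k]
        sub_width = width // parts
        if count > threshold * total and sub_width >= 1:
            members = [s for s in sizes if b_lo <= s < b_hi]
            result.extend(refine(members, b_lo, b_hi, sub_width, total, threshold, parts))
        else:
            result.append((b_lo, b_hi, count))
    return result


def grouped_distribution(sizes, width):
    """Top-level call: bins start at 0 and the last bin is not clipped."""
    return refine(sizes, 0, math.inf, width, len(sizes))
```

For example, `grouped_distribution([1, 2, 3, 50, 60, 70, 80, 90, 95, 99, 150, 300], 100)` splits the crowded 0 to 100 byte group, and then its 75 to 100 subgroup, into narrower ranges while leaving the sparse 100 to 200 and 300 to 400 groups alone. The threshold is always measured against the overall file count, not the count of the group being split.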
=={{header|J}}==
We can get the sizes of all files under a given path from the last column of the result of dirtree. For example, the sizes of the files under the user's home directory are <tt>;{:|:dirtree '~'</tt>
From there, we can bucket them by factors of ten, then display the power of ten at the bottom of each bucket along with the number of files it contains (the sizes are sorted first, so that the buckets appear in ascending order):
<syntaxhighlight lang="j"> ((10x^~.),.#/.~) <.10 ^.1>. /:~;{:|:dirtree '~'
1 2
10 8
100 37
1000 49
10000 20
100000 9
1000000 4
10000000 4</syntaxhighlight>
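For readers who don't read J, the bucketing in the one-liner above can be mirrored in Python. This is a sketch that takes a list of sizes rather than walking a directory like `dirtree`; `decade_histogram` is a hypothetical name. Like `1>.` in the J code, it treats a size of 0 as 1, so empty files land in the first bucket.

```python
from collections import Counter


def decade_histogram(sizes):
    """Bucket each size by floor(log10(max(1, size))) and return
    (10 ** bucket, count) pairs sorted by bucket, as the J expression does."""
    # For n >= 1, len(str(n)) - 1 equals floor(log10(n)) and avoids
    # floating-point rounding near exact powers of ten.
    buckets = Counter(len(str(max(1, s))) - 1 for s in sizes)
    return [(10 ** k, buckets[k]) for k in sorted(buckets)]
```

For example, the sizes 0, 5, 10, 99, 100 and 12345 give the rows (1, 2), (10, 2), (100, 1) and (10000, 1); empty buckets are omitted, just as in the J output.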
=={{header|Java}}==
<syntaxhighlight lang="java">
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public final class FileSizeDistribution {

    public static void main(String[] aArgs) throws IOException {
        // List the non-directory entries of the current directory.
        // Files.list keeps a directory handle open, so close it with try-with-resources.
        List<Path> fileNames;
        try ( Stream<Path> entries = Files.list(Path.of(".")) ) {
            fileNames = entries.filter( file -> ! Files.isDirectory(file) )
                .map(Path::getFileName)
                .toList();
        }

        // Bucket each file by the number of decimal digits in its size: a d-digit size
        // lies in the range 10^(d-1) to 10^d (a zero-byte file counts as 1 digit).
        // A TreeMap keeps the buckets in ascending order for printing.
        Map<Integer, Integer> fileSizes = new TreeMap<Integer, Integer>();
        for ( Path path : fileNames ) {
            fileSizes.merge(String.valueOf(Files.size(path)).length(), 1, Integer::sum);
        }
        final int fileCount = fileSizes.values().stream().mapToInt(Integer::intValue).sum();

        System.out.println("File size distribution for directory \".\":" + System.lineSeparator());
        System.out.println("File size in bytes | Number of files | Percentage");
        System.out.println("-------------------------------------------------");
        for ( Map.Entry<Integer, Integer> entry : fileSizes.entrySet() ) {
            final int key = entry.getKey();
            final int value = entry.getValue();
            System.out.println(String.format("%s%d%s%d%15d%15.1f%%",
                " 10^", ( key - 1 ), " to 10^", key, value, ( 100.0 * value ) / fileCount));
        }
    }

}
</syntaxhighlight>
{{out}}
<pre>
File size distribution for directory ".":

File size in bytes | Number of files | Percentage
-------------------------------------------------
 10^0 to 10^1              1            0.2%
 10^1 to 10^2              1            0.2%
 10^2 to 10^3              5            1.1%
 10^3 to 10^4              3            0.6%
 10^4 to 10^5            161           34.0%
 10^5 to 10^6            196           41.4%
 10^6 to 10^7             98           20.7%
 10^7 to 10^8              9            1.9%
</pre>
=={{header|jq}}==
'''Works with jq, the C implementation of jq'''
'''Works with gojq, the Go implementation of jq'''
'''Works with jaq, the Rust implementation of jq'''
This entry illustrates how jq plays nicely with other command-line
tools; in this case jc (https://kellyjonbrazil.github.io/jc) is used to JSONify the output of `ls -lR`.
(jq could also be used to parse the raw output of `ls`, but achieving portability would no doubt
be tricky.)
The invocation of ls, jc and jq would be along the following lines:
<pre>
ls -lR | jc --ls | jq -c -f file-size-distribution.jq
</pre>
In the present case, the output from the call to `histogram` is a stream of [category, count] pairs,
beginning with [0, _] for the number of files of size 0 (if there are any); thereafter, the boundaries
of the categories are defined logarithmically, i.e. a file of size $n is assigned to
the category `1 + ($n | log10 | trunc)`.
The output shown below for an actual directory tree suggests a
unimodal distribution of file sizes.
<syntaxhighlight lang="jq">
# bag of words
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# `stream` is expected to be a stream of non-negative numbers or numeric strings.
# The output is a stream of [bucket, count] pairs, sorted by the value of `bucket`.
# No sorting except for the sorting of these bucket boundaries takes place.
def histogram(stream):
  bow(stream)
  | to_entries
  | map( [(.key | tonumber), .value] )
  | sort_by(.[0])
  | .[];

histogram(.[] | .size | if . == 0 then 0 else 1 + (log10 | trunc) end)
</syntaxhighlight>
{{output}}
<pre>
[0,9]
[1,67]
[2,616]
[3,6239]
[4,3679]
[5,213]
[6,56]
[7,40]
[8,20]
[9,4]
[10,1]
</pre>
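To sanity-check the categorisation rule, here is a small Python sketch that applies it to a JSON array of the kind the jq filter reads. It assumes, as the filter does, that each object has an integer `size` field; `category` and `histogram` are hypothetical names and the sample input in the test is made up. It uses the digit count of the size, which for n ≥ 1 equals `1 + floor(log10(n))` without going through floating point.

```python
import json
from collections import Counter


def category(size):
    """0 for an empty file, else 1 + floor(log10(size)), i.e. the number of decimal digits."""
    return 0 if size == 0 else len(str(size))


def histogram(json_text):
    """Return sorted (category, count) pairs from a JSON array of objects with a "size" field."""
    counts = Counter(category(entry["size"]) for entry in json.loads(json_text))
    return sorted(counts.items())
```

Each pair corresponds to one `[category, count]` line of the jq output, so category 3, for instance, holds files of 100 to 999 bytes.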
Line 997 ⟶ 1,124:
{{works with|Julia|0.6}}
<syntaxhighlight lang="julia">
function sizelist(path::AbstractString)
Line 1,023 ⟶ 1,150:
end
main(".")</syntaxhighlight>
{{out}}
Line 1,041 ⟶ 1,168:
=={{header|Kotlin}}==
<syntaxhighlight lang="kotlin">
import java.io.File
Line 1,088 ⟶ 1,215:
fun main(args: Array<String>) {
fileSizeDistribution("./") // current directory
}</syntaxhighlight>
{{out}}
Line 1,113 ⟶ 1,240:
Number of inaccessible files : 0
</pre>
=={{header|Lang}}==
{{libheader|lang-io-module}}
<syntaxhighlight lang="lang">
# Load the IO module
# Replace "<pathToIO.lm>" with the path where the io.lm Lang module was installed (without the "<" and ">")
ln.loadModule(<pathToIO.lm>)
fp.fileSizeDistribution = (&sizes, $[totalSize], $file) -> {
if([[io]]::fp.isDirectory($file)) {
&fileNames = [[io]]::fp.listFilesAndDirectories($file)
$path = [[io]]::fp.getCanonicalPath($file)
if($path == /) {
$path = \e
}
$fileName
foreach($[fileName], &fileNames) {
$innerFile = [[io]]::fp.openFile($path/$fileName)
$innerTotalSize = 0L
fp.fileSizeDistribution(&sizes, $innerTotalSize, $innerFile)
$*totalSize += $innerTotalSize
[[io]]::fp.closeFile($innerFile)
}
}else {
$len = [[io]]::fp.getSize($file)
if($len == null) {
return
}
$*totalSize += $len
if($len == 0) {
&sizes[0] += 1
}else {
$index = fn.int(fn.log10($len))
&sizes[$index] += 1
}
}
}
$path $= @&LANG_ARGS == 1?&LANG_ARGS[0]:{{{./}}}
&sizes = fn.arrayMake(12)
fn.arraySetAll(&sizes, 0)
$file = [[io]]::fp.openFile($path)
$totalSize = 0L
fp.fileSizeDistribution(&sizes, $totalSize, $file)
[[io]]::fp.closeFile($file)
fn.println(File size distribution for "$path":)
$i
repeat($[i], @&sizes) {
fn.printf(10 ^% 3d bytes: %d%n, $i, parser.op(&sizes[$i]))
}
fn.println(Number of files: fn.arrayReduce(&sizes, 0, fn.add))
fn.println(Total file size: $totalSize)
</syntaxhighlight>
=={{header|Mathematica}} / {{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">
Histogram[FileByteCount /@ Select[FileNames[__], DirectoryQ /* Not], {"Log", 15}, {"Log", "Count"}]</syntaxhighlight>
=={{header|Nim}}==
<syntaxhighlight lang="nim">
const
Line 1,165 ⟶ 1,357:
echo fmt"Size in {rangeString: 14} {count:>7} {100 * count / total:5.2f}%"
echo ""
echo "Total number of files: ", sum(counts)</syntaxhighlight>
{{out}}
Line 1,187 ⟶ 1,379:
=={{header|Perl}}==
{{trans|Raku}}
<syntaxhighlight lang="perl">
use List::Util qw(max);
Line 1,214 ⟶ 1,406:
sub fsize { $fsize{ log10( (lstat($_))[7] ) }++ }
sub log10 { my($s) = @_; $s ? int log($s)/log(10) : 0 }</syntaxhighlight>
{{out}}
<pre>File size distribution in bytes for directory: .
Line 1,228 ⟶ 1,420:
=={{header|Phix}}==
Works on Windows and Linux. Uses "proper" sizes, i.e. 1MB == 1024KB. It can be quite slow at first, but is pretty fast on the second and subsequent runs, that is, once the OS has cached its (low-level) directory reads.
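The size limits used by the Phix entry can be reproduced with a short Python sketch (not a translation of the whole program). Phix multiplies by the floating-point value 10.24; this sketch swaps in the exact integer form `* 256 // 25`, so every third limit is exactly a power of 1024. Like the Phix `while size>sizes[sdx]` loop, a file whose size equals a limit is counted in that limit's bucket. `next_limit` and `bucket_counts` are hypothetical names.

```python
from bisect import bisect_left


def next_limit(limits):
    """Grow the limits as the Phix entry does: x10, x10, then x10.24 (= 256/25),
    giving 1, 10, 100, 1024, 10240, 102400, 1048576, ..."""
    if len(limits) % 3:
        return limits[-1] * 10
    return limits[-1] * 256 // 25  # exact integer form of * 10.24


def bucket_counts(sizes):
    """Count each size into the first limit that is >= the size, adding limits as needed."""
    limits, counts = [1], [0]
    for size in sizes:
        while size > limits[-1]:
            limits.append(next_limit(limits))
            counts.append(0)
        counts[bisect_left(limits, size)] += 1
    return limits, counts
```

Growing the list on demand, as the Phix code does, means no maximum file size has to be chosen in advance.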
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">sizes</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">1</span><span style="color: #0000FF;">},</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">0</span><span style="color: #0000FF;">}</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">t1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()+</span><span style="color: #000000;">1</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">store_res</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">filepath</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #008000;">'d'</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">[</span><span style="color: #004600;">D_ATTRIBUTES</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">then</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">size</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">[</span><span style="color: #004600;">D_SIZE</span><span style="color: #0000FF;">]</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">sdx</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">while</span> <span style="color: #000000;">size</span><span style="color: #0000FF;">></span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sdx</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">sdx</span><span style="color: #0000FF;">=</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">sizes</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">sizes</span><span style="color: #0000FF;">[$]*</span><span style="color: #008080;">iff</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">mod</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">),</span><span style="color: #000000;">3</span><span style="color: #0000FF;">)?</span><span style="color: #000000;">10</span><span style="color: #0000FF;">:</span><span style="color: #000000;">10.24</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">0</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000000;">sdx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
<span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sdx</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">if</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()></span><span style="color: #000000;">t1</span> <span style="color: #008080;">then</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%,d files found\r"</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">))</span>
<span style="color: #000000;">t1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()+</span><span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">0</span> <span style="color: #000080;font-style:italic;">-- keep going</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">exit_code</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">walk_dir</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"."</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">store_res</span><span style="color: #0000FF;">,</span> <span style="color: #004600;">true</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%,d files found\n"</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">))</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">w</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">max</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span>
<span style="color: #000080;font-style:italic;">--include builtins/pfile.e</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">ri</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">s</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">file_size_k</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span> <span style="color: #000000;">5</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">p</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">'*'</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #000000;">60</span><span style="color: #0000FF;">*</span><span style="color: #000000;">ri</span><span style="color: #0000FF;">/</span><span style="color: #000000;">w</span><span style="color: #0000FF;">))</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"files < %s: %s%,d\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">p</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ri</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 1,281 ⟶ 1,476:
The distribution is stored in a '''collections.Counter''' object (like a dictionary that returns 0 for a missing key, which is useful when incrementing). Anything could be done with this object; here the number of files is printed for increasing sizes. No checks are made during the directory walk: normally, safeguards would be needed, or the program will fail on any unreadable file or directory (because of access rights or overly deep paths, for instance). Links are skipped here, which should avoid cycles.
<syntaxhighlight lang="python">
from collections import Counter
Line 1,312 ⟶ 1,507:
print("Total %d bytes for %d files" % (s, n))
main(sys.argv[1:])</syntaxhighlight>
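A minimal sketch of the safeguards the description above mentions (not the entry's own code): entries that cannot be read are tallied instead of aborting the walk, and symbolic links are skipped. It assumes bucketing by the number of decimal digits in each size; `size_distribution` is a hypothetical name.

```python
import os
from collections import Counter


def size_distribution(root):
    """Count files under root by the number of decimal digits in their size.

    Symbolic links to files are skipped, and os.walk does not follow links to
    directories by default. Entries that cannot be read are counted in
    `skipped` instead of raising. A zero-byte file has one digit, so it shares
    the bucket of 1 to 9 byte files.
    """
    dist = Counter()
    skipped = 0

    def on_error(_exc):
        # os.walk calls this for a directory it cannot list.
        nonlocal skipped
        skipped += 1

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                size = os.path.getsize(path)
            except OSError:  # e.g. the file vanished or cannot be stat'ed
                skipped += 1
                continue
            dist[len(str(size))] += 1
    return dist, skipped
```

Returning the skipped count alongside the Counter lets the caller report how much of the tree was actually measured.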
=={{header|Racket}}==
<syntaxhighlight lang="racket">
(define (file-size-distribution (d (current-directory)) #:size-group-function (sgf values))
Line 1,343 ⟶ 1,538:
(module+ test
(call-with-values (λ () (file-size-distribution #:size-group-function log10-or-so))
(report-fsd log10-or-so)))</syntaxhighlight>
{{out}}
Line 1,361 ⟶ 1,556:
By default, the program processes the current directory and all readable subdirectories; alternatively, pass a directory path on the command line.
<syntaxhighlight lang="raku">
sub log10 (Int $s) { $s ?? $s.log(10).Int !! 0 }
my %fsize;
Line 1,386 ⟶ 1,581:
my ($end, $bar) = $scaled.polymod(8);
(@blocks[8] x $bar * 8) ~ (@blocks[$end] if $end) ~ "\n"
}</syntaxhighlight>
{{out}}
Line 1,419 ⟶ 1,614:
Also, some Windows versions of the '''dir''' command insert commas into numbers, so code was added to elide them.
<syntaxhighlight lang="rexx">
numeric digits 30 /*ensure enough decimal digits for a #.*/
parse arg ds . /*obtain optional argument from the CL.*/
Line 1,464 ⟶ 1,659:
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
commas: parse arg _; do j#=length(_)-3 to 1 by -3; _=insert(',', _, j#); end; return _</syntaxhighlight>
This REXX program makes use of the '''LINESIZE''' REXX program (or BIF), which determines the screen width (line size) of the terminal (console) so that the histogram can be made as wide as possible.
Line 1,529 ⟶ 1,724:
{{libheader|walkdir}}
{{works with|Rust|2018}}
<syntaxhighlight lang="rust">
use std::error::Error;
use std::marker::PhantomData;
Line 1,706 ⟶ 1,901:
}
}
</syntaxhighlight>
{{out}}
<pre>
Line 1,726 ⟶ 1,921:
=={{header|Sidef}}==
<syntaxhighlight lang="ruby">
dir.open(\var dir_h) || return nil
Line 1,754 ⟶ 1,949:
}
say "Total: #{total_size} bytes in #{files_num} files"</syntaxhighlight>
{{out}}
<pre>
Line 1,771 ⟶ 1,966:
=={{header|Tcl}}==
This uses the '''fileutil::traverse''' package from Tcllib to walk the tree. A '''glob'''-based alternative that ignores links (but not hidden files) is possible, but would add about a dozen lines.
<syntaxhighlight lang="tcl">
namespace path {::tcl::mathfunc ::tcl::mathop}
Line 1,789 ⟶ 1,984:
foreach key [lsort -int [dict keys $hist]] {
puts "[? {$key == -1} 0 {1e$key}]\t[dict get $hist $key]"
}</syntaxhighlight>
{{out}}
<pre>0 1
Line 1,803 ⟶ 1,998:
{{works with|Bourne Shell}}
Uses POSIX-conformant code unless the environment variable GNU is set to a non-empty value.
<syntaxhighlight lang="bash">
set -eu
Line 1,840 ⟶ 2,035:
printf "\nTotal: %.1f %s in %d files\n",
total / (10 ** l), u[int(l / 3)], NR
}'</syntaxhighlight>
{{out}}
<pre>$ time ~/fsd.sh
Line 1,884 ⟶ 2,079:
{{libheader|Wren-math}}
{{libheader|Wren-fmt}}
<syntaxhighlight lang="wren">
import "os" for Process
import "./math" for Math
import "./fmt" for Fmt
var sizes = List.filled(12, 0)
Line 1,931 ⟶ 2,126:
Fmt.print("= Number of files : $,5d", numFiles)
Fmt.print(" Total size in bytes : $,d", totalSize)
Fmt.print(" Number of sub-directories : $,5d", numDirs)</syntaxhighlight>
{{out}}
Line 1,956 ⟶ 2,151:
=={{header|zkl}}==
<syntaxhighlight lang="zkl">
// hoover all files in tree, don't return directories
fcn(pipe,dir){ File.globular(dir,"*",True,8,pipe); }
Line 1,974 ⟶ 2,169:
println("%15s : %s".fmt(szchrs[idx,*], "*"*(scale*cnt).round().toInt()));
idx-=1 + comma();
}</syntaxhighlight>
{{out}}
<pre>