External sort: Difference between revisions

Content added Content deleted
(fix lang name)
m (syntax highlighting fixup automation)
Line 21: Line 21:
A "half chunk" of integers at a time is read to each of two buffer lists covering different sections of the file range being partitioned. Only those integers needing to be swapped are written back to the file and each list is replaced as it's used up. When the converging sections eventually overlap, a single list is used instead which is updated in parallel with the file to ensure that the partitioning repeat stops in the right place. Partitions less than a "chunk" in length are sorted in memory with a heap sort. (The Foundation framework has a built-in NSMutableArray sort which is faster than a vanilla heap sort — even with the necessary derivation of NSMutableArrays from the lists and lists from the sorted arrays — but I don't know how well this fits the task's "low memory" conceit.)
A "half chunk" of integers at a time is read to each of two buffer lists covering different sections of the file range being partitioned. Only those integers needing to be swapped are written back to the file and each list is replaced as it's used up. When the converging sections eventually overlap, a single list is used instead which is updated in parallel with the file to ensure that the partitioning repeat stops in the right place. Partitions less than a "chunk" in length are sorted in memory with a heap sort. (The Foundation framework has a built-in NSMutableArray sort which is faster than a vanilla heap sort — even with the necessary derivation of NSMutableArrays from the lists and lists from the sorted arrays — but I don't know how well this fits the task's "low memory" conceit.)


<lang applescript>(*
<syntaxhighlight lang="applescript">(*
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Optimisations by Robert Sedgewick and others at various times.
Optimisations by Robert Sedgewick and others at various times.
Line 345: Line 345:


set theFile to (path to desktop as text) & "Test.dat"
set theFile to (path to desktop as text) & "Test.dat"
set sortedFile to externalSort(theFile)</lang>
set sortedFile to externalSort(theFile)</syntaxhighlight>


=={{header|C++}}==
=={{header|C++}}==
Line 369: Line 369:
All sorted streams are merged in this way out to an external output file ''merged.txt''.
All sorted streams are merged in this way out to an external output file ''merged.txt''.
<lang cpp>
<syntaxhighlight lang="cpp">
/* ExternalSort.cpp */
/* ExternalSort.cpp */


Line 626: Line 626:


/* inputfile integers -- one per line for simplicity */
/* inputfile integers -- one per line for simplicity */
</syntaxhighlight>
</lang>


{{out}}
{{out}}
Line 650: Line 650:


A small test file consisting of random integers has been generated and sorted to demonstrate that the approach works.
A small test file consisting of random integers has been generated and sorted to demonstrate that the approach works.
<lang go>package main
<syntaxhighlight lang="go">package main


import (
import (
Line 855: Line 855:
check(err)
check(err)
}
}
}</lang>
}</syntaxhighlight>


{{out}}
{{out}}
Line 869: Line 869:
=={{header|J}}==
=={{header|J}}==
Untested on a memory mapped file.
Untested on a memory mapped file.
<syntaxhighlight lang="j">
<lang J>
NB. Apply an in-place sorting algorithm to a memory mapped file
NB. Apply an in-place sorting algorithm to a memory mapped file
NB. in-place sort is translation of in-place python quicksort.
NB. in-place sort is translation of in-place python quicksort.
Line 907: Line 907:
i. 0 0 NB. verbs return the final noun
i. 0 0 NB. verbs return the final noun
)
)
</syntaxhighlight>
</lang>


Demonstration the sorting works:
Demonstration the sorting works:
Line 917: Line 917:


=={{header|Julia}}==
=={{header|Julia}}==
<lang julia>intfile = open("/tmp/mmap.bin", "r+")
<syntaxhighlight lang="julia">intfile = open("/tmp/mmap.bin", "r+")


arr = Mmap.mmap(intfile, Vector{Int64}, (div(stat(intfile).size, 8))) # Int64 is 8 bytes
arr = Mmap.mmap(intfile, Vector{Int64}, (div(stat(intfile).size, 8))) # Int64 is 8 bytes


sort!(arr)
sort!(arr)
</syntaxhighlight>
</lang>


=={{header|Nim}}==
=={{header|Nim}}==
{{trans|Phix}}
{{trans|Phix}}


<lang Nim>import algorithm, heapqueue, os, random, sequtils, strformat, strutils
<syntaxhighlight lang="nim">import algorithm, heapqueue, os, random, sequtils, strformat, strutils




Line 983: Line 983:


for filename in filenames:
for filename in filenames:
removeFile(filename)</lang>
removeFile(filename)</syntaxhighlight>


{{out}}
{{out}}
Line 1,006: Line 1,006:
=={{header|Perl}}==
=={{header|Perl}}==
Simulate task by reading from 'DATA' handle and using tiny record limit. As written, works for any numeric input, but could define any kind of customized sorting.
Simulate task by reading from 'DATA' handle and using tiny record limit. As written, works for any numeric input, but could define any kind of customized sorting.
<lang perl>use strict;
<syntaxhighlight lang="perl">use strict;
use warnings;
use warnings;


Line 1,051: Line 1,051:
654
654
789
789
234</lang>
234</syntaxhighlight>
{{out}}
{{out}}
<pre>123
<pre>123
Line 1,070: Line 1,070:
=={{header|Phix}}==
=={{header|Phix}}==
Slight variation on [[Stream_Merge#Phix|Stream_Merge]]
Slight variation on [[Stream_Merge#Phix|Stream_Merge]]
<!--<lang Phix>(notonline)-->
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span>
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #000000;">pqueue</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #000000;">pqueue</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
Line 1,131: Line 1,131:
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #000000;">test</span><span style="color: #0000FF;">()</span>
<span style="color: #000000;">test</span><span style="color: #0000FF;">()</span>
<!--</lang>-->
<!--</syntaxhighlight>-->
{{out}}
{{out}}
<pre>
<pre>
Line 1,151: Line 1,151:
=={{header|Python}}==
=={{header|Python}}==
A technique demonstrated with a short string character data.
A technique demonstrated with a short string character data.
<lang python>
<syntaxhighlight lang="python">
#! /usr/bin/python3
#! /usr/bin/python3


Line 1,235: Line 1,235:
example = main
example = main
example()
example()
</syntaxhighlight>
</lang>


=={{header|Raku}}==
=={{header|Raku}}==
(formerly Perl 6)
(formerly Perl 6)
Borrowing from [http://rosettacode.org/wiki/Stream_Merge Stream_Merge] here. Temporary files are automatically deleted when program is done, so no explicit clean-up required.
Borrowing from [http://rosettacode.org/wiki/Stream_Merge Stream_Merge] here. Temporary files are automatically deleted when program is done, so no explicit clean-up required.
<lang perl6>use File::Temp;
<syntaxhighlight lang="raku" line>use File::Temp;


sub merge_streams ( @streams ) {
sub merge_streams ( @streams ) {
Line 1,267: Line 1,267:
@files.push: store(@chunk) if @chunk;
@files.push: store(@chunk) if @chunk;


say join ' ', merge_streams @files».&open;</lang>
say join ' ', merge_streams @files».&open;</syntaxhighlight>
{{out}}
{{out}}
<pre>-11 -9 -2 0 2 3 4 15 32 34 42 43 45 45 55 64 66 76 78 87 92 123</pre>
<pre>-11 -9 -2 0 2 3 4 15 32 34 42 43 45 45 55 64 66 76 78 87 92 123</pre>
Line 1,277: Line 1,277:


This particular example uses the DOS &nbsp; '''SORT''' &nbsp; and &nbsp; '''ERASE''' &nbsp; commands.
This particular example uses the DOS &nbsp; '''SORT''' &nbsp; and &nbsp; '''ERASE''' &nbsp; commands.
<lang rexx>/*REXX pgm reads a file, splits into smaller files, sorts 'em, combines into sorted file*/
<syntaxhighlight lang="rexx">/*REXX pgm reads a file, splits into smaller files, sorts 'em, combines into sorted file*/
parse arg FID n lim seed . /*obtain optional arguments from the CL*/
parse arg FID n lim seed . /*obtain optional arguments from the CL*/
if FID=='' | FID=="," then FID= 'SORT_EXT.OUT' /*name of the output (sorted) file. */
if FID=='' | FID=="," then FID= 'SORT_EXT.OUT' /*name of the output (sorted) file. */
Line 1,328: Line 1,328:
/*──────────────────────────────────────────────────────────────────────────────────────*/
/*──────────────────────────────────────────────────────────────────────────────────────*/
srt: procedure expose sWork; parse arg #
srt: procedure expose sWork; parse arg #
do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</lang><br><br>
do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</syntaxhighlight><br><br>


=={{header|Wren}}==
=={{header|Wren}}==
Line 1,336: Line 1,336:
{{libheader|Wren-str}}
{{libheader|Wren-str}}
A bit simpler than the Go version as we use fixed length integers which (together with a following space) can be sorted as strings.
A bit simpler than the Go version as we use fixed length integers which (together with a following space) can be sorted as strings.
<lang ecmascript>import "io" for File
<syntaxhighlight lang="ecmascript">import "io" for File
import "random" for Random
import "random" for Random
import "/dynamic" for Struct
import "/dynamic" for Struct
Line 1,455: Line 1,455:
var fileName = "es%(i)"
var fileName = "es%(i)"
File.delete(fileName)
File.delete(fileName)
}</lang>
}</syntaxhighlight>


{{out}}
{{out}}