External sort: Difference between revisions

m
m (→‎{{header|REXX}}: changed wording in the REXX section header, added/changed whitespace and comments.)
m (→‎{{header|Wren}}: Minor tidy)
 
(3 intermediate revisions by 3 users not shown)
Line 21:
A "half chunk" of integers at a time is read to each of two buffer lists covering different sections of the file range being partitioned. Only those integers needing to be swapped are written back to the file and each list is replaced as it's used up. When the converging sections eventually overlap, a single list is used instead which is updated in parallel with the file to ensure that the partitioning repeat stops in the right place. Partitions less than a "chunk" in length are sorted in memory with a heap sort. (The Foundation framework has a built-in NSMutableArray sort which is faster than a vanilla heap sort — even with the necessary derivation of NSMutableArrays from the lists and lists from the sorted arrays — but I don't know how well this fits the task's "low memory" conceit.)
 
<langsyntaxhighlight lang="applescript">(*
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Optimisations by Robert Sedgewick and others at various times.
Line 345:
 
set theFile to (path to desktop as text) & "Test.dat"
set sortedFile to externalSort(theFile)</langsyntaxhighlight>
 
=={{header|C++}}==
Line 369:
All sorted streams are merged in this way out to an external output file ''merged.txt''.
<langsyntaxhighlight lang="cpp">
/* ExternalSort.cpp */
 
Line 626:
 
/* inputfile integers -- one per line for simplicity */
</syntaxhighlight>
</lang>
 
{{out}}
Line 650:
 
A small test file consisting of random integers has been generated and sorted to demonstrate that the approach works.
<langsyntaxhighlight lang="go">package main
 
import (
Line 855:
check(err)
}
}</langsyntaxhighlight>
 
{{out}}
Line 867:
</pre>
 
=={{header|jJ}}==
Untested on a memory mapped file.
<syntaxhighlight lang="j">
<lang J>
NB. Apply an in-place sorting algorithm to a memory mapped file
NB. in-place sort is translation of in-place python quicksort.
Line 907:
i. 0 0 NB. verbs return the final noun
)
</syntaxhighlight>
</lang>
 
Demonstration the sorting works:
Line 917:
 
=={{header|Julia}}==
<langsyntaxhighlight lang="julia">intfile = open("/tmp/mmap.bin", "r+")
 
arr = Mmap.mmap(intfile, Vector{Int64}, (div(stat(intfile).size, 8))) # Int64 is 8 bytes
 
sort!(arr)
</syntaxhighlight>
</lang>
 
=={{header|Nim}}==
{{trans|Phix}}
 
<langsyntaxhighlight Nimlang="nim">import algorithm, heapqueue, os, random, sequtils, strformat, strutils
 
 
Line 983:
 
for filename in filenames:
removeFile(filename)</langsyntaxhighlight>
 
{{out}}
Line 1,006:
=={{header|Perl}}==
Simulate task by reading from 'DATA' handle and using tiny record limit. As written, works for any numeric input, but could define any kind of customized sorting.
<langsyntaxhighlight lang="perl">use strict;
use warnings;
 
Line 1,051:
654
789
234</langsyntaxhighlight>
{{out}}
<pre>123
Line 1,070:
=={{header|Phix}}==
Slight variation on [[Stream_Merge#Phix|Stream_Merge]]
<!--<syntaxhighlight lang="phix">(notonline)-->
<lang Phix>include builtins/pqueue.e
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span>
include builtins/pfile.e -- write_lines() - not [yet] documented
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #000000;">pqueue</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #000000;">pfile</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span> <span style="color: #000080;font-style:italic;">-- write_lines() - not [yet] documented</span>
procedure add(integer fn, pq)
object line = gets(fn)
<span style="color: #008080;">procedure</span> <span style="color: #000000;">add</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">fn</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
if line=-1 then
<span style="color: #004080;">object</span> <span style="color: #000000;">line</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">gets</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">)</span>
close(fn)
<span style="color: #008080;">if</span> <span style="color: #000000;">line</span><span style="color: #0000FF;">=-</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span>
else
<span style="color: #7060A8;">close</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">)</span>
pq_add({fn,line}, pq)
<span style="color: #008080;">else</span>
end if
<span style="color: #7060A8;">pq_add</span><span style="color: #0000FF;">({</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">line</span><span style="color: #0000FF;">},</span> <span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
end procedure
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
procedure sort_files(sequence filenames)
for i=1 to length(filenames) do
<span style="color: #008080;">procedure</span> <span style="color: #000000;">sort_files</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span>
sequence lines = get_text(filenames[i],GT_LF_STRIPPED),
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
sorted = sort(lines)
<span style="color: #004080;">sequence</span> <span style="color: #000000;">lines</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">get_text</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #004600;">GT_LF_STRIPPED</span><span style="color: #0000FF;">),</span>
printf(1,"%s:%v => %v\n",{filenames[i],lines,sorted})
<span style="color: #000000;">sorted</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">)</span>
if write_lines(filenames[i],sorted)!=1 then ?9/0 end if
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%s:%v =&gt; %v\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">,</span><span style="color: #000000;">sorted</span><span style="color: #0000FF;">})</span>
end for
<span style="color: #008080;">if</span> <span style="color: #7060A8;">write_lines</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span><span style="color: #000000;">sorted</span><span style="color: #0000FF;">)!=</span><span style="color: #000000;">1</span> <span style="color: #008080;">then</span> <span style="color: #0000FF;">?</span><span style="color: #000000;">9</span><span style="color: #0000FF;">/</span><span style="color: #000000;">0</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
end procedure
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
 
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
procedure merge_files(integer outfn, sequence filenames)
integer pq = pq_new()
<span style="color: #008080;">procedure</span> <span style="color: #000000;">merge_files</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">outfn</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span>
for i=1 to length(filenames) do
<span style="color: #004080;">integer</span> <span style="color: #000000;">pq</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">pq_new</span><span style="color: #0000FF;">()</span>
add(open(filenames[i], "r"),pq)
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
end for
<span style="color: #000000;">add</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">open</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span> <span style="color: #008000;">"r"</span><span style="color: #0000FF;">),</span><span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
while not pq_empty(pq) do
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
{integer fn, string line} = pq_pop(pq)
<span style="color: #008080;">while</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">pq_empty</span><span style="color: #0000FF;">(</span><span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
puts(outfn,line)
<span style="color: #0000FF;">{</span><span style="color: #004080;">integer</span> <span style="color: #000000;">fn</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">string</span> <span style="color: #000000;">line</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">pq_pop</span><span style="color: #0000FF;">(</span><span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
add(fn, pq)
<span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">outfn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">line</span><span style="color: #0000FF;">)</span>
end while
<span style="color: #000000;">add</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
pq_destroy(pq)
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
end procedure
<span style="color: #7060A8;">pq_destroy</span><span style="color: #0000FF;">(</span><span style="color: #000000;">pq</span><span style="color: #0000FF;">)</span>
 
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
procedure test()
integer nf = rand(5), -- number of files
<span style="color: #008080;">procedure</span> <span style="color: #000000;">test</span><span style="color: #0000FF;">()</span>
lp = 3 -- lines per file
<span style="color: #004080;">integer</span> <span style="color: #000000;">nf</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">rand</span><span style="color: #0000FF;">(</span><span style="color: #000000;">5</span><span style="color: #0000FF;">),</span> <span style="color: #000080;font-style:italic;">-- number of files</span>
sequence filenames = {},
<span style="color: #000000;">lp</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">3</span> <span style="color: #000080;font-style:italic;">-- lines per file</span>
lines = shuffle(tagset(nf*lp))
<span style="color: #004080;">sequence</span> <span style="color: #000000;">filenames</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{},</span>
for i=1 to nf do
<span style="color: #000000;">lines</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">shuffle</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #000000;">nf</span><span style="color: #0000FF;">*</span><span style="color: #000000;">lp</span><span style="color: #0000FF;">))</span>
string filename = sprintf("file%d.txt",i)
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">nf</span> <span style="color: #008080;">do</span>
filenames = append(filenames,filename)
<span style="color: #004080;">string</span> <span style="color: #000000;">filename</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sprintf</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"file%d.txt"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">i</span><span style="color: #0000FF;">)</span>
integer fn = open(filename,"w")
<span style="color: #000000;">filenames</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">,</span><span style="color: #000000;">filename</span><span style="color: #0000FF;">)</span>
for l=1 to lp do
<span style="color: #004080;">integer</span> <span style="color: #000000;">fn</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">open</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filename</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"w"</span><span style="color: #0000FF;">)</span>
printf(fn,"Line %02d\n",lines[l])
<span style="color: #008080;">for</span> <span style="color: #000000;">l</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">lp</span> <span style="color: #008080;">do</span>
end for
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Line %02d\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">lines</span><span style="color: #0000FF;">[</span><span style="color: #000000;">l</span><span style="color: #0000FF;">])</span>
lines = lines[lp+1..$]
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
close(fn)
<span style="color: #000000;">lines</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">lines</span><span style="color: #0000FF;">[</span><span style="color: #000000;">lp</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..$]</span>
end for
<span style="color: #7060A8;">close</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">)</span>
printf(1,"sorting %d lines split over %d files\n",{nf*lp,nf})
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
sort_files(filenames)
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"sorting %d lines split over %d files\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">nf</span><span style="color: #0000FF;">*</span><span style="color: #000000;">lp</span><span style="color: #0000FF;">,</span><span style="color: #000000;">nf</span><span style="color: #0000FF;">})</span>
integer outfn = 1 -- or open("results.txt","w")
<span style="color: #000000;">sort_files</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span>
merge_files(outfn,filenames)
<span style="color: #004080;">integer</span> <span style="color: #000000;">outfn</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span> <span style="color: #000080;font-style:italic;">-- or open("results.txt","w")</span>
-- close(outfn)
<span style="color: #000000;">merge_files</span><span style="color: #0000FF;">(</span><span style="color: #000000;">outfn</span><span style="color: #0000FF;">,</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">)</span>
for i=1 to nf do
<span style="color: #000080;font-style:italic;">-- close(outfn)</span>
{} = delete_file(filenames[i])
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">nf</span> <span style="color: #008080;">do</span>
end for
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">delete_file</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])</span>
end procedure
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
test()</lang>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #000000;">test</span><span style="color: #0000FF;">()</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 1,148 ⟶ 1,151:
=={{header|Python}}==
A technique demonstrated with a short string character data.
<langsyntaxhighlight lang="python">
#! /usr/bin/python3
 
Line 1,232 ⟶ 1,235:
example = main
example()
</syntaxhighlight>
</lang>
 
=={{header|Raku}}==
(formerly Perl 6)
Borrowing from [http://rosettacode.org/wiki/Stream_Merge Stream_Merge] here. Temporary files are automatically deleted when program is done, so no explicit clean-up required.
<syntaxhighlight lang="raku" perl6line>use File::Temp;
 
sub merge_streams ( @streams ) {
Line 1,264 ⟶ 1,267:
@files.push: store(@chunk) if @chunk;
 
say join ' ', merge_streams @files».&open;</langsyntaxhighlight>
{{out}}
<pre>-11 -9 -2 0 2 3 4 15 32 34 42 43 45 45 55 64 66 76 78 87 92 123</pre>
Line 1,274 ⟶ 1,277:
 
This particular example uses the DOS &nbsp; '''SORT''' &nbsp; and &nbsp; '''ERASE''' &nbsp; commands.
<langsyntaxhighlight lang="rexx">/*REXX pgm reads a file, splits into smaller files, sorts 'em, combines into sorted file*/
parse arg FID n lim seed . /*obtain optional arguments from the CL*/
if FID=='' | FID=="," then FID= 'SORT_EXT.OUT' /*name of the output (sorted) file. */
Line 1,325 ⟶ 1,328:
/*──────────────────────────────────────────────────────────────────────────────────────*/
srt: procedure expose sWork; parse arg #
do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</langsyntaxhighlight><br><br>
 
=={{header|Wren}}==
Line 1,333 ⟶ 1,336:
{{libheader|Wren-str}}
A bit simpler than the Go version as we use fixed length integers which (together with a following space) can be sorted as strings.
<langsyntaxhighlight ecmascriptlang="wren">import "io" for File
import "random" for Random
import "./dynamic" for Struct
import "./sort" for Sort
import "./str" for Str
 
var MinHeapNode = Struct.create("MinHeapNode", ["element", "index"])
Line 1,452 ⟶ 1,455:
var fileName = "es%(i)"
File.delete(fileName)
}</langsyntaxhighlight>
 
{{out}}
9,476

edits