External sort: Difference between revisions
m (→{{header|REXX}}: changed wording in the REXX section header, added/changed whitespace and comments.)
m (→{{header|Wren}}: Minor tidy)
(3 intermediate revisions by 3 users not shown)
Line 21:
A "half chunk" of integers at a time is read to each of two buffer lists covering different sections of the file range being partitioned. Only those integers needing to be swapped are written back to the file and each list is replaced as it's used up. When the converging sections eventually overlap, a single list is used instead which is updated in parallel with the file to ensure that the partitioning repeat stops in the right place. Partitions less than a "chunk" in length are sorted in memory with a heap sort. (The Foundation framework has a built-in NSMutableArray sort which is faster than a vanilla heap sort — even with the necessary derivation of NSMutableArrays from the lists and lists from the sorted arrays — but I don't know how well this fits the task's "low memory" conceit.)
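The partitioning idea can be sketched in Python. This is a simplified illustration, not the AppleScript method: it reads one integer at a time rather than buffering "half chunks", and it uses the built-in <code>sorted</code> as a stand-in for the heap sort. The record format (8-byte little-endian integers) and the names <code>sort_int_file</code>, <code>sort_range</code> and <code>chunk</code> are assumptions made for the example.

```python
import os
import struct

INT = struct.Struct("<q")  # assumed record format: 8-byte little-endian signed integer


def read_int(f, i):
    """Read the integer at record index i."""
    f.seek(i * INT.size)
    return INT.unpack(f.read(INT.size))[0]


def write_int(f, i, value):
    """Overwrite the integer at record index i."""
    f.seek(i * INT.size)
    f.write(INT.pack(value))


def partition(f, lo, hi):
    """Hoare partition of records lo..hi (inclusive), done directly on the file.

    Returns j such that every record in lo..j is <= every record in j+1..hi.
    """
    pivot = read_int(f, (lo + hi) // 2)
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while read_int(f, i) < pivot:
            i += 1
        j -= 1
        while read_int(f, j) > pivot:
            j -= 1
        if i >= j:
            return j
        # Only the two out-of-place integers are written back.
        a, b = read_int(f, i), read_int(f, j)
        write_int(f, i, b)
        write_int(f, j, a)


def sort_range(f, lo, hi, chunk):
    """Quicksort records lo..hi; ranges of at most `chunk` records are sorted in memory."""
    if hi - lo + 1 <= chunk:
        if lo < hi:
            f.seek(lo * INT.size)
            data = f.read((hi - lo + 1) * INT.size)
            values = sorted(v for (v,) in INT.iter_unpack(data))  # stand-in for the heap sort
            f.seek(lo * INT.size)
            f.write(b"".join(INT.pack(v) for v in values))
        return
    j = partition(f, lo, hi)
    sort_range(f, lo, j, chunk)
    sort_range(f, j + 1, hi, chunk)


def sort_int_file(path, chunk=4):
    """Sort a binary file of integers in place; `chunk` must be at least 1."""
    count = os.path.getsize(path) // INT.size
    with open(path, "r+b") as f:
        sort_range(f, 0, count - 1, chunk)
```

Reading single integers keeps the sketch short but costs one seek per comparison; buffering a run of records from each end, as described above, is what makes the approach practical on a real file.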
<syntaxhighlight lang="applescript">
Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Optimisations by Robert Sedgewick and others at various times.
Line 345:
set theFile to (path to desktop as text) & "Test.dat"
set sortedFile to externalSort(theFile)</syntaxhighlight>
=={{header|C++}}==
Line 369:
All sorted streams are merged in this way out to an external output file ''merged.txt''.
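A minimal Python sketch of such a merge, assuming each run file holds one integer per line in ascending order with no blank lines in between. The function name <code>merge_sorted_files</code> and its parameters are illustrative and are not taken from the C++ program.

```python
import heapq


def merge_sorted_files(paths, out_path):
    """Merge already-sorted integer files into out_path using a min-heap.

    The heap holds at most one pending value per input file, so memory use
    depends on the number of files, not on their size.
    """
    files = [open(p) for p in paths]
    try:
        heap = []
        for idx, f in enumerate(files):
            line = f.readline()
            if line.strip():
                heap.append((int(line), idx))
        heapq.heapify(heap)
        with open(out_path, "w") as out:
            while heap:
                value, idx = heapq.heappop(heap)   # smallest pending value
                out.write(f"{value}\n")
                line = files[idx].readline()       # refill from the same file
                if line.strip():
                    heapq.heappush(heap, (int(line), idx))
    finally:
        for f in files:
            f.close()
```

Storing the file index next to each value tells the loop which stream to refill after a pop, and it also breaks ties between equal values.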
<syntaxhighlight lang="cpp">
/* ExternalSort.cpp */
Line 626:
/* inputfile integers -- one per line for simplicity */
</syntaxhighlight>
{{out}}
Line 650:
A small test file consisting of random integers has been generated and sorted to demonstrate that the approach works.
<syntaxhighlight lang="go">
import (
Line 855:
check(err)
}
}</syntaxhighlight>
{{out}}
Line 867:
</pre>
=={{header|J}}==
Untested on a memory mapped file.
<syntaxhighlight lang="j">
NB. Apply an in-place sorting algorithm to a memory mapped file
NB. in-place sort is translation of in-place python quicksort.
Line 907:
i. 0 0 NB. verbs return the final noun
)
</syntaxhighlight>
Demonstration that the sorting works:
Line 917:
=={{header|Julia}}==
<syntaxhighlight lang="julia">
arr = Mmap.mmap(intfile, Vector{Int64}, (div(stat(intfile).size, 8))) # Int64 is 8 bytes
sort!(arr)
</syntaxhighlight>
=={{header|Nim}}==
{{trans|Phix}}
<syntaxhighlight lang="nim">
Line 983:
for filename in filenames:
removeFile(filename)</syntaxhighlight>
{{out}}
Line 1,006:
=={{header|Perl}}==
Simulates the task by reading from the 'DATA' handle with a tiny record limit. As written it works for any numeric input, but any customized sort order could be defined instead.
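The same idea, sketched in Python rather than Perl: records are read a few at a time, each small batch is sorted and parked in a temporary file, and the sorted runs are then merged lazily. The name <code>external_sort</code>, the <code>limit</code> and <code>key</code> parameters and the sample data in the usage lines are assumptions for illustration, and records are assumed not to contain newlines.

```python
import heapq
import itertools
import tempfile


def external_sort(records, limit=3, key=float):
    """Yield records in sorted order while holding at most `limit` of them
    in memory during the run-creation phase."""
    runs = []
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, limit))   # tiny record limit
        if not chunk:
            break
        chunk.sort(key=key)
        run = tempfile.TemporaryFile("w+")           # removed automatically on close
        run.writelines(f"{rec}\n" for rec in chunk)
        run.seek(0)
        runs.append(run)
    try:
        streams = [(line.rstrip("\n") for line in run) for run in runs]
        # heapq.merge pulls one record at a time from each sorted run.
        yield from heapq.merge(*streams, key=key)
    finally:
        for run in runs:
            run.close()


data = ["654", "789", "234", "123", "9", "-3"]
print(list(external_sort(data, limit=2)))
```

Passing a different <code>key</code> (for example <code>str.lower</code>) changes the ordering in both the run-sorting and the merging step, which is where a customized sort would be plugged in.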
<syntaxhighlight lang="perl">
use warnings;
Line 1,051:
654
789
234</syntaxhighlight>
{{out}}
<pre>123
Line 1,070:
=={{header|Phix}}==
Slight variation on [[Stream_Merge#Phix|Stream_Merge]]
<syntaxhighlight lang="phix">
without js -- file i/o
include builtins/pqueue.e
include builtins/pfile.e -- write_lines() - not [yet] documented

procedure add(integer fn, pq)
    object line = gets(fn)
    if line=-1 then
        close(fn)
    else
        pq_add({fn,line}, pq)
    end if
end procedure

procedure sort_files(sequence filenames)
    for i=1 to length(filenames) do
        sequence lines = get_text(filenames[i],GT_LF_STRIPPED),
                 sorted = sort(lines)
        printf(1,"%s:%v => %v\n",{filenames[i],lines,sorted})
        if write_lines(filenames[i],sorted)!=1 then ?9/0 end if
    end for
end procedure

procedure merge_files(integer outfn, sequence filenames)
    integer pq = pq_new()
    for i=1 to length(filenames) do
        add(open(filenames[i], "r"),pq)
    end for
    while not pq_empty(pq) do
        {integer fn, string line} = pq_pop(pq)
        puts(outfn,line)
        add(fn, pq)
    end while
    pq_destroy(pq)
end procedure

procedure test()
    integer nf = rand(5),   -- number of files
            lp = 3          -- lines per file
    sequence filenames = {},
             lines = shuffle(tagset(nf*lp))
    for i=1 to nf do
        string filename = sprintf("file%d.txt",i)
        filenames = append(filenames,filename)
        integer fn = open(filename,"w")
        for l=1 to lp do
            printf(fn,"Line %02d\n",lines[l])
        end for
        lines = lines[lp+1..$]
        close(fn)
    end for
    printf(1,"sorting %d lines split over %d files\n",{nf*lp,nf})
    sort_files(filenames)
    integer outfn = 1 -- or open("results.txt","w")
    merge_files(outfn,filenames)
--  close(outfn)
    for i=1 to nf do
        {} = delete_file(filenames[i])
    end for
end procedure
test()
</syntaxhighlight>
{{out}}
<pre>
Line 1,148 ⟶ 1,151:
=={{header|Python}}==
A technique demonstrated with a short string of character data.
<syntaxhighlight lang="python">
#! /usr/bin/python3
Line 1,232 ⟶ 1,235:
example = main
example()
</syntaxhighlight>
=={{header|Raku}}==
(formerly Perl 6)
Borrowing from [http://rosettacode.org/wiki/Stream_Merge Stream_Merge] here. Temporary files are deleted automatically when the program finishes, so no explicit clean-up is required.
<syntaxhighlight lang="raku">
sub merge_streams ( @streams ) {
Line 1,264 ⟶ 1,267:
@files.push: store(@chunk) if @chunk;
say join ' ', merge_streams @files».&open;</syntaxhighlight>
{{out}}
<pre>-11 -9 -2 0 2 3 4 15 32 34 42 43 45 45 55 64 66 76 78 87 92 123</pre>
Line 1,274 ⟶ 1,277:
This particular example uses the DOS '''SORT''' and '''ERASE''' commands.
<syntaxhighlight lang="rexx">
parse arg FID n lim seed . /*obtain optional arguments from the CL*/
if FID=='' | FID=="," then FID= 'SORT_EXT.OUT' /*name of the output (sorted) file. */
Line 1,325 ⟶ 1,328:
/*──────────────────────────────────────────────────────────────────────────────────────*/
srt: procedure expose sWork; parse arg #
do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</syntaxhighlight>
=={{header|Wren}}==
Line 1,333 ⟶ 1,336:
{{libheader|Wren-str}}
A bit simpler than the Go version, as we use fixed-length integers which (together with a following space) can be sorted as strings.
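A short Python illustration of why fixed-length fields allow string sorting. Zero padding to a width of 5 is an assumption made here to force equal lengths; it is not claimed to be what the Wren program does, and the helper names are hypothetical. The trick only holds for non-negative integers that fit the chosen width.

```python
def as_fixed_width(n, width=5):
    """Format a non-negative integer as a zero-padded field plus a trailing space."""
    return f"{n:0{width}d} "


def sort_as_strings(values, width=5):
    """Sort integers by comparing only their fixed-width string forms."""
    fields = sorted(as_fixed_width(v, width) for v in values)
    return [int(field) for field in fields]


values = [42, 7, 12345, 300]
print(sort_as_strings(values))          # [7, 42, 300, 12345]
print(sorted(str(v) for v in values))   # ['12345', '300', '42', '7'] (unpadded strings sort wrongly)
```

Because every field has the same length, character-by-character comparison agrees with numeric comparison, so the records never need to be parsed during the sort or the merge.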
<syntaxhighlight lang="wren">
import "random" for Random
import "./dynamic" for Struct
import "./sort" for Sort
import "./str" for Str
var MinHeapNode = Struct.create("MinHeapNode", ["element", "index"])
Line 1,452 ⟶ 1,455:
var fileName = "es%(i)"
File.delete(fileName)
}</syntaxhighlight>
{{out}}