External sort: Difference between revisions

Line 21:

A "half chunk" of integers at a time is read to each of two buffer lists covering different sections of the file range being partitioned. Only those integers needing to be swapped are written back to the file and each list is replaced as it's used up. When the converging sections eventually overlap, a single list is used instead which is updated in parallel with the file to ensure that the partitioning repeat stops in the right place. Partitions less than a "chunk" in length are sorted in memory with a heap sort. (The Foundation framework has a built-in NSMutableArray sort which is faster than a vanilla heap sort — even with the necessary derivation of NSMutableArrays from the lists and lists from the sorted arrays — but I don't know how well this fits the task's "low memory" conceit.)

<~~lang~~ applescript>(*

<syntaxhighlight lang="applescript">(*

Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.

Optimisations by Robert Sedgewick and others at various times.

Line 345:

set theFile to (path to desktop as text) & "Test.dat"

set sortedFile to externalSort(theFile)</~~lang~~>

set sortedFile to externalSort(theFile)</syntaxhighlight>

=={{header|C++}}==

Line 369:

All sorted streams are merged in this way out to an external output file ''merged.txt''.

<~~lang~~ cpp>

/* ExternalSort.cpp */

Line 626:

/* inputfile integers -- one per line for simplicity */

</syntaxhighlight>

</lang>

Line 650:

A small test file consisting of random integers has been generated and sorted to demonstrate that the approach works.

<~~lang~~ go>package main

<syntaxhighlight lang="go">package main

import (

Line 855:

check(err)

}

}</~~lang~~>

}</syntaxhighlight>

Line 869:

=={{header|J}}==

Untested on a memory mapped file.

NB. Apply an in-place sorting algorithm to a memory mapped file

NB. in-place sort is translation of in-place python quicksort.

Line 907:

i. 0 0 NB. verbs return the final noun

)

</syntaxhighlight>

</lang>

Demonstration the sorting works:

Line 917:

=={{header|Julia}}==

<~~lang~~ julia>intfile = open("/tmp/mmap.bin", "r+")

<syntaxhighlight lang="julia">intfile = open("/tmp/mmap.bin", "r+")

arr = Mmap.mmap(intfile, Vector{Int64}, (div(stat(intfile).size, 8))) # Int64 is 8 bytes

sort!(arr)

</syntaxhighlight>

</lang>

=={{header|Nim}}==

<~~lang~~ ~~Nim~~>import algorithm, heapqueue, os, random, sequtils, strformat, strutils

<syntaxhighlight lang="nim">import algorithm, heapqueue, os, random, sequtils, strformat, strutils

Line 983:

for filename in filenames:

removeFile(filename)</~~lang~~>

removeFile(filename)</syntaxhighlight>

Line 1,006:

=={{header|Perl}}==

Simulate task by reading from 'DATA' handle and using tiny record limit. As written, works for any numeric input, but could define any kind of customized sorting.

<~~lang~~ perl>use strict;

<syntaxhighlight lang="perl">use strict;

use warnings;

Line 1,051:

654

789

234</~~lang~~>

234</syntaxhighlight>

<pre>123

Line 1,070:

=={{header|Phix}}==

Slight variation on [[Stream_Merge#Phix|Stream_Merge]]

without js -- file i/o

include builtins/pqueue.e

Line 1,131:

end procedure

test()

<pre>

Line 1,151:

=={{header|Python}}==

A technique demonstrated with a short string character data.

<~~lang~~ python>

#! /usr/bin/python3

Line 1,235:

example = main

example()

</syntaxhighlight>

</lang>

=={{header|Raku}}==

(formerly Perl 6)

Borrowing from [http://rosettacode.org/wiki/Stream_Merge Stream_Merge] here. Temporary files are automatically deleted when program is done, so no explicit clean-up required.

<lang ~~perl6~~>use File::Temp;

<syntaxhighlight lang="raku" line>use File::Temp;

sub merge_streams ( @streams ) {

Line 1,267:

@files.push: store(@chunk) if @chunk;

say join ' ', merge_streams @files».&open;</~~lang~~>

say join ' ', merge_streams @files».&open;</syntaxhighlight>

Line 1,277:

This particular example uses the DOS   '''SORT'''   and   '''ERASE'''   commands.

<~~lang~~ rexx>/*REXX pgm reads a file, splits into smaller files, sorts 'em, combines into sorted file*/

<syntaxhighlight lang="rexx">/*REXX pgm reads a file, splits into smaller files, sorts 'em, combines into sorted file*/

parse arg FID n lim seed . /*obtain optional arguments from the CL*/

if FID=='' | FID=="," then FID= 'SORT_EXT.OUT' /*name of the output (sorted) file. */

Line 1,328:

/*──────────────────────────────────────────────────────────────────────────────────────*/

srt: procedure expose sWork; parse arg #

do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</~~lang~~>

do j=1 for #; fn= sWORK || j; 'SORT' fn "/O" fn; end /*j*/; return</syntaxhighlight>

=={{header|Wren}}==

Line 1,336:

A bit simpler than the Go version as we use fixed length integers which (together with a following space) can be sorted as strings.

<~~lang~~ ecmascript>import "io" for File

<syntaxhighlight lang="ecmascript">import "io" for File

import "random" for Random

import "/dynamic" for Struct

Line 1,455:

var fileName = "es%(i)"

File.delete(fileName)

}</~~lang~~>

}</syntaxhighlight>