Stream merge
You are encouraged to solve this task according to the task description, using any language you may know.
- 2-stream merge
- Read two sorted streams of items from external source (e.g. disk, or network), and write one stream of sorted items to external sink.
- Common algorithm: keep one buffered item from each source, select the minimum of the two, write it, then fetch the next item from the stream the written item came from.
- N-stream merge
- The same as above, but reading from N sources.
- Common algorithm: same as above, but keep the buffered items, together with their source descriptors, in a heap.
Assume the streams are very large. You must not slurp them whole into memory; read them as streams.
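The two algorithms described above can be sketched in Python, using generators as the "streams". This is only an illustrative sketch: the names `merge2` and `merge_n` are hypothetical, and it assumes `None` never occurs as a stream item (it is used as the exhaustion sentinel).

```python
import heapq

def merge2(a, b):
    """2-stream merge: keep one buffered item per source, emit the
    smaller, then refill the buffer from the stream it came from."""
    head_a, head_b = next(a, None), next(b, None)
    while head_a is not None and head_b is not None:
        if head_a <= head_b:
            yield head_a
            head_a = next(a, None)
        else:
            yield head_b
            head_b = next(b, None)
    # one stream is exhausted; drain the other
    while head_a is not None:
        yield head_a
        head_a = next(a, None)
    while head_b is not None:
        yield head_b
        head_b = next(b, None)

def merge_n(*streams):
    """N-stream merge: keep (head item, source index) pairs in a heap,
    so the minimal buffered item is always at the top."""
    heap = []
    for i, s in enumerate(streams):
        head = next(s, None)
        if head is not None:
            heapq.heappush(heap, (head, i))
    while heap:
        head, i = heapq.heappop(heap)
        yield head
        refill = next(streams[i], None)  # fetch from the same source
        if refill is not None:
            heapq.heappush(heap, (refill, i))
```

For example, `list(merge2(iter([1, 3, 5]), iter([2, 4, 6])))` yields `[1, 2, 3, 4, 5, 6]`, and `merge_n` does the same for any number of sorted sources.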
Haskell
There is no built-in iterator or stream type for file operations in Haskell, but several libraries provide one.
conduit
<lang haskell>-- stack runhaskell --package=conduit-extra --package=conduit-merge
import Control.Monad.Trans.Resource (runResourceT)
import qualified Data.ByteString.Char8 as BS
import Data.Conduit (($$), (=$=))
import Data.Conduit.Binary (sinkHandle, sourceFile)
import qualified Data.Conduit.Binary as Conduit
import qualified Data.Conduit.List as Conduit
import Data.Conduit.Merge (mergeSources)
import System.Environment (getArgs)
import System.IO (stdout)

main :: IO ()
main = do
    inputFileNames <- getArgs
    let inputs = [sourceFile file =$= Conduit.lines | file <- inputFileNames]
    runResourceT $ mergeSources inputs $$ sinkStdoutLn
  where
    sinkStdoutLn = Conduit.map (`BS.snoc` '\n') =$= sinkHandle stdout</lang>
pipes
<lang haskell>-- stack runhaskell --package=pipes-safe --package=pipes-interleave
import Pipes (runEffect, (>->))
import Pipes.Interleave (interleave)
import Pipes.Prelude (stdoutLn)
import Pipes.Safe (runSafeT)
import Pipes.Safe.Prelude (readFile)
import Prelude hiding (readFile)
import System.Environment (getArgs)

main :: IO ()
main = do
    sourceFileNames <- getArgs
    let sources = map readFile sourceFileNames
    runSafeT . runEffect $ interleave compare sources >-> stdoutLn</lang>
Perl 6
<lang perl6>sub merge_streams ( @streams ) {
    my @s = @streams.map({ hash( STREAM => $_, HEAD => .get ) })\
                    .grep({ .<HEAD>.defined });

    return gather while @s {
        my $h = @s.min: *.<HEAD>;
        take $h<HEAD>;
        $h<HEAD> = $h<STREAM>.get
            orelse @s .= grep( { $_ !=== $h } );
    }
}

say merge_streams([ @*ARGS».&open ]);</lang>
Python
The built-in function open opens a file for reading and returns a line-by-line iterator (stream) over the file.
The standard library function heapq.merge takes any number of sorted stream iterators and merges them into one sorted iterator, using a heap.
<lang python>import heapq
import sys

sources = sys.argv[1:]
# heapq.merge takes the iterables as separate arguments, so unpack them
for item in heapq.merge(*(open(source) for source in sources)):
    print(item, end='')  # lines already carry their newline</lang>
REXX
<lang rexx>/**********************************************************************
* Merge 1.txt ... n.txt into m.txt
* 1.txt  2.txt  3.txt  4.txt
*     1     19   1999    2e3
*    17     33   2999   3000
*     8    500   3999
**********************************************************************/
n=4
high='ffff'x
p.=''
Do i=1 To n
  f.i=i'.txt'
  Call get i
  End
Do Forever
  min=high
  Do i=1 To n
    If x.i<<min Then Do        /* avoid numerical comparison */
      imin=i
      min=x.i
      End
    End
  If min<<high Then Do
    Call o x.imin
    Call get imin
    End
  Else Do
    Call lineout oid
    Leave
    End
  End
Exit
get: Procedure Expose f. x. high p.
  Parse Arg ii
  If lines(f.ii)=0 Then
    x.ii=high
  Else Do
    x.ii=linein(f.ii)
    If x.ii<<p.ii Then Do
      Say 'Input file' f.ii 'is not sorted ascendingly'
      Say p.ii 'precedes' x.ii
      Exit
      End
    p.ii=x.ii
    End
  Return
o: Say arg(1)
   Return lineout(oid,arg(1))</lang>
- Output:
1
17
19
1999
2999
2e3
3000
33
3999
500
8
Shell
<lang sh>sort --merge source1 source2 sourceN > sink</lang>
zkl
This solution uses iterators, doesn't care where the streams originate, and only keeps the head of each stream on hand.
<lang zkl>fcn mergeStreams(s1,s2,etc){  //-->Walker
   streams:=vm.arglist.pump(List(),fcn(s){ // prime and prune
      if( (w:=s.walker())._next() ) return(w);
      Void.Skip  // stream is dry
   });
   Walker().tweak(fcn(streams){
      if(not streams) return(Void.Stop);  // all streams are dry
      values:=streams.apply("value");     // heads of the streams
      v:=values.reduce('wrap(min,x){ if(min<=x) min else x });
      n:=values.find(v); w:=streams[n];
      w._next();                  // read next value from min stream
      if(w.atEnd) streams.del(n); // prune empty streams
      v
   }.fp(streams));
}</lang>
Using infinite streams:
<lang zkl>w:=mergeStreams([0..],[2..*,2],[3..*,3],T(5));
w.walk(20).println();</lang>
- Output:
L(0,1,2,2,3,3,4,4,5,5,6,6,6,7,8,8,9,9,10,10)
Using files:
<lang zkl>w:=mergeStreams(File("unixdict.txt"),File("2hkprimes.txt"),File("/dev/null"));
do(10){ w.read().print() }</lang>
- Output:
10th 1st 2 2nd 3 3rd 4th 5 5th 6th
Using the above example to squirt the merged stream to a file:
<lang zkl>mergeStreams(File("unixdict.txt"),File("2hkprimes.txt"),File("/dev/null"))
   .pump(File("foo.txt","w"));</lang>
- Output:
$ ls -l unixdict.txt 2hkprimes.txt foo.txt
-rw-r--r-- 1 craigd craigd 1510484 Oct 29  2013 2hkprimes.txt
-rw-r--r-- 1 craigd craigd 1716887 Jun 16 23:34 foo.txt
-rw-r--r-- 1 craigd craigd  206403 Jun 11  2014 unixdict.txt