Stream merge: Difference between revisions

From Rosetta Code
Content added Content deleted
(add REXX)
Line 26: Line 26:


sort --merge source1 source2 sourceN > sink
sort --merge source1 source2 sourceN > sink

==REXX==
==
<lang rexx>/**********************************************************************
* Merge 1.txt ... n.txt into m.txt
* 1.txt 2.txt 3.txt 4.txt
* 1 19 1999 2e3
* 8 33 2999 3000
* 17 100 3999
**********************************************************************/
n=4
high='ffff'x
Do i=1 To n
f.i=i'.txt'
Call get i
End
Do Forever
min=high
Do i=1 To n
If x.i<<min Then Do /* avoid numerical comparison */
imin=i
min=x.i
End
End
If min<<high Then Do
Call o x.imin
Call get imin
End
Else Do
Call lineout oid
Leave
End
End
Exit
get: Procedure Expose f. x. high
Parse Arg i
If lines(f.i)=0 Then
x.i=high
Else Do
x.i=linein(f.i)
End
Return
o: Say arg(1)
Return lineout(oid,arg(1))</lang>
{{out}}
<pre>1
19
1999
2999
2e3
3000
33
100
3999
8
17</pre>

Revision as of 10:24, 15 June 2016

Task
Stream merge
You are encouraged to solve this task according to the task description, using any language you may know.
2-stream merge
Read two sorted streams of items from external source (e.g. disk, or network), and write one stream of sorted items to external sink.
Common algorithm: keep 1 buffered item from each source, select minimal of them, write it, fetch another item from that stream from which the written item was.
N-stream merge
The same as above, but reading from N sources.
Common algorithm: same as above, but keep buffered items and their source descriptors in a heap.

Assuming streams are very big. You must not suck them whole in the memory, but read them as streams.

Python

Built-in function open opens a file for reading and returns a line-by-line iterator (stream) over the file.

There exists a standard library function heapq.merge that takes any number of sorted stream iterators and merges them into one sorted iterator, using a heap.

<lang python>import heapq import sys

sources = sys.argv[1:] for item in heapq.merge(open(source) for source in sources):

   print(item)</lang>

Shell

sort --merge source1 source2 sourceN > sink

REXX

== <lang rexx>/**********************************************************************

  • Merge 1.txt ... n.txt into m.txt
  • 1.txt 2.txt 3.txt 4.txt
  • 1 19 1999 2e3
  • 8 33 2999 3000
  • 17 100 3999
                                                                                                                                            • /

n=4 high='ffff'x Do i=1 To n

 f.i=i'.txt'
 Call get i
 End

Do Forever

 min=high
 Do i=1 To n
   If x.i<<min Then Do    /* avoid numerical comparison */
     imin=i
     min=x.i
     End
   End
 If min<<high Then Do
   Call o x.imin
   Call get imin
   End
 Else Do
   Call lineout oid
   Leave
   End
 End

Exit get: Procedure Expose f. x. high

 Parse Arg i
 If lines(f.i)=0 Then
   x.i=high
 Else Do
   x.i=linein(f.i)
   End
 Return

o: Say arg(1)

  Return lineout(oid,arg(1))</lang>
Output:
1
19
1999
2999
2e3
3000
33
100
3999
8
17