Stream merge: Difference between revisions

From Rosetta Code
(add REXX)
m (→‎REXX: check order of Input (and correct previous wrong Input))
Revision as of 10:44, 15 June 2016

Task
Stream merge
You are encouraged to solve this task according to the task description, using any language you may know.
2-stream merge
Read two sorted streams of items from external source (e.g. disk, or network), and write one stream of sorted items to external sink.
Common algorithm: keep 1 buffered item from each source, select minimal of them, write it, fetch another item from that stream from which the written item was.
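The two-stream algorithm above can be sketched in Python over any pair of sorted iterators (`merge2` and the sentinel trick are illustrative choices, not part of the task):

<lang python>def merge2(a, b):
    """Merge two sorted iterators into one sorted generator."""
    a, b = iter(a), iter(b)
    _SENTINEL = object()            # marks an exhausted stream
    x = next(a, _SENTINEL)          # one buffered item per source
    y = next(b, _SENTINEL)
    while x is not _SENTINEL and y is not _SENTINEL:
        if y < x:                   # select the minimal buffered item,
            yield y                 # write it, then fetch the next item
            y = next(b, _SENTINEL)  # from the stream it came from
        else:
            yield x
            x = next(a, _SENTINEL)
    while x is not _SENTINEL:       # drain whichever stream remains
        yield x
        x = next(a, _SENTINEL)
    while y is not _SENTINEL:
        yield y
        y = next(b, _SENTINEL)

print(list(merge2([1, 3, 5], [2, 3, 6])))  # [1, 2, 3, 3, 5, 6]</lang>

Only one item per stream is ever held in memory, so the inputs can be arbitrarily large.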
N-stream merge
The same as above, but reading from N sources.
Common algorithm: same as above, but keep buffered items and their source descriptors in a heap.
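The heap-based N-stream algorithm can be sketched as follows (a simplified version of what `heapq.merge` does internally; the name `mergeN` and the `(item, index, iterator)` heap entries are illustrative, with the index serving as a tie-breaker so iterators are never compared):

<lang python>import heapq

def mergeN(*iterables):
    """Merge any number of sorted iterators using a heap of buffered items."""
    _DONE = object()                     # marks an exhausted stream
    heap = []
    for idx, it in enumerate(iterables): # buffer the first item of each source
        it = iter(it)
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx, it))
    heapq.heapify(heap)
    while heap:
        item, idx, it = heap[0]          # minimal buffered item
        yield item
        nxt = next(it, _DONE)            # fetch from the same source
        if nxt is _DONE:
            heapq.heappop(heap)          # source exhausted, drop its slot
        else:
            heapq.heapreplace(heap, (nxt, idx, it))

print(list(mergeN([1, 4], [2, 3], [0, 5])))  # [0, 1, 2, 3, 4, 5]</lang>

Each select-and-refill step costs O(log N), versus O(N) for a linear scan of the buffered items.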

Assume the streams are very big: you must not slurp them whole into memory, but must read them as streams.

==Python==

Built-in function open opens a file for reading and returns a line-by-line iterator (stream) over the file.

There exists a standard library function heapq.merge that takes any number of sorted stream iterators and merges them into one sorted iterator, using a heap.

<lang python>import heapq
import sys

sources = sys.argv[1:]
for item in heapq.merge(*(open(source) for source in sources)):
    print(item, end='')</lang>

==Shell==

<lang bash>sort --merge source1 source2 sourceN > sink</lang>

==REXX==

<lang rexx>/**********************************************************************
* Merge 1.txt ... n.txt into m.txt
* 1.txt  2.txt  3.txt  4.txt
* 1      19     1999   2e3
* 17     33     2999   3000
* 8      500    3999
**********************************************************************/
n=4
high='ffff'x
p.=''
Do i=1 To n
  f.i=i'.txt'
  Call get i
  End
Do Forever
  min=high
  Do i=1 To n
    If x.i<<min Then Do    /* avoid numerical comparison */
      imin=i
      min=x.i
      End
    End
  If min<<high Then Do
    Call o x.imin
    Call get imin
    End
  Else Do
    Call lineout oid
    Leave
    End
  End
Exit
get: Procedure Expose f. x. high p.
  Parse Arg ii
  If lines(f.ii)=0 Then
    x.ii=high
  Else Do
    x.ii=linein(f.ii)
    If x.ii<<p.ii Then Do
      Say 'Input file' f.ii 'is not sorted ascendingly'
      Say p.ii 'precedes' x.ii
      Exit
      End
    p.ii=x.ii
    End
  Return
o: Say arg(1)
   Return lineout(oid,arg(1))</lang>
{{out}}
<pre>1
17
19
1999
2999
2e3
3000
33
3999
500
8</pre>