Stream merge: Difference between revisions
Walterpachl (talk | contribs) (add REXX)
Revision as of 10:24, 15 June 2016
Stream merge
You are encouraged to solve this task according to the task description, using any language you may know.
- 2-stream merge
- Read two sorted streams of items from an external source (e.g. disk, or network), and write one stream of sorted items to an external sink.
- Common algorithm: keep one buffered item from each source, select the smaller of the two, write it, then fetch the next item from the stream the written item came from.
- N-stream merge
- The same as above, but reading from N sources.
- Common algorithm: same as above, but keep the buffered items, tagged with their source descriptors, in a heap.
Assume the streams are very big: you must not slurp them whole into memory, but read them as streams.
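The 2-stream algorithm above can be sketched in Python; `merge2` and the sample inputs are illustrative names, not part of the task:

```python
def merge2(src_a, src_b, write):
    # 2-stream merge: keep one buffered item per source; write the
    # smaller one, then refill from the stream it came from.
    sentinel = object()
    a = next(src_a, sentinel)
    b = next(src_b, sentinel)
    while a is not sentinel and b is not sentinel:
        if a <= b:
            write(a)
            a = next(src_a, sentinel)
        else:
            write(b)
            b = next(src_b, sentinel)
    while a is not sentinel:        # drain whichever stream remains
        write(a)
        a = next(src_a, sentinel)
    while b is not sentinel:
        write(b)
        b = next(src_b, sentinel)

out = []
merge2(iter([1, 3, 5]), iter([2, 4, 6]), out.append)
print(out)  # [1, 2, 3, 4, 5, 6]
```

Only one item per stream is ever buffered, so memory use is constant regardless of stream length.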
Python
The built-in function open opens a file for reading and returns a line-by-line iterator (stream) over the file. The standard library function heapq.merge takes any number of sorted stream iterators and merges them into one sorted iterator, using a heap.
<lang python>import heapq
import sys

sources = sys.argv[1:]
for item in heapq.merge(*(open(source) for source in sources)):
    print(item, end='')</lang>
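For illustration, the heap-based N-stream algorithm can also be written out by hand, roughly what heapq.merge does internally: each heap entry pairs a buffered item with its source index, so the exhausted buffer can be refilled from the right stream. `merge_n` and the sample lists are our own names, not part of the task:

```python
import heapq

def merge_n(*sources):
    # Keep one buffered item per source in a heap, tagged with the
    # source index so we know which stream to refill from.
    iters = [iter(s) for s in sources]
    heap = []
    for i, it in enumerate(iters):
        for item in it:              # buffer the first item, if any
            heapq.heappush(heap, (item, i))
            break
    while heap:
        item, i = heapq.heappop(heap)
        yield item
        for nxt in iters[i]:         # refill from the stream just consumed
            heapq.heappush(heap, (nxt, i))
            break

merged = list(merge_n([1, 4, 7], [2, 5, 8], [3, 6, 9]))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With N streams the heap holds exactly one entry per non-exhausted source, so each output item costs O(log N).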
Shell
sort --merge source1 source2 sourceN > sink
REXX
<lang rexx>/**********************************************************************
* Merge 1.txt ... n.txt into m.txt
* 1.txt 2.txt 3.txt 4.txt
*   1    19   1999   2e3
*   8    33   2999  3000
*  17   100   3999
**********************************************************************/
oid='m.txt'                      /* output file (per comment above)  */
n=4
high='ffff'x
Do i=1 To n
  f.i=i'.txt'
  Call get i
  End
Do Forever
  min=high
  Do i=1 To n
    If x.i<<min Then Do          /* avoid numerical comparison       */
      imin=i
      min=x.i
      End
    End
  If min<<high Then Do
    Call o x.imin
    Call get imin
    End
  Else Do
    Call lineout oid             /* close the output file            */
    Leave
    End
  End
Exit
get: Procedure Expose f. x. high
  Parse Arg i
  If lines(f.i)=0 Then
    x.i=high
  Else Do
    x.i=linein(f.i)
    End
  Return
o: Say arg(1)
   Return lineout(oid,arg(1))</lang>
{{out}}
<pre>1
19
1999
2999
2e3
3000
33
100
3999
8
17</pre>