Compare sorting algorithms' performance: Difference between revisions

From Rosetta Code
Content added Content deleted
No edit summary
m (→‎{{header|Python}}: (tt/pre/ ) to lang tags)
Line 27: Line 27:
{{works with|Python|2.5}}
{{works with|Python|2.5}}
===Examples of sorting routines===
===Examples of sorting routines===
def builtinsort(x):
<lang python> def builtinsort(x):
x.sort()
x.sort()


Line 45: Line 45:
if size < 2: return seq
if size < 2: return seq
low, middle, up = partition(seq, seq[random.randrange(size)])
low, middle, up = partition(seq, seq[random.randrange(size)])
return qsortranpart(low) + middle + qsortranpart(up)
return qsortranpart(low) + middle + qsortranpart(up)</lang>


===Sequence generators===
===Sequence generators===


def ones(n):
<lang python> def ones(n):
return [1]*n
return [1]*n


Line 60: Line 60:
x = range(n)
x = range(n)
random.shuffle(x)
random.shuffle(x)
return x
return x</lang>
===Write timings===
===Write timings===
<lang python> def write_timings(npoints=10, maxN=10**4, sort_functions=(builtinsort,insertion_sort, qsort),
<tt>
def write_timings(npoints=10, maxN=10**4, sort_functions=(builtinsort,insertion_sort, qsort),
sequence_creators = (ones, range, shuffledrange)):
sequence_creators = (ones, range, shuffledrange)):
Ns = range(2, maxN, maxN//npoints)
Ns = range(2, maxN, maxN//npoints)
Line 69: Line 68:
for make_seq in sequence_creators:
for make_seq in sequence_creators:
Ts = map(lambda n: usec(sort, (make_seq(n),)), Ns)
Ts = map(lambda n: usec(sort, (make_seq(n),)), Ns)
writedat('%s-%s-%d-%d.xy' % (sort.__name__, make_seq.__name__, len(Ns), max(Ns)), Ns, Ts)
writedat('%s-%s-%d-%d.xy' % (sort.__name__, make_seq.__name__, len(Ns), max(Ns)), Ns, Ts)</lang>
</tt>
Where ''writedat()'' is defined in the [[Write float arrays to a text file]], ''usec()'' - [[Query Performance]], ''insertion_sort()'' - [[Insertion sort]], ''qsort'' - [[Quicksort]] subtasks, correspondingly.
Where ''writedat()'' is defined in the [[Write float arrays to a text file]], ''usec()'' - [[Query Performance]], ''insertion_sort()'' - [[Insertion sort]], ''qsort'' - [[Quicksort]] subtasks, correspondingly.


Line 76: Line 74:
{{libheader|matplotlib}}
{{libheader|matplotlib}}
{{libheader|numpy}}
{{libheader|numpy}}
<lang python> import operator
<tt>
import operator
import numpy, pylab
import numpy, pylab
def plotdd(dictplotdict):
def plotdd(dictplotdict):
Line 99: Line 96:
pylab.savefig(figname+'.png')
pylab.savefig(figname+'.png')
pylab.savefig(figname+'.pdf')
pylab.savefig(figname+'.pdf')
print figname
print figname</lang>
</tt>
See [[Plot x, y arrays]] and [[Polynomial Fitting]] subtasks for a basic usage of ''pylab.plot()'' and ''numpy.polyfit()''.
See [[Plot x, y arrays]] and [[Polynomial Fitting]] subtasks for a basic usage of ''pylab.plot()'' and ''numpy.polyfit()''.


<lang python> import collections, itertools, glob, re
<tt>
import collections, itertools, glob, re
import numpy
import numpy
def plot_timings():
def plot_timings():
Line 133: Line 128:
# actual plotting
# actual plotting
plotdd(df)
plotdd(df)
plotdd(ds) # see ``plotdd()`` above
plotdd(ds) # see ``plotdd()`` above</lang>
</tt>


===Figures: log2( time in microseconds ) vs. log2( sequence length )===
===Figures: log2( time in microseconds ) vs. log2( sequence length )===
sort_functions = [
<lang python> sort_functions = [
builtinsort, # see implementation above
builtinsort, # see implementation above
insertion_sort, # see [[Insertion sort]]
insertion_sort, # see [[Insertion sort]]
Line 154: Line 148:
sort_functions=sort_functions,
sort_functions=sort_functions,
sequence_creators = (ones, range, shuffledrange))
sequence_creators = (ones, range, shuffledrange))
plot_timings()
plot_timings()</lang>
Executing above script we get belowed figures.
Executing above script we get belowed figures.
====ones====
====ones====

Revision as of 23:28, 1 March 2009

Task
Compare sorting algorithms' performance
You are encouraged to solve this task according to the task description, using any language you may know.

Measure a relative performance of sorting algorithms implementations.

Plot execution time vs. input sequence length dependencies for various implementation of sorting algorithm and different input sequence types (example figures).

Consider three type of input sequences:

  • ones: sequence of all 1's. Example: {1, 1, 1, 1, 1}
  • range: ascending sequence, i.e. already sorted. Example: {1, 2, 3, 10, 15}
  • shuffled range: sequence with elements randomly distributed. Example: {5, 3, 9, 6, 8}

Consider at least two different sorting functions (different algorithms or/and different implementation of the same algorithm). For example, consider Bubble Sort, Insertion sort, Quicksort or/and implementations of Quicksort with different pivot selection mechanisms. Where possible, use existing implementations.

Preliminary subtask:

General steps:

  1. Define sorting routines to be considered.
  2. Define appropriate sequence generators and write timings.
  3. Plot timings.
  4. What conclusions about relative performance of the sorting routines could be made based on the plots?

Python

Works with: Python version 2.5

Examples of sorting routines

<lang python> def builtinsort(x):

    x.sort()
def partition(seq, pivot):
   low, middle, up = [], [], []
   for x in seq:
       if x < pivot:
           low.append(x)
       elif x == pivot:
           middle.append(x)
       else:
           up.append(x)
   return low, middle, up
import random
def qsortranpart(seq):
   size = len(seq)
   if size < 2: return seq
   low, middle, up = partition(seq, seq[random.randrange(size)])
   return qsortranpart(low) + middle + qsortranpart(up)</lang>

Sequence generators

<lang python> def ones(n):

    return [1]*n
def reversedrange(n):
    x = range(n)
    x.reverse()
    return x
def shuffledrange(n):
    x = range(n)
    random.shuffle(x)
    return x</lang>    

Write timings

<lang python> def write_timings(npoints=10, maxN=10**4, sort_functions=(builtinsort,insertion_sort, qsort),

                  sequence_creators = (ones, range, shuffledrange)):    
   Ns = range(2, maxN, maxN//npoints)
   for sort in sort_functions:
       for make_seq in sequence_creators:
           Ts = map(lambda n: usec(sort, (make_seq(n),)), Ns)
           writedat('%s-%s-%d-%d.xy' % (sort.__name__,  make_seq.__name__, len(Ns), max(Ns)), Ns, Ts)</lang>

Where writedat() is defined in the Write float arrays to a text file, usec() - Query Performance, insertion_sort() - Insertion sort, qsort - Quicksort subtasks, correspondingly.

Plot timings

Library: matplotlib
Library: numpy

<lang python> import operator

import numpy, pylab
def plotdd(dictplotdict):
   """See ``plot_timings()`` below."""
   symbols = ('o', '^', 'v', '<', '>', 's', '+', 'x', 'D', 'd',
              '1', '2', '3', '4', 'h', 'H', 'p', '|', '_')
   colors = map(None, 'bgrcmyk') # split string on distinct characters
   for npoints, plotdict in dictplotdict.iteritems():
       for ttle, lst in plotdict.iteritems():            
           pylab.hold(False)                                
           for i, (label, polynom, x, y) in enumerate(sorted(lst,key=operator.itemgetter(0))):
               pylab.plot(x, y, colors[i % len(colors)] + symbols[i % len(symbols)], label='%s %s' % (polynom, label))
               pylab.hold(True)
               y = numpy.polyval(polynom, x)
               pylab.plot(x, y, colors[i % len(colors)], label= '_nolegend_')                
           pylab.legend(loc='upper left')
           pylab.xlabel(polynom.variable)
           pylab.ylabel('log2( time in microseconds )')                
           pylab.title(ttle, verticalalignment='bottom')
           figname = '_%(npoints)03d%(ttle)s' % vars()
           pylab.savefig(figname+'.png')
           pylab.savefig(figname+'.pdf')
           print figname</lang>

See Plot x, y arrays and Polynomial Fitting subtasks for a basic usage of pylab.plot() and numpy.polyfit().

<lang python> import collections, itertools, glob, re

import numpy
def plot_timings():
   makedict = lambda: collections.defaultdict(lambda: collections.defaultdict(list))
   df = makedict()
   ds = makedict()
   # populate plot dictionaries
   for filename in glob.glob('*.xy'):
       m = re.match(r'([^-]+)-([^-]+)-(\d+)-(\d+)\.xy', filename)
       print filename
       assert m, filename
       funcname, seqname, npoints, maxN = m.groups()
       npoints, maxN = int(npoints), int(maxN)        
       a = numpy.fromiter(itertools.imap(float, open(filename).read().split()), dtype='f')
       Ns = a[::2]  # sequences lengths
       Ts = a[1::2] # corresponding times 
       assert len(Ns) == len(Ts) == npoints
       assert max(Ns) <= maxN
       #
       logsafe = numpy.logical_and(Ns>0, Ts>0)
       Ts = numpy.log2(Ts[logsafe])
       Ns = numpy.log2(Ns[logsafe])
       coeffs = numpy.polyfit(Ns, Ts, deg=1)
       poly = numpy.poly1d(coeffs, variable='log2(N)')
       #
       df[npoints][funcname].append((seqname, poly, Ns, Ts))
       ds[npoints][seqname].append((funcname, poly, Ns, Ts))
   # actual plotting
   plotdd(df)
   plotdd(ds) # see ``plotdd()`` above</lang>

Figures: log2( time in microseconds ) vs. log2( sequence length )

<lang python> sort_functions = [

    builtinsort,         # see implementation above
    insertion_sort,      # see Insertion sort
    insertion_sort_lowb, # insertion_sort, where sequential search is replaced
                         #     by lower_bound() function
    qsort,               # see Quicksort
    qsortranlc,          # qsort with randomly choosen pivot
                         #     and the filtering via list comprehension
    qsortranpart,        # qsortranlc with filtering via partition function
    qsortranpartis,      # qsortranpart, where for a small input sequence lengths
    ]                    #     insertion_sort is called
if __name__=="__main__":
   import sys
   sys.setrecursionlimit(10000)
   write_timings(npoints=100, maxN=1024, # 1 <= N <= 2**10 an input sequence length
                 sort_functions=sort_functions,
                 sequence_creators = (ones, range, shuffledrange))
   plot_timings()</lang>

Executing above script we get belowed figures.

ones

ones.png (143KiB)

builtinsort     - O(N)
insertion_sort  - O(N)
qsort           - O(N**2)
qsortranpart    - O(N)

range

range.png (145KiB)

builtinsort     - O(N)
insertion_sort  - O(N)
qsort           - O(N**2)
qsortranpart    - O(N*log(N))

shuffled range

shuffledrange.png (152KiB)

builtinsort     - O(N)  
insertion_sort  - O(N**4) ???
qsort           - O(N*log(N))
qsortranpart    - O(N) ???