=={{header|Perl 6}}==
Takes the list of numbers and converts them to a <tt>HyperSeq</tt> that is stored in a variable and evaluated concurrently. <tt>HyperSeq</tt>s overload <tt>map</tt> and <tt>grep</tt> to convert and pick values in worker threads. The runtime will pick the number of OS-level threads and assign worker threads to them while avoiding stalling in any part of the program. A <tt>HyperSeq</tt> is lazy, so the computation of values will happen in chunks as they are requested.
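
As a minimal sketch of this evaluation model (the range and variable here are invented for illustration, not part of the task code):
<lang perl6># An order-preserving parallel pipeline: squares of the even numbers.
# .hyper returns a HyperSeq; its grep and map run in worker threads,
# handing out the work in batches while preserving the input order.
my @even-squares = (1..1000).hyper.grep(* %% 2).map(* ** 2);
say @even-squares[^5];    # (4 16 36 64 100)</lang>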

The <tt>hyper</tt> (and <tt>race</tt>) methods can take two named parameters that tweak how the parallelization occurs: <tt>:degree</tt> and <tt>:batch</tt>. <tt>:degree</tt> is the number of worker threads to allocate to the job. By default it is set to the number of physical cores available; if you have a hyper-threading processor and the tasks are not CPU-bound, it may be useful to raise that number, but it is a reasonable default. <tt>:batch</tt> is how many sub-tasks are parceled out at a time to each worker thread; the default is 64. For small numbers of CPU-intensive tasks a lower number will likely be better, but too low a value may make the dispatch overhead cancel out the benefit of threading; conversely, too high a value will over-burden some threads and starve others. Over long-running processes with many hundreds or thousands of sub-tasks, the scheduler will automatically adjust the batch size up or down to try to keep the pipeline filled. For a small number of CPU-intensive tasks (such as this one) it is therefore useful to give it a smaller starting batch size.
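
The <tt>:3batch</tt> adverb used below is Perl 6 colon-pair shorthand for <tt>:batch(3)</tt>; for example (reusing this task's <tt>@nums</tt> and <tt>prime-factors</tt>):
<lang perl6># These two calls are identical: :3batch is shorthand for :batch(3)
# (likewise :4degree for :degree(4)).
my @a = @nums.hyper(:3batch).map: *.&prime-factors;
my @b = @nums.hyper(:batch(3)).map: *.&prime-factors;</lang>
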
On my system, under the load I was running, I found a batch size of 3 to be optimal for this task; it may differ for other systems and other loads.
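
A quick way to probe this on your own machine is to time the same workload at several starting batch sizes; a hypothetical harness (again reusing this task's <tt>@nums</tt> and <tt>prime-factors</tt>) might look like:
<lang perl6># Time the factoring run at a few starting batch sizes.
for 1, 2, 3, 4, 8, 16, 32, 64 -> $batch {
    my $start = now;
    my @factors = @nums.hyper(:batch($batch)).map: *.&prime-factors;
    say "batch $batch: { now - $start } seconds";
}</lang>
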
Also, as a relative comparison, perform the same task on the same set of 100 numbers as found in the [[Parallel_calculations#SequenceL|SequenceL]] example, using varying numbers of threads. Pay no attention to the absolute timings; they will vary greatly between systems, and this is more a comparison of relative throughput. On a Core i7-4770 @ 3.40GHz with 4 cores and hyper-threading under Linux, there is a distinct pattern: more threads on physical cores give reliable increases in throughput, while adding hyper-threads may (and, in this case, does seem to) give some additional marginal benefit.

Using the <tt>prime-factors</tt> routine as defined in the [[Prime_decomposition#Perl_6|prime decomposition]] task:
<lang perl6>my @nums = 64921987050997300559, 70251412046988563035, 71774104902986066597,
           83448083465633593921, 84209429893632345702, 87001033462961102237,
           # ... (rest of the list elided) ...

my @factories = @nums.hyper(:3batch).map: *.&prime-factors;

# Zip each number with its factor list for display.
printf "%21d factors: %s\n", |$_ for @nums Z @factories;

# Pair each number's smallest factor with the number itself, gather the
# numbers under each minimum factor, then take the numerically largest key.
my $gmf = {}.append(@factories»[0] »=>« @nums).max: +*.key;

say "\nGreatest minimum factor: ", $gmf.key;

say "from: { $gmf.value }\n";

say 'Run time: ', now - INIT now;
say '-' x 80;

# For amusement's sake, and for relative comparison, use the same 100
# numbers as in the SequenceL example, testing with different numbers of threads.

@nums = <625070029 413238785 815577134 738415913 400125878 967798656 830022841
774153795 114250661 259366941 571026384 522503284 757673286 509866901 6303092
516535622 177377611 520078930 996973832 148686385 33604768 384564659 95268916
659700539 149740384 320999438 822361007 701572051 897604940 2091927 206462079
290027015 307100080 904465970 689995756 203175746 802376955 220768968 433644101
892007533 244830058 36338487 870509730 350043612 282189614 262732002 66723331
908238109 635738243 335338769 461336039 225527523 256718333 277834108 430753136
151142121 602303689 847642943 538451532 683561566 724473614 422235315 921779758
766603317 364366380 60185500 333804616 988528614 933855820 168694202 219881490
703969452 308390898 567869022 719881996 577182004 462330772 770409840 203075270
666478446 351859802 660783778 503851023 789751915 224633442 347265052 782142901
43731988 246754498 736887493 875621732 594506110 854991694 829661614 377470268
984990763 275192380 39848200 892766084 76503760>».Int;

for 1..8 -> $degree {
    my $start = now;
    my \factories = @nums.hyper(:degree($degree), :3batch).map: *.&prime-factors;
    my $gmf = {}.append(factories»[0] »=>« @nums).max: +*.key;
    say "\nFactoring {+@nums} numbers, greatest minimum factor: {$gmf.key}";
    say "Using: $degree thread{ $degree > 1 ?? 's' !! '' }";
    my $end = now;
    say 'Run time: ', $end - $start, ' seconds.';
}

# Prime factoring routines from the Prime decomposition task
sub prime-factors ( Int $n where * > 0 ) {
    return $n if $n.is-prime;
    # ... (remainder of the factoring routines elided; see the Prime decomposition task) ...
}</lang>
{{out}}
<pre>
...
from: 64921987050997300559 71774104902986066597 83448083465633593921 87001033462961102237 89538854889623608177 98421229882942378967

Run time: 0.2903003
--------------------------------------------------------------------------------

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 1 thread
Run time: 0.3881785 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 2 threads
Run time: 0.2285219 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 3 threads
Run time: 0.1555278 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 4 threads
Run time: 0.13021902 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 5 threads
Run time: 0.1206574 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 6 threads
Run time: 0.13286821 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 7 threads
Run time: 0.1102702 seconds.

Factoring 100 numbers, greatest minimum factor: 782142901
Using: 8 threads
Run time: 0.1066578 seconds.</pre>

Besides <tt>HyperSeq</tt> and its equivalent <tt>RaceSeq</tt> (which is allowed to deliver results out of order), [[Rakudo]] supports primitive threads, locks and high-level promises. Channels and supplies can be used to move values thread-safely from one thread to another, and a <tt>react</tt> block can serve as a central hub for message passing.
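
A minimal sketch of that message-passing style, using a <tt>Channel</tt> and a <tt>react</tt> block (the names and workload here are illustrative only):
<lang perl6>my $results = Channel.new;

# A worker thread computes values and sends them down the channel.
my $worker = start {
    $results.send($_ ** 2) for 1..5;
    $results.close;
}

# The react block acts as the receiving hub: it runs the whenever
# body as each value arrives and exits when the channel is closed.
react {
    whenever $results -> $value { say "got $value" }
}

await $worker;</lang>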