User talk:Horst.h

Revision as of 08:06, 6 January 2019 by GordonBGood (talk | contribs) (Sieve of Eratosthenes comments and improvements...)

Hi, I note that in one of your examples you sprinkle timings. On RC we tend not to get too involved in mentioning runtimes unless it is exceptionally slow. It is best to restrict timings to maybe comparative timings between two algorithms in the same language, and only where necessary: "This new algorithm cut the run time from X hours to only Y minutes" kind of thing. Actual run times will be hard to duplicate by anyone else, but th scaling in run times might be achieved. The RC emphasis is on idiomatic code rather than the absolute fastest code. Most examples will not take hours to run in most programming languages, and most examples don't need timings.

Thanks for your examples :-)
--Paddy3118 (talk) 23:09, 12 December 2018 (UTC)

Your Extensible Prime Generator Pascal contribution

You have obviously gone to great lengths to produce a fast version of this using Free Pascal, which caused me to look up what it is and found it interesting that it is a reincarnation and modernized version of Turbo Pascal/Delphi (most of the current contributors seem to have been not more than about 10 years old when Turbo Pascal first appeared!). Although I used Turbo Pascal commercially in about 1983 and later learned Delphi, it didn't cause me to want to abandon my current stable of more modern languages to take Pascal up again. But your taste seems to differ.

So as to your Sieve of Eratosthenes implementation:

  1. if I were you I would first contribute a version based on a hash map or deferred execution stream before opening up with the heavy guns; perhaps a version something like the Dart or Kotlin versions would suffice. This would satisfy those just looking for something that will help them solve Euler problems.
  2. There isn't a lot of point in making the generator much faster than where you are now as the time to enumerate found primes will be the main bottleneck - your version has to make two function calls per enumerated prime to the `NextPrime` and `PosOfPrime` functions and along with some processing by each, will take about 100 CPU cycles per call or something about a second for the primes to a billion and about a hundred times that for the primes to a hundred billion even if it takes zero time to do the actual sieving work. You could reduce the number of function calls per prime found by just maintaining a local counter variable of the primes found, which might speed enumeration up some.
  3. Thus, you can't ever compare your generator against something like Kim Walisch's primesieve which in its fastest mode doesn't enumerate but uses a popcount just to sum the bits in the bit-packed culled arrays.
  4. As to your general culling speed, you seem to know about using a bit-packed array (or maybe Pascal has a built-in version of a BitArray, I don't remember) and you use the optimization of filling from an array pre-culled by the small prime values to save about 70% of the work, but you still have a long way to go in improved culling unrolling techniques (a factor of more than two), maximum 2/3/5/7 wheel factorization (a further factor of three), multi-threading, and many other techniques as the ranges get larger including double culling, first by CPU L1 sizes and then a succession of those to make the CPU L2 cache sizes, and then a "bucket sieve" where that plays out, etc.
  5. I had a quick look in your code to see if you were setting the buffer size correctly, and notices a reference to "64K"; most CPU's have a maximum L1 DATA cache of 32 Kilobytes, and you shouldn't count on that if you are considering multi-threading on a hyper-threaded machine as each thread can then only use half - one can really only count on 16KB, which is why other techniques are required.
  6. At the top of the faster version you say "there is something wrong". What does that mean? Does the code have errors or inconsistencies? If so, you should state what the problem is.
  7. Ignoring the limitations of the "generator", it is possible to match or even beat "primesieve" using the techniques I mention above. I recently saw a post about a "ktprime" algorithm in C on Github that appears to do that, but it is about 2500 obscure lines of code and you may not want to look at what it does. I am currently working on a version that does the same and probably won't be much more than about 500 lines of easy-to-read Nim code or even less lines in harder to read Haskell code (if you aren't a functional programmer). If you are interested, let me know and I'll post a link here when one or the other is complete.
  8. I originally started some of this work in Google's Go language, but stopped working with that, as other than being easy to learn with a really nice to use co-routine (go routines) system, I find it to feel quite "old" to use and it really doesn't produce all that fast code since they decided to write their own compiler. I've learned that the fasted languages are those that either are just front ends to C as is Nim, or that use LLVM as can Haskell and as Rust and Julia do, as otherwise they are just re-inventing what has already been done so well and for so many years by the others. I don't know if Free Pascal attempts its own compiler or if it compares in raw speed with the others, and as I say I don't care enough to install it and try to find out. But you likely do ;)


I chose Haskell because I like it and wanted to show that it can produce code as fast as C/C++ and chose Nim because it is of a modern concise syntax yet is really just a sanitized pre-processor to C and thus is fairly easy to read for most more imperative programmers; also Nim needs a little more love as compared to the more well known new languages such as Rust

BTW, a good way to make speeds more comparable in a CPU agnostic way is to post them as CPU clock cycles per cull operation and CPU clock cycles per prime found - that removes much of the variation between comparisons that people using your examples would have although for some algorithms on some CPU's there is some variation be up to a factor of two due to the abilities of the CPU such as simultaneous micro instruction execution (not including hyper-threading or multi-threading).--GordonBGood (talk) 08:06, 6 January 2019 (UTC)

Return to the user page of "Horst.h".