Disclaimer: Make sure to get yourself comfortable before reading this comment. I was intrigued by the earlier discussion on sieves and decided to benchmark some sieves. There are 2 that I benchmarked with surprising differences in running time. One of them can be found here. https://cp-algorithms.com/algebra/sieve-of-eratosthenes.html. Under the "Different optimizations of the Sieve of Eratosthenes". The optimization is sieving to the root. Another one is Nisiyama_Suzune's https://codeforces.me/blog/entry/54090. Under the section "Linear sieve". It is a direct copy-paste from the CodeForces blog. Now, here comes the surprising part. https://cp-algorithms.com/algebra/prime-sieve-linear.html. Under the section "Runtime and Memory" it says "and the optimized versions of the sieve run as fast as the algorithm given here." Note: the optimized versions is the one that I used in my benchmarking. This states that the optimized sieve is also linear. Okay, now it really comes the surprising part. When I benchmarked both functions, the optimized sieve or the CP-Algorithm's Optimized Sieve ran more than 2x faster than the linear sieve in the CF blog. Why is this true? Both sieves are linear yet one is more than 2x faster than the other. This is pretty weird. Here you can find the benchmark. https://ideone.com/MfaJCF. In this code N = 1e8 which is pretty big. Now if I try on smaller N like N = 1e7 or N = 1e5, the differences are even larger. For N = 1e7, the sieve in CP Algo's is more than 3x faster than the CF blog implementation. https://ideone.com/b7jjyO. As you can see, the only difference between the two benchmarks is the value of N. Now if I try N = 1e6, the sieve from CP algos is 4.56x faster than the other implementation. Now, I haven't tried it yet but I imagine when N = 1e5 or smaller, the difference is massive. I know this is pretty long, happy reading.
Using the other implementation that is not improved in the same CP Algorithms article produces surprising results. The CF blog and the $$$O(n \log \log n)$$$ implementation of sieve from CP algos have basically the same execution time when running it. Now this makes me think, "Is the CF implementation of linear sieve linear?"
What are your thoughts about this?