I know that counting bits using precomputation is the fastest way to go (over __builtin_popcount()
).
For 32 bit integers we can do cnt[x>>16]+cnt[x&65535]
by precomputing counts upto (1<<16)
.
How can we do it for 64 bit integers by precomputing upto (1<<22)
.
I don't understand, why can't we just do
cnt[(x>>48)&65535]+cnt[(x>>32)&65535]+cnt[(x>>16)&65535]+cnt[x&65535]
? or is this too slow? If we precompute upto (1<<22) then it would becnt[(x>>44)&4194303]+cnt[(x>>22)&4194303]+cnt[x&4194303]
. Though, I don't know if this is the fastest way.Thank you, just wanted to confirm this.