Hello.
Reading the article from e-maxx (ALERT! RUSSIAN!). A piece of code ("radix sort of equivalence classes" part):
for (int i=0; i<n; ++i) {
pn[i] = p[i] - (1<<h);
if (pn[i] < 0) pn[i] += n;
}
memset (cnt, 0, classes * sizeof(int));
for (int i=0; i<n; ++i)
++cnt[c[pn[i]]];
for (int i=1; i<classes; ++i)
cnt[i] += cnt[i-1];
for (int i=n-1; i>=0; --i)
p[--cnt[c[pn[i]]]] = pn[i];
Why p[] contains correct order of cyclic shifts of length 2k after?