One day I thought of an interesting problem:
“Given $$$n$$$ elements that are initially $$$0$$$, perform $$$k$$$ operations; each operation selects an element uniformly at random and increases it by $$$1$$$. Find the expected value of the maximum among the $$$n$$$ elements.”
At first I thought it was a very simple problem, but actually I couldn't find any good solution (initially I only knew an $$$O(k^2\ln k)$$$ solution).
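For small $$$n,k$$$ the answer can at least be checked by direct simulation (a hypothetical helper for sanity-checking, not one of the solutions discussed here):

```python
import random

def estimate_max_expectation(n, k, trials=100_000):
    """Monte Carlo estimate of E[max] after k uniformly random increments
    spread over n counters that start at 0."""
    total = 0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(k):
            counts[random.randrange(n)] += 1  # one random increment
        total += max(counts)
    return total / trials
```

This is exponentially far from the target complexities, but it gives reference values to test any faster method against.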
After asking some strong competitors, I learned a way to solve this problem in $$$O(\textrm{poly}~k^{\frac{5}{3}})$$$ / $$$O(n\cdot k^{1.5})$$$ time. (Thanks to them.)
It is surprising to me that a seemingly simple problem has such a difficult solution. Could anyone find a better one? Thank you!
Could you briefly describe the solutions you have so far?
Sorry, my English is really poor; it's hard for me to explain it.
Let $$$F_z(x)=\sum_{i=0}^{z}\frac{x^i}{i!}$$$; we need to compute $$$[x^k]F_z(x)^{n}$$$ for $$$z=0,1,2,\dots,k$$$.
First, for a single $$$z$$$ we can compute $$$[x^k]F_z(x)^n$$$ in $$$O(k\log k)$$$.
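To make the reduction concrete: $$$F_z(x)$$$ is the exponential generating function of one element's count restricted to at most $$$z$$$, so $$$\mathbb P[\max\le z]=\frac{k!}{n^k}[x^k]F_z(x)^n$$$ and $$$\mathbb E[\max]=\sum_{z\ge 0}\mathbb P[\max>z]$$$. A minimal exact-rational sketch of this identity (naive polynomial powering, far slower than the $$$O(k\log k)$$$ method; all function names are mine):

```python
from fractions import Fraction
from math import factorial

def poly_mul_trunc(a, b, k):
    # Multiply two coefficient lists, keeping terms up to x^k.
    res = [Fraction(0)] * (k + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > k:
                break
            res[i + j] += ai * bj
    return res

def prob_max_le(n, k, z):
    # P(max <= z) = k! * [x^k] F_z(x)^n / n^k, with F_z(x) = sum_{i<=z} x^i/i!.
    Fz = [Fraction(1, factorial(i)) for i in range(min(z, k) + 1)]
    p = [Fraction(1)]
    for _ in range(n):  # naive O(n k^2) powering
        p = poly_mul_trunc(p, Fz, k)
    coeff = p[k] if len(p) > k else Fraction(0)
    return coeff * factorial(k) / Fraction(n) ** k

def expected_max(n, k):
    # E[max] = sum over z >= 0 of P(max > z); the max never exceeds k.
    return sum(1 - prob_max_le(n, k, z) for z in range(k))
```

For example, with $$$n=2,k=2$$$ the two increments either collide (probability $$$\tfrac12$$$, max $$$=2$$$) or split (max $$$=1$$$), giving $$$\mathbb E[\max]=\tfrac32$$$, which the sketch reproduces exactly.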
The main idea is: if we know $$$F_z(x)^n$$$, we can get $$$[x^k]F_{z+j}(x)^n$$$ in $$$\mathcal O(j\frac{k^2}{z^2})$$$.
To get the $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution, we compute $$$F_t(x)^n$$$ for $$$t = 1,2,3,\dots,cnt_1$$$, then compute $$$F_{i\cdot cnt_2+cnt_1}(x)^n$$$ for each $$$i=0,1,2,\dots,\frac{k}{cnt_2}$$$, and finally use the $$$\mathcal O(j\frac{k^2}{z^2})$$$ algorithm to obtain $$$[x^k]F_{z+j}(x)^n$$$ for every remaining index.
Choosing $$$cnt_1,cnt_2$$$ appropriately (for example, roughly $$$cnt_1\approx k^{\frac{2}{3}}$$$ and $$$cnt_2\approx k^{\frac{1}{3}}$$$ balances the three costs) gives an $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution.
This is the expected worst-case lookup time in a naively implemented hash table with a uniformly random hash function.
The distribution of the variable under the expectation has been well-researched in the randomized algorithms literature. For $$$k = n$$$, most courses on the topic prove this value is $$$\frac{\log n}{\log \log n}(1 + o(1))$$$ with high probability.
For an exact expected value in the general $$$k \neq n$$$ case, see the following paper:
http://library.cust.edu.pk/ACM/Disk_1/text/2-6/ICDE88/P362.pdf
For the first document
Digression: if the elements in your problem were independent, controlling the maximum value would be easy. One way of doing this: introduce the variable $$$Y(t) = \text{number of elements greater than or equal to } t$$$; then $$$Y(t)$$$ is a sum of independent Bernoulli variables, and you can calculate $$$\mathbb P[Y(t) > 0]$$$ for every $$$t$$$.
Now, the elements obviously aren't independent. But it is less well known that you can still use more or less the same bounds, because the summed variables exhibit negative correlation.
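One half of this argument needs no independence at all: by the first-moment (union) bound, $$$\mathbb P[Y(t)>0]\le \mathbb E[Y(t)] = n\,\mathbb P[\mathrm{Bin}(k,1/n)\ge t]$$$, since each element's count is marginally $$$\mathrm{Bin}(k,1/n)$$$. Summing $$$\min(1,\cdot)$$$ over $$$t$$$ gives an upper bound on $$$\mathbb E[\max]$$$. A small sketch (function names are mine):

```python
from math import comb

def binom_tail(k, p, t):
    # P[Binomial(k, p) >= t]
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(t, k + 1))

def first_moment_upper_bound(n, k):
    # E[max] = sum_{t>=1} P[max >= t] <= sum_{t>=1} min(1, n * P[Bin(k,1/n) >= t]).
    # The union bound holds regardless of the dependence between elements.
    return sum(min(1.0, n * binom_tail(k, 1.0 / n, t)) for t in range(1, k + 1))
```

This bound is loose around the critical $$$t$$$, but it already recovers the $$$\frac{\log n}{\log\log n}$$$ scale for $$$k=n$$$; the negative-correlation machinery is needed for the matching lower bound and concentration.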
I can't really find a good online introduction to the topic, though. Maybe this is useful for advanced readers
Thank you very much!