EmptySoulist's blog

By EmptySoulist, history, 4 years ago, In English

One day I thought of an interesting problem :

“ Given $$$n$$$ elements that are initially $$$0$$$, operate $$$k$$$ times, each time will randomly select an element, increase it by $$$1$$$, and find the expected value of the maximum value among the $$$n$$$ elements”

I used to think it was a very simple problem (at the beginning), but actually I couldn't find any good solution (at the beginning, I only knew an $$$O(k^2\ln k)$$$ solution)

After asking some powerful people, I learned one way to solve this problem in $$$O(\textrm{poly}~k^{\frac{5}{3}})/O(n\cdot k^{1.5})$$$. (thanks them.)

It is unimaginable to me that the solution of a seemingly simple problem is so difficult, Could any one find a better solution? Thank you !

  • Vote: I like it
  • +67
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Can you brief the solutions you have got till now?

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Sorry, my English is really poor, it's hard for me to explain it.

    let $$$F_z(x)=\sum_{i=0}^{z}\frac{x^i}{i!}$$$, we need to count $$$[x^k]F_z(x)^{n}$$$ for $$$z=0,1,2...k$$$

    first, we can count $$$[x^k]F_z(x)^n$$$ in $$$O(k\log k)$$$

    the main idea is if we know $$$F_z(x)$$$, we can get $$$[x^k]F_{z+j}(x)$$$ in $$$\mathcal O(j\frac{k^2}{z^2})$$$

    to get $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution, we need to count $$$F_t(x)^n$$$ for $$$t = 1,2,3...cnt_1$$$, and then count $$$F_{i\cdot cnt_2+cnt1}(x)^n$$$ for each $$$i=0,1,2...\frac{k}{cnt_2}$$$, then use the $$$\mathcal O(j\frac{k^2}{z^2})$$$ algorithm to count this answer for each $$$[x^k]F_{z+j}(x)$$$

    we can get an $$$\mathcal O(\textrm{poly}~k^{\frac{5}{3}})$$$ solution by choose right $$$cnt_1,cnt_2$$$

»
4 years ago, # |
Rev. 2   Vote: I like it +30 Vote: I do not like it

This is the expected worst-case lookup time in a naively implemented hash table with uniformly random hash function.

The distribution of the variable under the expectation has been well-researched in the randomized algorithms literature. For $$$k = n$$$, most courses on the topic prove this value is $$$\frac{\log n}{\log \log n}(1 + o(1))$$$ with high probability.

For an exact expected value in the general $$$k \neq n$$$ case, see the following papers:

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +11 Vote: I do not like it
  • »
    »
    4 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Digression: if the elements in your problem were independent, controlling the maximum value would be easy. One way of doing this: introduce the variable $$$Y(t) = \text{number of elements greater or equal than } t$$$, and then $$$Y(t)$$$ is a sum of independent Bernoulli variables, and you can calculate $$$\mathbb P[Y(t) > 0]$$$ for every $$$t$$$.

    Now, the elements obviously aren't independent. But it is less known that you can still use more or less the same bounds, because the summed variables exhibit negative corellation.

    I can't really find a good online introduction to the topic, though. Maybe this is useful for advanced readers