RodionGork's blog

By RodionGork, history, 13 months ago, In English

Regard a simple problem of adding elements to a "list", every time into random place, e.g. in Python:

arr = []
for i in range(1000000):
    pos = random.randint(0, len(arr))
    value = random.randint(1000000, 9999999)
    arr.insert(pos, value)

With what data structure it can be effective? Linked list will allow O(1) insertion, but to find a place where to insert, we need to run O(n) from start. Some variants of trees may allow O(log(N)) insertion but it seems finding the proper index is equally difficult.

Wikipedia suggest Skip List with "minor modification" is what we want. Are there another structures (e.g. perhaps some modifications to some type of tree, perhaps, cartesian) useful for this task?

  • Vote: I like it
  • +29
  • Vote: I do not like it

»
13 months ago, # |
  Vote: I like it +16 Vote: I do not like it

There exists a data structure called "Sorted list". This is what I would normally use when I need a sorted data structure in Python. I have an implementation of it up on pyrival . It runs in O(n^(1/3) per operation, with a really small constant factor.

Here is a modified version of the sorted list data structure that works like a normal Python list, but with fast insert()s and pop()s.

FastInsertList
  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hi, and thanks a lot!

    I have difficulty finding what exactly you call "Sorted List" — is it something like TreeMap in java or OrderedDict in Python?

    Seemingly not exactly... I'll dive into the example you kindly created to try figure this out!

    • »
      »
      »
      13 months ago, # ^ |
      Rev. 3   Vote: I like it 0 Vote: I do not like it

      I have difficulty finding what exactly you call "Sorted List"

      Sorted list is a data structure that behaves like a list. The exception is that if you insert elements into the sorted list, then they get automatically sorted. See this example

      A = SortedList()
      A.insert(30)
      A.insert(50)
      A.insert(20)
      A.insert(30)
      A.insert(30)
      
      print(A) # prints [20, 30, 30, 30, 50]
      print(A[-1]) # prints 50
      print(A.pop(1)) # prints 30
      print(A) # prints [20, 30, 30, 50]
      print(A.count(30)) # prints 2
      

      The underlying data structure is based on insertion sort on a list of lists. It stores the sorted data in blocks of size <= n^(1/3). When a block grow bigger than n^(1/3), it splits that block in two. This is what allows it to do inserts in amortized O(n^1/3) time.

      The story behind this data structure is that I came up with the idea for it and implemented it couple of years ago. However, I later found out that I was not the first to invent it. For example, see this . They called it a SortedList, so that is why I call it sorted list too.

  • »
    »
    13 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    python users before c++ sets fr

»
13 months ago, # |
Rev. 2   Vote: I like it +20 Vote: I do not like it

Yeah, treap (modified Cartesian tree) can do both operations in $$$O(log(N))$$$ on average

  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    But how to maintain indices? I'm aware treap allows O(log(N)) insertion, but after every insertion we need to update all X for elements after the inserted one, right?

    • »
      »
      »
      13 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Correct me if I'm wrong, but Treaps will allow you to do exactly what you want, online, as well as when the queries are mixed:

      • Insert query in O(log(N)): Just insert a single node treap at position k. To get to position k you maintain the size of each subtree in the treap. And update them accordingly when you modify the treap.
      • Get by index query in O(log(N)): Just go down the treap and by knowing the subtree size of the left child, you know if the index is in the left child, or in the right. If it's in the right, don't forget that the new index is id - sz[left] to offset it.

      Most of the time, if you can't come up with a modification of a treap, what you can do is for each sqrt(Q) queries, you can iterate across the whole treap (which takes O(N)) time and update proper indexes in some other array. And in the queries in between these sqrt(Q) events, you will remember the newly inserted indexes by brute force (at most sqrt(Q) of them) and thus solve the problem in something like O((N + Q) sqrt(Q) + Q log(N)), which will be fast enough for most problems.

      • »
        »
        »
        »
        13 months ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Thank you, seemingly I'm just quite poorly acquainted with Treap idea and need to go and study it bit harder :)

    • »
      »
      »
      13 months ago, # ^ |
        Vote: I like it +10 Vote: I do not like it

      But how to maintain indices?

      That's the thing, you don't have to maintain them, otherwise you indeed will need to do $$$O(N)$$$ updates. You have to deal with them implicitly and that's why the version of treap which you need is called implicit treap.

      I'm aware treap allows O(log(N)) insertion, but after every insertion we need to update all X for elements after the inserted one, right?

      Every tree allows $$$O(log(N))$$$ insertion but treap can do more than just that :)

»
13 months ago, # |
  Vote: I like it +15 Vote: I do not like it

Think of a structure similar to a binary search tree, but with random distinct values as the key. By carefully translating this structure to use an "implicit" method of randomness, we can implement random insertion in expected $$$O(\log n)$$$ operations (including random sampling). It is done like this.

For each node, maintain how many candidates of leaf nodes exist for the left subtree and the right subtree. Let us assume $$$p$$$ candidates exist for the left subtree, and $$$q$$$ for the right subtree. Sample an integer $$$x$$$ from the $$$[1,p+q]$$$ range. If $$$x \le p$$$, move down to the left subtree. Otherwise, move down to the right subtree. Repeat until you meet a leaf node candidate. Insert your item there and update the number of candidates correspondingly.

This chooses $$$N!$$$ orders equiprobably, with expected $$$O(\log n)$$$ operations per insertion. The proof of expected time complexity is not very hard, it's basically similar to proving the time complexity of random pivot Quicksort.

  • »
    »
    13 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Besides the actual structure itself, doesn't the pseudocode you have provided produce the exact same results with the following procedure?

    • Generate $$$n$$$ random values in an array.
    • Shuffle the array.

    Considering both procedures produce $$$N!$$$ different orders when the values are distinct, and each output is equiprobable as well, I would say that the two are equivalent.

»
13 months ago, # |
  Vote: I like it +10 Vote: I do not like it

If you have all insert queries before all get queries, you can just append them into an array, storing info at which position they sould've been added. Then, when insert phase is finished, you can reorder the elements properly. This can be easily done in reverse order in $$$O(n \log n)$$$: use a segment tree to track which positions are taken already and to compute into which place the current element should go.

  • »
    »
    13 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Cunning trick, perhaps not that much for practical use, but ideal for a small puzzle :) thanks a lot!

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

There's a non-standard data structure called a rope that can do these and much more, which is implemented as a libstdc++ extension, but sadly seems to have bugs.

For more info, see this and this.

It can be implemented in Python too, but I'm guessing you're looking for something more ready-to-use. Just mentioned it here because it is quite a nice data structure to know about.

»
13 months ago, # |
  Vote: I like it 0 Vote: I do not like it

In C++, you can use a rope. It will work in O(log(n)) time for random insertion, albeit with a large constant factor. rope

If you want to implement a random insertion data structure yourself, you can use a splay tree for O(log(n)) amortized insertion. splay tree

A treap would also work. treap

Both treaps and splay trees are very versatile and can be extended to do much more than just random insertion.

  • »
    »
    13 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    A good thread-unsafe implementation of rope is usually pretty fast. The main issue with the usual implementation is that it is required to be thread safe, which it does by involving mutexes.

    • »
      »
      »
      13 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Ah, I was basing my observation off of personal experience with the SGI rope. I'm not sure if there's a thread-unsafe implementation that's easily dropped in for competitive programming.