Consider a simple problem of adding elements to a "list", each time at a random position, e.g. in Python:

import random

arr = []
for i in range(1000000):
    pos = random.randint(0, len(arr))
    value = random.randint(1000000, 9999999)
    arr.insert(pos, value)
What data structure can make this efficient? A linked list would allow O(1) insertion, but to find the place to insert we need an O(n) walk from the start. Some variants of trees allow O(log(N)) insertion, but finding the proper index seems equally difficult.
Wikipedia suggests that a skip list with a "minor modification" is what we want. Are there other structures (e.g. perhaps some modification of some type of tree, perhaps a Cartesian tree) useful for this task?
There exists a data structure called a "sorted list". This is what I would normally use when I need a sorted data structure in Python. I have an implementation of it up on pyrival. It runs in O(n^(1/3)) per operation, with a really small constant factor.
Here is a modified version of the sorted list data structure that works like a normal Python list, but with fast insert()s and pop()s.
Hi, and thanks a lot!
I have difficulty figuring out what exactly you call a "sorted list" — is it something like TreeMap in Java or OrderedDict in Python? Seemingly not exactly... I'll dive into the example you kindly created to try to figure this out!
A sorted list is a data structure that behaves like a list, except that when you insert elements into it, they automatically end up in sorted order. See this example.
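(Not the example linked above, which isn't reproduced here; just a quick illustration of the behaviour, using the third-party sortedcontainers package as a stand-in.)

from sortedcontainers import SortedList  # pip install sortedcontainers

sl = SortedList()
for x in [5, 1, 4, 1, 3]:
    sl.add(x)        # every add keeps the container sorted
print(list(sl))      # [1, 1, 3, 4, 5]
print(sl[2])         # indexing by sorted rank -> 3
sl.remove(4)         # delete by value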
The underlying data structure is based on insertion sort on a list of lists. It stores the sorted data in blocks of size <= n^(1/3). When a block grows bigger than n^(1/3), it splits that block in two. This is what allows it to do inserts in amortized O(n^(1/3)) time.
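As a rough illustration of the blocks-plus-splitting idea, here is my own simplified sketch (not the pyrival code), adapted to insertion by index as in the modified version mentioned above. It locates the target block with a linear scan, so it does not reach the stated n^(1/3) bound, but the mechanism is the same:

class BlockList:
    """Toy list of lists: insert into the right block, split oversized blocks."""
    def __init__(self, block_limit=1000):
        self.blocks = [[]]            # concatenation of the blocks = the whole list
        self.block_limit = block_limit

    def insert(self, pos, value):
        # locate the block that contains position `pos`
        for bi, block in enumerate(self.blocks):
            if pos <= len(block):
                break
            pos -= len(block)
        block.insert(pos, value)
        # split the block once it grows past the limit
        if len(block) > self.block_limit:
            mid = len(block) // 2
            self.blocks[bi:bi + 1] = [block[:mid], block[mid:]]

    def __iter__(self):
        for block in self.blocks:
            yield from block

Using an instance of this class in place of arr in the snippet from the blog avoids shifting the whole million-element list on every insertion; only one small block is shifted, plus an occasional split.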
The story behind this data structure is that I came up with the idea and implemented it a couple of years ago. However, I later found out that I was not the first to invent it. For example, see this. They called it a SortedList, so that is why I call it a sorted list too.
python users before c++ sets fr
Yeah, treap (modified Cartesian tree) can do both operations in $$$O(log(N))$$$ on average
But how to maintain indices? I'm aware a treap allows $$$O(log(N))$$$ insertion, but after every insertion we need to update all $$$X$$$ for the elements after the inserted one, right?

Correct me if I'm wrong, but treaps will allow you to do exactly what you want, online, as well as when the queries are mixed:

Insert query in $$$O(log(N))$$$: just insert a single-node treap at position $$$k$$$. To get to position $$$k$$$ you maintain the size of each subtree in the treap, and update these sizes accordingly when you modify the treap.

Get-by-index query in $$$O(log(N))$$$: just go down the treap; knowing the subtree size of the left child, you know whether the index is in the left child or in the right one. If it's in the right one, don't forget that the new index is id - sz[left], to offset it.

Most of the time, if you can't come up with a modification of a treap, what you can do is the following: every $$$\sqrt{Q}$$$ queries, iterate across the whole treap (which takes $$$O(N)$$$ time) and update the proper indexes in some other array. For the queries in between these $$$\sqrt{Q}$$$ events, you remember the newly inserted indexes by brute force (at most $$$\sqrt{Q}$$$ of them) and thus solve the problem in something like $$$O((N + Q) \sqrt{Q} + Q log(N))$$$, which will be fast enough for most problems.

Thank you, seemingly I'm just quite poorly acquainted with the treap idea and need to go and study it a bit harder :)
That's the thing, you don't have to maintain them, otherwise you would indeed need to do $$$O(N)$$$ updates. You have to deal with them implicitly, and that's why the version of the treap you need is called an implicit treap.
Any balanced tree allows $$$O(log(N))$$$ insertion, but a treap can do more than just that :)
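For concreteness, here is a minimal implicit-treap sketch in Python (my own illustration code, not taken from anywhere in this thread): no explicit indices are stored; each node keeps its subtree size, and split/merge by position give expected-logarithmic insert and get-by-index.

import random

class Node:
    __slots__ = ("value", "prio", "size", "left", "right")
    def __init__(self, value):
        self.value = value
        self.prio = random.random()   # random heap priority keeps the tree balanced in expectation
        self.size = 1                 # size of the subtree rooted here
        self.left = None
        self.right = None

def size(t):
    return t.size if t else 0

def update(t):
    t.size = 1 + size(t.left) + size(t.right)

def split(t, k):
    """Split t into (treap with the first k elements, treap with the rest)."""
    if t is None:
        return None, None
    if size(t.left) < k:
        l, r = split(t.right, k - size(t.left) - 1)
        t.right = l
        update(t)
        return t, r
    l, r = split(t.left, k)
    t.left = r
    update(t)
    return l, t

def merge(a, b):
    """Concatenate two treaps, keeping the heap property on priorities."""
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        update(a)
        return a
    b.left = merge(a, b.left)
    update(b)
    return b

def insert(t, pos, value):
    """Insert value so that it becomes the element at index pos."""
    l, r = split(t, pos)
    return merge(merge(l, Node(value)), r)

def get(t, idx):
    """Return the element at index idx (0-based), assuming it exists."""
    while True:
        if idx < size(t.left):
            t = t.left
        elif idx == size(t.left):
            return t.value
        else:
            idx -= size(t.left) + 1
            t = t.right

For the task in the blog, the loop body becomes root = insert(root, random.randint(0, i), value) when the structure currently holds i elements; the expected depth is logarithmic, so the recursive split/merge typically stays well within Python's default recursion limit.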
Think of a structure similar to a binary search tree, but with random distinct values as the key. By carefully translating this structure to use an "implicit" method of randomness, we can implement random insertion in expected $$$O(\log n)$$$ operations (including random sampling). It is done like this.
For each node, maintain how many candidates of leaf nodes exist for the left subtree and the right subtree. Let us assume $$$p$$$ candidates exist for the left subtree, and $$$q$$$ for the right subtree. Sample an integer $$$x$$$ from the $$$[1,p+q]$$$ range. If $$$x \le p$$$, move down to the left subtree. Otherwise, move down to the right subtree. Repeat until you meet a leaf node candidate. Insert your item there and update the number of candidates correspondingly.
This chooses $$$N!$$$ orders equiprobably, with expected $$$O(\log n)$$$ operations per insertion. The proof of the expected time complexity is not very hard; it's basically similar to proving the time complexity of random-pivot quicksort.
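A possible Python rendering of that procedure (my own sketch; the class and field names are invented here): each node remembers how many free leaf slots its left and right subtrees contain, and an insertion walks down by sampling proportionally to those counts.

import random

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.cand_left = 1    # free leaf slots in the left subtree (p in the text)
        self.cand_right = 1   # free leaf slots in the right subtree (q in the text)

def insert_random(root, value):
    """Insert value into a uniformly random free leaf slot; returns the (possibly new) root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        p, q = node.cand_left, node.cand_right
        x = random.randint(1, p + q)          # sample from [1, p + q]
        if x <= p:                            # descend into the left subtree
            node.cand_left += 1               # it is about to gain one node, hence one more slot
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:                                 # descend into the right subtree
            node.cand_right += 1
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right

def in_order(node):
    """The sequence represented by the tree, read left to right."""
    return [] if node is None else in_order(node.left) + [node.value] + in_order(node.right)

Each free leaf slot corresponds to exactly one gap of the in-order sequence, so every insertion position is chosen uniformly, and the resulting tree has the shape of a random BST, which is where the expected $$$O(\log n)$$$ depth comes from.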
Besides the actual structure itself, doesn't the pseudocode you have provided produce the exact same results as the following procedure?
Considering both procedures produce $$$N!$$$ different orders when the values are distinct, and each output is equiprobable as well, I would say that the two are equivalent.
If you have all insert queries before all get queries, you can just append them into an array, storing at which position each element should have been added. Then, when the insert phase is finished, you can reorder the elements properly. This can easily be done in reverse order in $$$O(n \log n)$$$: use a segment tree to track which positions are already taken and to compute into which place the current element should go.

Cunning trick, perhaps not that much for practical use, but ideal for a small puzzle :) Thanks a lot!
There's a non-standard data structure called a rope that can do these and much more, which is implemented as a libstdc++ extension, but sadly seems to have bugs.
For more info, see this and this.
It can be implemented in Python too, but I'm guessing you're looking for something more ready-to-use. Just mentioned it here because it is quite a nice data structure to know about.
In C++, you can use a rope. It will work in O(log(n)) time for random insertion, albeit with a large constant factor.
If you want to implement a random-insertion data structure yourself, you can use a splay tree for O(log(n)) amortized insertion.
A treap would also work.
Both treaps and splay trees are very versatile and can be extended to do much more than just random insertion.
A good thread-unsafe implementation of rope is usually pretty fast. The main issue with the usual implementation is that it is required to be thread-safe, which it achieves by using mutexes.
Ah, I was basing my observation off of personal experience with the SGI rope. I'm not sure if there's a thread-unsafe implementation that's easily dropped in for competitive programming.