Help with Python - Codeforces

My issue concerns 1840D - Wooden Toy Festival. During the contest I used an approach similar to the one the problem authors give in the tutorial, and the first step was to sort the input array and drop duplicates. As usual, I used the combination sorted(set(a)), which had always worked well and seemed quite reliable to me. Submission 208933459 passed all pretests. However, it was then hacked with a time limit exceeded verdict. After the contest I submitted an almost identical solution, 208932147, with a single difference: I replaced the combination above with the following block:

a.sort()
b = [a[0]]
for i in a:
	if i != b[-1]:
		b.append(i)

As I understand it, the block above should perform no better than sorted(set(a)). Sorting the array takes $$$O(n \log n)$$$ time, and removing duplicates takes $$$O(n)$$$. However, with this replacement my solution passed all the tests. Could you explain why? And does it mean that from now on it's better to use something like the block above instead of sorted(set(a))? I would appreciate it a lot!
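For reference, the two approaches from the post can be put side by side. This is a minimal sketch: the function names are made up for illustration, and the second version also handles an empty list, which the original block does not need to because the problem guarantees at least one element.

```python
def unique_sorted_via_set(a):
    """Deduplicate through a hash set, then sort the distinct values."""
    return sorted(set(a))


def unique_sorted_via_sort(a):
    """Sort first, then drop adjacent duplicates in one linear pass (no hashing)."""
    b = []
    for x in sorted(a):
        # After sorting, equal values are adjacent, so comparing with the
        # last kept value is enough to detect a duplicate.
        if not b or x != b[-1]:
            b.append(x)
    return b


sample = [5, 1, 5, 3, 1, 9, 3]
print(unique_sorted_via_set(sample))   # [1, 3, 5, 9]
print(unique_sorted_via_sort(sample))  # [1, 3, 5, 9]
```

Both functions return the same list. The difference is only that the second one never builds a hash table, so its running time does not depend on how the values hash.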

Comments (8)

denilb

20 months ago

As a Python user, my advice is to avoid using set() in CP. It doesn't perform well on large datasets.

drugkeeper

20 months ago

Then what do you suggest? Sometimes we have no choice but to use a hashset structure?

When O(1) lookup is required, I use a dict and have never had an issue with it. defaultdict performs badly as well, so I use dict.get(x, y) instead.
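The dict.get(x, y) pattern mentioned here can be sketched as follows. The function name count_values is hypothetical, and whether this is actually faster than defaultdict is the commenter's claim, not something measured in this snippet.

```python
def count_values(values):
    """Count occurrences with a plain dict and dict.get instead of defaultdict."""
    counts = {}
    for v in values:
        # get() returns 0 for a key that is not present yet,
        # so no separate membership check is needed.
        counts[v] = counts.get(v, 0) + 1
    return counts


print(count_values([1, 2, 2, 3, 3, 3]))  # {1: 1, 2: 2, 3: 3}
```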

iN_siDious

Your code works fine in Python 3 (287 ms). It only gets TLE in PyPy3. Eager to see if someone can explain why :)

PyPy3 and Python use different hashing algorithms for set(), which is why a test case engineered to kill PyPy3 set() solutions only works on PyPy3.

The input is crafted so that set() runs into hash collisions, which makes building the set take O(n^2) time.
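To see why collisions can push insertion toward quadratic time, here is a toy open-addressing table with linear probing. It is only an illustration under simplifying assumptions: the function count_probes, the fixed table size, and the "key modulo table size" slot rule are all made up for this sketch, and neither CPython nor PyPy probes this simply.

```python
def count_probes(keys, table_size):
    """Insert keys into a toy linear-probing table and return the total number of probes.

    Assumes fewer distinct keys than table_size, so a free slot always exists.
    """
    table = [None] * table_size
    probes = 0
    for key in keys:
        slot = key % table_size  # toy "hash": the key modulo the table size
        while True:
            probes += 1
            if table[slot] is None:
                table[slot] = key  # free slot found, store the key
                break
            if table[slot] == key:
                break  # key already present, nothing to store
            slot = (slot + 1) % table_size  # collision: try the next slot
    return probes


TABLE_SIZE = 1024
spread_keys = list(range(200))                        # every key gets its own slot
colliding_keys = [i * TABLE_SIZE for i in range(200)]  # every key starts at slot 0

spread_probes = count_probes(spread_keys, TABLE_SIZE)
colliding_probes = count_probes(colliding_keys, TABLE_SIZE)
print(spread_probes)     # 200
print(colliding_probes)  # 20100
```

With 200 keys that all start at the same slot, the k-th insertion has to walk past the k keys already stored, so the total is 1 + 2 + ... + 200 = 20100 probes instead of 200. That growth, roughly n^2 / 2, is the kind of behavior an anti-hash test tries to trigger.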

I fixed it by shuffling the array before I do sorted(set(a)): https://codeforces.me/contest/1840/submission/208991870
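The shuffle-first idea can be sketched like this. It is not copied from the linked submission; the function name unique_sorted_shuffled is hypothetical, and the snippet only shows the shape of the workaround, not a guarantee that it defeats every crafted input.

```python
import random


def unique_sorted_shuffled(a):
    """Shuffle a copy of the input before building the set, then sort the distinct values."""
    a = list(a)        # work on a copy so the caller's list keeps its order
    random.shuffle(a)  # randomize the order in which values are inserted into the set
    return sorted(set(a))


print(unique_sorted_shuffled([5, 1, 5, 3, 1, 9, 3]))  # [1, 3, 5, 9]
```

The result is identical to sorted(set(a)); the only thing that changes is the insertion order into the set, which is the order a hack input was presumably tuned for.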

This would likely not happen in Div. 1 / Div. 2 rounds, only in rounds with a long hacking window (a 12h open hacking phase), where people would try to engineer hash collisions. I still feel that this is quite dumb, since these hacks then get added to the system tests.

To the rest reading this, what are your opinions?

Nice!

rmr

It was indeed very informative, thanks a lot! But I didn't quite grasp how shuffling actually affects this situation. Since Python's set() is implemented as a hash table, lookups are O(1) on average, but when collisions occur, according to Fluent Python, they are handled with something like open addressing, aren't they? And does the order in which the hashes arrive make any difference if it leads to collisions anyway?
