In the most recent contest, Round #672 (Div. 2), I have identical submissions which TLE on PyPy 3.6, but are accepted in Python 3.
Problem C2:
PyPy 3.6: 93706296 (TLE)
Python 3: 93706871 (Accepted, 1871ms)
Problem D:
PyPy 3.6: 93722395 (TLE)
Python 3: 93721887 (Accepted, 1964ms)
Unfortunately, I did not realize to submit D on Python 3 before the end of the contest, so I did not get points for it.
We all know that PyPy is supposed to be faster, which is why the submission box suggests "Almost always, if you send a solution on PyPy, it works much faster" when you try to submit in Python.
However, this is not the first time I have had a solution only get accepted in Python, and having it happen twice in one contest is very frustrating, to say the least. In particular, having to guess which language to submit under is a great way to accumulate unnecessary penalty, and lower your ranking.
I'm wondering if anyone knows for what type of programs will PyPy tend to run slower than Python, and why this is the case. Any insight would be very helpful, thanks!
Pypy3 has slower unicode so you want to read as bytes instead. I think both your solutions will pass if you just use
input = io.BytesIO(os.read(0, os.fstat(0).st_size)).readline
. You can also use pyrival's fastio template: https://github.com/cheran-senthil/PyRival/blob/master/templates/template_py3.py. I also TLEd a bunch for this contest and fast output with os.write bytes actually did help pass a few more cases (though my problem was ultimately elsewhere).For D, tuple sort is also slower in pypy but you didn't have that problem. So just a fyi for others, the solution is to use several rounds of stable sort instead: https://github.com/cheran-senthil/PyRival/blob/master/pyrival/misc/ordersort.py Or pack it into a int and extract things back out with mod (watching out to not exceed 2^31 since codeforces is 32 bits).
Modulo multiplication is also sometimes too slow because the multiplication result doesn't fit in 32 bits (also wasn't needed for this problem). You need some crazy pypy int op hacks to force it to not use bigints: https://github.com/cheran-senthil/PyRival/blob/master/pyrival/misc/mod.py To debug if you're getting screwed by big ints you can do
__pypy__.internal_repr(num)
since python 3 no longer distinguish between int or long(bigint) types.EDIT: yep both passed https://codeforces.me/contest/1420/submission/93729744 https://codeforces.me/contest/1420/submission/93729753 TLEing for wrong fast io template is :(
I was not familiar with that version of fast input, thank you for sharing! Having I/O take up 80% of runtime (on D) is a bit unfortunate :/
Thanks for the FYI part. I was doing sort on tuples and had no idea what is going wrong. Thanks for the information.
Recently I have been also noticing this. There is no exact pattern. But I made some observations. For n>=10**5 python O(n) solutions are faster. It may not be true for all cases i did get tle using pypy in O(n) solution. However python never gives TLE for O(n). Now if there are nested loops and the complexity is like O(n^2) python does not work. But again if I can break out of the inner loop using a while loop python 93734638 works faster than pypy 89529007. In short i think pypy works well in straight forward brute force (O(n^2) and above).
I recently noticed that sorting tuples in PyPy is slow and could lead to TLE.
One of the cases I have noticed when pypy is slower is repeatedly appending to string. something like :
this should be expected to be slow, because unlike C++ strings are immutable in python so everytime you try to append a character at end python needs to copy whole string to new string with extra char you are appending. But python have some optimisations which speeds up such operations, while pypy doesn't have these optimisations. Thus whenever I have repeated string concat I submit using python instead of pypy. for example:
accepted and TLE
accepted and TLE
another workaround obviously is to store strings as list of characters, since lists are mutable.
Yeah, cpython makes an optimization for that case. pypy does not. It's a dumb thing to do anyway, don't do it. :P
input can be optimised by doing what at.f said as it reads in one go
When the loops are large and output is small use pypy3
When the loops are small(about 5*10**6) and output is large use python3
When the loops are large and output is also large you can do this thing in pypy3
print(a,flush=False)
sys.stdout.write(a)
or by printing all the output in one go
For example take a look at this code which took 1.55 seconds in pypy3
from time import time;
a=time()
for i in range(10**9):
for i in range(10**4):
b=time()
print(b-a)
Now we can optimise by using print("YES",flush=False) which does it in 1 second
from time import time;
a=time()
for i in range(10**9):
for i in range(10**4):
b=time()
print(b-a)