I tried this problem: https://codeforces.me/contest/1136/problem/D.
And came up with a solution similar to the solution in the editorial.
But I first submitted my Python code in both Python3 and PyPy3 and had TLE.
But when I submitted an ALMOST equivalent code in C++, it got accepted.
I wonder what's wrong in my python code or is it the compiler's/interpreter's doing?
Python Code 1 : https://codeforces.me/contest/1136/submission/51256343.
Python Code 2 : https://codeforces.me/contest/1136/submission/51262124.
C++ code: https://codeforces.me/contest/1136/submission/51257855.
Thanks in advance.
Erm. Didn't you know that python runs slower than my grandma can crawl?
Yeah, you are right! But I have been using Python for a long time and feel that this problem is more than just a comparison between the speeds of two.
C++ is a lot faster than Python, so there are some problems which you can't solve in Python because it's not fast enough. I think that's the problem.
Are you sure that's the only problem? Because I usually solve problems in PyPy3/Python and never had such problem.
I'm not sure, but I think Python can do about 10^6 operations in a second. I saw your Python code and, correct me if I'm wrong, but I think the time complexity of your code is not good enough if you use Python.
Can you explain how the time complexity is not good enough?
I'm just saying, you seem really stubborn about using Python/PyPy for your solutions. This is really going to be a problem for harder problems, when the model solution uses about $$$10^8$$$ operations and no amount of PyPy optimization will help. The setters don't even guarantee that all problems can be solved in TL with Python.
If you are serious about solving more problems, you are going to have to switch to C++ or Java eventually.
Firstly pretty much always use pypy, it is a lot quicker than cpython in pretty much every way.
But the actual reason you get TLE is that the built in IO is very slow, especially when you have to read in multiple lines. One way to speed up the input is to do
input = io.BytesIO(os.read(0, os.fstat(0).st_size)).readline
This reads in the input once, and then stores it in a stream. This one-liner will speed up your code immensely. With it your code runs in 1044 ms. There are also similar tricks you can do to speed up the output, but that wasn't necessary in this case.
Thanks a lot! Also, can you give me some resources to other tricks to speed up io?
Most of the tricks I've been using just comes from me and c1729 trying out different methods. He has got a guide but I think it might be a bit outdated at this time.
About IO, as a general rule of thumb, read the entire input in one go, write the entire output in one go. IO in python doesn't have to be slow, it is just the built in stuff that is slow. For example take this huge IO SPOJ problem. I've been able to do it in 0.10 s on pypy, while the quickest Java solution takes 0.17 s.
I think that for input using
input = io.BytesIO(os.read(0, os.fstat(0).st_size)).readline
will be good enough for most problems. And for output I think that using the pypy feature __pypy__.builders.StringBuilder is very good. Here is an example of using a string builder for output:
This takes around $$$420 \mathrm{ms}$$$ in pypy3 and $$$200 \mathrm{ms}$$$ in pypy2. (this is including the upstart time of around 100 ms that pypy got, so the print especially in pypy2 is super quick).
The reason why pypy2 is quicker is that python 2 uses bytestrings but in python 3 they've switched from using bytestrings to using unicode-strings by default, this drastically slows down the IO. So in python3 you have to manually switch to using bytestrings in order to get fast IO (this is done with .encode(), but unfortunately encode is pretty slow). Bytestrings are also one of the reasons os.read is so quick.
Using these methods you should see drastically better speeds in heavy IO problems.
Thanks, updated the blog post with our most recent io techniques and some explanations.
That was really helpful. Thank you so much.
Hey, I use the following template (was copied from someone's solutions and is probably owned by you). Can you tell me if the following one is faster or the one you mentioned above ?
What you got there is my general template. It should be both about as fast as possible while being very general. It works both in python2 and 3, and it even works well on interactive problems.
Thanks, One more question. Can you reason out why exactly same code gives runtime error on some cases when using pypy while passess the same testcase when using python3 ? I mean, in general what could be the possible reason ?
You should just avoid doing any type of deep recursion in Python in general. Python was not made with deep recursion in mind.
There are ways to solve graph problems iteratively, without ever calling a function. If you want to code in Python you should probably learn to do it that way.
If you really want to do recursion then there is still a way to do it that works fairly well. A while back I made https://github.com/cheran-senthil/PyRival/blob/master/pyrival/misc/bootstrap.py, this thing might look weird, but with it I can make your code AC 80648471. This is what I would use if I need to do deep recursion.