In yesterday's contest's problem D2, my PyPy3 code runs in 4150ms and the same PyPy2 code runs in 2700ms. Can someone please explain why? And in what cases, PyPy2 is faster than PyPy3?
pajenegod c1729 Kiri8128 conqueror_of_tourist can help maybe.
# | User | Rating |
---|---|---|
1 | jiangly | 3976 |
2 | tourist | 3815 |
3 | jqdai0815 | 3682 |
4 | ksun48 | 3614 |
5 | orzdevinwang | 3526 |
6 | ecnerwala | 3514 |
7 | Benq | 3482 |
8 | hos.lyric | 3382 |
9 | gamegame | 3374 |
10 | heuristica | 3357 |
# | User | Contrib. |
---|---|---|
1 | cry | 169 |
2 | -is-this-fft- | 166 |
3 | Um_nik | 161 |
3 | atcoder_official | 161 |
5 | djm03178 | 157 |
6 | Dominater069 | 156 |
7 | adamant | 154 |
8 | luogu_official | 152 |
9 | awoo | 151 |
10 | TheScrasse | 147 |
In yesterday's contest's problem D2, my PyPy3 code runs in 4150ms and the same PyPy2 code runs in 2700ms. Can someone please explain why? And in what cases, PyPy2 is faster than PyPy3?
pajenegod c1729 Kiri8128 conqueror_of_tourist can help maybe.
Name |
---|
I mean, 'same' only applies if you treat the io templates as opaque black boxes (like I do by still not using them...?). Most py 3<2 performance issues can be attributed to 3 being unicode aware, whether it's the additional decode/encode calls or default int/str conversions.
A more direct work-on-the-bytes-like-c approach could maybe eek out some extra mileage, but my level of interest/desperation fell short of that (also during the contest I had a kid gnawing on my head telling me about her friend 'Buggy' who was an actual insect she had brought into the house -- so I commented out the input replacement as if it were a sync issue (ie cargo cult prayer) and it stupidly passed pretests, womp womp)
I do not think the reason between the performance this time around is because of unicode. Here is a submission that almost entirely skips unicode in PyPy3 121736481, working only with bytestrings, and it runs slow.
I've tested around a bit, and I found that only using os.write and os.read makes PyPy2 and PyPy3 run equally fast (121737546 and 121737672). So I think the culprit might be that PyPy3 has a slower implementation of
BytesIO
than PyPy2. But at the same time, it doesn't really make sense that the speed ofBytesIO
matters here (since the total amount of input is kind of tiny). My best guess is that something is badly optimized in PyPy3's implementation ofBytesIO
.I ran into this problem before for 1535E - Gold Transfer (which is also interactive).
Using the pyrival template TLE'd on pypy3 but passes on pypy2. Switching to os.write made it barely pass but there was still a large gap between pypy2 and pypy3 (pypy2 was 3s, pypy3 was 4.3s, and time limit was 4.5s). Using your suggestion of replacing os.read too finally made the gap disappear: 118470981 121755890 (note: using a implementation of input that I stole from you from another problem)
I wonder what the actual cause of the bug is and whether the pypy devs can fix it. It kind of sucks that pypy3 needs extra work to achieve the same performance as pypy2 for interactives (but at least there's a known workaround for it now, thanks!).
In general PyPy should just have an overhaul of everything IO related. One of the big drawbacks of PyPy compared to CPython is that the built in IO is really not good. It is stupid that my fastIO code even speeds things up in the first place.
Here is an example of how PyPy really has unoptimized IO. Haven't you wondered why using input is much slower than readline() in PyPy (even though input should just be a wrapper to readline)? Well this kind of explains it:
In your example is the issue the extra empty writes? I wonder if they are actually expensive. I remember looking at the system calls generated by pypy vs cpython and didn't see a significant difference. Command I used:
To sum it all up, we should use PyRival template for normal problems and this IO for interactive. Am I right?
Yea and it seems like you now have the fastest time on 1543D2 with it. That IO should probably be cleaned up and wrapped a little nicer before people copy and paste it all over the place though. (like rename input() to something like getint() since it's semantically very different from input)
Personally I don't like long IO templates so I will stick to using sys.stdin.buffer.readline and os.write unless it seems like TL is going to be a problem.
Hi, this may be out of topic but I am looking for a little help in python and this seems just the right place.
This problem Parsa's humongous tree, is throwing tle, and there is no accepted solution in python but in pypy3. Moreover, all accepted solutions in pypy uses stacks and not recursion (Is recursion slower?). My submission: Using recursion, Using stack.
Please correct me if I am doing something wrong, or is this because of the tight time limit? I would be really grateful for the help.
I had the same issue during the contest- pypy3 was giving WA (stack limit) , Python 3 gave me TLE/WA- Testcase 4 (tried threading too link) and then finally I implemented the same logic in c++ and got AC (link 10 mins after the contest lol :) ).
Can someone help in writing a bootstrap version of DFS ( bypass stack limit ref) — cuz I didn't had any return value in dfs function call , so what was supposed to be yield ?.
I've explained about no return value before here.
I tried adding bootstrap to your code 121758780. Unfortunately it gets TLE because the implementation is pretty slow, and also because Cf runs 32 bit PyPy (causing your implementation to use big integers everywhere). Hopefully Cf will be switching to 64 bit in the near future (link).
python is gay
you're gay
am I python? then yes im very gay