In my previous post, I discussed a fast Scanner for Scala.
To benchmark it, I wrote the fastest possible Scanner I could in Java:
package better.files;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
/**
* Hand built using a char buffer
*/
public class ArrayBufferScanner extends AbstractScanner {
private char[] buffer = new char[1 << 4];
private int pos = 0;
private BufferedReader reader;
public ArrayBufferScanner(BufferedReader reader) {
super(reader);
this.reader = reader;
}
@Override
public boolean hasNext() {
return pos != -1;
}
private void loadBuffer() {
pos = 0;
while (true) {
int i;
try {
i = reader.read();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
if (i == -1) {
pos = -1;
break;
}
char c = (char) i;
if (c != ' ' && c != '\n' && c != '\t' && c != '\r' && c != '\f') {
if (pos == buffer.length) {
buffer = Arrays.copyOf(buffer, 2 * pos);
}
buffer[pos++] = c;
} else if (pos != 0) {
break;
}
}
}
@Override
public String next() {
loadBuffer();
return String.copyValueOf(buffer, 0, pos);
}
@Override
public String nextLine() {
try {
return reader.readLine();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
@Override
public int nextInt() {
loadBuffer();
final int radix = 10;
int result = 0;
boolean didRun = false;
for (int i = buffer[0] == '-' || buffer[0] == '+' ? 1 : 0; i < pos; i++, didRun = true) {
int digit = buffer[i] - '0';
checkValidNumber(0 <= digit && digit <= 9);
result = result * radix + digit;
}
checkValidNumber(didRun);
return buffer[0] == '-' ? -result : result;
}
private void checkValidNumber(boolean condition) {
if(!condition) throw new NumberFormatException(String.copyValueOf(buffer, 0, pos));
}
}
Is this the best I can do?
It is barely faster than Java's StreamTokenizer (NOT StringTokenizer) inspite of being much simpler than it: http://docs.oracle.com/javase/8/docs/api/java/io/StreamTokenizer.html
Java source: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/java/better/files/ArrayBufferScanner.java
Other Scanners: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/scala/better/files/Scanners.scala
Benchmark results: https://github.com/pathikrit/better-files/tree/master/benchmarks
You can do much better. Just take a look at what happens when you call
nextInt()
the first time. First theBufferedReader
copies input to a buffer, then you copy that data to a new buffer, then you allocate a String object (not ideal if you only want a number) which essentially is just yet another buffer, then you actually parse the input.If you want to write your own efficient I/O class in Java, then
byte[]
andSystem.in.read
will be your friends. If you think that's to much of a hassle, well then at leastBufferedInputStream
is preferable toBufferedReader
(at least when it comes to reading integers).Ah, good point — I have not thought about optimizing the nextInt yet... Yes, this is purely an intellectual exercise at this point since it should be fast enough for CodeForces. Btw, what do you think of http://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html ? I will include it in my benchmarks next.
I just posted a version (see updated post) with updated
nextInt()
— much faster now. Anything else?Well there's really no point in using a buffer when you use
BufferedReader
, since it has its own buffer, so you unnecessarily end up doing the same work twice.As a reference I include below (more or less) the two classes I use for reading integers (most problems have integers as input) when I wanna make the highscore list on various online judges.
So feel free to use them as inspiration when you design a more solid class with exception handling etc.
Not related to the post,but worth sharing.
But I'd be careful when using Custom classes to read when the input is large ( especially on SPOJ and UVA). I remember one or two times where encapsulating a BufferedReader and StringTokenizer gave me TLE, yet them alone in the main method gave AC.
Unless you are sure that your scanner is really significant for speed, don't use it when you need every second for your program.
What you are saying is basically "use this fast scanner of yours, but only if speed doesn't matter"
As I said, at this point it's basically an intellectual exercise/code golfing. The regular Java one with BufferedReader + StringTokenizer should be just fine for every competition problems.
And, yes, I do have tests for this: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/test/scala/better/files/ScannerBenchmark.scala
I wasn't saying yours is slower, or not to use it. That's why I said it's not related to the post. It slowed me a lot sometimes when solving problems using JAVA so I thought it's worth sharing for JAVA users.