Блог пользователя twoslow

Автор twoslow, история, 6 лет назад, По-английски

Hello, codeforces community

  • I've a 2D sparse matrix of 0's and 1's of size NxN, where N = 16 Million i.e 16,000,000.
  • The number of non zeros(NNZ) in the matrix are ~450 Million. i.e 450,000,000.

What is the most efficient way to store the such matrix, given that I'll get the queries like:

  • What are the non-zero elements in row number x.
  • What are the non-zero elements in column number y.

Currently I'm storing the matrix in CSR format. Which takes an integer array of size NNZ and another array of size N.

Which is taking

  • (450000000 * 4 Bytes) / (1024 * 1024) ~ 1716MB of memory for row storage, and same amount of memory for column storage.

The overall memory to store such structure is about 3.5GB.

I want to compress it in a way such that the memory consumption is reduced and the queries can still be fast. I am looking for some kind of serialization technique or something which can reduce the memory consumption or time complexity.

Any help regarding this is most welcome.

  • Проголосовать: нравится
  • +3
  • Проголосовать: не нравится

»
6 лет назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

You could get away with 3 bytes instead of 4 bytes because the column indices and row indices  < 16000000 < 16777216 = 224.

  • »
    »
    6 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    This would have been great optimisation, but unfortunately the number of rows can increase, It increase at rate 0.1 Million / month. So having 3 bytes would not be good solution after ~10 months.

    • »
      »
      »
      6 лет назад, # ^ |
        Проголосовать: нравится +1 Проголосовать: не нравится

      There are some ways to get around that. You could for example split your matrix into chunks being able to be indexed with 3 bytes, or maybe even 2 bytes.