rembocoder's blog

By rembocoder, history, 4 years ago, translation, In English

Hi. I want to share a simple method for coordinate compression. Even noob can write it.

vector<int> a(n);
// read the vector
vector<int> b = a;
sort(b.begin(), b.end());
map<int, int> m;
for (int i = 0; i < n; i++) {
    m[b[i]] = i;
}
for (int i = 0; i < n; i++) {
    a[i] = m[a[i]];
}

Now every value of an array lies in [0, n). The most convineint it that if you need the original value for a[i], you can just write b[a[i]].

And how do you write it?

  • Vote: I like it
  • +21
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it +15 Vote: I do not like it

I think when you are initializing the map, you need to use m[b[i]] instead of m[a[i]], right?

»
4 years ago, # |
  Vote: I like it +301 Vote: I do not like it
vector<int> d = a;
sort(d.begin(), d.end());
d.resize(unique(d.begin(), d.end()) - d.begin());
for (int i = 0; i < n; ++i) {
  a[i] = lower_bound(d.begin(), d.end(), a[i]) - d.begin();
}
//original value of a[i] can be obtained through d[a[i]]
  • »
    »
    4 years ago, # ^ |
    Rev. 5   Vote: I like it +87 Vote: I do not like it

    +1, this method is faster because it doesn't use sets/maps

    The one from the blog doesn't map to values 0,1,2,3,... but instead, it leaves some gaps. This might be confusing while debugging.

    EDIT: It might be even faster to sort pairs (value, index) to avoid binary search. This requires iterating the sorted sequence and doing nxt++ when we have value different than previous value. Longer implementation, better runtime (I think).

    int n = a.size();
    vector<pair<int, int>> pairs(n);
    for(int i = 0; i < n; ++i) {
    	pairs[i] = {a[i], i};
    }
    sort(pairs.begin(), pairs.end());
    int nxt = 0;
    for(int i = 0; i < n; ++i) {
    	if(i > 0 && pairs[i-1].first != pairs[i].first) nxt++;
    	a[pairs[i].second] = nxt;
    }
    
    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +1 Vote: I do not like it

      I know my method isn't fast. I just think it's very easy to comprehend for beginners.

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 4   Vote: I like it 0 Vote: I do not like it

      Thanks for your idea, it is greate, here is my "short" template implementation from your code UwU

      #define fi first
      #define se second
      template<typename T>
      void compress(vector<T> &a)
      {
          vector<pair<T, int> > M;
          for (int i = 0; i < a.size(); ++i)
              M.emplace_back(a[i], i);
          sort(M.begin(), M.end());
      
          a[M[0].se] = 0; /// Assign first value with zero
          for(int i = 1; i < a.size(); ++i) /// Iterate the rest and increase when adj are different
              a[M[i].se] = a[M[i - 1].se] + (M[i - 1].fi != M[i].fi); 
      }
      
      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I don't claim that [N lower bounds] are slower than [sort]. I claim that sort + lower bounds are together slower than sort (even if it's sort on pairs).

        Also, it's faster to choose the size of vector M while creating it, instead of doing emplace_back many times.

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
          Rev. 8   Vote: I like it +10 Vote: I do not like it

          Yes sir, thanks for pointing my mistakes

          And yes sir, but I just afraid that some class might not have constructor.

          Here are my fixed versions

          Reserve + Emplace back
          Assign directly
          When it is not good to create clone value of array
          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
            Rev. 2   Vote: I like it 0 Vote: I do not like it

            Is there anything in C++ that can be compared with < and != and doesn't have a constructor? Even if you create your own class with some int/string attributes, they are 0/empty by default. So it's only if you create a class with some custom constructor with arguments? Seems very rare in CP.

            Btw. I don't think something so simple should be templated and copy-pasted to use during a contest. In the long run, you are better off by just implementing it every time. This way you get to know an algorithm well and you can modify it.

            • »
              »
              »
              »
              »
              »
              »
              4 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Yes sir, it is really just very small chances for random cases when I read other solutions with non-constructor custome class/struct, but/and all of them also dont have comparators

            • »
              »
              »
              »
              »
              »
              »
              4 years ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              Can I ask for a question related to the implementations ?

              Yesterday I use priority_queue with value comparator and it is amolst TLE while using position comparator give TLE. Why using position are slower

              struct value_comparator {
                  bool operator < (const myClass &A, const myClass &B) const {
                      return A > B;
                  }  
              };
              
              vector<myClass> A;
              struct position_comparator {
                  bool operator < (int i, int j) const {
                      return A[i] > A[j];
                  }  
              };
              

              And is it good to use my third implementation of coordinate compression than my other two ? (comparing using position instead of directly compare the values)

              • »
                »
                »
                »
                »
                »
                »
                »
                4 years ago, # ^ |
                  Vote: I like it +3 Vote: I do not like it

                Accessing something in a separate array costs time and it isn't cache-efficient. It's better to have a value here.

                And is it good to use my third implementation of coordinate compression than my other two?

                Just run some tests and see which version is faster.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  4 years ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Oh thanks sir, I thought using index will reduce the amount of extra memory but now I know sometimes cache miss is really a problem

            • »
              »
              »
              »
              »
              »
              »
              4 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              Yes sir, but instead of copying the template to every problems, I usually only write the template<> when I face such problem, since the algorithm is simple but it is still a bunch of lines of code if we use many times :V

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      That's nice code. It help me. Thank you!

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Better if(!m.count(b[i]) m[b[i]]=i;

»
4 years ago, # |
Rev. 3   Vote: I like it +4 Vote: I do not like it

My way

template<typename T>
void compress(vector<T> &a)
{
    int order = -1;
    set<T> S;
    map<T, int> M;
    for (T  x : a) S.insert(x);    /// Taking all unique elements
    for (T  x : S) M[x] = ++order; /// Assign value for unique elements
    for (T &x : a) x = M[x];       /// Assign value for array
}

Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Notice that by using set, this would work faster when the number of duplicates higher, but will work slower when all array are unique

    I came across a problem on CSES Salary Queries where Code using Set+Map for compression resulted in TLE, and Code using Vector+Sort+Map gave AC.

    PS: The Time-Limit of the problem is 1s.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    you don't need sets

    map<T, int> mp;
    int cnt = 0;
    for (auto v : a) mp[v];
    for (auto& e : mp) e.second = cnt++;
    for (auto& v : a) v = mp[v]; 
    
    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Thanks for giving me a new idea

      I forgot auto method so I thought it would be long to use something like

      range based loop
      iterator

      but also I think I can use unordered_map to assigning value faster (if hash function for specific class then it would be more worse)

      unordered_map
»
4 years ago, # |
  Vote: I like it +11 Vote: I do not like it

I write it the same way as you

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Can you suggest some problems based on this cool trick?

  • »
    »
    4 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Many algorithms only need the order of the number compare to each other and not they value, so something like {$$$1, 4, 9, 16$$$} can be use as {$$$1, 2, 3, 4$$$} with return the same result (smaller value, faster to do). So when you face some problem like that, we can compress the array down to smaller (since the limitation for $$$n$$$ usually much smaller than maximum value of $$$a_i$$$). One algorithm usually use this trick is Sweep Line

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Well, for example 61E - Противник слаб. And in general it is used when you want to store a set of numbers with a segment tree.

»
4 years ago, # |
Rev. 2   Vote: I like it -13 Vote: I do not like it
set<int> co;
// Put all vals in the set
map<int,int> cocom;
int cocnt=1;
for(int x:co)
  cocom[x] = cocnt++;
// cocom[x] = compressed coordinate of x
»
3 years ago, # |
  Vote: I like it -20 Vote: I do not like it

I use this template:

struct COORDINATE_COMPRESSION
{
    typedef long long ll;

    vector<ll> vect;
    bool is_compress = true;

    int size()
    {
        if(!is_compress)compress();
        return vect.size();
    }

    void clear()
    {
        vect.clear();
        is_compress = true;
    }

    void insert(ll x)
    {
        vect.push_back(x);
        is_compress = false;
    }

    void compress()
    {
        sort(vect.begin(), vect.end());
        vect.resize(unique(vect.begin(), vect.end()) - vect.begin());
        is_compress = true;
    }

    vector<ll> compress_offline(vector<ll> vect)
    {
        if(!vect.size())return vect;

        vector<pair<ll,int>> vvv;

        for(int i = 0 ; i < vect.size() ; i++)
        {
            vvv.push_back({vect[i],i});
        }

        sort(vvv.begin(), vvv.end());

        int cont = 0;
        ll last = vvv[0].first;

        vect[vvv[0].second] = 0;

        for(int i = 1 ; i < vvv.size() ; i++)
        {
            if(vvv[i].first != last)
            {
                cont++;
                last = vvv[i].first;
            }
            vect[vvv[i].second] = cont;
        }

        return vect;
    }

    int get(ll x)
    {
        if(!is_compress)compress();
        int pos = lower_bound(vect.begin(), vect.end(), x) - vect.begin();
        assert(pos != vect.size() && vect[pos] == x);
        return pos;
    }

    ll iget(int x)
    {
        if(!is_compress)compress();
        assert(0 <= x && x < vect.size());
        return vect[x];
    }
};