How to represent a trie in a 2D array?

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	156
6	Qingyu	153
7	djm03178	152
7	adamant	152
9	luogu_official	150
10	awoo	147

I have seen some people's code to represent a trie in a 2D array. For example, this is my old code to solve Autocomplete. However, I forgot how it works!

Can someone explain how to code a 2D array that represents a trie? I know that one dimension of the array represents the alphabet, and that the values inside the trie array are the node numbers, where the first letter you add is node number 1. Still, I'm confused where to insert values into the trie array, and how strings with the same prefixes would be distinguished in the tire.

Thanks.

#include <bits/stdc++.h>
#define rd(s) freopen(s, "r", stdin);
#define wt(s) freopen(s, "w", stdout);
using namespace std;

const int MAX=1e6+2, ALF=26+2;
int t, trie[MAX][ALF];

int main()
{
    //rd("test.txt");

    scanf("%d", &t);
    for (int i=1; i<=t; ++i) {
        int ans=0, n, numm1=1;
        for (int _=0; _<MAX; ++_) for (int __=0; __<ALF; ++__) trie[_][__] = 0;
        scanf("%d", &n);
        for (int j=0; j<n; ++j) {
            int node=0, idx=0;
            string word; cin>>word;
            while (trie[node][word[idx]-'a'] && idx < word.length()) {
                node=trie[node][word[idx]-'a'];
                ++ans; ++idx;
            }
            if (idx != word.length()) ++ans;
            //printf("ans is %d\n", ans);
            for (; idx<word.length(); ++idx) {
                trie[node][word[idx]-'a'] = numm1;
                //printf("new path from %d to %d via %c\n", node, numm1, word[idx]);
                node=numm1;
                ++numm1;
            }
        }
        printf("Case #%d: %d\n", i, ans);
    }
}

Comments (7)

Show archived | Write comment?

arknave

8 years ago, # |

Each row in your array represents a node in the trie structure. Node 0 is the root.

When you want to insert a new string, begin at node 0. Let c be the first character of your string, and see if trie[0][c] is valid or not. If it is not valid "allocate" a new node by incrementing the size of your trie and set trie[0][c] to the newly allocated node.

Here's a simple example. I'm using -1 to signify that there is no node there.

Initially:

0: -1 -1 -1 ...
1: -1 -1 -1 ...
2: -1 -1 -1 ...
3: -1 -1 -1 ...

Size of the trie is 1 (because of the root node). Now let's insert string "ABC". Then since trie[0]['A'] is -1, we need to allocate a new node and set trie[0]['A'] to it. Then our structure is

0:  1 -1 -1 ...
1: -1 -1 -1 ...
2: -1 -1 -1 ...
3: -1 -1 -1 ...

Now we are currently at node 1 and need to insert "BC". Following this procedure twice, our trie looks like

0:  1 -1 -1 ...
1: -1  2 -1 ...
2: -1 -1  3 ...
3: -1 -1 -1 ...

→ Reply

wag

8 years ago, # ^ |

← Rev. 2 →

By the way, check out my new standup

trentinitaco

18 months ago, # ^ |

What if we want to store 'AB' and 'BB', on the second row don't we need two separation versions of 'B'?

XGptw

i just have a small doubt is it true that if we implement a trie in this way using predefined sizes for arrays then it is more faster than pointer oriented code ... .

bhikkhu

Lets not make it complicated. First, think of how you can represent a tree using an array. If you can do that you can do for trie as well since a trie is also a tree.

Now suppose tree is a binary tree. Normally you represent a tree as Node {int val; Node *l, *r} Where l, r are pointers to the children. Now suppose you don't do dynamic memory allocation and replace l, r with indices of Nodes in array. First you can check whether l points to a children. If not, then you will use an unused Node of our predefined array. Similarly for r.

For a trie, suppose a string has length L. Then in the worst case, for each suffix we need to create a linked list. So total size would be L+L-1+....+1=O(L^2). But if alphabet is just lowercase(a-z) then, some suffixes will have same starting characters and memory will be less than L^2.

shadycoderr

5 years ago, # |

how would you know we have reached at the end of the word?

For ex: we need to store 2 strings 1) "abc" 2) "ab"

mujtaba1747

5 years ago, # ^ |

We'd have to maintain another array of type : bool end[MAX][ALF] for that.

maximaxi's blog