I have seen some people's code to represent a trie in a 2D array. For example, this is my old code to solve Autocomplete. However, I forgot how it works!
Can someone explain how to code a 2D array that represents a trie? I know that one dimension of the array represents the alphabet, and that the values inside the trie array are the node numbers, where the first letter you add is node number 1. Still, I'm confused where to insert values into the trie array, and how strings with the same prefixes would be distinguished in the tire.
Thanks.
#include <bits/stdc++.h>
#define rd(s) freopen(s, "r", stdin);
#define wt(s) freopen(s, "w", stdout);
using namespace std;
const int MAX=1e6+2, ALF=26+2;
int t, trie[MAX][ALF];
int main()
{
//rd("test.txt");
scanf("%d", &t);
for (int i=1; i<=t; ++i) {
int ans=0, n, numm1=1;
for (int _=0; _<MAX; ++_) for (int __=0; __<ALF; ++__) trie[_][__] = 0;
scanf("%d", &n);
for (int j=0; j<n; ++j) {
int node=0, idx=0;
string word; cin>>word;
while (trie[node][word[idx]-'a'] && idx < word.length()) {
node=trie[node][word[idx]-'a'];
++ans; ++idx;
}
if (idx != word.length()) ++ans;
//printf("ans is %d\n", ans);
for (; idx<word.length(); ++idx) {
trie[node][word[idx]-'a'] = numm1;
//printf("new path from %d to %d via %c\n", node, numm1, word[idx]);
node=numm1;
++numm1;
}
}
printf("Case #%d: %d\n", i, ans);
}
}
Each row in your array represents a node in the trie structure. Node 0 is the root.
When you want to insert a new string, begin at node 0. Let c be the first character of your string, and see if
trie[0][c]
is valid or not. If it is not valid "allocate" a new node by incrementing the size of your trie and settrie[0][c]
to the newly allocated node.Here's a simple example. I'm using -1 to signify that there is no node there.
Initially:
Size of the trie is 1 (because of the root node). Now let's insert string "ABC". Then since
trie[0]['A']
is -1, we need to allocate a new node and settrie[0]['A']
to it. Then our structure isNow we are currently at node 1 and need to insert "BC". Following this procedure twice, our trie looks like
By the way, check out my new standup
What if we want to store 'AB' and 'BB', on the second row don't we need two separation versions of 'B'?
i just have a small doubt is it true that if we implement a trie in this way using predefined sizes for arrays then it is more faster than pointer oriented code ... .
Lets not make it complicated. First, think of how you can represent a tree using an array. If you can do that you can do for trie as well since a trie is also a tree.
Now suppose tree is a binary tree. Normally you represent a tree as Node {int val; Node *l, *r} Where l, r are pointers to the children. Now suppose you don't do dynamic memory allocation and replace l, r with indices of Nodes in array. First you can check whether l points to a children. If not, then you will use an unused Node of our predefined array. Similarly for r.
For a trie, suppose a string has length L. Then in the worst case, for each suffix we need to create a linked list. So total size would be L+L-1+....+1=O(L^2). But if alphabet is just lowercase(a-z) then, some suffixes will have same starting characters and memory will be less than L^2.
how would you know we have reached at the end of the word?
For ex: we need to store 2 strings 1) "abc" 2) "ab"
We'd have to maintain another array of type : bool end[MAX][ALF] for that.