Hello everybody,
Recently I faced yet another problem and I thought asking the codeforces community is the best decision.
Lately I've been learning Ukkonen's algorithm for linear building of suffix trees. However as I was reading some applications I noticed that often you have to build a generalized suffix tree of usually 2, but sometimes more strings. In the papers about Ukkonen's algorithm I managed to find only how to build a single suffix tree.
Now my initial thought was to build a suffix tree for each string and then try to merge them in linear time, but that seemed like a very annoying to implement idea and looking around the web many people said that Ukkonen's algorithm can be used to produce generalized suffix tree.
I was wondering if someone could outline a solution that keeps the structure and suffix links correct and builds a generalized suffix tree based on Ukkonen's algorithm.
Thanks in advance! :)
The "standard" (as far as I know) method is to concatenate all the strings together, separated with some special characters. Say, for two strings s1 and s2, run Ukkonen's on s1$s2.
True but if we have "abc#" and "def$" concatenating them we get "abc#def$" which includes suffixes such as "c#def$". In most generalized suffix trees such suffixes are not present, but I guess it won't really worsen the structure, so that's a solution I will have in mind.
Edit: Meant to put the comment as a reply to gawry
I think you can just remove the vertices which have separator characters as a ancestor.
I'm quite inexperienced with suffix trees, so are you sure that this transformation will keep all the suffix links correct?
What's the problem with suffix links?
We remove some subtrees with all it's outgoing suffix links.
On the other side, if the path from root to vertex does not contain separator then the path from root to suffix link does not contain it too, since it is suffix of our substring.
Well that sounds good enough, thanks a lot :)
You are welcome :)