Assume there are 2 patterns P1 and P2 such that P1 is a substring of P2 which is nether prefix nor suffix.Then how does a KMP type on string T work
Eg:
P1=of
P2=sofa
T=sofa
in this if we run aho corasick .will it detect string "of" or not
And also any good tutorial link on it will be apreciated
PS:I havent implemented it yet.I was studying it and I got this doubt as I read about its implementation
Auto comment: topic has been updated by teja349 (previous revision, new revision, compare).
Yes, it will. In Aho-Corasick algorithm you create a trie nodes of which correspond to a prefix of some pattern. And you have suffix links for each node (similar to failure function in KMP). To find all the patterns which can end at a particular position in the string, you keep moving up using the suffix links as long as possible from the node you are currently at.
When traversing the text "sofa", when you reach the third position, the corresponding node in the trie corresponds to "sof". But this node has a suffix link to the node for "of". So when you move up the suffix link from the former to the latter, you will detect pattern P1. :)
But,isnt it like only when there is a mismatch we go to suffix link
so after reaching "sof".when we see "a" is present in continuation "sof" in trie .Then why will we see the suffix link.We only take suffix link at "sofa",am I wrong??
No, you always traverse along suffix links to get all the patterns which end at that position, not only when there is a mismatch. I think what you are referring to (going to suffix link when there is a mismatch), that happens when you want to go the corresponding node in the trie after traversing the next character of the text.
I am not able to underastand what you are saying.i dont get by what u mean traversing all suffix links
Can you provide ur implementation for aho corasick
You should build two different types of upward links: failure links (this always goes to the closest node that is also a valid prefix) and output links. Traversing output links from a node to the root should visit a subset of the nodes that are visited by doing the same with failure links. The sequence of output links should only visit those nodes in which we found an occurrence of a pattern.
Hey, I think the best "tutorial" on Aho-Corasick algorithm is the original paper. Seriously, it is really well written, thorough in explanations, and has detailed pseudocode. Look for "Efficient string matching: An aid to bibliographic search". (This link works for me.)
I apologize for the bump (and I'm sure OP has figured out his issues by now). However, when I was trying to implement Aho-Corasick I had the same concern as OP, ran across this post, and couldn't find an implementation I liked. I was using the description here, but they don't explain how to actually match on the text (which is the cause of confusion for the OP), nor do they have code for jumping to leafs (which is necessary for proper complexity).
So for the sake of what me from 12 hours ago would have wanted, here's my implementation (for 475D, extending upon the emaxx code: https://ideone.com/YIbEn5
It contains the missing parts from emaxx that are needed for a full implementation.
Aho-Corasick just by itself can be found here: https://gist.github.com/Chillee/69141ce899e9bc65291f8b120ee57b90
Here is my implementation of aho-corasick: https://ideone.com/N3hMtK
I learnt it from here: https://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf
Implementation part and some optimizations is well described here: https://e-maxx-eng.appspot.com/string/aho_corasick.html.
Hmm, I don't think your implementation works.
https://ideone.com/1nQiE3
Actually I forgot to initialize the links. Added two extra lines. Here is the updated implementation : https://ideone.com/6FDgeP. Thanks for pointing it out. Hope that it'll work.
One of the links is broken.Please update it.Thanks in advance.
Hmm, I'm not really where it fails, but it's still not working on a aho-corasick problem. http://codeforces.me/contest/963/submission/41003199
Finally realized my mistake. I have made two more changes to your code to get AC. Here's the link : https://ideone.com/pE42ue. Sorry if I caused you any trouble. I don't think there's any more mistakes.
Don't worry about it. The only reason I was even testing out your implementation is because I thought your method of generating leaflinks was nicer than mine :)
There were a couple small things that didn't really make sense to me in your code, so here's my version (it's mostly the same, just with some conditions collapsed): https://ideone.com/v75Jg8