Interview Question — Longest regex match with single wildcard


By perlentaucher, 9 months ago, in English

Hi there, I was given the following interview question and was wondering if there's a simpler way to solve it:

Question: Given a text and a regular expression that only contains one wildcard character '*' (which matches any string), return the length of the longest matched substring in the text. Return -1 if no answer can be found. 1 <= |Text|, |Regex| <= 10^6.

Example: text = "programming", regex = "r*in". The matches are "rammin", "rogrammin", thus the answer is 9.

My solution: I make a case distinction based on the form of the regex (BEFORE*AFTER, *AFTER, BEFORE*, or *). I then use the Z-function to find the matches of BEFORE and AFTER in the text, and take the first match of BEFORE and the last match of AFTER to calculate the length.

Time and space complexity are O(|Text| + |Regex|), but I think my solution is a bit overcomplicated.
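
A minimal sketch of the approach described above, assuming the regex contains exactly one '*'. The helper names (zFunction, firstOccurrence, longestMatch) are illustrative and not taken from the post. It takes the first occurrence of BEFORE and the last occurrence of AFTER (found as the first occurrence of the reversed AFTER in the reversed text) and rejects the pair if the two overlap. Choosing the earliest BEFORE and the latest AFTER is enough: if that pair overlaps, every other pair does too.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Z-function: z[i] is the length of the longest common prefix of s and s[i..].
std::vector<int> zFunction(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && s[i + z[i]] == s[z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Start index of the first occurrence of pat in text, or -1 if there is none.
// An empty pattern matches at index 0.
int firstOccurrence(const std::string& text, const std::string& pat) {
    const int m = static_cast<int>(pat.size());
    if (m == 0) return 0;
    // No separator is needed: z[i] >= m at a text position already implies a match.
    const std::vector<int> z = zFunction(pat + text);
    for (int i = m; i < static_cast<int>(z.size()); ++i)
        if (z[i] >= m) return i - m;
    return -1;
}

// Assumption: regex contains exactly one '*'.
int longestMatch(const std::string& text, const std::string& regex) {
    const std::size_t star = regex.find('*');
    const std::string before = regex.substr(0, star);
    const std::string after = regex.substr(star + 1);
    const int n = static_cast<int>(text.size());
    const int bLen = static_cast<int>(before.size());
    const int aLen = static_cast<int>(after.size());

    const int start = firstOccurrence(text, before);
    if (start < 0) return -1;

    // Last occurrence of AFTER = first occurrence of reversed AFTER in reversed text.
    const std::string rtext(text.rbegin(), text.rend());
    const std::string rafter(after.rbegin(), after.rend());
    const int rpos = firstOccurrence(rtext, rafter);
    if (rpos < 0) return -1;
    const int afterStart = n - rpos - aLen;

    // AFTER must begin at or after the end of BEFORE.
    if (start + bLen > afterStart) return -1;
    return afterStart + aLen - start;
}
```

Under these assumptions, longestMatch("programming", "r*in") evaluates to 9, in line with the example above.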

interview, regex, string match

Comments (15)

mihajlovskijr

9 months ago

I believe you can also use two pointers to determine the first occurrence of the regex string prefix before * in text and the last occurrence of the regex string suffix after * in text.

My code for this (if it's incorrect I'm sorry and I'd be happy to see tests that it would give the wrong answer to):

Spoiler

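// Assumes a typical contest template: #include <bits/stdc++.h>, using namespace std, and ll = long long.
void solve() {
	string s, r; cin >> s >> r;
	ll n = s.size(), m = r.size();
	ll i = 0, j = n - 1;
	ll c = 0;
	// search for prefix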
	for (ll k = 0; k < n; k++) {
		if (r[c] == '*') break;
		if (s[k] == r[c]) {
			c++;
		}
		else {
			c = 0;
			i = k+1;
		}
	}
	if (i == n) {
		cout << -1 << endl;
		return;
	}
	// search for suffix
	c = m - 1;
	for (ll k = n-1; k >= 0; k--) {
		if (r[c] == '*') break;
		if (s[k] == r[c]) {
			c--;
		}
		else {
			c = m-1;
			j = k - 1;
		}
	}

	cout << max(-1ll, j - i + 1) << endl;
}

perlentaucher

9 months ago (reply)

I think your code fails on the test case text="aabczzzefg", regex="abc*efg". I also first thought about a two-pointer approach, but I was concerned about partial matches.

mihajlovskijr

9 months ago (reply)

Yes, you're right, it doesn't work. I tried to fix it and take into account some other cases (e.g. text = "aaaaaaaa", regex = "r*r"), so here is my current approach (the code only handles the case when we are sure to have *, if we don't have it we can just add a separate check for that case):

Spoiler

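// Assumes a typical contest template: #include <bits/stdc++.h>, using namespace std, and ll = long long.
void solve() {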
	string s, r; cin >> s >> r;
	ll n = s.size(), m = r.size();
	ll i = 0, j = n - 1;
	ll c = 0;
	// search for prefix
	for (ll k = 0; k < n; k++) {
		if (c >= m || r[c] == '*') break;
		if (s[k] == r[c]) {
			c++;
		}
		else {
			c = 0;
			i = k + 1;
			if (s[k] == r[c]) {
				c++;
				i = k;
			}
		}
	}
	// search for suffix
	c = m - 1;
	for (ll k = n - 1; k >= 0; k--) {
		if (c < 0 || r[c] == '*') break;
		if (s[k] == r[c]) {
			c--;
		}
		else {
			c = m - 1;
			j = k - 1;
			if (s[k] == r[c]) {
				c--;
				j = k;
			}
		}
	}
	if (i > j) {
		cout << -1 << endl;
		return;
	}
	cout << j - i + 1 << endl;
}

ujjawal_0P

8 months ago (reply)

It's still failing the last edge test case even after adding the condition you mentioned :)

I don't think I'll get the next-round email T_T (there were two questions; the first one was easy, but this one took time and still didn't pass all the test cases).

mihajlovskijr

8 months ago (reply)

Can I see the test?

Edit: Never mind, I believe that text = "a" (or text = "bab" and similar) with regex = "a*a" gives the wrong answer (it should be -1, not 1). I've already posted too many bad solutions, so I won't post any more, but I think this is the last edge case you should consider.
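
The edge case above amounts to requiring that AFTER starts no earlier than the end of BEFORE. Below is a small reference sketch of that check, intended for cross-checking faster solutions on small inputs. It assumes exactly one '*' in the regex, and longestMatchReference is a hypothetical name. It relies on std::string::find and std::string::rfind, which carry no linear-time guarantee.

```cpp
#include <string>

// Reference check. Assumption: regex contains exactly one '*'.
int longestMatchReference(const std::string& text, const std::string& regex) {
    const std::size_t star = regex.find('*');
    const std::string before = regex.substr(0, star);
    const std::string after = regex.substr(star + 1);

    const std::size_t first = text.find(before);  // earliest start of BEFORE
    const std::size_t last = text.rfind(after);   // latest start of AFTER
    if (first == std::string::npos || last == std::string::npos) return -1;

    // AFTER has to begin at or after the end of BEFORE, otherwise they overlap.
    if (first + before.size() > last) return -1;
    return static_cast<int>(last + after.size() - first);
}
```

With text = "a" or text = "bab" and regex = "a*a", the only occurrence of BEFORE and the only occurrence of AFTER are the same character, so the overlap check returns -1.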

ujjawal_0P

8 months ago (reply)

The test case was hidden, so I can't really be sure!

But don't get demotivated: your approach gave more than enough of an idea about how to tackle this problem, so thanks for posting your solution!

Thanks for the contribution.

Jagnath_Reddy

9 months ago

Use a pointer pL that searches for BEFORE in the given string using the Horspool algorithm.

Use a pointer pR that searches for RETFA (the reversed pattern AFTER ;) ) with the same algorithm, scanning the given string in reverse.

If both patterns are matched, it is easy to find the length of the matched string.

If pL and pR cross, the answer is -1.
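
One way to try this idea without writing the search by hand is the C++17 standard-library searcher std::boyer_moore_horspool_searcher. The sketch below is an illustration, not the commenter's code. It assumes the regex contains exactly one '*', and longestMatchHorspool is a hypothetical name. Reverse iterators play the role of the reversed text and the reversed AFTER. Note that Horspool's worst case is not linear, unlike the Z-function or KMP.

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Requires C++17. Assumption: regex contains exactly one '*'.
int longestMatchHorspool(const std::string& text, const std::string& regex) {
    const std::size_t star = regex.find('*');
    const std::string before = regex.substr(0, star);
    const std::string after = regex.substr(star + 1);

    // pL: first occurrence of BEFORE, scanning left to right.
    const auto pL = std::search(
        text.begin(), text.end(),
        std::boyer_moore_horspool_searcher(before.begin(), before.end()));
    if (pL == text.end() && !before.empty()) return -1;

    // pR: first occurrence of reversed AFTER, scanning right to left.
    const auto pR = std::search(
        text.rbegin(), text.rend(),
        std::boyer_moore_horspool_searcher(after.rbegin(), after.rend()));
    if (pR == text.rend() && !after.empty()) return -1;

    const long long start = pL - text.begin();
    const long long end = text.rend() - pR;  // one past the last matched character
    const long long bLen = static_cast<long long>(before.size());
    const long long aLen = static_cast<long long>(after.size());

    // "pL and pR cross": BEFORE would overlap AFTER.
    if (start + bLen > end - aLen) return -1;
    return static_cast<int>(end - start);
}
```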

siddhuneerukonda

8 months ago

Today I took my Amazon OA (online assessment) and got this question. I solved it using the same idea, but with the KMP algorithm. All test cases passed, and I think it is a good approach to follow.
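
The commenter's code is not shown, so the following is only a sketch of how a KMP-based version might look, assuming exactly one '*' in the regex. The names prefixFunction, kmpFirst, and longestMatchKmp are illustrative. As with the Z-function approach, it uses the first occurrence of BEFORE and the last occurrence of AFTER, the latter found by searching the reversed text for the reversed AFTER.

```cpp
#include <string>
#include <vector>

// Prefix function: pi[i] is the length of the longest proper prefix of
// p[0..i] that is also a suffix of p[0..i].
std::vector<int> prefixFunction(const std::string& p) {
    std::vector<int> pi(p.size(), 0);
    for (std::size_t i = 1; i < p.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && p[i] != p[k]) k = pi[k - 1];
        if (p[i] == p[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Start index of the first occurrence of pat in text, or -1 if there is none.
// An empty pattern matches at index 0.
int kmpFirst(const std::string& text, const std::string& pat) {
    if (pat.empty()) return 0;
    const std::vector<int> pi = prefixFunction(pat);
    const int m = static_cast<int>(pat.size());
    int k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = pi[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == m) return static_cast<int>(i) + 1 - m;
    }
    return -1;
}

// Assumption: regex contains exactly one '*'.
int longestMatchKmp(const std::string& text, const std::string& regex) {
    const std::size_t star = regex.find('*');
    const std::string before = regex.substr(0, star);
    const std::string after = regex.substr(star + 1);

    const int start = kmpFirst(text, before);
    if (start < 0) return -1;

    const std::string rtext(text.rbegin(), text.rend());
    const std::string rafter(after.rbegin(), after.rend());
    const int rpos = kmpFirst(rtext, rafter);
    if (rpos < 0) return -1;

    const int end = static_cast<int>(text.size()) - rpos;  // one past AFTER
    const int afterStart = end - static_cast<int>(after.size());
    // Reject the pair if BEFORE and AFTER overlap.
    if (start + static_cast<int>(before.size()) > afterStart) return -1;
    return end - start;
}
```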

expertcoderhk

8 months ago (reply)

Could you describe your approach?

ujjawal_0P

8 months ago (reply)

Got the same question yesterday XD

thewolfcoder

3 months ago (reply)

Could you please share the code or approach?

GUNNER19_2.0

7 weeks ago (reply)

What is wrong with this? It gives me a TLE, and the error is most probably in the last while loop (I tried debugging but couldn't find out exactly what it is). Could someone clarify?

Spoiler

#include <bits/stdc++.h>
#include <iterator>

using namespace std;


/*
 * Complete the 'getLongestMatch' function below.
 *
 * The function is expected to return an INTEGER.
 * The function accepts following parameters:
 *  1. STRING text
 *  2. STRING regex
 */

int getLongestMatch(string text, string regex) {
    int lex = 0;
    int f = 0;
    int n = regex.size();
    int m = text.size();
    int pos = -1;
    for(int i = 0;i<n;i++)
    {
        if(regex[i] == '*')
        {
            f = i;
            break;
        }
    }
    cout<<f<<endl;
    vector<int> lps(f+1,0);
    lps[0] = 0;
    int i = 0;
    while(i<f)
    {
        if(regex[i] == regex[lex])
        {
            lex++;
            lps[i] = lex;
            i++;
        }
        else
        {
            if(lex != 0)
            {
                lex = lps[lex-1];
            }
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
    i = 0;
    int j = 0;
    int c = 0;
    n = f;
    while(m-i>=n-j )
    {
        c++;
        if(regex[j] == text[i])
        {
            j++;
            i++;
        }
        if(j == n)
        {
            pos = i-j;
            break;
        }
        else if (i<m && regex[j] != text[i]) {
            if(j != 0)
            {
                j = lps[j-1];
            }
            else {
                i++;
            }
        }
    }
    
    
    
    ///////////////////////////////////////
    
    
    string s2 = text;
    reverse(s2.begin(), s2.end());
    string text2 = s2;
    string regex2;
    i=regex.size()-1;
    while(i)
    {
        if(regex[i]!='*')
        {
            regex2.push_back(regex[i]);
            i--;
        }
        else break;
    }
    
    ///
    lex = 0;
    f = regex2.size();
    vector<int> lps2(f+1,0);
    lps2[0] = 0;
    i = 0;
    while(i<f)
    {
        if(regex2[i] == regex2[lex])
        {
            lex++;
            lps2[i] = lex;
            i++;
        }
        else
        {
            if(lex != 0)
            {
                lex = lps2[lex-1];
            }
            else {
                lps2[i] = 0;
                i++;
            }
        }
    }
    i = 0;
    //cout<<i<<endl;
    j = 0;
    n = f; 
    int pos2 = -1;
    
    while(m-i>=n-j)
    {
        c++;
        if(regex2[j] == text2[i])
        {
            j++;
            i++;
        }
        if(j == n)
        {
            pos2 = i-j;
            break;
        }
        else if (i<m && regex2[j] != text2[i]) {
            if(j != 0)
            {
                j = lps2[j-1];
            }
            else {
                i++;
            }
        }
    }
    
    //cout<<regex2<<endl<<text2<<endl;
    pos2=m-pos2-f;
    //cout<<pos<<" "<<pos2<<" "<<f<<endl;
    if(pos==-1 or pos2==-1 or pos>pos2) return -1;
    return pos2 - pos+f;
}
int main()
{
    ofstream fout(getenv("OUTPUT_PATH"));

    string text;
    getline(cin, text);

    string regex;
    getline(cin, regex);

    int result = getLongestMatch(text, regex);

    fout << result << "\n";

    fout.close();

    return 0;
}
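
One plausible cause of the time limit in the code above is the construction of the lps table rather than the last loop itself. The construction starts at i = 0 with lex = 0, so regex[i] is compared with itself, every comparison succeeds, and lps[i] ends up as i + 1. In the search loops, the fallback j = lps[j-1] then leaves j unchanged on a mismatch, so the loop makes no progress. The sketch below reproduces that table next to the usual construction, which starts at i = 1. The function names are illustrative and not part of the submission.

```cpp
#include <string>
#include <vector>

// Table as built in the submission: the scan starts at i = 0, so p[i] is
// compared with itself and every entry becomes i + 1.
std::vector<int> lpsStartingAtZero(const std::string& p) {
    std::vector<int> lps(p.size(), 0);
    std::size_t i = 0;
    int len = 0;
    while (i < p.size()) {
        if (p[i] == p[len]) {
            ++len;
            lps[i] = len;
            ++i;
        } else if (len != 0) {
            len = lps[len - 1];
        } else {
            lps[i] = 0;
            ++i;
        }
    }
    return lps;
}

// Usual construction: lps[0] stays 0 and the scan starts at i = 1, so
// lps[i] <= i and the fallback j = lps[j - 1] always decreases j.
std::vector<int> lpsStandard(const std::string& p) {
    std::vector<int> lps(p.size(), 0);
    std::size_t i = 1;
    int len = 0;
    while (i < p.size()) {
        if (p[i] == p[len]) {
            ++len;
            lps[i] = len;
            ++i;
        } else if (len != 0) {
            len = lps[len - 1];
        } else {
            lps[i] = 0;
            ++i;
        }
    }
    return lps;
}
```

For the pattern "ab", the first function yields {1, 2} while the second yields {0, 0}. Two further points in the submission may be worth checking: pos2 is overwritten by m - pos2 - f before the pos2 == -1 test, so a missing AFTER is never detected, and the comparison pos > pos2 does not account for the length of BEFORE, so overlapping matches such as text = "a", regex = "a*a" are accepted.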
