Problem Statement in English
You’re given a string s that represents a DNA sequence. Return all the -letter-long substrings that occur more than once in it.
Approach
We can use a sliding window approach to solve this problem. In addition, we can keep track of the substrings of length 10 that we have seen so far. If we see a substring that we have already seen, we add it to the answer.
Solution in Python
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
# If the length of the string is less than 10,
# we can't have any repeated substrings
if len(s) < 10:
# so return an empty list
return []
# to store the final answer
ans = set()
# to store the substrings of length 10,
# that we have seen so far
seen = set()
# init the left and right pointers
l, r = 0, 10
# iterate over the string
# while we are within range
while r <= len(s):
# if we've already seen this substring
if s[l:r] in seen:
# add it to the answer
ans.add(s[l:r])
else:
# otherwise, add it to the set of seen substrings
seen.add(s[l:r])
# move the window:
# l by 1
l += 1
# r by 1
r += 1
# return the final answer
return list(ans)
Complexity
Time:
Since we iterate over elements at most.Space:
Since we store elements at most in the set.
And we are done.