Problem Statement in English

You’re given a string s that represents a DNA sequence. Return all the 1010-letter-long substrings that occur more than once in it.


Approach

We can use a sliding window approach to solve this problem. In addition, we can keep track of the substrings of length 10 that we have seen so far. If we see a substring that we have already seen, we add it to the answer.


Solution in Python


class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        # If the length of the string is less than 10, 
        # we can't have any repeated substrings
        if len(s) < 10:
            # so return an empty list
            return []

        # to store the final answer
        ans = set()

        # to store the substrings of length 10,
        # that we have seen so far
        seen = set()

        # init the left and right pointers
        l, r = 0, 10

        # iterate over the string
        # while we are within range
        while r <= len(s):
            # if we've already seen this substring
            if s[l:r] in seen:
                # add it to the answer
                ans.add(s[l:r])
            else:
                # otherwise, add it to the set of seen substrings
                seen.add(s[l:r])

            # move the window:
            # l by 1
            l += 1
            # r by 1
            r += 1

        # return the final answer
        return list(ans)

Complexity

  • Time: O(n)O(n)
    Since we iterate over nn elements at most.

  • Space: O(n)O(n)
    Since we store nn elements at most in the set.


And we are done.