Problem Statement in English

You’re give a binary string s (a string consisting only of ‘0’s and ‘1’s). A substring is considered to have dominant ones if the number of ‘1’s in the substring is at least the square of the number of ‘0’s in that substring.

Return the number of substrings of s that have dominant ones.


Approach

Right off the bat, definitely check out NeetCode’s video for this. The pen and paper sketch, and dry run he does makes this much easier to understand.

We build on the naive brute force solution and bring down the time complexity from O(n2)O(n^2) to O(n×n)O(n \times \sqrt{n}) using some precomputation and optimized iteration.

The brute force solution would be to consider all substrings, count the number of zeroes and ones in each substring, and check if the number of ones is at least the square of the number of zeroes. This would take O(n2)O(n^2) time since there are O(n2)O(n^2) possible substrings.

To optimize this, we can precompute the next occurrence of ‘0’ for each index in the string. This allows us to quickly jump to the next zero when iterating through substrings. The reason we know that the inner loop runs at most O(n)O(\sqrt{n}) times is that for each substring, the number of zeroes must be at most the square root number of ones - meaning that there can only be n\sqrt{n} zeroes before the condition fails, and we move l to the next index.

So that was how we actually move the r pointer in the inner loop. The next question is how we count the number of substrings that satisfy the condition, given a fixed l and r, and also next_z. Remember that it’s l <= r < next_z.

The reason we use next_z is that at the next zero is the where we definitely have to re-evaluate the condition, because of the addition of a zero.

There are 2 cases that can arise, assuming that the threshold condition is satisfied (i.e., ones>=zeroes2\text{ones} >= \text{zeroes}^2):

  • The stuff in between r and next_z are all extras. This means that we can add next_z - r to the result.

  • The stuff in between r and next_z are not all extras - some of them help our current window reach the threshold. This means that we can only add ones - thresh + 1 to the result, i.e., the surplus.


Solution in Python


class Solution:
    def numberOfSubstrings(self, s: str) -> int:
        N = len(s)

        # initially make the next zero always be out of bounds
        next_zero = [N] * N
        for i in reversed(range(N - 1)):
            # if next character is zero, then next zero is that character
            # else, it's whatever the next character's next zero is
            next_zero[i] = i + 1 if s[i + 1] == "0" else next_zero[i + 1]

        res = 0
        for l in range(N):
            # check if current l pointer value is zero
            zeroes = 1 if s[l] == "0" else 0
            # start r at l
            r = l

            # either of the following conditions are fine
            # 
            # zeroes <= sqrt(N)
            # pow(zeroes, 2) <= N zeroes
            while pow(zeroes, 2) <= N:
                # find the next zero from r
                next_z = next_zero[r]
                # count the number of ones between l and next_z
                ones = (next_z - l) - zeroes

                # calculate the threshold
                thresh = pow(zeroes, 2)

                # check if the condition is satisfied
                if ones >= thresh:
                    # add to result
                    res += min(next_z - r, ones - thresh + 1)

                # we're going to the next zero, 
                # so there will be one more zero in the substring
                zeroes += 1
                # move r to next zero
                r = next_z
                
                # break if r is out of bounds
                if r == N:
                    break

        # return the result
        return res

Complexity

  • Time: O(n×n)O(n \times \sqrt{n})
    Since for each starting index, we iterate through possible substrings with increasing zeroes, which is limited by the quadratic growth of ones.

  • Space: O(n)O(n)
    Since we use only a few extra variables and an array of size n.


Mistakes I Made

I had to watch NeetCode’s video solution


And we are done.