Problem Statement in English
You’re given a 0-indexed integer array nums and two integers k and x. A subarray is called k-long if its length is equal to k.
The x sum of a k-long subarray is defined as the sum of the x largest distinct elements in the subarray. If there are fewer than x distinct elements in the subarray, the x sum is considered to be 0.
If 2 elements have the same frequency, the larger element is considered larger.
Return an integer array of results for each k-long subarray of nums.
The difference between this problem and its predecessor is that here, n can be up to 10^5, requiring a more efficient solution.
Approach
The main optimisation that we aim to make in this problem compared to its predecessor, is to avoid the repetitive sorting of the frequency map for each window. If you think about it, when we slide the window by one position to the right, only two elements are affected - the element that is removed from the left of the window, and the element that is added to the right of the window. Yet here we are, sorting the entire window again.
Also, we need to be able to calculate the sum of the top x frequent elements efficiently. We can’t be iterating over the top x elements every time.
The idea for this problem, as NeetCode pointed out, comes from another problem - “Find Kth Largest Element in a Stream”. In that problem, a popular 2 heap problem.
In this case, since we need to always have the current x sum, we maintain 2 SortedLists - one for the top x frequent elements, and another for the rest of the elements. We also maintain a variable cachedSum to store the current x sum.
When we add or remove an element from the window, we now need to balance the two lists, and cachedSum.
We check if it is in the top x list or the other list, and adjust accordingly. There’s a bunch of edge cases that need to be handled here.
If there are less than x elements in the top x list, we move elements from the other list to the top x list, if possible.
Similarly, if there are more than x elements in the top x list, we move elements from the top x list to the other list.
And lastly, we need to ensure that the top x list always contains the x most frequent elements. So we compare the smallest element in the top x list with the largest element in the other list, and swap them if necessary.
We keep performing these steps until we’ve balanced the two lists. All while updating cachedSum accordingly.
Solution in Python
class Solution:
def findXSum(self, nums: List[int], k: int, x: int) -> List[int]:
freq = defaultdict(int)
others, topx = SortedList(), SortedList()
res = []
l = cachedSum = 0
def removeTuple(t):
nonlocal cachedSum
if t in others:
others.remove(t)
elif t in topx:
topx.remove(t)
cachedSum -= t[0] * t[1]
if not others:
return
f, v = others.pop()
cachedSum += f * v
topx.add((f, v))
def insertTuple(t):
nonlocal cachedSum
f, v = t
if not f:
return
others.add(t)
while others and topx and ((others[-1][0] > topx[0][0]) or (
others[-1][0] == topx[0][0]
and others[-1][1] > topx[0][1]
)):
bt = others.pop()
xt = topx.pop(0)
cachedSum -= xt[0] * xt[1]
cachedSum += bt[0] * bt[1]
others.add(xt)
topx.add(bt)
while len(topx) > x:
xt = topx.pop(0)
cachedSum -= xt[0] * xt[1]
others.add(xt)
while len(topx) < x and others:
bt = others.pop()
cachedSum += bt[0] * bt[1]
topx.add(bt)
for r in range(len(nums)):
# remove the old tuple of nums[r]:
removeTuple((freq[nums[r]], nums[r]))
freq[nums[r]] += 1
# add the new tuple of nums[r]
insertTuple((freq[nums[r]], nums[r]))
ws = r - l + 1
if ws < k:
continue
elif ws > k:
# remove the old tuple of nums[l]:
removeTuple((freq[nums[l]], nums[l]))
freq[nums[l]] -= 1
# add the new tuple of nums[l]:
insertTuple((freq[nums[l]], nums[l]))
l += 1
res.append(cachedSum)
return res
Complexity
Time: $O(n \log n)$
Since for each of thenelements, we are performing operations onSortedLists which take $O(\log n)$ time.Space: $O(n)$
Since we’re storingnelements at most.
Mistakes I Made
I watched NeetCode’s video on this problem and implemented a similar approach using SortedList to maintain the top x frequent elements in the current window.
And we are done.