Problem Statement in English
You’re given two integer arrays source and target, both of length n, and an array allowedSwaps where allowedSwaps[i] = [a, b] indicates that you are allowed to swap the elements at index a and index b of the source array.
You can swap elements at the given indices any number of times.
Return the minimum Hamming distance between source and target after performing any amount of swap operations on source.
Hamming distance is defined as the number of positions at which the corresponding elements are different.
Approach
The key idea is to treat the allowed swaps as edges in a graph, where the indices of the source array are the nodes. This way, we can find connected components in the graph, which represent groups of indices that can be swapped among themselves.
Once we have the connected components, we can count the frequency of each value in the source and target arrays for each component.
The minimum Hamming distance can then be calculated by comparing the frequency counts of the values in the source and target arrays for each component.
We can use bookkeeping to keep track of the counts of values in the source and target arrays for each component.
For each index in the given arrays, if the integer at that index is different in source and target, we increment the count for that value in the source array and decrement the count for that value in the target array.
Think of the source as a source, and the target as a sink. If the count is positive, it means we have more of that value in the source than in the target, which contributes to the Hamming distance. But if it is negative, it means we have more of that value in the target than in the source. We can count either one and ignore the other, since they are both 2 sides of the same coin. In this solution we will count the positive counts, which represent the excess values in the source that cannot be matched with the target.
I also want to go over the iterative find method for the union find, which is more efficient for large constraints. The way it works is that we first find the root of the node, and then we do path compression by making all nodes on the path point directly to the root. This way, future find operations will be faster.
First we find the root of the node, and until the root of the node is itself, we keep going up the tree. Now we know the root of the node passed into the find method.
We can do path compression by making all nodes on the path point directly to the root. We can do this by iterating through the path again and updating the parent of each node to be the root. Remember to copy the current parent of a node before updating the parent of the node so that once you’re done you can move to the next node in the path.
And we’re done!
Solution in Python
class Solution:
def minimumHammingDistance(self, source: List[int], target: List[int], allowedSwaps: List[List[int]]) -> int:
N = len(source)
parent = list(range(N))
# Iterative find is faster for large constraints
def find(i):
root = i
while parent[root] != root:
root = parent[root]
# Path compression
while parent[i] != root:
next_node = parent[i]
parent[i] = root
i = next_node
return root
def union(i, j):
root_i = find(i)
root_j = find(j)
if root_i != root_j:
parent[root_i] = root_j
for u, v in allowedSwaps:
union(u, v)
count_map = defaultdict(lambda: defaultdict(int))
for i in range(N):
root = find(i)
s_val = source[i]
t_val = target[i]
if s_val != t_val:
count_map[root][s_val] += 1
count_map[root][t_val] -= 1
distance = 0
for root in count_map:
for val in count_map[root]:
if count_map[root][val] > 0:
distance += count_map[root][val]
return distance
Complexity
Time: $O(N + S)$
WhereNis the length of thesourcearray andSis the number of allowed swaps. The union-find operations take near constant time, and we iterate through the arrays a few times.Space: $O(N)$
Since we store the parent array for union-find and the count map for the connected components.
Mistakes I Made
I jumbled up the reconcillation step instead of using the current bookkeeping method.
And we are done.