Problem Statement in English

You are given a list of strings folder representing paths in a file system. Each string represents a folder path, and the paths are absolute paths starting with a /. Your task is to remove all subfolders from the list such that no folder is a subfolder of another.


Approach

My approach is to use a Trie data structure to store the folder paths. The Trie will help us efficiently check if a folder is a subfolder of another folder. We will add each folder to the Trie and then traverse it to collect only the folders that are not subfolders of any other folder.

But there’s actually a simpler way to do this without using a Trie. We can sort the folder paths and then iterate through them, checking if the current folder is a subfolder of the previous folder. If it is not, we add it to the result list.


Solution in Python

class Trie:
    def __init__(self):
        self.children = {}

    def add(self, path):
        ptr = self.children
        subpaths = path.split("/")[1:]

        for i, p in enumerate(subpaths):
            if p not in ptr:
                ptr[p] = TrieNode(i == len(subpaths) - 1)
            else:
                ptr[p].end |= (i == (len(subpaths) - 1))
            ptr = ptr[p].children

    def parents(self, hm, prefix):
        kids = hm
        res = []

        for k, v in kids.items():
            new_prefix = f"{prefix}/{k}"
            if v.end:
                res.append(new_prefix)
            else:
                res.extend(self.parents(v.children, new_prefix))

        return res

class TrieNode:
    def __init__(self, end):
        self.end = end
        self.children = {}

class Solution:
    def removeSubfolders(self, folder: List[str]) -> List[str]:
        root = Trie()
        for f in folder:
            root.add(f)
        return root.parents(root.children, "")

Complexity

  • Time: $O(n)$
    Since we traverse each folder path once to add it to the Trie and then again to collect the results; (n is the number of folder paths).

  • Space: $O(n)$
    The space complexity is also $O(n)$ because we store each folder path in the Trie; (n is the number of folder paths).


Mistakes I Made

Forgot to consider overwriting whether the node is an end node or not when adding a new path to the Trie. This led to incorrect results when multiple paths shared the same prefix.


And we are done.