How to solve Advent of Code 2022 - Day 7 with Python

How to solve Advent of Code 2022 - Day 7 with Python

If you missed any previous days, click here for all my content about that: Advent of Code, if you want to know why you should participate and try to code the solution for yourself, click here: Advent Of Code 2022 – 7 Reasons why you should participate. If you're here to see the solution of Day 7, continue reading ;)

GitHub Repository

https://github.com/GalaxyInfernoCodes/Advent_Of_Code_2022

I will upload all of my solutions there - in the form of Python (or Scala alternatives) notebooks. You have to use your own input for the "input.txt" since everyone gets their own and everyone needs to provide a different solution based on their input.

Day 7 Puzzle

On day 7 of Advent of Code, we had to build a tree of a filesystem that we get knowledge of through recorded terminal commands, such as this:

$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

Which would generate the following tree:

- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)

Part 1

Here we then had to calculate the size of the folders to then add up the ones with a total size with less than 100,000. Now you could do this in a shortcut-way without reconstructing the complete tree. However, I chose to train my data structures and build that tree from top to bottom - or rather, from root to leaves.

Tree Nodes

First off, I created a class for the entries in the tree. This would either be a file or a directory, indicated by a boolean value and if it's a file, it has a size.

At the start the node does not have children or a parent. You can however add a child through the method **add_child** which assigns the child a parent before adding it to the children list.

Then I calculated the size recursively, always adding the size of the children to the current size until I arrive at a file (which by its nature does not have children), in which case I return just the file size.

I also included a print method for the tree. Frankly because debugging without being able to see the tree was nearly impossible.

class TreeNode:
    def __init__(self, is_dir: bool, name, size=None) -> None:
        self.is_dir = is_dir
        self.name = name
        self.size = size
        self.children = []
        self.parent = None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def get_size(self):
        if self.is_dir:
            total_size = 0
            for child in self.children:
                total_size += child.get_size()
            return total_size
        else:
            return self.size

    def print_children(self, level):
        if self.is_dir:
            print('--' * level + self.name + ' (total=' + str(self.get_size()) +')')
        else: 
            print('--' * level + self.name + ' (file=' + str(self.get_size()) +')')
        if len(self.children) > 0:
            for child in self.children:
                child.print_children(level+1)

Tree

I also wrote a class for the tree. You don't have to do that, most of the methods are one liners and its main purpose is keeping track of the current directory/node. You could easily do that in your main loop.

class Tree:
    def __init__(self) -> None:
        self.root = TreeNode(is_dir=True, name="root")
        self.current = self.root 

    def reset_to_root(self):
        self.current = self.root

    def go_up_one_level(self):
        self.current = self.current.parent

    def go_to_child(self, name):
        self.current = list(filter(lambda child: child.name == name, self.current.children))[0]

    def add_new_child(self, child):
        self.current.add_child(child)

Parsing the tree

There are a few different cases of lines:

  • '$ cd /' which returns us to the root directory
  • '$ ls' which starts the sequence of listing files and directories within the current ones
    • here I use a while loop to catch all of those and add them as children
  • '$ cd ..' simply goes up to the parent directory/node of the current one
  • '$ cd <name>' which enters the given sub-directory
    • side note: thankfully the (or my) input never tries to enter directories that you didn't previously see
with open('example.txt', 'r') as f:
    lines = f.readlines()
    lines = [entry.strip() for entry in lines]

tree = Tree()

while len(lines) > 0:
    line = lines.pop(0)
    if line == '$ cd /': 
        tree.reset_to_root()
    elif line == '$ ls':
        while len(lines)>0 and '$' not in lines[0]:
            line = lines.pop(0)
            size, name = line.split(' ')
            if size.isdigit():
                new_node = TreeNode(is_dir=False, name=name, size=int(size))
            else:
                new_node = TreeNode(is_dir=True, name=name)
            tree.add_new_child(new_node)
    elif line == '$ cd ..':
        tree.go_up_one_level()
    elif '$ cd' in line:
        _, _, name = line.split(' ')
        tree.go_to_child(name)

Putting it together

This is the method (within the TreeNode class), that solves Part 1:

def find_subdirectories_part1(self):
        dir_sizes = 0
        if self.is_dir:
            for child in self.children:
                if child.is_dir and child.get_size() <= 100000:
                    dir_sizes += child.get_size() + child.find_subdirectories_part1()
                else:
                    dir_sizes += child.find_subdirectories_part1()
        return dir_sizes

Here we recursively go through the tree building up a sum of file sizes as we go. When iterating through a nodes children, we only add the current directory size to our sum, if it's below the 100,000 threshold. Otherwise we just look deeper in the tree.

# call on root node to solve
tree.root.find_subdirectories_part1()

Part 2

After building up a whole tree in Part 1, Part 2 is fairly quick after reading the assignment. Basically, we are looking for a directory that is at least as big as 30000000 - (70000000 - tree.root.get_size()).

To do that I created a similar method as in Part 1. This time, I'm not building a sum, but instead collecting all the possible directory sizes in an array.

# method in TreeNode class
def find_subdirectories_part2(self, min_size):
        dir_sizes = []
        if self.is_dir:
            for child in self.children:
                if child.is_dir and child.get_size() >= min_size:
                    dir_sizes += [child.get_size()] + child.find_subdirectories_part2(min_size)
                else:
                    dir_sizes += child.find_subdirectories_part2(min_size)
        return dir_sizes

The instruction was the smallest sufficiently large directory, so finally I select the minimum of all directories found above.

total_space = 70000000
space_needed = 30000000
current_empty_space = 70000000 - tree.root.get_size()
possible_dirs = tree.root.find_subdirectories_part2(space_needed - current_empty_space)
min(possible_dirs)

Conclusion

I'm tired. I did this way too late.