Walking with filesystems: Go's new fs.FS interface

To understand recursion, you must first understand recursion.

—Traditional

The new io/fs package introduced in Go 1.16 gives us a powerful new way of working with filesystems: that is, trees of files. In fact, the fs.FS interface can be used with more than just files: it abstracts the idea of a path-value map.

Introducing io/fs

In principle, any set of objects that can be addressed by a hierarchy of pathnames can be represented by an fs.FS. A tree of disk files is the obvious example, but if we design our program to operate on an fs.FS value, it can also process ZIP and tar archives, Go modules, arbitrary JSON, YAML, or CUE data, or even Web resources addressed by URLs.
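To make the abstraction a little more concrete, here is a minimal sketch of a function that works on any fs.FS. The interface itself requires only a single method, Open(name string) (fs.File, error); helpers such as fs.ReadDir build on top of it. The names topLevel and demoFS, and the file names used, are hypothetical, chosen purely for illustration, and the in-memory fstest.MapFS stands in for whatever real data source you might have.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// topLevel returns the names of the entries at the root of any filesystem.
// It knows nothing about where the data actually lives.
func topLevel(fsys fs.FS) ([]string, error) {
	entries, err := fs.ReadDir(fsys, ".")
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

// demoFS builds a tiny in-memory filesystem (hypothetical contents).
func demoFS() fs.FS {
	return fstest.MapFS{
		"go.mod":      {},
		"cmd/main.go": {},
	}
}

func main() {
	names, err := topLevel(demoFS())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Because topLevel accepts the interface rather than a disk path, the same function would work unchanged on a directory, an archive, or any other implementation.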

Walk with me, then, as we take a tour of the new io/fs package, the fs.FS interface in particular, and the power of the filesystem abstraction.

A simple file counter

Suppose we have been tasked with writing a tool that will count the number of Go source files contained in some tree (for example, a project repository).

Opening a tree of files, addressed by some path, is straightforward. We can do this by calling os.DirFS:

fsys := os.DirFS("testdata/tree")

Walking the tree

Now, how do we walk this tree? In other words, how do we recursively traverse each folder within the tree, and visit every file, no matter how deeply nested?

The fs.WalkDir function does exactly this. It takes a filesystem and some starting path within it, and recursively walks the tree, visiting every file and folder (in lexical order; that is, alphabetically).

For each one it finds, it calls a function that you provide, passing it the pathname, an fs.DirEntry describing the entry, and any error encountered on the way. For example:

var count int
fsys := os.DirFS("testdata/tree")
fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
    if filepath.Ext(p) == ".go" {
        count++
    }
    return nil
})
fmt.Println(count)

(Listing findgo/2)
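The listing above ignores the err parameter and the value returned by fs.WalkDir, which is fine for a proof of concept. A slightly more defensive variant might look like the sketch below. It is only one possible approach: the function name countGo, the choice to skip a folder named vendor, and the sample tree are all assumptions for illustration. It uses path.Ext, since names within an fs.FS are always slash-separated, and it runs against an in-memory fstest.MapFS so that it needs no files on disk.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// countGo counts the files ending in ".go" within fsys, skipping any
// directory named "vendor" (a hypothetical rule, for illustration only).
func countGo(fsys fs.FS) (int, error) {
	count := 0
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			// Something could not be read: stop the walk and report it.
			return err
		}
		if d.IsDir() {
			if d.Name() == "vendor" {
				// fs.SkipDir tells WalkDir not to descend into this folder.
				return fs.SkipDir
			}
			return nil
		}
		if path.Ext(p) == ".go" {
			count++
		}
		return nil
	})
	return count, err
}

// demoTree is a small in-memory tree with hypothetical contents.
func demoTree() fs.FS {
	return fstest.MapFS{
		"main.go":           {},
		"pkg/util.go":       {},
		"docs/readme.md":    {},
		"vendor/lib/lib.go": {},
	}
}

func main() {
	n, err := countGo(demoTree())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n)
}
```

Returning a non-nil error from the callback stops the walk, and fs.WalkDir hands that error back to its caller, so nothing is silently lost.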

A file-finding tree-walker

It looks like using a filesystem and fs.WalkDir will work for our file-finding program, so let’s see how to turn it into a full-fledged, well-tested Go package.

To do that, let’s expand our ambitions a bit. Counting files can be useful, but it seems a shame to go to all the trouble of finding the files, only to throw away everything but the number of files we found.

Suppose users wanted to get a list of those files; well, it’s bad luck for them, if all they have is the value of count. They’d have to walk the tree all over again.

On the other hand, if we have the list of files, it’s very easy to count them: just use the built-in len function. Finding files is the more general problem, so let’s try to solve that in a useful way.

As usual, let’s first think about the main function we’d like to write, with absolutely minimal paperwork. Something like this would be nice:

func main() {
    paths := findgo.Files(os.Args[1])
    for _, p := range paths {
        fmt.Println(p)
    }
}

(Listing findgo/3)

It wouldn’t actually be that simple, in practice, since we’d need to check that os.Args[1] exists, report errors, and so on. But the CLI isn’t the point of this example, so let’s take it as read for now, and see how findgo.Files would work.

It would need to take the pathname of some folder as its argument, and it would walk the tree rooted at that folder finding Go files, in the way that we’ve already done as a proof of concept. Let’s write a test for that.

func TestFilesCorrectlyListsFilesInTree(t *testing.T) {
    t.Parallel()
    want := []string{
        "file.go",
        "subfolder/subfolder.go",
        "subfolder2/another.go",
        "subfolder2/file.go",
    }
    got := findgo.Files("testdata/tree")
    if !cmp.Equal(want, got) {
        t.Error(cmp.Diff(want, got))
    }
}

(Listing findgo/3)

We’ll copy our example tree of files into testdata/tree so the test has something to work on. So the test is saying that if we call Files with this path, in which there are four Go files, it should return the expected slice of strings. Over to you to make this work.

GOAL: Implement Files.

Well, we’ve already more or less done it, haven’t we? We can take the code from our main.go proof of concept and move it straight into the findgo package. All we need to change is that, instead of incrementing a counter every time we find a file, we append its path to a slice.

func Files(path string) (paths []string) {
    fsys := os.DirFS(path)
    fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
        if filepath.Ext(p) == ".go" {
            paths = append(paths, p)
        }
        return nil
    })
    return paths
}

(Listing findgo/3)

Excellent! The program works perfectly on our little test tree. But we can imagine that a program with more complicated logic might run into problems, especially in large and complicated filesystems. How could we test cases like that?

The fstest.MapFS type is a neat way to test code that traverses filesystems, without needing any disk access. Instead, it’s an fs.FS that lives entirely in memory, based on a Go map.
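As a quick sketch of what a MapFS can hold, the example below gives the files some contents via the Data field of fstest.MapFile, then reads one back with fs.ReadFile. It also calls fstest.TestFS, which checks that a filesystem behaves as an fs.FS should and that the named files are present. The helper names (demoFS, checkDemoFS, readNotes) and the file contents are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// demoFS returns an in-memory filesystem whose files have contents.
func demoFS() fstest.MapFS {
	return fstest.MapFS{
		"file.go":             {Data: []byte("package main\n")},
		"subfolder/notes.txt": {Data: []byte("hello")},
	}
}

// checkDemoFS asks fstest.TestFS to validate the filesystem and to
// confirm that the expected files exist in it.
func checkDemoFS() error {
	return fstest.TestFS(demoFS(), "file.go", "subfolder/notes.txt")
}

// readNotes reads one file back, exactly as it would from disk.
func readNotes() (string, error) {
	data, err := fs.ReadFile(demoFS(), "subfolder/notes.txt")
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	if err := checkDemoFS(); err != nil {
		fmt.Println("invalid filesystem:", err)
		return
	}
	s, err := readNotes()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

Note that the subfolder directory is never declared explicitly: MapFS synthesizes parent directories from the file paths.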

Let’s see how to rewrite our test for Files using a MapFS instead of regular disk files.

func TestFilesCorrectlyListsFilesInMapFS(t *testing.T) {
    t.Parallel()
    fsys := fstest.MapFS{
        "file.go":                {},
        "subfolder/subfolder.go": {},
        "subfolder2/another.go":  {},
        "subfolder2/file.go":     {},
    }
    want := []string{
        "file.go",
        "subfolder/subfolder.go",
        "subfolder2/another.go",
        "subfolder2/file.go",
    }
    got := findgo.Files(fsys)
    if !cmp.Equal(want, got) {
        t.Error(cmp.Diff(want, got))
    }
}

(Listing findgo/4)

We’ll need to update Files to take an fs.FS as its parameter instead of a pathname. And since we’re receiving the filesystem now, we needn’t open it ourselves using os.DirFS, so we can remove that call.

Here’s the modified Files function:

func Files(fsys fs.FS) (paths []string) {
    fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
        if filepath.Ext(p) == ".go" {
            paths = append(paths, p)
        }
        return nil
    })
    return paths
}

(Listing findgo/4)

Using fs.FS in APIs

There’s nothing stopping you from writing your own fs.FS implementation, and it’s quite straightforward. Indeed, whenever you’re writing Go code to deal with data that could in principle be addressed as a path-value tree, you might like to consider accepting an fs.FS as input, or making your data type satisfy fs.FS itself. It all helps to make your libraries more flexible, useful, powerful, and friendly.
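To give a feel for how little is required, here is a minimal sketch of a custom fs.FS. It is a wrapper around an existing filesystem rather than a from-scratch implementation: it simply counts calls to Open and delegates the real work. The type countingFS, the helper goFiles (a stand-in for a file finder), and the sample data are all hypothetical, and the counter is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// countingFS wraps another filesystem and counts calls to Open.
// Implementing Open is all it takes to satisfy fs.FS.
type countingFS struct {
	inner fs.FS
	opens int // not safe for concurrent use; illustration only
}

// Open records the call, then delegates to the wrapped filesystem.
func (c *countingFS) Open(name string) (fs.File, error) {
	c.opens++
	return c.inner.Open(name)
}

// goFiles returns the paths of all ".go" files found in fsys.
func goFiles(fsys fs.FS) ([]string, error) {
	var paths []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && path.Ext(p) == ".go" {
			paths = append(paths, p)
		}
		return nil
	})
	return paths, err
}

// demoFS is a small in-memory tree with hypothetical contents.
func demoFS() fs.FS {
	return fstest.MapFS{
		"file.go":          {},
		"subfolder/doc.md": {},
	}
}

func main() {
	c := &countingFS{inner: demoFS()}
	paths, err := goFiles(c)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(paths, "found using", c.opens, "calls to Open")
}
```

Since goFiles only asks for an fs.FS, it neither knows nor cares that it has been handed a wrapper.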

We can see the effect of this with our file-finder example. Initially, because it took a disk pathname, the only thing we could use it to search was a disk-based filesystem. Now that we’ve updated it to accept fs.FS, it can operate on anything satisfying that interface. Our test can pass it a MapFS and it works just fine.

So what else would work? We mentioned earlier some examples of other things that satisfy fs.FS. Just for fun, let’s try Files with a filesystem derived from a ZIP archive: after all, it should work, shouldn’t it?

First, let’s zip up our test tree folder and its contents using the zip command. If you don’t have that command, you can use anything that creates standard ZIP files, including the macOS Finder’s “Compress” action.

cd testdata

zip -r files.zip tree/

adding: tree/ (stored 0%)
adding: tree/subfolder/ (stored 0%)
adding: tree/subfolder/subfolder.go (stored 0%)
adding: tree/subfolder2/ (stored 0%)
adding: tree/subfolder2/another.go (stored 0%)
adding: tree/subfolder2/file.go (stored 0%)
adding: tree/file.go (stored 0%)

All these files are empty, which is why zipping them doesn’t seem to save much space, but that’s not the point: we just want a ZIP file to play with. Now, how do we open it as a filesystem?

Helpfully, Go provides facilities for reading ZIP files in the standard library package archive/zip, so here’s our test:

func TestFilesCorrectlyListsFilesInZIPArchive(t *testing.T) {
    t.Parallel()
    fsys, err := zip.OpenReader("testdata/files.zip")
    if err != nil {
        t.Fatal(err)
    }
    // Release the underlying archive file when the test finishes.
    defer fsys.Close()
    want := []string{
        "tree/file.go",
        "tree/subfolder/subfolder.go",
        "tree/subfolder2/another.go",
        "tree/subfolder2/file.go",
    }
    got := findgo.Files(fsys)
    if !cmp.Equal(want, got) {
        t.Error(cmp.Diff(want, got))
    }
}

(Listing findgo/4)

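We call zip.OpenReader with the pathname of our test ZIP file, and the result is a *zip.ReadCloser, which satisfies fs.FS, so we can pass it directly to Files. Because we zipped the tree folder itself, every path in the archive starts with tree/, which is why the expected paths carry that prefix. And, of course, it gives us the correct answer: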

PASS

Reassuring!
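If the tree/ prefix in those paths is unwanted, one option is fs.Sub, which returns a new filesystem rooted at a subdirectory of another. The sketch below is an assumption-laden illustration rather than part of the findgo package: it builds a small ZIP archive in memory (so it needs no files on disk), and the helpers zipOf, goFiles, and demo, along with the file names, are hypothetical.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io/fs"
	"path"
)

// zipOf builds an in-memory ZIP archive of empty files with the given names.
func zipOf(names ...string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := zw.Create(n); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

// goFiles returns the paths of all ".go" files found in fsys.
func goFiles(fsys fs.FS) ([]string, error) {
	var paths []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && path.Ext(p) == ".go" {
			paths = append(paths, p)
		}
		return nil
	})
	return paths, err
}

// demo walks only the "tree" subdirectory of the archive, so the
// resulting paths are relative to that folder.
func demo() ([]string, error) {
	zr, err := zipOf(
		"tree/file.go",
		"tree/notes.txt",
		"tree/subfolder/subfolder.go",
	)
	if err != nil {
		return nil, err
	}
	sub, err := fs.Sub(zr, "tree")
	if err != nil {
		return nil, err
	}
	return goFiles(sub)
}

func main() {
	paths, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

The same trick works for any fs.FS, which is rather the point: code written against the interface composes with every implementation.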
