Be lazy. Use ranges.

For many of us (I daresay most of us), our journey into programming started with procedural languages. Before we knew about objects and higher order functions, we knew about if... else statements and loops. So it’s no surprise if we are quick to write a loop when we need to manipulate a set of data. At this point, they’re engrained in our collective consciousness. To suggest “something better than loops” sounds asinine.

But let’s do just that. If you take a step back, you realize loops carry some serious baggage as soon as they start doing any heavy lifting. Sean Parent discusses this at length in his presentation 3 Goals for Better Code. His first goal is “no raw loops”. Sean defines these as “any loop inside a function where the function serves a purpose larger than the algorithm implemented by the loop”. Why are they bad?

Allow me to answer the question with a question. Quick, what does this loop in this function do?

void PanelBar::RepositionExpandedPanels(Panel* fixed_panel) {
  CHECK(fixed_panel);

  // First, find the index of the fixed panel.
  int fixed_index = GetPanelIndex(expanded_panels_, *fixed_panel);
  CHECK_LT(fixed_index, static_cast<int>(expanded_panels_.size()));

  // Next, check if the panel has moved to the other side of another panel.
  const int center_x = fixed_panel->cur_panel_center();
  for (size_t i = 0; i < expanded_panels_.size(); ++i) {
    Panel* panel = expanded_panels_[i].get();
    if (center_x <= panel->cur_panel_center() ||
        i == expanded_panels_.size() - 1) {
      if (panel != fixed_panel) {
        // If it has, then we reorder the panels.
        std::tr1::shared_ptr<Panel> ref = expanded_panels_[fixed_index];
        expanded_panels_.erase(expanded_panels_.begin() + fixed_index);
        if (i < expanded_panels_.size()) {
          expanded_panels_.insert(expanded_panels_.begin() + i, ref);
        } else {
          expanded_panels_.push_back(ref);
        }
      }
      break;
    }
  }

  // Function continues...
}

Noted in Parent's talk, and taken from Chromium source code circa March 2014

I don’t know either. We can figure it out if we take the time, but it’s as clear as mud. Parent points out how raw loops are nasty since they:

Are difficult to reason about and prove postconditions. It’s hard to determine the state of expanded_panels (the data the loop is operating on) once the loop completes.
Are error-prone and likely to fail under non-obvious conditions. We have all written off-by-one errors. What happens at the start and end of a loop? When does a loop terminate if you modify the data it’s iterating over?
Introduce non-obvious performance problems. What’s the complexity of the loop above? It’s inserting and removing items from expanded_panels, so we probably get some sort of nasty quadratic.
Complicate reasoning about the surrounding code. Any variables touched in a raw loop become more difficult to reason about. When do they change? How many times? What are the side effects?

One issue Sean doesn’t discuss is how loops also aren’t particularly composable or portable. If you want to feed a loop’s output into some other algorithm, you have to slap the algorithm directly into the loop (making it even hairier), store the results in some container, or use some language feature like C# and Python’s yield.

We can do better.

Let’s look for a better way to traverse data. Abstract away the starting and stopping points of iteration, and we are left with the following operations:

Are we done iterating?
What is the current item?
Get the next item.

This is nothing new — it sounds suspiciously like the Iterator pattern the “Gang of Four” were writing about while I was still in diapers. True. But Design Patterns focuses on abstracting away the iteration plumbing to provide a common interface for different containers. There is so much more we can do with this idea! With types built around these operations (let’s call these types ranges), we can:

Lazily generate data on the fly. This is huge. The GoF largely discussed iterating over existing, in-memory data. Instead of storing the values, we can just create them whenever our range is asked for the next one. Since a huge part of performance on modern machines is based on how you access memory, it’s a big win to need much less memory in the first place. And of course, if you actually do need all your values at once (e.g. for random access), it’s trivial to slurp the values out of the range and into an array or list.
Make range-based algorithms. The laziness also allows us to easily write algorithms as functions which take a source range as input and provides a range with the desired transformation as output.

Case Study

The only things that annoy me more than programming articles without examples are ones with toy examples which come nowhere close to what a Real Program™ might look like. So let’s write a Real™ (but short) program to see how ranges work in practice.

The goal

Several Linux filesystems (XFS, ext4, btrfs, tmpfs) support sparse files, i.e. files which save space by omitting empty filesystem blocks containing only zeroes. Unfortunately, these “holes” in the files aren’t automatically created by writing a string of zeroes, but only by seeking past blocks with fseek, lseek, etc. It’s entirely possible more room on your hard drive could be saved by finding empty blocks in your files and replacing them with holes. Linux already provides a tool to punch holes (fallocate), but it only takes a single file. Let’s write our own hole punch. We’ll use D since its standard library has embraced ranges and is built almost entirely out of them.

The algorithm

To make a file sparse, read the file one block at a time, finding blocks containing only zeroes. For each contiguous run of these empty blocks, call fallocate (the system call, not the program mentioned above) with the start of our desired hole and its length.

Simple. Sorta.

Step one: Getting our files

Like most other command line utilities which operate on files, we want to take a bunch of paths as command line arguments. If the user specifies -r or --recursive, we’ll recurse into directories. Otherwise we’ll ignore them.

From D’s std.range library we get higher order functions like map and filter. These take an input range and a function (or lambda) to apply to each element, returning a range which applies the operation to the input. For example, if I had a range that gave a sequence of positive integers (1, 2, 3, …), I could convert it to a range of positive even integers with filter:

auto positiveEvens = positiveIntegers.filter!(i => i % 2 == 0);

If you’re wondering why the code is so excited, ! is D’s template syntax. filter!(foo)(bar) is akin to filter<foo>(bar) in languages like C++, C#, Java, etc. Here we’re providing a lambda that selects the elements we want to keep.

Using these tools, let’s get to work expanding our command line arguments into a range of files we want to punch:

InputRange!string argsToPaths(string[] paths, bool recursive)
{
    // Ignore paths that don't exist (see filterExisting)
    auto existingPaths = paths.filter!(p => filterExisting(p));

    // The .save is so we make a copy of the range, using the original
    // for directories below. This does mean we'll filter twice
    // (and possibly stat the file on the filesystem multiple times),
    // but it's a minuscule hit in comparison to the rest of the I/O
    // we'll be doing.
    auto files = existingPaths.save.filter!(p => p.isFile);
    auto dirs = existingPaths.filter!(p => p.isDir);

    if (recursive) {
        auto expandedDirs = dirs
            .map!(p => dirEntries(p, SpanMode.depth, false)) // recurse
            .joiner // Join these ranges into one contiguous one
            .filter!(p => p.isFile) // We only want the files
            .map!(de => de.name); // Reduce DirEntry back down to string

        // We wrap this so we can use polymorphism to have a single return type
        // (the InputRange interface)
        return inputRangeObject(chain(files, expandedDirs));
    }
    else {
        foreach (dir; dirs)
            stderr.writeln("ignoring directory " , dir);

        // We wrap this so we can use polymorphism to have a single return type
        // (the InputRange interface)
        return inputRangeObject(files);
    }
}

bool filterExisting(string path)
{
    if (path.exists) {

        auto de = DirEntry(path);
        if (!de.isFile && !de.isDir) {
            stderr.writeln("ignoring special file", path);
            return false;
        }

        return true;
    }
    else {
        stderr.writeln(path, " does not exist");
        return false;
    }
}

Normally we wouldn’t even worry about the return type for argsToPaths, since ranges in D use compile-time introspection to provide a sort of duck typing. If a type behaves like a range, you can feed it to another range. But in order to return the same type regardless of whether recursive is true or false, we must settle for run-time polymorphism. The helper function inputRangeObject creates objects which implement a common InputRange interface we can use for the return type.

Also note that map, joiner, and filter are all standalone functions which return ranges, but thanks to D’s Uniform Function Call Syntax (UFCS), we can easily chain them together without much syntactic noise. Consider what the definition of expandedDirs would look like without UFCS:

map!(de => de.name)(
    filter!(p => p.isFile)(
        joiner(
            map!(p => dirEntries(p, SpanMode.depth, false))(dirs))));

Ew.

Next: Lots o’ Posix

D of course has a standard file I/O library, but like similar libraries in other languages, it doesn’t provide things we need for this job, such as the number of actual filesystem blocks used by a file. So we’ll fall back on good old Posix. Let’s define a thin wrapper for opening files:

int openToReadAndWrite(string path)
{
    return open(path.toStringz, O_RDRW);
}

Once this is done, creating a range of file descriptors is as simple as mapping the range we got from argsToPaths and filtering out bad file descriptors (when open fails):

/// Filter out bad file descriptors with the side effect of warning users
/// about them.
bool filterDescriptorsAndWarn(string path, int fd)
{
    immutable ret = fd >= 0;
    if (!ret)
        stderr.writeln("Could not open ", path, ", skipping");

    return ret;
}

auto descriptorRange = argsToPaths(args, recursive)
    // Open the file descriptor and tack it on
    .map!(path => tuple!("path", "fd")(path, openToReadAndWrite(path)));
    .cache() // More on this later
    // Filter out bad file descriptors and warn about them
    .filter!(f => filterDescriptorsAndWarn(f.path, f.fd));

Now we have a range that will return a tuple (with named fields, no less!) of each file’s path and its opened Posix file descriptor. We need to get some information like the filesystem’s block size, so grab that with fstat.

/// Information about a file deduced from stat() that we care about
struct FileInfo {
    /// The apparent size of the file
    size_t logicalSize;
    /// The actual, on-disk size of the file (in multiples of block size)
    size_t actualSize;
    /// The size of the filesystem blocks for this file
    size_t blockSize;
}

FileInfo getFileInfo(int fd)
{
    stat_t ss;
    enforce(fstat(fd, &ss) == 0, "fstat failed");

    FileInfo ret;
    ret.logicalSize = ss.st_size;
    // st_blocks is in "blocks" of 512, regardless of the actual st_blksize.
    ret.actualSize = ss.st_blocks * 512;
    ret.blockSize = ss.st_blksize;

    return ret;
}

Getting runs of empty blocks

Armed with the block size of the file, we can find the sets of empty blocks that we want to replace with holes. Let’s define a quick struct to represent these.

struct ZeroRun {
    /// The offset in bytes from the start of the file
    size_t start;
    /// Length of the run, in bytes
    size_t length;
}

Now write a function that takes a file descriptor along with the file info and returns a range of ZeroRuns. A basic input range in D only needs to offer an empty property, a front property that gives the current element of the range, and a popFront function to advance to the next element. Let’s dive in.

auto getZeroRuns(int fd, const ref FileInfo fi)
{
    // See readBlock() below
    static struct BlockResults {
        /// The block contained no non-zero bytes
        bool isAllZeroes;
        /// The amount read (could be less than the block size if we hit EOF)
        ReturnType!read amountRead;
    }

    /// The type that acts as our range of ZeroRuns
    static struct PossibleHoleFinder {

    private:
        int fd; // file descriptor
        ubyte[] bb; // block buffer
        ZeroRun curr; // current item in the range

        /// Reads one filesystem block of the file and checks if it was all 0.
        BlockResults readBlock() {
            assert(bb !is null);
            BlockResults ret;
            ret.amountRead = read(fd, &bb[0], bb.length); // Standard Posix read
            enforce(ret.amountRead >= 0, "read() failed with errno " ~
                                         errno.to!string);
            // See if all the bytes we just read are 0.
            ret.isAllZeroes = all!(b => b == 0)(bb[0 .. ret.amountRead]);
            return ret;
        }

    public:
        // Constructor that takes the file descriptor and the block size
        this(int _fd, size_t bs) {
            fd = _fd;
            // Allocate a buffer the size of a block to read into
            bb = new ubyte[bs];
            popFront(); // Bring up the first zero run
        }

        @property auto ref front() { return curr; }

        void popFront()
        {
            enforce(!empty, "You cannot pop an empty range.");

            // Look for a block that is all zeroes (or stop at EOF)
            BlockResults br;
            do {
                br = readBlock();
            } while (br.amountRead > 0 && !br.isAllZeroes);

            // If we hit EOF, we're done.
            if (br.amountRead == 0) {
                bb = null; // Makes empty() true
                return;
            }

            // Otherwise we just hit the start of a new run of zeroes
            // Update curr (our current ZeroRun) as needed
            auto currentLocation = lseek(fd, 0, SEEK_CUR);
            enforce(currentLocation >= 0,
                    "lseek(fd, 0, SEEK_CUR) failed with errno " ~
                    errno.to!string);
            curr.start = currentLocation - br.amountRead;
            curr.length = br.amountRead;

            // Keep reading until we hit EOF or stop getting zeroes
            while(true) {
                br = readBlock();
                if (br.amountRead == 0 || !br.isAllZeroes)
                    return;

                curr.length += br.amountRead;

                // We may be in a hole, and can drastically speed things up
                // by jumping out of the hole using lseek with SEEK_DATA
                currentLocation = curr.start + curr.length;
                immutable soughtLocation = lseek(fd, currentLocation, SEEK_DATA);

                // There is a very unlikely but possible chance the rest
                // of this file is one big hole, in which case lseek will fail
                // with errno ENXIO
                if (soughtLocation < 0) {
                    enforce(errno == ENXIO,
                            "Unknown failure of lseek(SEEK_DATA), errno " ~
                            errno.to!string);
                    immutable end = lseek(fd, 0, SEEK_END);
                    curr.length = end - curr.start;
                    return;
                }

                // If we didn't skip over a hole, sought - current == 0,
                // so there's no need to branch here.
                curr.length += soughtLocation - currentLocation;
            }
        }

        // Our buffer pointer doubles as an indicator if we've hit EOF
        // See the EOF condition in popFront()
        @property bool empty() { return bb is null; }
    }
    // Ensure we have created an input range which returns ZeroRuns.
    static assert(isInputRange!PossibleHoleFinder);
    static assert(is(ReturnType!(PossibleHoleFinder.front) == ZeroRun));

    return PossibleHoleFinder(fd, fi.blockSize);
}

That was a lot at once, but most of the code was boilerplate around our Posix calls. By encapsulating all the nastiness here, the rest of our work becomes a cakewalk. Notice a few other things:

By defining the type inside the function, we ensure that nobody can create one by name, forcing users to concentrate on its interface (as a range) instead of its implementation. This pattern in D is known as Voldemort Types.
We can use static assert to double check at compile-time that the type we defined is an input range which returns ZeroRuns.

The easy part

To figure out how much space we’re going to save by punching all these holes, we can do some pretty simple math. We’ll sum up the space in a file made of empty blocks:

auto zeroSpace = getZeroRuns(file.fd, info)
    .map!(zr => zr.length)
    .sum;

Then do some simple algebra based on this value, the logical size of the file, and how much space it actually takes up:

size_t pessimalSize(const ref FileInfo fi)
{
    // The largest the file could be is its logical size
    // rounded up to the nearest block.
    immutable pessimal  = fi.logicalSize +
                          (fi.blockSize - fi.logicalSize % fi.blockSize);
    assert(pessimal >= fi.actualSize);  // Sanity check
    return pessimal;
}

size_t possibleSavings(const ref FileInfo fi, size_t zeroSpace)
{
    immutable optimal = pessimalSize(fi) - zeroSpace; // The smallest it could be

    // The amount of space we can save is the difference between the optimal
    // size and the current (actual) size, provided the value is positive.
    return fi.actualSize <= optimal ? 0 : fi.actualSize - optimal;
}

Now let’s actually punch the holes.

void punchHole(int fd, ZeroRun run)
{
    enforce(fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
                      run.start, run.length) == 0,
            "fallocate failed with error " ~ errno.to!string);

}

We can punch these holes as we iterate through our zero runs using tee. tee calls a function on each element of a range before passing it through, unmodified, to the range it returns.

auto zeroSpace = getZeroRuns(file.fd, info)
    // While we're calculating the size of all zero runs in the file,
    // punch them into holes.
    .tee!(zr => punchHole(file.fd, zr))
    .map!(zr => zr.length)
    .sum;

The really nice thing about all of this is that there are no special cases for files which are already sparse. The savings math works fine, and if fallocate gets called on an existing hole, it’s a no-op. Easy.

Bringing everything together

Once we have all our helper functions and ranges done, the entire program is refreshingly simple. Yay abstractions.

int main(string[] args)
{
    bool verbose;
    bool machine; // Machine output (for scripts instead of humans)
    bool recursive; // Recurse into provided directories

    // Args parsing via std.getopt omitted for brevity

    real total = 0;

    auto descriptorRange = argsToPaths(args, recursive)
        // Open the file descriptor and tack it on
        .map!(path => tuple!("path", "fd")(path, openToReadAndWrite(path)))
        // Don't run the above mapping more than once per path
        .cache()
        // Filter out bad file descriptors and warn about them
        .filter!(f => filterDescriptorsAndWarn(f.path, f.fd));


    foreach (file; descriptorRange) {
        scope(exit) close(file.fd); // Be sure to close the file descriptor.

        auto info = getFileInfo(file.fd);

        auto zeroSpace = getZeroRuns(file.fd, info)
            // While we're calculating the size of all zero runs in the file,
            // punch them into holes.
            .tee!(zr => punchHole(file.fd, zr))
            .map!(zr => zr.length)
            .sum;

        immutable saved = possibleSavings(info, zeroSpace);
        total += saved;

        if (saved > 0 || verbose) {
            writeln(file.path, machine ? " " : " saved ",
                    machine ? saved.to!string : saved.toHuman);
        }
    }
    if (machine)
        writeln(total);
    else
        writeln("Total savings: ", total.toHuman);

    return 0;
}

You can find the whole shebang here on Github.

Some may be ready to hang me for the hypocrisy of having a main loop in a program built to demo ranges, but I offer two defenses:

This isn’t a “raw loop” — it implements our algorithm and does little else. All ancillary work has been taken elsewhere.
We could write more ranges to do some of the work in the loop, but this quickly becomes more trouble than it’s worth. (More on that shortly.)

From this example, we can see ranges solve almost all of the problems we had with raw loops. Their composability makes it easier to reason about code, and what we’re doing is much clearer. There are fewer gotchas — you chain the ranges up, and the results just fall out. There’s less overhead since data is lazily generated, so at any given time we only have the current item of the range in memory. As a bonus, the memory for that item is usually sitting on the stack, which makes it very hot (cache misses are unlikely) and bypasses debate about how to allocate it.

There are a few downsides:

Writing an algorithm as a range involves a bit more code than writing it as a loop. (On the flip side, as a range it’s much more reusable, composable, yadda yadda…)
Ranges can have some quirks when the algorithms they implement have side effects. Readers may have noticed a mysterious cache range applied to our file descriptors before we filter out the bad ones. When I was debugging, I hooked the program up to strace and noticed that it kept a file descriptor open for every single file. As it turns out, the implementation of filter calls front on its source range when its front is called and when its popFront is called. Since the map range before it was the one opening the file handles, this meant that each time we iterated through, we applied the mapping twice and opened two file handles, only one of which we closed.

The best way to avoid oddities like this is probably, “don’t have side effects in your range algorithms”, but this isn’t so easy when implementing a range that interacts with system APIs as we are doing here. cache helps by making sure the front of the previous range is only accessed once per element, keeping our side effects from occuring multiple times.
Stepping through chained ranges in a debugger would be hell since each popFront call cascades all the way through the chain, making everything happen “at once”. (On the flip side, using ranges means you’re less likely to spend lots of time stepping through your code in the first place).
Exceptions handling gets a bit quirky. If an exception gets thrown in the middle of a chain of ranges (say, for example, you don’t have read permission on a directory when we’re recursing through them in argsToPaths), you need to catch it and either skip over whatever caused the issue, or provide some “bad” element in its place. The D standard library provides a range called handle for this purpose. handle takes an exception type to catch, which range operations it should surround with exception handling, and a function to provide a replacement element. Though I’m usually a huge fan of exceptions over return codes, the latter might be a bit easier to work with when using ranges.

We see that the loops are still ultimately there, but by tucking them away into ranges, the rest of our work becomes that much easier. Life is also simplified by convenient language features like lambdas and compile-time introspection. You could certainly still implement ranges with named functions and runtime polymorphism everywhere, but you lose a bit of their elegance.

All the cool kids are doing it (and you should too).

We have just seen how to leverage ranges in D to great effect. C# has had range-like interfaces for years with the IEnumerable extension methods that came with LINQ, and I’m sure functional programming fans have noticed how closely ranges emulate a functional style. And Eric Neibler, a member of the ISO C++ committee and a Boost release manager, is working hard on getting them added to the C++ standard library. You can find a prototype of what he has in mind here.

Ranges, despite being such a simple idea, are incredibly powerful. Go write some.

References

The following are some excellent additional resources and were used heavily by this post:

3 Goals for Better Code by Sean Parent. Given at A9, an Amazon subsidiary.
On Iteration and Iterators Must Go (PDF slide deck) by Andrei Alexandrescu, which discuss issues with C++-style iterators and how we can be better serviced by range-based algorithms.
STL Concepts and Ranges, given by Eric Neibler at a Northwest C++ Users’ Group meeting. Eric is working on getting ranges into a future C++ standard. Slides for the talk can be found here (PDF).
Memory Disallocation, given by Walter Bright at DConf 2015. Walter discusses how ranges (and their laziness) can help reduce the number and size of memory allocations programs make.

Updates

2015-07-28: I discovered the cache range, how it can help with ranges which induce side effects, and added a mention of it to this article.