Lessons Learned #4: Flat Namespaces

13 Dec 2015

Some time back, a paper authored by Rob Pike titled The Hideous Name convinced me of the value of hierarchical naming conventions, and especially, of kernel-side support for such. When porting STS to run on the Kestrel-3 emulator, I used the opportunity to retrofit hierarchical naming support into the kernel. As the sole user of STS at the moment, breaking backward compatibility with existing software was not a consideration, and it seemed like as good a time as any to explore the concept.

This might have been a mistake, at least from an implementation complexity point of view; while I’ve regained feature parity with the latest Kestrel-2 version of STS, getting the web of interacting data structures and their respective memory management inside the kernel has been much harder to debug.

Looking back on my experience implementing STS, both the flat namespaced versions 1.0 through 1.2 for the Kestrel-2, and of the hierarchically namespaced version 1.5 for the Kestrel-3, it remains unclear to me that hierarchical namespaces offer any significant value over a flat name-space.

Before defending my position, I should probably explain what I mean by hierarchical filenaming and flat filenaming.

Flat Naming: IBM Mainframes

The first example comes from IBM, with their System/360 through z/Architecture mainframe family. With the exception of Unix or Linux ports, most operating systems for IBM’s mainframe family identify files using up to 44 characters. (Note: IBM literature refers to them as datasets, not as files; for our purposes, they’re the same thing.) These names are typically formatted as a dot-separated sequence of “qualifiers,” each of which can vary between one to eight characters. Here are a few examples:

DEPTNAME.USERNAME.FILENAME
SYS1.PARMLIB

As you can see, the mainframe environment does have a hierarchy of sorts, albeit one imposed mostly by convention rather than enforced by the kernel. Many mainframe applications are built to accept something like DEPTNAME.USERNAME as a common prefix for filenames it uses when processing related sets of data.

z/OS and its predecessors identify volumes (usually, disks) by name, such as VOL123. However, z/OS treats volume naming separately from file naming. This explains why “uncataloged” files must have their name and volume name specified separately in job steps or user interfaces. For example:

DSNAME(DEPT.USER.FILENAME) VOLUME(VOL123)

This is a gross over-simplification, of course. Things become significantly clearer and easier, though, if you catalog your file. This associates all the relevant metadata about how to find the file in a centralized directory service. That way, you need only provide the data set name, and magically, your application knows which volume, which cylinders on that volume, which record format to use, and all sorts of other useful information needed to perform file I/O.

In general, I’m not going to consider cataloging. I’m strictly keeping things simple, by maintaining a two-dimensional space of names: a file with the given name, located on a volume with a given name.

Because volumes were identified by label, and not by a specific device number, the system software can locate the physical device using a lookup of the volume name. Thus, most file references were resolved using two linear lookups: the first to resolve the volume, and the second to resolve the name on that volume.

Flat Naming: CBM DOS Naming Conventions

The second example comes from my experience with Commodore Business Machines’ 8-bit computers. Starting with the Commodore PET, and lasting all the way through to the Commodore 128, CBM DOS always implemented a flat naming environment. Each filename could be no more than 16 characters long. Unlike IBM mainframes, no particular convention for hierarchy existed, either in the system software or in applications. Nonetheless, prefix wildcards like * supported by external disk units, could be used to fake it. For example, from Commodore 64 BASIC, you can get a directory of all files starting with the common prefix FOO.BAR. like this:

LOAD "$FOO.BAR.*",8

then listing the final result. Likewise, the equivalent to removing an entire directory looked like this:

OPEN 15, 8, 15, "S0:FOO.BAR.*":CLOSE 15

The first OPEN command instructed the disk controller to walk its directory, and for all files matching the given filename pattern, scratch (delete) the file. The CLOSE command would block until the scratches have completed.

As with mainframe environments, applications needed to explicitly identify volumes on which files were located. However, instead of a volume name lookup, it did so by identifying the ordinal device number of the storage device. It’s device independent in the sense that applications generally didn’t care what kind of disk unit appeared at device 8, only that a disk unit appeared there. Specifics of the hardware only mattered when you had to deal with, among other things, serial I/O accelerators or disk copy protection mechanisms. Thus, CBM DOS and its interactions on the host computer could be said to be device independent as well; but, it clearly was not location independent.

Hierarchical Naming: Atari TOS, MS-DOS, Windows

Atari TOS, MS-DOS, Windows, et. al. provide hierarchical naming, but it’s not completely uniform. These operating systems provide two kinds of hierarchy: the device or volume layer, and the filename itself.

To illustrate these two levels of hierarchy, I’ve listed a few such filenames below:

C:\WINDOWS\WIN.EXE
U:\USR\LOCAL\LIB\LIBCURSES.SO

The former example might reference a file on the computer’s startup harddrive, most typically assigned the device C: since the two preceding devices are often reserved for floppy disks. The latter example might reference a network mount, and thus might reference files on a dedicated network server. DOS, Windows, etc. know how to handle the different access methods because of the information in front of the colon. The handler associated with “device C” knows how to access the local harddrive, while the handler associated with “device U” knows how to use the network. In both cases, everything after the colon is treated opaquely by whatever filesystem is associated with the associated (real or virtual) device.

Hierarchical Naming: Linux

Now, I’ll provide an example of a pure hierarchical filenaming convention. Unix is perhaps the prototypical example of such a thing. Indeed, MS-DOS, Windows, etc. gets their directory syntax from Unix. (However, a backslash \ is used instead of a forward slash /, because in MS-DOS, / was already commonly used for other command-related syntax.) Typical filenames appear like these:

/etc/aliases
/usr/local/lib/libcurses.so
/home/kc5tja/Desktop/lessons-learned.txt

Note that Unix-style hierarchical naming is basically the same as in Windows, except no indication of a device is readily visible. Unix determines which filesystem to access based on a prefix of the pathname. For example, / might correspond to C: in Windows, while /usr/local might correspond to a different disk drive (maybe D: in Windows), and so forth. Unix provides something called a mount table to keep track of these associations.

This means that you can replicate everything MVS and Windows does with clever naming conventions in the namespace. For example:

C:\WINDOWS\WIN.EXE                           /mnt/hd0/c/windows/win.exe
U:\USR\LOCAL\LIB\LIBCURSES.SO                /usr/local/lib/libcurses.so
DSNAME(DEPT.USER.FILENAME) VOLUME(VOL123)    /mnt/mainframe/vol123/dept.user.filename    

Based on this knowledge, one must wonder why one wouldn’t want to use hierarchical filenaming!

Back to STS

While I do not deny the user benefits of hierarchical naming, I can state with some authority that getting things right inside the OS is, well, hard.

Hierarchical Namespaces Requires More Complex Schema

It’s easy enough to get name resolution working correctly; however, doing this requires a ton more data structures than if you just used flat namespaces or an abstraction thereof. For example, to implement the file “/rom/m2” in STS V1.5’s ROM filesystem, I need a DirObj structure to represent /, another DirObj to represent rom, a DirEnt to name it rom, a RomFile to represent m2, and another DirEnt to name the m2 file. Had I used a flat namespace, this could have been implemented in only two data structures: the first being a linked list header, and the second being a node on that list describing the file and giving it its name. Done.

Some might say that with hierarchical namespaces you can rely on links (hard and soft) to provide multiple names to files and directories. GoboLinux uses these to excellent effect to make a trivial to implement, and virtually uncorruptable, package manager for the Linux environment. For a long time, I conceived of a package manager for STS and its successors strongly influenced by Gobo’s technology.

First, I’d like to point out that symbolic links are not a feature unique to hierarchical filesystems. Proof of this can be found in any Unix-like OS where you create a plurality of symbolic or hard links in a single directory.

Second, the value of links is questionable. Yes, I use them frequently, and when applied judiciously, they provide excellent value. Nonetheless, in the face of these links, you lose the benefits of hierarchy quickly. Getting something as simple as:

cd ..   # change to parent directory
pwd     # print current working directory name

correct is actually a hard enough problem that a genius-level systems software engineer, Rob Pike himself in fact, can only “solve” it for Plan 9 using a memory-inefficient, brute-force solution (to summarize, storing a complete copy of the filename for every tree node encountered as directory traversal happens. If you get to a specific point in the filesystem through N paths, that’s just too bad: that one point now has N chains of nodes in kernel space, each named for its unique path). This is actually an attack vector against any kernel using Pike’s method: using millions of processes, create a unique mount point or link, with a long name, to a common, well-known directory, then change into that directory via the mount or link. Do this enough times, and you’ll exhaust the kernel’s memory. (This is also why kernels should be as stateless as possible, but that’s a different lesson for a different post.)

Rob Pike is clearly aware of problems with links, as documented by his overall dislike for links in general, and symbolic links in particular. He’s written several papers on the topic, and I suggest you read them. They’re quite fascinating; and, even if you disagree with Pike, you’ll at least find them thought provoking.

Meanwhile, flat namespaces have evolved for decades (albeit in its own niches) without the need for links. The closest thing to symbolic links in z/OS is the VSAM catalog. In other environments, like AmigaOS, logical device names and environment variables serve similar roles to symbolic links, without incurring their expense, since cd .. cannot cross a device or logical name boundary.

Package Management Without Hierarchy

Above, I alluded to the GoboLinux package management method, and how I’d like to borrow heavily from it to implement STS’ own package management system.

So, how would I replicate GoboLinux’s package management semantics in a flat namespace without supporting symbolic links? One approach is to borrow an idea from z/OS, and use something called a “partitioned data set”, or on Unix, an “archive file.” A program is considered installed and available for a user to use if the program is present in a well-known archive file. Similar archive files could exist to hold different kinds of program resources; for example, we could store binaries in a file simply called ARCHIVE.PRG, libraries for binaries in ARCHIVE.LIBS, system-wide configurations in ARCHIVE.CONFIG, and so on. However, I think this will get messy over time.

Instead, I think a better solution is to stay closer to Gobo’s example. Given a future version of the STS command shell, and given that I type a command named cmd, that shell can simply look for a file PRG/cmd to run. We know it’s executable because, if it appears at all, it appears with the PRG/ prefix and the file’s header checks out. We needn’t concern ourselves with the current version of the command, since by definition, it is the version the user specifically requested. Typically, it will be the latest version installed, but simple tools would be provided to switch versions as well.

This program can depend on various libraries as well; for example, to provide a text-based user interface, PRG/cmd might elect to load LIB/textiface. Configuration for the program might be stored in a variety of files, such as CFG/cmd for system-wide configuration, username/CFG/cmd for a per-user configuration on a multi-user system, and so on. Plug-in modules for PRG/cmd can be stored on the filesystem as LIB/cmd/PLUG/whatever, PLUGIN/cmd/whatever, or whatever’s appropriate for the command’s design.

What if the user wants to invoke an older version of PRG/cmd? Since all the shell does is prepend the typed filename with PRG/, it follows that you can just name the version explicitly, perhaps MyPackage/1.4.2/cmd. This results in an attempt to run the executable PRG/MyPackage/1.4.2/cmd.

NOTE: In STS versions 1.0 through 1.2, I used the full-stop as a qualifier separator (A.B.C). In STS V1.5, I switched to using a slash (A/B/C), since I thought hierarchical namespaces would make mounting POSIX filesystems easier later on. I actually prefer using full-stops, and in all honesty, I’ll probably go back to using them for V2.0. This has the benefit that someone can make a filesystem driver for Linux or BSD, and not have to worry about faking a hierarchy to the OS where there is none. Besides, if I had used slashes, and my directory contains two files A/B and A/B/C, how do you map that into a Posix-compatible directory listing? B would have to appear as a file, or as a directory; it cannot appear as both concurrently.

Large Directory Support

A common criticism concerning flat namespaces is that they do not support large directories effectively. This was especially true for earlier filesystem implementations, but it was a problem just as poignant with hierarchical filesystems (such as with Unix System V Release 4), particularly with degenerate directory layouts.

Imagine a disk formatted in STS V1.0’s filesystem, SL5. SL5 maintained a single, flat directory, allocated as a single, contiguous chunk of disk space. SL5 can pack information for 8 files in a single disk sector. To support a directory of 256 files, we need at least 32 512-byte sectors. For a typical PC hard drive, this is typically about three tracks of data. To locate any file on the volume, you can reasonably expect to take around 10 to 20 milliseconds using a sequential search.

To support 256 million files, however, is a different matter. That would require 32 million sectors of alloted space, and probably would require close to 20,000 seconds to find a file using the same linear search algorithm.

Obviously, the continued reliance upon sequential searches is the problem. To fix this, one would need a tree of some kind. You could use a hierarchical naming system, where the structure of the tree is exposed to the user and directly under his or her control. Or, you can create an index, managed by the filesystem for you, to help locate arbitrary files quickly.

The problem with the former approach is that it leads inexorably to suboptimal directory layouts. For instance, when I wrote GCOM back in the day, I decided to use the filesystem to store the registry mappings from COM CLSIDs (effectively, strings of arbitrary hexadecimal values) to shared object files. I knew ahead of time that the directory in which this would happen could grow to be rather huge if GCOM ever took off. I was prepared to introduce a convention where you relied on subdirectories to improve directory lookup performance. For instance, if a CLSID had the value of AAABBBCCC…, then I could place the actual mapping in a file AAA/BBBCCC.... You also see this approach used in e-mail servers, Usenet clients, and other facilities working with tens of thousands of files, or more. The most obvious problem with this approach can be seen when you consider what happens when a disproportionate number of COM CLSIDs all share a common prefix. Do you then switch to two levels of this hierarchy over all, or just for that one subdirectory? Then, there’s all the text processing that must happen inside the library to make this magic all come together and work. In this case, hierarchy gets in the way.

Meanwhile, IBM’s linear filenaming approach for OS/360 (introduced 1964) has remained intact all the way through to today’s latest z/OS release, supporting everything from tape drives, to drum drives, to disk drives, to CD-ROMs, to network storage devices, and who knows what else will come in the future, handling everything from kilobytes to petabytes of data across tens to thousands of different volumes in high load, low response-time situations. It performs this well because, among other tricks of the trade, it relies on a system-managed index. The “hierarchy”, if you will, is computed dynamically as the filesystem and its flat namespace evolves over time.

Lesson Learned

Flat name-spaces using explicitly named volumes solve 90% of the problems I’d ever use a filesystem to solve, namely locating persistent data. It’s substantially simpler to implement than a Unix-semantics hierarchical namespace, with or without link support.

One legitimate criticism to a flat namespace is when you must rename a qualifier. With Unix, you just mv oldDir newDir, and suddenly, file oldDir/foo now is known as newDir/foo. To do this with a flat namespace, you’d have to iterate through the directory and rename every file with a matching prefix. Finding the first entry should be fast enough (as fast as opening the file); it’s the need to rename subsequent files that becomes the issue. With a good index data structure, such as a Skip List, you can compute both starting and ending locations to minimize the amount of work needed. There are more efficient ways of solving this efficiently as well, but again, is the cost actually justifiable?

Except for this one issue, which I consider minor based both on STS’ own application domain and on my experience managing thousands of Linux servers in data centers, I’m hard-pressed to think of any real problem with flat namespaces today. There’s nothing inherently wrong with Unix-like semantics; I just don’t think it’s worth the expense of putting it into the kernel.

One aside though: unlike z/OS, I would use longer filenames. I’d choose, say, minimum of 160 byte filenames, allowing me to use 9 qualifiers of 16 characters each, with some slop left over. I also wouldn’t impose a hard limit on qualifier length. Thus, a.someReallyLongQualifierHere.b would be a valid filename, despite the middle qualifier being longer than 16 characters. But, that’s just me.

author

Samuel A. Falvo II
Twitter: @SamuelAFalvoII
Google+: +Samuel A. Falvo II

About the Author

Software engineer by day. Amateur computer engineer by night. Founded the Kestrel Computer Project as a proof-of-concept back in 2007, with the Kestrel-1 computer built around the 65816 CPU. Since then, he's evolved the design to use a simple stack-architecture CPU with the Kestrel-2, and is now in the process of refining the design once more with a 64-bit RISC-V compatible engine in the Kestrel-3.

Samuel is or was:

  • a Forth, Oberon, J, and Go enthusiast.
  • an amateur radio operator (KC5TJA/6).
  • an amateur photographer.
  • an intermittent amateur astronomer, astrophotographer.
  • a student of two martial arts (don't worry; he's still rather poor at them, so you're still safe around him. Or not, depending on your point of view).
  • a former semiconductor verification technician for the HIPP-II and HIPP-III line of Hifn, Inc. line-speed compression and encryption VLSI chips.
  • the co-founder of Armored Internet, a small yet well-respected Internet Service Provider in Carlsbad, CA that, sadly, had to close its doors after three years.
  • the author of GCOM, an open-source, Microsoft COM-compatible component runtime environment. I also made a proprietary fork named Andromeda for Amiga, Inc.'s AmigaDE software stack. It eventually influenced AmigaOS 4.0's bizarre "interface" concept for exec libraries. (Please accept my apologies for this architectural blemish; I warned them not to use it in AmigaOS, but they didn't listen.)
  • the former maintainer and contributor to Gophercloud.
  • a contributor to Mimic.

Samuel seeks inspirations in many things, but is particularly moved by those things which moved or enabled him as a child. These include all things Commodore, Amiga, Atari, and all those old Radio-Electronics magazines he used to read as a kid.

Today, he lives in the San Francisco Bay Area with his beautiful wife, Steph, and four cats; 13, 6.5, Tabitha, and Panther.