Non-hierarchical filing systems

Traditionally, hierarchical systems have been used to store information - a particular fact might be in a particular paragraph on a particular piece of paper, which is stored in a particular folder in a particular drawer of a particular filing cabinet in a particular office on a particular floor of a particular building in a particular city of a particular country of the world.

This is also the manner in which computers tend to store information. On a Windows system there is the "My computer" icon, under which there are a number of drives, each of which can contain files and subfolders. On a Unix system there is "/" (root) under which there are subdirectories (on my Debian GNU/Linux installation these include "/bin", "/dev", "/home", "/lib", "/lost+found", "/tmp", "/usr", "/var"). Again these contain files and subdirectories which can also contain files and subdirectories. A particular piece of information (file) is located by specifying an ordered list of folders - for example, this document might be found at "/home/andrew/work/website/andrew/computer/hierarchy.html" on a Unix system, or "C:\Work\Website\Andrew\computer\hierarchy.html" on a Windows system.

This is how things have always been done, and it is generally taken for granted that this is how it will always be done. That doesn't mean it's the best way, however. Modern systems are breaking down the boundaries of their hierarchies. For some time, Unix systems have had a facility called "symlinking", whereby an entry can be created in a directory which is neither a file nor a subdirectory but a placeholder leading to another file somewhere else in the hierarchy. This can be likened to an entry in a dictionary for a synonym, or a piece of paper in a filing cabinet saying "this document is in folder x in cabinet y". Thus you can effectively have two copies of the document without using precious space on a second copy (plus, if the document is edited, you don't get out-of-date versions cropping up).
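As a rough illustration of the symlink idea, here is a minimal Python sketch. It assumes a Unix-like system (creating symlinks on Windows may need extra privileges), and the directory and file names ("by_artist", "by_genre", "song.txt") are made up for the example.

```python
import os
import tempfile


def make_link_demo():
    """Create one real file, link to it from a second directory,
    then edit the original and read the result back through the link."""
    base = tempfile.mkdtemp()
    artist_dir = os.path.join(base, "by_artist")  # hypothetical folder names
    genre_dir = os.path.join(base, "by_genre")
    os.mkdir(artist_dir)
    os.mkdir(genre_dir)

    # The one real copy of the document.
    original = os.path.join(artist_dir, "song.txt")
    with open(original, "w") as f:
        f.write("version 1")

    # The placeholder entry: "this document is in folder x".
    link = os.path.join(genre_dir, "song.txt")
    os.symlink(original, link)

    # Edit the original; there is no second copy to go out of date.
    with open(original, "w") as f:
        f.write("version 2")

    with open(link) as f:
        return os.path.islink(link), f.read()


if __name__ == "__main__":
    print(make_link_demo())
```

The file appears in two places in the hierarchy, but only one copy exists on disk, so an edit made in one place is seen in the other.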

Windows, too, is catching up. The "Single Instance Store" is more of a space-saving device than an organizational tool, though - if you have two identical copies of a file on your hard disk, it stores them both in the same physical place on the hard disk. If you change one of the two files, then it creates a copy and modifies that rather than modifying both files.
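The behaviour described above can be modelled with a toy in-memory store. This is only a sketch of the general idea (identical contents stored once, a separate copy made on change), not how Windows actually implements it; the class name DedupStore and its methods are invented for the example.

```python
import hashlib


class DedupStore:
    """Toy model: identical file contents are stored once, and changing
    one file never affects another file that shared the same contents."""

    def __init__(self):
        self._blobs = {}  # content digest -> bytes (the "physical place")
        self._names = {}  # file name -> content digest

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Only store the bytes if no identical content is already held.
        self._blobs.setdefault(digest, data)
        # Point this name at the content; other names are left untouched.
        self._names[name] = digest

    def read(self, name):
        return self._blobs[self._names[name]]

    def blob_count(self):
        # Note: this sketch never discards contents that are no longer used.
        return len(self._blobs)


if __name__ == "__main__":
    store = DedupStore()
    store.write("a.txt", b"same contents")
    store.write("b.txt", b"same contents")
    print(store.blob_count())  # both names share one stored copy
    store.write("b.txt", b"changed contents")
    print(store.read("a.txt"))  # a.txt is unaffected by the change
```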

The trouble is that in any given hierarchy, there is more than one place you might want to put a given file. Take the typical MP3 collection, for example. Do you classify by artist, or genre, or name of song, or stuff them all in together? At the moment this is left up to personal preference, but that's not necessarily the best way of doing things. I organize my MP3s by artist, except for the songs by artists whom I only have one song by, which I put all in together in my MP3 "root". This has its advantages (I can see at a glance all of the artists and a good fraction of the song titles, the root folder isn't too unwieldy and I don't always have to go into a subfolder to see what songs I have by a given artist) and its disadvantages (I have to do a search to find the artist if I only know the song name; there's no concept of genre; because Windows lists all the folders before any of the files, I occasionally find I have duplicates; and if I get another song by an artist I only have one song by, the previous song gets moved, which means it doesn't get played until I rebuild the playlist).

What is needed is a new filing system, one in which the searching, sorting and linking operations needed to overcome the disadvantages in any hierarchical system are made fundamental operations of the system - one in which the needs of the users rather than technical considerations are put first.

What I envision is a system whereby a file is located by one or more keywords in much the same way as you use a search engine to locate a document on the world wide web. So you could enter "music Frank_Sinatra" to get a list of music files by Frank Sinatra or "music Mack_The_Knife" to get a list of recordings of that song (possibly by various different artists). The key point is that the system doesn't distinguish between "music/Frank_Sinatra/Mack_The_Knife" and "music/Mack_The_Knife/Frank_Sinatra". Or even between either of those and "Frank_Sinatra/Mack_The_Knife/music" - it's completely commutative.
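To make the "commutative" lookup concrete, here is a small Python sketch. It treats a query as an unordered set of keywords, so the order in which they are typed (and whether they are separated by spaces or slashes) makes no difference. The class KeywordStore, the function normalize, the file names and the "Some_Other_Artist" keyword are all invented for the illustration.

```python
def normalize(query):
    """Turn 'music/Frank_Sinatra' or 'Frank_Sinatra music' into the same
    unordered set of keywords."""
    return frozenset(query.replace("/", " ").split())


class KeywordStore:
    """Files are located by keywords rather than by an ordered path."""

    def __init__(self):
        self._files = {}  # file name -> frozenset of keywords

    def add(self, name, keywords):
        self._files[name] = frozenset(keywords)

    def find(self, query):
        wanted = normalize(query)
        # A file matches if it carries every keyword in the query.
        return sorted(
            name for name, keywords in self._files.items() if wanted <= keywords
        )


# Hypothetical example data.
store = KeywordStore()
store.add("sinatra_mack.mp3", ["music", "Frank_Sinatra", "Mack_The_Knife"])
store.add("other_mack.mp3", ["music", "Some_Other_Artist", "Mack_The_Knife"])

if __name__ == "__main__":
    print(store.find("music Mack_The_Knife"))
    print(store.find("Frank_Sinatra/Mack_The_Knife/music"))
```

A real system would presumably keep an index from each keyword to the files that carry it rather than scanning every file, but the scan keeps the sketch short.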

This system is much more reliable than a search engine, though, because a search engine indexes all words in a document, whereas this system just indexes "filenames" which are a lot less haphazard. A "filename" on this system might look something like this:
Media: Recorded music
Format: MP3
Artist: Frank Sinatra
Title: Mack The Knife
Size: 3,703,497 bytes
Length: 4:24
Bitrate: 112Kbps
Last modified: 15th January 2000, 02:10
Comments: ...

And so on, with any other information relevant to that file. This might seem a bit unwieldy for a filename, but remember that all this information is stored by the computer anyway (usually twice, in the case of an MP3 file, because of the ID3 tag) and with a bit of careful programming it should be possible to consider this the "filename" with minimal extra overhead.

Obviously not all fields are relevant to all files (for example, "Bitrate" would have no meaning for a text file, but "Written by" would). It is important that exactly which fields are present is flexible, and that new fields can be added that weren't necessarily even thought of when the filing system was designed.
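One way to picture such flexible "filenames" is as a set of field/value pairs with no fixed schema. The Python sketch below assumes each file is simply a dictionary; the search function, the text file's details and the "Mood" field are invented for the example.

```python
def search(records, criteria):
    """Return the records whose fields match every field/value in criteria.
    A record that lacks a field simply doesn't match on that field."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# The MP3 "filename" from the example above, as field/value pairs.
mp3 = {
    "Media": "Recorded music",
    "Format": "MP3",
    "Artist": "Frank Sinatra",
    "Title": "Mack The Knife",
    "Bitrate": "112Kbps",
}

# A hypothetical text file: no "Bitrate", but it has "Written by".
essay = {
    "Media": "Text",
    "Title": "An example essay",
    "Written by": "An Example Author",
}

records = [mp3, essay]

# A field nobody thought of when the system was designed can be added later.
mp3["Mood"] = "Swing"

if __name__ == "__main__":
    print(search(records, {"Artist": "Frank Sinatra"}))
    print(search(records, {"Written by": "An Example Author"}))
    print(search(records, {"Mood": "Swing"}))
```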

So operations which would previously have been classed as searches now work in the same way that simply looking at the contents of a particular directory does today. You could even make it feel the same by ordering the fields in some way. For example, "media" might be a fundamental class, equivalent to a directory in the root directory. Upon opening "media" the computer would show the different types of media files you have on your computer: "recorded music, musical score, photographs, drawings, movies, animations, web pages". Upon opening "recorded music" you might then be faced with a list of artists, and so on, just like it works today. But you aren't limited to that order. Upon opening "media" you could also choose to go directly to "artist" and see various different types of file associated with a particular artist.
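Browsing in any order can be sketched as follows: given the choices made so far, list the values available for whichever field the user wants to look at next. This is only an illustration in Python; the function browse and the example records (apart from the Frank Sinatra recording) are invented.

```python
def browse(records, chosen, next_field):
    """List the distinct values of next_field among the records that
    match every field/value pair chosen so far."""
    values = set()
    for record in records:
        matches = all(record.get(f) == v for f, v in chosen.items())
        if matches and next_field in record:
            values.add(record[next_field])
    return sorted(values)


# Hypothetical collection.
records = [
    {"Media": "Recorded music", "Artist": "Frank Sinatra", "Title": "Mack The Knife"},
    {"Media": "Photographs", "Artist": "Frank Sinatra", "Title": "Example photo"},
    {"Media": "Recorded music", "Artist": "Example Artist", "Title": "Example Song"},
]

if __name__ == "__main__":
    # "Open" the top level and see the types of media.
    print(browse(records, {}, "Media"))
    # Open "recorded music" and see the artists, as in a hierarchy today.
    print(browse(records, {"Media": "Recorded music"}, "Artist"))
    # Or go by artist first and see the types of file for that artist.
    print(browse(records, {"Artist": "Frank Sinatra"}, "Media"))
```

The same function serves every ordering, because the "path" is just a set of choices rather than a fixed sequence of folders.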

It is probably best left up to the user to configure which items appear by default in the "root directory" - for example someone who did lots of things with music on their computer might choose to put a link to "music" there whilst someone who just had a few music files might just leave it in "media". The system provides an infrastructure whereby people can organize their files and make them easy to search and browse through, but is also very flexible.

Where the system really comes into its own is if it is linked to other computers. You could search for media by "Frank Sinatra" not just on your own hard disk, but in *the entire world*. You could see what someone's been doing lately by searching for files they have written (and made public) and sorting them in order of date. The possibilities are almost limitless.

Of course, there's still the problem of designing user interfaces to access this information easily. It's taken decades to perfect an interface to access files in the current, hierarchical system (in my opinion, the "Explorer" in Microsoft Windows 95 is the best and friendliest file management interface yet designed, although it still has its flaws) but hopefully what has been learnt from that will make it possible to implement an interface for the new system much more quickly.

If you're interested in this subject, there is now a mailing list devoted to discussing it. Join by clicking here.
