Monday, February 16, 2009

A talk at CU

Last week I gave a seminar at the Computer Science Department at the University of Colorado in Boulder. The topic was an architecture for storage management built from just a few elements, which can be summarized as:
  1. update streams - to track last committed and executed requests
  2. a changelog, sufficient to do and undo operations, including data writes, with the ability to restrict the changelog to a fileset
  3. versioning of objects, plus management of and operations on FIDs
  4. some database elements inside the file system, e.g. the ability to locate objects on certain servers
Most of these were discovered over the years by the Lustre effort. However, an organic, piecemeal implementation will not show that this is a concise and, likely, usable proposal. Showing that would require building something from the ground up that implements the key elements of this system and demonstrates recoverability, clustering and various storage management features. While I trust this can be done, I wonder how many data management applications cannot be served by my proposal.
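
To make the changelog element from the list above a little more concrete, here is a minimal Python sketch. It is not Lustre code and not the design presented in the talk. It only illustrates the idea of a log whose records carry enough information to redo or undo a data write, and which can be restricted to a fileset. Every name in it (`Record`, `Changelog`, `restrict`, the fileset labels, the `fid-1` style identifiers) is hypothetical, and the "file system" is just an in-memory dictionary keyed by FID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Store = Dict[str, bytes]  # toy object store: FID -> contents (assumption)


@dataclass(frozen=True)
class Record:
    """One changelog entry with enough state to redo or undo a write."""
    seq: int                # position in the log
    fileset: str            # hypothetical fileset label
    fid: str                # identifier of the object that was written
    old: Optional[bytes]    # contents before the write (None: did not exist)
    new: Optional[bytes]    # contents after the write (None: removed)


def _set(store: Store, fid: str, data: Optional[bytes]) -> None:
    """Install contents for a FID, or remove the object when data is None."""
    if data is None:
        store.pop(fid, None)
    else:
        store[fid] = data


class Changelog:
    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, store: Store, fileset: str, fid: str,
              data: Optional[bytes]) -> Record:
        """Log the before and after images, then apply the write."""
        rec = Record(len(self.records) + 1, fileset, fid, store.get(fid), data)
        self.records.append(rec)
        _set(store, fid, data)
        return rec

    def restrict(self, fileset: str) -> List[Record]:
        """Return only the records that belong to one fileset, in log order."""
        return [r for r in self.records if r.fileset == fileset]


def redo(store: Store, records: List[Record]) -> None:
    """Replay records in log order, e.g. onto a replica."""
    for r in records:
        _set(store, r.fid, r.new)


def undo(store: Store, records: List[Record]) -> None:
    """Roll records back in reverse log order."""
    for r in reversed(records):
        _set(store, r.fid, r.old)


store: Store = {}
log = Changelog()
log.write(store, "projects", "fid-1", b"v1")
log.write(store, "scratch", "fid-2", b"tmp")
log.write(store, "projects", "fid-1", b"v2")

# Replay only the "projects" fileset onto an empty replica.
replica: Store = {}
redo(replica, log.restrict("projects"))   # replica == {"fid-1": b"v2"}

# Roll back only the "projects" fileset on the original store.
undo(store, log.restrict("projects"))     # store == {"fid-2": b"tmp"}
```

Keeping both the old and the new image in each record is what makes the log usable in both directions. A real system would presumably log extents or deltas rather than whole objects, and would persist the log transactionally alongside the data.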

Some key elements, such as sessions, are now finding their way into systems like pNFS, years after their implementation in Lustre. However, one needs to go well beyond that to obtain the aggregate benefits.