Backup Format Description for Cumulus: Efficient Filesystem Backup to the Cloud Version: "LBS Snapshot v0.8" NOTE: This format specification is intended to be mostly stable, but is still subject to change before the 1.0 release. The code may provide additional useful documentation on the format. NOTE2: The name of this project has changed from LBS to Cumulus. However, to avoid introducing gratuitous changes into the format, in most cases any references to "LBS" in the format description have been left as-is. The name may be changed in the future if the format is updated. This document simply describes the snapshot format. It is described from the point of view of a decompressor which wishes to restore the files from a snapshot. It does not specify the exact behavior required of the backup program writing the snapshot. For details of the current backup program, see implementation.txt. This document does not explain the rationale behind the format; for that, see design.txt. DATA CHECKSUMS ============== In several places in the Cumulus format, a cryptographic checksum may be used to allow data integrity to be verified. At the moment, only the SHA-1 checksum is supported, but it is expected that other algorithms will be supported in the future. When a checksum is called for, the checksum is always stored in a text format. The general format used is = identifies the checksum algorithm used, and allows new algorithms to be added later. At the moment, the only permissible value is "sha1", indicating a SHA-1 checksum. is a sequence of hexadecimal digits which encode the checksum value. For sha1, should be precisely 40 digits long. A sample checksum string is sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187 SEGMENTS & OBJECTS: STORAGE AND NAMING ====================================== A Cumulus snapshot consists, at its base, of a collection of /objects/: binary blobs of data, much like a file. Higher layers interpret the contents of objects in various ways, but the lowest layer is simply concerned with storing and naming these objects. An object is a sequence of bytes (octets) of arbitrary length. An object may contain as few as zero bytes (though such objects are not very useful). Object sizes are potentially unbounded, but it is recommended that the maximum size of objects produced be on the order of megabytes. Files of essentially unlimited size can be stored in a Cumulus snapshot using objects of modest size, so this should not cause any real restrictions. For storage purposes, objects are grouped together into /segments/. Segments use the TAR format; each object within a segment is stored as a separate file. Segments are named using UUIDs (Universally Unique Identifiers), which are 128-bit numbers. The textual form of a UUID is a sequence of lowercase hexadecimal digits with hyphens inserted at fixed points; an example UUID is a704eeae-97f2-4f30-91a4-d4473956366b This segment could be stored in the filesystem as a file a704eeae-97f2-4f30-91a4-d4473956366b.tar The UUID used to name a segment is assigned when the segment is created. Filters can be layered on top of the segment storage to provide compression, encryption, or other features. For example, the example segment above might be stored as a704eeae-97f2-4f30-91a4-d4473956366b.tar.bz2 or a704eeae-97f2-4f30-91a4-d4473956366b.tar.gpg if the file data had been filtered through bzip2 or gpg, respectively, before storage. Filtering of segment data is outside the scope of this format specification, however; it is assumed that if filtering is used, when decompressing the unfiltered data can be recovered (yielding data in the TAR format). Objects within a segment are numbered sequentially. This sequence number is then formatted as an 8-digit (zero-padded) hexadecimal (lowercase) value. The fully qualified name of an object consists of the segment name, followed by a slash ("/"), followed by the object sequence number. So, for example a704eeae-97f2-4f30-91a4-d4473956366b/000001ad names an object. Within the segment TAR file, the filename used for each object is its fully-qualified name. Thus, when extracted using the standard tar utility, a segment will produce a directory with the same name as the segment itself, and that directory will contain a set of sequentially-numbered files each storing the contents of a single object. NOTE: When naming an object, the segment portion consists of the UUID only. Any extensions appended to the segment when storing it as a file in the filesystem (for example, .tar.bz2) are _not_ part of the name of the object. There are two additional components which may appear in an object name; both are optional. First, a checksum may be added to the object name to express an integrity constraint: the referred-to data must match the checksum given. A checksum is enclosed in parentheses and appended to the object name: a704eeae-97f2-4f30-91a4-d4473956366b/000001ad(sha1=67049e7931ad7db37b5c794d6ad146c82e5f3187) Secondly, an object may be /sliced/: a subset of the bytes actually stored in the object may be selected to be returned. The slice syntax is [+] where is the first byte to return (as a decimal offset) and specifies the number of bytes to return (again in decimal). It is invalid to select using the slice syntax a range of bytes that does not fall within the original object. The slice specification should be appended to an object name, for example: a704eeae-97f2-4f30-91a4-d4473956366b/000001ad[264+1000] selects only bytes 264..1263 from the original object. As an abbreviation, the slice syntax [] is shorthand for [0+] In place of a traditional slice, the annotation [=] may be used. This is somewhat similar to specifying [], but additionally asserts that the referenced object is exactly bytes long--that is, this slice syntax does not change the bytes returned at all, but can be used to provide information about the underlying object store. Both a checksum and a slice can be used. In this case, the checksum is given first, followed by the slice. The checksum is computed over the original object contents, before slicing. Special Objects --------------- In addition to the standard syntax for objects described above, the special name "zero" may be used instead of segment/sequence number. This represents an object consisting entirely of zeroes. The zero object must have a slice specification appended to indicate the size of the object. For example zero[1024] represents a block consisting of 1024 null bytes. A checksum should not be given. The slice syntax should use the abbreviated length-only form. FILE METADATA LISTING ===================== A snapshot stores two distinct types of data into the object store described above: data and metadata. Data for a file may be stored as a single object, or the data may be broken apart into blocks which are stored as separate objects. The file /metadata/ log (which may be spread across multiple objects) specifies the names of the files in a snapshot, metadata about them such as ownership and timestamps, and gives the list of objects that contain the data for the file. The metadata log consists of a set of stanzas, each of which are formatted somewhat like RFC 822 (email) headers. An example is: name: etc/fstab checksum: sha1=11bd6ec140e4ec3110a91e1dd0f02b63b701421f data: 2f46bce9-4554-4a60-a4a2-543637bd3989/000001f7 group: 0 (root) mode: 0644 mtime: 1177977313 size: 867 type: - user: 0 (root) The meanings of all the fields are described later. A blank line separates stanzas with information about different files. In addition to regular stanzas, the metadata listing may contain a line containing an object reference prefixed with "@". Such a line indicates that the contents of the referenced object should be fetched and parsed as a metadata listing at this point, prior to continuing to parse the current object. Several common encodings are used for various fields. The encoding used for each field is specified in the field listing that follows. encoded string: An arbitrary string (octet sequence), with bytes optionally escaped by replacing a byte with %xx, where "xx" is a hexadecimal representation of the byte replaced. For example, space can be replaced with "%20". This is the same escaping mechanism as used in URLs. integer: An integer, which may be written in decimal, octal, or hexadecimal. Strings starting with 0 are interpreted as octal, and those starting with 0x are intepreted as hexadecimal. Common fields (required in all stanzas): path [encoded string]: Full path of the file archived. Note: In previous versions (<= 0.2) the name of this field was "name". user [special]: The user ID of the file, as an integer, optionally followed by a space and the corresponding username, as an escaped string enclosed in parentheses. group [special]: The group ID which owns the file. Encoding is the same as for the user field: an integer, with an optional name in parentheses following. mode [integer]: Unix mode bits for the file. type [special]: A single character which indicates the type of file. The type indicators are meant to be consistent with the characters used with the -type option to find(1), and the file type checks in test(1): f regular file b block device c character device d directory l symlink p pipe s socket Note that previous versions used '-' to indicate a regular file. This character should not be generated in any new snapshots, but may be encountered in old snapshots (those with a format version <= 0.2). mtime [integer]: Modification time of the file. Optional common fields: links [integer]: Number of hard links to this file, generally only reported if greater than 1. inode [string]: String specifying the inode number of this file when it was dumped. If "links" is greater than 1, then searching for other files that have an identical "inode" value can be used to determine which files should be hard-linked together when restoring. The inode field should be treated as an opaque string and compared for equality as such; an implementation may choose whatever representation is convenient. The format produced by the standard tool is // (where and specify the device of the containing filesystem and is the inode number of the file). ctime [integer]: Change time for the inode. Special fields used for regular files: checksum [string]: Checksum of the file contents. size [integer]: Size of the file, in bytes. data [reference list]: Whitespace-separated list of object references. The referenced data, when concatenated in the listed order, will reconstruct the file data. Any reference that begins with a "@" character is an indirect reference--the given object includes a whitespace-separated list of object references which should be parsed in the same manner as the data field. Special fields used for symbolic links: target[encoded string]: The target of the symlink, as returned by readlink(2). Note: In old version of the format (<= 0.2), this field was called "contents" instead of "target". Special fields used for block and character device files: device[special]: The major and minor number of the device. Encoded as "major/minor", where major is the major device number encoded into an integer, and minor is the minor device number. SNAPSHOT DESCRIPTOR =================== The snapshot descriptor is a small file which describes a single snapshot. It is one of the few files which is not stored as an object in the segment store. It is stored as a separate file, in plain text, but in the same directory as segments are stored. The name of snapshot descriptor file is snapshot--.lbs is a descriptive text which can be used to distinguish several logically distinct sets of snapshots (such as snapshots for two different directory trees) that are being stored in the same location. gives the date and time the snapshot was taken; the format is %Y%m%dT%H%M%S (20070806T092239 means 2007-08-06 09:22:39). The contents of the descriptor are a set of RFC 822-style headers (much like the metadata listing). The fields which are defined are: Format: The string "LBS Snapshot v0.6" which identifies this file as a Cumulus backup descriptor. The version number (v0.6) might change if there are changes to the format. It is expected that at some point, once the format is stabilized, the version identifier will be changed to v1.0. Producer: A informative string which identifies the program that produced the backup. Date: The date the snapshot was produced. This matches the timestamp encoded in the filename, but is written out in full. A timezone is given. For example: "2007-08-06 09:22:39 -0700". Scheme: The field from the descriptor filename. Segments: A whitespace-seprated list of segment names. Any segment which is referenced by this snapshot must be included in the list, since this list can be used in garbage-collecting old segments, determining which segments need to be downloaded to completely reconstruct a snapshot, etc. Root: A single object reference which points to the metadata listing for the snapshot. Checksums: A checksum file may be produced (with the same name as the snapshot descriptor file, but with extension .sha1sums instead of .lbs) containing SHA-1 checksums of all segments. This field contains a checksum of that file. Intent: Informational; records the value of the --intent flag when the snapshot was created, and can be used when determining which snapshots to later delete.