Each record contains 5 parts:

  - prefix
  - record data
  - record content
  - checksum
  - padding

The prefix is:

  - The string "="
  - A single byte identifying the record type, which is one of the
    following:

      * -- archive prefix, used primarily to identify the file format
           version (in the path below)
      b -- block device
      c -- character device
      d -- directory
      f -- file
      h -- hardlink
      l -- symlink
      p -- named pipe
      s -- named socket

  - The content length followed by a colon
  - A path, followed by a NUL byte

The record data is a string followed by a NUL byte, the contents of
which depend on the record type:

  - For the prefix record (type "*"), the record data contains the file
    format version identifier ("2" for this document) followed by a
    series of parameters separated by "&". The name and value in a
    parameter are separated by "=" and the contents are URL-encoded.
  - For hardlinks and symlinks, the record data is empty.
  - For all other records, it contains the following, separated by
    colons:

      - the file mode, in octal
      - the access time
      - the change time
      - the modification time
      - the owner ID number
      - the owner ID name
      - the group ID number
      - the group ID name

The record content also depends on the record type. For all records
except file records, its length is exactly the number of bytes given by
the content length in the prefix. The record content contains:

  - For hardlinks and symlinks, the path to the file to link to.
  - For directories, a list of all the files in that directory, each
    terminated with a NUL byte.
  - For file records, a series of segments, each containing a length
    field followed by that many bytes of data, all followed by a NUL
    byte.
  - For the prefix record, a list of the paths used to build the
    archive, each prefixed with either a '+' (included paths) or a '-'
    (excluded paths) and terminated with a NUL byte. This will
    eventually be used to determine whether the paths to extract could
    be in the archive before scanning it, to reduce search overhead.
  - For all other record types, nothing.
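As a rough illustration of the prefix and record data layout described
above, the following Python sketch parses both from a byte buffer. The
function names (parse_prefix, parse_record_data, parse_metadata) are
hypothetical and not part of the format. The sketch assumes that the
fields are adjacent with no separators other than those listed, that
owner and group names contain no colons, and it keeps the three
timestamps as raw bytes because their representation is not spelled out
here.

```python
# Record type bytes as listed in the prefix description.
RECORD_TYPES = {
    b"*": "archive prefix",
    b"b": "block device",
    b"c": "character device",
    b"d": "directory",
    b"f": "file",
    b"h": "hardlink",
    b"l": "symlink",
    b"p": "named pipe",
    b"s": "named socket",
}


def parse_prefix(buf, pos=0):
    """Return (type, content_length, path, next_pos) for the prefix at buf[pos]."""
    if buf[pos:pos + 1] != b"=":
        raise ValueError("record prefix must start with '='")
    rtype = buf[pos + 1:pos + 2]
    if rtype not in RECORD_TYPES:
        raise ValueError("unknown record type: %r" % rtype)
    colon = buf.index(b":", pos + 2)    # the content length ends at the colon
    length = int(buf[pos + 2:colon])    # ASCII decimal
    nul = buf.index(b"\0", colon + 1)   # the path ends at the NUL byte
    return rtype, length, buf[colon + 1:nul], nul + 1


def parse_record_data(buf, pos):
    """Return (record_data, next_pos); the data is a NUL-terminated string."""
    nul = buf.index(b"\0", pos)
    return buf[pos:nul], nul + 1


def parse_metadata(data):
    """Split the colon-separated record data of a non-link, non-prefix record."""
    mode, atime, ctime, mtime, uid, owner, gid, group = data.split(b":")
    return {
        "mode": int(mode, 8),  # the mode is the only octal field
        "atime": atime,        # timestamps kept as raw bytes (assumption)
        "ctime": ctime,
        "mtime": mtime,
        "uid": int(uid),
        "owner": owner,
        "gid": int(gid),
        "group": group,
    }
```

Calling parse_prefix and then parse_record_data with the returned
position leaves the position at the first byte of the record content.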
The content of a file record has an internal structure that identifies
its own length as it is being encoded, and the content length
represents the original length of the file.

The checksum is the first N bytes of the raw 16-byte MD5 hash of all
the data in the record, starting at the first byte of the prefix. N is
determined when the archive is created and defaults to the full 16
bytes.

The padding is enough NUL bytes to pad the record out to a multiple of
blocksize bytes, if the blocksize is non-zero (it defaults to zero).

All numbers are encoded in ASCII decimal, unless another base is
specified, and are terminated by either a colon (':') or a NUL byte.

The following parameters are written in the prefix record to tell the
extractor how the archive was created:

  - "blocksize": the value of the blocksize parameter
  - "md5length": the number of bytes of the MD5 checksum stored in each
    record (N above)
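The checksum and padding rules above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function names are hypothetical, the hash is assumed to cover the
prefix, record data and record content (but not the checksum or the
padding), and the padded length is assumed to include the checksum
bytes.

```python
import hashlib


def record_checksum(body, md5length=16):
    """First md5length bytes of the raw MD5 digest of the record body."""
    return hashlib.md5(body).digest()[:md5length]


def record_padding(unpadded_length, blocksize=0):
    """NUL bytes needed to reach the next multiple of blocksize."""
    if blocksize == 0:
        return b""  # a blocksize of zero means no padding
    return b"\0" * (-unpadded_length % blocksize)


def seal_record(body, md5length=16, blocksize=0):
    """Append checksum and padding to an already encoded record body.

    Assumption: body holds the prefix, record data and record content.
    """
    record = body + record_checksum(body, md5length)
    return record + record_padding(len(record), blocksize)
```

With the defaults (md5length of 16 and blocksize of 0), seal_record
appends the full digest and no padding.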