Srlog2 Design Documentation

Bruce Guenter

February 10, 2015

1 Introduction

The srlog package originated as an internal mechanism for collecting all system logs at FutureQuest, Inc. in one place for analysis.

1.1 Requirements

When figuring out how to accomplish this, we identified several key requirements:

We considered using some tools that were already available for this task. In particular, reusing a tool like SSH would have been ideal. However, we were unaware of any such tool that gave the delivery guarantees we wanted.

1.2 Initial Implementation

The initial implementation was fairly limited, and there were a number of design mistakes. The packet format allowed no variations in what cryptography mechanisms were used. It was hard coded to use MD5 for authentication, the nistp224 elliptic curve for key exchange, and AES192-CBC for encryption, with no hashing of the shared secret to produce the encryption key, no IV, and no resets between packets. Each service required its own secret key, and needed the server key copied into its directory. Senders were identified exclusively by IP and authenticated by a manually copied public key. The packet format was also overly optimized for the established connection path, and only allowed one line per packet.

1.3 Rewriting to srlog2

I recognized a number of the original design decisions were poor choices or outright mistakes, and set out to fix them. In order to avoid recreating some original mistakes or throwing away existing knowledge, all of the changes were done incrementally, resulting in a system that was at least minimally usable at each step.

However, many of the choices resulted in a system that was completely incompatible with the original srlog externally, even though much of the internal mechanism was still the same. In particular the protocol and the key file handling were completely overhauled. So, the package name (and the name of all the programs) was changed to reflect these differences, and to prevent confusion between the old and new packages.

2 Design

2.1 Network Protocol

2.1.1 Network Transport

All data is exchanged over UDP with a default port number of 11014. The sender and receiver first optionally negotiate encryption parameters, and then establish a virtual connection over which the sender delivers its log messages. The receiver only sends acknowledgements for packets it has successfully received; no negative acknowledgements are possible.

2.1.2 Packet Formats

2.1.2.1 Data Formats

All integers are unsigned, and encoded in LSB order.

A “timestamp” is encoded as a 4-byte integer number of seconds since the UNIX epoch, and a 4-byte integer nanosecond offset since the last whole second. Using unsigned integers, this will be adequate until the year 2106.

A string is encoded as a 1- or 2-byte length integer followed by the unencoded data. No trailing NUL byte is used (externally).
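The encodings above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not code from srlog2; the function names are hypothetical, and the choice between a 1-byte and a 2-byte length prefix is passed in explicitly because it depends on the field.

```python
import struct
from datetime import datetime, timedelta, timezone


def encode_timestamp(seconds: int, nanoseconds: int) -> bytes:
    """Two 4-byte unsigned integers, least significant byte first."""
    return struct.pack("<II", seconds, nanoseconds)


def encode_string(data: bytes, length_bytes: int = 1) -> bytes:
    """Length prefix (1 or 2 bytes, LSB order) followed by the raw data."""
    if length_bytes not in (1, 2):
        raise ValueError("length prefix must be 1 or 2 bytes")
    fmt = "<B" if length_bytes == 1 else "<H"
    # struct.pack raises struct.error if the data is too long for the prefix.
    return struct.pack(fmt, len(data)) + data


def decode_string(buf: bytes, offset: int, length_bytes: int = 1):
    """Return (data, offset of the next field)."""
    fmt = "<B" if length_bytes == 1 else "<H"
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + length_bytes
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end], end


# The largest second count that fits in an unsigned 4-byte integer.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
LAST_SECOND = EPOCH + timedelta(seconds=2**32 - 1)
```

Here `LAST_SECOND` falls in the year 2106, which is where the limit quoted above comes from.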

2.1.2.2 PRQ: Preferences Query

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`PRQ1`’
8	8	String	Nonce
16	1+N	String	Authenticator list (‘`HMAC-MD5`’)
??	1+N	String	Key exchange list (‘`nistp224`’ or ‘`curve25519\000nistp224`’)
??	1+N	String	Key hash list (‘`SHA256`’)
??	1+N	String	Encryptor list (‘`AES128-CBC-ESSIV`’)
??	1+N	String	Compressor list (‘`null`’)

Notes:

Multiple items in each list are separated by the NUL byte.
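As an illustration of the table and note above, a PRQ packet could be assembled roughly as follows. This is a sketch, not the srlog2 implementation: the function names are hypothetical, and treating the nonce as 8 raw bytes with no length prefix is an assumption based on the listed offsets (8 and 16).

```python
import os


def encode_list(items):
    """Join the items with NUL bytes and add a 1-byte length prefix."""
    body = b"\0".join(items)
    if len(body) > 255:
        raise ValueError("list does not fit a 1-byte length prefix")
    return bytes([len(body)]) + body


def build_prq(nonce, authenticators, key_exchanges, key_hashes,
              encryptors, compressors):
    """Assemble a PRQ1 packet in the field order given by the table."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes (assumed raw, unprefixed)")
    packet = b"SRL2" + b"PRQ1" + nonce
    for items in (authenticators, key_exchanges, key_hashes,
                  encryptors, compressors):
        packet += encode_list(items)
    return packet


# Example query offering both key exchange mechanisms.
packet = build_prq(
    os.urandom(8),
    [b"HMAC-MD5"],
    [b"curve25519", b"nistp224"],
    [b"SHA256"],
    [b"AES128-CBC-ESSIV"],
    [b"null"],
)
```

The PRF response has the same layout with a single choice in each field, so the same helper could encode it with one-item lists.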

2.1.2.3 PRF: Preferences Response

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`PRF1`’
8	8	String	Copy of nonce
16	1+N	String	Authenticator choice
??	1+N	String	Key exchange choice
??	1+N	String	Key hash choice
??	1+N	String	Encryptor choice
??	1+N	String	Compressor choice

2.1.2.4 INI: Initialization Packet

Offset	Size	Type	Description
0	4	Constant	Packet format ‘`SRL2`’
4	4	Constant	Packet type ‘`INI1`’
8	8	Integer	Initial sequence number
16	8	Timestamp	Initial timestamp
24	1+N	String	Sender name
??	1+N	String	Service name
??	1+N	String	Authenticator name (A)
??	1+N	String	Key exchange name (E)
??	1+N	String	Key hash name (H)
??	1+N	String	Cipher name (C)
??	1+N	String	Compressor name (Z)
??	sizeof(E)	E	Client session public key
??	sizeof(A)	A	Authenticator

2.1.2.5 CID: Initialization Response

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`CID1`’
8	sizeof(E)	E	Server session public key
??	sizeof(A)	A	Authenticator

2.1.2.6 MSG: Message Packet

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`MSG1`’
8	8	Unsigned	Initial sequence number
16	1	Unsigned	Message count M
??	8	Timestamp	Timestamp
??	2+N	String	Line
??	??	Char	Padding to fill out encryption block
??	4	CRC-32	Check code on encrypted data
??	sizeof(A)	A	Authenticator

Notes:

The timestamp and line items are repeated M times.
Everything from the first timestamp to the CRC is encrypted.
The sequence number and message count are explicitly not in the encrypted section, since an attacker can trivially determine them from INI/CID packets and the returning ACK packets.
The ACK to a MSG packet uses the sequence number of the last line in the packet.
The length of padding is: P=B-(2+8+N+4)%B where B is the block size of the cipher algorithm. It will be at least one byte, and at most the encryption block size.
A check code (CRC-32) is included inside the encrypted data on MSG1 packets to ensure that the encryption state is properly synchronized on client and server.
A 32-bit CRC takes no longer than a 16-bit CRC to calculate on modern (32-bit) CPUs, perhaps even shorter due to using native word size. The difference in resulting packet size is negligible.
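The padding formula in the notes above can be checked with a short worked example. The sketch below applies the formula exactly as written, which covers a packet carrying a single line of N bytes; the 16-byte default is the AES block size, and the function name is hypothetical.

```python
def padding_length(line_length: int, block_size: int = 16) -> int:
    """P = B - (2 + 8 + N + 4) % B, as given in the notes.

    2 = string length prefix, 8 = timestamp, N = line data, 4 = CRC-32.
    """
    return block_size - (2 + 8 + line_length + 4) % block_size


# A 10-byte line: 2 + 8 + 10 + 4 = 24 bytes, so 8 bytes of padding
# bring the encrypted region to 32 bytes (two AES blocks).
ten_byte_line = padding_length(10)

# A 2-byte line already fills one block exactly (16 bytes), so a whole
# extra block of padding is added; the padding is never zero bytes long.
two_byte_line = padding_length(2)
```

In both cases the encrypted region ends on a block boundary, and the padding length stays between 1 and B bytes as the note states.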

2.1.2.7 ACK: Message Acknowledgement Packet

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`ACK1`’
8	8	Unsigned	Sequence number
16	sizeof(A)	A	Authenticator

2.1.2.8 SRQ: Status Request

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`SRQ1`’
8	8	String	Nonce

2.1.2.9 SRP: Status Response

Offset	Size	Type	Description
0	4	Constant	Packet type ‘`SRL2`’
4	4	Constant	Message type ‘`SRP1`’
8	8	String	Copy of nonce
16	2+N	String	Text status report

2.2 Key Exchange

The shared secrets for the INI, CID, MSG, and ACK packets are computed as follows:

Packet	Client Computes	Server Computes
INI	Client secret * Server public	Server secret * Client public
CID	Client session secret * Server public	Server secret * Client session public
ACK or MSG	Client session secret * Server session public	Server session secret * Client session public
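The reason both columns of the table yield the same value is the usual Diffie-Hellman property. The sketch below demonstrates it with a toy modular-exponentiation exchange, which is explicitly not the nistp224 or curve25519 arithmetic srlog2 uses; the modulus, generator, and all names are assumptions chosen only to make the example runnable with the standard library.

```python
import secrets

# Toy parameters for illustration only; not suitable for real use.
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3


def keypair():
    """Return (secret, public) for the toy exchange."""
    secret = secrets.randbelow(P - 3) + 2
    return secret, pow(G, secret, P)


def shared(own_secret, peer_public):
    """Stands in for the 'secret * public' operation in the table."""
    return pow(peer_public, own_secret, P)


client_secret, client_public = keypair()
server_secret, server_public = keypair()
client_session_secret, client_session_public = keypair()
server_session_secret, server_session_public = keypair()

# INI: long-term keys on both sides.
ini = (shared(client_secret, server_public),
       shared(server_secret, client_public))

# CID: client session key with the server's long-term key.
cid = (shared(client_session_secret, server_public),
       shared(server_secret, client_session_public))

# ACK or MSG: session keys on both sides.
msg = (shared(client_session_secret, server_session_public),
       shared(server_session_secret, client_session_public))
```

Each pair holds the client's and the server's result for one row of the table, and the two entries in each pair are equal.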

2.3 Encryption Parameters

The current system is hard coded to use HMAC-MD5 for authentication and AES-CBC with a 128-bit key and ESSIV for encryption, with the first 32 bytes of the SHA256 hash of the nistp224 shared secret used for the key. Additionally, the first 32 bytes of the SHA256 hash of the previous SHA256 hash are used to key the ESSIV encryptor. The system may use either nistp224 or curve25519 for key exchange, depending on whether curve25519 keys and software support are present on both ends.
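The two-step hashing described above can be sketched with the standard library. This is an illustration only: the function name is hypothetical, and how the resulting digests are cut down to the cipher's key size is not spelled out in this document, so the sketch returns the full 32-byte digests.

```python
import hashlib


def derive_key_material(shared_secret: bytes):
    """Return (cipher key material, ESSIV key material).

    The first value is the SHA256 hash of the shared secret; the second
    is the SHA256 hash of that first hash.
    """
    cipher_material = hashlib.sha256(shared_secret).digest()
    essiv_material = hashlib.sha256(cipher_material).digest()
    return cipher_material, essiv_material


# Placeholder input; a real shared secret comes from the key exchange.
cipher_material, essiv_material = derive_key_material(b"example shared secret")
```

Because the second value depends only on the first, both ends arrive at the same pair of keys once they agree on the shared secret.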

2.4 Logging Format

The logging format (the format of the lines written by srlog2d to be read by a log processor) reflects the fact that multiple lines will frequently be output for the same sender/service combination. In this way, it encapsulates the manner in which log data arrives – each packet contains one or more log lines (usually more).

So, instead of having information about the service on each line of output, there is a separate line type for identifying the service. This actually simplifies the sender, as the actual log lines can be passed by the logger into the output file or pipe without modification.

3 Detailed Changes

This chapter describes in detail the changes made between the original package and srlog2. Some of the rationale for the design decisions above is given here.

3.1 Multiple lines per packet

The largest real problem encountered with the original system was the high system load caused by the receiver. Having the protocol handle a single line per packet meant that each log line would cause the system to handle two interrupts (incoming and outgoing), and the receiver would have to do a decryption and two full secure hashes. This ended up being a significant issue as we were handling well over 1,000 lines/sec.

Adding a new packet type that would transmit multiple lines was not a big problem, but the bigger issue came with encryption. Since the CBC state was not reset between packets, retransmissions caused a huge implementation headache that could not be satisfactorily resolved.

3.2 IV computed using E(Salt|Sector)

To resolve the CBC issue, the IV was initially forced to zero at the start of each packet. Then while researching disk encryption I came across a scheme called E(Salt,Sector)IV or ESSIV. In this scheme, the key used for the primary encryption is hashed to key another encryptor. To produce the IV for each packet, the (public) sequence number is encrypted (in simple ECB mode) with this (secret) key material to produce a deterministic but still secret IV. This eliminated encryption ordering issues, making one of the issues with having multiple lines per packet disappear.

3.3 Introducing curve25519

After writing the original package, the author of the nistp224 package, Daniel J. Bernstein, produced another, stronger, elliptic curve key exchange protocol called curve25519. The nistp224 package was no longer being maintained, and had known bugs causing serious performance regressions with modern compilers, and the author was advocating the use of curve25519 over it.

Initially I was inclined to switch the entire system to curve25519 and drop nistp224 entirely, but the core math of the new system was written entirely in assembler, and the released code only worked on Intel/AMD 32-bit systems. As a result, a mechanism was introduced which would allow either system to be used, with a preference for the longer keys where both were supported.

3.4 New packet format

The original packet format had two shortcomings. First, there was no identification information in the packet other than the leading sequence number, and that was only useful if there was a single line in the packet. To add more packet formats, the sequence numbers from 0xffffffff00000000 and up were reserved. While it is improbable that any sender would ever get close to this number, it is still a poor kludge for multiple packet types. Second, all numbers were represented in MSB order but all the systems using it used LSB ordering, requiring byte swapping on each packet.

So, a new packet format was designed that improved on several attributes. First, the format itself included a version number in both a format identifier and a separate type identifier, allowing for easily adding more packet types and for future updates to the format. Second, the single line packet was rejected in favor of explicitly handling multiple lines in each transmission. Finally, all numbers were encoded in LSB order.
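To show how the two identifiers make dispatch straightforward, here is a minimal sketch of a header parser. It is not taken from srlog2; the function name and error handling are hypothetical, and the set of known types is simply the list of packets described in section 2.1.2.

```python
KNOWN_TYPES = {b"PRQ1", b"PRF1", b"INI1", b"CID1",
               b"MSG1", b"ACK1", b"SRQ1", b"SRP1"}


def parse_header(packet: bytes):
    """Return (type name, type version) from the 8-byte packet header."""
    if len(packet) < 8:
        raise ValueError("packet too short for a header")
    fmt, kind = packet[:4], packet[4:8]
    if fmt != b"SRL2":
        raise ValueError("unknown packet format")
    if kind not in KNOWN_TYPES:
        raise ValueError("unknown packet type")
    # The last character of the type identifier is its version digit.
    return kind[:3].decode("ascii"), int(kind[3:].decode("ascii"))


# An ACK header followed by an 8-byte sequence number of zero.
name, version = parse_header(b"SRL2ACK1" + bytes(8))
```

A receiver can reject anything that does not start with a known identifier pair before doing any cryptographic work.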

3.5 Sender Names

The first design for srlog used IP addresses exclusively to identify senders in the receiver program. This however led to problems when the IP address on a sender changed. In particular, when a sender had multiple IP addresses, the kernel would make an arbitrary choice of which one to use for sending, and that could confuse the receiver. Switching from strictly IPs to names has the additional benefit of allowing support for roaming senders, which has happened when we set up servers in one place and install them in another.

4 Miscellaneous

4.1 External Encryption Libraries

Originally, I had set up the package to use a built in Rijndael (AES) implementation for symmetric encryption. There are, however, several encryption libraries available which may be preferable due to being more portable and/or faster (due to the use of assembler etc).

Here are the features I have identified in an encryption library as being required or desirable for srlog2:

The candidate libraries that I found are:

libmcrypt
The calling convention for libmcrypt does not separate setting the IV from rekeying the algorithm. There appears to be no easy or standard way of accessing the internal state of the encryptor to set it directly either. It supports all the other requirements and is relatively small and popular among systems that use PHP.
libcrypto from OpenSSL
The documentation on OpenSSL is missing large pieces of required details, and does not include details on AES (which is known to be included in the software). The calling convention for Blowfish and DES (other block ciphers) indicates that it supports setting a separate IV for each encryption operation. It is also by far the most popular and likely most portable library of all of the choices. However, libcrypto is a huge library, several times larger than any of the other candidates.
libgcrypt
This library supports all the required features, and should be portable everywhere GnuPG is supported (which should be nearly everywhere). It is however a fairly sizeable library.
beecrypt
This is one of the smaller encryption libraries, and supports all the required features, although support for much outside the requirements is not high (the only other supported encryption algorithm is Blowfish). Portability appears to be quite good (many processors and OSs are listed). It is required by recent versions of RPM, so it will be present on all recent RedHat, Fedora, and Mandrake Linux systems. I had serious problems getting beecrypt built on Gentoo, however, as it conflicts with the stable versions of rpm. It also doesn’t appear to be very popular.
MatrixSSL
This is a very small library, with the standard library coming in at around 50kB on most systems. It supports AES, but the API documentation doesn’t appear to provide any way to directly access it.
cryptlib
The cryptlib library is a large library, probably outsizing OpenSSL itself, with many language bindings. This was my first encounter with cryptlib, and I am aware of no commonly used packages that actually depend on it, and as such its popularity is very low.
libtomcrypt
While the original web page is currently presenting something completely unrelated (“The Musicians of the New Mexico Symphony Orchestra”), there are many copies of this excellent library mirrored on the web, including the mirror link above. The included documentation is good and the API provides all the requirements (and then some). The library itself is specifically targeted at embedded systems, and so is very compact, and is popular in circles that target smaller systems.

I have switched to using libtomcrypt based on its good API and documentation, compact size, and public domain status. The encryption support in srlog2 is already encapsulated into a single source file, so switching to another library should not be a large effort. Ideally the build process could switch between several libraries depending on which was present at build time, but that’s more work than it’s worth for now.

This document was generated on February 10, 2015 using texi2html 5.0.