Srlog2 Design Documentation


1. Introduction

The original srlog package began as an internal mechanism for collecting all system logs at FutureQuest, Inc. in one place for analysis.


1.1 Requirements

When figuring out how to accomplish this, we identified several key requirements:

We considered using some tools that were already available for this task. In particular, reusing a tool like SSH would have been ideal. However, we were unaware of any such tool that gave the delivery guarantees we wanted.


1.2 Initial Implementation

The initial implementation was fairly limited, and there were a number of design mistakes. The packet format allowed no variations in what cryptography mechanisms were used. It was hard coded to use MD5 for authentication, the nistp224 elliptic curve for key exchange, and AES192-CBC for encryption, with no hashing of the shared secret to produce the encryption key, no IV, and no resets between packets. Each service required its own secret key, and needed the server key copied into its directory. Senders were identified exclusively by IP and authenticated by a manually copied public key. The packet format was also overly optimized for the established connection path, and only allowed one line per packet.


1.3 Rewriting to srlog2

I recognized a number of the original design decisions were poor choices or outright mistakes, and set out to fix them. In order to avoid recreating some original mistakes or throwing away existing knowledge, all of the changes were done incrementally, resulting in a system that was at least minimally usable at each step.

However, many of the choices resulted in a system that was completely incompatible with the original srlog externally, even though much of the internal mechanism was still the same. In particular, the protocol and the key file handling were completely overhauled. So, the package name (and the names of all the programs) was changed to reflect these differences and to prevent confusion between the old and new packages.


2. Design


2.1 Network Protocol


2.1.1 Network Transport

All data is exchanged over UDP with a default port number of 11014. The sender and receiver first optionally negotiate encryption parameters, and then establish a virtual connection over which the sender delivers its log messages. The receiver sends acknowledgements only for packets it has successfully received; no negative acknowledgements are possible.
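Since there are no negative acknowledgements, the only sign of a lost packet is that no acknowledgement arrives. A minimal sketch of the resulting retransmission loop in Python follows. The function name, the timeout, the retry limit, and the `is_ack` callback are assumptions made for illustration; they are not taken from the srlog2 sender.

```python
import socket

DEFAULT_PORT = 11014  # default UDP port named in this document


def send_until_acked(sock, packet, addr, is_ack, timeout=1.0, max_tries=5):
    """Send `packet` to `addr`, resending until `is_ack` accepts a reply.

    Returns the acknowledging datagram, or None if every attempt failed.
    `is_ack` is a hypothetical callback that validates a received datagram.
    """
    sock.settimeout(timeout)
    for _ in range(max_tries):
        sock.sendto(packet, addr)
        try:
            reply, _peer = sock.recvfrom(65535)
        except socket.timeout:
            continue  # silence is the only failure signal, so send again
        if is_ack(reply):
            return reply
    return None
```

A caller would create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass the receiver's address together with `DEFAULT_PORT`.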


2.1.2 Packet Formats


2.1.2.1 Data Formats

All integers are unsigned and encoded in LSB (least significant byte first) order.

A “timestamp” is encoded as a 4-byte integer number of seconds since the UNIX epoch, and a 4-byte integer nanosecond offset since the last whole second. Using unsigned integers, this will be adequate until the year 2106.

A string is encoded as a 1- or 2-byte length integer followed by the unencoded data. No trailing NUL byte is used (externally).
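The encodings above map directly onto fixed-width little-endian fields. The following Python sketch shows one way to encode and decode them with the standard `struct` module. The function names are hypothetical, and the choice between a 1-byte and a 2-byte length prefix is left to the caller, since it depends on the field (shown as 1+N or 2+N in the packet tables).

```python
import struct


def encode_timestamp(seconds, nanoseconds):
    """Two unsigned 4-byte little-endian integers: seconds, then nanoseconds."""
    return struct.pack("<II", seconds, nanoseconds)


def decode_timestamp(buf, offset=0):
    """Return (seconds, nanoseconds) read from `buf` at `offset`."""
    return struct.unpack_from("<II", buf, offset)


def encode_string(data, length_bytes=1):
    """Length prefix (1 or 2 bytes, little-endian) followed by the raw bytes."""
    fmt = {1: "<B", 2: "<H"}[length_bytes]
    return struct.pack(fmt, len(data)) + data


def decode_string(buf, offset=0, length_bytes=1):
    """Return (data, offset of the next field)."""
    fmt = {1: "<B", 2: "<H"}[length_bytes]
    (length,) = struct.unpack_from(fmt, buf, offset)
    start = offset + length_bytes
    return buf[start:start + length], start + length
```

`struct.pack` raises `struct.error` if the data is too long for the chosen prefix size, so oversized strings are rejected rather than silently truncated.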


2.1.2.2 PRQ: Preferences Query

Offset  Size  Type      Description
0       4     Constant  Packet type ‘SRL2’
4       4     Constant  Message type ‘PRQ1’
8       8     String    Nonce
16      1+N   String    Authenticator list (‘HMAC-MD5’)
??      1+N   String    Key exchange list (‘nistp224’ or ‘curve25519\000nistp224’)
??      1+N   String    Key hash list (‘SHA256’)
??      1+N   String    Encryptor list (‘AES128-CBC-ESSIV’)
??      1+N   String    Compressor list (‘null’)

Notes:


2.1.2.3 PRF: Preferences Response

Offset  Size  Type      Description
0       4     Constant  Packet type ‘SRL2’
4       4     Constant  Message type ‘PRF1’
8       8     String    Copy of nonce
16      1+N   String    Authenticator choice
??      1+N   String    Key exchange choice
??      1+N   String    Key hash choice
??      1+N   String    Encryptor choice
??      1+N   String    Compressor choice


2.1.2.4 INI: Initialization Packet

Offset  Size       Type       Description
0       4          Constant   Packet format ‘SRL2’
4       4          Constant   Packet type ‘INI1’
8       8          Integer    Initial sequence number
16      8          Timestamp  Initial timestamp
24      1+N        String     Sender name
??      1+N        String     Service name
??      1+N        String     Authenticator name (A)
??      1+N        String     Key exchange name (E)
??      1+N        String     Key hash name (H)
??      1+N        String     Cipher name (C)
??      1+N        String     Compressor name (Z)
??      sizeof(E)  E          Client session public key
??      sizeof(A)  A          Authenticator


2.1.2.5 CID: Initialization Response

Offset  Size       Type      Description
0       4          Constant  Packet type ‘SRL2’
4       4          Constant  Message type ‘CID1’
8       sizeof(E)  E         Server session public key
??      sizeof(A)  A         Authenticator


2.1.2.6 MSG: Message Packet

Offset  Size       Type       Description
0       4          Constant   Packet type ‘SRL2’
4       4          Constant   Message type ‘MSG1’
8       8          Unsigned   Initial sequence number
16      1          Unsigned   Message count M
??      8          Timestamp  Timestamp
??      2+N        String     Line
??      ??         Char       Padding to fill out encryption block
??      4          CRC-32     Check code on encrypted data
??      sizeof(A)  A          Authenticator

Notes:


2.1.2.7 ACK: Message Acknowledgement Packet

Offset  Size       Type      Description
0       4          Constant  Packet type ‘SRL2’
4       4          Constant  Message type ‘ACK1’
8       8          Unsigned  Sequence number
16      sizeof(A)  A         Authenticator
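The ACK packet is the simplest authenticated packet, so it makes a convenient worked example. The Python sketch below builds and checks one using HMAC-MD5, the authenticator named in this document. Two things are assumptions for illustration: that the authenticator covers all preceding bytes of the packet, and that `auth_key` is whatever key material the two ends share for authentication. The function names are hypothetical.

```python
import hashlib
import hmac
import struct

HEADER = b"SRL2" + b"ACK1"
TAG_SIZE = 16  # length of an HMAC-MD5 digest


def build_ack(sequence, auth_key):
    """Header, 8-byte little-endian sequence number, then the authenticator."""
    body = HEADER + struct.pack("<Q", sequence)
    tag = hmac.new(auth_key, body, hashlib.md5).digest()
    return body + tag


def parse_ack(packet, auth_key):
    """Return the acknowledged sequence number, or None if invalid."""
    if len(packet) != 16 + TAG_SIZE or packet[:8] != HEADER:
        return None
    body, tag = packet[:16], packet[16:]
    expected = hmac.new(auth_key, body, hashlib.md5).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # wrong key or modified packet
    return struct.unpack_from("<Q", body, 8)[0]
```

Using `hmac.compare_digest` keeps the comparison constant-time, which avoids leaking how many leading bytes of a forged authenticator were correct.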


2.1.2.8 SRQ: Status Request

Offset  Size  Type      Description
0       4     Constant  Packet type ‘SRL2’
4       4     Constant  Message type ‘SRQ1’
8       8     String    Nonce


2.1.2.9 SRP: Status Response

Offset  Size  Type      Description
0       4     Constant  Packet type ‘SRL2’
4       4     Constant  Message type ‘SRP1’
8       8     String    Copy of nonce
16      2+N   String    Text status report


2.2 Key Exchange

The shared secrets for the INI, CID, MSG, and ACK packets are computed as follows:

Packet      Client Computes                                Server Computes
INI         Client secret * Server public                  Server secret * Client public
CID         Client session secret * Server public          Server secret * Client session public
ACK or MSG  Client session secret * Server session public  Server session secret * Client session public
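Each row of the table works because both sides arrive at the same value from different inputs. The Python sketch below demonstrates that property with classic modular Diffie-Hellman over a toy-sized prime. This is a stand-in chosen only because it needs nothing beyond the standard library: srlog2 uses the nistp224 or curve25519 elliptic curves, where the "*" in the table is the curve operation, not modular exponentiation. All names and parameters here are illustrative.

```python
import secrets

# Toy parameters for illustration only; this group is NOT secure.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Return (secret, public) for the stand-in key exchange."""
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)


def shared(own_secret, peer_public):
    """Combine our secret with the peer's public value."""
    return pow(peer_public, own_secret, P)


# Long-term keys, plus the session keys carried in the INI and CID packets.
client_secret, client_public = keypair()
server_secret, server_public = keypair()
client_session_secret, client_session_public = keypair()
server_session_secret, server_session_public = keypair()

# INI row: long-term keys on both sides.
ini_client = shared(client_secret, server_public)
ini_server = shared(server_secret, client_public)

# CID row: client session key with the server's long-term key.
cid_client = shared(client_session_secret, server_public)
cid_server = shared(server_secret, client_session_public)

# ACK or MSG row: session keys on both sides.
msg_client = shared(client_session_secret, server_session_public)
msg_server = shared(server_session_secret, client_session_public)
```

In each pair the client and server values are equal, while the three rows give three different secrets, so traffic after the handshake does not depend on the long-term keys alone.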


2.3 Encryption Parameters

The current system is hard coded to use HMAC-MD5 for authentication and AES-CBC with a 128-bit key and ESSIV for encryption. The first 32 bytes of the SHA256 hash of the nistp224 shared secret are used for the key. Additionally, the first 32 bytes of the SHA256 hash of the previous SHA256 hash are used to key the ESSIV encryptor. The system may use either nistp224 or curve25519 for key exchange, depending on whether curve25519 keys and software support are present on both ends.
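The two-step hashing described above can be sketched in a few lines of Python. The function name is hypothetical, and the number of hash bytes actually consumed is taken as a parameter (`key_len`) rather than fixed, since that depends on the key size the cipher expects.

```python
import hashlib


def derive_keys(shared_secret, key_len):
    """Return (cipher key, ESSIV key) derived from the key exchange result.

    The cipher key comes from SHA256(shared secret); the ESSIV key comes
    from SHA256 of that first hash. `key_len` is an assumed parameter.
    """
    first = hashlib.sha256(shared_secret).digest()
    second = hashlib.sha256(first).digest()
    return first[:key_len], second[:key_len]
```

Because the second key is a hash of the first hash, either end can rebuild both keys from the shared secret alone, with nothing extra sent over the network.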


2.4 Logging Format

The logging format (the format of the lines written by srlog2d to be read by a log processor) reflects the fact that multiple lines will frequently be output for the same sender/service combination. In this way, it encapsulates the manner in which log data arrives – each packet contains one or more log lines (usually more).

So, instead of having information about the service on each line of output, there is a separate line type for identifying the service. This actually simplifies the sender, as the actual log lines can be passed by the logger into the output file or pipe without modification.


3. Detailed Changes

This chapter describes in detail the changes made between the original package and srlog2. Some of the reasoning behind the design decisions above is given here.


3.1 Multiple lines per packet

The largest real problem encountered with the original system was the high system load caused by the receiver. Having the protocol handle a single line per packet meant that each log line would cause the system to handle two interrupts (incoming and outgoing), and the receiver would have to do a decryption and two full secure hashes. This ended up being a significant issue as we were handling well over 1,000 lines/sec.

Adding a new packet type that would transmit multiple lines was not a big problem, but the bigger issue came with encryption. Since the CBC state was not reset between packets, retransmissions caused a huge implementation headache that could not be satisfactorily resolved.


3.2 IV computed using E(Salt|Sector)

To resolve the CBC issue, the IV was initially forced to zero at the start of each packet. Then, while researching disk encryption, I came across a scheme called E(Salt,Sector)IV or ESSIV. In this scheme, the key used for the primary encryption is hashed to key another encryptor. To produce the IV for each packet, the (public) sequence number is encrypted (in simple ECB mode) with this (secret) key material to produce a deterministic but still secret IV. This eliminated the encryption ordering issues, removing one of the obstacles to having multiple lines per packet.
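The IV derivation can be sketched as follows. The Python standard library has no AES, so this example takes the block encryption as a callable and supplies a labeled stand-in built from HMAC-SHA256 for demonstration; a real implementation would pass a single-block AES encryption in ECB mode instead. Zero-padding the 8-byte sequence number to a 16-byte block is also an assumption, as are all the names.

```python
import hashlib
import hmac
import struct

BLOCK_SIZE = 16  # AES block size in bytes


def essiv_iv(sequence, encrypt_block):
    """Encrypt the public sequence number to get a secret per-packet IV.

    `encrypt_block` maps one 16-byte block to one 16-byte block under the
    secondary (ESSIV) key. The zero padding is an assumed layout.
    """
    block = struct.pack("<Q", sequence).ljust(BLOCK_SIZE, b"\0")
    return encrypt_block(block)


def make_stand_in_cipher(essiv_key):
    """Stand-in for AES-ECB with `essiv_key`; for demonstration only."""
    def encrypt_block(block):
        digest = hmac.new(essiv_key, block, hashlib.sha256).digest()
        return digest[:BLOCK_SIZE]
    return encrypt_block


# The ESSIV key is a hash of the primary encryption key.
primary_key = b"\x01" * 16
essiv_key = hashlib.sha256(primary_key).digest()
iv_for_packet_7 = essiv_iv(7, make_stand_in_cipher(essiv_key))
```

Since the IV depends only on the key and the sequence number, a retransmitted packet encrypts identically, and the receiver can decrypt packets in any order.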


3.3 Introducing curve25519

After the original package was written, the author of the nistp224 package, Daniel J. Bernstein, produced another, stronger elliptic curve key exchange protocol called curve25519. The nistp224 package was no longer being maintained and had known bugs causing serious performance regressions with modern compilers, and its author was advocating the use of curve25519 over it.

Initially I was inclined to switch the entire system to curve25519 and drop nistp224 entirely, but the core math of the new system was written entirely in assembler, and the released code only worked on Intel/AMD 32-bit systems. As a result, a mechanism was introduced which would allow either system to be used, with a preference for the longer keys where both were supported.


3.4 New packet format

The original packet format had two shortcomings. First, there was no identification information in the packet other than the leading sequence number, and that was only useful if there was a single line in the packet. To add more packet formats, the sequence numbers from 0xffffffff00000000 and up were reserved. While it is improbable that any sender would ever get close to this number, it is still a poor kludge for multiple packet types. Second, all numbers were represented in MSB order but all the systems using it used LSB ordering, requiring byte swapping on each packet.

So, a new packet format was designed that improved on several attributes. First, the format itself included a version number in both a format identifier and a separate type identifier, making it easy to add more packet types and to update the format in the future. Second, the single-line packet was rejected in favor of explicitly handling multiple lines in each transmission. Finally, all numbers were encoded in LSB order.
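With a format identifier and a type identifier at fixed offsets, a receiver can classify any packet by looking at its first eight bytes. A minimal Python sketch of that dispatch follows. The identifier values come from the packet tables in section 2.1.2, while the function name and the descriptive labels are hypothetical.

```python
FORMAT = b"SRL2"

# Type identifiers from the packet tables; the labels are illustrative.
TYPES = {
    b"PRQ1": "preferences query",
    b"PRF1": "preferences response",
    b"INI1": "initialization",
    b"CID1": "initialization response",
    b"MSG1": "message",
    b"ACK1": "message acknowledgement",
    b"SRQ1": "status request",
    b"SRP1": "status response",
}


def classify(packet):
    """Return a label for the packet, or None if it is not recognized."""
    if len(packet) < 8 or packet[:4] != FORMAT:
        return None  # too short, or a different format version
    return TYPES.get(packet[4:8])
```

Adding a new packet type, or a new version of an existing one, then only requires a new table entry, in contrast to the reserved sequence number range used by the original format.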


3.5 Sender Names

The first design for srlog used IP addresses exclusively to identify senders in the receiver program. This, however, led to problems when the IP address on a sender changed. In particular, when a sender had multiple IP addresses, the kernel would make an arbitrary choice of which one to use for sending, and that could confuse the receiver. Switching from strictly IPs to names has the additional benefit of allowing support for roaming senders, which has happened when we set up servers in one place and install them in another.


4. Miscellaneous


4.1 External Encryption Libraries

Originally, I had set up the package to use a built-in Rijndael (AES) implementation for symmetric encryption. There are, however, several encryption libraries available which may be preferable due to being more portable and/or faster (due to the use of assembler, etc.).

Here are the features I have identified in an encryption library as being required or desirable for srlog2:

The candidate libraries that I found are:

I have switched to using libtomcrypt based on its good API and documentation, compact size, and public domain status. The encryption support in srlog2 is already encapsulated into a single source file, so switching to another library should not be a large effort. Ideally the build process could switch between several libraries depending on which was present at build time, but that's more work than it's worth for now.


This document was generated by Bruce Guenter on October, 23 2008 using texi2html 1.78.