1 Introduction
2 Design
3 Detailed Changes
4 Miscellaneous
The original srlog package originated as an internal mechanism for collecting all system logs at FutureQuest, Inc. in one place for analysis.
When figuring out how to accomplish this, we identified several key requirements:
We considered using some tools that were already available for this task. In particular, reusing a tool like SSH would have been ideal. However, we were unaware of any such tool that gave the delivery guarantees we wanted.
The initial implementation was fairly limited, and there were a number of design mistakes. The packet format allowed no variations in what cryptography mechanisms were used. It was hard coded to use MD5 for authentication, the nistp224 elliptic curve for key exchange, and AES192-CBC for encryption, with no hashing of the shared secret to produce the encryption key, no IV, and no resets between packets. Each service required its own secret key, and needed the server key copied into its directory. Senders were identified exclusively by IP and authenticated by a manually copied public key. The packet format was also overly optimized for the established connection path, and only allowed one line per packet.
I recognized that a number of the original design decisions were poor choices or outright mistakes, and set out to fix them. To avoid recreating some of the original mistakes or throwing away existing knowledge, all of the changes were made incrementally, resulting in a system that was at least minimally usable at each step.
However, many of the choices resulted in a system that was completely incompatible with the original srlog externally, even though much of the internal mechanism was still the same. In particular the protocol and the key file handling were completely overhauled. So, the package name (and the name of all the programs) was changed to reflect these differences, and to prevent confusion between the old and new packages.
All data is exchanged over UDP with a default port number of 11014. The sender and receiver first optionally negotiate encryption parameters, and then establish a virtual connection over which the sender delivers its log messages. The receiver sends acknowledgements only for packets it has successfully received; no negative acknowledgements are possible.
All integers are unsigned, and encoded in LSB order.
A “timestamp” is encoded as a 4-byte integer number of seconds since the UNIX epoch, and a 4-byte integer nanosecond offset since the last whole second. Using unsigned integers, this will be adequate until the year 2106.
A string is encoded as a 1- or 2-byte length integer followed by the unencoded data. No trailing NUL byte is used (externally).
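The three primitive encodings above can be sketched in a few lines of Python. This is only an illustration of the byte layout described here, not code from srlog2; the function names are hypothetical, and the last line checks the year 2106 claim for a 4-byte seconds field.

```python
import datetime
import struct


def encode_uint(value, size):
    """Encode an unsigned integer in LSB-first (little-endian) order."""
    return value.to_bytes(size, "little", signed=False)


def encode_timestamp(seconds, nanoseconds):
    """4-byte seconds since the UNIX epoch, then a 4-byte nanosecond offset."""
    return struct.pack("<II", seconds, nanoseconds)


def encode_string(data, length_size=1):
    """Length prefix of 1 or 2 bytes, then the raw bytes, no trailing NUL."""
    if length_size not in (1, 2):
        raise ValueError("length prefix must be 1 or 2 bytes")
    if len(data) >= 1 << (8 * length_size):
        raise ValueError("string too long for its length prefix")
    return encode_uint(len(data), length_size) + data


# Largest moment a 4-byte unsigned seconds field can represent.
LAST_TIMESTAMP = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=0xFFFFFFFF)
```

For example, `encode_string(b"abc", 2)` yields the bytes `03 00 61 62 63`, and `LAST_TIMESTAMP` falls in February 2106.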
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘PRQ1’ |
8 | 8 | String | Nonce |
16 | 1+N | String | Authenticator list (‘HMAC-MD5’) |
?? | 1+N | String | Key exchange list (‘nistp224’ or ‘curve25519\000nistp224’) |
?? | 1+N | String | Key hash list (‘SHA256’) |
?? | 1+N | String | Encryptor list (‘AES128-CBC-ESSIV’) |
?? | 1+N | String | Compressor list (‘null’) |
Notes:
Entries within a list are separated by a NUL byte.
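As a rough illustration of the PRQ1 layout in the table above, the following Python sketch assembles such a packet. It assumes, based on the ‘curve25519\000nistp224’ example, that list entries are joined with NUL bytes inside a single length-prefixed string; the helper names are hypothetical and do not come from the srlog2 sources.

```python
def encode_list(names):
    """Join entries with NUL separators and add a 1-byte length prefix.

    The NUL separator is an assumption inferred from the table's example.
    """
    body = b"\x00".join(names)
    if len(body) > 255:
        raise ValueError("list too long for a 1-byte length prefix")
    return bytes([len(body)]) + body


def build_prq1(nonce, authenticators, key_exchanges, key_hashes,
               encryptors, compressors):
    """Assemble a PRQ1 packet: two 4-byte constants, nonce, five lists."""
    if len(nonce) != 8:
        raise ValueError("nonce must be exactly 8 bytes")
    packet = b"SRL2" + b"PRQ1" + nonce
    for names in (authenticators, key_exchanges, key_hashes,
                  encryptors, compressors):
        packet += encode_list(names)
    return packet
```

A sender offering both key exchanges would pass `[b"curve25519", b"nistp224"]` as the key exchange list, which encodes to a 19-byte string body.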
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘PRF1’ |
8 | 8 | String | Copy of nonce |
16 | 1+N | String | Authenticator choice |
?? | 1+N | String | Key exchange choice |
?? | 1+N | String | Key hash choice |
?? | 1+N | String | Encryptor choice |
?? | 1+N | String | Compressor choice |
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet format ‘SRL2’ |
4 | 4 | Constant | Packet type ‘INI1’ |
8 | 8 | Integer | Initial sequence number |
16 | 8 | Timestamp | Initial timestamp |
24 | 1+N | String | Sender name |
?? | 1+N | String | Service name |
?? | 1+N | String | Authenticator name (A) |
?? | 1+N | String | Key exchange name (E) |
?? | 1+N | String | Key hash name (H) |
?? | 1+N | String | Cipher name (C) |
?? | 1+N | String | Compressor name (Z) |
?? | sizeof(E) | E | Client session public key |
?? | sizeof(A) | A | Authenticator |
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘CID1’ |
8 | sizeof(E) | E | Server session public key |
?? | sizeof(A) | A | Authenticator |
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘MSG1’ |
8 | 8 | Unsigned | Initial sequence number |
16 | 1 | Unsigned | Message count M |
?? | 8 | Timestamp | Timestamp |
?? | 2+N | String | Line |
?? | ?? | Char | Padding to fill out encryption block |
?? | 4 | CRC-32 | Check code on encrypted data |
?? | sizeof(A) | A | Authenticator |
Notes:
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘ACK1’ |
8 | 8 | Unsigned | Sequence number |
16 | sizeof(A) | A | Authenticator |
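The ACK1 packet is the simplest authenticated packet, so it makes a convenient example of the layout. The sketch below builds and checks one using HMAC-MD5 from the Python standard library. Two things are assumptions rather than facts from this document: that the authenticator covers all preceding bytes of the packet, and that `auth_key` (a hypothetical parameter) is already derived from the appropriate shared secret.

```python
import hashlib
import hmac
import struct

HEADER = b"SRL2" + b"ACK1"
MAC_SIZE = 16  # HMAC-MD5 produces a 16-byte authenticator


def build_ack1(sequence, auth_key):
    """Header, 8-byte LSB-first sequence number, then the authenticator."""
    body = HEADER + struct.pack("<Q", sequence)
    # Assumption: the authenticator is computed over every preceding byte.
    return body + hmac.new(auth_key, body, hashlib.md5).digest()


def parse_ack1(packet, auth_key):
    """Return the acknowledged sequence number, or None if invalid."""
    if len(packet) != len(HEADER) + 8 + MAC_SIZE or packet[:8] != HEADER:
        return None
    body, tag = packet[:-MAC_SIZE], packet[-MAC_SIZE:]
    expected = hmac.new(auth_key, body, hashlib.md5).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return struct.unpack("<Q", body[8:])[0]
```

Since no negative acknowledgements exist in the protocol, a receiver of a bad ACK simply drops it, which is why `parse_ack1` returns `None` instead of raising.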
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘SRQ1’ |
8 | 8 | String | Nonce |
Offset | Size | Type | Description |
---|---|---|---|
0 | 4 | Constant | Packet type ‘SRL2’ |
4 | 4 | Constant | Message type ‘SRP1’ |
8 | 8 | String | Copy of nonce |
16 | 2+N | String | Text status report |
The shared secrets for the INI, CID, MSG, and ACK packets are computed as follows:
Packet | Client Computes | Server Computes |
---|---|---|
INI | Client secret * Server public | Server secret * Client public |
CID | Client session secret * Server public | Server secret * Client session public |
ACK or MSG | Client session secret * Server session public | Server session secret * Client session public |
The current system is hard-coded to use HMAC-MD5 for authentication and AES-CBC with a 128-bit key and ESSIV for encryption. The key is taken from the first 32 bytes of the SHA256 hash of the nistp224 shared secret, and the first 32 bytes of the SHA256 hash of that hash are used to key the ESSIV encryptor. The system may use either nistp224 or curve25519 for key exchange, depending on whether curve25519 keys and software support are present on both ends.
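The hash chain described above can be sketched as follows. This only illustrates the two SHA256 steps; the function name is hypothetical. A SHA256 digest is exactly 32 bytes long, so the sketch returns the full digests and deliberately does not guess how they are reduced to a 128-bit AES key.

```python
import hashlib


def derive_key_material(shared_secret):
    """Return (cipher key material, ESSIV key material) as 32-byte digests."""
    # First hash: the key-exchange shared secret becomes the cipher key material.
    key_material = hashlib.sha256(shared_secret).digest()
    # Second hash: hashing the previous hash gives the ESSIV key material.
    essiv_material = hashlib.sha256(key_material).digest()
    return key_material, essiv_material
```

Hashing a second time means the two encryptors never share key bytes, while both ends can still derive everything from the one shared secret.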
The logging format (the format of the lines written by srlog2d to be read by a log processor) reflects the fact that multiple lines will frequently be output for the same sender/service combination. In this way, it encapsulates the manner in which log data arrives: each packet contains one or more log lines (usually more).
So, instead of having information about the service on each line of output, there is a separate line type for identifying the service. This actually simplifies the sender, as the actual log lines can be passed by the logger into the output file or pipe without modification.
This chapter describes in detail the changes made between the original package and srlog2. Some of the reasoning behind the design decisions above is explained here.
The largest real problem encountered with the original system was the high system load caused by the receiver. Having the protocol handle a single line per packet meant that each log line would cause the system to handle two interrupts (incoming and outgoing), and the receiver would have to do a decryption and two full secure hashes. This ended up being a significant issue as we were handling well over 1,000 lines/sec.
Adding a new packet type that would transmit multiple lines was not a big problem, but the bigger issue came with encryption. Since the CBC state was not reset between packets, retransmissions caused a huge implementation headache that could not be satisfactorily resolved.
To resolve the CBC issue, the IV was initially forced to zero at the start of each packet. Then while researching disk encryption I came across a scheme called E(Salt,Sector)IV or ESSIV. In this scheme, the key used for the primary encryption is hashed to key another encryptor. To produce the IV for each packet, the (public) sequence number is encrypted (in simple ECB mode) with this (secret) key material to produce a deterministic but still secret IV. This eliminated encryption ordering issues, making one of the issues with having multiple lines per packet disappear.
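The IV derivation described above can be sketched in Python as below. The standard library has no AES, so the block cipher is passed in as a callable, and the default `toy_encrypt_block` is an insecure XOR stand-in that exists only so the sketch runs. The use of SHA256 to hash the key, the 16-byte block size, and the zero padding of the sequence number are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 16  # AES block size in bytes


def toy_encrypt_block(key, block):
    """NOT a real cipher: XOR stand-in for a single-block ECB encryption."""
    return bytes(b ^ k for b, k in zip(block, key))


def essiv_iv(primary_key, sequence, encrypt_block=toy_encrypt_block):
    """Derive a per-packet IV from the public sequence number."""
    # The primary encryption key is hashed to key a second encryptor.
    iv_key = hashlib.sha256(primary_key).digest()
    # Assumption: the 8-byte LSB-first sequence number is zero-padded
    # out to one cipher block before being encrypted.
    block = sequence.to_bytes(8, "little").ljust(BLOCK_SIZE, b"\x00")
    return encrypt_block(iv_key, block)
```

Because the IV depends only on the key and the sequence number, a retransmitted packet encrypts to the same bytes no matter what was sent before it, which is what removes the ordering problem.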
After I wrote the original package, the author of the nistp224 package, Daniel J. Bernstein, produced another, stronger elliptic curve key exchange protocol called curve25519. The nistp224 package was no longer being maintained and had known bugs causing serious performance regressions with modern compilers, and the author was advocating the use of curve25519 over it.
Initially I was inclined to switch the entire system to curve25519 and drop nistp224 entirely, but the core math of the new system was written entirely in assembler, and the released code only worked on Intel/AMD 32-bit systems. As a result, a mechanism was introduced which would allow either system to be used, with a preference for the longer keys where both were supported.
The original packet format had two shortcomings. First, there was no identification information in the packet other than the leading sequence number, and that was only useful if there was a single line in the packet. To add more packet formats, the sequence numbers from 0xffffffff00000000 and up were reserved. While it is improbable that any sender would ever get close to this number, it is still a poor kludge for multiple packet types. Second, all numbers were represented in MSB order but all the systems using it used LSB ordering, requiring byte swapping on each packet.
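To make the kludge concrete, a receiver for the original format would have had to test the leading sequence number against the reserved range, roughly as in this Python sketch. It assumes an 8-byte sequence number at the start of the packet, which the 64-bit constant suggests but the text does not state outright; the function name is hypothetical.

```python
import struct

RESERVED_BASE = 0xFFFFFFFF00000000


def is_reserved_sequence(packet):
    """True if the leading MSB-first sequence number is in the reserved range."""
    (sequence,) = struct.unpack_from(">Q", packet, 0)
    return sequence >= RESERVED_BASE
```

Every ordinary log packet pays for this comparison, and the packet type is hidden inside a field that is supposed to mean something else.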
So, a new packet format was designed that improved on several attributes. First, the format itself included a version number in both a format identifier and a separate type identifier, making it easy to add more packet types and to update the format in the future. Second, the single-line packet was rejected in favor of explicitly handling multiple lines in each transmission. Finally, all numbers were encoded in LSB order.
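With the two identifiers at fixed offsets, classifying an incoming packet becomes a matter of checking the first eight bytes. The following Python sketch shows the idea; the set of type names is taken from the packet tables in this document, while the function name and error handling are hypothetical.

```python
FORMAT_ID = b"SRL2"
KNOWN_TYPES = {b"PRQ1", b"PRF1", b"INI1", b"CID1",
               b"MSG1", b"ACK1", b"SRQ1", b"SRP1"}


def parse_header(packet):
    """Return the 4-character type identifier of an SRL2 packet."""
    if len(packet) < 8:
        raise ValueError("packet too short for a header")
    if packet[:4] != FORMAT_ID:
        raise ValueError("unknown packet format")
    packet_type = packet[4:8]
    if packet_type not in KNOWN_TYPES:
        raise ValueError("unknown packet type")
    return packet_type.decode("ascii")
```

A later revision of one packet type could then be introduced as, say, a new identifier alongside the old one without disturbing the other types.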
The first design for srlog used IP addresses exclusively to identify senders in the receiver program. This however led to problems when the IP address on a sender changed. In particular, when a sender had multiple IP addresses, the kernel would make an arbitrary choice of which one to use for sending, and that could confuse the receiver. Switching from strictly IPs to names has the additional benefit of allowing support for roaming senders, which has happened when we set up servers in one place and install them in another.
Originally, I had set up the package to use a built-in Rijndael (AES) implementation for symmetric encryption. There are, however, several encryption libraries available which may be preferable due to being more portable and/or faster (due to the use of assembler, etc.).
Here are the features I have identified in an encryption library as being required or desirable for srlog2:
The candidate libraries that I found are:
The calling convention for libmcrypt does not separate setting the IV from rekeying the algorithm. There appears to be no easy or standard way of accessing the internal state of the encryptor to set it directly either. It supports all the other requirements and is relatively small and popular among systems that use PHP.
The documentation on OpenSSL is missing large pieces of required details, and does not include details on AES (which is known to be included in the software). The calling convention for Blowfish and DES (other block ciphers) indicates that it supports setting a separate IV for each encryption operation. It is also by far the most popular and likely most portable library of all of the choices. However, libcrypto is a huge library, several times larger than any of the other candidates.
This library supports all the required features, and should be portable everywhere GnuPG is supported (which should be nearly everywhere). It is however a fairly sizeable library.
This is one of the smaller encryption libraries, and supports all the required features, although support for much outside the requirements is not high (the only other supported encryption algorithm is Blowfish). Portability appears to be quite good (many processors and OSs are listed). It is required by recent versions of RPM, so it will be present on all recent RedHat, Fedora, and Mandrake Linux systems. I had serious problems getting beecrypt built on Gentoo, however, as it conflicts with the stable versions of rpm. It also doesn’t appear to be very popular.
This is a very small library, with the standard library coming in at around 50kB on most systems. It supports AES, but the API documentation doesn’t appear to provide any way to directly access it.
The cryptlib library is a large library, probably outsizing OpenSSL itself, with many language bindings. This was my first encounter with cryptlib, and I am aware of no commonly used packages that actually depend on it, and as such its popularity is very low.
While the original web page is currently presenting something completely unrelated (“The Musicians of the New Mexico Symphony Orchestra”), there are many copies of this excellent library mirrored on the web, including the mirror link above. The included documentation is good and the API provides all the requirements (and then some). The library itself is specifically targeted at embedded systems, and so is very compact, and is popular in circles that target smaller systems.
I have switched to using libtomcrypt based on its good API and documentation, compact size, and public domain status. The encryption support in srlog2 is already encapsulated into a single source file, so switching to another library should not be a large effort. Ideally the build process could switch between several libraries depending on which was present at build time, but that’s more work than it’s worth for now.
This document was generated on February 10, 2015 using texi2html 5.0.