Since `ByteBuf` is everywhere in the codebase, moving to the `Buffer` interface will be the most difficult part of the migration.
The main causality is the HAProxy support, which seems to have not been pushed up to Maven Central or Sonatype snapshots.
OpenSSL is much more portable and optimized (important for aarch64) and most systems already have a version.
Unfortunately, OpenSSL likes to break their ABI. Thankfully, Velocity's natives system is very flexible largely, so we can provide multiple versions of this crypto.
Versions of the dynamically-linked crypto were compiled on CentOS 7 (still supported until 2024, uses OpenSSL 1.0.x) and Debian 9 (the oldest distro including OpenSSL 1.1.0, whose LTS supports ends in 2022). The choice of distros was intended to cover most modern distributions (2014 and afterwards).
An ARM compilation (using Debian 9) will be published soon.
This native has been compiled and tested on Ubuntu 18.04 aarch64 on AWS Graviton2.
The cipher native is at the moment unlikely to provide a performance boost as mbed TLS doesn't provide an accelerated AES extension.
We have dropped the rarely used kqueue and replaced it with the new Netty aarch64
native. In addition, lay down the foundation for other aarch64 natives.
This served a good purpose when I used macOS as a primary development system, but those days are gone (I use Linux now). The spirit of multiple variants is preserved by our special Java 11+ optimized compression.
libdeflate is significantly faster than vanilla zlib, zlib-ng, and Cloudflare zlib. It is also MIT-licensed (so no licensing concerns). In addition, it simplifies a lot of the native code (something that's been tricky to get right).
While we're at it, I have also taken the time to fine-time compression in Velocity in general. Thanks to this work, native compression only requires one JNI call, an improvement from the more than 2 (sometimes up to 5) that were possible before. This optimization also extends to the existing Java compressors, though they require potentially two JNI calls.
The compression itself is zero-copy in most cases. However, the overhead
needed to copy a direct buffer into a heap buffer (and back) is still
present. If possible, use Linux for best performance.
All AES implementations being used are 'copy safe', where the source and
destination arrays may be the same. Lets save ourself a copy and reap
the performance wins!
Using the fact that the Java Deflater/Inflater API now supports
ByteBuffers as of Java 11, we can provide performance benefits equivalent
to the Velocity 1.0.x native compression on servers running Java 11+ on
non-macOS and non-Linux platforms (such as Windows).
zlib-ng boasts higher throughput than regular zlib, by combining patches
from Cloudflare, zlib, and ARM's improvements to zlib along with a more
modern codebase.
Profiling consistently shows that compression is the largest CPU expense
by far, so even a minor speed-up here is significant.