OpenSSL is much more portable and optimized (important for aarch64) and most systems already have a version.
Unfortunately, OpenSSL likes to break their ABI. Thankfully, Velocity's natives system is very flexible largely, so we can provide multiple versions of this crypto.
Versions of the dynamically-linked crypto were compiled on CentOS 7 (still supported until 2024, uses OpenSSL 1.0.x) and Debian 9 (the oldest distro including OpenSSL 1.1.0, whose LTS supports ends in 2022). The choice of distros was intended to cover most modern distributions (2014 and afterwards).
An ARM compilation (using Debian 9) will be published soon.
This native has been compiled and tested on Ubuntu 18.04 aarch64 on AWS Graviton2.
The cipher native is at the moment unlikely to provide a performance boost as mbed TLS doesn't provide an accelerated AES extension.
We have dropped the rarely used kqueue and replaced it with the new Netty aarch64
native. In addition, lay down the foundation for other aarch64 natives.
This served a good purpose when I used macOS as a primary development system, but those days are gone (I use Linux now). The spirit of multiple variants is preserved by our special Java 11+ optimized compression.
libdeflate is significantly faster than vanilla zlib, zlib-ng, and Cloudflare zlib. It is also MIT-licensed (so no licensing concerns). In addition, it simplifies a lot of the native code (something that's been tricky to get right).
While we're at it, I have also taken the time to fine-time compression in Velocity in general. Thanks to this work, native compression only requires one JNI call, an improvement from the more than 2 (sometimes up to 5) that were possible before. This optimization also extends to the existing Java compressors, though they require potentially two JNI calls.
The compression itself is zero-copy in most cases. However, the overhead
needed to copy a direct buffer into a heap buffer (and back) is still
present. If possible, use Linux for best performance.
All AES implementations being used are 'copy safe', where the source and
destination arrays may be the same. Lets save ourself a copy and reap
the performance wins!
Using the fact that the Java Deflater/Inflater API now supports
ByteBuffers as of Java 11, we can provide performance benefits equivalent
to the Velocity 1.0.x native compression on servers running Java 11+ on
non-macOS and non-Linux platforms (such as Windows).
zlib-ng boasts higher throughput than regular zlib, by combining patches
from Cloudflare, zlib, and ARM's improvements to zlib along with a more
modern codebase.
Profiling consistently shows that compression is the largest CPU expense
by far, so even a minor speed-up here is significant.
Instead of going through the general process function, which does extra
checks to ensure that its arguments are sane, move "closer" to the action
by stripping down to the actual implementation.
* Delay switch to new server until after JoinGame is sent.
Unfortunately, in some cases (especially vanilla Minecraft) some login
disconnects are sent after ServerLoginSuccess but before JoinGame.
We've been using ServerLoginSuccess but it has caused too many problems.
Now Velocity will switch to a stripped-down version of the play session
handler until JoinGame is received. This handler does very little by
itself: it simply forwards plugin messages (for Forge) and waits for the
JoinGame packet from the server.
This is an initial version: only vanilla Minecraft 1.12.2 was tested.
However this is the way Waterfall without entity rewriting does server
switches (which, in turn, is inherited from BungeeCord).
* Move to Gradle 5 and Error Prone.
We now try to work within the boundaries given by the native. In the
case of Java natives, we work with byte arrays. With natives, always use
direct buffers.
However, the numbers do favor the natives, since they work with direct
byte buffers, without any copying. For the most part, this commit is
intended to improve the lives of Velocity users on Windows.