Rewrite in Rust to obtain standalone static binary
In contradiction with Jean-Paul's guidelines on not using Rust (due to the lack of knowledge about it inside Nexedi), I am using it here because it is the fastest way for me to get a working standalone static binary: I know that language best. Considering we must get results ASAP, this is the best strategy for me. We may later rewrite it in another language if necessary.

A shell script is included to build the static binary. You need to install rustup to get Rust for musl, an alternative libc that makes it possible to create truly static binaries that embed libc itself. Rustup can be found at: https://rustup.rs/

You can get a musl toolchain with:

$ rustup target add x86_64-unknown-linux-musl

The acl library is downloaded and built as a static library by the script, and the Rust build system will also build a vendored copy of OpenSSL as a static library.

Parallel hashing is done a bit differently in this Rust version: only the files contained in the currently processed directory are hashed in parallel. If a directory contains a single big file, hashing will be stuck on that file until it is done before moving on to the next directory. To clarify, each file is only ever hashed on a single thread; the Python version also does this, but it keeps the number of files being hashed in parallel at a constant as long as there are more files to process, whereas this version only spawns one thread per file in the currently processed directory. It was done that way for the sake of simplicity; we can implement an offload threadpool later to mimic what was done in Python.
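As a rough illustration of the per-directory strategy described above, here is a minimal, dependency-free sketch. It is not the agent's actual code: the real version uses rayon's parallel iterators and the md-5/sha-1/sha2 crates, while this sketch uses std::thread::scope and a toy FNV-1a checksum (the names `fnv1a` and `hash_dir` are hypothetical) so that it stays self-contained.

```rust
use std::fs;
use std::io::Read;
use std::path::Path;
use std::thread;

// Stand-in for a real digest (the agent computes MD5/SHA-1/SHA-2):
// a simple FNV-1a checksum keeps this sketch dependency-free.
fn fnv1a(data: &[u8]) -> u64 {
    data.iter().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ *b as u64).wrapping_mul(0x100000001b3)
    })
}

// Hash every regular file of ONE directory in parallel, one thread per
// file, then return -- mirroring the per-directory strategy described
// above (the real code uses rayon instead of std::thread::scope).
fn hash_dir(dir: &Path) -> std::io::Result<Vec<(String, u64)>> {
    let files: Vec<_> = fs::read_dir(dir)?
        .filter_map(|e| e.ok())
        .filter(|e| e.path().is_file())
        .collect();
    let mut results = Vec::new();
    thread::scope(|s| {
        // One thread per file in this directory; a lone big file keeps
        // us here until it is fully hashed.
        let handles: Vec<_> = files
            .iter()
            .map(|entry| {
                s.spawn(move || {
                    let mut buf = Vec::new();
                    fs::File::open(entry.path())?.read_to_end(&mut buf)?;
                    Ok::<_, std::io::Error>((
                        entry.file_name().to_string_lossy().into_owned(),
                        fnv1a(&buf),
                    ))
                })
            })
            .collect();
        for h in handles {
            if let Ok(Ok(pair)) = h.join().map_err(|_| ()) {
                results.push(pair);
            }
        }
    });
    Ok(results)
}

fn main() -> std::io::Result<()> {
    // Demo on a throwaway directory with two small files.
    let dir = std::env::temp_dir().join("hash-demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("a.txt"), b"hello")?;
    fs::write(dir.join("b.txt"), b"world")?;
    let mut hashes = hash_dir(&dir)?;
    hashes.sort();
    for (name, h) in &hashes {
        println!("{name} {h:016x}");
    }
    Ok(())
}
```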
[package]
name = "metadata-collect-agent"
version = "0.1.0"
authors = ["Leo Le Bouter <leo.le.bouter@nexedi.com>"]
edition = "2018"

[dependencies]
posix-acl = "1.0.0"
xattr = "0.2.2"
md-5 = "0.9.1"
sha-1 = "0.9.1"
sha2 = "0.9.1"
hex = "0.4.2"
anyhow = "1.0.32"
clap = "2.33.3"
psutil = { git = "https://github.com/leo-lb/rust-psutil", branch = "lle-bout/impl-serde", version = "3.1.0", features = ["serde"] }
reqwest = { version = "0.10.7", features = ["blocking", "native-tls-vendored"] }
rmp-serde = "0.14.4"
nix = "0.18.0"
serde = { version = "1.0.115", features = ["derive"] }
base64 = "0.12.3"
rayon = "1.3.1"

[profile.release]
opt-level = 'z'
lto = true
codegen-units = 1
-
Owner
I'm halfway through reading https://doc.rust-lang.org/stable/book/ and I have to say rust is really excellent.
-
Owner
BTW, in slapos!799 (merged) there's the beginning of a Rust component for SlapOS (which takes ~2 hours to compile from source, and I'm not really sure it's reproducible: the setup seems to download things) and Rust support in Theia.
-
Owner
Hashing several files in parallel looks like a bad idea. We already did it in SlapOS for the resilience feature and it was a disaster. It's rare that hashing is slower than IO (maybe only when hashing a big file on a high-performance NVMe drive with the slowest algorithm). Most of the time it's so inefficient that it ends up slower; at best it could be faster, but the hardware consumes a lot.
But there may be other ways to parallelize the work. First, by doing IO and hashing in separate threads, i.e. pipelining. Since you compute several hashes, you could use one thread per hash.
In any case, I find such an attempt to optimize very premature, in particular if a rewrite is planned.
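The pipelining idea above can be sketched with standard-library channels alone: one producer (the IO side) feeds the same chunks to several consumer threads, one per digest. This is only an illustration under assumptions: the `pipeline` name is hypothetical, and each "hasher" merely counts bytes instead of running a real digest.

```rust
use std::sync::mpsc;
use std::thread;

// One channel per "digest" thread: the producer (IO) side sends each
// chunk to every consumer, so reading and hashing overlap and each
// digest algorithm gets its own thread.
fn pipeline(chunks: &[Vec<u8>], n_digests: usize) -> Vec<usize> {
    let (txs, handles): (Vec<_>, Vec<_>) = (0..n_digests)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<Vec<u8>>();
            // Stand-in for a real hasher: just count the bytes received.
            let h = thread::spawn(move || rx.iter().map(|c| c.len()).sum::<usize>());
            (tx, h)
        })
        .unzip();
    for chunk in chunks {
        for tx in &txs {
            // Real code could share slices (e.g. Arc) instead of cloning.
            tx.send(chunk.clone()).unwrap();
        }
    }
    drop(txs); // close the channels so the consumers can finish
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Pretend the IO thread read two 7-byte chunks from disk.
    let chunks = vec![b"chunk-1".to_vec(), b"chunk-2".to_vec()];
    let totals = pipeline(&chunks, 3);
    println!("{totals:?}"); // each of the 3 "digests" saw all 14 bytes
}
```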
-
@jm It's really, really slow with a single thread. The use case is whole-filesystem hashing, performed during boot with nothing else running on the system, so it's mandatory that it is fast; otherwise we might end up with terrible boot times of dozens of minutes.

I did some tests before making it parallel, and parallelism made it much faster: I/O utilization is now 100% on my fast NVMe drive, so it can't get any faster, whereas before it wasn't at 100%. So the first benefit comes from doing I/O from multiple threads. Then, on my system, hashing on a single thread is about as slow as (or only a bit faster than) I/O on a single thread, at something like 100 MB/s, so you would have to parallelize anyway: a single hashing thread could not keep up with the data produced by I/O done on multiple threads.
The hashing functions come from: https://github.com/RustCrypto/hashes - maybe they could be optimized further, but they already use hand-written assembly in select places for performance, so making it parallel is easier for me than digging into assembly with cryptography expertise that I don't have.
Note that the performance issues were identical with the Python version, which uses OpenSSL, where the hash functions are already well optimized.
> In any case, I find such attempt to optimize very premature, in particular if it is planned to rewrite.
It's pretty trivial in Rust: add Arc/Mutex on the shared data structure, then use Rayon's parallel iterator, and it's done. So not much effort or time was spent there.