Deterministic Simulation in Foldb
Database
Comments
Suppose the bottleneck is not the language but the underlying hardware. Could non-deterministic CPU instructions or memory timings still introduce flakes that a software-level simulation misses?
How many random seeds are actually being swept? Brute force only works if the state space is small enough to hit edge cases in a reasonable timeframe.
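To put rough numbers on the seed question, here is a minimal Python sketch of a seed sweep. Nothing in it comes from the project: `run_simulation`, `violates_invariant`, and the "three crashes in a row" bug are hypothetical stand-ins. It shows two things: a failing seed replays identically, and the chance of hitting an edge case with per-run probability `p` in `n` independent seeds is `1 - (1 - p)**n`.

```python
import random


def run_simulation(seed, steps=20):
    """Hypothetical toy simulation: the seed fully determines the event trace."""
    rng = random.Random(seed)  # private RNG, no shared global state
    return [rng.choice(["put", "delete", "crash"]) for _ in range(steps)]


def violates_invariant(trace):
    """Hypothetical bug: three consecutive crashes corrupt state."""
    return any(
        a == b == c == "crash"
        for a, b, c in zip(trace, trace[1:], trace[2:])
    )


def sweep(seeds):
    """Return every seed whose run trips the invariant check."""
    return [s for s in seeds if violates_invariant(run_simulation(s))]


def hit_probability(p, n):
    """Chance that at least one of n independent seeds hits an edge case
    that a single run hits with probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    failing = sweep(range(1000))
    print(f"{len(failing)} failing seeds out of 1000")
    # A one-in-a-million edge case after a million seeds:
    print(f"{hit_probability(1e-6, 1_000_000):.3f}")
```

Under these assumptions, a one-in-a-million edge case is still missed about a third of the time after a million seeds, so the sweep size has to be judged against how rare the bugs you care about are.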
Zig is the common thread here; these low-level rewrites are just testing the boundaries of the language's memory safety.
It is not just testing boundaries. This is a full-scale revolt against C++ for systems programming. Who wants a manual when you have a deterministic simulation?
If more people start using Zig for these kinds of databases... does that mean we will see a whole new ecosystem of deterministic tools... or just a lot of new ways to crash a kernel?
The use of an LSM tree for caching is a smart move for write-heavy workloads. It prevents the simulation from becoming bottlenecked by disk I/O during those seed runs.
While the LSM tree helps with write throughput, it complicates the determinism of the cache state due to compaction. If compaction is asynchronous, it could introduce the very timing variances the simulation aims to eliminate.
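One common way around this is to make compaction a task on the simulated scheduler rather than a real background thread, so its timing comes from the seed. Here is a minimal Python sketch of that idea. `SimScheduler` and `ToyLSM` are hypothetical names, this is not how the database in the post does it, and the "LSM" is just a list of dicts.

```python
import random


class SimScheduler:
    """Hypothetical single-threaded scheduler: a seeded RNG picks the next task."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.pending = []

    def spawn(self, task):
        self.pending.append(task)

    def step(self):
        if not self.pending:
            return False
        # The only source of "timing" is this seeded choice.
        task = self.pending.pop(self.rng.randrange(len(self.pending)))
        task()
        return True


class ToyLSM:
    """Toy LSM: a memtable that flushes to a list of sstables (newest last)."""

    def __init__(self, sched, memtable_limit=2):
        self.sched = sched
        self.limit = memtable_limit
        self.memtable = {}
        self.sstables = []
        self.log = []

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
            self.log.append("flush")
            # Compaction is queued as a task, not run on a real thread.
            self.sched.spawn(self.compact)

    def compact(self):
        merged = {}
        for table in self.sstables:  # older first, so newer values win
            merged.update(table)
        self.sstables = [merged]
        self.log.append(f"compact:{len(merged)}")


def run(seed):
    sched = SimScheduler(seed)
    db = ToyLSM(sched)
    for i in range(8):
        sched.spawn(lambda i=i: db.put(f"k{i % 5}", i))
    while sched.step():
        pass
    return db.log, db.sstables, db.memtable
```

Compaction still interleaves with writes in different orders for different seeds, which is what you want for coverage, but `run(seed)` gives the same interleaving every time for the same seed. The determinism breaks only if something outside the scheduler (a real thread, wall-clock time, the OS) gets to decide when `compact` runs.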