Forge: Rust-native orchestration and performance claims
The 2000x figure refers to the scoring loop specifically, not the full deployment lifecycle. It ignores the time required for actual resource allocation and image pulling.
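Amdahl's law shows why a 2000x speedup in one phase barely moves end-to-end latency when other phases dominate. This is a rough sketch. The millisecond figures in the usage note are hypothetical and are not measurements of Forge.

```rust
/// Overall speedup when only one phase of a pipeline gets faster (Amdahl's law).
/// `fraction` is the share of the original total time spent in that phase
/// (0.0..=1.0); `phase_speedup` is how much faster that phase becomes.
fn amdahl_speedup(fraction: f64, phase_speedup: f64) -> f64 {
    1.0 / ((1.0 - fraction) + fraction / phase_speedup)
}

/// Hypothetical placement latency in milliseconds: the (sped-up) scoring loop
/// plus work the scheduler does not control, such as resource allocation and
/// image pulls.
fn placement_latency_ms(
    scoring_ms: f64,
    scoring_speedup: f64,
    allocation_ms: f64,
    image_pull_ms: f64,
) -> f64 {
    scoring_ms / scoring_speedup + allocation_ms + image_pull_ms
}
```

Assume, purely for illustration, 50 ms of scoring, 200 ms of allocation, and a 3000 ms image pull. The baseline is 3250 ms. A 2000x faster scoring loop brings that to about 3200 ms, an end-to-end speedup of roughly 1.016x.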
The shift toward lightweight WASM runtimes makes this timing interesting. We are seeing a real need for orchestrators that can handle thousands of tiny, short-lived tasks without the K8s overhead.
In the game server world, a five-second delay in scheduling a new instance can mean a dropped connection for a user. A scheduler that cuts that latency down would be a massive win for player retention.
We saw this with Nomad. The scheduler was fast, but the system eventually hit a wall with the Linux kernel's networking overhead, regardless of how quickly the placement decision was made.
This is the death of the K8s tax. If we can move orchestration to the metal with this kind of efficiency, we can finally stop over-provisioning clusters just to keep the control plane alive.
The use of Rayon implies a data-parallel approach to scoring. While this is efficient for independent tasks, it may introduce overhead when dealing with complex dependency graphs where scoring must be sequential.
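Here is a rough picture of data-parallel scoring. Each candidate node is scored independently, so the work can be split across threads with no coordination until a final max. With Rayon this would look something like `nodes.par_iter().filter_map(...).max()`. To keep the snippet dependency-free, this sketch uses `std::thread::scope` with manual chunking instead. The `Node` and `Request` fields and the bin-packing score are hypothetical and are not Forge's actual scoring logic.

```rust
use std::thread;

/// Hypothetical per-node metadata a scheduler might score against.
#[derive(Clone, Copy, Debug)]
struct Node {
    id: u32,
    free_cpu_millis: u32,
    free_mem_mib: u32,
}

/// Hypothetical task request.
#[derive(Clone, Copy, Debug)]
struct Request {
    cpu_millis: u32,
    mem_mib: u32,
}

/// Illustrative scoring rule: reject nodes that cannot fit the request,
/// otherwise prefer the node with the least leftover capacity (bin-packing).
fn score(node: &Node, req: &Request) -> Option<i64> {
    if node.free_cpu_millis < req.cpu_millis || node.free_mem_mib < req.mem_mib {
        return None;
    }
    let leftover = (node.free_cpu_millis - req.cpu_millis) as i64
        + (node.free_mem_mib - req.mem_mib) as i64;
    Some(-leftover)
}

/// Scores nodes in parallel chunks and returns the id of the best fit.
/// Each score depends only on its own node, so chunks never talk to each
/// other until the final reduction. Ties go to the larger node id.
fn best_node(nodes: &[Node], req: &Request, threads: usize) -> Option<u32> {
    if nodes.is_empty() {
        return None;
    }
    let threads = threads.max(1);
    let chunk_size = (nodes.len() + threads - 1) / threads; // always >= 1 here
    thread::scope(|s| {
        let handles: Vec<_> = nodes
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    // Best (score, id) pair within this chunk, if any node fits.
                    chunk
                        .iter()
                        .filter_map(|n| score(n, req).map(|sc| (sc, n.id)))
                        .max()
                })
            })
            .collect();
        // Reduce the per-chunk winners into a single global winner.
        handles
            .into_iter()
            .filter_map(|h| h.join().unwrap())
            .max()
            .map(|(_, id)| id)
    })
}
```

Because the score is a pure function of one node, the result does not depend on how the nodes are chunked. That property is what makes this pattern cheap, and it is exactly what a sequential dependency between tasks would break.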
If they are using Rayon for tasks with dependencies, does that mean they have to batch the scoring process? I'd love to know how they handle the sequencing.
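One common way to combine a data-parallel library with dependencies is to group the DAG into topological "waves" with Kahn's algorithm. This is a guess at the general technique, not a description of Forge. Every task in a wave has all of its dependencies in earlier waves, so each wave can be scored as one parallel batch, and only the wave boundaries are sequential. The `deps` adjacency-list representation is a hypothetical choice for the sketch.

```rust
/// Groups the tasks of a DAG into waves using Kahn's algorithm.
/// `deps[i]` lists the tasks that task `i` depends on (indices must be < deps.len()).
/// Tasks within one wave do not depend on each other, so a wave can be scored
/// in parallel; the waves themselves must be processed in order.
/// Returns `None` if the graph contains a cycle.
fn topological_waves(deps: &[Vec<usize>]) -> Option<Vec<Vec<usize>>> {
    let n = deps.len();
    let mut indegree = vec![0usize; n];
    let mut dependents = vec![Vec::new(); n];
    for (task, ds) in deps.iter().enumerate() {
        for &d in ds {
            indegree[task] += 1;
            dependents[d].push(task);
        }
    }

    // The first wave is every task with no unmet dependencies.
    let mut current: Vec<usize> = (0..n).filter(|&t| indegree[t] == 0).collect();
    let mut waves = Vec::new();
    let mut seen = 0;
    while !current.is_empty() {
        seen += current.len();
        let mut next = Vec::new();
        for &t in &current {
            // Finishing `t` unblocks its dependents; any that reach zero join the next wave.
            for &dep in &dependents[t] {
                indegree[dep] -= 1;
                if indegree[dep] == 0 {
                    next.push(dep);
                }
            }
        }
        next.sort_unstable(); // deterministic order within a wave
        waves.push(current);
        current = next;
    }

    // Tasks on a cycle never reach indegree zero, so they are never "seen".
    if seen == n { Some(waves) } else { None }
}
```

For example, if tasks 0 and 1 are independent, task 2 needs both, task 3 needs 2, and task 4 needs 0, the waves are `[0, 1]`, `[2, 4]`, `[3]`. A long dependency chain degenerates into one task per wave, which is where the parallel speedup disappears.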
I am not sure the dependency graph is the bottleneck. The real performance drop usually happens when the scoring metadata exceeds the L3 cache, regardless of whether the workload is a DAG.
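The cache argument can be sanity-checked with simple arithmetic: multiply the per-node metadata size by the node count and compare the result with the target CPU's L3 size. The `NodeMeta` layout below is hypothetical. The 32 MiB L3 figure in the usage note is also an assumption, since real L3 sizes vary widely between CPUs.

```rust
use std::mem::size_of;

/// Hypothetical per-node record the scoring loop reads on every pass.
/// `repr(C)` keeps the layout predictable: 4 + 4 + 8 + 8 = 24 bytes, no padding.
#[allow(dead_code)]
#[repr(C)]
struct NodeMeta {
    free_cpu_millis: u32,
    free_mem_mib: u32,
    labels_bitmap: u64,
    last_heartbeat_ms: u64,
}

/// Bytes the scoring loop touches if it reads every node's record once.
fn working_set_bytes(node_count: usize) -> usize {
    node_count * size_of::<NodeMeta>()
}

/// Largest node count whose records still fit in a cache of `cache_bytes`.
fn max_nodes_in_cache(cache_bytes: usize) -> usize {
    cache_bytes / size_of::<NodeMeta>()
}
```

With this 24-byte record, an assumed 32 MiB L3 holds about 1.4 million nodes' worth of metadata, and a 100,000-node cluster needs only 2.4 MB. Richer metadata, such as label maps or affinity rules stored behind pointers, inflates the per-node footprint and adds pointer chasing, so the point where scoring falls out of cache can arrive much sooner than this flat-struct estimate suggests.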