NVIDIA Rubin CPX Explained: Disaggregated Inference And The Cost Of Million-Token Context
NVIDIA’s Rubin platform splits long-context prefill from token decode: Rubin CPX handles the compute-heavy front half, while standard Rubin handles bandwidth-heavy generation. The NVL144 CPX rack […]
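
The prefill/decode split described above can be illustrated with a toy scheduler. This is a minimal sketch, not NVIDIA's software stack: the `Request` and `DisaggregatedScheduler` names, the queue layout, and the one-token-per-step decode loop are all assumptions made for illustration. It shows only the control flow of disaggregation: a request is processed once by a prefill pool, its KV cache is handed off, and a separate decode pool then generates tokens.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical inference request; field names are illustrative only."""
    request_id: str
    prompt_tokens: int
    max_new_tokens: int
    kv_cache_ready: bool = False
    generated: int = 0


class DisaggregatedScheduler:
    """Toy two-pool scheduler: prefill and decode use separate queues."""

    def __init__(self):
        self.prefill_queue = deque()  # compute-heavy phase (prompt ingestion)
        self.decode_queue = deque()   # bandwidth-heavy phase (token generation)
        self.finished = []

    def submit(self, req):
        """Every new request enters the prefill pool first."""
        self.prefill_queue.append(req)

    def run_prefill_step(self):
        """Process one whole prompt, then hand the request to the decode pool."""
        if not self.prefill_queue:
            return None
        req = self.prefill_queue.popleft()
        req.kv_cache_ready = True  # stands in for building and transferring the KV cache
        self.decode_queue.append(req)
        return req

    def run_decode_step(self):
        """Generate one token for every request currently in the decode pool."""
        still_running = deque()
        while self.decode_queue:
            req = self.decode_queue.popleft()
            req.generated += 1
            if req.generated >= req.max_new_tokens:
                self.finished.append(req)
            else:
                still_running.append(req)
        self.decode_queue = still_running
```

In this sketch a million-token prompt occupies only the prefill queue while it is being ingested, so requests already in the decode queue keep producing tokens. That separation is the property a disaggregated design aims for; real systems also have to account for the cost of moving the KV cache between the two pools, which the sketch reduces to a single flag.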
