Pulsar 16B matches 30B-class reasoning at half the parameter count

Multiverse Computing's open-source Pulsar 16B delivers frontier-grade reasoning performance at roughly half the parameter count of comparable models, validated on Nvidia's accelerated infrastructure.

Multiverse Computing's Pulsar 16B matches the reasoning performance of 30-billion-parameter models with just 16.15 billion total parameters, 3.1 billion of them active, cutting the total parameter count by nearly half while preserving benchmark scores across math, science and coding tasks.

"Running advanced AI locally has historically required compromising on model size or performance," Enrique Lizaso, cofounder and chief executive officer of Multiverse Computing, said. "What we're demonstrating with Pulsar 16B is that frontier-grade reasoning can now be deployed without the overhead of cloud-scale infrastructure, at a footprint enterprises can actually run and scale economically."

Built on a compressed version of Nvidia's Nemotron 3 Nano — a hybrid Mamba2-Transformer with Mixture-of-Experts architecture — Pulsar 16B scores 87.22 on the AIME 2025 math reasoning benchmark, within a tenth of a point of the uncompressed 31.6-billion-parameter base model and 15 points ahead of gpt-oss-20B. On GPQA-Diamond, a PhD-level science benchmark, it scores 71.41, matching the uncompressed model and beating gpt-oss-20B's 58.88. The model also outperforms gpt-oss-20B by 14 points on instruction following and 11 points on function calling.

The efficiency gain translates directly to lower deployment costs. On an Nvidia Blackwell GPU handling 32 concurrent requests, Pulsar 16B in FP8 precision delivers 4,808 tokens per second of system throughput, a 43% increase over the base model's 3,363 tokens per second, while cutting time-to-first-token to 1.24 seconds from 2.18 seconds. For enterprises running high-concurrency agentic workflows or continuously processing long documents, the savings in GPU procurement and energy costs could be substantial.
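The percentages follow directly from the reported figures. A minimal Python sketch of the arithmetic, using only the numbers quoted in this article (the helper names are illustrative, not part of any Multiverse Computing or Nvidia tooling):

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def percent_reduction(new: float, old: float) -> float:
    """Relative reduction from `old` down to `new`, in percent."""
    return (old - new) / old * 100.0


# Figures as reported: FP8, Nvidia Blackwell GPU, 32 concurrent requests.
pulsar_tps, base_tps = 4808.0, 3363.0    # system throughput, tokens per second
pulsar_ttft, base_ttft = 1.24, 2.18      # time-to-first-token, seconds

throughput_gain = percent_increase(pulsar_tps, base_tps)
ttft_cut = percent_reduction(pulsar_ttft, base_ttft)

print(f"Throughput gain: {throughput_gain:.1f}%")  # 43.0%
print(f"TTFT reduction:  {ttft_cut:.1f}%")         # 43.1%
```

Both metrics improve by roughly 43% in this configuration; results at other concurrency levels or precisions are not reported here and may differ.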

Memory and Inference Advantages

Pulsar 16B achieves significant reductions in model weight memory across all supported precisions — BF16, FP8 and NVFP4 — compared with the Nemotron-3-Nano-30B-A3B base model. The compression, achieved using Multiverse Computing's CompactifAI technology in combination with Nvidia's Model Optimizer and Megatron Bridge libraries, required no retraining from scratch. Instead, the company identified and removed mathematical redundancy within the trained network while preserving the reasoning behaviors learned during training.
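The article does not give exact memory figures, but a back-of-the-envelope estimate shows the scale of the savings. The sketch below assumes weight memory is simply parameter count times bytes per parameter (2 for BF16, 1 for FP8, 0.5 for NVFP4). That is a simplification: real checkpoints may keep some layers in higher precision, and 4-bit formats store additional scaling factors, so actual sizes will differ.

```python
# Assumed bytes per parameter for each precision (ignores scale factors
# and any layers kept in higher precision; an approximation only).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

# Total parameter counts as quoted in the article.
PULSAR_PARAMS = 16.15e9
BASE_PARAMS = 31.6e9


def weight_gb(params: float, precision: str) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


# Under this assumption the saving depends only on the parameter ratio,
# so it is the same for every precision.
saving_pct = (1.0 - PULSAR_PARAMS / BASE_PARAMS) * 100.0

for precision in BYTES_PER_PARAM:
    pulsar = weight_gb(PULSAR_PARAMS, precision)
    base = weight_gb(BASE_PARAMS, precision)
    print(f"{precision:>5}: Pulsar ~{pulsar:.1f} GB vs base ~{base:.1f} GB")

print(f"Estimated weight-memory saving: ~{saving_pct:.0f}%")
```

Under these assumptions the estimated saving is about 49% at every precision, which is consistent with the "roughly half" framing of the parameter count.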

Long-context performance, often the first casualty of aggressive compression, remains intact. Needle-in-a-haystack retrieval stays essentially perfect on both sides of the 100,000-token mark, and Pulsar 16B tracks the uncompressed base model closely on harder RULER tasks at extended context lengths, according to Multiverse Computing's evaluations across LongBench, AA-LCR, the RULER suite and NIAH variants.

Competitive Implications

The release pressures other open-source model developers, including those behind gpt-oss-20B and similar mid-size architectures, to demonstrate comparable efficiency gains. For Nvidia, the collaboration reinforces the value of its inference-optimized Blackwell hardware: a model that delivers 43% more throughput on the same GPU strengthens the return-on-investment case for enterprise customers evaluating Nvidia's infrastructure. Multiverse Computing, a member of Nvidia's Inception program, serves more than 100 global customers, including Iberdrola, Bosch and the Bank of Canada. That customer base positions it to capture demand from regulated industries seeking sovereign AI deployments in which data never leaves on-premises infrastructure.

Pulsar 16B is available on Hugging Face under the Apache 2.0 license. The model is designed for single-node setups, on-premises environments and latency-sensitive systems where the cost of frontier-class reasoning had previously been prohibitive.
