Microsoft Touts CPU-Based AI Model as Energy-Efficient Alternative

TEHRAN (Tasnim) – Microsoft’s new BitNet b1.58 model significantly reduces memory and energy requirements while matching the capabilities of full-precision AI models, offering a promising low-resource alternative.

Most large language models today rely on 16- or 32-bit floating-point numbers to store neural network weights.

While this provides high precision, it results in large memory usage and intensive processing demands.
In contrast, Microsoft’s General Artificial Intelligence group has introduced BitNet b1.58, a model using just three weight values: -1, 0, and 1.

The ternary structure, built upon Microsoft Research’s 2023 work, reduces complexity and provides what the researchers call "substantial advantages in computational efficiency".

Despite the reduced precision, the model "can achieve performance comparable to leading open-weight, full-precision models of similar size across a wide range of tasks", they claim.

The idea of simplifying model weights through quantization isn't new in AI.

Researchers have long experimented with compressing weights to reduce memory usage.
Some of the most aggressive approaches have led to so-called “BitNets,” which use a single bit to represent +1 or -1.

BitNet b1.58 doesn’t go that far. Instead, it uses what the researchers describe as a 1.58-bit system, because encoding one of three possible values requires an average of log(3)/log(2) ≈ 1.58 bits per weight.
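The "1.58" figure can be checked directly: representing one of three equally likely values takes log(3)/log(2) bits of information. The short calculation below in plain Python is an illustration only, not BitNet code:

```python
import math

# Encoding one of three weight values (-1, 0, +1) takes
# log2(3) bits of information on average.
bits_per_weight = math.log(3) / math.log(2)
print(f"{bits_per_weight:.2f} bits per weight")  # prints 1.58
```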

The researchers note it’s "the first open-source, native 1-bit LLM trained at scale", with 2 billion parameters and a training dataset of 4 trillion tokens.

Unlike previous post-training quantization attempts, which often degrade performance, BitNet b1.58 was trained natively with simplified weights.

Earlier natively trained BitNet models have been much smaller and unable to match full-precision models in performance.

This makes Microsoft’s approach stand out.

The efficiency gains are significant.

BitNet b1.58 requires just 0.4GB of memory, compared to 2 to 5GB for similar-sized open-weight models.
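That figure is roughly what packing 2 billion ternary weights would suggest. The back-of-the-envelope estimate below is an illustration only; the model's actual storage layout and overheads will differ:

```python
# Rough estimate: 2 billion weights at about 1.58 bits each.
# Ignores activations, embeddings, and packing overhead.
params = 2_000_000_000
bits_per_weight = 1.58
approx_gb = params * bits_per_weight / 8 / 1e9
print(f"~{approx_gb:.2f} GB")  # ~0.40 GB
```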

Its ternary design allows inference using mainly addition operations, avoiding expensive multiplications.
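To see why, consider a single dot product: when every weight is -1, 0, or 1, each multiplication collapses into adding, subtracting, or skipping an activation. The sketch below is a simplified, hypothetical illustration of that idea, not Microsoft's actual kernel:

```python
def ternary_dot(weights, activations):
    """Dot product with weights restricted to -1, 0, or +1.

    Illustrative only: every 'multiplication' becomes an add,
    a subtract, or a skip, which is where the efficiency comes from.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0: the term is skipped entirely
    return total

print(ternary_dot([1, -1, 0, 1], [0.5, 2.0, 3.0, -1.0]))  # -2.5
```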

Researchers estimate it consumes 85 to 96 percent less energy than comparable models.

A demo running on an Apple M2 CPU illustrates the model’s capability.

Thanks to a specialized kernel, BitNet b1.58 performs several times faster than standard full-precision transformers.

It can generate text at "speeds comparable to human reading (5-7 tokens per second)" using only a single CPU, according to the researchers.

Performance doesn't appear to be compromised.

While independent verification is still pending, Microsoft reports that BitNet achieves benchmark scores nearly equal to other models in its size class.

The model showed strong results in reasoning, math, and general knowledge tasks.

Despite its success, researchers acknowledge there’s still much to understand.

"Delving deeper into the theoretical underpinnings of why 1-bit training at scale is effective remains an open area", they write.

Additionally, future research is needed for BitNets to match the memory and context capabilities of today’s largest models.

Nonetheless, this development signals a promising alternative as hardware and energy costs for running high-performance models continue to rise.

Microsoft's BitNet suggests that full-precision models may be overengineered for many tasks—akin to muscle cars burning excess fuel when a more efficient solution might suffice.
