Applied Brain Research shows first dedicated state space model silicon

Applied Brain Research has demonstrated what it describes as the first dedicated silicon designed around state space models, rather than treating them as a sidecar to transformer or CNN workloads. It is an early part of a broader trend: as the software side of machine learning shifts from attention heavy transformers toward more efficient state space architectures, hardware vendors are starting to ask a simple question. If the maths is different, should the silicon be different as well?

Why state space models matter in 2025

State space models are not new in control theory, but the current interest comes from their use as a practical alternative to transformers for sequence modelling. Architectures like S4, Mamba and related variants use parameterised linear state updates to model long range dependencies in time series, language, audio and sensor data. The practical selling points are straightforward.

  • Linear time complexity in sequence length, at least in the core recurrence, as opposed to the quadratic scaling of vanilla attention.
  • Streaming friendly operation, since the model maintains an internal state that can be updated as new tokens arrive, without reprocessing the entire context window.
  • Better memory locality, because the model can often work on small state vectors and kernels rather than dense attention matrices.
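
To make those points concrete, here is a minimal sketch of the discretised linear recurrence at the heart of these models, written in plain NumPy. The parameter names, shapes and toy values are illustrative assumptions, not the API of any particular framework or Applied Brain Research’s own code.

```python
import numpy as np

def ssm_step(A, B, C, state, u_t):
    """One streaming update of a discretised linear state space model:
    state_t = A @ state_{t-1} + B @ u_t,  y_t = C @ state_t.
    Shapes (illustrative): A (N, N), B (N, D), C (D, N), state (N,), u_t (D,)."""
    state = A @ state + B @ u_t   # small linear state update, O(N^2) per step
    y_t = C @ state               # readout from the hidden state
    return state, y_t

# Toy usage: process a stream one input at a time with constant memory.
rng = np.random.default_rng(0)
N, D, T = 16, 4, 1000                      # state size, feature size, stream length
A = 0.9 * np.eye(N)                        # stable toy dynamics
B = 0.1 * rng.standard_normal((N, D))
C = 0.1 * rng.standard_normal((D, N))

state = np.zeros(N)
for u_t in rng.standard_normal((T, D)):    # inputs arrive one step at a time
    state, y_t = ssm_step(A, B, C, state, u_t)
```

Each new input touches only the small state vector, so the work grows linearly with the length of the stream and nothing has to be reprocessed when a new token arrives, which is where the first two bullets come from.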

On general purpose hardware, these models are still mapped to the same primitives as transformers. That usually means batched matrix multiplies on GPUs or AI accelerators. Applied Brain Research is taking a different view. If the core operation is a structured state update plus a small number of learned filters, it might deserve a different data path and memory system than a generic matrix engine.

What a state space focused chip has to do

Even without the exact floorplan of Applied Brain Research’s first silicon, the requirements for a serious state space accelerator are clear from the maths.

  • Efficient linear recurrences – The hardware needs to execute repeated state updates at low cost. That usually means tight loops over moderately sized vectors and matrices, not massive all to all attention maps.
  • Good support for 1D convolutions and filters – Many state space variants are implemented using learnable filters that behave like convolutions over time. Dedicated logic for small kernel convolutions can pay off.
  • Streaming I/O – To exploit the model’s streaming nature, the chip must ingest and emit sequences with minimal buffering and latency, rather than reloading entire sequences for each step.
  • On chip state storage – The persistent state of each sequence ideally lives on chip. Pulling it in and out of external memory every step kills the main benefit of the architecture.
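
The first two requirements are easier to see side by side. The sketch below computes the same toy single channel SSM in two equivalent ways: as a streaming recurrence that only ever carries a small state vector (the form an on chip state store serves), and as a causal 1D filter whose taps are unrolled from the same parameters (the form dedicated convolution logic serves). The NumPy code and dimensions are assumptions for illustration, not a description of the chip’s actual data path.

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    """Streaming view: one pass over the input, one small state vector."""
    state = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        state = A @ state + B * u_t        # the on chip sized state update
        y[t] = C @ state
    return y

def ssm_kernel(A, B, C, T):
    """Convolutional view: unroll the recurrence into a causal filter
    with taps K[j] = C @ A^j @ B for a single channel model."""
    K, A_pow = np.empty(T), np.eye(A.shape[0])
    for j in range(T):
        K[j] = C @ A_pow @ B
        A_pow = A_pow @ A
    return K

# The two views agree on a toy model, so hardware can pick whichever
# form suits its data path at each point in the pipeline.
rng = np.random.default_rng(1)
N, T = 8, 64
A = 0.8 * np.eye(N) + 0.05 * rng.standard_normal((N, N))
B, C, u = rng.standard_normal(N), rng.standard_normal(N), rng.standard_normal(T)

K = ssm_kernel(A, B, C, T)
y_conv = np.array([K[: t + 1] @ u[t::-1] for t in range(T)])  # causal convolution
assert np.allclose(y_conv, ssm_recurrent(A, B, C, u))
```

The last two requirements, streaming I/O and on chip state storage, are what let the recurrent form run continuously without shuttling that state through external memory on every step.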

General purpose GPUs can do all of this, but they are optimised for large dense matmuls and wide throughput, not necessarily for thousands or millions of small, independent state machines ticking away at the edge of a network. A specialist design can trade some peak matrix performance for lower power and better energy per token on SSM workloads.

Where Applied Brain Research sits in the AI hardware landscape

Applied Brain Research comes from the neuromorphic and brain inspired computing world. Its software stack has focused on spiking neural networks and efficient deployment of sequence models on constrained hardware. Building a dedicated state space model chip fits that history. It is an attempt to harden a specific class of sequence models into silicon in the same way that earlier accelerators hardened convolutional networks.

The timing is interesting. State space models are still in a relatively early phase of adoption compared to transformers. Most production language and vision systems are transformer based. However, there is a clear appetite for models that can deliver acceptable quality with lower memory and compute footprints, especially in edge environments. A vendor that can show that SSMs on custom silicon deliver better tokens per joule than transformers on generic GPUs has a story to tell anyone building battery powered devices or dense inference infrastructure.

Where first silicon is likely to land

First silicon in a new category rarely goes straight into hyperscale production. It is more often a technology demonstrator for a few high leverage niches.

  • Always on and low power sensing – Audio triggers, gesture recognition, anomaly detection on device. These are classic cases where a small state space model can run continuously at very low power.
  • Industrial and embedded control – Time series prediction and state estimation on sensors and actuators, where streaming and predictable latency matter more than peak throughput.
  • Edge language and assistant workloads – Compact SSM based language models that can run locally on devices for command interpretation, summarisation or simple dialogue without hitting a cloud endpoint every time.

In those domains, a chip that directly reflects the state space maths can avoid some of the overhead of mapping onto a large GPU or CPU cluster. It can also be easier to integrate into systems with tight thermal and power budgets, because behaviour is more predictable.

How this differs from transformer accelerators

Most of the AI hardware announced in the last few years has had transformers in mind. That shows up in design choices.

  • Very wide matrix multiply engines and systolic arrays for attention and feed forward layers.
  • Large high bandwidth memory stacks to feed those arrays with activations and weights.
  • Complex scheduling logic to manage attention patterns and memory reuse across heads and layers.

A state space focused chip can tilt the design in other directions.

  • Narrower, more numerous compute tiles that map cleanly onto many independent state updates rather than a few huge matmuls.
  • More on chip SRAM to hold state vectors and kernels close to the compute units.
  • Streaming interconnects tuned for continuous dataflow rather than large batch processing.
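
A rough back of the envelope shows why the SRAM point in the list above matters. The figures below are illustrative assumptions (a mid sized model, fp16 values, a Mamba style fixed size state), not measurements of Applied Brain Research’s silicon or of any shipping model.

```python
# Per sequence working memory: transformer KV cache versus a fixed size SSM state.
# All dimensions and the fp16 assumption are illustrative, not measured figures.
BYTES = 2                     # fp16
layers = 24
d_model = 2048

# Transformer: cached keys and values grow linearly with the context length.
context = 8192                # tokens held in the KV cache
kv_cache = 2 * layers * context * d_model * BYTES           # keys + values

# SSM (Mamba style): a fixed size state per layer, independent of context length.
d_state = 16
ssm_state = layers * d_model * d_state * BYTES

print(f"KV cache : {kv_cache / 2**20:7.1f} MiB")   # ~1536 MiB, wants HBM
print(f"SSM state: {ssm_state / 2**20:7.1f} MiB")  # ~1.5 MiB, fits in SRAM
```

A working set measured in single digit megabytes rather than gigabytes is what makes it plausible to keep every live sequence’s state resident on chip, next to the compute tiles.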

None of this precludes running transformers on the same silicon using a compatibility layer, but the sweet spot is different. Applied Brain Research’s first silicon is useful as a proof point. It tests whether the gains in efficiency from matching hardware and SSM workloads are large enough to justify a separate class of accelerators.

What to watch as the design matures

First silicon is the start of the story, not the end. A few questions will determine whether state space accelerators become a real product category or remain a research artefact.

  • Tooling and model portability – How easy is it for researchers and engineers to take an SSM defined in standard frameworks and compile it for this hardware without hand tuning every layer?
  • End to end energy numbers – Tokens per joule and tokens per second for realistic workloads are the real metrics. If the chip only wins in narrow microbenchmarks, adoption will be limited.
  • Integration paths – Can the silicon sit alongside existing CPUs or GPUs as a coprocessor, or does it require a whole new system design? The lower the integration friction, the better.
  • Model quality trajectory – If state space models continue to close the gap with transformers on language and multimodal tasks, the value of dedicated SSM hardware rises. If they stall, the niche narrows.
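
For anyone comparing vendor claims against that second point, the unit arithmetic is simple: tokens per joule is sustained throughput divided by sustained power draw, since a watt is a joule per second. The numbers in the sketch are placeholders to show the calculation, not benchmark results.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Sustained throughput divided by sustained power; 1 W = 1 J/s,
    so the seconds cancel and the result is tokens per joule."""
    return tokens_per_second / watts

# Placeholder figures purely to show the arithmetic, not measurements:
# a device sustaining 500 tokens/s at a 5 W board power delivers 100 tokens/J.
print(tokens_per_joule(tokens_per_second=500.0, watts=5.0))   # 100.0
```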

My read on Applied Brain Research’s move

Applied Brain Research is effectively betting that the long term direction of sequence modelling is toward architectures that look more like continuous systems and less like pure attention blocks. First silicon for state space models is a way to be in front of that wave rather than reacting to it later.

From a hardware engineering perspective, it also fits a broader pattern. Once a class of models stabilises and proves its value, someone tries to turn its core operations into efficient silicon. Convolutions got that treatment. Attention is getting it now. State space models are next in line.

This first chip will not displace general purpose GPUs in data centres. That is not the point. The interesting question is whether it can show convincing advantages in the power and latency regimes where SSMs make sense in the first place. If it does, it will be a useful signal that the AI hardware ecosystem is ready to specialise beyond “one accelerator for everything transformer shaped”. If it does not, state space silicon will probably retreat into the research lab until the next architectural cycle.
