Multivac
Paper under review
Fig. 1 · The peer matrix
Independent, blind evaluation of frontier language models. Fresh questions models haven't memorized, judged by a cross-family peer matrix — no vendor grades its own homework.
Open Multivac
Independent research lab
Deep Pearl AI studies two sides of the same question: how do we know an AI system actually works, and how do we build systems that keep working in the physical world, on real hardware, with real people depending on them?
The work
Frontier models are judged by benchmarks they can game, then deployed on hardware they were never built for. We work both ends of that gap. Each instrument below is a working model of its research — open the drawer to see it run.
Fig. 1 · The peer matrix
Independent, blind evaluation of frontier language models. Fresh questions models haven't memorized, judged by a cross-family peer matrix — no vendor grades its own homework.
Open Multivac
Fig. 2 · Sensitive dependence
Graduate-level physics problems through the same blind peer matrix — measuring whether models reason about the physical world or pattern-match around it.
Open Physics
Fig. 3 · The membrane
A home assistant that runs on the device, keeps personal data local, and learns from everyday use — voice, emotion, and smart home control without a cloud dependency at its core.
Open ALEX-1
Fig. 4 · Pixels stop here
On-device computer vision that detects falls in eldercare without sending video anywhere. Skeleton-only processing, sanity gates between every stage, live on Raspberry Pi hardware.
Open EMOTE4D
How we work
Evaluations are blind and judged across model families. A benchmark's design shapes its rankings, so we design against our own bias first.
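As a rough illustration of cross-family judging, the sketch below averages only the scores that come from judges outside the candidate's own family. The model names, the `FAMILY` registry, and the `scores` layout are all hypothetical, and answers are assumed to be anonymized before any judge sees them. This is not a description of Multivac's actual pipeline.

```python
from statistics import mean

# Hypothetical registry: model name -> vendor family (illustrative names only).
FAMILY = {
    "model_a1": "family_a",
    "model_a2": "family_a",
    "model_b1": "family_b",
    "model_c1": "family_c",
}


def eligible_judges(candidate, family=FAMILY):
    """Return every model whose family differs from the candidate's."""
    return sorted(m for m in family if family[m] != family[candidate])


def peer_score(candidate, scores, family=FAMILY):
    """Average the scores given to `candidate` by cross-family judges only.

    `scores[judge][candidate]` is the number a judge assigned to an
    anonymized answer. Same-family entries are ignored even if present.
    """
    judges = [
        j for j in eligible_judges(candidate, family)
        if candidate in scores.get(j, {})
    ]
    if not judges:
        raise ValueError(f"no cross-family judge scored {candidate}")
    return mean(scores[j][candidate] for j in judges)
```

In this sketch a same-family score is simply dropped rather than down-weighted, which is one of several possible design choices.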
A number measured in the lab is a hypothesis. Nothing counts until it holds on the device, in the room, under the lighting it will actually face.
Compositional systems fail when one stage trusts another blindly. We put explicit sanity gates between stages and design honest failure modes.
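One way to picture a sanity gate between stages is a check that runs on each stage's output before the next stage may consume it. The sketch below is a minimal example under that assumption. `Stage`, `GateFailure`, and `run_pipeline` are hypothetical names, not part of any Deep Pearl AI codebase.

```python
from dataclasses import dataclass
from typing import Any, Callable


class GateFailure(Exception):
    """Raised when a stage's output fails its sanity gate."""


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]    # the stage's computation
    gate: Callable[[Any], bool]  # True if the output looks plausible


def run_pipeline(stages, value):
    """Run stages in order, stopping at the first output that fails its gate."""
    for stage in stages:
        value = stage.run(value)
        if not stage.gate(value):
            # Fail loudly and name the stage instead of passing bad data on.
            raise GateFailure(f"{stage.name}: output failed sanity gate")
    return value
```

Raising a named exception at the failing stage is one reading of an honest failure mode: the caller learns which stage broke and can fall back explicitly instead of acting on an implausible result.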
Research
Six threads, from 3D perception to brain-inspired learning to agentic control — each feeding back into the ventures.
Currently learning