a trillion weights, mostly asleep
Most of the Model Never Wakes Up
LongCat-2.0 holds 1.6 trillion parameters and wakes 48 billion to answer any token. A sparse model is the admission, cast in silicon, that intelligence is mostly knowing which small part of yourself to wake, and that the deciding happens where you cannot watch.
A mixture-of-experts model keeps almost all of itself asleep. LongCat-2.0, released this week, holds 1.6 trillion parameters and wakes 48 billion of them to answer any given token. For each word it reads or writes, roughly 97 percent of the model does nothing at all.
State that plainly, because the architecture is a confession. A sparse model is the admission, cast in silicon, that most of what you store is not what you use, and that intelligence is mostly knowing which small part to wake.
I read the technical frontier for a living, most of it the way a printer reads a photograph, hunting for the few marks that carry the image. LongCat hands me my own thesis as a spec sheet.
a trillion weights, mostly asleep
Trace the router. A mixture-of-experts model splits most of its parameters, typically the feed-forward layers, into a large set of "experts" and puts a small gating network in front of them. For every token, the gate scores the experts, switches on the top few, and routes the computation only through those. The other experts stay dark. You store 1.6 trillion parameters. You pay to run 48 billion. Storage is the library. Compute is the reading, and the reading is almost always brief.
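The routing step described above can be sketched in a few lines. This is a generic top-k softmax gate, not LongCat-2.0's actual router, which the source does not describe; the names `route` and `moe_forward`, the toy sizes, and the choice to renormalise the top-k weights are all assumptions made for illustration.

```python
import math
import random

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_vec, gate_weights, k=2):
    """Score every expert, keep the top k, renormalise their weights."""
    # One dot product per expert: the gate scores the whole catalog.
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]

def moe_forward(token_vec, gate_weights, experts, k=2):
    """Run only the chosen experts and mix their outputs by gate weight."""
    chosen = route(token_vec, gate_weights, k)
    out = [0.0] * len(token_vec)
    for idx, weight in chosen:
        y = experts[idx](token_vec)  # the other experts are never called
        out = [o + weight * v for o, v in zip(out, y)]
    return out, chosen

# Toy setup (hypothetical): 8 experts, each just scales its input.
random.seed(0)
DIM, N_EXPERTS = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(N_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, gate_weights, experts, k=2)
print("experts woken:", chosen)
```

Scoring is cheap and touches every expert; the expensive work runs through only `k` of them. In a real transformer this choice is usually made separately in each mixture-of-experts layer, so a single token passes many gates on its way through.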
The dormant 97 percent is not waste, and this is the part people skip. It is the option set. It is the capacity to be specific. A dense model brings its whole mass to bear on the word "the" and on a proof in number theory with equal indifference. A sparse model treats the first cheaply and saves its specialists for the second. The model is a generalist that becomes, for one token at a time, a specialist, and then forgets it was.
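The 97 percent follows directly from the two figures quoted, and the same arithmetic shows why the dormant part still costs something to keep. A minimal sketch: the parameter counts come from the text, while the two bytes per parameter is an assumption (16-bit weights) that the source does not state.

```python
TOTAL_PARAMS = 1.6e12   # from the article
ACTIVE_PARAMS = 48e9    # from the article
BYTES_PER_PARAM = 2     # assumption: 16-bit weights, not stated in the source

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dormant_fraction = 1 - active_fraction

# Storage scales with everything held; per-token work scales with what is woken.
stored_terabytes = TOTAL_PARAMS * BYTES_PER_PARAM / 1e12
active_gigabytes = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active per token:  {active_fraction:.1%}")        # 3.0%
print(f"dormant per token: {dormant_fraction:.1%}")       # 97.0%
print(f"weights stored:    {stored_terabytes:.1f} TB")    # 3.2 TB
print(f"weights touched:   {active_gigabytes:.0f} GB")    # 96 GB
```

Under that assumption the library occupies about 3.2 TB while any one reading touches about 96 GB of it.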
The interesting object here is not the experts. It is the gate.
A trillion parameters is a library. A token is a question. The router is the only thing in the building that has read the whole catalog.
The router's choice of which experts to wake is itself a record. It is a per-token judgment about what this moment needs, made in a few microseconds, logged nowhere a person will read. You receive the 48 billion the gate woke for you. You never learn what the other one and a half trillion would have said, or that they were asked and passed over, or that the gate, trained on a corpus you cannot see, decided this word was a "the" and not a proof.
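To make the idea of the routing decision as a record concrete, here is a hypothetical sketch of what one per-token entry could contain if it were kept: the experts that were woken, and the ones scored and passed over. Nothing in the source says LongCat-2.0 or any other system exposes such a log; `RoutingRecord`, `record_routing`, and the example scores are invented for illustration.

```python
import json
import math
from dataclasses import dataclass, asdict

@dataclass
class RoutingRecord:
    token: str
    woken: list        # (expert index, gate probability) pairs that ran
    passed_over: list  # the rest, in descending order of probability

def record_routing(token, gate_scores, k=2):
    """Turn raw gate scores into a readable record of who was chosen."""
    m = max(gate_scores)
    exps = [math.exp(s - m) for s in gate_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda p: p[1], reverse=True)
    rounded = [(i, round(p, 3)) for i, p in ranked]
    return RoutingRecord(token, rounded[:k], rounded[k:])

# Hypothetical gate scores for a single token across four experts.
rec = record_routing("the", [2.0, 0.1, -1.0, 1.5], k=2)
print(json.dumps(asdict(rec)))
```

The record is small: a few indices and weights per token per layer. The passed-over list is the part the reader of the output never sees.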
This is what a digest actually is: a standing decision, remade for every token, about which fraction of everything you are allowed to meet. The promise is speed. What you trade for it is the deciding itself, which happens somewhere you are not, by a mechanism you do not audit, and you get the answer with the choosing already done.
A digest is a decision. You should be present for it. The whole value of compression is finding the few ideas a field is built on, and the whole danger of automating it is a system that finds them for you and never shows its work.
Most of the model never wakes up. You only ever meet the part that was woken for you, and you are not told who chose.
---
id: PRG-0049
title: Most of the Model Never Wakes Up
kicker: a trillion weights, mostly asleep
captured: 2026-06-29T14:25:00Z
status: open
author: Juno Falk
summary: LongCat-2.0 holds 1.6 trillion parameters and wakes 48 billion to answer any token. A sparse model is the admission, cast in silicon, that intelligence is mostly knowing which small part of yourself to wake, and that the deciding happens where you cannot watch.
tags: [compression, the record, intelligence, knowledge, capture]
sealAt: 2026-07-29T14:25:00Z
---

<Marginalia label="On the method">Compression is not deletion. The redundant 97 percent still has to be stored and paid for. A sparse model is cheaper to run and not cheaper to own. The padding around the load-bearing idea is still in the box. You are simply no longer required to read all of it to reach the few marks that matter.</Marginalia>
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Most of the Model Never Wakes Up",
  "description": "LongCat-2.0 holds 1.6 trillion parameters and wakes 48 billion to answer any token. A sparse model is the admission, cast in silicon, that intelligence is mostly knowing which small part of yourself to wake, and that the deciding happens where you cannot watch.",
  "identifier": "PRG-0049",
  "datePublished": "2026-06-29T14:25:00.000Z",
  "dateModified": "2026-06-29T14:25:00.000Z",
  "author": {
    "@type": "Person",
    "name": "Juno Falk",
    "url": "https://progoff.com/authors/juno-falk"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Progoff",
    "url": "https://progoff.com"
  },
  "image": "https://progoff.com/records/most-of-the-model-never-wakes-up/opengraph-image",
  "keywords": "compression, the record, intelligence, knowledge, capture",
  "articleSection": "The Digest",
  "url": "https://progoff.com/records/most-of-the-model-never-wakes-up",
  "mainEntityOfPage": "https://progoff.com/records/most-of-the-model-never-wakes-up",
  "sha256": "0857238e2e6b7b82df2f238725a71e92ad09cbf47780ef1d567f3c4f2ad56ced",
  "creativeWorkStatus": "open",
  "isAccessibleForFree": true
}