Optimizing AI Costs: The Shift from Tokenmaxxing to Modelmaxxing

UpTrajectory Review

This article describes how companies are changing the way they manage AI costs, moving from maximizing token usage (tokenmaxxing) to matching each task with an appropriately priced model (modelmaxxing). Choosing models per task lets businesses cut spending while still using advanced models where they matter. The change is particularly relevant as companies face rising AI bills and try to balance innovation with budget constraints.

For small business owners, this shift is crucial. It highlights the importance of being strategic about AI investments rather than simply maximizing usage. By understanding which tasks require advanced models and which can be handled by more cost-effective options, operators can significantly reduce costs without sacrificing quality. As AI continues to evolve, staying informed about these trends will be essential for maintaining a competitive edge.

“My team is getting to use the best stuff, but they're using it a lot more efficiently.” — Business Insider

Takeaway: Adopt modelmaxxing to optimize AI costs by matching tasks with the appropriate AI models.

From the original item — Business Insider:


Twice a week, Morgan Linton tells his 16 engineers which AI models to use and when.

Business Insider spoke to Linton, the Lake Tahoe-based chief technology officer of AI startup Bold Metrics, 50 minutes before his engineering team’s standup. He planned to tell one team to use Claude Fable on low, and another to use GPT-5.5 on high. A third is using Cursor with Composer 2.5 and getting “totally perfect results,” he said.

Being specific about model use means Linton doesn’t have to set hard token caps.

“My team is getting to use the best stuff, but they’re using it a lot more efficiently,” he said.

The first half of 2026 was characterized by one word in the AI community: tokenmaxxing — referring to companies urging their employees to use AI as much as possible. But after reviewing the AI bills their employees were racking up, companies from Uber to Microsoft are taking a more considered approach.

Founders, software engineers, UX designers, and even non-technical vibe-coding enthusiasts are catching on to one cost-saving hack: model switching. They route their most difficult, intellectually challenging tasks to pricier frontier models and offload easier, repetitive tasks to older and cheaper ones.
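The switching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any company's actual setup: the tier names, the per-token prices, and the task labels in `HARD_TASKS` are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float


# Hypothetical tiers and prices, for illustration only; not real vendor pricing.
FRONTIER = ModelTier("frontier-model", 15.00)
BUDGET = ModelTier("budget-model", 0.50)

# Hypothetical labels for work treated as hard enough to justify the frontier tier.
HARD_TASKS = {"architecture_design", "debug_concurrency", "security_review"}


def pick_model(task_type: str) -> ModelTier:
    """Send hard task types to the frontier tier and everything else to the budget tier."""
    return FRONTIER if task_type in HARD_TASKS else BUDGET


def estimated_cost(task_type: str, tokens: int) -> float:
    """Estimated USD cost of one task at the chosen tier's per-token price."""
    tier = pick_model(task_type)
    return tokens / 1_000_000 * tier.usd_per_million_tokens
```

In practice the hard part is deciding what belongs in the "hard" set, which is the judgment the people quoted in this piece say they make by testing models against their own tasks.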

And as companies cut back on AI budgets and impose usage caps, this token hygiene tactic could help you get more bang for your buck.

Goodbye, tokenmaxxing

There are, of course, good reasons to use the most recent model. OpenAI’s Kaylin Voss wrote on LinkedIn that better models “reduce retries, supervision, and wasted effort.”

But some tasks simply don’t merit the costs. Coinbase CEO Brian Armstrong was one of the first to put it into words in an X post on June 7.

“80% of workloads will be running on 99% cheaper models within 12-18 months,” he wrote, adding that the other 20% will continue to run on the latest models where “IQ maxxing is important.”
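To see what that split could mean for a bill, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that every workload consumes the same number of tokens, so workload share equals spending share; Armstrong's post does not say this, and the function name is made up for the sketch.

```python
def blended_cost_fraction(cheap_share: float, cheap_discount: float) -> float:
    """Fraction of an all-frontier bill that remains after moving `cheap_share`
    of workloads to a model that is `cheap_discount` cheaper per token.

    Assumes equal token volume per workload (an illustrative simplification).
    """
    frontier_share = 1.0 - cheap_share
    cheap_price = 1.0 - cheap_discount  # cheap model's price relative to frontier
    return frontier_share * 1.0 + cheap_share * cheap_price


# 80% of workloads on a model that is 99% cheaper; 20% stay on the frontier model.
remaining = blended_cost_fraction(cheap_share=0.80, cheap_discount=0.99)
print(f"Remaining spend: {remaining:.1%}")
```

Under those assumptions the bill falls to about 20.8% of its original size (0.20 + 0.80 × 0.01), and almost all of what remains is the frontier-model portion.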

Chris Maconi was never a fan of tokenmaxxing. The Huntsville-based cofounder of the AI startup Hechura said he runs his company with a “human-in-the-loop” attitude, and isn’t setting up overnight bots to keep on coding. Model choice is part of this anti-tokenmaxxing outlook.

Maconi remembers the hype cycle around OpenClaw, an AI agent housed in a Mac Mini that burned through tokens because of its 24/7 use and broad autonomy. When he set up his own OpenClaw, Maconi started with cheap Gemini models before switching to Anthropic’s Haiku.

“I’m not afraid to go and try some of these lower-end models to see if they can provide the intelligence that we need,” Maconi said.

Stretching their tokens in creative ways

Tanvi Pisal, a 29-year-old Big Tech user-experience designer, said she learned the hard way to use models more efficiently.

Pisal uses tools such as Figma, ChatGPT, and Claude to brainstorm and formulate product requirements documents. She has a company subscription to ChatGPT and pays for the basic $20/month Claude Pro package. At the start, she said she would use Claude to brainstorm the UX from scratch, a process in which she “wasted months of tokens” and still didn’t finish the task.

“So now what I do is I design everything in Figma first, then I put those screenshots into Claude. I tell Claude to keep the UI as is and build the entire functionality and flow,” Pisal added. “Doing this design-first process really helps me save tokens.”

She also chooses to brainstorm ideas with ChatGPT — which is free for her thanks to her enterprise plan — then takes the refined ideas to Claude to create more polished documents.

Alejandra Thomas, a software engineer and tech content creator based in New York City, said she runs tests on every new model released to see what each is good at.

“I try not to use the most expensive or advanced model just because it’s available. For simple tasks, I always use lighter models or none at all,” Thomas said.

Ed Stevens, the CEO of AI sales company Scoot, said that he likes to “pick a horse and ride it.” His engineers will land on a model, try it for a few months, and then determine if it’s up to snuff. If there’s a shiny new model — or if they think they can achieve the same for cheaper — they change horses, Stevens said.

The idea of squeezing the juice out of each token exemplifies the scarcity mindset, according to Dan Ariely, a behavioral economics researcher and professor at Duke University.

Ariely said token budgets remind him of the early days of cellphones, when plans came with a limited number of talk-time minutes. He said people would try to max out their minutes at the end of the month, even if that meant calling people they didn’t really want to talk to.

“Tokens create a model of scarcity where people can’t use as much as they want. It creates a target for use, and it creates a psychology of waste if people don’t reach their target,” he said. He added that once users hit the token ceiling, they switch to other companies’ models rather than go over the limit and pay extra per use.

There’s a tool for that

If AI modelmaxxing sounds exhausting, the good news is you don’t have to make these switching decisions on your own.

Model routing startups are all the rage. These companies provide software that routes tasks to specific models, sometimes including open-source ones, based on complexity. They’re a hit with venture investors, and startups like OpenRouter are being showered with cash.

David Gilmore runs one of these companies, Rayline. His tool intercepts requests and determines whether they could go to cheaper, often open-source models. Many of his firm’s clients fall prey to the “FOMO moment,” he said. Then, they get their API bill and realize they need to scale back.

The number of firms using a routing platform is inching up, Ramp’s lead economist, Ara Kharazian, told Business Insider. Last year, Kharazian found that around 1% of firms used a model router; this year, it’s 5%.

The San Francisco-based investment firm BlockSpaceForce uses OpenRouter, Fireworks, and Together AI. Spencer Yang, its managing partner, also advocated asking a cheaper model first whether a more expensive one would be needed for your task.

“The models themselves are actually getting really good at assessing their own complexity,” Yang said.
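Yang's cheap-model-first approach might look something like the sketch below. It is a hedged illustration only: the models are plain Python functions standing in for real API calls, and the triage prompt wording, the function names, and the SIMPLE/COMPLEX protocol are assumptions rather than anything BlockSpaceForce has described.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
# These are hypothetical stand-ins, not a real vendor SDK.
Model = Callable[[str], str]

# Assumed triage wording; a real prompt would need testing against real tasks.
TRIAGE_PROMPT = (
    "Answer with exactly one word, SIMPLE or COMPLEX: "
    "does the following task need a frontier model?\n\n"
)


def answer_with_escalation(task: str, cheap: Model, expensive: Model) -> tuple[str, str]:
    """Ask the cheap model to triage, then run the task on whichever tier it picks.

    Returns (tier_used, response). Any verdict other than COMPLEX stays on the cheap tier.
    """
    verdict = cheap(TRIAGE_PROMPT + task).strip().upper()
    if verdict.startswith("COMPLEX"):
        return "expensive", expensive(task)
    return "cheap", cheap(task)


# Fake models so the sketch runs without any network access.
def fake_cheap(prompt: str) -> str:
    if prompt.startswith(TRIAGE_PROMPT):
        return "COMPLEX" if "race condition" in prompt else "SIMPLE"
    return "cheap answer"


def fake_expensive(prompt: str) -> str:
    return "expensive answer"
```

The triage call itself costs tokens, so this pattern only pays off when the cheap model is much cheaper than the expensive one and its verdicts are reliable enough to trust.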

Some companies continue to default to the most recent, most expensive models. Hechura cofounder Maconi chalked it up to laziness.

“People don’t want to do the hard work of understanding which models are good at which things,” he said. “They just want to ride the hype train.”

Read the original article on Business Insider
