GLM 5.2: A Live Review of an Opus-Level Open-Weights Model
I put the new GLM 5.2 open-weights model to the test, setting it up in Cursor and Claude Code to tackle a codebase audit, a live redesign, and a 45-minute autonomous bug hunt. Here's how it performed—and what it cost.
Claire Vo

What if you could get Claude Opus-level reasoning at a tiny fraction of the cost? That’s the question I wanted to answer in this episode. It’s the first of many reviews I’ll be doing on open-weight and open-source models to see if we should all be paying the tax to Anthropic and OpenAI, or if we can run these models ourselves and get comparable results.
Today, we’re looking at GLM 5.2. This is a general language model from the Beijing-based startup Z.ai. The term you’ll hear is “open weight,” which simply means the trained model weights are publicly available for download. This lets you run it on your own hardware, fine-tune it on your own data, and generally inspect how it works. You can self-host it, and as I’ll show you, you can run your own inference for incredibly cheap. The breathless AI bros have been saying GLM 5.2 delivers Opus-level intelligence for a fraction of the cost, and if that’s true, it’s a very big deal.
On paper, the specs are impressive. It has a one million token context window and supports all the modern features we expect: reasoning mode, function calling, structured output, and context caching. Benchmarks show it’s right up there with Claude Opus 4.8 and beating Gemini 3.1 Pro on tests like SWE-Bench Pro. It’s a text-to-text model, so no images in or out, but for coding, it seems poised to be a serious contender. I decided to put it through its paces on my own projects, with no script, to see what would happen.
Setting Up GLM 5.2 in Cursor and Claude Code
First things first, how do you even use a model like this? If you’re not running a beefy local setup (my little laptop certainly isn't), you can use a hosted inference provider. I chose Open Router, which offers a unified API for a huge variety of commercial and open-weight models. I signed up, added a credit card with a spending limit, and generated an API key.
With my key in hand, I set up the model in my two primary code editors: Cursor and Claude Code.
Configuration for Cursor
Setting this up in Cursor was almost super easy, but one little detail took me 30 minutes to figure out because it was completely undocumented. Here’s the trick so you don’t have to waste that time.
- Go to Cursor Settings > Models tab.
- Under API Keys, find the OpenAI API Key section. Paste your API key from Open Router into this field and toggle it on.
- Here’s the critical part: toggle on Override OpenAI Base URL and set it to
https://openrouter.ai/api/v1/cursor. That/cursorat the end is mandatory and was the piece I had to hunt for. - Finally, click View all Models, select Add a custom model, and enter
zai/glm-5.2.
Now, GLM 5.2 will appear as an option in your model dropdown in the Cursor chat.
Configuration for Claude Code
For Claude Code, the process is also straightforward and involves editing a couple of configuration files.
- Edit your shell profile. This is usually a file like
~/.zshrcor~/.bashrc. You need to add your Open Router credentials as environment variables. Add these lines to the file, replacing the placeholder with your actual key:
export OPENROUTER_API_KEY="your-open-router-api-key"
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
export CLAUDE_ANTHROPIC_API_KEY=""- Edit your Claude settings. Open the file located at
~/.claude/settings.jsonin a text editor. Change themodelproperty to point to the GLM 5.2 model string from Open Router:
{
"model": "z-ai/glm-5.2"
}Note: The exact model string can change, so always check the model list on [Open Router](httpss://openrouter.ai) for the correct identifier. The one I used for the test was `zai/glm-5.2`.
With that done, any session you run in Claude Code will now route through Open Router and use GLM 5.2 as the default model.
Workflow 1: Codebase Audit and HTML Architecture Visualization
With the setup complete, I jumped into my first test: exploring an existing codebase. This is a fundamental task for any developer, so it’s a great measure of a model’s ability to reason and understand code structure. I pointed it at the ChatPRD codebase with a simple prompt:
This is the chat, PRD code base. Please explore it and tell me a little bit about its architecture and the most recent things we have been shipping on this code base.
It was fast and surprisingly accurate. It correctly identified the stack (a Next.js app), laid out the architecture, and even correctly summarized our recent work on Chat V2 stability, billing, and security hygiene. It felt like I was talking to a competent engineer.
To push it further, I asked it to make the output more visual and human-friendly. We all know this is the year of HTML.
Turn this into an HTML page that can communicate the overall architecture of the app and give a sense of the upcoming roadmap. You can use whatever components you want to make this look good and communicate to me the end developer, the major pieces of the architecture and product strategy. Give me a page to pull up when it's ready to review.
What it produced was honestly great. The HTML page it generated had a clean design, good structure, and—this is the part that really impressed me—it got the ChatPRD Pink brand color right! GPT-4 often gives me ugly green and navy, and Claude loves its own orange. GLM 5.2 created a page that looked like it belonged to our brand. It broke down our product pillars, diagrammed the "anatomy of a chat turn," and even correctly predicted our upcoming roadmap themes like enterprise motion and knowledge/retrieval. For a first test, this was a huge win.
Workflow 2: Redesigning the 'How I AI' Landing Page Hero
Next, I wanted to see how it handled design within an existing system. I tasked it with redesigning the hero section of our How I AI blog page on the ChatPRD marketing site.
Let's redesign the header hero section of the How I AI landing page... I wanna redesign it so it is higher quality design. It is a better call to action to workflows, and it helps with anything we need on SEO. Design whatever you like, looking forward to what you make.
Its first pass was solid. It created a much stronger call-to-action, added useful metadata (episode count, frequency), and built a neat little sidebar widget that looked like an audio player for the YouTube, Spotify, and Apple Podcasts links. But the buttons in that widget were too bright and wide. I gave it a round of feedback:
I really like this, except for the listen to the show sidebar. YouTube, Spotify and Apple Podcasts are very bright buttons. They're super overwhelming and they're very wide for the text that's in it. I think this component could look a lot higher quality... Can you take another pass?
It quickly generated a new version with much more subtle, smaller, dark-themed buttons that fit our design system better. The speed and quality of iteration were impressive. For front-end and design tasks where you can anchor the model on an existing design system, GLM 5.2 feels like a powerful and cost-effective tool. I would definitely put it in my rotation for design work.
Workflow 3: A 45-Minute Autonomous Bug-Fixing Task
For the final and most demanding test, I wanted to see how GLM 5.2 handled a long, complex, autonomous task. Before I even started recording the episode, I kicked off a job in Cursor with this prompt:
Pull the last 72 hours of century errors and versel error logs and build a prioritized plan, a bug fixes based on observed issues.
This task ran for about 45 minutes in the background while I was doing everything else. It built a to-do list, made tool calls to Sentry and Vercel, asked me for authentication, and then started analyzing the data. I’ll admit, there was a moment of suspense. I checked in on its reasoning process and saw it was really struggling to write some TypeScript and React, which is 98% of what I do. I thought the test might be a failure.
I just had to complain to the model gods, and then it broke through. It finally compiled and presented me with its plan. The result was a beautiful, dark-mode HTML canvas that looked like a native part of Cursor. It had analyzed 20 Sentry errors and 5 Vercel log signals and created a prioritized list of 14 planned fixes. Most importantly, it surfaced two P0 bugs that were actively impacting users and that we hadn't caught. It even gave me a suggested sequence for tackling the fixes. I was blown away. Despite the earlier struggle with React, the final output was exactly what I needed—a clear, actionable, and intelligent plan.
The Verdict and The Cost
So, is GLM 5.2 a replacement for Opus? For some tasks, absolutely. I found it excellent for front-end work, design ideation, and long-running backend agentic tasks. Its main weakness in my tests was writing complex React and TypeScript, where it seemed to stumble more than the frontier models.
But the real story is the price. After all of this testing—the setup, the codebase audit, the redesigns, and the 45-minute autonomous agent task—I checked my Open Router usage. I had spent a grand total of $3.36. That was for about 6 million tokens with a 72% cache rate. Running a similar set of tasks with Opus would have cost significantly more.
For that price, the performance is a steal. I’m going to keep GLM 5.2 running in Cursor and Claude Code for a while to see how it handles more of my day-to-day work. It’s clear that open-weights models are catching up fast, and for many use cases, they offer a powerful and incredibly affordable alternative to the big proprietary players. This is just the first open-weights model I’m reviewing, and I can’t wait to see what else is out there.


