AI as a technology is moving fast. Since ChatGPT went mainstream in 2023, AI has grown from a cute way to generate funny poems into a defining technology that is likely to change everything. With such a quick rise, it’s tricky to get the terms and language correct. Even the tech media gets it wrong, comparing Mythos to MDASH to Daybreak, when those comparisons make no sense: one is a model, one is a harness, and one is an initiative. Let’s understand models vs harnesses vs labs vs initiatives and all that lies between.

Examples

The full explanations are below, but here are some examples:

  • Frontier AI Labs: Anthropic; OpenAI; Google DeepMind; xAI
  • Open Weight Labs: DeepSeek; Meta; Mistral
  • Models: Claude Sonnet 4.6, Claude Opus 4.8, Claude Mythos Preview; GPT-5.5, GPT-5.4-mini; Gemini 3.5 Flash, Gemini 3.1 Pro; DeepSeek V4; Llama 4; Mistral Large 3, Magistral
  • Clients: Claude.ai, Claude Code; ChatGPT, Codex; Gemini; Perplexity; Cursor, Windsurf, GitHub Copilot; Project MDASH, Claude Security, Iron Curtain
  • Initiatives: Glasswing, Daybreak

Definitions

AI Foundations

LLMs

LLM stands for Large Language Model. The models covered in this glossary are all LLMs, and every client, harness, and initiative below is built around using them.

An LLM is trained on huge amounts of text from the internet, books, and code, and during that process it learns the statistical patterns of language well enough to generate plausible continuations of whatever text you give it. The result of that training is a “model” that gets used to generate output one piece at a time. Modern LLMs also handle code, math, images, audio, and video.
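
To make “learns statistical patterns and generates one piece at a time” concrete, here is a toy sketch. It is not how a real LLM works internally (those use neural networks over tokens, not word counts). It is a simple bigram counter, with a made-up corpus, that shows the same generate-the-next-piece loop at a tiny scale.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows: dict, prompt: str, max_new_words: int = 5) -> str:
    """Extend the prompt one word at a time, always picking the most common follower."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)
print(generate(model, "the dog", max_new_words=4))  # the dog sat on the cat
```

The continuation is plausible given the corpus, but nothing in the corpus says a dog sat on a cat. The model is following patterns, not recalling facts.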

Model

A model in practice is a specific trained artifact with a name and a version. It is made up of a giant collection of numerical weights (often tens to hundreds of billions of parameters).

Labs typically ship a family of models tuned for different cost and capability trade-offs. Anthropic has Claude Opus (largest, most capable, slowest, most expensive), Claude Sonnet (mid-tier balance), and Claude Haiku (smallest, cheapest, fastest). OpenAI has GPT-5.5 and GPT-5.4-mini. Google has Gemini Pro and Gemini Flash.

Within a family, models are versioned. Version numbers are not directly comparable across labs and don’t always map cleanly to capability even within a lab. Release notes and benchmark publications are more reliable indicators than version numbers.

Tokens

Tokens are how a model sees text. Internally, an LLM doesn’t read characters or words; it reads tokens. A token is a chunk of text from a vocabulary that is fixed when the model’s tokenizer is built. Most common words are one token, but less common words get split into pieces. For example, “strawberry” might be split into “straw” and “berry” by some tokenizers.
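
Here is a minimal sketch of that splitting. Real tokenizers typically use learned schemes such as byte-pair encoding. This one swaps in a much simpler greedy longest-match rule, and the vocabulary is a hypothetical toy, so treat it as an illustration of the idea rather than any lab’s actual tokenizer.

```python
def tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match split; unknown text falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest piece first, shrinking until one is in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

VOCAB = {"straw", "berry", "the", "cat"}  # hypothetical toy vocabulary

print(tokenize("strawberry", VOCAB))  # ['straw', 'berry']
print(tokenize("cat", VOCAB))         # ['cat']
print(tokenize("cab", VOCAB))         # ['c', 'a', 'b']
```

With this vocabulary the model would receive two tokens for “strawberry”, not ten letters.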

Tokens matter practically for a few reasons.

  • Pricing for LLM usage is usually per-token, with separate rates for what you send (input tokens) and what the model returns (output tokens).
  • The context window, which is how much information the model can see at once, is measured in tokens.
  • A famous failure mode where models miscount the letters in “strawberry” stems from the model seeing tokens, not letters.

A rule of thumb is that one token is roughly 4 English characters, or about 0.75 of a word.
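
That rule of thumb is enough for a back-of-the-envelope cost estimate. In the sketch below the per-million-token prices are placeholders I made up for the arithmetic, not any lab’s real rates, and the character-based estimate is only an approximation of what a real tokenizer would report.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the 4-characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Input and output tokens are billed at separate per-million-token rates."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000

prompt = "x" * 4000                       # 4,000 characters of input
input_tokens = estimate_tokens(prompt)    # about 1,000 tokens
output_tokens = 500                       # assume a 500-token reply

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost_usd(input_tokens, output_tokens, 3.0, 15.0)
print(input_tokens, round(cost, 4))  # 1000 0.0105
```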

Context Window

The context window is the maximum amount of input a model can consider at once, measured in tokens. Anything beyond that limit gets dropped, truncated, or compressed.

Modern frontier models offer windows from around 200,000 tokens (roughly a long novel) up to 1 million tokens or more (several books’ worth). A larger window lets the model see more code, more files, or longer conversations. Filling a bigger window also costs more per call and often makes responses slower, because the model has to attend to every token in it.

When a conversation runs longer than the window can hold, the client has to make a choice about what to drop, summarize, or compress. This is why long chat sessions can “forget” earlier details.
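
One simple strategy is to drop the oldest messages first. The sketch below assumes that strategy, plus a crude 4-characters-per-token estimate and a made-up 25-token window. Real clients may summarize or compress instead, and they count tokens with the model’s actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def fit_to_window(messages: list, max_tokens: int) -> list:
    """Keep the most recent messages that fit in the window; drop older ones."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept

history = [
    {"role": "user", "content": "My name is Ada. " * 5},
    {"role": "assistant", "content": "Nice to meet you. " * 4},
    {"role": "user", "content": "What is my name?"},
]

trimmed = fit_to_window(history, max_tokens=25)
print([m["role"] for m in trimmed])  # ['assistant', 'user']
```

The first message, the only one that mentions the name, no longer fits, so the model never sees it. That is the “forgetting” in action.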

Reasoning Models

Most modern frontier models can run in either a normal mode or a “thinking” or “reasoning” mode. In thinking mode, the model spends extra tokens working through a problem before producing its final answer, much like a human writing scratch notes. Anthropic calls this “extended thinking”, OpenAI’s o-series and DeepSeek’s R-series are designed around it, and Google has thinking variants of Gemini.

Thinking mode usually produces better results for complex tasks, especially math, coding, and multi-step reasoning, but it’s slower and uses more tokens. Some models let the caller decide how much thinking budget to spend, and some clients hide the thinking tokens from the user and only show the final answer.
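
As a hedged illustration of a caller-controlled thinking budget, the sketch below builds a request body with and without one. It assumes the `thinking` field shape from Anthropic’s extended thinking documentation (`type` plus `budget_tokens`), which may differ by model or change over time, and other labs expose this setting differently. The helper name `build_request` is my own, and nothing here is sent over the network.

```python
import json
from typing import Optional

def build_request(prompt: str, thinking_budget: Optional[int] = None,
                  max_tokens: int = 2048) -> dict:
    """Build a Messages-style request body, optionally with a thinking budget."""
    body = {
        "model": "claude-opus-4-8",  # same model name the Raw API example uses
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_budget is not None:
        # Assumption: the budget has to leave room for the final answer.
        if thinking_budget >= max_tokens:
            raise ValueError("thinking budget must be smaller than max_tokens")
        body["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return body

quick = build_request("What is the capital of France?")
careful = build_request("Prove that sqrt(2) is irrational.",
                        thinking_budget=1024, max_tokens=4096)
print(json.dumps(careful["thinking"]))
```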

Tools / MCPs

A modern LLM can do more than just respond with text. Most of them are trained to output structured “tool calls” that the client can execute on the model’s behalf. A tool can be anything the client decides to expose, such as reading a file, querying a database, browsing the web, running a shell command, or sending an email.

Here’s how the flow works.

  1. The client provides the model a list of the tools available, with their names, descriptions, and parameters.
  2. The model, when generating its response, can generate a tool call instead of (or in addition to) text.
  3. The client sees the tool call and, rather than showing it to the user, runs the underlying function and feeds the result back to the model as more context.
  4. The model continues its response with the new information.

This is how clients go beyond chat to edit files, run tests, and search the web.
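
The four steps can be sketched as a small loop. Everything here is a stand-in: `fake_model` is a hypothetical function that plays the part of the LLM, and the message and tool-call shapes are invented for illustration rather than taken from any lab’s API.

```python
import json

def add(a: int, b: int) -> int:
    return a + b

# Step 1: the client describes the tools it is willing to run.
TOOL_SPECS = [{"name": "add", "description": "Add two integers.",
               "parameters": {"a": "integer", "b": "integer"}}]
TOOLS = {"add": add}

def fake_model(messages: list, tool_specs: list) -> dict:
    """Hypothetical stand-in for an LLM: requests a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "text", "text": f"The sum is {last['content']}."}
    # Step 2: the model emits a structured tool call instead of text.
    return {"type": "tool_call", "name": tool_specs[0]["name"],
            "arguments": {"a": 2, "b": 3}}

def run_client(user_prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = fake_model(messages, TOOL_SPECS)
        if reply["type"] == "text":
            return reply["text"]  # Step 4: final answer goes to the user
        # Step 3: run the function and feed the result back as context.
        result = TOOLS[reply["name"]](**reply["arguments"])
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("model kept calling tools without answering")

print(run_client("What is 2 + 3?"))  # The sum is 5.
```

The `max_steps` cap is a design choice worth copying in real clients, since a model can otherwise loop on tool calls indefinitely.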

MCP stands for Model Context Protocol. It’s an open standard introduced by Anthropic in late 2024 that defines a common way for tool servers and clients to talk to each other. Without MCP, every client had to implement custom integrations for every tool. With MCP, a tool server (for example, one that exposes a GitHub repo) can be plugged into any MCP-compatible client. MCP has been broadly adopted across the major clients.
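
On the wire, MCP messages are JSON-RPC 2.0. The sketch below builds the two requests a client would use to discover and invoke tools, `tools/list` and `tools/call`. The tool name `search_issues` and its arguments are hypothetical, and a real session also starts with an `initialize` handshake that is left out here.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched

def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as MCP clients send to tool servers."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one of them. The tool name and arguments are made up for this sketch.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_issues", "arguments": {"query": "login bug"}},
)

print(list_tools)
print(call_tool)
```

Because every server speaks this same shape, a client only has to implement it once to work with any MCP tool server.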

AI Labs

AI labs are the companies and research organizations that train and release the underlying models. They fall into two main groups based on what they release to the public. Frontier labs train at the leading edge but keep their model weights private, exposing access through APIs and their own clients. Open weight labs publish the trained weights for anyone to download and run. A small subset of labs goes further and publishes enough of the training pipeline that the model can be reproduced from scratch.

Frontier

The “frontier” refers to the leading edge of AI capability. A frontier lab is one that trains models at or near that edge, with training runs costing on the order of hundreds of millions to billions of dollars. Three labs are widely considered frontier in the Western ecosystem.

  • Anthropic makes the Claude family of models. It was founded in 2021 by former OpenAI researchers, with a focus on AI safety research.
  • OpenAI makes the GPT family. Their consumer launch of ChatGPT in November 2022 brought modern AI into the mainstream.
  • Google DeepMind is Google’s AI research arm, formed by merging Google Brain with DeepMind. It makes the Gemini family of models.

Frontier labs keep their model weights private. To use their models, you go through their API or one of their official clients. You can’t download the model and run it on your own hardware.

Open Weight

Open weight labs train models and release the weights publicly. Anyone can download a copy and run it on their own hardware, host it on a cloud provider, or fine-tune it for their own purposes. The most prominent open weight labs include:

  • DeepSeek, a Chinese lab whose V3 and R1 model releases in late 2024 and early 2025 shifted the field by roughly matching frontier benchmarks at a fraction of the reported training cost.
  • Meta, which releases the Llama family with open weights.
  • Mistral, a French lab releasing the Mistral and Magistral families.

The weights (the giant pile of numbers that make up the trained model) are downloadable, but the training data and exact training process are usually kept private. So you can run and modify the model, but you can’t fully reproduce it from scratch.

Most people don’t run open weight models on their own hardware. Instead, hosted inference providers like Together AI, Fireworks, Groq, Cerebras, and OpenRouter serve these models over an API for less than the frontier labs charge for theirs.

Open Source

Open weight is not the same as open source. “Open weight” only means the model’s trained weights are downloadable. The training data, training code, and exact training process are usually kept private.

True open source AI models, where the training data, training code, and the model weights are all published, are much rarer. The Allen Institute for AI releases OLMo under a fully open philosophy, including not just the weights but also the training data, training scripts, and intermediate checkpoints. EleutherAI has done similar work with the Pythia model series, with a focus on making research reproducible.

Open source models tend to lag behind the frontier in raw capability, but they let researchers study exactly how a model came to be the way it is, which closed and open weight releases don’t allow.

Clients

The client is what connects the user to the model and handles the response. Clients are responsible for deciding whether a response should be displayed directly to the user or whether tool calls should be made and actions taken. The client manages the interactions between the user, the model, and any tools or filesystem available to it.

Raw API

Users don’t just interact with a model. They need a client. At the most basic level, there’s an API, where users can send a simple HTTP request to interact with the model:

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "messages": [{
      "role": "user",
      "content": "Hello, Claude"
    }]
  }'
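
For scripts, the same request can be built in Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the helper name `build_messages_request` is my own, and the actual send is left commented out so the example runs without a network connection or a real key.

```python
import json
import os
import urllib.request

def build_messages_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "model": "claude-opus-4-8",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

request = build_messages_request(
    "Hello, Claude", os.environ.get("ANTHROPIC_API_KEY", "missing-key")
)
print(request.get_method(), request.full_url)

# Uncomment to actually send (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["content"][0]["text"])
```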

Web UIs

While the raw API is useful for building products, programs, and scripts that use AI, it’s not what most users need. The most common interface for AI models is a web user interface (UI), such as claude.ai, ChatGPT, or Gemini.

Other websites use frontier models via the API but provide their own interface, such as Perplexity.ai. Perplexity uses its own proprietary Sonar model (based on Meta’s open weight Llama models) and also makes API calls to other models for specific tasks, to give what it considers the best user experience.

Custom IDEs / IDE Plugins

Most of the major labs offer extensions for IDEs like VS Code. These plugins open a panel alongside the editor and work with the code in the current directory to help with coding projects. There are also plugins such as GitHub Copilot that work across frontier models and bill monthly with usage limits.

There are also custom IDEs like Cursor and Devin (formerly Windsurf) that allow connecting to different models and provide the integrated AI coding experience.

Terminal UIs

Claude Code from Anthropic popularized the terminal AI client (earlier open source tools such as Aider worked in the terminal too), and has since been followed by Codex from OpenAI. These are very similar to working in an IDE, just from the raw command line.

These handle the same kinds of projects as the IDEs, with access to the filesystem as well as the model and tools.

Harnesses

A harness is a specialized client built around a single workflow rather than a general chat interface. Rather than giving the user a blank slate, a harness has a specific input with a series of prompts and other handlers designed to drive toward a goal. Some examples:

  • Microsoft has Project MDASH, a multi-model agentic security system that “orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end.”
  • Anthropic has Claude Security, which takes a GitHub repo and has a similar orchestration to scan code for vulnerabilities, validate findings, and propose patches.
  • Iron Curtain, a secure runtime for AI agents from Niels Provos.
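
To show what “built around a single workflow” means in code, here is a deliberately tiny sketch of a harness as a fixed scan, validate, and patch pipeline. It is not how MDASH or Claude Security work internally, which the sources here do not describe. `fake_model`, the prompts, and the sample files are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    issue: str
    confirmed: bool = False
    patch: str = ""

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned answers per stage."""
    task, _, payload = prompt.partition("\n")
    if task == "SCAN":
        return "possible SQL injection" if "+ user_input" in payload else "none"
    if task == "VALIDATE":
        return "yes"
    if task == "PATCH":
        return "use a parameterized query"
    return ""

def run_harness(files: dict) -> list:
    """Fixed workflow: scan every file, validate each finding, propose a patch."""
    findings = []
    for name, source in files.items():
        issue = fake_model("SCAN\n" + source)
        if issue != "none":
            findings.append(Finding(file=name, issue=issue))
    for finding in findings:
        finding.confirmed = fake_model("VALIDATE\n" + finding.issue) == "yes"
        if finding.confirmed:
            finding.patch = fake_model("PATCH\n" + finding.issue)
    return [f for f in findings if f.confirmed]

files = {
    "app.py": 'db.execute("SELECT * FROM users WHERE name = " + user_input)',
    "util.py": "def add(a, b): return a + b",
}
report = run_harness(files)
print([(f.file, f.issue, f.patch) for f in report])
```

The user supplies only the input (the files). The prompts, the ordering of stages, and the stopping condition are all baked into the harness, which is what separates it from a blank chat box.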

Initiatives

An “Initiative” in this glossary refers to a large lab-led campaign that gives a defined cohort of partners access to a particular model or capability under specific terms. Initiatives are not models (the thing producing tokens) and not harnesses (specialized clients). They are usually announced with a name, a target use case, and a pool of usage credits or compute.

Glasswing is an initiative from Anthropic that brought together 40 major software providers with the goal of using the most capable AI models available at the time to identify and fix vulnerabilities in their applications. Anthropic opted not to release its Mythos Preview model for general use, instead granting access to these companies, each with $100 million in usage credits. It also contributed $4 million worth of credits to open source maintainers.

OpenAI followed about five weeks later with Daybreak. From its announcement page, it’s not totally clear what exactly Daybreak is. Its stated goal is to use AI to “accelerate cyber defenders and continuously secure software”, but the page doesn’t really say how, other than showing access tiers with different levels of safeguards applied.