Context7 Efficiency Benchmark

This repository benchmarks the cost and tool-use efficiency of answering documentation questions with two different retrieval strategies:

Context7: Claude Code is restricted to the Context7 MCP tools.
Web search: Claude Code is restricted to WebSearch and WebFetch.

For each question, the runner records the generated response, total estimated cost, token counts, and tool calls.

How It Works

The benchmark runner is src/claudeCode.ts.

For a selected question set, it:

Loads questions from questions/<question-set>.txt.
Runs every question once using Context7.
Waits 5.5 minutes so Anthropic prompt-cache effects do not carry into the next run.
Runs every question once using web search.
Writes per-question results and an aggregate summary to results/.
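
The sequence above can be sketched roughly as follows. This is a minimal illustration, not the code in src/claudeCode.ts: the names parseQuestions, runBenchmark, runQuestion, and CACHE_COOLDOWN_MS are hypothetical, and the one-question-per-line file format is an assumption.

```typescript
// Hypothetical sketch of the run order; the real runner may be structured differently.
type Strategy = "context7" | "search";

// 5.5 minutes expressed in milliseconds.
const CACHE_COOLDOWN_MS = 5.5 * 60 * 1000;

// Assumption: the question file holds one question per non-empty line.
function parseQuestions(fileContents: string): string[] {
  return fileContents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// runQuestion is a caller-supplied stand-in for whatever invokes Claude Code.
async function runBenchmark(
  questions: string[],
  runQuestion: (strategy: Strategy, question: string) => Promise<void>,
  cooldownMs: number = CACHE_COOLDOWN_MS,
): Promise<void> {
  // Pass 1: every question with Context7.
  for (const question of questions) {
    await runQuestion("context7", question);
  }
  // Pause so cached prompts from pass 1 are less likely to affect pass 2.
  await sleep(cooldownMs);
  // Pass 2: every question with web search.
  for (const question of questions) {
    await runQuestion("search", question);
  }
}
```

Taking runQuestion and cooldownMs as parameters keeps the ordering logic easy to exercise without making real API calls or waiting the full interval.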

Repository Layout

.
├── questions/
│   ├── questions1.txt
│   ├── questions2.txt
│   ├── questions3.txt
│   ├── questions4.txt
│   └── questions5.txt
├── src/
│   ├── claudeCode.ts
│   └── types.ts
├── package.json
└── tsconfig.json

Generated results are written to:

results/
├── context7/<question-set>.json
└── search/<question-set>.json

Requirements

Install dependencies:

npm install

Create a .env file in the repository root:

ANTHROPIC_SEARCH_API_KEY=your_anthropic_key_for_search_runs
ANTHROPIC_C7_API_KEY=your_anthropic_key_for_context7_runs
CONTEXT7_API_KEY=your_context7_api_key

Two Anthropic key variables are used so search and Context7 runs can be tracked or billed separately. They may point to the same key if you do not need that separation.
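
One way the per-strategy key selection could look is sketched below. The function name anthropicKeyFor and the RunMode type are hypothetical; only the environment variable names come from this README.

```typescript
// Hypothetical helper: pick the Anthropic key that matches the run type.
type RunMode = "context7" | "search";

function anthropicKeyFor(
  mode: RunMode,
  env: Record<string, string | undefined>,
): string {
  const variableName =
    mode === "context7" ? "ANTHROPIC_C7_API_KEY" : "ANTHROPIC_SEARCH_API_KEY";
  const value = env[variableName];
  // Fail early with the variable name so a missing .env entry is easy to spot.
  if (!value) {
    throw new Error(`Missing ${variableName} in environment`);
  }
  return value;
}
```

In a Node.js process this would typically be called as anthropicKeyFor("search", process.env) after the .env file has been loaded.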

Running A Benchmark

Pass the name of one of the five question sets to the benchmark script. Valid question sets are:

questions1
questions2
questions3
questions4
questions5

Example:

npm run benchmark -- questions5
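
A runner might validate the question-set argument along these lines. This is an illustrative sketch only: parseQuestionSet and QUESTION_SETS are hypothetical names, and the README does not say how src/claudeCode.ts handles an invalid or missing argument.

```typescript
// The five question sets listed above.
const QUESTION_SETS = [
  "questions1",
  "questions2",
  "questions3",
  "questions4",
  "questions5",
] as const;

type QuestionSet = (typeof QUESTION_SETS)[number];

// Hypothetical validator for the first CLI argument after "--".
function parseQuestionSet(arg: string | undefined): QuestionSet {
  const match = QUESTION_SETS.find((name) => name === arg);
  if (match === undefined) {
    throw new Error(`Expected one of: ${QUESTION_SETS.join(", ")}`);
  }
  return match;
}
```

In a Node.js script the argument would usually come from process.argv[2].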

Result Format

Each result file is a JSON object keyed by the exact question text. Each question record contains:

{
  "generatedResponse": "The model response text...",
  "totalCost": 0,
  "inputTokens": 0,
  "outputTokens": 0,
  "cacheCreationInputTokens": 0,
  "cacheReadInputTokens": 0,
  "totalTokens": 0,
  "toolCallCount": 0,
  "toolNames": []
}

Each file also includes a summary record:

{
  "summary": {
    "type": "summary",
    "questionCount": 20,
    "averages": {
      "totalCost": 0,
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationInputTokens": 0,
      "cacheReadInputTokens": 0,
      "totalTokens": 0,
      "toolCallCount": 0
    }
  }
}
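
The two record shapes above can be expressed as TypeScript types, with the summary derived from the question records. This is a sketch under assumptions: the type and function names are hypothetical (src/types.ts may define them differently), and the averages are assumed to be plain arithmetic means over all question records.

```typescript
// Field names mirror the JSON shown above; type names are hypothetical.
interface QuestionRecord {
  generatedResponse: string;
  totalCost: number;
  inputTokens: number;
  outputTokens: number;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  totalTokens: number;
  toolCallCount: number;
  toolNames: string[];
}

const AVERAGED_FIELDS = [
  "totalCost",
  "inputTokens",
  "outputTokens",
  "cacheCreationInputTokens",
  "cacheReadInputTokens",
  "totalTokens",
  "toolCallCount",
] as const;

type AveragedField = (typeof AVERAGED_FIELDS)[number];

interface SummaryRecord {
  type: "summary";
  questionCount: number;
  averages: Record<AveragedField, number>;
}

// Assumption: each average is the arithmetic mean; an empty input yields zeros.
function summarize(records: QuestionRecord[]): SummaryRecord {
  const averages = {} as Record<AveragedField, number>;
  for (const field of AVERAGED_FIELDS) {
    const total = records.reduce((sum, record) => sum + record[field], 0);
    averages[field] = records.length > 0 ? total / records.length : 0;
  }
  return { type: "summary", questionCount: records.length, averages };
}
```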

Notes

Results depend on the current model, SDK behavior, and API pricing.
The runner waits 5.5 minutes between Context7 and search runs to reduce prompt-cache carryover, but provider-side behavior can still affect measurements.
Existing result files are updated in place. Re-running the same question set overwrites records for matching question text.
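
The in-place update described in the last note behaves like a keyed merge. A minimal sketch, assuming records are merged by question text and that mergeRecords is a hypothetical name, not the runner's actual function:

```typescript
// Hypothetical merge: records from the new run replace existing records
// with the same question text; records for other questions are kept.
function mergeRecords<T>(
  existing: Record<string, T>,
  fresh: Record<string, T>,
): Record<string, T> {
  // Later spreads win, so matching keys take the fresh value.
  return { ...existing, ...fresh };
}
```

Because records are keyed by the exact question text, editing a question's wording produces a new key, and a record stored under the old wording would remain in the file under this scheme.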
