upstash/context7-efficiency-benchmark

Context7 Efficiency Benchmark

This repository benchmarks the cost and tool-use efficiency of answering documentation questions with two different retrieval strategies:

  • Context7: Claude Code is restricted to the Context7 MCP tools.
  • Web search: Claude Code is restricted to WebSearch and WebFetch.

For each question, the runner records the generated response, total estimated cost, token counts, and tool calls.

How It Works

The benchmark runner is src/claudeCode.ts.

For a selected question set, it:

  1. Loads questions from questions/<question-set>.txt.
  2. Runs every question once using Context7.
  3. Waits 5.5 minutes so Anthropic prompt-cache effects do not carry into the next run.
  4. Runs every question once using web search.
  5. Writes per-question results and an aggregate summary to results/.
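The run order above can be sketched roughly as follows. This is not the code in src/claudeCode.ts: the names runBenchmark and AskFn, the injectable sleep parameter, and the in-memory return value are assumptions made for illustration. Loading the question file and writing to results/ are left out.

```typescript
// Hypothetical sketch of the run order described above; not the actual runner.
type Strategy = "context7" | "search";

// Stand-in for whatever answers one question with one strategy (assumed signature).
type AskFn = (question: string, strategy: Strategy) => Promise<string>;

// 5.5 minutes, the wait described between the two passes.
const CACHE_COOLDOWN_MS = 5.5 * 60 * 1000;

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runBenchmark(
  questions: string[],
  ask: AskFn,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<Record<Strategy, Record<string, string>>> {
  const results: Record<Strategy, Record<string, string>> = {
    context7: {},
    search: {},
  };

  // Pass 1: every question once with Context7.
  for (const question of questions) {
    results.context7[question] = await ask(question, "context7");
  }

  // Wait so prompt-cache effects do not carry into the second pass.
  await sleep(CACHE_COOLDOWN_MS);

  // Pass 2: every question once with web search.
  for (const question of questions) {
    results.search[question] = await ask(question, "search");
  }

  return results;
}
```

Passing sleep as a parameter is a design choice of this sketch: it lets the ordering be exercised in a test without actually waiting 5.5 minutes.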

Repository Layout

.
├── questions/
│   ├── questions1.txt
│   ├── questions2.txt
│   ├── questions3.txt
│   ├── questions4.txt
│   └── questions5.txt
├── src/
│   ├── claudeCode.ts
│   └── types.ts
├── package.json
└── tsconfig.json

Generated results are written to:

results/
├── context7/<question-set>.json
└── search/<question-set>.json

Requirements

Install dependencies:

npm install

Create a .env file in the repository root:

ANTHROPIC_SEARCH_API_KEY=your_anthropic_key_for_search_runs
ANTHROPIC_C7_API_KEY=your_anthropic_key_for_context7_runs
CONTEXT7_API_KEY=your_context7_api_key

Two Anthropic key variables are used so search and Context7 runs can be tracked or billed separately. They may point to the same key if you do not need that separation.
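A small check like the one below can catch a missing key before any paid API call is made. It is a sketch only: the helper name readBenchmarkEnv is hypothetical, and whether the actual runner validates its environment this way is not stated here. The three variable names are taken from the .env example above.

```typescript
// Hypothetical helper; only the variable names come from the .env example.
const REQUIRED_KEYS = [
  "ANTHROPIC_SEARCH_API_KEY",
  "ANTHROPIC_C7_API_KEY",
  "CONTEXT7_API_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

function readBenchmarkEnv(
  env: Record<string, string | undefined>,
): Record<RequiredKey, string> {
  // Unset and empty values are both treated as missing.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }

  // Every key is present at this point, so the casts below are safe.
  const values = {} as Record<RequiredKey, string>;
  for (const key of REQUIRED_KEYS) {
    values[key] = env[key] as string;
  }
  return values;
}
```

In a Node.js process this would typically be called as readBenchmarkEnv(process.env) after the .env file has been loaded.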

Running A Benchmark

Run one of the five question sets by passing its name to npm run benchmark. Valid question sets are:

questions1
questions2
questions3
questions4
questions5

Example:

npm run benchmark -- questions5
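To illustrate how the argument maps onto a question file, here is a minimal sketch that validates the set name and builds the questions/<question-set>.txt path. The function names parseQuestionSet and questionsPath are hypothetical, and the real runner's argument handling and error messages may differ.

```typescript
// The five valid set names listed above.
const QUESTION_SETS = [
  "questions1",
  "questions2",
  "questions3",
  "questions4",
  "questions5",
] as const;

type QuestionSet = (typeof QUESTION_SETS)[number];

// Hypothetical: reject anything that is not one of the five known sets.
function parseQuestionSet(arg: string | undefined): QuestionSet {
  for (const name of QUESTION_SETS) {
    if (name === arg) {
      return name;
    }
  }
  throw new Error(
    "Unknown question set: " + String(arg) +
      ". Expected one of: " + QUESTION_SETS.join(", "),
  );
}

// Path of the question file for a set, relative to the repository root.
function questionsPath(set: QuestionSet): string {
  return "questions/" + set + ".txt";
}
```

With the example command above, the argument questions5 would resolve to questions/questions5.txt.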

Result Format

Each result file is a JSON object keyed by the exact question text. Each question record contains:

{
  "generatedResponse": "The model response text...",
  "totalCost": 0,
  "inputTokens": 0,
  "outputTokens": 0,
  "cacheCreationInputTokens": 0,
  "cacheReadInputTokens": 0,
  "totalTokens": 0,
  "toolCallCount": 0,
  "toolNames": []
}

Each file also includes a summary record:

{
  "summary": {
    "type": "summary",
    "questionCount": 20,
    "averages": {
      "totalCost": 0,
      "inputTokens": 0,
      "outputTokens": 0,
      "cacheCreationInputTokens": 0,
      "cacheReadInputTokens": 0,
      "totalTokens": 0,
      "toolCallCount": 0
    }
  }
}
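The two shapes above can be written as TypeScript types, and the averages derived from the per-question records. This is a sketch under assumptions: the type names and the summarize function are hypothetical (the actual definitions live in src/types.ts and may differ), and each average is assumed to be a plain arithmetic mean over all questions.

```typescript
// Field names mirror the JSON shown above; type names are assumptions.
interface QuestionRecord {
  generatedResponse: string;
  totalCost: number;
  inputTokens: number;
  outputTokens: number;
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
  totalTokens: number;
  toolCallCount: number;
  toolNames: string[];
}

// The numeric fields that appear under "averages" in the summary.
const NUMERIC_FIELDS = [
  "totalCost",
  "inputTokens",
  "outputTokens",
  "cacheCreationInputTokens",
  "cacheReadInputTokens",
  "totalTokens",
  "toolCallCount",
] as const;

type NumericField = (typeof NUMERIC_FIELDS)[number];

interface SummaryRecord {
  type: "summary";
  questionCount: number;
  averages: Record<NumericField, number>;
}

// Assumed: each average is the arithmetic mean across all question records.
function summarize(records: QuestionRecord[]): SummaryRecord {
  const averages = {} as Record<NumericField, number>;
  for (const field of NUMERIC_FIELDS) {
    const total = records.reduce((sum, record) => sum + record[field], 0);
    // Avoid dividing by zero when there are no records.
    averages[field] = records.length > 0 ? total / records.length : 0;
  }
  return { type: "summary", questionCount: records.length, averages };
}
```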

Notes

  • Results depend on the current model, SDK behavior, and API pricing.
  • The runner waits 5.5 minutes between Context7 and search runs to reduce prompt-cache carryover, but provider-side behavior can still affect measurements.
  • Existing result files are updated in place. Re-running the same question set overwrites records for matching question text.
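The in-place update described in the last note behaves like a keyed merge. A minimal sketch, assuming the parsed file is handled as a plain object keyed by question text; the mergeResults name is hypothetical, and reading and writing the JSON file, as well as how the summary record is refreshed, are left out.

```typescript
// A parsed result file: question text (or "summary") mapped to its record.
type ResultFile = Record<string, unknown>;

// Hypothetical merge step: fresh records replace existing ones that share
// the same question text; records for other questions are kept as they were.
function mergeResults(existing: ResultFile, fresh: ResultFile): ResultFile {
  return { ...existing, ...fresh };
}
```

Because records are keyed by the exact question text, editing a question's wording in a questions file would add a new record on the next run rather than replace the old one.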
