fix(media): don't crash when media block data is raw bytes #1695

Open
devteamaegis wants to merge 1 commit into langfuse:main from devteamaegis:fix/media-bytes-data-crash

Conversation

@devteamaegis devteamaegis commented Jun 9, 2026

Contributor

What's broken

MediaManager._find_and_process_media builds a data URI for Anthropic- and Vertex-style media blocks with f"...;base64," + data["data"]. The branch guards check only that the data key exists, not that its value is a str. If a logged payload carries raw image bytes there, the str + bytes concatenation raises a TypeError. Because the call site sits outside the masking try/except, nothing catches it, and the crash propagates out of span/observation creation.

payload = {"role": "user", "content": [
    {"type": "base64", "media_type": "image/png", "data": b"\x89PNG ..."}]}
mm._find_and_process_media(data=payload, trace_id="t", observation_id="o", field="input")
# -> TypeError: can only concatenate str (not "bytes") to str

Why it happens

The format guards verify the dict shape but assume data["data"] is a base64 string.
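The failure can be reproduced without Langfuse. The following is a minimal sketch, assuming a hypothetical `naive_data_uri` helper that mirrors the key-only guard described above; neither helper name comes from the library, and this is not its actual code.

```python
def naive_data_uri(block: dict) -> str:
    # Hypothetical stand-in: checks only that the keys exist,
    # not that block["data"] is a str.
    if block.get("type") == "base64" and "media_type" in block and "data" in block:
        return f"data:{block['media_type']};base64," + block["data"]
    raise ValueError("not a media block")


def try_build(block: dict) -> str:
    """Return the data URI, or the name of the exception raised while building it."""
    try:
        return naive_data_uri(block)
    except TypeError as exc:
        return type(exc).__name__


ok_block = {"type": "base64", "media_type": "image/png", "data": "iVBORw0KGgo="}
bad_block = {"type": "base64", "media_type": "image/png", "data": b"\x89PNG"}
```

Both blocks satisfy the shape check, but only `ok_block` survives the concatenation; `bad_block` fails at the `+` because Python does not implicitly convert bytes to str.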

Fix

Add an isinstance(data["data"], str) check to both branches, so non-string data falls through to normal pass-through handling instead of crashing.
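As a rough illustration of the guard, here is a sketch under stated assumptions: `media_data_uri` is a hypothetical helper, not a function in the Langfuse SDK, and the block shapes are taken from the description in this PR (Anthropic: `type == "base64"` with `media_type`; Vertex: `type == "media"` with `mime_type`).

```python
from typing import Any, Optional


def media_data_uri(block: Any) -> Optional[str]:
    """Return a base64 data URI for a recognized media block, else None.

    None means "not processable": the caller leaves the block untouched.
    """
    if not isinstance(block, dict) or "data" not in block:
        return None
    # The added guard: only a str payload can be concatenated into the URI.
    if not isinstance(block["data"], str):
        return None
    if block.get("type") == "base64" and "media_type" in block:
        return f"data:{block['media_type']};base64," + block["data"]
    if block.get("type") == "media" and "mime_type" in block:
        return f"data:{block['mime_type']};base64," + block["data"]
    return None
```

Returning `None` rather than raising keeps the non-string case on the same path as any other unrecognized dict, which matches the pass-through behavior the fix aims for.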

Test

A base64 block whose data is bytes now passes through unchanged instead of raising.
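The shape of such a test might look like the sketch below. It does not use the real `MediaManager`: `process` is a hypothetical stand-in for the recursive walk, and a plain list named `uploads` stands in for the upload queue, with the data URI string taking the place of the media object the SDK would create.

```python
from typing import Any, List


def process(data: Any, uploads: List[str]) -> Any:
    """Hypothetical stand-in for the recursive walk; records data URIs in `uploads`."""
    if isinstance(data, dict):
        if (
            data.get("type") == "base64"
            and "media_type" in data
            and isinstance(data.get("data"), str)  # the guard under test
        ):
            uri = f"data:{data['media_type']};base64," + data["data"]
            uploads.append(uri)
            return {**data, "data": uri}
        # Non-media dicts, including bytes-carrying blocks, are walked key by key.
        return {key: process(value, uploads) for key, value in data.items()}
    if isinstance(data, list):
        return [process(item, uploads) for item in data]
    return data


def test_bytes_block_passes_through() -> None:
    uploads: List[str] = []
    payload = {"role": "user", "content": [
        {"type": "base64", "media_type": "image/png", "data": b"\x89PNG"}]}
    assert process(payload, uploads) == payload  # unchanged, no exception
    assert uploads == []  # nothing was queued for upload
```

Asserting on both the returned value and the empty queue checks the two properties that matter here: the payload is untouched and no upload is attempted.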

Greptile Summary

This PR guards two branches in _find_and_process_media that build base64 data URIs by adding isinstance(data["data"], str) checks, preventing a TypeError crash when a logged payload carries raw bytes instead of a base64 string in the "data" field of Anthropic- or Vertex-style media blocks.

  • Anthropic block (type == "base64"): now skips media processing and passes the dict through unchanged when data["data"] is not a str.
  • Vertex block (type == "media"): same guard added for the mime_type-keyed variant.
  • A new unit test covers both block shapes with bytes data and asserts no media is enqueued.

Confidence Score: 5/5

Safe to merge — the change is a minimal two-line guard that prevents a crash without altering any existing code path for valid string data.

Both guards touch only the condition that selects a code path; the string-data path is identical to before, and the bytes-data path now falls through to the existing generic dict recursion rather than throwing. The new test directly exercises both block shapes with raw bytes and confirms pass-through with no queued uploads.

No files require special attention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_process_data_recursively] --> B{isinstance data dict?}
    B -- No --> C[Other checks / pass-through]
    B -- Yes --> D{Anthropic block?\ntype==base64, media_type, data}
    D -- No --> E{Vertex block?\ntype==media, mime_type, data}
    D -- Yes --> F{isinstance data.data str?}
    E -- Yes --> G{isinstance data.data str?}
    F -- Yes --> H[Build base64 URI\nUpload media\nReturn copied dict with LangfuseMedia]
    F -- No --> I[Fall through to dict recursion\nPass-through unchanged]
    G -- Yes --> J[Build base64 URI\nUpload media\nReturn copied dict with LangfuseMedia]
    G -- No --> I
    E -- No --> K[Recurse into dict values]

Reviews (1): Last reviewed commit: "fix(media): don't crash when Anthropic/V..."

The Anthropic ('type':'base64') and Vertex ('type':'media') branches in
_find_and_process_media built a data URI via string concatenation with
data['data'], assuming it is always a str. When a logged payload carries
raw bytes there, this raised an uncaught TypeError that propagated out of
span creation. Guard both branches with isinstance(data['data'], str) so
non-string media data passes through unprocessed instead of crashing.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@CLAassistant

CLAassistant commented Jun 9, 2026

CLA assistant check
All committers have signed the CLA.

@devteamaegis devteamaegis force-pushed the fix/media-bytes-data-crash branch from 68b0a20 to cf2af23 Compare June 11, 2026 19:44