feat: conflict-of-interest signal on verify_citation #245

@zoharbabin

Summary

verify_citation already checks existence, retraction, link liveness, and claim coverage. It does not yet check whether the source being cited has a financial or brand interest in the claim being verified. A ClickUp blog post cited as evidence that "ClickUp is the best project management tool" is a real URL, not retracted, and may even address the claim — but it is not independent evidence. This issue adds a conflictOfInterest field to verify_citation output to surface that relationship explicitly.

Current state in code (verified)

  • internal/tools/verify_citation.go — three dispatch paths: verifyByDOI (lines ~120–260), verifyByURL (lines ~265–400), verifyByReference (lines ~405–530). None emit a conflictOfInterest field.
  • verifyCitationInput struct has Citation string (DOI, URL, or reference) and Claim string. The citation URL is resolved in all three paths.
  • verifyByURL already calls classifySource() from internal/tools/classify.go to get source type and reputation — the classified URL host is already in scope.
  • internal/tools/classify.go:classifySource() returns content.SourceClassification with DomainCategory and the raw host available via url.Parse.

Implementation plan

Step 1 — New struct in internal/tools/verify_citation.go

// ConflictOfInterestSignal reports whether the cited source's domain
// matches a brand entity named in the claim. Detected: true does not mean the
// source is wrong; it means the source is not independent.
type ConflictOfInterestSignal struct {
    Detected     bool   `json:"detected"`
    CitingDomain string `json:"citingDomain"` // host of the citation URL, e.g. "clickup.com"
    BrandToken   string `json:"brandToken"`   // the matching term from the claim, e.g. "clickup"
    // Explanation is a human-readable summary for the AI to surface.
    Explanation string `json:"explanation"`
}

Step 2 — Detection function in internal/tools/verify_citation.go

// detectConflictOfInterest checks whether the host of citationURL matches any
// brand-like token from claim. A "brand-like token" is a word in the claim
// that (a) is not a stop word, (b) is longer than 3 characters, and (c) when
// lowercased appears as a substring of the citation host's registrable domain.
//
// Examples:
//   detectConflictOfInterest("https://clickup.com/blog/...", "ClickUp is the best PM tool")
//   → {Detected: true, CitingDomain: "clickup.com", BrandToken: "clickup"}
//
//   detectConflictOfInterest("https://pcmag.com/review", "ClickUp is the best PM tool")
//   → {Detected: false}
//
// The function is intentionally permissive on detection but returns Detected:false
// for generic terms ("best", "tool", "software", "platform") that appear in the
// citation domain but are not brand identifiers.
func detectConflictOfInterest(citationURL, claim string) *ConflictOfInterestSignal

Implementation notes:

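  • Reuse the stop-word set from internal/content/claim.go:claimStopWords. The identifier is unexported, so either export it (or add an exported accessor) in content, or duplicate a small subset in tools. The function lives in the tools package, which already uses content types, so importing content introduces no new dependency.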
  • Additional exclusion list for generic product-category words: {"software", "platform", "tool", "tools", "app", "apps", "suite", "cloud", "hub", "base", "io"} — these appear in many SaaS domain names but are not brand identifiers.
  • Registrable-domain extraction: strip www. prefix, split on ., take first component. E.g. clickup.com → clickup.
  • Match: strings.Contains(registrableDomain, claimToken) where claimToken is each non-stop, non-generic, len>3 token from the claim.
  • Explanation template: "The cited source (%s) is published by a party named in the claim ('%s'). This source may not be independent."
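The notes above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the final implementation: stopWords is a hypothetical stand-in for claimStopWords (its real contents are not shown in this issue), the claim is tokenized on non-alphanumeric characters, CitingDomain is reported with the www. prefix stripped, and nil is returned for an unparseable URL or empty claim.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"unicode"
)

// ConflictOfInterestSignal mirrors the struct proposed in Step 1.
type ConflictOfInterestSignal struct {
	Detected     bool   `json:"detected"`
	CitingDomain string `json:"citingDomain"`
	BrandToken   string `json:"brandToken"`
	Explanation  string `json:"explanation"`
}

// stopWords is a hypothetical stand-in for the stop-word set in
// internal/content/claim.go; the real set is not shown in this issue.
var stopWords = map[string]bool{
	"this": true, "that": true, "with": true, "from": true, "best": true,
}

// genericTerms is the product-category exclusion list from the notes.
var genericTerms = map[string]bool{
	"software": true, "platform": true, "tool": true, "tools": true,
	"app": true, "apps": true, "suite": true, "cloud": true,
	"hub": true, "base": true, "io": true,
}

const explanationFmt = "The cited source (%s) is published by a party named in the claim ('%s'). This source may not be independent."

func detectConflictOfInterest(citationURL, claim string) *ConflictOfInterestSignal {
	u, err := url.Parse(citationURL)
	if err != nil || u.Hostname() == "" || strings.TrimSpace(claim) == "" {
		return nil // assumption: no usable host or empty claim means no signal
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	domain := strings.SplitN(host, ".", 2)[0] // first label, e.g. "clickup"

	// Split the lowercased claim on anything that is not a letter or digit.
	tokens := strings.FieldsFunc(strings.ToLower(claim), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, tok := range tokens {
		// Skip short tokens, stop words, and generic category words.
		if len(tok) <= 3 || stopWords[tok] || genericTerms[tok] {
			continue
		}
		if strings.Contains(domain, tok) {
			return &ConflictOfInterestSignal{
				Detected:     true,
				CitingDomain: host,
				BrandToken:   tok,
				Explanation:  fmt.Sprintf(explanationFmt, host, tok),
			}
		}
	}
	return &ConflictOfInterestSignal{Detected: false}
}

func main() {
	claim := "ClickUp is the best PM tool"
	for _, citation := range []string{"https://clickup.com/blog/", "https://pcmag.com/review"} {
		sig := detectConflictOfInterest(citation, claim)
		fmt.Printf("%s detected=%v token=%q\n", citation, sig.Detected, sig.BrandToken)
	}
}
```

One limitation of the first-component rule as written: for a host such as blog.clickup.com it yields "blog", not "clickup". If subdomains need to be handled, EffectiveTLDPlusOne from golang.org/x/net/publicsuffix is one option for finding the registrable domain first.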

Step 3 — Call from all three dispatch paths

In verifyByDOI and verifyByURL, after the citation URL is resolved, add the call below. verifyByReference has no resolved URL and is covered at the end of this step.

coiSignal := detectConflictOfInterest(resolvedURL, input.Claim)

Emit in the result JSON:

if coiSignal != nil && coiSignal.Detected {
    result["conflictOfInterest"] = coiSignal
}

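For verifyByDOI, the resolved URL is the DOI landing page (constructed as https://doi.org/ + doi), so the host passed to detectConflictOfInterest is doi.org. No claim token longer than 3 characters can be a substring of "doi", so the DOI path never reports a conflict. If the check were instead run against the publisher domain the DOI redirects to (e.g. nature.com), it would still return false for product-brand claims, which is correct.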

For verifyByReference (free-text, no URL), skip the check and omit the field.
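A minimal sketch of the emission logic in this step, assuming result is a map[string]interface{} (the issue only shows it indexed by string key). attachCOI and doiLandingURL are hypothetical helper names used for illustration, and the "exists" key is a placeholder for the fields verify_citation already returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConflictOfInterestSignal mirrors the struct proposed in Step 1.
type ConflictOfInterestSignal struct {
	Detected     bool   `json:"detected"`
	CitingDomain string `json:"citingDomain"`
	BrandToken   string `json:"brandToken"`
	Explanation  string `json:"explanation"`
}

// doiLandingURL builds the DOI landing page URL used by the DOI path.
func doiLandingURL(doi string) string {
	return "https://doi.org/" + doi
}

// attachCOI adds the conflictOfInterest field only when a conflict was
// detected, so the key is absent (not null) in every other case.
func attachCOI(result map[string]interface{}, sig *ConflictOfInterestSignal) {
	if sig != nil && sig.Detected {
		result["conflictOfInterest"] = sig
	}
}

func main() {
	hit := &ConflictOfInterestSignal{
		Detected:     true,
		CitingDomain: "clickup.com",
		BrandToken:   "clickup",
		Explanation:  "The cited source (clickup.com) is published by a party named in the claim ('clickup'). This source may not be independent.",
	}
	miss := &ConflictOfInterestSignal{Detected: false}

	// nil models the verifyByReference path, where the check is skipped.
	for _, sig := range []*ConflictOfInterestSignal{hit, miss, nil} {
		result := map[string]interface{}{"exists": true} // placeholder field
		attachCOI(result, sig)
		out, err := json.Marshal(result)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
	fmt.Println(doiLandingURL("10.1038/s41586-021-03819-2"))
}
```

Guarding on both nil and Detected keeps the three omission cases (no conflict, empty claim, no resolved URL) on one code path, so callers only need to check whether the key is present.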

Step 4 — Update docs/TOOLS.md

Add conflictOfInterest to the verify_citation output schema.

Output schema change

verify_citation result gains an optional field:

"conflictOfInterest": {
  "detected": true,
  "citingDomain": "clickup.com",
  "brandToken": "clickup",
  "explanation": "The cited source (clickup.com) is published by a party named in the claim ('clickup'). This source may not be independent."
}

The field is omitted when Detected is false, when the claim is empty, or on the verifyByReference path (no resolved URL).

Tests

Unit tests in internal/tools/verify_citation_test.go (new cases)

{
    name:      "clickup blog citing own product",
    citation:  "https://clickup.com/blog/best-project-management-tools/",
    claim:     "ClickUp is the best project management tool for teams",
    wantCOI:   true,
    wantBrand: "clickup",
},
{
    name:     "independent site citing clickup",
    citation: "https://www.g2.com/categories/project-management",
    claim:    "ClickUp is the best project management tool for teams",
    wantCOI:  false,
},
{
    name:      "shopify blog citing shopify",
    citation:  "https://www.shopify.com/blog/best-ecommerce-platforms",
    claim:     "Shopify is the best ecommerce platform for small business",
    wantCOI:   true,
    wantBrand: "shopify",
},
{
    name:     "doi citation — no conflict signal",
    citation: "10.1038/s41586-021-03819-2",
    claim:    "AlphaFold solves the protein structure problem",
    wantCOI:  false, // DOI path uses doi.org landing, not journal domain
},
{
    name:     "generic platform name should not trigger",
    citation: "https://www.platform.com/review",
    claim:    "this platform is the best tool",
    wantCOI:  false, // "platform" is in the generic exclusion list
},

Integration test: real verify_citation calls with //go:build live

| Citation | Claim | Expected conflictOfInterest.detected |
|---|---|---|
| https://clickup.com/blog/best-project-management-tools/ | "ClickUp is the best project management tool" | true |
| https://www.shopify.com/blog/best-ecommerce-platforms | "Shopify is best for small business ecommerce" | true |
| https://www.pcmag.com/picks/the-best-project-management-software | "ClickUp is the best project management tool" | false |
| https://www.g2.com/categories/project-management | "ClickUp is the best project management tool" | false |
| 10.1038/s41586-024-07487-w | "GPT-4 outperforms human experts on medical exams" | false |

Docs drift gate

TestToolsDocMatchesRegistry and TestOutputSchemaMatchesResponse must pass after adding conflictOfInterest to docs/TOOLS.md.

Acceptance criteria

  • detectConflictOfInterest("https://clickup.com/blog/...", "ClickUp is the best PM tool") returns {Detected: true, BrandToken: "clickup"}
  • detectConflictOfInterest("https://pcmag.com/...", "ClickUp is the best PM tool") returns {Detected: false}
  • verify_citation(citation: "https://clickup.com/blog/best-project-management-tools/", claim: "ClickUp is the best project management tool") response includes conflictOfInterest.detected: true
  • verify_citation with a DOI citation never emits a false-positive conflictOfInterest (doi.org is not a brand domain)
  • verify_citation with an empty claim omits the conflictOfInterest field entirely
  • No existing tests broken (go test -race ./... passes)
  • docs/TOOLS.md updated; TestToolsDocMatchesRegistry passes

Labels / milestone

enhancement · P2 · pipeline
Milestone: v1.33.0 Anti-Sloptimization
