Summary
Brand-authored content that places the brand's own product at the top of a ranking list is a primary vector for sloptimization (generative engine optimization). The source classifier already labels blogs and reputation tiers, but it does not detect the specific pattern of a page that (a) ranks products and (b) puts its own brand first. This issue adds a selfPromotionSignal field to the source classification output so AI assistants can distinguish "Shopify ranked #1 by PCMag" from "Shopify ranked #1 by Shopify."
Current state in code (verified)
internal/content/classify.go — ClassifySource() calls classifySourceType() + classifyDomainCategory() + LookupDomainReputation(). No self-promotion detection anywhere.
internal/tools/classify.go — classificationFields() emits sourceType, authorityTier, domainCategory, optionally domainReputation. No selfPromotionSignal key.
internal/tools/classify.go:enrichResultsWithReputation() — already attaches per-result classification to every web_search result. Adding the new field here propagates it automatically.
internal/content/classify.go:isBlogHost() — existing pattern for host-heuristic detection shows the right structure to follow.
Implementation plan
Step 1 — New struct in internal/content/classify.go
    // SelfPromotionSignal is non-nil when the page matches the ranking-list +
    // own-brand pattern. Attached to SourceClassification by ClassifySource.
    type SelfPromotionSignal struct {
        Detected    bool   `json:"detected"`
        BrandDomain string `json:"brandDomain"` // host of the page, e.g. "shopify.com"
        BrandToken  string `json:"brandToken"`  // extracted brand name, e.g. "shopify"
        // RankPosition is the 1-based position of the brand's own product in the
        // first ranking list found. 0 means the brand appears but not in position 1.
        RankPosition int    `json:"rankPosition"`
        Confidence   string `json:"confidence"` // "high" | "medium" | "low"
    }
Add SelfPromotion *SelfPromotionSignal field to SourceClassification.
Step 2 — Detection function in internal/content/classify.go
    // DetectSelfPromotion checks whether body contains a ranking list that puts
    // the page's own domain brand in position 1. It is deliberately conservative:
    // false negatives are preferable to false positives (mislabeling independent
    // reviews as self-promotional breaks the tool's trust contract).
    //
    // Detection logic:
    //  1. Extract the brand token from host (strip TLD + "www"): "shopify.com" → "shopify"
    //  2. Scan for ranking patterns: ordered HTML lists (<ol>), numbered markdown
    //     lines ("1. Shopify"), or "Best X" / "#1" / "top pick" heading adjacency
    //  3. Check whether the brand token appears within the first 2 list items
    //  4. Confirm this is not a comparison article FROM an independent site that
    //     merely happens to start with the brand (use isBlogHost + host-match guard)
    func DetectSelfPromotion(host, title, body string) *SelfPromotionSignal
Implementation notes:
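- Brand token extraction: strings.ToLower(strings.Split(registrableDomain(host), ".")[0]), where registrableDomain strips the www. prefix and returns the registrable domain (host.tld, e.g. "shopify.com"). Use the same parent-domain walk logic already in reputation.go:LookupDomainReputation.
- Ranking pattern regex (compile once as package-level var): (?i)<li[^>]*>\s*(?:1\.|#1|first)[^<]*, or (?im)^1\.\s+ for markdown. The m flag is needed so that ^ matches at the start of every line rather than only at the start of the body. The <li> pattern only matches items whose text carries an explicit rank marker; items of an <ol> are numbered implicitly, so the first <li> of an <ol> must also count as position 1.
- Position 1 check: if the brand token appears in the first <li> text or the first numbered list entry, RankPosition = 1. Compare case-insensitively, because the brand token is lowercased.
- Confidence:
    - "high": own domain host + brand in <li> position 1 + ranking title
    - "medium": own domain host + brand in position 1, no explicit ranking heading
    - "low": host match only, brand appears somewhere in content but no confirmed ranking structure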
Step 3 — Wire into ClassifySource in internal/content/classify.go
ClassifySource already receives rawURL but not the page body, so its signature must be extended:
    // ClassifySource classifies a source URL. body is the extracted page text and
    // is used for self-promotion detection (pass "" to skip that check).
    func ClassifySource(rawURL string, authority float64, sig StructuredSignals, lens string, body string) SourceClassification
Update all callers:
- internal/tools/classify.go:classifySource() already has body; pass it through.
- Any other callers of ClassifySource pass "" to preserve existing behavior.
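A rough sketch of the wiring, assuming placeholder types and a stubbed detector (the real StructuredSignals, SourceClassification, and DetectSelfPromotion live in the repository and are not reproduced here). It shows only the contract described above: an empty body skips detection, so existing callers keep their current output. The proposed signature carries no page title, so this sketch passes an empty title to the detector.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Placeholder types: only what this sketch needs is shown.
type StructuredSignals struct{}

type SelfPromotionSignal struct {
	Detected bool
}

type SourceClassification struct {
	SelfPromotion *SelfPromotionSignal
}

// detectSelfPromotion is a stub standing in for DetectSelfPromotion from
// Step 2; it only checks that the host's first label occurs in the body.
func detectSelfPromotion(host, title, body string) *SelfPromotionSignal {
	token := strings.Split(strings.TrimPrefix(strings.ToLower(host), "www."), ".")[0]
	if token == "" || !strings.Contains(strings.ToLower(body), token) {
		return nil
	}
	return &SelfPromotionSignal{Detected: true}
}

// ClassifySource follows the extended signature; the existing
// classification steps are elided.
func ClassifySource(rawURL string, authority float64, sig StructuredSignals, lens string, body string) SourceClassification {
	var c SourceClassification
	if body == "" {
		return c // callers that pass "" keep today's behavior
	}
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return c
	}
	c.SelfPromotion = detectSelfPromotion(u.Hostname(), "", body)
	return c
}

func main() {
	const page = "https://www.shopify.com/blog/best-ecommerce-platforms"
	withBody := ClassifySource(page, 0, StructuredSignals{}, "", "<ol><li>Shopify</li></ol>")
	noBody := ClassifySource(page, 0, StructuredSignals{}, "", "")
	fmt.Println(withBody.SelfPromotion != nil, noBody.SelfPromotion != nil)
}
```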
Step 4 — Emit in internal/tools/classify.go:classificationFields()
    if c.SelfPromotion != nil && c.SelfPromotion.Detected {
        fields["selfPromotionSignal"] = c.SelfPromotion
    }
This automatically propagates to:
- web_search results via enrichResultsWithReputation()
- scrape_page output via classifySource() → classificationFields()
Step 5 — Update docs/TOOLS.md
Add selfPromotionSignal to the web_search and scrape_page output schemas under the sourceClassification section.
Output schema change
web_search result objects and scrape_page output gain an optional field:
    "selfPromotionSignal": {
        "detected": true,
        "brandDomain": "shopify.com",
        "brandToken": "shopify",
        "rankPosition": 1,
        "confidence": "high"
    }
The field is omitted entirely (not emitted as null) when detected is false or body is empty.
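The omission rule can be illustrated with a small sketch. It assumes classificationFields builds a map that is later JSON-encoded, as the Step 4 snippet suggests. The SourceClassification shown here is a reduced placeholder, and the sourceType value "blog" is only an example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SelfPromotionSignal mirrors the struct proposed in Step 1.
type SelfPromotionSignal struct {
	Detected     bool   `json:"detected"`
	BrandDomain  string `json:"brandDomain"`
	BrandToken   string `json:"brandToken"`
	RankPosition int    `json:"rankPosition"`
	Confidence   string `json:"confidence"`
}

// SourceClassification is a reduced placeholder for the real struct.
type SourceClassification struct {
	SourceType    string
	SelfPromotion *SelfPromotionSignal
}

// classificationFields adds the selfPromotionSignal key only when a signal
// exists and was detected, so the key is absent rather than null otherwise.
func classificationFields(c SourceClassification) map[string]any {
	fields := map[string]any{"sourceType": c.SourceType}
	if c.SelfPromotion != nil && c.SelfPromotion.Detected {
		fields["selfPromotionSignal"] = c.SelfPromotion
	}
	return fields
}

func main() {
	flagged := SourceClassification{
		SourceType: "blog",
		SelfPromotion: &SelfPromotionSignal{
			Detected: true, BrandDomain: "shopify.com", BrandToken: "shopify",
			RankPosition: 1, Confidence: "high",
		},
	}
	for _, c := range []SourceClassification{flagged, {SourceType: "blog"}} {
		out, err := json.Marshal(classificationFields(c))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Setting the key conditionally on the map, instead of relying on a struct tag, keeps the "omitted, not null" guarantee even when a signal exists with Detected set to false.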
Tests
Unit test in internal/content/classify_test.go
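    func TestDetectSelfPromotion(t *testing.T) {
        cases := []struct {
            name     string
            host     string
            title    string
            body     string
            wantNil  bool
            wantRank int
            wantConf string
        }{
            {
                name:     "shopify blog lists shopify first",
                host:     "www.shopify.com",
                title:    "Best Ecommerce Platforms for 2024",
                body:     "<ol><li>Shopify - best overall</li><li>WooCommerce</li><li>BigCommerce</li></ol>",
                wantNil:  false,
                wantRank: 1,
                wantConf: "high",
            },
            {
                name:    "independent review that ranks shopify first is not self-promotion",
                host:    "www.pcmag.com",
                title:   "Best Ecommerce Platforms",
                body:    "<ol><li>Shopify - best overall</li><li>WooCommerce</li></ol>",
                wantNil: true,
            },
            {
                name:    "shopify blog that does not self-rank",
                host:    "www.shopify.com",
                title:   "How to Start an Online Store",
                body:    "Starting an online store requires a platform...",
                wantNil: true,
            },
            {
                name:     "clickup blog lists clickup first",
                host:     "clickup.com",
                title:    "10 Best Project Management Tools",
                body:     "1. ClickUp\n2. Asana\n3. Monday.com",
                wantNil:  false,
                wantRank: 1,
                wantConf: "high",
            },
        }
        // Run every case as a subtest and compare against the expectations.
        for _, tc := range cases {
            t.Run(tc.name, func(t *testing.T) {
                got := DetectSelfPromotion(tc.host, tc.title, tc.body)
                if tc.wantNil {
                    if got != nil {
                        t.Fatalf("got %+v, want nil", got)
                    }
                    return
                }
                if got == nil {
                    t.Fatal("got nil, want a signal")
                }
                if got.RankPosition != tc.wantRank {
                    t.Errorf("RankPosition = %d, want %d", got.RankPosition, tc.wantRank)
                }
                if got.Confidence != tc.wantConf {
                    t.Errorf("Confidence = %q, want %q", got.Confidence, tc.wantConf)
                }
            })
        }
    }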
Integration test: real URLs
Add to internal/tools/tools_test.go or a new internal/content/classify_live_test.go with //go:build live:
| URL | Expected detected | Expected rankPosition |
| --- | --- | --- |
| https://www.shopify.com/blog/best-ecommerce-platforms | true | 1 |
| https://clickup.com/blog/best-project-management-tools/ | true | 1 |
| https://www.notion.so/blog/best-note-taking-apps (if exists) | true | 1 |
| https://www.pcmag.com/picks/the-best-project-management-software | false | N/A |
| https://www.techradar.com/best/best-ecommerce-platform | false | N/A |
Drift gate
Add selfPromotionSignal to the web_search output schema section in docs/TOOLS.md and confirm TestOutputSchemaMatchesResponse still passes (it validates schema fields against live handler output).
Acceptance criteria
- DetectSelfPromotion("www.shopify.com", "Best Ecommerce Platforms", body_with_ol_shopify_first) returns {Detected: true, RankPosition: 1, Confidence: "high"}
- DetectSelfPromotion("www.pcmag.com", ...) always returns nil (no false positives on independent reviewers)
- web_search(query: "best ecommerce platforms", claim: "Shopify is the best ecommerce platform") result for shopify.com URLs includes selfPromotionSignal.detected: true
- scrape_page(url: "https://www.shopify.com/blog/best-ecommerce-platforms") output includes selfPromotionSignal
- go test -race ./... passes
- docs/TOOLS.md updated; TestToolsDocMatchesRegistry passes
Labels / milestone
enhancement · P2 · pipeline
Milestone: v1.33.0 Anti-Sloptimization (create if it does not exist)