
feat: site_login / site_logout — interactive auth-capture + AES-encrypted cookie vault for login-gated scraping #241

@zoharbabin

Description

Summary

Two new tools — site_login and site_logout — that let a non-technical user authenticate to a login-gated website by opening a visible (headed) browser window, logging in as a human (including 2FA and CAPTCHA), and having the resulting session cookies captured, AES-256-GCM encrypted, and transparently injected into future scrape_page calls. The user's password is never seen by the server. The scrape_page API is unchanged — auth is invisible to the caller.

Motivation: LinkedIn posts, Instagram profiles, gated research repositories, and similar pages return extractionQuality:"partial" or kind:"blocked" even when the user has a valid account, because there is currently no way to pass auth state to the scraper. Automated form-fill is fragile, readily detected as bot traffic, and itself a ToS violation. The correct model: the user logs in as a human, the server stores only the resulting cookie state, and subsequent scrapes inject it automatically.

Design goals

  • Non-technical UX: the user logs in normally in a visible Chrome window — no JSON, no headers, no cookie copying
  • Zero credential exposure: password typed directly into the real site; tool only reads post-auth CDP cookie state
  • Transparent injection: scrape_page callers see no interface change; auth is automatic when cookies are available for the target domain
  • Defense-in-depth security: AES-256-GCM at rest (reusing existing persist.DiskStore), deterministic eTLD+1 domain scoping, SSRF-safe client unchanged, audit trail without cookie values, per-user isolation, TTL clamped to server-set cookie expiry
  • Honest about ToS: explicit warning in tool description and docs that this may violate platform Terms; account suspension is a known risk the user accepts

Non-goals

  • Automated form-fill (fragile, bot-detected, ToS violation — explicitly rejected)
  • HTTP/multi-tenant server mode support for the capture step (headed browser requires a local display; HTTP mode returns a clear error)
  • Bypassing 2FA or CAPTCHA (the user handles those interactively in the browser)
  • Bulk harvesting of third-party data
  • OAuth token refresh (a future extension)

Architecture

site_login tool                      (internal/tools/site_session.go)
    └── cookievault.Store            (internal/cookievault/cookievault.go)
            └── persist.Store        (internal/persist/disk.go — AES-256-GCM, already exists)

scrape_page (unchanged signature)
    └── pipeline.Scrape(ctx, url, n) (unchanged signature)
            ├── auth.UserIDFromContext(ctx) + TenantIDFromContext(ctx)
            ├── vault.Get(ctx, tenantID, userID, eTLD+1(targetURL))
            │       └── eTLD+1 gate: golang.org/x/net/publicsuffix.EffectiveTLDPlusOne
            ├── browser tier: page.SetCookies(domainCookies) before Navigate
            └── HTTP tiers: NewSSRFSafeClientWithCookies(allowPrivate, targetURL, cookies)
                    └── cookiejar with publicsuffix.List — RFC 6265 scoping enforced by Go stdlib

No circular imports. Package dependency order:

cookievault → persist, x/net/publicsuffix
scraper     → cookievault (Reader interface only)
tools       → cookievault + scraper
main        → all of the above

1. New package: internal/cookievault/cookievault.go

Create this file from scratch. No existing package to modify.

// Package cookievault stores per-user, per-domain, AES-256-GCM-encrypted session
// cookie bundles. The scraper pipeline uses the read-only Reader interface; the
// site_login / site_logout tools use the full Store interface.
//
// Data model: one Bundle per (tenantID, userID, registrableDomain). Stored as
// AES-256-GCM-encrypted JSON via persist.Store. TTL = min(maxTTL, server-set
// cookie expiry). Bundle keys: "cookievault:<tenantID>:<userID>:<eTLD+1>".
// Domain index key: "cookievault:index:<tenantID>:<userID>" → []string of domains.
//
// Cookie values MUST NOT appear in logs, audit records, or data-subject exports.
package cookievault

import (
    "context"
    "encoding/json"
    "time"

    "golang.org/x/net/publicsuffix"

    "github.com/zoharbabin/web-researcher-mcp/internal/persist"
)

// StoredCookie is the serializable form of a browser cookie. Import-free from
// go-rod so the storage layer has no browser dependency.
type StoredCookie struct {
    Name     string    `json:"name"`
    Value    string    `json:"value"`
    Domain   string    `json:"domain"`
    Path     string    `json:"path"`
    Secure   bool      `json:"secure"`
    HTTPOnly bool      `json:"httpOnly"`
    SameSite string    `json:"sameSite,omitempty"`
    Expires  time.Time `json:"expires,omitempty"` // zero = session-scoped
}

// Bundle is the stored unit: all cookies for one (tenant, user, registrableDomain)
// captured at a single login event.
type Bundle struct {
    RegistrableDomain string         `json:"registrableDomain"`
    CapturedAt        time.Time      `json:"capturedAt"`
    EarliestExpiry    time.Time      `json:"earliestExpiry"` // drives TTL; zero if all session cookies
    TenantID          string         `json:"tenantId"`
    UserID            string         `json:"userId"`
    SchemaVersion     int            `json:"schemaVersion"` // always 1
    Cookies           []StoredCookie `json:"cookies"`
}

// Reader is the read-only interface used by the scraper pipeline. Intentionally
// minimal — the pipeline never writes sessions, only reads them.
type Reader interface {
    Get(ctx context.Context, tenantID, userID, registrableDomain string) (Bundle, bool)
}

// Store is the full read/write interface used by the tools.
type Store interface {
    Reader
    Save(ctx context.Context, b Bundle, maxTTL time.Duration) error
    Delete(ctx context.Context, tenantID, userID, registrableDomain string)
    ListDomains(ctx context.Context, tenantID, userID string) []string
    ExportUser(ctx context.Context, tenantID, userID string) (any, error)
    EraseUser(ctx context.Context, tenantID, userID string) (int, error)
}

// Noop is the default when COOKIE_VAULT_ENABLED=false. All operations are safe
// no-ops — zero behavior change when the feature is not configured.
type Noop struct{}

func NewNoop() *Noop                                                    { return &Noop{} }
func (Noop) Save(_ context.Context, _ Bundle, _ time.Duration) error   { return nil }
func (Noop) Get(_ context.Context, _, _, _ string) (Bundle, bool)      { return Bundle{}, false }
func (Noop) Delete(_ context.Context, _, _, _ string)                  {}
func (Noop) ListDomains(_ context.Context, _, _ string) []string       { return nil }
func (Noop) ExportUser(_ context.Context, _, _ string) (any, error)    { return nil, nil }
func (Noop) EraseUser(_ context.Context, _, _ string) (int, error)     { return 0, nil }

var _ Store = (*Noop)(nil)

// New returns a Store backed by the given persist.Store.
// maxTTL is the upper bound; 0 → 24h; capped at 7 days internally.
// Signature only; the body is given with storeImpl below:
//   func New(store persist.Store, maxTTL time.Duration) Store

// RegistrableDomain wraps publicsuffix.EffectiveTLDPlusOne with a safe fallback.
// Returns "" for invalid/private/IP inputs (caller must gate on "" → skip injection).
func RegistrableDomain(host string) string {
    d, err := publicsuffix.EffectiveTLDPlusOne(host)
    if err != nil {
        return ""
    }
    return d
}

Full storeImpl implementation (write in cookievault.go below the interface declarations):

type storeImpl struct {
    store  persist.Store
    maxTTL time.Duration
}

func New(store persist.Store, maxTTL time.Duration) Store {
    if maxTTL <= 0 { maxTTL = 24 * time.Hour }
    if maxTTL > 7*24*time.Hour { maxTTL = 7 * 24 * time.Hour }
    return &storeImpl{store: store, maxTTL: maxTTL}
}

func bundleKey(tenantID, userID, domain string) string {
    return "cookievault:" + tenantID + ":" + userID + ":" + domain
}
func indexKey(tenantID, userID string) string {
    return "cookievault:index:" + tenantID + ":" + userID
}

// effectiveTTL = min(maxTTL, timeUntilExpiry), floored at 1s.
// All-session-cookie bundles (EarliestExpiry zero) use maxTTL as fallback.
func effectiveTTL(b Bundle, maxTTL time.Duration) time.Duration {
    if b.EarliestExpiry.IsZero() { return maxTTL }
    until := time.Until(b.EarliestExpiry)
    if until <= 0 { return time.Second }
    if until < maxTTL { return until }
    return maxTTL
}

func (s *storeImpl) Save(ctx context.Context, b Bundle, maxTTL time.Duration) error {
    if maxTTL <= 0 { maxTTL = s.maxTTL }
    data, err := json.Marshal(b)
    if err != nil { return err }
    s.store.Set(ctx, bundleKey(b.TenantID, b.UserID, b.RegistrableDomain), data, effectiveTTL(b, maxTTL))
    s.updateIndex(ctx, b.TenantID, b.UserID, b.RegistrableDomain)
    return nil
}

func (s *storeImpl) Get(ctx context.Context, tenantID, userID, registrableDomain string) (Bundle, bool) {
    data, ok := s.store.Get(ctx, bundleKey(tenantID, userID, registrableDomain))
    if !ok { return Bundle{}, false }
    var b Bundle
    if err := json.Unmarshal(data, &b); err != nil { return Bundle{}, false }
    return b, true
}

func (s *storeImpl) Delete(ctx context.Context, tenantID, userID, registrableDomain string) {
    s.store.Delete(ctx, bundleKey(tenantID, userID, registrableDomain))
    s.removeFromIndex(ctx, tenantID, userID, registrableDomain)
}

func (s *storeImpl) ListDomains(ctx context.Context, tenantID, userID string) []string {
    raw := s.loadRawIndex(ctx, tenantID, userID)
    var live []string
    for _, d := range raw {
        if _, ok := s.Get(ctx, tenantID, userID, d); ok {
            live = append(live, d)
        }
    }
    if len(live) != len(raw) { s.saveIndex(ctx, tenantID, userID, live) } // lazy prune
    return live
}

func (s *storeImpl) ExportUser(ctx context.Context, tenantID, userID string) (any, error) {
    // IMPORTANT: cookie Values are intentionally omitted — they are live credentials.
    type entry struct {
        Domain      string    `json:"domain"`
        CapturedAt  time.Time `json:"capturedAt"`
        ExpiresAt   time.Time `json:"expiresAt,omitempty"`
        CookieCount int       `json:"cookieCount"`
    }
    var out []entry
    for _, d := range s.ListDomains(ctx, tenantID, userID) {
        b, ok := s.Get(ctx, tenantID, userID, d)
        if !ok { continue }
        out = append(out, entry{Domain: d, CapturedAt: b.CapturedAt, ExpiresAt: b.EarliestExpiry, CookieCount: len(b.Cookies)})
    }
    return out, nil
}

func (s *storeImpl) EraseUser(ctx context.Context, tenantID, userID string) (int, error) {
    domains := s.ListDomains(ctx, tenantID, userID)
    for _, d := range domains {
        s.store.Delete(ctx, bundleKey(tenantID, userID, d))
    }
    s.store.Delete(ctx, indexKey(tenantID, userID))
    return len(domains), nil
}

func (s *storeImpl) updateIndex(ctx context.Context, tenantID, userID, domain string) {
    raw := s.loadRawIndex(ctx, tenantID, userID)
    for _, d := range raw {
        if d == domain { return }
    }
    s.saveIndex(ctx, tenantID, userID, append(raw, domain))
}

func (s *storeImpl) removeFromIndex(ctx context.Context, tenantID, userID, domain string) {
    var filtered []string
    for _, d := range s.loadRawIndex(ctx, tenantID, userID) {
        if d != domain { filtered = append(filtered, d) }
    }
    s.saveIndex(ctx, tenantID, userID, filtered)
}

func (s *storeImpl) loadRawIndex(ctx context.Context, tenantID, userID string) []string {
    data, ok := s.store.Get(ctx, indexKey(tenantID, userID))
    if !ok { return nil }
    var out []string
    _ = json.Unmarshal(data, &out)
    return out
}

func (s *storeImpl) saveIndex(ctx context.Context, tenantID, userID string, domains []string) {
    if data, err := json.Marshal(domains); err == nil {
        s.store.Set(ctx, indexKey(tenantID, userID), data, s.maxTTL)
    }
}

var _ Store = (*storeImpl)(nil)
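The clamping rule in effectiveTTL can be checked against a few concrete inputs. The sketch below restates the same logic as a standalone program. It is an illustration, not the package code: it takes `now` as an explicit parameter so the results are deterministic (the source calls time.Until), and `ttlSeconds` is a hypothetical wrapper added only to make the cases easy to read.

```go
package main

import (
	"fmt"
	"time"
)

// effectiveTTL restates the rule from the spec: min(maxTTL, time until the
// earliest cookie expiry), floored at 1s. A zero earliestExpiry means every
// cookie was session-scoped, so maxTTL is used as the fallback.
// Assumption: `now` is injected here for determinism.
func effectiveTTL(earliestExpiry, now time.Time, maxTTL time.Duration) time.Duration {
	if earliestExpiry.IsZero() {
		return maxTTL
	}
	until := earliestExpiry.Sub(now)
	if until <= 0 {
		return time.Second
	}
	if until < maxTTL {
		return until
	}
	return maxTTL
}

// ttlSeconds is a hypothetical helper: expiresInSeconds is the offset of the
// earliest cookie expiry from "now" (0 means session-only cookies).
func ttlSeconds(expiresInSeconds, maxTTLSeconds int64) int64 {
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	var expiry time.Time
	if expiresInSeconds != 0 {
		expiry = now.Add(time.Duration(expiresInSeconds) * time.Second)
	}
	ttl := effectiveTTL(expiry, now, time.Duration(maxTTLSeconds)*time.Second)
	return int64(ttl / time.Second)
}

func main() {
	const day = 24 * 60 * 60
	fmt.Println(ttlSeconds(0, day))      // session-only cookies: falls back to maxTTL (86400)
	fmt.Println(ttlSeconds(3600, day))   // expires in 1h: clamped to 3600
	fmt.Println(ttlSeconds(30*day, day)) // expires in 30 days: capped at maxTTL (86400)
	fmt.Println(ttlSeconds(-60, day))    // already expired: 1s floor
}
```

The last case shows why the floor exists: a bundle whose earliest cookie has already expired is still written, but it disappears from the store almost immediately.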

2. internal/consent/consent.go — add PurposeSessionCapture

Add to the Purpose const block:

// PurposeSessionCapture covers storing AES-encrypted browser session cookies for
// replay by the scraper. Regulated: cookies are personal data (GDPR Recital 30),
// capture may violate platform ToS (users must acknowledge this), and stored
// sessions are subject to data-subject rights (export/erasure).
PurposeSessionCapture Purpose = "session_capture"

Update AllPurposes:

var AllPurposes = []Purpose{PurposeMemory, PurposeAnalytics, PurposeWorkspace, PurposeSessionCapture}

3. internal/config/config.go — cookie vault config

Add CookieVaultConfig struct (near FeatureConfig):

// CookieVaultConfig configures the encrypted session-cookie store used by
// site_login and site_logout. Disabled by default.
type CookieVaultConfig struct {
    Enabled               bool
    EncryptionKey         string // 64 hex chars; MUST differ from CACHE_ENCRYPTION_KEY
    EncryptionKeyPrev     string // previous key for lazy rotation (same semantics as CACHE_ENCRYPTION_KEY_PREV)
    MaxTTLHours           int    // default 24; max 168 (7 days); clamped to server-set cookie expiry
    CaptureTimeoutSeconds int    // default 300; max 600 (how long the headed window stays open)
}

Add CookieVault CookieVaultConfig field to Config struct.

In Load():

cfg.CookieVault = CookieVaultConfig{
    Enabled:               envBool("COOKIE_VAULT_ENABLED", false),
    EncryptionKey:         os.Getenv("COOKIE_VAULT_ENCRYPTION_KEY"),
    EncryptionKeyPrev:     os.Getenv("COOKIE_VAULT_ENCRYPTION_KEY_PREV"),
    MaxTTLHours:           envInt("COOKIE_VAULT_MAX_TTL_HOURS", 24),
    CaptureTimeoutSeconds: envInt("SESSION_CAPTURE_TIMEOUT_SECONDS", 300),
}
if cfg.CookieVault.Enabled {
    if len(cfg.CookieVault.EncryptionKey) != 64 {
        return nil, fmt.Errorf("COOKIE_VAULT_ENCRYPTION_KEY must be 64 hex characters when COOKIE_VAULT_ENABLED=true")
    }
    if cfg.CookieVault.EncryptionKeyPrev != "" && len(cfg.CookieVault.EncryptionKeyPrev) != 64 {
        return nil, fmt.Errorf("COOKIE_VAULT_ENCRYPTION_KEY_PREV must be 64 hex characters if set")
    }
    if cfg.CookieVault.MaxTTLHours > 168 { cfg.CookieVault.MaxTTLHours = 168 }
    if cfg.CookieVault.CaptureTimeoutSeconds > 600 { cfg.CookieVault.CaptureTimeoutSeconds = 600 }
}
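The validation above checks only the key length, so a 64-character string that is not hex would pass, and nothing enforces the "MUST differ from CACHE_ENCRYPTION_KEY" requirement. A stricter check might look like the sketch below. `validateVaultKey` is a hypothetical helper, not part of the existing config package, and passing the cache key as a plain string argument is an assumption made for illustration.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// validateVaultKey (hypothetical) checks that key decodes to exactly 32 bytes
// of hex and, when cacheKey is non-empty, that the two keys are not the same.
func validateVaultKey(key, cacheKey string) error {
	raw, err := hex.DecodeString(key)
	if err != nil {
		return fmt.Errorf("key is not valid hex: %w", err)
	}
	if len(raw) != 32 {
		return errors.New("key must decode to 32 bytes (64 hex characters)")
	}
	// Hex is case-insensitive, so compare without regard to case.
	if cacheKey != "" && strings.EqualFold(key, cacheKey) {
		return errors.New("key must differ from CACHE_ENCRYPTION_KEY")
	}
	return nil
}

func main() {
	good := strings.Repeat("0123456789abcdef", 4) // 64 hex characters
	bad := strings.Repeat("z", 64)                // right length, not hex

	fmt.Println(validateVaultKey(good, "")) // nil
	fmt.Println(validateVaultKey(bad, ""))  // error: not valid hex
	fmt.Println(validateVaultKey(good, good))
}
```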

Also pass the Chrome binary path through to the capture function. The CHROME_PATH env var is already read in config; verify the existing field name (referred to here as Config.ChromePath, a string) and reuse it rather than adding a duplicate.

.env.example additions (near the memory/analytics block):

# ── Cookie vault (site_login / site_logout) ─────────────────────────────────────
# COOKIE_VAULT_ENABLED=false
# COOKIE_VAULT_ENCRYPTION_KEY=          # 64 hex chars; MUST differ from CACHE_ENCRYPTION_KEY
# COOKIE_VAULT_ENCRYPTION_KEY_PREV=     # previous key for zero-downtime rotation
# COOKIE_VAULT_MAX_TTL_HOURS=24         # max bundle retention hours (1–168)
# SESSION_CAPTURE_TIMEOUT_SECONDS=300   # headed login window timeout (max 600)
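A value for COOKIE_VAULT_ENCRYPTION_KEY is 32 random bytes written as 64 hex characters. One way to produce it, sketched here with the Go standard library (`newVaultKey` is a hypothetical helper name, not something the project ships):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newVaultKey (hypothetical) returns 32 cryptographically random bytes
// encoded as a 64-character lowercase hex string.
func newVaultKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	key, err := newVaultKey()
	if err != nil {
		panic(err)
	}
	// Paste the printed line into .env; generate a separate key for the cache.
	fmt.Printf("COOKIE_VAULT_ENCRYPTION_KEY=%s\n", key)
}
```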

4. internal/scraper/ssrf.go: NewSSRFSafeClientWithCookies

Add this function (no changes to existing functions):

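import (
    "net/http/cookiejar"
    "net/url" // needed by url.Parse below, if not already imported
    "time"    // needed by time.RFC1123 below, if not already imported

    "golang.org/x/net/publicsuffix"

    "github.com/zoharbabin/web-researcher-mcp/internal/cookievault"
)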

// NewSSRFSafeClientWithCookies returns a new SSRF-safe HTTP client whose cookie
// jar is pre-seeded with domain-matched cookies from the vault. The jar uses
// publicsuffix.List so RFC 6265 domain and scheme scoping is enforced by the
// Go stdlib — a Secure cookie is never sent to an http:// host and cookies are
// never sent cross-domain.
//
// The caller MUST only use the returned client for requests to targetURL's
// registrable domain; the jar enforces this but the restriction is explicit in
// the type contract.
//
// If cookies is nil or empty, this is equivalent to NewSSRFSafeClient(allowPrivate).
func NewSSRFSafeClientWithCookies(allowPrivate bool, targetURL string, cookies []cookievault.StoredCookie) (*http.Client, error) {
    jar, err := cookiejar.New(&cookiejar.Options{PublicSuffixList: publicsuffix.List})
    if err != nil {
        return nil, err
    }
    if len(cookies) > 0 {
        u, err := url.Parse(targetURL)
        if err != nil {
            return nil, err
        }
        var httpCookies []*http.Cookie
        for _, c := range cookies {
            hc := &http.Cookie{
                Name:     c.Name,
                Value:    c.Value,
                Domain:   c.Domain,
                Path:     c.Path,
                Secure:   c.Secure,
                HttpOnly: c.HTTPOnly,
            }
            if !c.Expires.IsZero() {
                hc.Expires = c.Expires
                hc.RawExpires = c.Expires.Format(time.RFC1123)
            }
            httpCookies = append(httpCookies, hc)
        }
        jar.SetCookies(u, httpCookies)
    }
    client := NewSSRFSafeClient(allowPrivate) // unchanged transport (SSRF blocking still applies)
    client.Jar = jar
    return client, nil
}
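The scoping behavior that the doc comment relies on can be observed directly with net/http/cookiejar. The sketch below seeds a jar with one Secure, domain-wide cookie and counts how many cookies the jar would attach to different targets. It is a standalone illustration: it passes nil options instead of publicsuffix.List to stay stdlib-only, uses example.com as a placeholder domain, and `cookiesSentTo` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookiesSentTo (hypothetical) seeds a fresh jar from seedURL with a single
// Secure cookie scoped to .example.com, then reports how many cookies the jar
// would send on a request to target.
func cookiesSentTo(seedURL, target string) int {
	jar, err := cookiejar.New(nil) // nil options: no public suffix list in this sketch
	if err != nil {
		panic(err)
	}
	seed, err := url.Parse(seedURL)
	if err != nil {
		panic(err)
	}
	jar.SetCookies(seed, []*http.Cookie{{
		Name:   "session",
		Value:  "opaque",
		Domain: ".example.com",
		Path:   "/",
		Secure: true,
	}})
	t, err := url.Parse(target)
	if err != nil {
		panic(err)
	}
	return len(jar.Cookies(t))
}

func main() {
	seed := "https://www.example.com/login"
	fmt.Println(cookiesSentTo(seed, "https://www.example.com/feed")) // 1: same host, https
	fmt.Println(cookiesSentTo(seed, "https://api.example.com/v1"))   // 1: subdomain of the cookie domain
	fmt.Println(cookiesSentTo(seed, "http://www.example.com/feed"))  // 0: Secure cookie, plain http
	fmt.Println(cookiesSentTo(seed, "https://example.org/"))         // 0: different domain
}
```

In the real function the jar is built with publicsuffix.List, which additionally rejects cookies whose Domain attribute is itself a public suffix.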

5. internal/scraper/pipeline.go: CookieVault field + injection helpers

5a. Add to PipelineConfig

// CookieVault, when non-nil, is queried for domain-matched session cookies
// before each scrape. A nil Reader (the default) or a cookievault.Noop means
// zero behavior change. tenantID and userID are read from the request context
// via the auth package.
CookieVault cookievault.Reader

5b. New helper: vaultCookiesForURL

// vaultCookiesForURL returns vault cookies for the target URL's registrable domain,
// or nil when the vault is absent or has no stored session.
// tenantID/userID are read from ctx via auth.TenantIDFromContext / auth.UserIDFromContext.
func (p *Pipeline) vaultCookiesForURL(ctx context.Context, rawURL string) []cookievault.StoredCookie {
    if p.config.CookieVault == nil {
        return nil
    }
    u, err := url.Parse(rawURL)
    if err != nil {
        return nil
    }
    rd := cookievault.RegistrableDomain(u.Hostname())
    if rd == "" {
        return nil
    }
    tenantID := auth.TenantIDFromContext(ctx)
    userID   := auth.UserIDFromContext(ctx)
    bundle, ok := p.config.CookieVault.Get(ctx, tenantID, userID, rd)
    if !ok {
        return nil
    }
    return bundle.Cookies
}

5c. HTTP tier injection

In each HTTP-tier scrape method (scrapeMarkdown, scrapeHTML, scrapeStealth, scrapePatents — anywhere that constructs an *http.Client or uses p.client):

// Build a per-request client. When vault cookies are available for this domain,
// use a fresh client with a pre-seeded jar (avoids sharing jar state between
// concurrent requests). Fall back to the shared SSRF-safe client otherwise.
reqClient := p.client
if cookies := p.vaultCookiesForURL(ctx, url); len(cookies) > 0 {
    var err error
    // pipeline.go is itself in package scraper, so the call is unqualified.
    reqClient, err = NewSSRFSafeClientWithCookies(p.config.AllowPrivateIPs, url, cookies)
    if err != nil {
        slog.Warn("cookie injection skipped", "url", url, "error", err)
        reqClient = p.client
    }
}
// Use reqClient for all subsequent http.Request calls in this method.

Note: p.client is the current shared SSRF-safe client. Per-request client creation only when cookies are present avoids any performance impact on the common (unauthenticated) path.

5d. Browser tier injection

New method in browser.go (or in a new internal/scraper/inject_browser.go):

// injectVaultCookies looks up stored session cookies for rawURL's domain and
// injects them into the page via proto.NetworkSetCookies before navigation.
// Non-fatal: on any error it logs and returns without blocking the scrape.
func (p *Pipeline) injectVaultCookies(ctx context.Context, page *rod.Page, rawURL string) {
    cookies := p.vaultCookiesForURL(ctx, rawURL)
    if len(cookies) == 0 {
        return
    }
    u, err := url.Parse(rawURL)
    if err != nil {
        return // unparseable target; vaultCookiesForURL normally rejects these first
    }

    var params []*proto.NetworkCookieParam
    for _, c := range cookies {
        // Redundant domain-safety check (defense in depth).
        cookieDomain := strings.TrimPrefix(c.Domain, ".")
        if cookievault.RegistrableDomain(cookieDomain) != cookievault.RegistrableDomain(u.Hostname()) {
            continue
        }
        // Never inject a Secure cookie into an http:// target.
        if c.Secure && u.Scheme != "https" {
            continue
        }
        param := &proto.NetworkCookieParam{
            Name:     c.Name,
            Value:    c.Value,
            Domain:   c.Domain,
            Path:     c.Path,
            Secure:   c.Secure,
            HTTPOnly: c.HTTPOnly,
        }
        if !c.Expires.IsZero() {
            exp := proto.TimeSinceEpoch(float64(c.Expires.Unix()))
            param.Expires = &exp
        }
        if c.SameSite != "" {
            param.SameSite = proto.NetworkCookieSameSite(c.SameSite)
        }
        params = append(params, param)
    }
    if len(params) == 0 {
        return
    }
    if err := page.SetCookies(params); err != nil {
        slog.Warn("browser cookie injection failed", "url", rawURL, "error", err)
    }
}

Call site in scrapeBrowser (after stealth.Page(browser), before page.Navigate):

p.injectVaultCookies(ctx, page, url)

Important: page.SetCookies issues Network.setCookies on the page's CDP session, but Chrome keeps cookies per browser context, not per page. Injected cookies are therefore isolated from other concurrent scrapes only when each scrape runs in its own browser context (for example, an incognito context); pages that share a context also share its cookie store. Verify how the pinned go-rod version and the browser pool create pages before relying on page-level isolation.


6. New file: internal/scraper/capture.go — headed browser login capture

This file contains CaptureLoginSession and its helpers. It is the only place that launches a headed browser; it is entirely separate from getBrowserPool().

package scraper

import (
    "context"
    "fmt"
    "net/url"
    "os"
    "strings"
    "time"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
    "github.com/go-rod/rod/lib/proto"

    "github.com/zoharbabin/web-researcher-mcp/internal/cookievault"
)

// captureHook is a test seam. When non-nil, CaptureLoginSession calls it instead
// of launching a real browser. Tests set this via package init or t.Cleanup reset.
// MUST be nil in production; never exported.
var captureHook func(ctx context.Context, loginURL, registrableDomain, tenantID, userID, chromePath string) (cookievault.Bundle, error)

// loginPageSubstrings are URL path fragments that indicate the user is still on
// an auth page. Login is considered complete when NONE of these appear in the
// current URL AND (a known session cookie appeared OR new cookies are present).
var loginPageSubstrings = []string{
    "/login", "/signin", "/sign-in", "/auth", "/checkpoint",
    "/accounts/login", "/session/new", "/oauth", "/sso",
    "/saml", "/oidc", "/forgot", "/reset-password",
}

// knownSessionCookies maps registrable domains to the cookie name(s) that
// definitively indicate a successful session. Checking these provides sub-second
// detection vs. waiting for URL change on redirect-heavy flows.
// This map is an accelerator only — the URL-change fallback fires regardless.
var knownSessionCookies = map[string][]string{
    "linkedin.com":  {"li_at"},
    "instagram.com": {"sessionid"},
    "facebook.com":  {"xs"},
    "twitter.com":   {"auth_token"},
    "x.com":         {"auth_token"},
    "github.com":    {"user_session"},
    "reddit.com":    {"reddit_session"},
    "notion.so":     {"token_v2"},
}

// CaptureLoginSession opens a headed (visible) browser at loginURL and blocks
// until the user completes login or ctx is cancelled.
//
// Isolation guarantees:
//   - Separate launcher instance from getBrowserPool() — NEVER touches poolOnce.
//   - Ephemeral UserDataDir (os.MkdirTemp, 0700), destroyed via defer on every exit path.
//   - The profile dir is cleaned up even on panic (deferred os.RemoveAll).
//
// This function MUST only be called in STDIO mode (the caller enforces this).
func CaptureLoginSession(
    ctx context.Context,
    loginURL, registrableDomain, tenantID, userID, chromePath string,
) (cookievault.Bundle, error) {
    if captureHook != nil {
        return captureHook(ctx, loginURL, registrableDomain, tenantID, userID, chromePath)
    }

    tempDir, err := os.MkdirTemp("", "web-researcher-capture-*")
    if err != nil {
        return cookievault.Bundle{}, fmt.Errorf("could not create capture profile dir: %w", err)
    }
    if err := os.Chmod(tempDir, 0700); err != nil {
        _ = os.RemoveAll(tempDir)
        return cookievault.Bundle{}, fmt.Errorf("could not secure capture profile dir: %w", err)
    }
    defer os.RemoveAll(tempDir) // CRITICAL: runs on all exit paths including panic+recover

    l := launcher.New().
        Headless(false).
        UserDataDir(tempDir).
        Set("no-sandbox").
        Set("disable-dev-shm-usage").
        Set("no-first-run").
        Set("no-default-browser-check").
        Set("disable-background-networking")
    if chromePath != "" && chromePath != chromeDisabled {
        l = l.Bin(chromePath)
    }

    controlURL, err := l.Launch()
    if err != nil {
        return cookievault.Bundle{}, fmt.Errorf("could not launch browser for login: %w", err)
    }
    browser := rod.New().ControlURL(controlURL)
    if err := browser.Connect(); err != nil {
        l.Kill()
        return cookievault.Bundle{}, fmt.Errorf("could not connect to browser: %w", err)
    }
    defer func() {
        _ = browser.Close()
        l.Kill()
    }()

    page, err := browser.Page(proto.TargetCreateTarget{URL: loginURL})
    if err != nil {
        return cookievault.Bundle{}, fmt.Errorf("could not open login page: %w", err)
    }
    defer page.Close()
    page = page.Context(ctx)

    if err := page.WaitLoad(); err != nil {
        if ctx.Err() != nil {
            return cookievault.Bundle{}, fmt.Errorf("login timed out before page loaded")
        }
        return cookievault.Bundle{}, fmt.Errorf("login page failed to load: %w", err)
    }

    return pollUntilLoggedIn(ctx, page, loginURL, registrableDomain, tenantID, userID)
}

// pollUntilLoggedIn polls at 1-second intervals for login-completion signals.
// Signals (first match wins):
//  1. A knownSessionCookie for registrableDomain appeared with a non-empty value.
//  2. Current URL no longer matches any loginPageSubstring AND at least one cookie
//     is present for the domain (catches redirect-based and SPA flows).
//  3. ctx cancelled → timeout error.
//  4. page.Info() fails → browser was closed by user → abort error.
func pollUntilLoggedIn(
    ctx context.Context,
    page *rod.Page,
    loginURL, registrableDomain, tenantID, userID string,
) (cookievault.Bundle, error) {
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    sessionNames := knownSessionCookies[registrableDomain]

    for {
        select {
        case <-ctx.Done():
            return cookievault.Bundle{}, fmt.Errorf("login timed out — the browser window has been closed automatically")
        case <-ticker.C:
            info, err := page.Info()
            if err != nil {
                return cookievault.Bundle{}, fmt.Errorf("login aborted — browser window was closed")
            }
            cookies, err := page.Cookies(nil)
            if err != nil {
                continue // transient CDP error; keep polling
            }
            if detectLoginComplete(info.URL, cookies, registrableDomain, sessionNames) {
                return buildBundle(cookies, registrableDomain, tenantID, userID), nil
            }
        }
    }
}

// detectLoginComplete applies the two-signal heuristic.
func detectLoginComplete(currentURL string, cookies []*proto.NetworkCookie, rd string, sessionNames []string) bool {
    // Signal 1: known session cookie appeared.
    for _, name := range sessionNames {
        for _, c := range cookies {
            if c.Name == name && c.Value != "" {
                return true
            }
        }
    }
    // Signal 2: URL moved away from all login-path patterns AND domain has cookies.
    for _, pattern := range loginPageSubstrings {
        if strings.Contains(strings.ToLower(currentURL), pattern) {
            return false
        }
    }
    // URL is not a login page — check for domain-matching cookies.
    for _, c := range cookies {
        if cookievault.RegistrableDomain(strings.TrimPrefix(c.Domain, ".")) == rd {
            return true
        }
    }
    return false
}

// buildBundle constructs a Bundle from the page's cookie state, filtered to
// cookies whose Domain eTLD+1 matches registrableDomain.
func buildBundle(cookies []*proto.NetworkCookie, rd, tenantID, userID string) cookievault.Bundle {
    var stored []cookievault.StoredCookie
    var earliest time.Time
    for _, c := range cookies {
        if cookievault.RegistrableDomain(strings.TrimPrefix(c.Domain, ".")) != rd {
            continue
        }
        sc := cookievault.StoredCookie{
            Name:     c.Name,
            Value:    c.Value,
            Domain:   c.Domain,
            Path:     c.Path,
            Secure:   c.Secure,
            HTTPOnly: c.HTTPOnly,
            SameSite: string(c.SameSite),
        }
        if exp := float64(c.Expires); exp > 0 {
            t := time.Unix(int64(exp), 0)
            sc.Expires = t
            if earliest.IsZero() || t.Before(earliest) {
                earliest = t
            }
        }
        stored = append(stored, sc)
    }
    return cookievault.Bundle{
        RegistrableDomain: rd,
        CapturedAt:        time.Now().UTC(),
        EarliestExpiry:    earliest,
        TenantID:          tenantID,
        UserID:            userID,
        SchemaVersion:     1,
        Cookies:           stored,
    }
}
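One property of detectLoginComplete worth testing is that it matches login patterns as substrings of the whole URL, so "/auth" also matches a path such as /authors/jane and "/login" matches a query value such as ?next=/login. In those cases signal 2 would not fire and capture would rely on a known session cookie or the timeout. The sketch below compares the substring rule with a path-segment rule. It is one possible refinement, not the spec's method: `loginSegments`, `onLoginPage`, and `substringMatch` are hypothetical names, and the pattern lists are shortened (multi-segment patterns such as /session/new are not covered).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Shortened pattern lists for illustration only.
var loginSubstrings = []string{"/login", "/signin", "/sign-in", "/auth", "/checkpoint", "/oauth", "/sso"}

var loginSegments = map[string]bool{
	"login": true, "signin": true, "sign-in": true, "auth": true,
	"checkpoint": true, "oauth": true, "sso": true,
}

// substringMatch reproduces the substring rule used by detectLoginComplete.
func substringMatch(rawURL string) bool {
	lower := strings.ToLower(rawURL)
	for _, p := range loginSubstrings {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// onLoginPage (hypothetical) matches whole path segments only and ignores the
// query string. Unparseable URLs count as "still on login" so that capture
// does not fire early.
func onLoginPage(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return true
	}
	for _, seg := range strings.Split(strings.ToLower(u.Path), "/") {
		if loginSegments[seg] {
			return true
		}
	}
	return false
}

func main() {
	for _, u := range []string{
		"https://example.com/auth/callback",
		"https://example.com/authors/jane",
		"https://example.com/feed?next=/login",
		"https://example.com/feed",
	} {
		fmt.Printf("%-40s substring=%-5v segment=%v\n", u, substringMatch(u), onLoginPage(u))
	}
}
```

The two rules agree on the first and last URLs and differ on the middle two, where the substring rule still reports a login page.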

7. New file: internal/tools/site_session.go

Both tools in one file (they share consent/identity/vault patterns).

site_login (write tool — writeAnnotations(false))

Input struct:

type siteLoginInput struct {
    URL    string `json:"url"    jsonschema:"Login page URL (e.g. https://www.linkedin.com/login). A browser window opens; log in normally.,required"`
    Domain string `json:"domain,omitempty" jsonschema:"Override the registrable domain (e.g. 'linkedin.com'). Inferred from URL if omitted; override when the login URL is on a different subdomain than the content."`
}

Registration guard: only register when deps.CookieVault is non-Noop (mirror the memory_save pattern — check via type assertion _, isNoop := deps.CookieVault.(*cookievault.Noop); !isNoop).

Handler logic (complete):

  1. !deps.Features.StdioMode → return {status:"http_mode_unsupported", reason:"site_login requires a local display..."}
  2. !deps.Consent.HasConsent(ctx, consent.PurposeSessionCapture) → return {status:"no_consent", reason:"..."}
  3. userID == "" || userID == "anonymous" → return {status:"unavailable", reason:"requires authenticated user"}
  4. Validate URL via scraper.ValidateScrapeURL
  5. Compute registrableDomain from input.Domain (if set) or url.Parse(input.URL).Hostname() via cookievault.RegistrableDomain; return toolError on ""
  6. context.WithTimeout(ctx, time.Duration(deps.Config.CookieVault.CaptureTimeoutSeconds)*time.Second)
  7. bundle, err := scraper.CaptureLoginSession(captureCtx, rawURL, registrableDomain, tenantID, userID, deps.Config.ChromePath)
  8. On error: return structuredError(err.Error(), ToolError{Kind: ErrKindAuth, Retryable: false, SuggestedAction: ActionInformUser})
  9. deps.CookieVault.Save(ctx, bundle, time.Duration(deps.Config.CookieVault.MaxTTLHours)*time.Hour)
  10. Audit (domain + cookieCount — never cookie values)
  11. Return {status:"captured", domain, cookieCount, capturedAt, expiresAt (if non-zero), trust:"user-asserted-content"}
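The ordering of the early-return gates in steps 1 to 5 can be expressed as a small pure function, which makes the precedence easy to unit-test without a browser. This is a sketch under stated assumptions: `loginRequest` and `gate` are hypothetical names, the inputs are assumed to be pre-resolved booleans and strings rather than calls into deps, and "tool_error" is a placeholder for the toolError that step 5 returns.

```go
package main

import "fmt"

// loginRequest (hypothetical) carries the already-resolved inputs to the gates.
type loginRequest struct {
	StdioMode  bool
	HasConsent bool
	UserID     string
	Domain     string // registrable domain from step 5; "" when it could not be derived
}

// gate returns the status that short-circuits the handler, or "" when the
// capture step may proceed. Cases are evaluated in the order of the step list.
func gate(r loginRequest) string {
	switch {
	case !r.StdioMode:
		return "http_mode_unsupported"
	case !r.HasConsent:
		return "no_consent"
	case r.UserID == "" || r.UserID == "anonymous":
		return "unavailable"
	case r.Domain == "":
		return "tool_error" // placeholder for the toolError in step 5
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", gate(loginRequest{}))
	fmt.Printf("%q\n", gate(loginRequest{StdioMode: true, HasConsent: true, UserID: "anonymous"}))
	fmt.Printf("%q\n", gate(loginRequest{StdioMode: true, HasConsent: true, UserID: "u1", Domain: "linkedin.com"}))
}
```

Because the display check comes first, an HTTP-mode caller sees http_mode_unsupported even when consent is also missing.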

Tool description (verbatim — required for TestToolDescriptionQuality):

"Open a browser window for you to log into a website normally (username, password, 2FA, CAPTCHA all handled by you). After login, the session cookies are captured and encrypted so future scrape_page calls to that site work without re-logging-in. Credentials are never seen by the server — only the post-login cookie state is stored. IMPORTANT: Automating a logged-in session may violate the website's Terms of Service. Use only with your own accounts on sites you are authorized to access programmatically. STDIO-only: requires a local display. Use site_logout to revoke the stored session."

site_logout (write tool — writeAnnotations(true) — idempotent)

Input struct:

type siteLogoutInput struct {
    Domain string `json:"domain" jsonschema:"Registrable domain to revoke the stored session for (e.g. 'linkedin.com').,required"`
}

Handler logic:

  1. Consent + identity gates (same as site_login)
  2. Validate/normalize domain via cookievault.RegistrableDomain
  3. _, exists := deps.CookieVault.Get(ctx, tenantID, userID, rd) then deps.CookieVault.Delete(...)
  4. Return {status:"ok"|"not_found", domain:rd} — never an error (Delete is always safe)
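
The idempotent contract in steps 3 and 4 can be sketched with an in-memory stand-in for the vault. memVault, vaultKey, save, and logout are hypothetical names used only for illustration; the real store is cookievault.Store. Unlike the separate Get and Delete calls described above, this sketch performs the existence check and the delete under one lock, which is one way to keep the reported status accurate under concurrent calls.

```go
package main

import "sync"

// memVault is a hypothetical in-memory stand-in for the cookie vault,
// keyed by tenant, user, and registrable domain.
type memVault struct {
	mu      sync.Mutex
	bundles map[string]int // value is a cookie count, enough for this sketch
}

// vaultKey joins the three scoping fields with a separator that cannot
// appear in a domain name.
func vaultKey(tenantID, userID, domain string) string {
	return tenantID + "\x00" + userID + "\x00" + domain
}

// save stores a bundle placeholder for the given scope.
func (v *memVault) save(tenantID, userID, domain string, cookieCount int) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.bundles == nil {
		v.bundles = make(map[string]int)
	}
	v.bundles[vaultKey(tenantID, userID, domain)] = cookieCount
}

// logout reports "ok" when a bundle existed and was deleted, and
// "not_found" otherwise. It never returns an error.
func (v *memVault) logout(tenantID, userID, domain string) string {
	v.mu.Lock()
	defer v.mu.Unlock()
	k := vaultKey(tenantID, userID, domain)
	if _, exists := v.bundles[k]; !exists {
		return "not_found"
	}
	delete(v.bundles, k)
	return "ok"
}
```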

Output schemas (add to internal/tools/schemas.go):

var siteLoginOutputSchema = map[string]any{
    "type": "object",
    "properties": map[string]any{
        "status":      map[string]any{"type": "string", "enum": []any{"captured", "timeout", "aborted", "no_consent", "unavailable", "http_mode_unsupported"}},
        "domain":      map[string]any{"type": "string"},
        "cookieCount": map[string]any{"type": "integer"},
        "capturedAt":  map[string]any{"type": "string", "format": "date-time"},
        "expiresAt":   map[string]any{"type": "string", "format": "date-time"},
        "reason":      map[string]any{"type": "string"},
        "trust":       trustUserAsserted,
    },
}

var siteLogoutOutputSchema = map[string]any{
    "type": "object",
    "properties": map[string]any{
        "status": map[string]any{"type": "string", "enum": []any{"ok", "not_found", "no_consent", "unavailable"}},
        "domain": map[string]any{"type": "string"},
    },
}

8. internal/tools/registry.go: Dependencies + Features + RegisterAll

Add to Dependencies:

// CookieVault holds encrypted session cookies for site_login / site_logout.
// Is cookievault.Noop when COOKIE_VAULT_ENABLED=false (default).
CookieVault cookievault.Store

Add to Features:

// StdioMode is true when running without an HTTP port (no PORT env var).
// Required to gate site_login, which opens a headed browser needing a local display.
StdioMode bool

In RegisterAll, add (mirror the memory/analytics conditional pattern):

if _, isNoop := deps.CookieVault.(*cookievault.Noop); !isNoop {
    registerSiteLogin(srv, deps)
    registerSiteLogout(srv, deps)
}
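
The type-assertion guard can be exercised in isolation. The sketch below uses pared-down stand-ins (Store, Noop, memStore, shouldRegister are illustrative names, not the real cookievault types) to show what the check does and does not catch.

```go
package main

// Store is a pared-down stand-in for cookievault.Store.
type Store interface {
	Delete(domain string)
}

// Noop stands in for cookievault.Noop: every method does nothing.
type Noop struct{}

func (*Noop) Delete(string) {}

// memStore stands in for a real, enabled vault.
type memStore struct {
	bundles map[string]struct{}
}

func (s *memStore) Delete(domain string) { delete(s.bundles, domain) }

// shouldRegister applies the same guard as RegisterAll: the tools are
// registered unless the store is the *Noop implementation.
func shouldRegister(s Store) bool {
	_, isNoop := s.(*Noop)
	return !isNoop
}
```

One consequence worth noting: a nil Store is not a *Noop, so shouldRegister(nil) returns true. If that pattern is kept, the wiring code has to guarantee the field is always set (as the `cookievault.NewNoop()` default in main.go does).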

9. cmd/web-researcher-mcp/main.go — wiring

After the memory/analytics/workspace store construction block:

// Cookie vault (site_login / site_logout).
var vaultStore cookievault.Store = cookievault.NewNoop()
if cfg.CookieVault.Enabled {
    vaultPersist, err := persist.NewDiskStore(
        filepath.Join(cacheDir, "cookievault"),
        cfg.CookieVault.EncryptionKey,
        cfg.CookieVault.EncryptionKeyPrev,
    )
    if err != nil {
        return fmt.Errorf("cookie vault persist store: %w", err)
    }
    maxTTL := time.Duration(cfg.CookieVault.MaxTTLHours) * time.Hour
    vaultStore = cookievault.New(vaultPersist, maxTTL)
    dataSubjectRegistry.Register(
        "cookievault",
        datasubject.ExporterFunc(func(ctx context.Context, s datasubject.Subject) (any, error) {
            return vaultStore.ExportUser(ctx, s.TenantID, s.UserID)
        }),
        datasubject.EraserFunc(func(ctx context.Context, s datasubject.Subject) (int, error) {
            return vaultStore.EraseUser(ctx, s.TenantID, s.UserID)
        }),
    )
}

Wire into Dependencies:

deps := tools.Dependencies{
    // ... existing fields ...
    CookieVault: vaultStore,
    Features: tools.Features{
        // ... existing fields ...
        StdioMode: cfg.Port == 0,
    },
}

Wire into scraper.NewPipeline:

scraper.NewPipeline(scraper.PipelineConfig{
    // ... existing fields ...
    CookieVault: vaultStore, // cookievault.Store implements cookievault.Reader
})

10. internal/tools/metadata_test.go + tools_test.go

metadata_test.go — add to expectedTools:

"site_login",
"site_logout",

Add to writeTools in TestAllToolsHaveAnnotations:

"site_login": true,
"site_logout": true,

tools_test.go — in setupTestDeps(), add:

memPersist := persist.NewMemoryStore()
deps.CookieVault = cookievault.New(memPersist, 24*time.Hour)
deps.Features.StdioMode = true

The conditional registration check (_, isNoop := deps.CookieVault.(*cookievault.Noop); !isNoop) evaluates to true for the non-Noop MemoryStore-backed vault, so site_login and site_logout appear in listTools.


11. docs/TOOLS.md — tool sections

Add two sections in the correct numeric sequence (tools 27 and 28 based on current expectedTools count of 26). Format must exactly match the ## Tool N: `name` pattern required by TestToolsDocMatchesRegistry.

The site_login section must include a Terms and authorization subsection:

Major platforms (LinkedIn, Meta, Twitter/X, Instagram) explicitly prohibit automated access to logged-in content in their Terms of Service. Using site_login with these platforms may result in account suspension. This tool is intended for use with your own accounts on sites where you have authorization for programmatic access (e.g., internal systems, developer environments, sites whose ToS permit it). The MCP server and its authors accept no liability for ToS violations or resulting account actions.


Security model (authoritative summary)

Domain isolation: the primary gate

eTLD+1(targetURL.Hostname()) is computed deterministically in Go via golang.org/x/net/publicsuffix.EffectiveTLDPlusOne. This value is the vault key. No LLM-settable parameter, tool argument, or scraped page content can influence which vault entry is retrieved for a given scrape_page call. A prompt-injection attack instructing the model to "use LinkedIn cookies for attacker.com" fails because the gate is server-side and unconditional.

A second per-cookie check inside injectVaultCookies / injectCookiesIntoBrowserPage verifies eTLD+1(cookieDomain) == eTLD+1(targetURL.Hostname()) as defense-in-depth against deserialization bugs or future storage tampering.

Cross-domain redirect handling: for HTTP tiers, cookiejar with publicsuffix.List enforces RFC 6265 per-hop: a 302 from linkedin.com → evil.com drops the LinkedIn cookie on the second hop automatically.
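
The per-hop behaviour can be observed with the standard library alone. The sketch below is an illustration, not the project's NewSSRFSafeClientWithCookies: it builds the jar with a nil PublicSuffixList to stay stdlib-only (the standard library documents that as insecure outside tests, and the design above passes publicsuffix.List instead). cookiesSentTo and the cookie name "session" are hypothetical.

```go
package main

import (
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookiesSentTo seeds a fresh jar with one Secure cookie set by originURL
// for the domain linkedin.com, then reports which cookie names the jar
// would attach to a request for targetURL.
func cookiesSentTo(originURL, targetURL string) ([]string, error) {
	// nil options: fine for a demo, but production code should pass
	// &cookiejar.Options{PublicSuffixList: publicsuffix.List}.
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	origin, err := url.Parse(originURL)
	if err != nil {
		return nil, err
	}
	target, err := url.Parse(targetURL)
	if err != nil {
		return nil, err
	}
	jar.SetCookies(origin, []*http.Cookie{{
		Name:   "session",
		Value:  "secret",
		Domain: "linkedin.com",
		Path:   "/",
		Secure: true,
	}})
	var names []string
	for _, c := range jar.Cookies(target) {
		names = append(names, c.Name)
	}
	return names, nil
}
```

With origin https://www.linkedin.com/, the jar returns the cookie for https://www.linkedin.com/feed, and returns nothing for https://evil.com/ or for the plain-HTTP URL http://www.linkedin.com/feed, because the cookie is marked Secure.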

Credential vs. cookie

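The user's password is typed into the real login form inside the headed Chrome window. The server reads the post-auth cookie state over CDP, HttpOnly cookies included. In go-rod this is page.Cookies(nil), which issues Network.getCookies for the current page's URLs rather than Network.getAllCookies, so confirm against the pinned go-rod version that cookies scoped to other subdomains of the registrable domain are captured. The server never intercepts, stores, or transmits the password.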

Encryption at rest

Reuses persist.DiskStore (AES-256-GCM, per-blob AAD, atomic rename, 0600 files, 0700 dir, SHA-256-hashed filenames, lazy two-key rotation). Dedicated COOKIE_VAULT_ENCRYPTION_KEY MUST differ from CACHE_ENCRYPTION_KEY — this limits blast radius from cache key exposure and ensures the high-value vault has an independent secret.

Session TTL

min(COOKIE_VAULT_MAX_TTL_HOURS, earliest server-set cookie expiry). Default cap: 24h. Hard max: 7 days. The persist.DiskStore 8-byte expiry prefix enforces TTL at read time without a background cleaner — an expired bundle returns (Bundle{}, false) on Get.

Headed browser isolation

The auth-capture browser is launched fresh per site_login call via launcher.New() with an os.MkdirTemp UserDataDir. It NEVER touches getBrowserPool() or poolOnce. The temp dir is removed via defer os.RemoveAll on all exit paths. The scraping pool (headless, shared, poolOnce singleton) is completely unaffected.

Audit log safety

Cookie values MUST NOT appear in any audit event, log line, or exported data. Audit events carry: tool_name, tenant_id, user_id, timestamp, success, metadata: {registrableDomain, cookieCount, expiresAt}. Data-subject export carries: domain, capturedAt, expiresAt, cookieCount — never value.

SSRF defense-in-depth

Cookie injection does not weaken SSRF protection:

  • HTTP tiers: NewSSRFSafeClientWithCookies uses the same newSSRFSafeTransport — private IP blocking applies regardless of auth state.
  • Browser tier: CDP page.SetCookies is domain-scoped; the SSRF-safe client and the eTLD+1 gate both apply before injection.

A prompt-injected SSRF attempt (e.g., scrape http://169.254.169.254/ with LinkedIn cookies) fails at the SSRF transport layer — cookies are irrelevant to the IP-level block.


ToS disclaimer (required in tool description and docs)

The site_login tool description MUST include:

"IMPORTANT: Automating a logged-in session may violate the website's Terms of Service. Use only with your own accounts on sites you are authorized to access programmatically."

LinkedIn §8.2, Twitter/X ToS (Sept 2023), Meta §3.2.3, Instagram §4.2 all explicitly prohibit automated access including with valid credentials. Use of this feature with those platforms is a ToS violation the user knowingly accepts. The tool's docs section must state this clearly, mirroring the project's "plain language tone" standard (no jargon — a grad student must understand the risk).


Test matrix

Unit tests: internal/cookievault/cookievault_test.go

| Test | Verifies |
| --- | --- |
| TestRegistrableDomain_Known | linkedin.com, www.linkedin.com, sub.linkedin.com → "linkedin.com" |
| TestRegistrableDomain_Invalid | IP, localhost, "not-a-domain!!!" → "" |
| TestNoop_AllOps | All methods no-op and never panic |
| TestStoreImpl_SaveGet_RoundTrip | Save then Get returns same bundle |
| TestStoreImpl_Overwrite | Second Save same domain overwrites; ListDomains still one entry |
| TestStoreImpl_TTLExpiry | Bundle expires after TTL (inject clock via MemoryStore mocked TTL) |
| TestStoreImpl_EarliestExpiry_Clamps_TTL | Cookie expiry in 1h → TTL ≤ 1h |
| TestStoreImpl_AllSessionCookies_UsesMaxTTL | All-session bundle (Expires zero) → TTL = maxTTL |
| TestStoreImpl_Delete | Delete removes bundle and index entry |
| TestStoreImpl_ListDomains_LazyPrune | Expired bundles removed from index on next ListDomains call |
| TestStoreImpl_ExportUser_NoCookieValues | ExportUser result contains domain/capturedAt/count; Value field absent from JSON |
| TestStoreImpl_EraseUser | All bundles + index removed; ListDomains → nil |
| TestDomainIsolation_TenantBoundary | TenantA cookies NOT returned for TenantB same user |
| TestDomainIsolation_UserBoundary | UserA cookies NOT returned for UserB same tenant |
| TestEffectiveTTL_Cases | All four branches: zero expiry, past expiry, expiry < maxTTL, expiry > maxTTL |

Unit tests: internal/scraper/capture_test.go

| Test | Verifies |
| --- | --- |
| TestDetectLoginComplete_KnownSessionCookie | li_at present → true for linkedin.com |
| TestDetectLoginComplete_URLChangedFromLogin | URL /login → /feed, no known cookie → true |
| TestDetectLoginComplete_StillOnLogin | URL still /login → false |
| TestDetectLoginComplete_NoCookies | URL changed but no domain cookies → false |
| TestBuildBundle_DomainFilter | google.com cookie excluded from linkedin.com bundle |
| TestBuildBundle_EarliestExpiry | Multi-cookie; soonest expiry wins |
| TestBuildBundle_AllSessionCookies | EarliestExpiry zero when all Expires zero |
| TestCaptureLoginSession_HookOverride | captureHook fires; no browser launched |
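
The four TestDetectLoginComplete cases imply a decision rule that can be sketched as follows. This is a reconstruction from the test descriptions only, not the actual helper in capture.go: loginComplete, its signature, and the "/login" path heuristic are assumptions, and only the li_at entry in the map comes from this issue.

```go
package main

import "strings"

// knownSessionCookies maps a registrable domain to a cookie name that
// signals a logged-in session. Only the li_at entry comes from the issue.
var knownSessionCookies = map[string]string{
	"linkedin.com": "li_at",
}

// loginComplete decides whether the user has finished logging in.
// cookieNames holds the names of cookies scoped to the target domain.
//  1. A known session cookie for the domain is conclusive.
//  2. Otherwise, the page must have left the login path AND the domain
//     must have set at least one cookie.
func loginComplete(domain, currentPath string, cookieNames []string) bool {
	if want, ok := knownSessionCookies[domain]; ok {
		for _, name := range cookieNames {
			if name == want {
				return true
			}
		}
	}
	onLogin := strings.Contains(strings.ToLower(currentPath), "/login")
	return !onLogin && len(cookieNames) > 0
}
```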

Unit tests: internal/scraper/pipeline_test.go (additions)

| Test | Verifies |
| --- | --- |
| TestVaultCookiesForURL_NilVault | Returns nil without panic |
| TestVaultCookiesForURL_NoMatch | Vault has cookies for other domain → nil |
| TestVaultCookiesForURL_Match | Returns bundle cookies for matching domain |
| TestInjectVaultCookies_NilVault | No-op; page.SetCookies never called |
| TestInjectVaultCookies_DomainMismatch | Cross-domain cookie excluded |
| TestInjectVaultCookies_SecureDrop | Secure cookie not injected into http:// target |
| TestNewSSRFSafeClientWithCookies_ScopedToTargetDomain | Jar only sends cookies to target domain; other-domain httptest server receives none |
| TestNewSSRFSafeClientWithCookies_SecureCookieDroppedOnHTTP | Secure cookie absent on http:// request |
| TestNewSSRFSafeClientWithCookies_NilCookies | Returns valid client without panic |

Unit tests: internal/tools/tools_test.go (additions)

| Test | Verifies |
| --- | --- |
| TestSiteLogin_NotRegisteredWhenNoop | ListTools excludes site_login when CookieVault is Noop |
| TestSiteLogin_HTTPMode | StdioMode=false → {status:"http_mode_unsupported"} |
| TestSiteLogin_NoConsent | Missing consent → {status:"no_consent"} |
| TestSiteLogin_EmptyURL | toolError("url is required") |
| TestSiteLogin_InvalidDomain | domain="bad!!!" → toolError |
| TestSiteLogin_CaptureSuccess | captureHook returns valid bundle; vault.Get confirms storage; output status="captured", cookieCount matches |
| TestSiteLogin_CaptureTimeout | captureHook returns error; structuredError with ErrKindAuth returned |
| TestSiteLogin_AuditNoCookieValues | Mock auditor asserts no Value field in audit metadata |
| TestSiteLogout_OK | site_login then site_logout; subsequent vault.Get returns (_, false) |
| TestSiteLogout_NotFound | Logout on domain with no stored session → {status:"not_found"} (not an error) |
| TestSiteLogout_Idempotent | Two logout calls → "ok" then "not_found"; never an error |
| TestSiteLogout_NoConsent | Missing consent → {status:"no_consent"} |

Integration tests: injection round-trip

In internal/scraper/pipeline_test.go, add an HTTPS httptest.Server that:

  • Returns 200 OK with Content-Type: text/html and a substantial article body when the auth cookie is present
  • Returns 401 Unauthorized when the auth cookie is absent

| Test | Verifies |
| --- | --- |
| TestHTTPTierCookieInjection_AuthSuccess | Vault pre-seeded; scrape returns full content |
| TestHTTPTierCookieInjection_NoCookies | Vault empty; scrape returns 401/blocked |
| TestHTTPTierCookieInjection_WrongDomain | Vault has cookies for different eTLD+1; not injected; 401 |
| TestHTTPTierCookieInjection_ExpiredBundle | Vault TTL elapsed; Get returns false; 401 |
| TestHTTPTierCookieInjection_TenantIsolation | TenantA cookies not injected for TenantB context |

E2E tests (tests/e2e/, //go:build live)

// TestE2E_LinkedInWithPreloadedSession:
//   Prereq: LINKEDIN_TEST_BUNDLE_JSON env var — pre-captured bundle JSON for a test account.
//   1. Unmarshal bundle and Save directly into vault (bypass headed browser in CI).
//   2. Call scrape_page on a known public LinkedIn post URL.
//   3. Assert extractionQuality:"complete" and content length > 200 chars.
//
// TestE2E_SiteLogout_RestoresUnauthState:
//   Follows TestE2E_LinkedInWithPreloadedSession.
//   1. Call site_logout for "linkedin.com".
//   2. Call scrape_page on the same URL.
//   3. Assert extractionQuality:"partial" (unauthenticated = back to scraping without auth).

Build order

Each step is independently compilable and testable. Run go test -race ./... after each step.

  1. internal/cookievault/cookievault.go + cookievault_test.go: no upstream changes; all unit tests pass
  2. internal/consent/consent.go: PurposeSessionCapture + AllPurposes; no test changes
  3. internal/config/config.go + .env.example: CookieVaultConfig struct + Load() + validation
  4. internal/scraper/ssrf.go: NewSSRFSafeClientWithCookies; new unit tests for jar scoping
  5. internal/scraper/pipeline.go: CookieVault cookievault.Reader in PipelineConfig; vaultCookiesForURL helper; nil-safe injection in HTTP tiers; injectVaultCookies helper; call site in scrapeBrowser. Existing tests pass (nil vault = zero behavior change)
  6. internal/scraper/capture.go + capture_test.go: CaptureLoginSession + helpers; captureHook test seam
  7. internal/tools/schemas.go: siteLoginOutputSchema, siteLogoutOutputSchema
  8. internal/tools/site_session.go: registerSiteLogin, registerSiteLogout
  9. internal/tools/registry.go: CookieVault field, StdioMode field, conditional RegisterAll block
  10. internal/tools/metadata_test.go + tools_test.go: expectedTools, writeTools, setupTestDeps
  11. cmd/web-researcher-mcp/main.go: vault construction, datasubject registration, pipeline wiring
  12. docs/TOOLS.md: ## Tool 27: `site_login` + ## Tool 28: `site_logout` sections
  13. go test -race ./... + make verify + make rebuild-local + IRL smoke test

Acceptance criteria

  • go test -race ./... passes with zero failures
  • TestAllToolsRegistered: site_login + site_logout appear in listTools when CookieVault is non-Noop
  • TestAllToolsHaveAnnotations: both tools carry writeAnnotations; read tools unchanged
  • TestToolsDocMatchesRegistry: docs/TOOLS.md contains matching ## Tool N: sections
  • TestOutputSchemaMatchesResponse: output schemas match actual JSON for both tools
  • TestExternalContentToolsCarryTrustMarker: site_login output carries "user-asserted-content"
  • Cookie values do NOT appear in any audit event, log line, or ExportUser result (asserted in TestSiteLogin_AuditNoCookieValues)
  • site_login in HTTP mode returns {status:"http_mode_unsupported"} without launching a browser
  • site_logout is idempotent: {status:"ok"} then {status:"not_found"}; never IsError:true
  • eTLD+1 gate: cookies for linkedin.com NOT injected into google.com request (pipeline integration test)
  • make rebuild-local: site_login/site_logout absent from tool list when COOKIE_VAULT_ENABLED is unset
  • Data-subject erasure: after EraseUser, ListDomains → nil and Get → (_, false) for any stored domain
  • No new direct dependencies other than golang.org/x/net/publicsuffix (verify: go list -m golang.org/x/net — almost certainly already present transitively via go-rod)
  • Headed browser profile dir removed after every site_login exit path (success, timeout, abort, error)

Implementation notes

  1. golang.org/x/net/publicsuffix: run go list -m golang.org/x/net before go get — it is almost certainly a transitive dependency of go-rod and therefore already in go.sum.

  2. proto.NetworkCookieParam.Expires type: check the pinned go-rod version's exact type (*proto.TimeSinceEpoch vs. proto.TimeSinceEpoch). The field is omitempty so omitting it for session cookies is safe.

  3. Browser incognito context for injection: stealth.Page(browser) creates a per-page context but shares the browser profile. For auth-injected scrapes, browser.MustIncognito().MustPage(url) creates a fully isolated context and prevents cookie state from leaking between concurrent authenticated scrapes. Evaluate using this for the browser tier when len(vaultCookies) > 0.

  4. captureHook test seam: the package-level var MUST be nil in production. Reset it in each test via t.Cleanup(func() { captureHook = nil }) to avoid test-order contamination.

  5. LinkedIn lidc 24h TTL: the lidc cookie (data-center routing) expires in 24 hours. After 24h a previously-captured LinkedIn session may start returning session-not-found errors on some requests. The vault's TTL enforcement (clamped to earliestExpiry) handles this correctly — the bundle expires when lidc expires, prompting the user to re-run site_login. This is by design and consistent with the honest-about-limitations principle.

  6. Consent auto-grant in STDIO mode: in STDIO mode with STDIO_USER_ID set, the existing memory/analytics consent is auto-granted on first run. PurposeSessionCapture should NOT be auto-granted — it requires explicit user acknowledgement of the ToS risk before first use. The consent grant for session_capture must be user-initiated (e.g., a separate MCP tool call to the consent endpoint, or via CONSENT_GRANT env for operator-controlled deployment).
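
The captureHook seam from note 4 can be sketched as below. The names bundle, captureLoginSession, launchHeadedBrowser, and setCaptureHook are simplified stand-ins for illustration; the real CaptureLoginSession takes more parameters. The setCaptureHook helper is an assumption, offered as one way to make the reset hard to forget.

```go
package main

import "errors"

// bundle is a simplified stand-in for the captured cookie bundle.
type bundle struct {
	Domain      string
	CookieCount int
}

// captureHook MUST be nil in production. Tests set it to bypass the
// headed browser entirely.
var captureHook func(domain string) (bundle, error)

// captureLoginSession uses the hook when one is installed and otherwise
// falls through to the real browser flow.
func captureLoginSession(domain string) (bundle, error) {
	if captureHook != nil {
		return captureHook(domain)
	}
	return launchHeadedBrowser(domain)
}

// launchHeadedBrowser is a placeholder for the real headed-browser flow.
func launchHeadedBrowser(domain string) (bundle, error) {
	return bundle{}, errors.New("headed browser not available in this sketch")
}

// setCaptureHook installs h and returns a function that restores the
// previous hook, suitable for passing to t.Cleanup.
func setCaptureHook(h func(string) (bundle, error)) (restore func()) {
	prev := captureHook
	captureHook = h
	return func() { captureHook = prev }
}
```

In a test this would read `t.Cleanup(setCaptureHook(fakeCapture))`, where fakeCapture is the test's own stub, so the hook is restored even when the test fails partway through.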
