RFD 3207

Search relevance weighting v2

Discussion by @lowlydba · Created March 15, 2026 · Updated March 15, 2026

Search quality is acceptable for exact RFD-number lookups but weak for concept-level queries whose terms appear in long comment threads. The proposed v2 weighting scores matches in the title and summary highest, matches in the proposal body next, and matches in comment bodies lowest, using a reduced multiplier. We should evaluate v2 against a fixed benchmark of 40 representative queries gathered from team usage and compare its top-5 precision (the fraction of the first five results that are relevant) with that of the current tuning. Open questions include how aggressively to de-prioritize stale threads and whether archived discussions should receive a recency penalty rather than being filtered out entirely.
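
A minimal Python sketch of the proposed field ordering and the benchmark metric is below. The field names, the multiplier values (3.0 / 1.5 / 0.5), the naive term counting, and the half-life decay applied to archived threads are all illustrative assumptions; the proposal fixes only the ordering of fields and the use of top-5 precision. The decay shows one way an archived discussion could be penalized rather than filtered.

```python
from dataclasses import dataclass, field

# Hypothetical multipliers: the proposal specifies only the ordering
# title/summary > proposal body > comments, not these values.
FIELD_WEIGHTS = {"title": 3.0, "summary": 3.0, "body": 1.5, "comments": 0.5}


@dataclass
class Discussion:
    doc_id: str
    fields: dict = field(default_factory=dict)  # field name -> text (hypothetical schema)
    age_days: float = 0.0
    archived: bool = False


def term_hits(text: str, terms: list) -> int:
    """Count query-term occurrences using naive lowercase whitespace tokens."""
    tokens = text.lower().split()
    return sum(tokens.count(t.lower()) for t in terms)


def score(doc: Discussion, terms: list, half_life_days: float = 365.0) -> float:
    """Weighted sum of per-field hits, with a decay penalty for archived threads."""
    total = sum(
        weight * term_hits(doc.fields.get(name, ""), terms)
        for name, weight in FIELD_WEIGHTS.items()
    )
    if doc.archived:
        # Penalize rather than filter: the score halves every half_life_days.
        total *= 0.5 ** (doc.age_days / half_life_days)
    return total


def rank(docs: list, terms: list) -> list:
    """Return doc ids ordered by descending score."""
    return [d.doc_id for d in sorted(docs, key=lambda d: score(d, terms), reverse=True)]


def precision_at_k(ranked_ids: list, relevant_ids: set, k: int = 5) -> float:
    """Fraction of the top-k results judged relevant (always divides by k)."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


if __name__ == "__main__":
    docs = [
        Discussion("a", {"title": "search relevance"}),
        Discussion("b", {"comments": "relevance relevance"}),
        Discussion("c", {"title": "relevance"}, age_days=365, archived=True),
    ]
    ranking = rank(docs, ["relevance"])
    print(ranking, precision_at_k(ranking, {"a", "c"}))
```

Dividing by k even when fewer than k results come back keeps short result lists from looking artificially precise; that is a common convention rather than something this RFD specifies.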

Discussion Comments

1. @lowlydba · March 15, 2026

The benchmark plan looks strong, but we should lock the query set before tuning begins so that we do not unintentionally overfit the weights to it. I also recommend splitting results by query intent: known-item lookup, exploratory learning, and troubleshooting. The same scoring rules rarely perform best across all three. A segmented report would make the tradeoffs explicit before we settle on default weights.
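
The two suggestions in this comment, freezing the query set and reporting per intent, could be sketched in Python as follows. The benchmark tuple layout, the SHA-256 fingerprint used to detect edits to the frozen set, the function names, and the example doc ids are assumptions for illustration only.

```python
import hashlib
import json
from collections import defaultdict
from statistics import mean

# Hypothetical benchmark entry layout: (query, intent, set of relevant doc ids).
# Intent labels follow the comment: known-item, exploratory, troubleshooting.


def fingerprint(benchmark: list) -> str:
    """Order-independent hash of the query set; record it before tuning begins."""
    canonical = sorted([q, intent, sorted(rel)] for q, intent, rel in benchmark)
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()


def precision_at_k(ranked_ids: list, relevant_ids: set, k: int = 5) -> float:
    """Fraction of the top-k results judged relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def segmented_report(benchmark: list, results: dict, k: int = 5) -> dict:
    """Mean precision@k per intent; `results` maps query -> ranked doc ids for one config."""
    per_intent = defaultdict(list)
    for query, intent, relevant in benchmark:
        per_intent[intent].append(precision_at_k(results.get(query, []), relevant, k))
    return {intent: mean(scores) for intent, scores in per_intent.items()}


if __name__ == "__main__":
    benchmark = [
        ("q1", "known-item", {"d1"}),
        ("q2", "exploratory", {"d1", "d2"}),
        ("q3", "troubleshooting", {"d3"}),
    ]
    frozen = fingerprint(benchmark)  # stored with the benchmark before tuning
    results = {
        "q1": ["d1", "d4", "d5", "d6", "d7"],
        "q2": ["d2", "d4", "d1", "d5", "d6"],
        "q3": ["d4", "d5", "d6", "d7", "d8"],
    }
    assert fingerprint(benchmark) == frozen, "query set changed after tuning began"
    print(segmented_report(benchmark, results))
```

Running the report once for the current tuning and once for v2, then comparing the dictionaries side by side, would show where one intent improves while another regresses.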

2. @lowlydba · March 15, 2026

I support this, especially the lower weight for long comment tails, but we should preserve discoverability for high-signal comments that record implementation decisions. One option is a small boost for comments that include change markers, timelines, or accepted-action language. That would keep important context searchable without letting noisy threads dominate relevance for broad terms.
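
A rough Python sketch of the suggested boost follows, assuming each comment starts from a reduced comment multiplier. The regular expressions, the 0.5 base, the 0.25 per-signal boost, and the 1.0 cap are hypothetical placeholders; the comment names only the three signal categories. The cap is meant to keep even a heavily boosted comment from outweighing proposal text, assuming proposal fields use multipliers above it.

```python
import re

# Hypothetical patterns for the three signal categories named in the comment.
SIGNAL_PATTERNS = {
    "change_marker": re.compile(r"\b(changed|updated|superseded|revised)\b", re.IGNORECASE),
    "timeline": re.compile(
        r"\b(by|before|after)\s+(q[1-4]|\d{4}-\d{2}-\d{2}|"
        r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*)\b",
        re.IGNORECASE,
    ),
    "accepted_action": re.compile(
        r"\b(we (will|decided|agreed)|accepted|approved|action item)\b", re.IGNORECASE
    ),
}


def comment_multiplier(text: str, base: float = 0.5, boost_per_signal: float = 0.25,
                       cap: float = 1.0) -> float:
    """Start from the comment base weight, add a boost per signal category found, then cap."""
    signals = sum(1 for pattern in SIGNAL_PATTERNS.values() if pattern.search(text))
    return min(base + boost_per_signal * signals, cap)


if __name__ == "__main__":
    for comment in [
        "+1, looks good",
        "Approved.",
        "We decided to ship the new weights by Q3.",
    ]:
        print(f"{comment_multiplier(comment):.2f}  {comment}")
```

Generic words such as "updated" also appear in many low-signal comments, so any pattern list like this would likely need checking against the benchmark before adoption.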