Follow-up to CAP-5 (KB-grounded idea generation). Idea generation latency/cost is dominated by RE-SENDING the large KB context (~46k tokens) on every Claude call β including the dedup top-up. Observed ~110s for a 10-idea batch with a 184k-char KB.
Fix: prompt caching. Mark the KB + strategy as a cached prefix (1h TTL, extended-cache beta header) so:
* the top-up reuses the cached KB for ~free,
* repeat generations within the hour are fast + ~10x cheaper on input,
* this removes the conditional-top-up trade-off we shipped as an interim workaround.
Optionally: move idea generation into the agent β it already has the 1h prompt-caching infra, KB-injection patterns, and tool plumbing, so it is a natural home (it is a one-shot batch tool vs the conversational flow, so a moderate refactor).
Ship after CAP-5 is in production.
Please authenticate to join the conversation.
Planned
Feature Request
About 3 hours ago

Chris Koronowski
Get notified by email when there are changes.
Planned
Feature Request
About 3 hours ago

Chris Koronowski
Get notified by email when there are changes.