Idea generation: prompt caching + move to agent

Follow-up to CAP-5 (KB-grounded idea generation). Idea generation latency/cost is dominated by RE-SENDING the large KB context (~46k tokens) on every Claude call β€” including the dedup top-up. Observed ~110s for a 10-idea batch with a 184k-char KB.

Fix: prompt caching. Mark the KB + strategy as a cached prefix (1h TTL, extended-cache beta header) so:

* the top-up reuses the cached KB for ~free,
* repeat generations within the hour are fast + ~10x cheaper on input,
* this removes the conditional-top-up trade-off we shipped as an interim workaround.

Optionally: move idea generation into the agent β€” it already has the 1h prompt-caching infra, KB-injection patterns, and tool plumbing, so it is a natural home (it is a one-shot batch tool vs the conversational flow, so a moderate refactor).

Ship after CAP-5 is in production.

Please authenticate to join the conversation.

Upvoters
Status

Planned

Board
πŸ’‘

Feature Request

Date

About 3 hours ago

Author

Chris Koronowski

Subscribe to post

Get notified by email when there are changes.