Improve Knowledge Base: richer ingestion (links, all files, social) + wire into idea generation

## Goal

Turn the Knowledge Base into a rich context engine and wire it into idea generation so that the generated ideas are noticeably better.

Two parts:

1. Richer ingestion: let users add far more context to the knowledge base, including links/URLs, all file types, and social media content.
2. Knowledge-powered ideas: connect the knowledge base to the idea-generation pipeline so generated ideas are grounded in the user's actual knowledge and context rather than generic.

## Why

The knowledge base is currently limited in what it accepts and underused downstream. The more grounded context we have (and the more sources it spans), the more relevant and personalized the generated ideas become.

## Scope / Acceptance Criteria

- [ ] Links / URLs: add a web page or article by URL — scrape, clean, chunk, embed (reuse async transcription / embeddings + RAG infra; evaluate Firecrawl for scraping).
- [ ] All file types: broaden beyond current limits — PDFs, docs, slides, spreadsheets, text, images (OCR/caption), audio/video (transcript). Parse → embed.
- [ ] Social media: ingest social content (e.g. LinkedIn posts via existing extension/scraper, profiles, threads) into the knowledge base as context.
- [ ] Idea generation integration: retrieve relevant knowledge (RAG) when generating ideas so output is grounded in the user's knowledge base; show which sources informed an idea where feasible.
- [ ] Workspace-scoped isolation on all new sources (workspace_id, RLS) and ViewAs support.
- [ ] Enforce upload limits, raising them where the new source types need it; surface ingestion status (processing/ready/failed) per source.
- [ ] Localization for new UI strings (en/pl/fr/de).
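
To make the per-source status and the "parse → chunk → embed" step above more concrete, here is a minimal TypeScript sketch. Everything except `workspace_id` and the three status values is an assumption for illustration: the `KnowledgeSource` shape, `chunkText`, `ingest`, and the injected `loadText`/`embedChunks` callbacks are hypothetical and do not describe the existing `embeddings.ts` or `rag.ts` APIs. The chunk size and overlap defaults are placeholders, not tuned values.

```typescript
// Sketch only: every name except workspace_id and the three status values is hypothetical.
type IngestionStatus = "processing" | "ready" | "failed";
type SourceKind = "url" | "file" | "social";

interface KnowledgeSource {
  id: string;
  workspace_id: string; // scoping key; RLS itself would be enforced in the database
  kind: SourceKind;
  status: IngestionStatus;
  error: string | null; // set only when status is "failed"
}

// Split cleaned text into fixed-size character chunks that overlap their neighbours.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("need size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}

// Run one source through load -> chunk -> embed and return its final status.
// loadText and embedChunks are injected so that scraping, OCR, or transcription
// can each plug in behind the same interface.
async function ingest(
  source: KnowledgeSource,
  loadText: () => Promise<string>,
  embedChunks: (chunks: string[]) => Promise<void>,
): Promise<KnowledgeSource> {
  try {
    const text = await loadText();
    await embedChunks(chunkText(text));
    return { ...source, status: "ready", error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ...source, status: "failed", error: message };
  }
}
```

Keeping the loader and embedder as parameters means links, files, and social content could share one status lifecycle, with only the loader differing per source kind.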

## Notes / Open questions

* Reuse src/lib/services/embeddings.ts + rag.ts for chunking/embedding/retrieval.
* General-website scraping: decide whether to extend BrightData or add Firecrawl (same decision as in the onboarding revamp, CAP-51).
* Storage + cost: embeddings volume per workspace, which embedding model, dedup of re-added sources.
* Define the retrieval contract the idea generator calls (top-k, filters, freshness).
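
As a starting point for the retrieval-contract question above, the following TypeScript sketch shows one possible request/response shape covering top-k, filters, and freshness. It is an in-memory illustration only: `RetrievalRequest`, `StoredChunk`, `RetrievedChunk`, `cosine`, and `retrieve` are hypothetical names, and a real implementation would presumably push the filtering and similarity search into the vector store rather than scan an array.

```typescript
// Sketch only: all type and function names here are hypothetical.
type SourceKind = "url" | "file" | "social";

interface RetrievalRequest {
  workspaceId: string;      // results never cross workspaces
  queryEmbedding: number[];
  topK: number;
  kinds?: SourceKind[];     // optional filter by source kind
  maxAgeDays?: number;      // optional freshness cutoff
}

interface StoredChunk {
  workspaceId: string;
  sourceId: string;
  kind: SourceKind;
  text: string;
  embedding: number[];
  addedAt: Date;
}

interface RetrievedChunk {
  sourceId: string; // lets the UI show which source informed an idea
  text: string;
  score: number;
}

const MS_PER_DAY = 86400000;

// Cosine similarity; returns 0 for mismatched lengths or zero vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Filter by workspace, kind, and age, then return the topK highest-scoring chunks.
function retrieve(
  req: RetrievalRequest,
  chunks: StoredChunk[],
  now: Date = new Date(),
): RetrievedChunk[] {
  if (!Number.isInteger(req.topK) || req.topK < 1) {
    throw new RangeError("topK must be a positive integer");
  }
  return chunks
    .filter((c) => c.workspaceId === req.workspaceId)
    .filter((c) => !req.kinds || req.kinds.includes(c.kind))
    .filter((c) =>
      req.maxAgeDays === undefined ||
      (now.getTime() - c.addedAt.getTime()) / MS_PER_DAY <= req.maxAgeDays)
    .map((c) => ({
      sourceId: c.sourceId,
      text: c.text,
      score: cosine(req.queryEmbedding, c.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, req.topK);
}
```

Returning `sourceId` with every chunk is what would make source attribution on generated ideas possible, and making `workspaceId` a required field keeps isolation part of the contract rather than an optional filter.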


* Status: Planned
* Board: 💡 Feature Request
* Date: About 3 hours ago
* Author: Chris Koronowski
