Roadmap: Making SEDB a complete smart embeddings database #1

Closed

opened 2026-03-22 22:37:22 +01:00 by catboi · 3 comments

catboi commented

2026-03-22 22:37:22 +01:00

Collaborator

Roadmap

1. Persistence & Durability

[x] Add SQL backing store for metadata (SQLite)
[x] Store vectors separately from metadata
[x] Transaction support, WAL mode for crash recovery
[x] Incremental saves instead of full export
[x] WAL checkpointing

2. Query Language

[x] Structured query DSL
[x] Filter operators: =, !=, in(), contains(), between()
[x] Aggregations: count, group_by, stats
[x] Time-range queries (QueryBuilder.timeRange())

3. Collections/Namespace

[x] Multiple named collections
[x] Per-collection index config
[x] Collection-level stats

4. Ingestion Pipeline

[x] Batch insert with progress callbacks
[x] ID deduplication (upsert semantics)
[x] Parallel embedding computation

5. RAG Features

[x] Document chunking strategies
[x] Retrieval with optional reranking
[x] Hybrid search (vector + BM25)
[x] BM25 stopword filtering

6. Observability

[x] Query logging
[x] Slow query detection
[x] Index stats

7. Tech Debt

[x] PQ index needs more testing
[x] Query caching
[x] Connection pooling for Ollama

Note: SEDB is a generic vector DB. Platform-specific adapters belong in separate projects.

LeNooby09 self-assigned this

2026-03-23 00:15:17 +01:00

LeNooby09 commented

2026-03-23 01:01:47 +01:00

Owner

@catboi A new commit has been pushed. Bluesky-specific features are not included, since the focus is a general-purpose database.

catboi commented

2026-03-23 01:08:50 +01:00

Author

Collaborator

Code review of ef2a304 - excellent progress!

What's Implemented

SqliteStore with proper schema for vectors + metadata
QueryCache with TTL and size limits
OllamaConnectionPool for connection reuse
BatchIngestor with upsert semantics
HybridSearch combining vector + BM25
Collection/CollectionManager for namespacing
Aggregation support (count, groupBy, stats)
QueryDsl for structured queries
IndexStats for observability
DocumentChunker for RAG

Looks Good

Parallel embedding computation in BatchIngestor
WAL mode in SqliteStore
BM25 implementation for keyword search
Query logging with timing

Remaining Items

1. Transaction support - SqliteStore uses autocommit, no explicit transactions
2. Time-range queries - not in QueryDsl yet
3. Archiver integration - BlueskySource not started
4. PQ index testing - still needs coverage
5. WAL checkpointing - SqliteStore has journal_mode=WAL but no explicit checkpoint

Minor Notes

QueryCache uses synchronizedMap - could consider ConcurrentHashMap
BM25 could use a stopword list for better precision

Overall: ~70% of roadmap done. Archiver integration is the main remaining piece.
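
To make the ConcurrentHashMap note above concrete, here is a minimal sketch of a TTL cache with a size limit built on ConcurrentHashMap. It is not SEDB's QueryCache: the class name TtlCache, its constructor parameters, and the crude "evict an arbitrary entry" policy are all assumptions made for illustration, and the size bound is only approximate under concurrent writes.

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: TtlCache is NOT SEDB's QueryCache, only an illustration.
final class TtlCache<K, V> {
    /** A cached value plus the clock reading at which it stops being valid. */
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final int maxSize;
    private final LongSupplier clock; // injectable so tests can control time

    public TtlCache(long ttlMillis, int maxSize, LongSupplier clock) {
        if (ttlMillis <= 0 || maxSize <= 0) {
            throw new IllegalArgumentException("ttlMillis and maxSize must be positive");
        }
        this.ttlMillis = ttlMillis;
        this.maxSize = maxSize;
        this.clock = Objects.requireNonNull(clock, "clock");
    }

    /** Returns the cached value, or empty if the key is absent or expired. */
    public Optional<V> get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            // Two-argument remove: do not delete a fresh entry another thread just wrote.
            entries.remove(key, entry);
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }

    /** Stores a value; when full, drops expired entries first, then an arbitrary one. */
    public void put(K key, V value) {
        Objects.requireNonNull(value, "value");
        long now = clock.getAsLong();
        if (!entries.containsKey(key) && entries.size() >= maxSize) {
            entries.values().removeIf(e -> now >= e.expiresAtMillis);
            if (entries.size() >= maxSize) {
                Iterator<K> it = entries.keySet().iterator();
                if (it.hasNext()) {
                    it.next();
                    it.remove();
                }
            }
        }
        entries.put(key, new Entry<>(value, now + ttlMillis));
    }

    public int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        TtlCache<String, Integer> cache = new TtlCache<>(60_000, 1_000, System::currentTimeMillis);
        cache.put("query:cats", 42);
        System.out.println(cache.get("query:cats").isPresent());
    }
}
```

Compared with a synchronized wrapper, reads here do not contend on a single lock. The trade-off is that the check-then-evict step in put is not atomic, so a real cache would likely want a proper LRU policy or an existing caching library instead.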


catboi commented

2026-03-23 01:49:11 +01:00

Author

Collaborator

Reviewed a79f5df - nice additions:

BM25 stopword removal now configurable
SqliteStore.transaction() now public with rollback support
WAL checkpoint functionality added
ProductQuantizer and SqliteStore test coverage expanded

Two roadmap items now complete: transaction support and PQ index testing.
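
For readers unfamiliar with what configurable BM25 stopword removal changes, the following is a rough, self-contained sketch rather than SEDB's implementation. The class Bm25Sketch, its method names, the tiny stopword list, and the tokenizer are all hypothetical. The IDF uses the common non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)), and k1 and b are passed in by the caller (1.2 and 0.75 are frequently used defaults).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not SEDB's BM25 code, only an illustration of the idea.
final class Bm25Sketch {
    // Tiny illustrative stopword list (an assumption; real lists are much longer).
    public static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "in", "is", "of", "on", "the", "to");

    /** Lowercases, splits on non-letter/non-digit runs, and drops stopwords. */
    public static List<String> tokenize(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns one BM25 score per document for the given query. */
    public static double[] score(List<String> docs, String query,
                                 Set<String> stopwords, double k1, double b) {
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        int[] lengths = new int[n];
        long totalLength = 0;

        // Index pass: per-document term frequencies and corpus document frequencies.
        for (int i = 0; i < n; i++) {
            List<String> tokens = tokenize(docs.get(i), stopwords);
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokens) {
                tf.merge(t, 1, Integer::sum);
            }
            for (String t : tf.keySet()) {
                docFreq.merge(t, 1, Integer::sum);
            }
            termFreqs.add(tf);
            lengths[i] = tokens.size();
            totalLength += tokens.size();
        }
        double avgLength = n == 0 ? 0.0 : (double) totalLength / n;

        // Scoring pass: sum the per-term BM25 contribution over distinct query terms.
        double[] scores = new double[n];
        for (String term : new LinkedHashSet<>(tokenize(query, stopwords))) {
            int df = docFreq.getOrDefault(term, 0);
            if (df == 0) {
                continue; // term does not occur in the corpus
            }
            double idf = Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
            for (int i = 0; i < n; i++) {
                int tf = termFreqs.get(i).getOrDefault(term, 0);
                if (tf == 0) {
                    continue;
                }
                double norm = tf + k1 * (1.0 - b + b * lengths[i] / avgLength);
                scores[i] += idf * tf * (k1 + 1.0) / norm;
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The cat sat on the mat", "Dogs chase the ball");
        double[] scores = score(docs, "the cat", STOPWORDS, 1.2, 0.75);
        for (int i = 0; i < scores.length; i++) {
            System.out.println(docs.get(i) + " -> " + scores[i]);
        }
    }
}
```

Passing an empty set as the stopword list turns filtering off, which is one simple way to make the behavior configurable. With filtering on, a query word such as "the" no longer contributes score to documents that only share that word with the query.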


catboi closed this issue

2026-03-23 01:51:13 +01:00

Reference

LeNooby09/SEDB#1
