Spacedrive v3: The local-first data engine.

Four years ago I open-sourced Spacedrive with a simple conviction: your data belongs to you, and you should be able to find all of it from one place. Two major versions, a $2M seed round, 37,000 GitHub stars, and 600,000 downloads later, that conviction hasn't changed. How we deliver on it has.

Today I'm announcing Spacedrive v3, a local-first data engine that indexes any data source and makes everything searchable from one place.

What changed

v1 and v2 were both cross-platform file managers with P2P sync. v1 shipped an alpha that half a million people downloaded, and the repo became the 30th most-starred Rust project on GitHub. But even with a 12-person team and $2M over three years, the codebase collapsed under infrastructure debt.

v2 was a solo clean-room rewrite. Beautiful macOS and iOS apps. Working sync, working P2P, WASM extension system, 183k lines of Rust. The architecture was sound. But the surface area of a file manager supporting five operating systems is inherently too large. Every platform has its own filesystem semantics, its own drag-and-drop APIs, its own thumbnail pipelines, its own permission models. You end up maintaining five separate implementations of features that users expect to work flawlessly because they compare you to Finder and Explorer. As a solo founder, that surface area was not reducible to something shippable. Neither version reached a stable release.

The lesson: a file manager is only one way to deliver on data ownership. The most valuable part of Spacedrive was never the file browser. It was the index: the ability to ingest any data source, structure it, and make it instantly searchable on your own machine.

What v3 does

Spacedrive indexes your data sources and makes everything searchable from one place. Each data source becomes a repository, a self-contained folder with its own SQLite database and vector index. You keep your data where it is. Spacedrive reads it, indexes it, and lets you search it.

One search bar. Every email, note, bookmark, calendar event, contact, GitHub issue, Slack message, and coding session. Filter by date range, sort by time, or search by relevance. Local. Instant. Yours.

What makes v3 different from every other local data tool: every record that enters the system passes through a processing pipeline before it reaches the search index.

The processing pipeline

This is core to the product, not an optional feature.

Stage 1: Safety screening. Every record is scanned by Prompt Guard 2, Meta's open-weight injection classifier. Runs locally on CPU in under 50ms per chunk. Records classified as injections are quarantined. Borderline records are flagged with safety metadata.

When Spacedrive indexes an email inbox, anyone who sent you a message has put arbitrary text into your search corpus. That text can later be retrieved as context for AI agents, and if it contains adversarial instructions, an agent may follow them. This is prompt injection, ranked first (LLM01) in the OWASP Top 10 for LLM Applications. Spacedrive screens for it at ingest time, before the content ever becomes searchable.

Stage 2: Content classification. Not all indexed content is equally useful. Email signature blocks, Slack bot messages, CI notifications, and auto-generated spam add noise. The classification stage gives every record a quality score (0.0-1.0), a noise-detection result, and category tags. Low-quality records get pushed down in search rankings or excluded from agent queries entirely.

Stage 3: Adapter-specific processing. Email gets signature stripping and thread importance scoring. Slack gets bot detection. Files get content hashing and text extraction. Each adapter declares its own processing steps.

Stage 4: Search indexing. Only records that pass stages 1-3 enter the FTS5 full-text index and LanceDB vector index. Quality scores influence ranking. Classification tags become filterable metadata.
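
To make the order of the four stages concrete, here is a minimal Python sketch of the flow. It is an illustration under stated assumptions, not Spacedrive's actual code: the Record fields, the thresholds, the adapter name "gmail", and the toy scoring function are all hypothetical, and the keyword check only stands in for a real classifier such as Prompt Guard 2, which is not called here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    source: str                      # adapter name (hypothetical identifiers)
    text: str
    safety: str = "clean"            # "clean", "flagged", or "quarantined"
    quality: float = 1.0             # 0.0-1.0 quality score
    tags: List[str] = field(default_factory=list)


# Hypothetical thresholds; the post does not publish real values.
QUARANTINE_AT = 0.9
FLAG_AT = 0.5


def toy_injection_score(text: str) -> float:
    """Keyword stand-in for a real injection classifier (assumption)."""
    lowered = text.lower()
    if "ignore previous instructions" in lowered:
        return 0.95
    if "system prompt" in lowered:
        return 0.6
    return 0.0


def screen(record: Record, score: Callable[[str], float]) -> None:
    """Stage 1: quarantine likely injections, flag borderline records."""
    s = score(record.text)
    if s >= QUARANTINE_AT:
        record.safety = "quarantined"
    elif s >= FLAG_AT:
        record.safety = "flagged"


def classify(record: Record) -> None:
    """Stage 2: toy quality score and noise tag."""
    if "unsubscribe" in record.text.lower():
        record.quality = 0.1
        record.tags.append("noise")


def strip_signature(record: Record) -> None:
    """Stage 3 (email): drop everything after the conventional '-- ' line."""
    body, _, _ = record.text.partition("\n-- \n")
    record.text = body


# Each adapter declares its own processing steps.
ADAPTER_STEPS: Dict[str, List[Callable[[Record], None]]] = {
    "gmail": [strip_signature],
}


def run_pipeline(
    records: List[Record],
    score: Callable[[str], float] = toy_injection_score,
) -> Tuple[List[Record], List[Record]]:
    """Run stages 1-3; return (indexable, quarantined) for stage 4."""
    indexable, quarantined = [], []
    for record in records:
        screen(record, score)
        if record.safety == "quarantined":
            quarantined.append(record)   # never reaches the search index
            continue
        classify(record)
        for step in ADAPTER_STEPS.get(record.source, []):
            step(record)
        indexable.append(record)
    return indexable, quarantined
```

The point of the sketch is the ordering: a quarantined record leaves the loop before classification or indexing ever sees it, while flagged and low-quality records continue with their metadata attached so that ranking can use it later.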

As far as I know, no other local data tool has a dedicated injection guard, content classifier, and trust tier system as first-class product features. The tools I have seen in this space pipe raw content straight into LLM context with no screening. Spacedrive sits between raw data and AI consumers and screens everything first.

Trust tiers and privacy

A single Spacedrive instance might index personal email, work Slack, private notes, and browsing history. Cross-repository search returns results from all of them. That needs boundaries.

Every repository gets a trust classification based on its data source. Notes you wrote are "authored" with balanced screening. A shared Slack workspace is "collaborative." Your email inbox is "external" with strict screening by default, because anyone who emails you controls what text enters your search corpus.

Per-repository visibility controls let you mark repos as private, shared, or agent-excluded. The desktop app surfaces agent access status on every repository card. You never have to wonder whether your AI agent is seeing your unscreened email inbox.
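
One way to picture how trust tiers and visibility interact is the Python sketch below. It rests on several assumptions: the adapter identifiers, the screening level for "collaborative", the conservative default for unknown sources, and the exact meaning of each visibility setting are not spelled out above, so they are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Trust(Enum):
    AUTHORED = "authored"
    COLLABORATIVE = "collaborative"
    EXTERNAL = "external"


class Visibility(Enum):
    PRIVATE = "private"
    SHARED = "shared"
    AGENT_EXCLUDED = "agent-excluded"


# Default trust per data source. Adapter identifiers are hypothetical.
DEFAULT_TRUST = {
    "obsidian": Trust.AUTHORED,
    "apple-notes": Trust.AUTHORED,
    "slack-export": Trust.COLLABORATIVE,
    "gmail": Trust.EXTERNAL,
}

# "balanced" for authored and "strict" for external come from the post;
# the level for collaborative repositories is an assumption.
SCREENING = {
    Trust.AUTHORED: "balanced",
    Trust.COLLABORATIVE: "balanced",
    Trust.EXTERNAL: "strict",
}


@dataclass
class Repo:
    name: str
    source: str
    visibility: Visibility = Visibility.SHARED

    @property
    def trust(self) -> Trust:
        # Assumption: unknown sources get the most conservative tier.
        return DEFAULT_TRUST.get(self.source, Trust.EXTERNAL)

    @property
    def screening(self) -> str:
        return SCREENING[self.trust]


def repos_for_query(repos: List[Repo], for_agent: bool) -> List[Repo]:
    """Pick the repositories a cross-repository search may touch.

    Assumed semantics: "private" repos are left out of cross-repository
    search entirely; "agent-excluded" repos are searchable by the user
    but never by an AI agent.
    """
    selected = []
    for repo in repos:
        if repo.visibility is Visibility.PRIVATE:
            continue
        if for_agent and repo.visibility is Visibility.AGENT_EXCLUDED:
            continue
        selected.append(repo)
    return selected
```

With this shape, the agent access status shown on a repository card comes down to whether the repository survives repos_for_query(..., for_agent=True).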

The VDFS

The Virtual Distributed Filesystem launches as a single-device filesystem adapter. Files stay where they are. Spacedrive stores metadata, BLAKE3 content hashes, and extracted text. Your local files become searchable alongside your emails, notes, and everything else in the index.
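
As a rough illustration of what storing metadata and a content hash for a file in place can look like, here is a small Python sketch. Python's standard library has no BLAKE3, so this uses hashlib.blake2b as a plainly labeled stand-in. The field names are hypothetical, and text extraction is left out.

```python
import hashlib
import os
from datetime import datetime, timezone


def index_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in streaming fashion and collect basic metadata.

    The file itself is only read, never moved or copied.
    BLAKE2b is a stand-in here; Spacedrive itself uses BLAKE3.
    """
    hasher = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    stat = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "content_hash": hasher.hexdigest(),
    }
```

Because the hash depends only on the bytes, two copies of the same file in different folders produce the same content_hash, which is the property later cross-device dedup would build on.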

v1 promised multi-device sync and never delivered it. v2 built working P2P but couldn't ship a stable release. v3 learns from both: ship the search index now, add cross-device sync later at the repository level. The value is in making your data findable, not in real-time file sync. Multi-device and cross-device dedup are on the roadmap. Neither blocks a usable product today.

11 adapters at launch

Adapters are script-based. Each one is a TOML manifest plus a sync script in any language. Spacedrive sends config as JSON to stdin. The script writes JSONL records to stdout. If it reads stdin and prints lines, it is an adapter.

Shipping at launch: Gmail, Apple Notes, Chrome Bookmarks, Chrome History, Safari History, Obsidian, OpenCode, Slack Export, macOS Contacts, macOS Calendar, and GitHub. All produce UTC-normalized dates and support incremental sync via cursors.
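
Here is a minimal sketch of what such a sync script could look like in Python. The stdin/stdout contract is as described above; everything else is an assumption made for illustration: the "cursor" config key, the record field names, and the hard-coded BOOKMARKS list that stands in for a real data source.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Iterator

# Stand-in data; a real adapter would read from its data source here.
BOOKMARKS = [
    {"id": "1", "title": "Rust book", "added": "2024-01-05T10:00:00+01:00"},
    {"id": "2", "title": "SQLite FTS5", "added": "2024-02-01T08:30:00-05:00"},
]


def to_utc(iso: str) -> str:
    """Normalize an ISO 8601 timestamp with an offset to UTC."""
    return datetime.fromisoformat(iso).astimezone(timezone.utc).isoformat()


def sync(config: dict) -> Iterator[dict]:
    """Yield records newer than the cursor (hypothetical config key)."""
    cursor = config.get("cursor")
    for item in BOOKMARKS:
        ts = to_utc(item["added"])
        # Every timestamp is UTC in one format, so string order is time order.
        if cursor is None or ts > cursor:
            yield {"id": item["id"], "title": item["title"], "date": ts}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read JSON config from stdin, write one JSON record per line to stdout."""
    config = json.load(stdin)
    for record in sync(config):
        stdout.write(json.dumps(record) + "\n")
```

Saved as a script, this would end with a call to main(). Piping {"cursor": "2024-01-05T09:00:00+00:00"} into it would emit only the second bookmark, which is how a cursor keeps a sync incremental.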

Architecture

Rust. Tokio. SQLite via sqlx. LanceDB for vector search. FastEmbed for embeddings (all-MiniLM-L6-v2, runs locally). BLAKE3 for content hashing. An encrypted secrets store backed by redb with AES-256-GCM and Argon2id. 68 tests covering every subsystem.

The core is a pure Rust library crate (spacedrive-core) with a single Engine entry point. The CLI, the Tauri 2 desktop app, and Spacebot are all thin consumers of that crate. No server dependencies. Single binary. All data lives in local files under ~/.spacedrive/.

Spacebot integration

Spacedrive is designed for native integration with Spacebot, the AI agent for teams and communities. Spacebot links spacedrive-core as a Rust crate and calls it directly. No IPC, no protocol overhead.

Spacebot's memory system is the agent's brain: structured thoughts with weighted associations and importance scores. Spacedrive is the agent's access to raw source data: emails, notes, bookmarks, browsing history. Indexed, screened, classified, searchable. Memories are what the agent thinks. Spacedrive data is what the agent knows about.

Every search result that reaches Spacebot has already been screened for injection, classified for quality, scored for relevance, and tagged with trust metadata.

The business model

The core is free, open source, and local-first forever. Spacedrive makes Spacebot vastly more valuable by giving it safe, structured access to user data. The processing pipeline is what makes it responsible to connect real user data to a production AI agent. Spacedrive does not need its own business model. It is the knowledge layer for a product that has one.

Availability

The v3 codebase is currently in a private repository. On March 20, the full codebase will be migrated to spacedriveapp/spacedrive as the new default branch. The current v2 codebase will be moved to a v2 branch. All git history will be preserved.

Desktop app downloads will be available on the same date. In the meantime, you can star the repo, join the Discord to follow along, or read the docs early.

The 37,000 people who starred this repo over four years believed your data should be findable without surrendering it to someone else's cloud. That principle has not changed. Now we have added a second one: your data should be safe before it is searchable.

Both are non-negotiable.

GitHub · Docs · Discord