Search Layer

Semantic Search & Collections

Source Files: semanticSearch.ts, smartCollections.ts
Complexity: Medium

The Semantic Search & Smart Collections feature lets you search and group notes by meaning and tags instead of relying on exact keyword matches.

1. Overview

Traditional keyword search looks for exact words. If you search for "security", you might miss a note that discusses an "authorization mechanism" without ever using the word "security". This feature addresses the problem in two ways:

  • AI-Powered Semantic Search: Sends notes to an LLM in batches and asks it to judge their relevance to the query, which provides semantic search without a dedicated vector database or local embeddings model.
  • Smart Collections: Lets you save dynamic queries that combine tags, date ranges, and a semantic search query. These collections are updated in real time as your vault changes.

2. Under the Hood & Code Walkthrough

2.1 LLM-Based Search Batching

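To make semantic search work without a local vector database, the extension sends notes to the LLM in batches of 50. For each batch it builds a prompt that lists every note with its filename and its summary or snippet, and asks the LLM to return the 1-based indices of the most relevant notes (up to 10 per batch). Valid indices are mapped back to positions in the full note list, and the first 10 collected results, in batch order, are returned as file paths: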

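export async function searchNotes(query: string, notes: NoteInfo[]): Promise<string[]> {
    const BATCH_SIZE = 50;
    // Each entry records a note's position in `notes`; batchOffset is its position in this result list.
    const allIndices: Array<{ index: number; batchOffset: number }> = [];

    // Send the notes to the LLM one batch at a time.
    for (let i = 0; i < notes.length; i += BATCH_SIZE) {
        const batch = notes.slice(i, i + BATCH_SIZE);
        // Number the entries from 1 so the LLM can refer to them by index.
        const noteList = batch.map((n, idx) => {
            const display = buildNoteEntry(n.filePath, n.summary, n.snippet);
            return `${idx + 1}. ${display}`;
        }).join('\n');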

        const prompt = `You are a note search assistant. Given the search query and a list of notes with their summaries, return the indices of the most relevant notes (up to 10), ranked by relevance.

Query: "${query}"

Notes:
${noteList}

Respond with ONLY a JSON array of indices, e.g. [3, 7, 1]. No other text.`;

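        try {
            const response = await chatCompletionWithRetry(prompt);
            const indices = parseSearchResults(response);
            for (const idx of indices) {
                // Ignore out-of-range indices, then convert the 1-based batch index
                // to a 0-based position in the full `notes` array.
                if (idx >= 1 && idx <= batch.length) {
                    allIndices.push({ index: idx - 1 + i, batchOffset: allIndices.length });
                }
            }
        } catch {
            // A batch that throws is skipped; the remaining batches still run.
        }
    }

    // Hits are kept in batch order, so the first 10 collected are returned.
    return allIndices.slice(0, 10).map(item => notes[item.index].filePath);
}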
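
The walkthrough does not show buildNoteEntry or parseSearchResults. The sketch below is one plausible shape for them, not the code in semanticSearch.ts: it assumes buildNoteEntry prefers the summary over the snippet and shows only the file name, and that parseSearchResults tolerates replies where the model wraps the JSON array in extra text. Both bodies are hypothetical.

```typescript
// Hypothetical sketches of the two helpers called by searchNotes.
// The real implementations in semanticSearch.ts may differ.

/** Build the one-line description of a note that is listed in the prompt. */
function buildNoteEntry(filePath: string, summary?: string, snippet?: string): string {
    // Assumption: prefer the summary, fall back to the snippet, else show the name alone.
    const description = (summary ?? '').trim() || (snippet ?? '').trim();
    const fileName = filePath.split(/[\\/]/).pop() ?? filePath;
    return description ? `${fileName}: ${description}` : fileName;
}

/** Extract 1-based indices from a reply that should be a JSON array of numbers. */
function parseSearchResults(response: string): number[] {
    // Models sometimes add prose around the array, so take the first [...] span.
    const match = response.match(/\[[^\[\]]*\]/);
    if (!match) {
        return [];
    }
    let parsed: unknown;
    try {
        parsed = JSON.parse(match[0]);
    } catch {
        return []; // not valid JSON, treat as "no results"
    }
    if (!Array.isArray(parsed)) {
        return [];
    }
    // Keep positive integers only and drop duplicates while preserving rank order.
    const seen = new Set<number>();
    const indices: number[] = [];
    for (const value of parsed) {
        if (typeof value === 'number' && Number.isInteger(value) && value >= 1 && !seen.has(value)) {
            seen.add(value);
            indices.push(value);
        }
    }
    return indices;
}

console.log(buildNoteEntry('notes/auth.md', 'Covers OAuth flows')); // auth.md: Covers OAuth flows
console.log(parseSearchResults('Sure! [3, 7, 1]'));                 // [ 3, 7, 1 ]
```

Returning an empty array on malformed output, rather than throwing, would mean a single bad reply costs only that batch's results, which matches how searchNotes already skips batches that fail.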

2.2 Two-Phase Evaluation for Smart Collections

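When running a collection, the extension evaluates it in two phases. It first filters notes locally by tags and date ranges (fast path), then runs the LLM search on the surviving subset only (slow path). If the collection has no semantic query, or no note passes the local filter, the LLM is never called and the locally filtered paths are returned directly: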

export async function runCollection(collection: Collection, workspaceRoot: string): Promise<string[]> {
    const allNotes = await gatherNotesForFilter(workspaceRoot);
    
    // Fast path: filter by tags and dates locally, without calling the LLM
    const filtered = allNotes.filter(note => matchesCollection(note, collection));

    // Slow path: LLM semantic query on candidate subset
    if (collection.query && filtered.length > 0) {
        const noteInfos = await gatherNotes(workspaceRoot);
        const filteredPaths = new Set(filtered.map(n => n.filePath));
        const relevantNotes = noteInfos.filter(n => filteredPaths.has(n.filePath));
        return searchNotes(collection.query, relevantNotes);
    }

    return filtered.map(n => n.filePath);
}
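
The fast path depends on matchesCollection, which is not shown above. As a rough illustration of what a purely local filter could look like, the sketch below assumes a hypothetical Collection shape (tags, dateFrom, dateTo, query), a hypothetical FilterableNote type with a modified date, AND semantics across tags, and ISO YYYY-MM-DD date strings. None of these details are confirmed by the source files.

```typescript
// Hypothetical shapes: the real Collection and note types may carry other fields.
interface Collection {
    name: string;
    tags?: string[];     // assumed AND semantics: the note must carry every listed tag
    dateFrom?: string;   // inclusive ISO date, e.g. "2024-01-01"
    dateTo?: string;     // inclusive ISO date
    query?: string;      // optional semantic query, only used by the slow path
}

interface FilterableNote {
    filePath: string;
    tags: string[];
    modified: string;    // ISO date (YYYY-MM-DD) of the last modification
}

/** Local-only check: no LLM call, so it is cheap enough to run on every note. */
function matchesCollection(note: FilterableNote, collection: Collection): boolean {
    // Tag filter, compared case-insensitively.
    const noteTags = new Set(note.tags.map(t => t.toLowerCase()));
    const wanted = collection.tags ?? [];
    if (!wanted.every(t => noteTags.has(t.toLowerCase()))) {
        return false;
    }
    // Date filter: ISO YYYY-MM-DD strings sort lexicographically in chronological order.
    if (collection.dateFrom && note.modified < collection.dateFrom) {
        return false;
    }
    if (collection.dateTo && note.modified > collection.dateTo) {
        return false;
    }
    return true;
}

const collection: Collection = {
    name: 'Security 2024',
    tags: ['security'],
    dateFrom: '2024-01-01',
    dateTo: '2024-12-31',
};
const candidates: FilterableNote[] = [
    { filePath: 'auth.md', tags: ['Security', 'backend'], modified: '2024-03-10' },
    { filePath: 'old-auth.md', tags: ['security'], modified: '2023-11-02' },
    { filePath: 'recipes.md', tags: ['cooking'], modified: '2024-05-01' },
];
const matched = candidates.filter(n => matchesCollection(n, collection)).map(n => n.filePath);
console.log(matched); // only auth.md passes both the tag and the date filter
```

Because a filter like this touches only metadata, it can shrink the candidate set before any prompt is built, which directly reduces the number of 50-note batches the slow path has to send.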