Search Interface

easysearch is a lightweight search interface for browsing and querying transcription outputs, built into easytranscriber. It indexes the alignment JSON files into a SQLite database with full-text search and serves a web UI for searching, browsing documents, and playing back audio with synchronized transcript highlighting.

See the demo transcription for a preview of the synchronized highlighting.

Installation

The easysearch dependencies are optional. Install them with:

pip install easytranscriber[search]

Quick start

After running the transcription pipeline, start the search server by pointing it at your alignment outputs and audio files:

easysearch --alignments-dir output/alignments --audio-dir data/audio

This will:

Index all alignment JSON files into a local SQLite database (search.db).
Start a web server at http://127.0.0.1:8642.

On subsequent launches, only new or modified files are re-indexed. Use --reindex to force a full re-index.
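One way to picture the incremental behaviour is a bookkeeping table that remembers each file's modification time. The sketch below is only an illustration under that assumption: the `indexed_files` table, the `files_to_index` helper, and the use of modification times are hypothetical and are not taken from the easysearch source.

```python
import sqlite3
from pathlib import Path


def files_to_index(conn: sqlite3.Connection, alignments_dir: Path, reindex: bool = False) -> list:
    """Return the JSON files that are new or modified since the last run."""
    # Hypothetical bookkeeping table: one row per file that was already indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files (path TEXT PRIMARY KEY, mtime_ns INTEGER)"
    )
    if reindex:
        # Forgetting every recorded file forces a full re-index.
        conn.execute("DELETE FROM indexed_files")
    seen = dict(conn.execute("SELECT path, mtime_ns FROM indexed_files"))

    pending = []
    for path in sorted(alignments_dir.rglob("*.json")):
        mtime_ns = path.stat().st_mtime_ns
        if seen.get(str(path)) != mtime_ns:  # unknown file or changed on disk
            pending.append(path)
            conn.execute(
                "INSERT OR REPLACE INTO indexed_files (path, mtime_ns) VALUES (?, ?)",
                (str(path), mtime_ns),
            )
    conn.commit()
    return pending
```

Called twice in a row on an unchanged directory, the second call returns an empty list; passing `reindex=True` returns every file again, which mirrors what `--reindex` is described as doing.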

Audio playback

Clicking a search result takes you to the document page at the matching timestamp. The audio player seeks to that position and begins playback. The transcript view highlights the currently playing word in real time, and you can click any sentence to jump to that point in the audio.
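Finding the word to highlight comes down to looking up the current playback time in a sorted list of word timings. The sketch below shows that lookup in Python for illustration only; the real UI runs in the browser, and the `words` data and the `word_at` helper are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical word timings: (start_seconds, end_seconds, text), sorted by start.
words = [(0.00, 0.40, "hello"), (0.45, 0.90, "world"), (1.50, 1.80, "again")]
starts = [start for start, _end, _text in words]


def word_at(t: float) -> Optional[str]:
    """Return the word being spoken at playback time t, or None during silence."""
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i][1]:
        return words[i][2]
    return None
```

A binary search keeps the lookup cheap even for long recordings, which matters if it runs on every playback time update.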

Note

Some browsers block autoplay by default. If audio doesn’t start automatically when navigating from a search result, a play button overlay will appear – click it to begin playback.

How indexing works

easysearch automatically detects the index mode from the file contents:

chunks mode: used when files are produced by the easytranscriber ASR pipeline. Indexes at the voice activity detection (VAD) chunk level, where each chunk is a continuous speech region (~20–30 seconds) that may span multiple sentences. Each chunk becomes a searchable row in the database.
alignments mode: used when files are produced by easyaligner (ground-truth text alignment), where chunks carry no text. Indexes at the alignment segment level (sentence or paragraph), giving finer-grained search results.

The mode is detected by checking whether the first chunk in the file contains text. You can override it with --index-mode if needed.
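The detection rule can be sketched in a few lines. This is a minimal illustration, assuming each chunk is a dictionary with an optional `text` key; the key name, the `detect_index_mode` helper, and the fallback for an empty file are assumptions rather than the actual easysearch schema or code.

```python
def detect_index_mode(chunks: list) -> str:
    """Return "chunks" if the first chunk carries text, otherwise "alignments"."""
    if not chunks:
        # Assumption: with no chunks at all, fall back to segment-level indexing.
        return "alignments"
    text = chunks[0].get("text") or ""  # a missing key and None both count as no text
    return "chunks" if text.strip() else "alignments"
```

Only the first chunk is inspected, so a file that mixes chunks with and without text would be classified by its first chunk alone; that is the case where `--index-mode` is useful.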

In chunks mode:

A search query matches when all of its terms appear within the same chunk.
Terms that fall in different chunks, even adjacent ones, won’t match as a combined query.
Because a chunk usually spans several sentences, a query whose terms come from different sentences of the same chunk still matches.

In alignments mode:

A search query matches when all terms appear within the same alignment segment.
Matching is finer-grained than in chunks mode: all query terms must fall within a single sentence or paragraph.
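The difference between the two granularities can be reproduced with a small FTS5 experiment. The sketch below uses Python's built-in `sqlite3` module (which needs an SQLite build with FTS5 enabled); the table names and sample sentences are hypothetical and do not reflect the real `search.db` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one FTS5 row per VAD chunk vs. one row per sentence.
conn.execute("CREATE VIRTUAL TABLE chunk_rows USING fts5(text)")
conn.execute("CREATE VIRTUAL TABLE segment_rows USING fts5(text)")

sentences = ["The climate is warming.", "Policy has been slow to respond."]
# chunks mode: both sentences end up in a single searchable row.
conn.execute("INSERT INTO chunk_rows (text) VALUES (?)", (" ".join(sentences),))
# alignments mode: each sentence is its own searchable row.
conn.executemany("INSERT INTO segment_rows (text) VALUES (?)", [(s,) for s in sentences])


def count_hits(table: str, query: str) -> int:
    """Count rows of `table` that match the FTS5 query."""
    sql = f"SELECT count(*) FROM {table} WHERE {table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


# "climate" and "policy" occur in different sentences of the same chunk.
chunk_hits = count_hits("chunk_rows", "climate policy")
segment_hits = count_hits("segment_rows", "climate policy")
```

Here the chunk-level table returns a hit for `climate policy` while the sentence-level table returns none, because no single sentence contains both terms.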

The document page displays the transcript at the alignment segment level in both modes, so you can click any sentence to jump to that point in the audio.

Search syntax

The search uses SQLite’s FTS5 full-text search engine. The following query syntax is supported:

Query	Matches
`climate change`	Rows (chunks or segments, depending on index mode) containing both words (implicit AND)
`"climate change"`	Exact phrase
`climate OR weather`	Either word
`climate NOT weather`	climate but not weather
`econom*`	Prefix match: economy, economic, etc.
`NEAR(climate change, 3)`	Both words within 3 tokens of each other
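The query forms in the table can be tried directly against a throwaway FTS5 table. This is a small sketch using Python's `sqlite3` module with an FTS5-enabled SQLite build; the `docs` table and its four sample rows are made up for illustration and are unrelated to the database easysearch creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(text)")
conn.executemany(
    "INSERT INTO docs (text) VALUES (?)",
    [
        ("climate change is accelerating",),                    # rowid 1
        ("the climate of this region will not change soon",),   # rowid 2
        ("weather and climate differ",),                        # rowid 3
        ("the economy reacts to economic weather forecasts",),  # rowid 4
    ],
)


def search(query: str) -> list:
    """Return the rowids of all rows matching an FTS5 query string."""
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [rowid for (rowid,) in rows]


queries = [
    "climate change",           # implicit AND
    '"climate change"',         # exact phrase
    "climate OR weather",       # either word
    "climate NOT weather",      # first word without the second
    "econom*",                  # prefix match
    "NEAR(climate change, 3)",  # proximity
]
results = {q: search(q) for q in queries}
```

Note that the operators `OR`, `NOT`, and `NEAR` must be written in uppercase; FTS5 treats lowercase `or` and `not` as ordinary search terms.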

CLI reference

easysearch --help

Option	Default	Description
`--alignments-dir`	`output/alignments`	Directory containing alignment JSON files
`--audio-dir`	`data`	Directory containing source audio files
`--db`	`search.db`	Path to the SQLite database file
`--host`	`127.0.0.1`	Host to bind to
`--port`	`8642`	Port to listen on
`--per-page`	`20`	Results per page
`--snippets-per-doc`	`5`	Max matching snippets shown per document in search results
`--reindex`		Force full re-index of all JSON files
`--index-mode`	auto	Override index mode: `chunks` or `alignments` (auto-detected if omitted)