Search Interface
easysearch is a lightweight search interface for browsing and querying transcription outputs, built into easytranscriber. It indexes the alignment JSON files into a SQLite database with full-text search and serves a web UI for searching, browsing documents, and playing back audio with synchronized transcript highlighting.
See the demo transcription for a preview of the synchronized highlighting.
Installation
The easysearch dependencies are optional. Install them with:
```bash
pip install "easytranscriber[search]"
```

Quick start
After running the transcription pipeline, start the search server by pointing it at your alignment outputs and audio files:
```bash
easysearch --alignments-dir output/alignments --audio-dir data/audio
```

This will:

- Index all alignment JSON files into a local SQLite database (search.db).
- Start a web server at http://127.0.0.1:8642.
On subsequent launches, only new or modified files are re-indexed. Use --reindex to force a full re-index.
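The docs don't say how easysearch decides that a file is "new or modified". One common approach, shown here as a minimal sketch, records each file's modification time and size after indexing and compares them on the next launch. The `indexed_files` table and both function names are hypothetical, not easysearch's actual schema or API.

```python
import sqlite3
from pathlib import Path

# Hypothetical bookkeeping table; easysearch's real schema may differ.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS indexed_files ("
    "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)"
)


def files_to_index(conn: sqlite3.Connection, alignments_dir: Path,
                   reindex: bool = False) -> list[Path]:
    """Return the alignment JSON files that are new or changed since last indexed."""
    conn.execute(SCHEMA)
    if reindex:
        # Like --reindex: forget what was indexed, so every file is pending again.
        conn.execute("DELETE FROM indexed_files")
    pending = []
    for path in sorted(alignments_dir.glob("*.json")):
        st = path.stat()
        seen = conn.execute(
            "SELECT mtime_ns, size FROM indexed_files WHERE path = ?", (str(path),)
        ).fetchone()
        # New files have no row (None); modified files differ in mtime or size.
        if seen != (st.st_mtime_ns, st.st_size):
            pending.append(path)
    return pending


def mark_indexed(conn: sqlite3.Connection, path: Path) -> None:
    """Record a file's current mtime and size after its rows have been indexed."""
    st = path.stat()
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, mtime_ns, size) VALUES (?, ?, ?)",
        (str(path), st.st_mtime_ns, st.st_size),
    )


# Hypothetical usage:
# conn = sqlite3.connect("search.db")
# for path in files_to_index(conn, Path("output/alignments")):
#     ...  # parse the JSON and (re)insert its rows into the FTS table
#     mark_indexed(conn, path)
# conn.commit()
```

Comparing size as well as mtime catches edits made within the filesystem's timestamp resolution. Cleaning up rows for files that were deleted is left out of this sketch.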
Audio playback
Clicking a search result takes you to the document page at the matching timestamp. The audio player seeks to that position and begins playback. The transcript view highlights the currently playing word in real time, and you can click any sentence to jump to that point in the audio.
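Real-time highlighting comes down to mapping the player's current time to a word. The player itself runs in the browser, and this page doesn't describe its code. As an illustration only, the Python sketch below does the lookup with a binary search over word start times. The `(start, end, word)` tuple format is an assumption, not easysearch's alignment schema.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical word timings: (start_s, end_s, word), sorted by start time.
Word = tuple[float, float, str]


def current_word_index(words: list[Word], t: float) -> Optional[int]:
    """Return the index of the word being spoken at playback time t (seconds),
    or None if t falls in a pause between words."""
    starts = [w[0] for w in words]  # precompute once per document in real use
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i][1]:
        return i
    return None


words = [(0.0, 0.4, "the"), (0.5, 1.0, "climate"), (1.0, 1.6, "changes")]
print(current_word_index(words, 0.7))   # 1 ("climate")
print(current_word_index(words, 0.45))  # None (pause between words)
```

Each lookup is O(log n), which matters when the player's time-update event fires several times per second on an hour-long transcript. Jumping to a clicked sentence is the reverse operation: seek the player to that sentence's start time.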
Some browsers block autoplay by default. If audio doesn't start automatically when you navigate from a search result, a play button overlay appears; click it to begin playback.
How indexing works
easysearch automatically detects the index mode from the file contents:
- chunks mode: used when files are produced by the easytranscriber ASR pipeline. It indexes at the VAD chunk level. Each chunk is a continuous speech region (~20–30 seconds) that may span multiple sentences, and each chunk becomes a searchable row in the database.
- alignments mode: used when files are produced by easyaligner (ground-truth text alignment), where chunks carry no text. It indexes at the alignment segment level (sentence or paragraph), giving finer-grained search results.
The mode is detected by checking whether the first chunk in the file contains text. You can override it with --index-mode if needed.
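The detection rule above can be sketched as follows. The sketch assumes each alignment file is a JSON object with a top-level `chunks` list whose items may carry a `text` field. These field names and the `detect_index_mode` function are hypothetical; check an actual output file for the real schema.

```python
import json
from pathlib import Path
from typing import Optional


def detect_index_mode(path: Path, override: Optional[str] = None) -> str:
    """Pick "chunks" or "alignments" for one alignment JSON file.

    Hypothetical schema: {"chunks": [{"text": "...", ...}, ...], ...}.
    """
    if override not in (None, "auto"):  # mirrors --index-mode chunks|alignments
        if override not in ("chunks", "alignments"):
            raise ValueError(f"unknown index mode: {override!r}")
        return override
    data = json.loads(path.read_text(encoding="utf-8"))
    chunks = data.get("chunks") or []
    first_text = chunks[0].get("text") if chunks else None
    # ASR output: chunks carry transcribed text. easyaligner output: they don't.
    return "chunks" if first_text and first_text.strip() else "alignments"
```

Looking only at the first chunk keeps detection cheap, but it assumes a file never mixes chunks with and without text.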
In chunks mode:
- A search query matches when all terms appear within the same chunk.
- Query terms that fall in adjacent chunks won't match as a combined query, even if they are close together in the audio.
- Because chunks are larger than individual sentences, cross-sentence queries within the same chunk will match.
In alignments mode:
- A search query matches when all terms appear within the same alignment segment.
- Results are finer-grained than in chunks mode, but all query terms must fit within a single sentence or paragraph.
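The difference between the two modes can be reproduced with Python's built-in sqlite3 module and a throwaway in-memory FTS5 database. The sketch indexes the same text once as a single chunk row and once as per-sentence rows. The table names and sample text are made up, and the sketch assumes your SQLite build includes FTS5, as most do.

```python
import sqlite3

# Hypothetical data: one VAD chunk that contains two aligned sentences.
chunk_text = "Temperatures rose again this year. Scientists blame climate change."
sentences = ["Temperatures rose again this year.", "Scientists blame climate change."]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(text)")
conn.execute("CREATE VIRTUAL TABLE segments USING fts5(text)")
conn.execute("INSERT INTO chunks(text) VALUES (?)", (chunk_text,))
conn.executemany("INSERT INTO segments(text) VALUES (?)", [(s,) for s in sentences])


def count_matches(table: str, query: str) -> int:
    # `table` is one of our two fixed names, so f-string interpolation is safe here.
    sql = f"SELECT count(*) FROM {table} WHERE {table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


# "temperatures" and "climate" sit in different sentences of the same chunk.
print(count_matches("chunks", "temperatures climate"))    # 1: same chunk
print(count_matches("segments", "temperatures climate"))  # 0: no single sentence has both
```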
The document page displays the transcript at the alignment segment level in both modes, so you can click any sentence to jump to that point in the audio.
Search syntax
The search uses SQLite’s FTS5 full-text search engine. The following query syntax is supported:
| Query | Matches |
|---|---|
| climate change | Chunks (or segments) containing both words (implicit AND) |
| "climate change" | The exact phrase |
| climate OR weather | Either word |
| climate NOT weather | climate but not weather |
| econom* | Prefix match: economy, economic, etc. |
| NEAR(climate change, 3) | Both words, separated by no more than 3 other tokens |
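To see what each query form matches in plain FTS5, run them against a small in-memory table with Python's sqlite3 module. This page doesn't say whether easysearch rewrites queries before passing them to FTS5, so treat the results as FTS5's own behaviour. The sample rows are made up, and the sketch assumes your SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(text)")
rows = [
    "climate change is accelerating",             # rowid 1
    "climate and weather changed suddenly",       # rowid 2
    "economic policy and climate",                # rowid 3
    "change in the global and regional climate",  # rowid 4
]
conn.executemany("INSERT INTO docs(text) VALUES (?)", [(r,) for r in rows])


def search(query: str) -> list[int]:
    """Return the rowids matching an FTS5 query, in rowid order."""
    cur = conn.execute(
        "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [r[0] for r in cur]


print(search("climate change"))           # [1, 4]  both words, any position
print(search('"climate change"'))         # [1]     exact phrase only
print(search("climate OR weather"))       # [1, 2, 3, 4]
print(search("climate NOT weather"))      # [1, 3, 4]
print(search("econom*"))                  # [3]     "economic"
print(search("NEAR(climate change, 3)"))  # [1]     row 4 has 5 tokens in between
```

Row 2 never matches "change" because the default unicode61 tokenizer does no stemming, so "changed" is a different token. Use a prefix query such as chang* to catch both forms. FTS5 also treats AND, OR and NOT as operators only when they are written in uppercase. In raw FTS5, climate or weather is an implicit AND of three terms.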
CLI reference
```bash
easysearch --help
```

| Option | Default | Description |
|---|---|---|
| --alignments-dir | output/alignments | Directory containing alignment JSON files |
| --audio-dir | data | Directory containing source audio files |
| --db | search.db | Path to the SQLite database file |
| --host | 127.0.0.1 | Host to bind to |
| --port | 8642 | Port to listen on |
| --per-page | 20 | Results per page |
| --snippets-per-doc | 5 | Maximum number of matching snippets shown per document in search results |
| --reindex | | Force a full re-index of all JSON files |
| --index-mode | auto | Override the index mode: chunks or alignments (auto-detected if omitted) |
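The sketch below shows one way --per-page and --snippets-per-doc could be applied to search results. It ranks FTS5 hits with the built-in `rank` (bm25) column, extracts highlighted excerpts with FTS5's `snippet()` function, caps the number of snippets per document, and paginates by document. The `segments` table, its `doc` column, and `search_page` are hypothetical. easysearch's actual schema and query are not documented here.

```python
import sqlite3


def search_page(conn: sqlite3.Connection, query: str, page: int = 1,
                per_page: int = 20, snippets_per_doc: int = 5) -> list[tuple[str, list[str]]]:
    """Group FTS5 hits by document, best-ranked document first.

    Keeps at most `snippets_per_doc` snippets per document and returns the
    `per_page` documents for the requested 1-based `page`.
    """
    rows = conn.execute(
        # Column 1 is `text`; matches are wrapped in [ ] and snippets capped at 8 tokens.
        "SELECT doc, snippet(segments, 1, '[', ']', '...', 8) "
        "FROM segments WHERE segments MATCH ? ORDER BY rank",
        (query,),
    )
    grouped: dict[str, list[str]] = {}  # insertion order = order of each doc's best hit
    for doc, snip in rows:
        hits = grouped.setdefault(doc, [])
        if len(hits) < snippets_per_doc:
            hits.append(snip)
    start = (page - 1) * per_page
    return list(grouped.items())[start:start + per_page]


# Hypothetical schema: one FTS5 row per chunk or segment, tagged with its source file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE segments USING fts5(doc UNINDEXED, text)")
conn.executemany(
    "INSERT INTO segments (doc, text) VALUES (?, ?)",
    [("a.json", f"climate report part {i}") for i in range(8)]
    + [("b.json", "weather and climate"), ("c.json", "no match here")],
)
for doc, snippets in search_page(conn, "climate"):
    print(doc, len(snippets))  # a.json contributes 5 of its 8 hits; b.json 1
```

Grouping in Python after a single ranked query keeps the SQL simple. For very large result sets you would push the per-document limit and the pagination into SQL instead.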