Search Interface

easysearch is a lightweight search interface for browsing and querying transcription outputs, built into easytranscriber. It indexes the alignment JSON files into a SQLite database with full-text search and serves a web UI for searching, browsing documents, and playing back audio with synchronized transcript highlighting.

See the demo transcription for a preview of the synchronized highlighting.

Installation

The easysearch dependencies are optional. Install them with:

pip install easytranscriber[search]

Quick start

After running the transcription pipeline, start the search server by pointing it at your alignment outputs and audio files:

easysearch --alignments-dir output/alignments --audio-dir data/audio

This will:

  1. Index all alignment JSON files into a local SQLite database (search.db).
  2. Start a web server at http://127.0.0.1:8642.

On subsequent launches, only new or modified files are re-indexed. Use --reindex to force a full re-index.
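
How easysearch decides that a file is new or modified is not described here. One common approach, shown below as a minimal sketch under that assumption, is to store each file's modification time and size and compare them on the next launch. The table name indexed_files and the function files_needing_index are hypothetical and are not part of easysearch.

```python
import sqlite3
from pathlib import Path


def files_needing_index(conn: sqlite3.Connection, alignments_dir: Path,
                        reindex: bool = False) -> list[Path]:
    """Return JSON files that are new or changed since the last run.

    Hypothetical helper: easysearch's actual change detection may differ.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER)"
    )
    if reindex:
        # Mimics a forced full re-index: forget every file seen so far.
        conn.execute("DELETE FROM indexed_files")

    seen = {
        path: (mtime_ns, size)
        for path, mtime_ns, size in conn.execute(
            "SELECT path, mtime_ns, size FROM indexed_files"
        )
    }

    pending = []
    for json_path in sorted(alignments_dir.rglob("*.json")):
        stat = json_path.stat()
        fingerprint = (stat.st_mtime_ns, stat.st_size)
        if seen.get(str(json_path)) != fingerprint:
            pending.append(json_path)
            # A real indexer would index the file's chunks before recording it.
            conn.execute(
                "INSERT OR REPLACE INTO indexed_files VALUES (?, ?, ?)",
                (str(json_path), *fingerprint),
            )
    conn.commit()
    return pending
```

Calling the function twice on an unchanged directory returns an empty list the second time; passing reindex=True returns every file again.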

Audio playback

Clicking a search result takes you to the document page at the matching timestamp. The audio player seeks to that position and begins playback. The transcript view highlights the currently playing word in real time, and you can click any sentence to jump to that point in the audio.

Note

Some browsers block autoplay by default. If audio doesn’t start automatically when navigating from a search result, a play button overlay will appear – click it to begin playback.
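
Highlighting the currently playing word comes down to finding the word whose time span contains the playback position. The sketch below illustrates that lookup with a binary search. It is written in Python for illustration only: the actual web UI runs in the browser, and the word timing layout used here (parallel lists of start and end times in seconds) is an assumption, not the alignment JSON schema.

```python
from bisect import bisect_right
from typing import Optional


def current_word_index(word_starts: list[float], word_ends: list[float],
                       position: float) -> Optional[int]:
    """Return the index of the word being spoken at `position` seconds.

    Assumes words are sorted by start time and do not overlap.
    Returns None during silence between words.
    """
    # Last word that starts at or before the playback position.
    i = bisect_right(word_starts, position) - 1
    if i >= 0 and position < word_ends[i]:
        return i
    return None


# Hypothetical timings in seconds, for illustration only.
starts = [0.00, 0.42, 0.90]
ends = [0.40, 0.85, 1.30]
words = ["climate", "change", "matters"]

idx = current_word_index(starts, ends, 0.50)
print(words[idx] if idx is not None else "(silence)")  # prints "change"
```

The same lookup in reverse (from a clicked sentence to its start time) gives the seek position for the audio player.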

How indexing works

easysearch indexes transcriptions at the chunk level. Each chunk is a continuous speech region (~20–30 seconds) produced by voice activity detection, and may span multiple sentences. Each chunk becomes a searchable row in the database.

This means:

  • A search query matches when all terms appear within the same chunk.
  • A query or phrase whose words straddle the boundary between adjacent chunks won’t match, because each chunk is searched as a separate row.
  • Because chunks are larger than individual sentences, cross-sentence queries within the same chunk will match.

The document page still displays the transcript at the alignment segment (sentence) level, so you can click any sentence to jump to that point in the audio.
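
The chunk-level behaviour described above can be reproduced with a small FTS5 table. This is a minimal sketch, assuming a Python build whose SQLite includes FTS5. The table name chunks, its columns, and the sample text are hypothetical and do not reflect the actual search.db schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one FTS5 row per voice-activity-detection chunk.
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5("
    "doc_id UNINDEXED, start_s UNINDEXED, text)"
)
conn.executemany(
    "INSERT INTO chunks (doc_id, start_s, text) VALUES (?, ?, ?)",
    [
        # rowid 1: two sentences inside one chunk
        ("talk1", 0.0, "We need to talk about the climate. Change is coming fast."),
        # rowid 2 and 3: one sentence cut in two by a chunk boundary
        ("talk1", 24.5, "The economy depends on a stable"),
        ("talk1", 51.2, "climate and on reliable weather forecasts."),
    ],
)


def search(query: str) -> list[int]:
    """Return the rowids of chunks matching an FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM chunks WHERE chunks MATCH ? ORDER BY rowid",
        (query,),
    )
    return [rowid for (rowid,) in rows]


print(search("climate change"))    # [1]: both terms in one chunk, across sentences
print(search('"stable climate"'))  # []: the phrase straddles a chunk boundary
```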

Search syntax

The search uses SQLite’s FTS5 full-text search engine. The following query syntax is supported:

Query                      Matches
climate change             Chunks containing both words (implicit AND)
"climate change"           Exact phrase
climate OR weather         Either word
climate NOT weather        climate but not weather
econom*                    Prefix match: economy, economic, etc.
NEAR(climate change, 3)    Both words within 3 tokens of each other
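
The query forms above can be tried directly against an FTS5 table from Python. This is a minimal sketch, assuming a Python build whose SQLite includes FTS5 and the default tokenizer. The table name chunks and the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(text)")
conn.executemany(
    "INSERT INTO chunks (text) VALUES (?)",
    [
        ("climate change affects the weather",),            # rowid 1
        ("the climate of the region will slowly change",),  # rowid 2
        ("economic growth and the weather",),               # rowid 3
    ],
)


def match(query: str) -> list[int]:
    """Return the rowids of rows matching an FTS5 query string."""
    rows = conn.execute(
        "SELECT rowid FROM chunks WHERE chunks MATCH ? ORDER BY rowid",
        (query,),
    )
    return [rowid for (rowid,) in rows]


queries = [
    "climate change",           # [1, 2]    implicit AND
    '"climate change"',         # [1]       exact phrase
    "climate OR weather",       # [1, 2, 3] either word
    "climate NOT weather",      # [2]       climate without weather
    "econom*",                  # [3]       prefix match
    "NEAR(climate change, 3)",  # [1]       row 2 has five tokens in between
]
for query in queries:
    print(f"{query!r:28} -> {match(query)}")
```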

CLI reference

easysearch --help
Option              Default            Description
--alignments-dir    output/alignments  Directory containing alignment JSON files
--audio-dir         data               Directory containing source audio files
--db                search.db          Path to the SQLite database file
--host              127.0.0.1          Host to bind to
--port              8642               Port to listen on
--per-page          20                 Results per page
--snippets-per-doc  5                  Max matching snippets shown per document in search results
--reindex           (none)             Force full re-index of all JSON files