feat(insights): add local (self-hosted) transcription & translation provider #844
xynstr wants to merge 2 commits into mynaparrot:main
…rovider

Adds a new 'local' provider type implementing insights.Provider. The provider talks to a separate companion service over a minimal WebSocket protocol (PCM16 audio in, partial/final text out) and an Azure-compatible HTTP endpoint for translation.

Reference implementation: https://github.com/xynstr/plugnmeet-local-insights (MIT).

Useful for GDPR-sensitive or air-gapped deployments where sending audio to Azure/Google is not an option, and for cost-sensitive operators running on CPU-only hardware.

- Purely additive: no changes to existing interfaces or protobuf.
- config_sample.yaml: commented example block.
- docs/providers/local.md: setup guide and protocol documentation.
Code Review
This pull request introduces a new 'local' insights provider for self-hosted transcription and translation, utilizing a companion service over WebSockets. The implementation includes a new documentation file, configuration samples, and the Go client logic. Feedback focuses on optimizing the HTTP client usage for translation requests, preventing potential memory exhaustion when reading response bodies, and improving the robustness of the transcription stream's read loop and memory management during audio sample processing.
type LocalProvider struct {
	account *config.ProviderAccount
	service *config.ServiceConfig
	logger  *logrus.Entry
}

// NewProvider creates a new LocalProvider.
func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
	return &LocalProvider{
		account: providerAccount,
		service: serviceConfig,
		logger:  log.WithField("service", "local"),
	}, nil
Creating a new http.Client for every translation request is inefficient as it prevents connection reuse (TCP/TLS keep-alive). It is recommended to initialize the client once in the LocalProvider struct and reuse it.
- type LocalProvider struct {
- 	account *config.ProviderAccount
- 	service *config.ServiceConfig
- 	logger  *logrus.Entry
- }
- // NewProvider creates a new LocalProvider.
- func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
- 	return &LocalProvider{
- 		account: providerAccount,
- 		service: serviceConfig,
- 		logger:  log.WithField("service", "local"),
- 	}, nil
+ type LocalProvider struct {
+ 	account    *config.ProviderAccount
+ 	service    *config.ServiceConfig
+ 	logger     *logrus.Entry
+ 	httpClient *http.Client
+ }
+ // NewProvider creates a new LocalProvider.
+ func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
+ 	return &LocalProvider{
+ 		account:    providerAccount,
+ 		service:    serviceConfig,
+ 		logger:     log.WithField("service", "local"),
+ 		httpClient: &http.Client{Timeout: 15 * time.Second},
+ 	}, nil
+ }
	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)

	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		bodyBytes, _ := io.ReadAll(resp.Body)
Using io.ReadAll on an HTTP response body without a limit can lead to excessive memory consumption if the server returns a large response. Consider using io.LimitReader to read only a reasonable amount of data for error messages.
- 		bodyBytes, _ := io.ReadAll(resp.Body)
+ 		bodyBytes, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
			resultsChan <- &insights.TranscriptionEvent{
				Type:  insights.EventTypeError,
				Error: err.Error(),
			}
Sending to resultsChan can block indefinitely if the consumer stops reading from the channel. This would hang the readLoop goroutine. Use a select block with ctx.Done() to ensure the goroutine can exit even if the channel is full and the consumer is gone. This pattern should be applied to all sends to resultsChan within the loop.
select {
case resultsChan <- &insights.TranscriptionEvent{
	Type:  insights.EventTypeError,
	Error: err.Error(),
}:
case <-ctx.Done():
	return
}

		return fmt.Errorf("stream is closed")
	}

	buf := make([]byte, len(sample)*2)
- Reuse http.Client across translation requests (TCP/TLS keep-alive).
- Bound error response body reads with io.LimitReader (1 KiB) to avoid unbounded memory consumption on malformed companion responses.
- Send all readLoop events through a select with ctx.Done() so the goroutine can unwind when the consumer stops reading.
- Pool PCM-encoding scratch buffers with sync.Pool to reduce GC pressure on the hot WriteSample path.
Thanks for the review! All five points are valid; pushed a follow-up commit (57d7f9f) that addresses them:

Local verification:

Superseded by #848, reworked along the lines you suggested (pure Go, on top of …
Motivation
Adds a 'local' provider to the Insights system for self-hosted, privacy-preserving transcription and translation. Useful for:
- … blocked by policy.
Design
- … insights.Provider interface: no changes to existing interfaces or protobuf messages.
- … WebSocket protocol (documented in docs/providers/local.md).
- … with another backend (whisper.cpp, Vosk, Deepgram self-hosted, …) without touching any Go code.
The reference companion service lives in a separate repository
(https://github.com/xynstr/plugnmeet-local-insights, MIT licensed)
so it can have its own Python-native lifecycle and CI.
Scope
Purely additive. No behaviour change for existing azure/google users.
Testing
- … per-language translation round-trip verified.
- … small int8, VAD filter enabled, 500 ms chunks.
- … target languages take roughly the same wall time as 1.
Compatibility & Licenses
- … CC-BY-NC 4.0. The companion README documents this explicitly and points commercial operators at permissive alternatives (e.g. opus-mt-*). Transcription (Whisper) is MIT-compatible.
- … translate_url), the translation model never loads and its license is not triggered.
Out of scope for this PR
- … SynthesizeText) and AI chat (AITextChat*) are stubbed with explicit "not supported" responses; the local provider focuses on STT and MT only.
Happy to iterate on naming, packaging, or documentation based on review
feedback.