
feat(insights): add local (self-hosted) transcription & translation provider #844

Closed

xynstr wants to merge 2 commits into mynaparrot:main from xynstr:feat/local-insights-provider

Conversation


@xynstr xynstr commented Apr 24, 2026

Motivation

Adds a local provider to the Insights system for self-hosted,
privacy-preserving transcription and translation. Useful for:

  • GDPR-sensitive deployments where sending audio to Azure/Google is
    blocked by policy.
  • Air-gapped / on-premise installations.
  • Cost-sensitive operators running on CPU-only hardware.

Design

  • Implements the existing insights.Provider interface — no changes
    to existing interfaces or protobuf messages.
  • Communicates with a separate companion service over a minimal
    WebSocket protocol (documented in docs/providers/local.md).
  • Anyone can replace the reference companion (faster-whisper + NLLB)
    with another backend (whisper.cpp, Vosk, Deepgram self-hosted, …)
    without touching any Go code.

The reference companion service lives in a separate repository
(https://github.com/xynstr/plugnmeet-local-insights, MIT licensed)
so it can have its own Python-native lifecycle and CI.
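
For orientation, here is a rough sketch of what the message shapes of that WebSocket protocol could look like; the authoritative definition lives in docs/providers/local.md and the companion repository, and every name below is an assumption rather than the actual wire format:

    // Illustrative only; see docs/providers/local.md for the real schema.
    package localproto

    // Handshake is the first JSON text frame the provider sends after the
    // WebSocket opens.
    type Handshake struct {
        Language   string `json:"language"`    // source language, e.g. "en-US"
        SampleRate int    `json:"sample_rate"` // PCM16 sample rate in Hz
    }

    // After the handshake, audio is streamed as binary frames of raw
    // little-endian PCM16 samples.

    // Result is returned by the companion service as JSON text frames.
    type Result struct {
        Type string `json:"type"` // "partial" or "final"
        Text string `json:"text"` // transcribed text
    }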

Scope

pkg/insights/providers/local/client.go       (provider + translation)
pkg/insights/providers/local/transcribe.go   (WebSocket stream client)
pkg/insights/providers/local/languages.go    (supported language list)
pkg/services/insights/insights_service.go    (+3 lines: register "local")
config_sample.yaml                           (+ commented example block)
docs/providers/local.md                      (new: setup guide)

Purely additive. No behaviour change for existing azure / google
users.
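
As an illustration of the "+3 lines: register 'local'" hook listed above, a hypothetical sketch; only local.NewProvider and its signature come from this PR, and the surrounding selection logic in insights_service.go is assumed:

    // Hypothetical fragment; the real switch in insights_service.go may differ.
    func buildProvider(account *config.ProviderAccount, svc *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
        switch account.Provider { // field name is an assumption
        case "local":
            return local.NewProvider(account, svc, log)
        default:
            return nil, fmt.Errorf("unsupported insights provider: %s", account.Provider)
        }
    }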

Testing

  • End-to-end smoke-tested on ARM64 (Neoverse-N1, 10 cores) and x86_64.
  • Handshake → audio streaming → partial/final transcription →
    per-language translation round-trip verified.
  • faster-whisper small int8, VAD filter enabled, 500 ms chunks.
  • Multi-language translation batched in a single CTranslate2 call — 3
    target languages take roughly the same wall time as 1.

Compatibility & Licenses

  • This PR is MIT-licensed, like the rest of the project.
  • The reference companion's translation model (NLLB-200) is
    CC-BY-NC 4.0. The companion README documents this explicitly and
    points commercial operators at permissive alternatives
    (e.g. opus-mt-*). Transcription (Whisper) is MIT-compatible.
  • If only transcription is configured (no translate_url), the
    translation model never loads and its license is not triggered.

Out of scope for this PR

  • Speech synthesis (SynthesizeText) and AI chat (AITextChat*) are
    stubbed with explicit "not supported" responses — the local provider
    focuses on STT and MT only.
  • Batch file summarisation is likewise out of scope.

Happy to iterate on naming, packaging, or documentation based on review
feedback.

…rovider

Adds a new 'local' provider type implementing insights.Provider. The
provider talks to a separate companion service over a minimal WebSocket
protocol (PCM16 audio in, partial/final text out) and an Azure-compatible
HTTP endpoint for translation. Reference implementation:
https://github.com/xynstr/plugnmeet-local-insights (MIT).

Useful for GDPR-sensitive or air-gapped deployments where sending audio
to Azure/Google is not an option, and for cost-sensitive operators
running on CPU-only hardware.

- Purely additive: no changes to existing interfaces or protobuf.
- config_sample.yaml: commented example block.
- docs/providers/local.md: setup guide and protocol documentation.

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new 'local' insights provider for self-hosted transcription and translation, utilizing a companion service over WebSockets. The implementation includes a new documentation file, configuration samples, and the Go client logic. Feedback focuses on optimizing the HTTP client usage for translation requests, preventing potential memory exhaustion when reading response bodies, and improving the robustness of the transcription stream's read loop and memory management during audio sample processing.

Comment on lines +20 to +32
    type LocalProvider struct {
        account *config.ProviderAccount
        service *config.ServiceConfig
        logger  *logrus.Entry
    }

    // NewProvider creates a new LocalProvider.
    func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
        return &LocalProvider{
            account: providerAccount,
            service: serviceConfig,
            logger:  log.WithField("service", "local"),
        }, nil


medium

Creating a new http.Client for every translation request is inefficient as it prevents connection reuse (TCP/TLS keep-alive). It is recommended to initialize the client once in the LocalProvider struct and reuse it.

Suggested change

Current:

    type LocalProvider struct {
        account *config.ProviderAccount
        service *config.ServiceConfig
        logger  *logrus.Entry
    }

    // NewProvider creates a new LocalProvider.
    func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
        return &LocalProvider{
            account: providerAccount,
            service: serviceConfig,
            logger:  log.WithField("service", "local"),
        }, nil

Suggested:

    type LocalProvider struct {
        account    *config.ProviderAccount
        service    *config.ServiceConfig
        logger     *logrus.Entry
        httpClient *http.Client
    }

    // NewProvider creates a new LocalProvider.
    func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
        return &LocalProvider{
            account:    providerAccount,
            service:    serviceConfig,
            logger:     log.WithField("service", "local"),
            httpClient: &http.Client{Timeout: 15 * time.Second},
        }, nil
    }

Comment thread pkg/insights/providers/local/client.go Outdated
Comment on lines +86 to +87
    client := &http.Client{Timeout: 15 * time.Second}
    resp, err := client.Do(req)


medium

Use the shared httpClient instead of creating a new one for each request.

Suggested change

Current:

    client := &http.Client{Timeout: 15 * time.Second}
    resp, err := client.Do(req)

Suggested:

    resp, err := p.httpClient.Do(req)

Comment thread pkg/insights/providers/local/client.go Outdated
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        bodyBytes, _ := io.ReadAll(resp.Body)


medium

Using io.ReadAll on an HTTP response body without a limit can lead to excessive memory consumption if the server returns a large response. Consider using io.LimitReader to read only a reasonable amount of data for error messages.

Suggested change

Current:

    bodyBytes, _ := io.ReadAll(resp.Body)

Suggested:

    bodyBytes, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))

Comment on lines +101 to +104
    resultsChan <- &insights.TranscriptionEvent{
        Type:  insights.EventTypeError,
        Error: err.Error(),
    }


medium

Sending to resultsChan can block indefinitely if the consumer stops reading from the channel. This would hang the readLoop goroutine. Use a select block with ctx.Done() to ensure the goroutine can exit even if the channel is full and the consumer is gone. This pattern should be applied to all sends to resultsChan within the loop.

				select {
				case resultsChan <- &insights.TranscriptionEvent{
					Type:  insights.EventTypeError,
					Error: err.Error(),
				}:
				case <-ctx.Done():
					return
				}

        return fmt.Errorf("stream is closed")
    }

    buf := make([]byte, len(sample)*2)


medium

WriteSample is called frequently for every audio chunk. Allocating a new byte slice buf on every call increases GC pressure. Consider using a sync.Pool or a pre-allocated buffer protected by the existing mutex to improve performance.
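
A minimal sketch of the pooling idea, assuming WriteSample encodes int16 samples to little-endian PCM16 before sending; apart from the buf allocation quoted above, all names here are assumptions, not the actual transcribe.go code:

    package local

    import (
        "encoding/binary"
        "sync"
    )

    var pcmBufPool = sync.Pool{
        New: func() any {
            b := make([]byte, 0, 16*1024) // roomy enough for a 500 ms chunk
            return &b
        },
    }

    // encodePCM16 converts samples to little-endian PCM16 using a pooled
    // buffer and hands the bytes to send before returning the buffer.
    func encodePCM16(samples []int16, send func([]byte) error) error {
        bp := pcmBufPool.Get().(*[]byte)
        defer pcmBufPool.Put(bp) // runs after send, so buf is no longer in use

        buf := *bp
        need := len(samples) * 2
        if cap(buf) < need {
            buf = make([]byte, need)
        }
        buf = buf[:need]
        for i, v := range samples {
            binary.LittleEndian.PutUint16(buf[i*2:], uint16(v))
        }
        *bp = buf // keep the (possibly grown) backing array for reuse
        return send(buf)
    }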

- Reuse http.Client across translation requests (TCP/TLS keep-alive).
- Bound error response body reads with io.LimitReader (1 KiB) to avoid
  unbounded memory consumption on malformed companion responses.
- Send all readLoop events through a select with ctx.Done() so the
  goroutine can unwind when the consumer stops reading.
- Pool PCM-encoding scratch buffers with sync.Pool to reduce GC pressure
  on the hot WriteSample path.

xynstr (Author) commented Apr 24, 2026

Thanks for the review! All five points are valid — pushed a follow-up commit (57d7f9f) that addresses them:

  1. Reusable http.Client on the LocalProvider struct (this also covers point #2, the shared-client suggestion).
  2. io.LimitReader bounded to 1 KiB when reading error response bodies.
  3. select with ctx.Done() on every send to resultsChan in readLoop via a small emit closure (sketched below), so the goroutine can unwind even if the consumer stopped reading.
  4. sync.Pool of []byte buffers for WriteSample to reduce GC pressure on the hot audio-frame path.

Local verification: gofmt clean, go vet ./pkg/insights/providers/local/... clean, full go build produces a working binary.
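
For reference, the emit closure mentioned in point 3 could look roughly like this; only resultsChan, ctx, and the TranscriptionEvent shape appear in the review thread above, the surrounding readLoop code is assumed:

    // Returns false once the context is cancelled, so the goroutine can
    // unwind instead of blocking on a full channel.
    emit := func(ev *insights.TranscriptionEvent) bool {
        select {
        case resultsChan <- ev:
            return true
        case <-ctx.Done():
            return false
        }
    }

    // each send site inside readLoop then becomes:
    if !emit(&insights.TranscriptionEvent{
        Type:  insights.EventTypeError,
        Error: err.Error(),
    }) {
        return // consumer stopped reading; unwind the goroutine
    }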


xynstr (Author) commented Apr 28, 2026

Superseded by #848 — reworked along the lines you suggested (pure Go, on top of openai-go, no extra language). The diff there has nothing in common with this branch any more, so a new PR felt cleaner than a force-push. Closing this one to keep the review surface focused. Thanks for the steer!

@xynstr xynstr closed this Apr 28, 2026
