12 changes: 12 additions & 0 deletions config_sample.yaml
@@ -252,6 +252,18 @@ insights:
credentials:
api_key: "YOUR_GEMINI_API_KEY_HERE"

# Self-hosted transcription & translation. Talks to a companion
# service over WebSocket (PCM16). Reference implementation:
# https://github.com/xynstr/plugnmeet-local-insights
# Useful for GDPR-sensitive or air-gapped deployments. Setup guide:
# docs/providers/local.md
#local:
#  - id: "local-01"
#    credentials: { api_key: "", region: "" }
#    options:
#      whisper_url: "ws://whisper-local:8002/ws/transcribe"
#      translate_url: "http://whisper-local:8002/translate"

# 2. Define the services that USE the providers.
# The key ("transcription", "translation", "ai_text_chat", "meeting_summarizing") is the service name.
services:
104 changes: 104 additions & 0 deletions docs/providers/local.md
@@ -0,0 +1,104 @@
# Local (Self-Hosted) Insights Provider

The `local` provider enables self-hosted transcription and translation
without sending audio to a cloud vendor. It is useful for GDPR-sensitive
deployments, air-gapped environments, or cost-sensitive operators.

## Architecture

```
LiveKit audio ──► plugNmeet-server ──► local provider
                                          ├─ WebSocket (PCM16 16 kHz mono)
                                       companion service (Python)
                                          ├─ faster-whisper (STT)
                                          └─ NLLB-200 (translation, optional)
```

The `local` provider is a thin Go client that speaks a simple WebSocket
protocol. The heavy lifting (speech recognition, translation) happens in a
separate companion service written in Python. The reference
implementation is available at:

**<https://github.com/xynstr/plugnmeet-local-insights>** (MIT licensed)

## Configuration

Add this block to your `config.yaml`:

```yaml
insights:
  enabled: true
  providers:
    local:
      - id: "local-01"
        credentials:
          api_key: ""
          region: ""
        options:
          whisper_url: "ws://whisper-local:8002/ws/transcribe"
          translate_url: "http://whisper-local:8002/translate"
  services:
    transcription:
      provider: "local"
      id: "local-01"
      options: {}
    translation:
      provider: "local"
      id: "local-01"
      options: {}
```

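`whisper_url` is the WebSocket endpoint of the companion service; the
provider returns a configuration error for transcription if it is empty.
`translate_url` is the HTTP translation endpoint. It is optional and only
needed when the translation service is used.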
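
To make the role of `translate_url` concrete, the sketch below shows how a
request against it could be assembled. It mirrors the request shape used by
`TranslateText` in this change (`from` and `to` query parameters, and a JSON
array of `{"Text": ...}` objects as the body). The helper name
`buildTranslateRequest` and the sample values are illustrative assumptions,
not part of the provider.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// buildTranslateRequest is a hypothetical helper. It returns the full
// request URL and the JSON body for translating one text into several
// target languages.
func buildTranslateRequest(translateURL, text, from string, to []string) (string, []byte, error) {
	u, err := url.Parse(translateURL)
	if err != nil {
		return "", nil, fmt.Errorf("parse translate_url: %w", err)
	}

	// One "from" parameter and one "to" parameter per target language.
	q := u.Query()
	q.Add("from", from)
	for _, l := range to {
		q.Add("to", l)
	}
	u.RawQuery = q.Encode()

	// The body is a JSON array with a single {"Text": "..."} element.
	body, err := json.Marshal([]struct {
		Text string `json:"Text"`
	}{{Text: text}})
	if err != nil {
		return "", nil, fmt.Errorf("marshal body: %w", err)
	}
	return u.String(), body, nil
}

func main() {
	reqURL, body, err := buildTranslateRequest(
		"http://whisper-local:8002/translate", "Hallo", "de", []string{"en", "fr"})
	if err != nil {
		panic(err)
	}
	fmt.Println(reqURL)       // http://whisper-local:8002/translate?from=de&to=en&to=fr
	fmt.Println(string(body)) // [{"Text":"Hallo"}]
}
```

The body is sent with `Content-Type: application/json` via `POST`. A backend
that replaces the companion service would need to accept this shape and
answer with a JSON array whose first element carries a `translations` list of
`text` and `to` pairs.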

## Running the companion service

```bash
docker run -d --name whisper-local \
--network plugnmeet_net \
-p 8002:8002 \
ghcr.io/xynstr/plugnmeet-local-insights:latest
```

Or build from source — see the companion repo's README for details.

## Supported languages

Transcription (via faster-whisper):
`de`, `en`, `ar`, `uk`, `ru`, `fr`, `es`, `it`, `pl`, `tr`, `fa`, `zh`,
`ja`, `ko`, `pt`, `nl`.

Translation (via NLLB-200, optional): same list. NLLB-200 is distributed
under a non-commercial license; see the companion repo for details.

## Hardware notes

The reference implementation defaults to CPU with int8 quantization.
It has been tested on:

- ARM64 (Neoverse-N1, 10 cores) — `small` Whisper model, real-time
transcription feasible with VAD and 500 ms chunks.
- x86_64 — similar performance profile.

For GPU, switch the companion service to `device=cuda` via environment
variables (see companion repo).
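
For sizing buffers, it can help to translate the audio format into bytes.
The sketch below assumes the PCM16, 16 kHz, mono format from the
architecture diagram and the 500 ms chunk length mentioned above; the helper
names `chunkBytes` and `splitPCM` are hypothetical and do not exist in the
provider or the companion service.

```go
package main

import "fmt"

const (
	sampleRate     = 16000 // samples per second (16 kHz)
	bytesPerSample = 2     // PCM16: 16 bits per sample
	channels       = 1     // mono
)

// chunkBytes returns the size in bytes of an audio chunk lasting ms milliseconds.
func chunkBytes(ms int) int {
	return sampleRate * bytesPerSample * channels * ms / 1000
}

// splitPCM cuts pcm into consecutive chunks of at most size bytes.
// The last chunk may be shorter than size.
func splitPCM(pcm []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var out [][]byte
	for len(pcm) > 0 {
		n := size
		if len(pcm) < n {
			n = len(pcm)
		}
		out = append(out, pcm[:n])
		pcm = pcm[n:]
	}
	return out
}

func main() {
	size := chunkBytes(500)
	fmt.Println(size) // 16000

	// 1.25 s of silence: two full chunks plus a 250 ms remainder.
	chunks := splitPCM(make([]byte, 40000), size)
	fmt.Println(len(chunks), len(chunks[2])) // 3 8000
}
```

At this format one second of audio is 32,000 bytes, so a 500 ms chunk is
16,000 bytes per speaker stream.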

## Protocol

The WebSocket protocol is intentionally minimal:

```
Client → Server {"type":"start","lang":"de","transLangs":["en"]}
Client → Server <binary PCM16 audio frames>
Server → Client {"type":"partial","text":"...","lang":"de"}
Server → Client {"type":"final","text":"...","lang":"de"}
Server → Client {"type":"error","error":"..."}
Client → Server {"type":"end"}
```

Anyone implementing a different backend (e.g., whisper.cpp, Vosk, Deepgram
self-hosted) can replace the companion service without changing any Go
code, as long as this protocol is honored.
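
As a rough illustration of the message shapes listed above, the following
sketch encodes the `start` message and decodes server events with
`encoding/json`. The type and function names (`startMessage`, `serverEvent`,
`encodeStart`, `decodeEvent`) are assumptions made for this example; the
actual stream implementation in the provider may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startMessage is the first text frame sent by the client.
type startMessage struct {
	Type       string   `json:"type"`
	Lang       string   `json:"lang"`
	TransLangs []string `json:"transLangs,omitempty"`
}

// serverEvent covers the "partial", "final" and "error" messages.
type serverEvent struct {
	Type  string `json:"type"`
	Text  string `json:"text,omitempty"`
	Lang  string `json:"lang,omitempty"`
	Error string `json:"error,omitempty"`
}

// encodeStart builds the JSON payload of the "start" message.
func encodeStart(lang string, transLangs []string) ([]byte, error) {
	return json.Marshal(startMessage{Type: "start", Lang: lang, TransLangs: transLangs})
}

// decodeEvent parses one text frame from the server. It returns an error
// for malformed JSON, for "error" events, and for unknown event types.
func decodeEvent(raw []byte) (serverEvent, error) {
	var ev serverEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return ev, fmt.Errorf("decode server event: %w", err)
	}
	switch ev.Type {
	case "partial", "final":
		return ev, nil
	case "error":
		return ev, fmt.Errorf("server error: %s", ev.Error)
	default:
		return ev, fmt.Errorf("unknown event type %q", ev.Type)
	}
}

func main() {
	start, err := encodeStart("de", []string{"en"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(start)) // {"type":"start","lang":"de","transLangs":["en"]}

	ev, err := decodeEvent([]byte(`{"type":"final","text":"Hallo","lang":"de"}`))
	fmt.Println(ev.Type, ev.Text, err) // final Hallo <nil>
}
```

Audio itself travels as binary frames between the `start` and `end`
messages, so only the control messages need JSON handling.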
158 changes: 158 additions & 0 deletions pkg/insights/providers/local/client.go
@@ -0,0 +1,158 @@
package local

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"

	"github.com/mynaparrot/plugnmeet-protocol/plugnmeet"
	"github.com/mynaparrot/plugnmeet-server/pkg/config"
	"github.com/mynaparrot/plugnmeet-server/pkg/insights"
	"github.com/sirupsen/logrus"
)

// LocalProvider implements insights.Provider using local faster-whisper + NLLB translation.
type LocalProvider struct {
	account *config.ProviderAccount
	service *config.ServiceConfig
	logger  *logrus.Entry
}

// NewProvider creates a new LocalProvider.
func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
	return &LocalProvider{
		account: providerAccount,
		service: serviceConfig,
		logger:  log.WithField("service", "local"),
	}, nil
Comment on lines +20 to +34

medium

Creating a new http.Client for every translation request is inefficient as it prevents connection reuse (TCP/TLS keep-alive). It is recommended to initialize the client once in the LocalProvider struct and reuse it.

Suggested change
- type LocalProvider struct {
- 	account *config.ProviderAccount
- 	service *config.ServiceConfig
- 	logger  *logrus.Entry
- }
- // NewProvider creates a new LocalProvider.
- func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
- 	return &LocalProvider{
- 		account: providerAccount,
- 		service: serviceConfig,
- 		logger:  log.WithField("service", "local"),
- 	}, nil
+ type LocalProvider struct {
+ 	account    *config.ProviderAccount
+ 	service    *config.ServiceConfig
+ 	logger     *logrus.Entry
+ 	httpClient *http.Client
+ }
+ // NewProvider creates a new LocalProvider.
+ func NewProvider(providerAccount *config.ProviderAccount, serviceConfig *config.ServiceConfig, log *logrus.Entry) (insights.Provider, error) {
+ 	return &LocalProvider{
+ 		account:    providerAccount,
+ 		service:    serviceConfig,
+ 		logger:     log.WithField("service", "local"),
+ 		httpClient: &http.Client{Timeout: 15 * time.Second},
+ 	}, nil
+ }

}

// CreateTranscription opens a WebSocket connection to the local whisper service.
func (p *LocalProvider) CreateTranscription(ctx context.Context, roomId, userId string, options []byte) (insights.TranscriptionStream, error) {
	opts := &insights.TranscriptionOptions{}
	if len(options) > 0 {
		if err := json.Unmarshal(options, opts); err != nil {
			return nil, fmt.Errorf("failed to unmarshal transcription options: %w", err)
		}
	}

	whisperURL, _ := p.account.Options["whisper_url"].(string)
	if whisperURL == "" {
		return nil, fmt.Errorf("local provider: whisper_url not configured in account options")
	}

	return newTranscribeStream(ctx, whisperURL, roomId, userId, opts, p.logger)
}

// TranslateText calls the NLLB translation proxy (Azure Translation API-compatible).
func (p *LocalProvider) TranslateText(ctx context.Context, text, sourceLang string, targetLangs []string) (*plugnmeet.InsightsTextTranslationResult, error) {
	translateURL, _ := p.account.Options["translate_url"].(string)
	if translateURL == "" {
		return nil, fmt.Errorf("local provider: translate_url not configured in account options")
	}
	if len(targetLangs) == 0 {
		return nil, fmt.Errorf("at least one target language is required")
	}

	u, err := url.Parse(translateURL)
	if err != nil {
		return nil, fmt.Errorf("failed to parse translate_url: %w", err)
	}
	q := u.Query()
	q.Add("from", sourceLang)
	for _, l := range targetLangs {
		q.Add("to", l)
	}
	u.RawQuery = q.Encode()

	requestBody, err := json.Marshal([]struct {
		Text string `json:"Text"`
	}{{Text: text}})
	if err != nil {
		return nil, fmt.Errorf("failed to marshal translation request: %w", err)
	}

	req, err := http.NewRequestWithContext(ctx, "POST", u.String(), bytes.NewBuffer(requestBody))
	if err != nil {
		return nil, fmt.Errorf("failed to create translation request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)

medium

Use the shared httpClient instead of creating a new one for each request.

Suggested change
- client := &http.Client{Timeout: 15 * time.Second}
- resp, err := client.Do(req)
+ resp, err := p.httpClient.Do(req)

	if err != nil {
		return nil, fmt.Errorf("translation request failed: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		bodyBytes, _ := io.ReadAll(resp.Body)

medium

Using io.ReadAll on an HTTP response body without a limit can lead to excessive memory consumption if the server returns a large response. Consider using io.LimitReader to read only a reasonable amount of data for error messages.

Suggested change
- bodyBytes, _ := io.ReadAll(resp.Body)
+ bodyBytes, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))

		return nil, fmt.Errorf("translation request failed with status %d: %s", resp.StatusCode, string(bodyBytes))
	}

	var response []struct {
		Translations []struct {
			Text string `json:"text"`
			To   string `json:"to"`
		} `json:"translations"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
		return nil, fmt.Errorf("failed to decode translation response: %w", err)
	}
	if len(response) == 0 || len(response[0].Translations) == 0 {
		return nil, fmt.Errorf("empty translation response from local proxy")
	}

	translations := make(map[string]string)
	for _, t := range response[0].Translations {
		translations[t.To] = t.Text
	}

	return &plugnmeet.InsightsTextTranslationResult{
		SourceText:   text,
		SourceLang:   sourceLang,
		Translations: translations,
	}, nil
}

// SynthesizeText is not supported by the local provider.
func (p *LocalProvider) SynthesizeText(_ context.Context, _ []byte) (io.ReadCloser, error) {
	return nil, fmt.Errorf("speech synthesis not supported by local provider")
}

// GetSupportedLanguages returns the list of supported languages.
func (p *LocalProvider) GetSupportedLanguages(serviceType insights.ServiceType) []*plugnmeet.InsightsSupportedLangInfo {
	if langs, ok := supportedLanguages[serviceType]; ok {
		result := make([]*plugnmeet.InsightsSupportedLangInfo, len(langs))
		for i := range langs {
			result[i] = &langs[i]
		}
		return result
	}
	return make([]*plugnmeet.InsightsSupportedLangInfo, 0)
}

func (p *LocalProvider) AITextChatStream(_ context.Context, _ string, _ []*plugnmeet.InsightsAITextChatContent) (<-chan *plugnmeet.InsightsAITextChatStreamResult, error) {
	return nil, nil
}

func (p *LocalProvider) AIChatTextSummarize(_ context.Context, _ string, _ []*plugnmeet.InsightsAITextChatContent) (string, uint32, uint32, error) {
	return "", 0, 0, nil
}

func (p *LocalProvider) StartBatchSummarizeAudioFile(_ context.Context, _, _, _ string) (string, string, error) {
	return "", "", nil
}

func (p *LocalProvider) CheckBatchJobStatus(_ context.Context, _ string) (*insights.BatchJobResponse, error) {
	return nil, nil
}

func (p *LocalProvider) DeleteUploadedFile(_ context.Context, _ string) error {
	return nil
}
47 changes: 47 additions & 0 deletions pkg/insights/providers/local/languages.go
@@ -0,0 +1,47 @@
package local

import (
	"github.com/mynaparrot/plugnmeet-protocol/plugnmeet"
	"github.com/mynaparrot/plugnmeet-server/pkg/insights"
)

// supportedLanguages lists languages supported by faster-whisper (transcription)
// and NLLB-200 (translation). These are common language codes used in PlugNmeet.
var supportedLanguages = map[insights.ServiceType][]plugnmeet.InsightsSupportedLangInfo{
	insights.ServiceTypeTranscription: {
		{Code: "de", Name: "German", Locale: "de"},
		{Code: "en", Name: "English", Locale: "en"},
		{Code: "ar", Name: "Arabic", Locale: "ar"},
		{Code: "uk", Name: "Ukrainian", Locale: "uk"},
		{Code: "ru", Name: "Russian", Locale: "ru"},
		{Code: "fr", Name: "French", Locale: "fr"},
		{Code: "es", Name: "Spanish", Locale: "es"},
		{Code: "it", Name: "Italian", Locale: "it"},
		{Code: "pl", Name: "Polish", Locale: "pl"},
		{Code: "tr", Name: "Turkish", Locale: "tr"},
		{Code: "fa", Name: "Persian", Locale: "fa"},
		{Code: "zh", Name: "Chinese", Locale: "zh"},
		{Code: "ja", Name: "Japanese", Locale: "ja"},
		{Code: "ko", Name: "Korean", Locale: "ko"},
		{Code: "pt", Name: "Portuguese", Locale: "pt"},
		{Code: "nl", Name: "Dutch", Locale: "nl"},
	},
	insights.ServiceTypeTranslation: {
		{Code: "de", Name: "German", Locale: "de"},
		{Code: "en", Name: "English", Locale: "en"},
		{Code: "ar", Name: "Arabic", Locale: "ar"},
		{Code: "uk", Name: "Ukrainian", Locale: "uk"},
		{Code: "ru", Name: "Russian", Locale: "ru"},
		{Code: "fr", Name: "French", Locale: "fr"},
		{Code: "es", Name: "Spanish", Locale: "es"},
		{Code: "it", Name: "Italian", Locale: "it"},
		{Code: "pl", Name: "Polish", Locale: "pl"},
		{Code: "tr", Name: "Turkish", Locale: "tr"},
		{Code: "fa", Name: "Persian", Locale: "fa"},
		{Code: "zh", Name: "Chinese", Locale: "zh"},
		{Code: "ja", Name: "Japanese", Locale: "ja"},
		{Code: "ko", Name: "Korean", Locale: "ko"},
		{Code: "pt", Name: "Portuguese", Locale: "pt"},
		{Code: "nl", Name: "Dutch", Locale: "nl"},
	},
}