feat(insights): add OpenAI provider for transcription and translation #848
xynstr wants to merge 2 commits into mynaparrot:main
Conversation
Introduces a new "openai" insights provider implemented entirely in Go on top of github.com/openai/openai-go/v3. Replaces the third-party language-helper service experimented with in the previous draft PR; nothing outside this Go module is required to use it.

Capabilities

- Transcription via `POST /v1/audio/transcriptions`, chunked from PCM16 audio supplied through `TranscriptionStream.WriteSample`. Each chunk is encoded as an in-memory WAV and uploaded; one `final_result` event is emitted per chunk. Real-time partials require the Realtime websocket API, which openai-go does not yet cover and most self-hosted backends do not support, so we deliberately ship chunked-only.
- Translation via `POST /v1/chat/completions` with a JSON Schema response format whose properties are the requested target language codes. One round-trip yields all target translations; Strict mode forces the schema. Chat Completions (rather than the newer Responses API) keeps the provider compatible with self-hosted OpenAI-API endpoints (LocalAI, vLLM, llama.cpp-server, ...).
- Speech synthesis, AI text chat, and batch summarisation are stubbed out for now and can be added in follow-ups.

Configuration

- `providers.openai[].credentials.api_key` - required.
- `providers.openai[].options.base_url` - optional; point at any OpenAI-compatible endpoint to use a self-hosted backend.
- `providers.openai[].options.chunk_seconds` - transcription chunk duration in seconds (default 5).
- `services.transcription.options.model` / `services.translation.options.model` - override the defaults (whisper-1 and gpt-4o-mini respectively).
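The chunked transcription path above hinges on wrapping raw PCM16 samples in a WAV container entirely in memory before upload. The helper below is an illustrative sketch of that step, not the PR's actual code; the function name and signature are assumptions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// pcm16ToWAV wraps raw little-endian PCM16 samples in a minimal 44-byte
// RIFF/WAVE header, producing an uploadable in-memory WAV file.
// Hypothetical helper; the PR's real encoder may differ.
func pcm16ToWAV(pcm []byte, sampleRate, channels int) []byte {
	var buf bytes.Buffer
	byteRate := sampleRate * channels * 2 // 2 bytes per 16-bit sample
	blockAlign := channels * 2

	buf.WriteString("RIFF")
	binary.Write(&buf, binary.LittleEndian, uint32(36+len(pcm))) // remaining chunk size
	buf.WriteString("WAVE")

	buf.WriteString("fmt ")
	binary.Write(&buf, binary.LittleEndian, uint32(16)) // fmt chunk size
	binary.Write(&buf, binary.LittleEndian, uint16(1))  // audio format 1 = PCM
	binary.Write(&buf, binary.LittleEndian, uint16(channels))
	binary.Write(&buf, binary.LittleEndian, uint32(sampleRate))
	binary.Write(&buf, binary.LittleEndian, uint32(byteRate))
	binary.Write(&buf, binary.LittleEndian, uint16(blockAlign))
	binary.Write(&buf, binary.LittleEndian, uint16(16)) // bits per sample

	buf.WriteString("data")
	binary.Write(&buf, binary.LittleEndian, uint32(len(pcm)))
	buf.Write(pcm)
	return buf.Bytes()
}

func main() {
	// One 5-second chunk of 16 kHz mono audio: 16000 samples/s * 2 bytes * 5 s.
	chunk := make([]byte, 16000*2*5)
	wav := pcm16ToWAV(chunk, 16000, 1)
	fmt.Println(len(wav) == 44+len(chunk)) // 44-byte header + payload
}
```

Keeping the container in memory avoids temp files and lets each `chunk_seconds` window be posted as soon as it closes.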
Code Review
This pull request introduces a new OpenAI provider for the insights service, enabling transcription and translation capabilities. The implementation includes a chunked audio streaming mechanism for transcription and a JSON-schema-constrained chat completion approach for translation. The reviewer suggested refactoring the language list definition to avoid redundant allocations and replacing the use of 'goto' in the JSON parsing logic with more idiomatic Go control flow.
var supportedLanguages = map[insights.ServiceType][]plugnmeet.InsightsSupportedLangInfo{
	insights.ServiceTypeTranscription: whisperLanguages(),
	insights.ServiceTypeTranslation:   whisperLanguages(),
}

func whisperLanguages() []plugnmeet.InsightsSupportedLangInfo {
	return []plugnmeet.InsightsSupportedLangInfo{
		{Code: "af", Name: "Afrikaans", Locale: "af"},
		{Code: "ar", Name: "Arabic", Locale: "ar"},
		{Code: "az", Name: "Azerbaijani", Locale: "az"},
		{Code: "be", Name: "Belarusian", Locale: "be"},
		{Code: "bg", Name: "Bulgarian", Locale: "bg"},
		{Code: "bn", Name: "Bengali", Locale: "bn"},
		{Code: "bs", Name: "Bosnian", Locale: "bs"},
		{Code: "ca", Name: "Catalan", Locale: "ca"},
		{Code: "cs", Name: "Czech", Locale: "cs"},
		{Code: "cy", Name: "Welsh", Locale: "cy"},
		{Code: "da", Name: "Danish", Locale: "da"},
		{Code: "de", Name: "German", Locale: "de"},
		{Code: "el", Name: "Greek", Locale: "el"},
		{Code: "en", Name: "English", Locale: "en"},
		{Code: "es", Name: "Spanish", Locale: "es"},
		{Code: "et", Name: "Estonian", Locale: "et"},
		{Code: "fa", Name: "Persian", Locale: "fa"},
		{Code: "fi", Name: "Finnish", Locale: "fi"},
		{Code: "fr", Name: "French", Locale: "fr"},
		{Code: "gl", Name: "Galician", Locale: "gl"},
		{Code: "he", Name: "Hebrew", Locale: "he"},
		{Code: "hi", Name: "Hindi", Locale: "hi"},
		{Code: "hr", Name: "Croatian", Locale: "hr"},
		{Code: "hu", Name: "Hungarian", Locale: "hu"},
		{Code: "hy", Name: "Armenian", Locale: "hy"},
		{Code: "id", Name: "Indonesian", Locale: "id"},
		{Code: "is", Name: "Icelandic", Locale: "is"},
		{Code: "it", Name: "Italian", Locale: "it"},
		{Code: "ja", Name: "Japanese", Locale: "ja"},
		{Code: "kk", Name: "Kazakh", Locale: "kk"},
		{Code: "kn", Name: "Kannada", Locale: "kn"},
		{Code: "ko", Name: "Korean", Locale: "ko"},
		{Code: "lt", Name: "Lithuanian", Locale: "lt"},
		{Code: "lv", Name: "Latvian", Locale: "lv"},
		{Code: "mi", Name: "Maori", Locale: "mi"},
		{Code: "mk", Name: "Macedonian", Locale: "mk"},
		{Code: "mr", Name: "Marathi", Locale: "mr"},
		{Code: "ms", Name: "Malay", Locale: "ms"},
		{Code: "ne", Name: "Nepali", Locale: "ne"},
		{Code: "nl", Name: "Dutch", Locale: "nl"},
		{Code: "no", Name: "Norwegian", Locale: "no"},
		{Code: "pl", Name: "Polish", Locale: "pl"},
		{Code: "pt", Name: "Portuguese", Locale: "pt"},
		{Code: "ro", Name: "Romanian", Locale: "ro"},
		{Code: "ru", Name: "Russian", Locale: "ru"},
		{Code: "sk", Name: "Slovak", Locale: "sk"},
		{Code: "sl", Name: "Slovenian", Locale: "sl"},
		{Code: "sr", Name: "Serbian", Locale: "sr"},
		{Code: "sv", Name: "Swedish", Locale: "sv"},
		{Code: "sw", Name: "Swahili", Locale: "sw"},
		{Code: "ta", Name: "Tamil", Locale: "ta"},
		{Code: "th", Name: "Thai", Locale: "th"},
		{Code: "tl", Name: "Tagalog", Locale: "tl"},
		{Code: "tr", Name: "Turkish", Locale: "tr"},
		{Code: "uk", Name: "Ukrainian", Locale: "uk"},
		{Code: "ur", Name: "Urdu", Locale: "ur"},
		{Code: "vi", Name: "Vietnamese", Locale: "vi"},
		{Code: "zh", Name: "Chinese", Locale: "zh"},
	}
}
To improve efficiency and avoid allocating two identical slices, you can define the list of languages as a single package-level variable and reuse it in the supportedLanguages map. This also makes the code slightly more readable by removing the whisperLanguages() function.
var (
// whisperLangs is the list of languages supported by Whisper for transcription.
// We reuse this for translation as modern models support this set.
whisperLangs = []plugnmeet.InsightsSupportedLangInfo{
{Code: "af", Name: "Afrikaans", Locale: "af"},
{Code: "ar", Name: "Arabic", Locale: "ar"},
{Code: "az", Name: "Azerbaijani", Locale: "az"},
{Code: "be", Name: "Belarusian", Locale: "be"},
{Code: "bg", Name: "Bulgarian", Locale: "bg"},
{Code: "bn", Name: "Bengali", Locale: "bn"},
{Code: "bs", Name: "Bosnian", Locale: "bs"},
{Code: "ca", Name: "Catalan", Locale: "ca"},
{Code: "cs", Name: "Czech", Locale: "cs"},
{Code: "cy", Name: "Welsh", Locale: "cy"},
{Code: "da", Name: "Danish", Locale: "da"},
{Code: "de", Name: "German", Locale: "de"},
{Code: "el", Name: "Greek", Locale: "el"},
{Code: "en", Name: "English", Locale: "en"},
{Code: "es", Name: "Spanish", Locale: "es"},
{Code: "et", Name: "Estonian", Locale: "et"},
{Code: "fa", Name: "Persian", Locale: "fa"},
{Code: "fi", Name: "Finnish", Locale: "fi"},
{Code: "fr", Name: "French", Locale: "fr"},
{Code: "gl", Name: "Galician", Locale: "gl"},
{Code: "he", Name: "Hebrew", Locale: "he"},
{Code: "hi", Name: "Hindi", Locale: "hi"},
{Code: "hr", Name: "Croatian", Locale: "hr"},
{Code: "hu", Name: "Hungarian", Locale: "hu"},
{Code: "hy", Name: "Armenian", Locale: "hy"},
{Code: "id", Name: "Indonesian", Locale: "id"},
{Code: "is", Name: "Icelandic", Locale: "is"},
{Code: "it", Name: "Italian", Locale: "it"},
{Code: "ja", Name: "Japanese", Locale: "ja"},
{Code: "kk", Name: "Kazakh", Locale: "kk"},
{Code: "kn", Name: "Kannada", Locale: "kn"},
{Code: "ko", Name: "Korean", Locale: "ko"},
{Code: "lt", Name: "Lithuanian", Locale: "lt"},
{Code: "lv", Name: "Latvian", Locale: "lv"},
{Code: "mi", Name: "Maori", Locale: "mi"},
{Code: "mk", Name: "Macedonian", Locale: "mk"},
{Code: "mr", Name: "Marathi", Locale: "mr"},
{Code: "ms", Name: "Malay", Locale: "ms"},
{Code: "ne", Name: "Nepali", Locale: "ne"},
{Code: "nl", Name: "Dutch", Locale: "nl"},
{Code: "no", Name: "Norwegian", Locale: "no"},
{Code: "pl", Name: "Polish", Locale: "pl"},
{Code: "pt", Name: "Portuguese", Locale: "pt"},
{Code: "ro", Name: "Romanian", Locale: "ro"},
{Code: "ru", Name: "Russian", Locale: "ru"},
{Code: "sk", Name: "Slovak", Locale: "sk"},
{Code: "sl", Name: "Slovenian", Locale: "sl"},
{Code: "sr", Name: "Serbian", Locale: "sr"},
{Code: "sv", Name: "Swedish", Locale: "sv"},
{Code: "sw", Name: "Swahili", Locale: "sw"},
{Code: "ta", Name: "Tamil", Locale: "ta"},
{Code: "th", Name: "Thai", Locale: "th"},
{Code: "tl", Name: "Tagalog", Locale: "tl"},
{Code: "tr", Name: "Turkish", Locale: "tr"},
{Code: "uk", Name: "Ukrainian", Locale: "uk"},
{Code: "ur", Name: "Urdu", Locale: "ur"},
{Code: "vi", Name: "Vietnamese", Locale: "vi"},
{Code: "zh", Name: "Chinese", Locale: "zh"},
}
// supportedLanguages enumerates the languages we surface for transcription
// and translation.
supportedLanguages = map[insights.ServiceType][]plugnmeet.InsightsSupportedLangInfo{
insights.ServiceTypeTranscription: whisperLangs,
insights.ServiceTypeTranslation: whisperLangs,
}
)

parsed := map[string]string{}
if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
	// Some local backends ignore Strict mode and wrap output in markdown
	// fences or prose. Try one tolerant unwrap before giving up.
	if cleaned := stripJSONNoise(raw); cleaned != raw {
		if err2 := json.Unmarshal([]byte(cleaned), &parsed); err2 == nil {
			goto OK
		}
	}
	log.WithError(err).WithField("raw", truncate(raw, 256)).Warn("openai translation: failed to parse JSON response")
	return nil, fmt.Errorf("failed to parse translation JSON: %w", err)
}
OK:
While goto is functional here, it's often considered a code smell in modern Go as it can make control flow harder to follow. This block can be refactored to avoid goto by restructuring the error handling, which improves readability and aligns with common Go idioms.
parsed := map[string]string{}
err := json.Unmarshal([]byte(raw), &parsed)
if err != nil {
// Some local backends ignore Strict mode and wrap output in markdown
// fences or prose. Try one tolerant unwrap before giving up.
cleaned := stripJSONNoise(raw)
if cleaned != raw {
err = json.Unmarshal([]byte(cleaned), &parsed)
}
}
if err != nil {
log.WithError(err).WithField("raw", truncate(raw, 256)).Warn("openai translation: failed to parse JSON response")
return nil, fmt.Errorf("failed to parse translation JSON: %w", err)
}
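The `stripJSONNoise` helper that both versions of this fallback rely on is not shown in the diff. As a rough sketch only (behaviour assumed from the comment about markdown fences and prose, not taken from the PR's actual code), such an unwrap might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// stripJSONNoise trims markdown code fences and surrounding prose so a
// lenient backend's output has a chance of parsing as a bare JSON object.
// Illustrative sketch; the PR's real helper may behave differently.
func stripJSONNoise(raw string) string {
	s := strings.TrimSpace(raw)
	// Drop a ```json ... ``` fence if the whole reply is fenced.
	if strings.HasPrefix(s, "```") {
		if i := strings.Index(s, "\n"); i >= 0 {
			s = s[i+1:]
		}
		if i := strings.LastIndex(s, "```"); i >= 0 {
			s = s[:i]
		}
	}
	// Keep only the outermost {...} span, discarding leading/trailing prose.
	start := strings.Index(s, "{")
	end := strings.LastIndex(s, "}")
	if start >= 0 && end > start {
		s = s[start : end+1]
	}
	return strings.TrimSpace(s)
}

func main() {
	raw := "Here you go:\n```json\n{\"en\": \"hello\"}\n```"
	fmt.Println(stripJSONNoise(raw))
}
```

A single tolerant pass like this keeps the strict-mode happy path allocation-free while still coping with backends that decorate their output.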
Thanks for it. I checked it very quickly and found that you haven't considered using the realtime websocket API, e.g. https://developers.openai.com/api/docs/guides/realtime-websocket. So, we'll lose partial (delta) results.
Adds a realtime transcription path alongside the existing chunked one, selectable via the provider account's `mode` option (default: `chunked`).

- `chunked` (existing): WAV chunks POSTed to `/v1/audio/transcriptions`. Works against any OpenAI-compatible HTTP backend (faster-whisper-server, whisper.cpp, vLLM, LocalAI).
- `realtime` (new): WebSocket to `/v1/realtime?intent=transcription` with PCM16 streaming, server VAD, `partial_result` deltas, and `final_result` on segment completion. Works against OpenAI cloud and Azure OpenAI Realtime.

Also drops the `goto` fallback in translate.go in favour of a small `parseTranslationJSON` helper.
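The two paths can be selected at stream construction time. The dispatch below is a minimal sketch of that idea; the function name, option plumbing, and return shape are assumptions, not the PR's actual factory.

```go
package main

import "fmt"

// newTranscriptionStream picks a transcription backend from the provider
// account's "mode" option, defaulting to the chunked HTTP path.
// Illustrative only; the real factory is shaped differently.
func newTranscriptionStream(opts map[string]string) (string, error) {
	switch mode := opts["mode"]; mode {
	case "", "chunked":
		// WAV chunks via POST /v1/audio/transcriptions; final_result per chunk.
		return "chunked", nil
	case "realtime":
		// PCM16 over the /v1/realtime websocket; partial_result deltas.
		return "realtime", nil
	default:
		return "", fmt.Errorf("openai: unknown transcription mode %q", mode)
	}
}

func main() {
	m, _ := newTranscriptionStream(map[string]string{})
	fmt.Println(m) // defaults to "chunked"
}
```

Defaulting to `chunked` keeps existing self-hosted deployments working without any config change.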
Good catch. Pushed 8e7c952 adding both modes, switchable via the provider account's `mode` option. Also dropped the `goto` fallback in translate.go.
Also please consider other features like:
Replaces #844 with the implementation direction you suggested there:
This is that rework. The provider is pure Go, built on `github.com/openai/openai-go/v3`. No Python, no helper services, no extra build stages; everything lives inside `pkg/insights/providers/openai/`.

What it does
- Transcription: `POST /v1/audio/transcriptions`, chunked from PCM16 supplied via `TranscriptionStream.WriteSample`. Each chunk is encoded as an in-memory WAV and uploaded; one `final_result` event is emitted per chunk. Realtime partials would require the websocket Realtime API, which openai-go does not yet cover and most self-hosted backends do not support, so this PR ships chunked-only on purpose; Realtime streaming can land as a follow-up once the SDK adds it.
- Translation: `POST /v1/chat/completions` with a `response_format` JSON Schema whose properties are the requested target language codes. One round-trip yields all target translations; Strict mode forces the schema. I deliberately picked Chat Completions over the newer Responses API so the same code path works against any OpenAI-compatible self-hosted backend (LocalAI, vLLM, llama.cpp-server, Ollama, ...); Responses is OpenAI-only.

Configuration
Knobs
- `providers.openai[].credentials.api_key` - required.
- `providers.openai[].options.base_url` - optional override; lets the same provider talk to a self-hosted OpenAI-API endpoint.
- `providers.openai[].options.chunk_seconds` - transcription chunk duration (default 5).
- `services.<svc>.options.model` - per-service model override.

Files
677 lines added, no other files touched. The factory dispatch in `insights_service.go` mirrors the existing `azure`/`google` cases.
Built and run end-to-end against a self-hosted setup:
- `whisper-local` running `faster-whisper` behind FastAPI (exposing an OpenAI-compatible `/v1/audio/transcriptions`). Multipart upload, language hint, JSON response: all working through the new provider.
- `llama-local` running `ggml-org/llama.cpp:server` with Qwen 2.5 7B Instruct Q4_K_M. JSON Schema strict mode is honoured by llama.cpp: dynamic `properties` produced by `buildTranslationSchema(targetLangs)` are populated correctly for multi-target requests (verified de → en + ar).

I have not yet pointed it at OpenAI cloud directly in this branch; happy to do so if useful for your CI smoke or for a maintainer review.
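The `buildTranslationSchema(targetLangs)` call mentioned above builds a schema whose properties are the target language codes, so one completion carries every translation. A hedged sketch of how such a schema could be assembled (field names and descriptions here are assumptions, not the PR's exact output):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildTranslationSchema assembles a JSON Schema object with one required
// string property per target language code. With Strict mode, the model
// must return exactly these keys. Sketch based on the PR description;
// the real helper's layout may differ.
func buildTranslationSchema(targetLangs []string) map[string]any {
	props := map[string]any{}
	for _, code := range targetLangs {
		props[code] = map[string]any{
			"type":        "string",
			"description": "translation into " + code,
		}
	}
	return map[string]any{
		"type":                 "object",
		"properties":           props,
		"required":             targetLangs,
		"additionalProperties": false,
	}
}

func main() {
	schema := buildTranslationSchema([]string{"en", "ar"})
	b, _ := json.Marshal(schema)
	fmt.Println(string(b))
}
```

Marking every target as `required` with `additionalProperties: false` is what lets llama.cpp's grammar-constrained decoding guarantee all keys appear.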
Notes
`Dockerfile.local` is no longer needed; the provider is pure Go and builds with the existing upstream `docker-build/Dockerfile`.
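Putting the knobs together, a provider block for a self-hosted backend might look roughly like this. The exact file layout is an assumption inferred from the option paths listed above; only the key names come from the PR.

```yaml
providers:
  openai:
    - credentials:
        api_key: "sk-..."                       # required
      options:
        base_url: "http://localhost:8000/v1"    # optional: any OpenAI-compatible endpoint
        chunk_seconds: 5                        # transcription chunk duration (default 5)
services:
  transcription:
    options:
      model: "whisper-1"       # default
  translation:
    options:
      model: "gpt-4o-mini"     # default
```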