feat(go-sdk): high per-RPC allocation pressure on TCP poll path #3441

@matanper

Description

The Go client's TCP poll path allocates a fresh []byte for every RPC, with multiple sub-allocations re-encoding immutable values. Under sustained load this produces a large allocation/GC tax even when the actual message payloads are small.

Evidence

A pprof profile of a consumer running at ~230 events/s with ~400-byte payloads showed:

  • ~78.6 GB allocated over 60 s (~1.3 GB/s) across ~591 M allocations (~9.85 M/s)
  • That works out to ~5.8 MB and ~43,000 allocations per event against ~400-byte payloads
  • Heap in use: ~60 MB — i.e. nearly all of the above is churn → heavy GC
  • CPU profile is dominated by internal/runtime/syscall.Syscall6 (33%) and runtime.futex (8.5%) — poll RPCs + scheduler/GC overhead, not business logic

Dominant allocators (alloc_space / alloc_objects)

From client/tcp and contracts:

  • tcp.(*IggyTcpClient).PollMessages → createPayload + command.PollMessages.MarshalBinary: ~38% of bytes / ~49% of objects. A fresh request buffer is marshaled on every poll RPC, and createPayload allocates a second buffer to prepend the 4-byte length header and 4-byte command code.
  • contracts.Identifier.MarshalBinary: ~3.4 GB / ~45 M allocs over 60 s. The same immutable stream/topic/consumer/partition IDs are re-encoded into a fresh slice on every poll.
  • tcp.(*IggyTcpClient).read: ~5.6 GB / ~68 M objects. Per-RPC read buffers (both the 8-byte status header and the response body) are freshly allocated.

The data here is small; the cost is fixed overhead per RPC. At sustained polling rates this becomes the bottleneck.

Affected area / component

Go SDK

Proposed solution

1. Cache constant Identifier wire bytes + zero-alloc encoder path

Identifier values are constructed via NewIdentifier and never change. Pre-encode the wire form (Kind | Length | Value) once at construction and add:

  • Identifier.MarshalledSize() int — known up-front, lets callers size buffers without trial encodes.
  • Identifier.AppendBinary([]byte) ([]byte, error) — already present on Identifier; make the fast path use the cached bytes.

Mirror this on command.PollMessages so the full request body can be encoded into a caller-provided buffer.

MarshalBinary keeps its current signature and output for backward compatibility.
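A minimal sketch of the caching idea, assuming one-byte Kind and Length fields, little-endian numeric values, and kind codes 1 and 2. The cachedIdentifier type and its constructors are hypothetical and only illustrate the shape of the change; they are not the existing contracts.Identifier.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Assumed kind codes for the Kind byte.
const (
	kindNumeric byte = 1
	kindString  byte = 2
)

// cachedIdentifier is a hypothetical stand-in for contracts.Identifier.
// It stores the Kind | Length | Value wire form, encoded once at
// construction and never mutated afterwards.
type cachedIdentifier struct {
	wire []byte
}

// newNumericIdentifier pre-encodes a numeric ID (assumed 4 bytes, little-endian).
func newNumericIdentifier(id uint32) cachedIdentifier {
	wire := make([]byte, 2+4)
	wire[0] = kindNumeric
	wire[1] = 4
	binary.LittleEndian.PutUint32(wire[2:], id)
	return cachedIdentifier{wire: wire}
}

// newStringIdentifier pre-encodes a name; the length must fit in one byte.
func newStringIdentifier(name string) (cachedIdentifier, error) {
	if len(name) == 0 || len(name) > 255 {
		return cachedIdentifier{}, errors.New("identifier name must be 1..255 bytes")
	}
	wire := make([]byte, 0, 2+len(name))
	wire = append(wire, kindString, byte(len(name)))
	wire = append(wire, name...)
	return cachedIdentifier{wire: wire}, nil
}

// MarshalledSize is known up front, so callers can size buffers
// without a trial encode.
func (id cachedIdentifier) MarshalledSize() int { return len(id.wire) }

// AppendBinary copies the cached bytes into dst. It allocates only if
// dst has insufficient capacity.
func (id cachedIdentifier) AppendBinary(dst []byte) ([]byte, error) {
	if len(id.wire) == 0 {
		return dst, errors.New("identifier was not constructed")
	}
	return append(dst, id.wire...), nil
}

// MarshalBinary keeps the allocate-and-return contract for existing
// callers and returns a copy so the cache cannot be mutated.
func (id cachedIdentifier) MarshalBinary() ([]byte, error) {
	return id.AppendBinary(make([]byte, 0, id.MarshalledSize()))
}
```

With this shape, a poll command could sum MarshalledSize() over its identifiers, reserve that much capacity once, and call AppendBinary on each in turn.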

2. Pool the request wire-payload buffer in client/tcp

Add a sync.Pool of request buffers. IggyTcpClient.do builds the full wire payload (length header + command code + body) directly into a pooled buffer via the new AppendBinary path, eliminating both the MarshalBinary allocation and the createPayload allocation.

A small readInto helper lets the 8-byte response status header read into a stack-local array, removing one more allocation per RPC.

Commands that don't implement the new encoder interface (e.g. SendMessages) fall through to MarshalBinary with no behaviour change.

Alternatives considered

No response

Contribution

  • I'm willing to submit a pull request to implement this feature

Good first issue

  • I think this could be a good first issue for a new contributor
