Description
The Go client's TCP poll path allocates a fresh []byte for every RPC, with multiple sub-allocations re-encoding immutable values. Under sustained load this produces a large allocation/GC tax even when the actual message payloads are small.
Evidence
A pprof capture of a consumer running at ~230 events/s with ~400-byte payloads showed:
- ~78.6 GB allocated over 60 s (~1.3 GB/s) across ~591 M allocations (~9.85 M/s)
- That works out to ~5.8 MB and ~43,000 allocations per event against ~400-byte payloads
- Heap in use: ~60 MB — i.e. nearly all of the above is churn → heavy GC
- CPU profile is dominated by internal/runtime/syscall.Syscall6 (33%) and runtime.futex (8.5%): poll RPCs + scheduler/GC overhead, not business logic
Dominant allocators (alloc_space / alloc_objects)
From client/tcp and contracts:
- tcp.(*IggyTcpClient).PollMessages → createPayload + command.PollMessages.MarshalBinary: ~38% of bytes / ~49% of objects. A fresh request buffer is marshaled on every poll RPC, and createPayload allocates a second buffer to prepend the 4-byte length header and 4-byte command code.
- contracts.Identifier.MarshalBinary: ~3.4 GB / ~45 M allocs over 60 s. The same immutable stream/topic/consumer/partition IDs are re-encoded into a fresh slice on every poll.
- tcp.(*IggyTcpClient).read: ~5.6 GB / ~68 M objects. Per-RPC read buffers (both the 8-byte status header and the response body) are freshly allocated.
The data here is small; the cost is fixed overhead per RPC. At sustained polling rates this becomes the bottleneck.
Affected area / component
Go SDK
Proposed solution
1. Cache constant Identifier wire bytes + zero-alloc encoder path
Identifier values are constructed via NewIdentifier and never change. Pre-encode the wire form (Kind | Length | Value) once at construction and add:
- Identifier.MarshalledSize() int: known up-front, lets callers size buffers without trial encodes.
- Identifier.AppendBinary([]byte) ([]byte, error): already present on Identifier; make the fast path use the cached bytes.
Mirror this on command.PollMessages so the full request body can be encoded into a caller-provided buffer.
MarshalBinary keeps its current signature and output for backward compatibility.
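A minimal sketch of the cached-encoding idea, not the SDK's actual code. The type below is a hypothetical stand-in for contracts.Identifier; the constructor names, the kind codes (1 = numeric, 2 = string), the one-byte kind and length fields, and the little-endian numeric value are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Assumed kind codes; the real wire values may differ.
const (
	kindNumeric byte = 1
	kindString  byte = 2
)

// Identifier is a hypothetical stand-in for the SDK type. The wire form
// (kind | length | value) is encoded once at construction and never mutated.
type Identifier struct {
	wire []byte
}

// NewNumericIdentifier pre-encodes a 4-byte little-endian numeric ID.
func NewNumericIdentifier(id uint32) Identifier {
	wire := make([]byte, 0, 6)
	wire = append(wire, kindNumeric, 4)
	wire = binary.LittleEndian.AppendUint32(wire, id)
	return Identifier{wire: wire}
}

// NewStringIdentifier pre-encodes a name of 1..255 bytes.
func NewStringIdentifier(name string) (Identifier, error) {
	if len(name) == 0 || len(name) > 255 {
		return Identifier{}, errors.New("identifier name must be 1..255 bytes")
	}
	wire := make([]byte, 0, 2+len(name))
	wire = append(wire, kindString, byte(len(name)))
	wire = append(wire, name...)
	return Identifier{wire: wire}, nil
}

// MarshalledSize reports the encoded length without encoding anything.
func (id Identifier) MarshalledSize() int { return len(id.wire) }

// AppendBinary copies the cached bytes into dst. It allocates only if
// dst lacks capacity for the appended bytes.
func (id Identifier) AppendBinary(dst []byte) ([]byte, error) {
	if len(id.wire) == 0 {
		return dst, errors.New("uninitialized identifier")
	}
	return append(dst, id.wire...), nil
}

// MarshalBinary keeps the allocating signature for existing callers.
func (id Identifier) MarshalBinary() ([]byte, error) {
	return id.AppendBinary(make([]byte, 0, id.MarshalledSize()))
}
```

A caller that encodes several identifiers into one request can sum their MarshalledSize() values, size the destination once, and chain AppendBinary calls without intermediate slices.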
2. Pool the request wire-payload buffer in client/tcp
Add a sync.Pool of request buffers. IggyTcpClient.do builds the full wire payload (length header + command code + body) directly into a pooled buffer via the new AppendBinary path, eliminating both the MarshalBinary allocation and the createPayload allocation.
A small readInto helper lets the 8-byte response status header read into a stack-local array, removing one more allocation per RPC.
Commands that don't implement the new encoder interface (e.g. SendMessages) fall through to MarshalBinary with no behaviour change.
Alternatives considered
No response
Contribution
Good first issue