Skip to content

fix: partition update#89

Draft
ccm32004 wants to merge 6 commits into
mainfrom
feat/partition-update
Draft

fix: partition update#89
ccm32004 wants to merge 6 commits into
mainfrom
feat/partition-update

Conversation

@ccm32004

@ccm32004 ccm32004 commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

updates numaflow-java, due to changes made in numaflow

Found issues outlined in the current getPartition implementation, thus some changes were made in how the getActivePartition method is implemented.

Why watermarks weren't updating properly, causing the window to not close properly

  • Old getActivePartitions() (or deprecated method getPartitions) returned all Pulsar topic partitions ([0..4]), making Numaflow spin up 5 watermark processors per replica.
  • Under Pulsar's Shared subscription, messages from any partition can land on any consumer replica. There's no stable replica -> partition ownership, so multiple replicas end up claiming the same watermark processor IDs with inconsistent progress.
  • Low TPS (1 msg / 5s): some partitions never receive data on a given replica, so those processors stay at -1 and the min is pinned low
  • High TPS (10k rpu): all partitions receive data, but it's spread across replicas dynamically, so each replica only advances the processors whose messages it happened to see — gaps still stall the min.

Why this fix works

  • getActivePartitions() now returns defaultPartitions() — a singleton with this replica's NUMAFLOW_REPLICA index, so the watermark partition is the replica, not a Pulsar topic partition.
  • Each replica owns exactly one watermark processor that it fully controls, so there is no ambiguity, no shared/gapped processors, etc
    Matches Numaflow's upstream Rust Pulsar source, which has always modeled things this way.

fixes #87

Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
@ccm32004 ccm32004 force-pushed the feat/partition-update branch from 483b862 to 34032c8 Compare April 24, 2026 16:10
@ccm32004 ccm32004 changed the title feat: partition update fix: partition update Apr 24, 2026
Signed-off-by: Cece Ma <mayuqing131@gmail.com>
@github-actions

github-actions Bot commented Apr 24, 2026

Copy link
Copy Markdown

Continuous benchmark

Config: batch size = 500

Compared to the latest result on gh-pages (last main run that updated charts). Regression when a metric degrades by more than vs baseline.

Metric This run Baseline (main) Change Status
Consumer Throughput 34566.67 msgs/sec 42005.56 msgs/sec -17.7% OK
Processing Latency (per batch) 14.48 ms 11.91 ms +21.6% OK
Read Latency (per batch) 7.39 ms 5.72 ms +29.2% OK
Ack Latency (per batch) 0.50 ms 0.48 ms +4.2% OK
Consumer CPU 1332 millicores 1443 millicores -7.7% ✅ Improved
Consumer Memory 372 MiB 356 MiB +4.5% OK

Result: No regression beyond the threshold vs main baseline.

Workflow run
· GitHub Pages charts

Signed-off-by: Cece Ma <mayuqing131@gmail.com>
@github-actions

Copy link
Copy Markdown

Code Coverage

Percentage of source code lines executed by unit tests.

Overall Project 82.25%
Files changed 100%

File Coverage
PulsarSource.java 76.92%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

current getPartition impl causing watermark issues for partitioned topics

1 participant