
Fix #375: chunked vector loading uses globally-unique primary keys #387

Open
idevasena wants to merge 1 commit into mlcommons:main from idevasena:fix/issue-375-duplicate-pks-chunked-insert

@idevasena
Contributor

Closes #375.

Problem

In a 1M-vector dry-run on a single Gen5 NVMe drive, vdb_benchmark reported a mean recall@10 of 0.0090. The mlps_1m_1shards_1536dim_uniform_flat_gt ground-truth collection held only 10,000 vectors (1% of the source collection), so almost every PK returned by the ANN search was missing from the GT set, and recall (set_intersection / k) collapsed.
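As a reference point, recall@k is the number of ANN-returned PKs that also appear among the top-k ground-truth PKs, divided by k. The sketch below is a minimal illustration, assuming both results are plain lists of integer PKs; recall_at_k is a hypothetical helper, not the function enhanced_bench.py actually uses.

```python
def recall_at_k(ann_ids, gt_ids, k):
    """Return |top-k ANN PKs intersected with top-k ground-truth PKs| / k."""
    # Only the first k entries of each list count toward recall@k.
    hits = set(ann_ids[:k]) & set(gt_ids[:k])
    return len(hits) / k


# Identical result lists score 1.0.
print(recall_at_k(list(range(10)), list(range(10)), 10))        # 1.0
# If the GT set holds PKs the ANN search never returns, the score is 0.0.
print(recall_at_k(list(range(10)), list(range(100, 110)), 10))  # 0.0
# One shared PK out of ten gives 0.1.
print(recall_at_k(list(range(1, 11)), [1] + list(range(200, 209)), 10))  # 0.1
```

A mean of 0.0090 therefore corresponds to well under one matching PK per query on average.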

Root cause

load_vdb.insert_data() built each batch's primary keys as

ids = list(range(batch_start, batch_end))

where batch_start / batch_end were chunk-local indices. When num_vectors > chunk_size, main() calls insert_data() once per generated chunk and passes only that chunk's vectors. With chunk_size = 10_000, every chunk therefore inserted IDs 0..9_999, so in the 1M-vector run all 100 chunks collided on the same 10,000 primary keys.
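The collision can be reproduced without Milvus. The following is a hypothetical model of the loader's ID arithmetic (chunked_ids and its parameters are illustrative, and batch_size=5_000 simply mirrors the smoke test in the Testing section): with use_offset=False every chunk restarts at 0, as before this PR; with use_offset=True a running offset is added, as the fix does.

```python
def chunked_ids(num_vectors, chunk_size, batch_size, use_offset):
    """Hypothetical model of the PKs produced by chunked loading.

    use_offset=False reproduces the pre-fix behavior (each chunk restarts at 0);
    use_offset=True adds a running global offset, as the fix does.
    """
    all_ids = []
    global_id_offset = 0
    for chunk_start in range(0, num_vectors, chunk_size):
        chunk_len = min(chunk_size, num_vectors - chunk_start)
        start_id = global_id_offset if use_offset else 0
        # insert_data() splits each chunk into batches of batch_size.
        for batch_start in range(0, chunk_len, batch_size):
            batch_end = min(batch_start + batch_size, chunk_len)
            all_ids.extend(range(start_id + batch_start, start_id + batch_end))
        global_id_offset += chunk_len
    return all_ids


before = chunked_ids(1_000_000, 10_000, 5_000, use_offset=False)
after = chunked_ids(1_000_000, 10_000, 5_000, use_offset=True)
print(len(before), len(set(before)))  # 1000000 10000
print(len(after), len(set(after)))    # 1000000 1000000
```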

  • The main collection's num_entities still reported 1,000,000 because Milvus counts physical rows, not distinct PKs, which masked the bug during loading.
  • enhanced_bench.create_flat_collection() copies the source via query_iterator(), which deduplicates by PK, so the FLAT collection only ever saw the 10,000 unique IDs.
  • A second, smaller bug in enhanced_bench.py hardcoded the final copy-progress line to (100.0%), hiding the discrepancy in the logs (the original report shows Copied 10000/1000000 vectors (100.0%)).
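To see why the loading-time count and the copy disagree, here is a toy model, assuming that num_entities counts every physical row and that the PK-keyed copy keeps exactly one row per PK (which duplicate survives is not modeled). copy_progress is a hypothetical helper; it also shows the progress line computed from the real ratio instead of a hardcoded (100.0%).

```python
def copy_progress(source_pks):
    """Toy model: physical row count vs. rows left after a PK-deduplicating copy."""
    num_entities = len(source_pks)   # physical rows, duplicates included
    copied = len(set(source_pks))    # one row per distinct PK survives the copy
    pct = 100.0 * copied / num_entities if num_entities else 0.0
    # Progress line derived from the actual ratio, not a hardcoded 100.0%.
    return f"Copied {copied}/{num_entities} vectors ({pct:.1f}%)"


buggy = [pk for _ in range(100) for pk in range(10_000)]  # pre-fix PK layout
fixed = list(range(1_000_000))                             # post-fix PK layout
print(copy_progress(buggy))  # Copied 10000/1000000 vectors (1.0%)
print(copy_progress(fixed))  # Copied 1000000/1000000 vectors (100.0%)
```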

Fix

| File | Change |
|---|---|
| vdb_benchmark/vdbbench/load_vdb.py | insert_data() takes a new start_id parameter (default 0, which preserves the legacy single-chunk behavior). IDs are now range(start_id + batch_start, start_id + batch_end). |
| vdb_benchmark/vdbbench/load_vdb.py | main() threads a running global_id_offset through the chunked-generation loop and passes it as start_id on every insert_data() call. The else (single-chunk) branch passes start_id=0 explicitly for clarity. |
| vdb_benchmark/vdbbench/enhanced_bench.py | Replace the hardcoded (100.0%) with the real percentage in create_flat_collection(). |
| vdb_benchmark/vdbbench/enhanced_bench.py | New coverage guard: if the FLAT collection holds less than 99% of source_coll.num_entities, abort with a clear pointer to issue #375 instead of silently producing meaningless recall numbers. |
| vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py | New regression suite (10 tests) covering the start_id offset, the three-chunk scenario from the bug report, uneven final chunks, batch sizes larger than the chunk, and the coverage-threshold parametrization. |
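A minimal sketch of the two load_vdb.py changes listed above, plus the coverage guard. The real insert_data() signature and the Milvus insert call are not reproduced here: RecordingCollection is a hypothetical stand-in for a pymilvus Collection that only records PKs, load_in_chunks is a simplified stand-in for the chunked branch of main(), and check_flat_coverage is an illustrative version of the 99% threshold check in enhanced_bench.py.

```python
class RecordingCollection:
    """Hypothetical stand-in for a pymilvus Collection; records inserted PKs only."""

    def __init__(self):
        self.ids = []

    def insert(self, data):
        ids, _vectors = data  # column-ordered: [pk_column, vector_column]
        self.ids.extend(ids)


def insert_data(collection, vectors, batch_size, start_id=0):
    """Insert vectors in batches; PKs are offset by start_id (default 0)."""
    for batch_start in range(0, len(vectors), batch_size):
        batch_end = min(batch_start + batch_size, len(vectors))
        ids = list(range(start_id + batch_start, start_id + batch_end))
        collection.insert([ids, vectors[batch_start:batch_end]])
    return len(vectors)


def load_in_chunks(collection, num_vectors, chunk_size, batch_size, dim=4):
    """Simplified chunk loop that threads a running global_id_offset."""
    global_id_offset = 0
    while global_id_offset < num_vectors:
        chunk_len = min(chunk_size, num_vectors - global_id_offset)
        chunk = [[0.0] * dim for _ in range(chunk_len)]  # placeholder vectors
        insert_data(collection, chunk, batch_size, start_id=global_id_offset)
        global_id_offset += chunk_len
    return global_id_offset


def check_flat_coverage(flat_count, source_count, threshold=0.99):
    """Raise if the FLAT GT collection covers less than threshold of the source."""
    if source_count == 0:
        return  # nothing to compare against (assumption of this sketch)
    coverage = flat_count / source_count
    if coverage < threshold:
        raise RuntimeError(
            f"FLAT ground-truth collection holds {flat_count}/{source_count} "
            f"vectors ({coverage:.1%}); recall would be meaningless. "
            "See issue #375 (duplicate primary keys from chunked loading)."
        )


coll = RecordingCollection()
load_in_chunks(coll, num_vectors=25_000, chunk_size=10_000, batch_size=4_000)
print(len(coll.ids), len(set(coll.ids)), min(coll.ids), max(coll.ids))  # 25000 25000 0 24999
check_flat_coverage(995_000, 1_000_000)  # passes: 99.5% >= 99%
```

Raising an exception rather than logging a warning is one way to stop the run before recall is computed from an incomplete ground truth; the actual guard may abort differently.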

Testing

smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv sync --extra vectordb --extra test
Resolved 98 packages in 1ms
Checked 98 packages in 1ms
smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv run pytest vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py \
              vdb_benchmark/tests/tests/test_load_vdb.py -v
================================================================ test session starts =================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /home/smrc/Storage_Repo_Tests/storage_vdb375/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/smrc/Storage_Repo_Tests/storage_vdb375
configfile: pyproject.toml
plugins: hydra-core-1.3.2, mock-3.15.1, cov-7.1.0
collected 25 items                                                                                                                                   

vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_default_start_id_preserves_legacy_behavior 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_default_start_id_preserves_legacy_behavior (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_start_id_offsets_all_batches 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_start_id_offsets_all_batches (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_three_chunks_produce_globally_unique_ids 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_three_chunks_produce_globally_unique_ids (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_uneven_final_chunk 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_uneven_final_chunk (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_batch_size_larger_than_chunk 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestInsertDataIdOffset::test_batch_size_larger_than_chunk (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[1000000-1000000-True] 
        SETUP    F reset_milvus_connections
        SETUP    F flat_count[1000000]
        SETUP    F source_count[1000000]
        SETUP    F should_pass[True]
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[1000000-1000000-True] (fixtures used: flat_count, reset_milvus_connections, should_pass, source_count)PASSED
        TEARDOWN F should_pass[True]
        TEARDOWN F source_count[1000000]
        TEARDOWN F flat_count[1000000]
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[995000-1000000-True] 
        SETUP    F reset_milvus_connections
        SETUP    F flat_count[995000]
        SETUP    F source_count[1000000]
        SETUP    F should_pass[True]
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[995000-1000000-True] (fixtures used: flat_count, reset_milvus_connections, should_pass, source_count)PASSED
        TEARDOWN F should_pass[True]
        TEARDOWN F source_count[1000000]
        TEARDOWN F flat_count[995000]
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[10000-1000000-False] 
        SETUP    F reset_milvus_connections
        SETUP    F flat_count[10000]
        SETUP    F source_count[1000000]
        SETUP    F should_pass[False]
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[10000-1000000-False] (fixtures used: flat_count, reset_milvus_connections, should_pass, source_count)PASSED
        TEARDOWN F should_pass[False]
        TEARDOWN F source_count[1000000]
        TEARDOWN F flat_count[10000]
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[100000-1000000-False] 
        SETUP    F reset_milvus_connections
        SETUP    F flat_count[100000]
        SETUP    F source_count[1000000]
        SETUP    F should_pass[False]
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[100000-1000000-False] (fixtures used: flat_count, reset_milvus_connections, should_pass, source_count)PASSED
        TEARDOWN F should_pass[False]
        TEARDOWN F source_count[1000000]
        TEARDOWN F flat_count[100000]
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[0-1000000-False] 
        SETUP    F reset_milvus_connections
        SETUP    F flat_count[0]
        SETUP    F source_count[1000000]
        SETUP    F should_pass[False]
        vdb_benchmark/tests/tests/test_issue_375_chunked_insert_ids.py::TestFlatGtCoverageGuard::test_coverage_threshold[0-1000000-False] (fixtures used: flat_count, reset_milvus_connections, should_pass, source_count)PASSED
        TEARDOWN F should_pass[False]
        TEARDOWN F source_count[1000000]
        TEARDOWN F flat_count[0]
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_uniform_vector_generation 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_uniform_vector_generation (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_normal_vector_generation 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_normal_vector_generation (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_normalized_vector_generation 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_normalized_vector_generation (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_chunked_vector_generation 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_chunked_vector_generation (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_vector_generation_with_ids 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_vector_generation_with_ids (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_vector_generation_progress_tracking 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorGeneration::test_vector_generation_progress_tracking (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_batch_insertion 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_batch_insertion (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_with_error_handling 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_with_error_handling (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_parallel_insertion 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_parallel_insertion (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_with_metadata 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_with_metadata (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_rate_monitoring 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_insertion_rate_monitoring (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_load_checkpoint_resume 
SETUP    S test_data_dir
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestVectorLoading::test_load_checkpoint_resume (fixtures used: reset_milvus_connections, test_data_dir)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_dynamic_batch_sizing 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_dynamic_batch_sizing (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_memory_aware_loading 
        SETUP    F reset_milvus_connections
        vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_memory_aware_loading (fixtures used: reset_milvus_connections)PASSED
        TEARDOWN F reset_milvus_connections
vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_flush_optimization 
        SETUP    F reset_milvus_connections
        SETUP    F mock_collection
        vdb_benchmark/tests/tests/test_load_vdb.py::TestLoadOptimization::test_flush_optimization (fixtures used: mock_collection, reset_milvus_connections)PASSED
        TEARDOWN F mock_collection
        TEARDOWN F reset_milvus_connections
TEARDOWN S test_data_dir

================================================================= 25 passed in 0.13s =================================================================
smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv run python vdb_benchmark/vdbbench/load_vdb.py \
    --host 127.0.0.1 --collection test375_smoke \
    --num-vectors 50000 --chunk-size 10000 --dimension 1536 \
    --batch-size 5000 --distribution uniform
2026-05-26 00:56:51,123 - INFO - Connected to Milvus server at 127.0.0.1:19530
2026-05-26 00:56:51,123 - WARNING - FLOAT16 data type not available in this version of pymilvus. Using FLOAT_VECTOR instead.
2026-05-26 00:56:51,142 - INFO - Created collection 'test375_smoke' with 1536 dimensions and 1 shards
2026-05-26 00:56:51,142 - INFO - Creating index with parameters: {'index_type': 'DISKANN', 'metric_type': 'COSINE', 'params': {'MaxDegree': 16, 'SearchListSize': 200}}
2026-05-26 00:56:51,653 - INFO - Index creation command completed in 0.51 seconds
2026-05-26 00:56:51,653 - INFO - Generating 50000 vectors with 1536 dimensions using uniform distribution
2026-05-26 00:56:51,653 - INFO - Large vector count detected. Generating in chunks of 10,000 vectors
2026-05-26 00:56:51,653 - INFO - Generating chunk 1: 10,000 vectors
2026-05-26 00:56:51,877 - INFO - Generated chunk 0 (10,000 vectors) in 0.22 seconds. Progress: 0/50,000 vectors (0.0%)
2026-05-26 00:56:51,877 - INFO - Inserting chunk 1 (10,000 vectors) into collection 'test375_smoke' starting at id=0
2026-05-26 00:56:53,038 - INFO - Inserted batch 1/2: 50.00% complete, rate: 4308.30 vectors/sec, id_range=[0, 4999]
2026-05-26 00:56:54,045 - INFO - Inserted batch 2/2: 100.00% complete, rate: 4614.54 vectors/sec, id_range=[5000, 9999]
2026-05-26 00:56:54,045 - INFO - Inserted 10000 vectors in 2.17 seconds
2026-05-26 00:56:54,045 - INFO - Generating chunk 2: 10,000 vectors
2026-05-26 00:56:54,251 - INFO - Generated chunk 1 (10,000 vectors) in 0.21 seconds. Progress: 10,000/50,000 vectors (20.0%)
2026-05-26 00:56:54,251 - INFO - Inserting chunk 2 (10,000 vectors) into collection 'test375_smoke' starting at id=10000
2026-05-26 00:56:55,305 - INFO - Inserted batch 1/2: 50.00% complete, rate: 4744.59 vectors/sec, id_range=[10000, 14999]
2026-05-26 00:56:56,378 - INFO - Inserted batch 2/2: 100.00% complete, rate: 4702.16 vectors/sec, id_range=[15000, 19999]
2026-05-26 00:56:56,378 - INFO - Inserted 10000 vectors in 2.13 seconds
2026-05-26 00:56:56,378 - INFO - Generating chunk 3: 10,000 vectors
2026-05-26 00:56:56,583 - INFO - Generated chunk 2 (10,000 vectors) in 0.20 seconds. Progress: 20,000/50,000 vectors (40.0%)
2026-05-26 00:56:56,583 - INFO - Inserting chunk 3 (10,000 vectors) into collection 'test375_smoke' starting at id=20000
2026-05-26 00:56:57,526 - INFO - Inserted batch 1/2: 50.00% complete, rate: 5304.51 vectors/sec, id_range=[20000, 24999]
2026-05-26 00:56:58,458 - INFO - Inserted batch 2/2: 100.00% complete, rate: 5334.07 vectors/sec, id_range=[25000, 29999]
2026-05-26 00:56:58,458 - INFO - Inserted 10000 vectors in 1.87 seconds
2026-05-26 00:56:58,458 - INFO - Generating chunk 4: 10,000 vectors
2026-05-26 00:56:58,658 - INFO - Generated chunk 3 (10,000 vectors) in 0.20 seconds. Progress: 30,000/50,000 vectors (60.0%)
2026-05-26 00:56:58,659 - INFO - Inserting chunk 4 (10,000 vectors) into collection 'test375_smoke' starting at id=30000
2026-05-26 00:56:59,579 - INFO - Inserted batch 1/2: 50.00% complete, rate: 5432.54 vectors/sec, id_range=[30000, 34999]
2026-05-26 00:57:00,597 - INFO - Inserted batch 2/2: 100.00% complete, rate: 5159.16 vectors/sec, id_range=[35000, 39999]
2026-05-26 00:57:00,597 - INFO - Inserted 10000 vectors in 1.94 seconds
2026-05-26 00:57:00,597 - INFO - Generating chunk 5: 10,000 vectors
2026-05-26 00:57:00,802 - INFO - Generated chunk 4 (10,000 vectors) in 0.20 seconds. Progress: 40,000/50,000 vectors (80.0%)
2026-05-26 00:57:00,802 - INFO - Inserting chunk 5 (10,000 vectors) into collection 'test375_smoke' starting at id=40000
2026-05-26 00:57:01,893 - INFO - Inserted batch 1/2: 50.00% complete, rate: 4585.04 vectors/sec, id_range=[40000, 44999]
2026-05-26 00:57:02,787 - INFO - Inserted batch 2/2: 100.00% complete, rate: 5036.72 vectors/sec, id_range=[45000, 49999]
2026-05-26 00:57:02,787 - INFO - Inserted 10000 vectors in 1.99 seconds
2026-05-26 00:57:02,787 - INFO - Generated all 50,000 vectors in 11.13 seconds
2026-05-26 00:57:03,311 - INFO - Flush completed in 0.52 seconds
2026-05-26 00:57:03,311 - INFO - Starting to monitor index building progress (checking every 5 seconds)
2026-05-26 00:57:03,314 - INFO - Starting to monitor progress for collection: test375_smoke
2026-05-26 00:57:03,315 - INFO - Initial state: 0 of 50,000 rows indexed
2026-05-26 00:57:03,315 - INFO - Initial pending rows: 50,000
2026-05-26 00:57:08,317 - INFO - Progress: 0.00% complete... (0/50,000 rows) | Pending rows: 50,000
2026-05-26 00:57:13,323 - INFO - Progress: 0.00% complete... (0/50,000 rows) | Pending rows: 50,000
2026-05-26 00:57:18,329 - INFO - Progress: 0.00% complete... (0/50,000 rows) | Pending rows: 50,000
2026-05-26 00:57:23,333 - INFO - No pending rows detected. Assuming indexing phase is complete.
2026-05-26 00:57:23,334 - INFO - No pending rows for 0.0 seconds (waiting for 10 seconds to confirm)
2026-05-26 00:57:28,338 - INFO - No pending rows for 5.0 seconds (waiting for 10 seconds to confirm)
2026-05-26 00:57:33,341 - INFO - No pending rows for 10.0 seconds (waiting for 10 seconds to confirm)
2026-05-26 00:57:33,341 - INFO - No pending rows detected for 0 minutes. Process is considered complete.
2026-05-26 00:57:33,341 - INFO - Process fully complete! Total time: 0:00:30
2026-05-26 00:57:33,341 - INFO - Benchmark completed successfully!
smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv run python - <<'PY'
from pymilvus import connections, Collection
connections.connect("default", host="127.0.0.1", port="19530")
c = Collection("test375_smoke"); c.flush(); c.load()

# (a) Total physical rows
print("num_entities:", c.num_entities)            # expect 50000

# (b) PK range is contiguous and reaches the top — the real test.
# Pre-fix this maxed out at chunk_size-1 (9999).
tail = c.query(expr="id >= 49990", output_fields=["id"], limit=20)
ids = sorted(r["id"] for r in tail)
print("max id seen:", max(ids))                   # expect 49999, NOT 9999
print("tail ids:", ids)

# (c) Spot-check no duplicate at a chunk boundary
boundary = c.query(expr="id in [9999, 10000, 19999, 20000]",
                   output_fields=["id"], limit=10)
print("boundary ids found:", sorted(r['id'] for r in boundary))  # expect all four
PY
num_entities: 50000
max id seen: 49999
tail ids: [49990, 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999]
boundary ids found: [9999, 10000, 19999, 20000]
smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv run python vdb_benchmark/vdbbench/enhanced_bench.py \
    --host 127.0.0.1 --collection test375_smoke \
    --auto-create-flat --runtime 10 --queries 1000 \
    --recall-k 10 --search-limit 10 --batch-size 10 --processes 2

============================================================
ENHANCED VDB BENCH — runtime/query-count mode
============================================================
Results will be saved to: vdbbench_results/20260526_010852

============================================================
Database Verification and Collection Loading
============================================================
Connecting to Milvus server at 127.0.0.1:19530...
Collection test375_smoke already loaded.

Collection: test375_smoke  vectors=50000  dim=1536  index=DISKANN  metric=COSINE
Detected source vector field: 'vector'

============================================================
RECALL SETUP (outside benchmark timing)
============================================================
Ground truth is pre-computed using a FLAT (brute-force) index.
Using metric type: COSINE

Generating 1000 query vectors (dim=1536, seed=42)...
Generated 1000 query vectors.

Setting up FLAT collection: test375_smoke_flat_gt
Creating FLAT collection 'test375_smoke_flat_gt' from source 'test375_smoke'...
Source schema: pk_field='id' (INT64), vec_field='vector', vectors=50000
Copying 50000 vectors to FLAT collection (batch_size=5000)...
  Copied 50000/50000 vectors (100.0%)
Building FLAT index...
FLAT collection 'test375_smoke_flat_gt' ready with 50000 vectors.
Pre-computing ground truth for 1000 queries using FLAT index (top_k=10)...
Ground truth pre-computation complete: 1000 queries in 2.07s
Ground truth ready: 1000 queries pre-computed.

Collecting initial disk statistics...

============================================================
Benchmark Execution
============================================================
Starting benchmark: 2 processes × 500 queries/process
Recall: 1000 pre-generated queries, recall@10
NOTE: batch_end timing is placed BEFORE recall capture — performance unaffected.
NOTE: recall hits written to per-worker recall_hits_p<N>.jsonl files.
Staggering process startup by 0.500s
Starting process 0...
Process 0 initialized
Process 0 - Loading collection
Process 0: Writing results to vdbbench_results/20260526_010852/milvus_benchmark_p0.csv
Process 0: Starting benchmark ...
Process 0: Completed 100 queries in 0.31 seconds.
Starting process 1...
Process 1 initialized
Process 1 - Loading collection
Process 1: Writing results to vdbbench_results/20260526_010852/milvus_benchmark_p1.csv
Process 1: Starting benchmark ...
Process 0: Completed 200 queries in 0.58 seconds.
Process 1: Completed 100 queries in 0.25 seconds.
Process 0: Completed 300 queries in 0.87 seconds.
Process 1: Completed 200 queries in 0.51 seconds.
Process 0: Completed 400 queries in 1.13 seconds.
Process 1: Completed 300 queries in 0.77 seconds.
Process 0: Completed 500 queries in 1.41 seconds.
Process 0: Finished. Executed 500 queries in 1.44 seconds
Process 1: Completed 400 queries in 1.03 seconds.
Process 1: Completed 500 queries in 1.28 seconds.
Process 1: Finished. Executed 500 queries in 1.31 seconds
Reading final disk statistics...

Calculating recall from per-worker JSONL files...
  Loaded ANN hits for 500 unique query indices from 2 worker(s).
Calculating benchmark statistics...

============================================================
BENCHMARK SUMMARY
============================================================
Total Queries: 1000
Total Batches: 100
Total Runtime: 1.83s

QUERY STATISTICS
------------------------------------------------------------
Mean Latency:      2.70 ms
Median Latency:    2.56 ms
P95 Latency:       3.78 ms
P99 Latency:       4.01 ms
P99.9 Latency:     4.15 ms
P99.99 Latency:    4.15 ms
Throughput:        547.92 queries/second

BATCH STATISTICS
------------------------------------------------------------
Mean Batch Time:   26.97 ms
Median Batch Time: 25.62 ms
P95 Batch Time:    37.83 ms
P99 Batch Time:    40.07 ms
P99.9 Batch Time:  41.35 ms
P99.99 Batch Time: 41.48 ms
Max Batch Time:    41.49 ms
Batch Throughput:  37.07 batches/second

RECALL STATISTICS (recall@10)
------------------------------------------------------------
Mean Recall:       0.7292
Median Recall:     0.7000
Min Recall:        0.2000
Max Recall:        1.0000
P95 Recall:        0.9000
P99 Recall:        1.0000
Queries Evaluated: 500

DISK I/O DURING BENCHMARK
------------------------------------------------------------
Total Read:        679.21 MB  (372.15 MB/s,  47601 IOPS)
Total Write:       776.00 KB  (0.42 MB/s,  5 IOPS)

Per-Device Breakdown:
  sda:
    Read:  188.00 KB  (0.10 MB/s, 2 IOPS)
    Write: 252.00 KB  (0.13 MB/s, 0 IOPS)
  sda3:
    Read:  188.00 KB  (0.10 MB/s, 2 IOPS)
    Write: 252.00 KB  (0.13 MB/s, 0 IOPS)
  dm-0:
    Read:  188.00 KB  (0.10 MB/s, 2 IOPS)
    Write: 252.00 KB  (0.13 MB/s, 1 IOPS)
  nvme3n1:
    Read:  678.66 MB  (371.85 MB/s, 47596 IOPS)
    Write: 20.00 KB  (0.01 MB/s, 3 IOPS)

Detailed results: vdbbench_results/20260526_010852
Recall details:   vdbbench_results/20260526_010852/recall_stats.json
============================================================
smrc@dskbd029:~/Storage_Repo_Tests/storage_vdb375$ uv run python vdb_benchmark/vdbbench/list_collections.py --host 127.0.0.1 \
  | grep -i flat_gt        # expect ~50000 entities, not ~10000
2026-05-26 01:11:00,324 - INFO - Connected to Milvus server at 127.0.0.1:19530
2026-05-26 01:11:00,326 - INFO - Found 2 collections
2026-05-26 01:11:00,326 - INFO - Getting information for collection: test375_smoke
2026-05-26 01:11:00,740 - INFO - Getting information for collection: test375_smoke_flat_gt
2026-05-26 01:11:01,166 - INFO - Disconnected from Milvus server
| test375_smoke_flat_gt |          50000 |        1536 | FLAT          | COSINE         |            1 |
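The heredoc check above spot-checks only the tail and two chunk boundaries. For an exhaustive check, one option is to stream every PK out of the collection (for example with pymilvus's query_iterator(), the same API create_flat_collection() uses) and compare it against the expected range 0..N-1. pk_coverage_report is a hypothetical helper for that comparison; because a PK-keyed query path may already collapse duplicates, the missing count is the more reliable signal.

```python
from collections import Counter


def pk_coverage_report(ids, expected_count):
    """Compare observed PKs against the expected contiguous range 0..expected_count-1."""
    counts = Counter(ids)
    # PKs in the expected range that never appeared.
    missing = sum(1 for pk in range(expected_count) if pk not in counts)
    # PKs outside the expected range.
    unexpected = sum(1 for pk in counts if not 0 <= pk < expected_count)
    # Extra occurrences beyond the first for any repeated PK.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {"distinct": len(counts), "missing": missing,
            "unexpected": unexpected, "duplicates": duplicates}


print(pk_coverage_report(range(50_000), 50_000))
# {'distinct': 50000, 'missing': 0, 'unexpected': 0, 'duplicates': 0}
print(pk_coverage_report(list(range(10_000)) * 5, 50_000))
# {'distinct': 10000, 'missing': 40000, 'unexpected': 0, 'duplicates': 40000}
```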

@idevasena idevasena requested a review from a team May 26, 2026 03:30
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Development

Successfully merging this pull request may close these issues.

vdb benchmark shows a very low recall@10 because the flat_gt collection size is too small