feat(OrigDatablock): add file count endpoints#2795
Conversation
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- In
OrigDatablocksService.countFiles, accessingresult.countwill throw if the aggregation returns no documents; consider guarding against an empty aggregation result (e.g. defaultingresultto{ count: 0 }or checking for undefined) before readingcount. - In
countFiles,fieldsProjectionis typed asstring[]but defaulted to{}, which is inconsistent; defaulting to an empty array (e.g.filter.fields ?? []) would better match the type and avoid passing an object intoparsePipelineProjection. - The
@Querydecorator call incountFileshas mismatched closing parentheses/commas in the diff; it would be good to simplify/format this parameter declaration to avoid syntactic ambiguity and make it easier to read.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `OrigDatablocksService.countFiles`, accessing `result.count` will throw if the aggregation returns no documents; consider guarding against an empty aggregation result (e.g. defaulting `result` to `{ count: 0 }` or checking for undefined) before reading `count`.
- In `countFiles`, `fieldsProjection` is typed as `string[]` but defaulted to `{}`, which is inconsistent; defaulting to an empty array (e.g. `filter.fields ?? []`) would better match the type and avoid passing an object into `parsePipelineProjection`.
- The `@Query` decorator call in `countFiles` has mismatched closing parentheses/commas in the diff; it would be good to simplify/format this parameter declaration to avoid syntactic ambiguity and make it easier to read.
## Individual Comments
### Comment 1
<location path="src/origdatablocks/origdatablocks.v4.controller.ts" line_range="745" />
<code_context>
+ )
+ queryFilter: string,
+ ) {
+ const parsedFilter = JSON.parse(queryFilter ?? "{}");
+
+ return this.origDatablocksService.countFiles(parsedFilter);
</code_context>
<issue_to_address>
**issue (bug_risk):** Unvalidated JSON.parse may throw on malformed filter input, causing 500 responses.
`JSON.parse(queryFilter ?? "{}")` will throw on any malformed `filter` query and result in a 500. Unless `FilterValidationPipe` guarantees valid JSON, please either wrap this in try/catch (or use a safe parse helper) and return an appropriate 4xx instead, and mirror the same handling in `countFilesPublic`.
</issue_to_address>
### Comment 2
<location path="src/origdatablocks/origdatablocks-public.v4.controller.ts" line_range="285-287" />
<code_context>
+ },
+ ),
+ )
+ queryFilter: string,
+ ) {
+ const parsedFilter = JSON.parse(queryFilter ?? "{}");
</code_context>
<issue_to_address>
**suggestion:** The queryFilter parameter is typed as required but used as if it may be undefined.
Here `queryFilter` is required, but `JSON.parse(queryFilter ?? "{}")` assumes it may be `undefined`. Since Nest can pass `undefined` when the query param is missing, please type this as `string | undefined` (or `queryFilter?: string`) so the typing matches actual runtime behavior.
```suggestion
)
queryFilter?: string,
) {
```
</issue_to_address>
### Comment 3
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="390" />
<code_context>
+ filter: FilterQuery<OrigDatablockDocument>,
+ ): Promise<CountApiResponse> {
+ const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
+ const fieldsProjection: string[] = filter.fields ?? {};
+
+ const pipeline: PipelineStage[] = [{ $match: whereFilter }];
</code_context>
<issue_to_address>
**issue (bug_risk):** Defaulting fieldsProjection to an object conflicts with its string[] type and intended usage.
`fieldsProjection` is declared as `string[]`, but the fallback is `{}`, which can cause type errors or break `isEmpty`/`parsePipelineProjection` if they expect an array. Please default to an empty array (e.g. `filter.fields ?? []`) or allow `undefined` in the type and handle it explicitly.
</issue_to_address>
### Comment 4
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="421-425" />
<code_context>
+
+ pipeline.push({ $count: "count" });
+
+ const [result] = await this.origDatablockModel
+ .aggregate<{ count: number }>(pipeline)
+ .exec();
+
+ return { count: result.count ?? 0 };
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Accessing result.count without guarding against an empty aggregation result can throw.
If the aggregation returns no documents, the destructured `result` will be `undefined`, so `result.count` will throw before `?? 0` is applied. Using optional chaining avoids this:
```ts
const [result] = await this.origDatablockModel
.aggregate<{ count: number }>(pipeline)
.exec();
return { count: result?.count ?? 0 };
```
</issue_to_address>
### Comment 5
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="386" />
<code_context>
return { count };
}
+ async countFiles(
+ filter: FilterQuery<OrigDatablockDocument>,
+ ): Promise<CountApiResponse> {
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the new countFiles aggregation pipeline by counting list sizes directly and cleaning up the fields projection handling.
You can simplify `countFiles` while keeping behavior intact.
**1. Remove `$unwind`/`$count` in favor of `$project` + `$group`**
You don’t need to explode `dataFileList` just to count its elements. Project its size and sum:
```ts
async countFiles(
filter: FilterQuery<OrigDatablockDocument>,
): Promise<CountApiResponse> {
const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
const fieldsProjection = filter.fields ?? [];
const pipeline: PipelineStage[] = [{ $match: whereFilter }];
this.addLookupFields(pipeline, filter.include);
if (!isEmpty(fieldsProjection)) {
const projection = parsePipelineProjection(fieldsProjection);
pipeline.push({ $project: projection });
}
// keep dataset constraint but simplify the stages
pipeline.push({
$lookup: {
from: "Dataset",
as: "dataset_temp",
let: { datasetId: "$datasetId" },
pipeline: [{ $match: { $expr: { $eq: ["$pid", "$$datasetId"] } } }],
},
});
// only keep origdatablocks for which a dataset exists
pipeline.push({
$match: {
$expr: { $gt: [{ $size: "$dataset_temp" }, 0] },
},
});
pipeline.push({ $unset: "dataset_temp" });
// count files by summing sizes of dataFileList
pipeline.push(
{
$project: {
fileCount: { $size: "$dataFileList" },
},
},
{
$group: {
_id: null,
count: { $sum: "$fileCount" },
},
},
);
const [result] = await this.origDatablockModel
.aggregate<{ count: number }>(pipeline)
.exec();
return { count: result?.count ?? 0 };
}
```
This keeps the dataset-existence constraint but removes `$addFields`, `$unwind`, and `$count`, making the intent “sum of list sizes” explicit.
**2. Fix `fieldsProjection` typing/initialization**
Right now `fieldsProjection` is typed as `string[]` but initialized with `{}` if `filter.fields` is an object (or falsy). Ensure its type/value align:
```ts
type FieldsInput = string[] | undefined; // adjust to whatever your filter type is
const fieldsProjection: string[] =
Array.isArray(filter.fields) ? filter.fields : [];
```
Or, if `filter.fields` can already be guaranteed as `string[]`, drop the `{}` default entirely:
```ts
const fieldsProjection: string[] = filter.fields ?? [];
```
This removes the implicit type mismatch and makes the projection logic easier to follow.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| return { count }; | ||
| } | ||
|
|
||
| async countFiles( |
There was a problem hiding this comment.
issue (complexity): Consider simplifying the new countFiles aggregation pipeline by counting list sizes directly and cleaning up the fields projection handling.
You can simplify countFiles while keeping behavior intact.
1. Remove $unwind/$count in favor of $project + $group
You don’t need to explode dataFileList just to count its elements. Project its size and sum:
async countFiles(
filter: FilterQuery<OrigDatablockDocument>,
): Promise<CountApiResponse> {
const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
const fieldsProjection = filter.fields ?? [];
const pipeline: PipelineStage[] = [{ $match: whereFilter }];
this.addLookupFields(pipeline, filter.include);
if (!isEmpty(fieldsProjection)) {
const projection = parsePipelineProjection(fieldsProjection);
pipeline.push({ $project: projection });
}
// keep dataset constraint but simplify the stages
pipeline.push({
$lookup: {
from: "Dataset",
as: "dataset_temp",
let: { datasetId: "$datasetId" },
pipeline: [{ $match: { $expr: { $eq: ["$pid", "$$datasetId"] } } }],
},
});
// only keep origdatablocks for which a dataset exists
pipeline.push({
$match: {
$expr: { $gt: [{ $size: "$dataset_temp" }, 0] },
},
});
pipeline.push({ $unset: "dataset_temp" });
// count files by summing sizes of dataFileList
pipeline.push(
{
$project: {
fileCount: { $size: "$dataFileList" },
},
},
{
$group: {
_id: null,
count: { $sum: "$fileCount" },
},
},
);
const [result] = await this.origDatablockModel
.aggregate<{ count: number }>(pipeline)
.exec();
return { count: result?.count ?? 0 };
}This keeps the dataset-existence constraint but removes $addFields, $unwind, and $count, making the intent “sum of list sizes” explicit.
2. Fix fieldsProjection typing/initialization
Right now fieldsProjection is typed as string[] but initialized with {} if filter.fields is an object (or falsy). Ensure its type/value align:
type FieldsInput = string[] | undefined; // adjust to whatever your filter type is
const fieldsProjection: string[] =
Array.isArray(filter.fields) ? filter.fields : [];Or, if filter.fields can already be guaranteed as string[], drop the {} default entirely:
const fieldsProjection: string[] = filter.fields ?? [];This removes the implicit type mismatch and makes the projection logic easier to follow.
Description
This PR adds new endpoints to count files across origdatablocks linked to a specific dataset. It also includes API tests for the public and private file count endpoints.
Motivation
Fixes
Changes:
Tests included
Documentation
official documentation info
Summary by Sourcery
Add API support to count files across origdatablocks, including public and authenticated variants, and cover them with tests.
New Features:
Tests: