Fix legacy pdfjs dynamic import#6410
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a loadLegacyPdfJs utility function in packages/components/src/utils.ts to handle the dynamic loading of the pdfjs-dist legacy ESM build, ensuring compatibility within a CommonJS environment by bypassing TypeScript's transpilation. This utility is integrated into the File, Pdf, and Arxiv document loaders. A review comment suggests that loadLegacyPdfJs should return the entire module object rather than a subset containing only getDocument and version to maintain consistency with standard dynamic imports and prevent potential breakage for consumers expecting other module exports.
| ): Promise<{ getDocument: unknown; version?: string }> => { | ||
| const modulePath = resolver('pdfjs-dist/legacy/build/pdf.mjs') | ||
| const pdfjs = await importer(pathToFileURL(modulePath).href) | ||
|
|
||
| return { | ||
| getDocument: pdfjs.getDocument, | ||
| version: pdfjs.version | ||
| } | ||
| } |
There was a problem hiding this comment.
The loadLegacyPdfJs function currently returns a subset of the pdfjs-dist module (getDocument and version). This might break compatibility if the consumer (e.g., PDFLoader or a user-defined script) expects other parts of the module, such as GlobalWorkerOptions or OPS. It is safer and more flexible to return the entire module object, matching the behavior of a standard dynamic import(). Per repository guidelines, using a default implementation is preferred unless specific behavior is required, and signature changes are acceptable for internal functions if callers are unaffected.
export const loadLegacyPdfJs = async (
importer: NativeModuleImporter = nativeImport,
resolver: ModuleResolver = (specifier) => require.resolve(specifier)
): Promise<any> => {
const modulePath = resolver('pdfjs-dist/legacy/build/pdf.mjs')
return await importer(pathToFileURL(modulePath).href)
}References
- Use a default (fallback) implementation unless the specific implementation has meaningfully different behavior or provides better error messages.
- A signature change that would be a breaking change for a public API is acceptable for an internal function if all its callers are known and unaffected by the change.
Summary
loadLegacyPdfJshelper that resolvespdfjs-dist/legacy/build/pdf.mjsand loads it with a native dynamicfile://import.pdfFile.legacyBuild, preserving existing upload config defaults.Testing
PATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-components test -- src/utils.test.ts --runInBandPATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-components buildPATH="/opt/homebrew/opt/node@20/bin:$PATH" ./node_modules/.bin/eslint packages/components/src/utils.ts packages/components/src/utils.test.ts packages/components/nodes/documentloaders/Pdf/Pdf.ts packages/components/nodes/documentloaders/File/File.ts packages/components/nodes/tools/Arxiv/core.ts --ext ts --report-unused-disable-directives --max-warnings 0PATH="/opt/homebrew/opt/node@20/bin:$PATH" node -e "const u=require('./packages/components/dist/src/utils.js'); u.loadLegacyPdfJs().then(m=>console.log(typeof m.getDocument, m.version)).catch(e=>{console.error(e); process.exit(1)})"PATH="/opt/homebrew/opt/node@20/bin:/Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin:$PATH" ./node_modules/.bin/eslint packages/ui/src/ui-component/extended/FileUpload.jsx --ext js,jsx --report-unused-disable-directives --max-warnings 0PATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-ui build