Replace deprecated pyarrow.HadoopFileSystem import with pyarrow.fs.HadoopFileSystem #870
…pyarrow.fs.hadoopfilesystem` Signed-off-by: Nguyen Huy Hoang <181364121+huyhoang171106@users.noreply.github.com>
Pull request overview
Updates Papermill’s PyArrow/HDFS optional-dependency handling in `papermill/iorw.py`, with the aim of addressing a deprecation warning and forward-compatibility concern around `HadoopFileSystem` imports.
Changes:
- Adds a missing-dependency fallback assignment for `FileSelector` when `pyarrow` is not installed.
- Keeps the (already-present) `pyarrow.fs` import for `FileSelector` and `HadoopFileSystem`.
    FileSelector = missing_dependency_generator("pyarrow", "hdfs")
    HadoopFileSystem = missing_dependency_generator("pyarrow", "hdfs")
`missing_dependency_generator` returns a zero-argument function. Assigning it directly to `FileSelector`/`HadoopFileSystem` means later calls such as `HadoopFileSystem(host="default")` and `FileSelector(path)` will raise `TypeError` (unexpected arguments) instead of the intended `PapermillOptionalDependencyException`. Wrap the generator so the stub accepts `*args, **kwargs` (or adjust `missing_dependency_generator`) and raises the optional-dependency exception for both symbols. Consider adding a regression test for missing-pyarrow behavior in the HDFS handler.
Suggested change:

    - FileSelector = missing_dependency_generator("pyarrow", "hdfs")
    - HadoopFileSystem = missing_dependency_generator("pyarrow", "hdfs")
    + _missing_pyarrow_hdfs = missing_dependency_generator("pyarrow", "hdfs")
    + def FileSelector(*args, **kwargs):
    +     return _missing_pyarrow_hdfs()
    + def HadoopFileSystem(*args, **kwargs):
    +     return _missing_pyarrow_hdfs()
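
The difference between the two forms can be reproduced without Papermill or PyArrow installed. The sketch below is only an illustration: `OptionalDependencyError` and the local `missing_dependency_generator` are hypothetical stand-ins that assume the zero-argument shape described in the review comment, not Papermill's actual code.

```python
class OptionalDependencyError(Exception):
    """Hypothetical stand-in for PapermillOptionalDependencyException."""


def missing_dependency_generator(package, dep):
    # Stand-in with the assumed shape: returns a function that takes
    # no arguments and raises as soon as it is called.
    def missing_dep():
        raise OptionalDependencyError(
            f"The {package} optional dependency is missing. "
            f"Install it with: pip install papermill[{dep}]"
        )

    return missing_dep


# Direct assignment: Python rejects the caller's arguments before the
# stub body runs, so the caller sees TypeError.
HadoopFileSystemDirect = missing_dependency_generator("pyarrow", "hdfs")

# Wrapped form: arguments are accepted and ignored, so the intended
# optional-dependency error is the one that surfaces.
_missing_pyarrow_hdfs = missing_dependency_generator("pyarrow", "hdfs")


def HadoopFileSystemWrapped(*args, **kwargs):
    return _missing_pyarrow_hdfs()


def error_type(factory, *args, **kwargs):
    """Return the name of the exception raised by factory(*args, **kwargs)."""
    try:
        factory(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_type(HadoopFileSystemDirect, host="default"))   # TypeError
print(error_type(HadoopFileSystemWrapped, host="default"))  # OptionalDependencyError
```

The wrapper keeps `missing_dependency_generator` itself unchanged, which may matter if other call sites rely on its current signature.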
    try:
        from pyarrow.fs import FileSelector, HadoopFileSystem
    except ImportError:
        FileSelector = missing_dependency_generator("pyarrow", "hdfs")
The PR title/description says it replaces a deprecated `from pyarrow import HadoopFileSystem` import, but the code here already imports `HadoopFileSystem` from `pyarrow.fs`. Please double-check whether the intended import change is already present on the target branch (and update the PR title/description accordingly), or whether this diff is missing the actual deprecated-import replacement.
Summary
The warning is caused by an outdated import in `papermill/iorw.py` (around the reported line 50): `from pyarrow import HadoopFileSystem`. In PyArrow 2.0+, this symbol is deprecated at the top-level and should be imported from `pyarrow.fs`. Papermill’s optional dependency already allows `pyarrow>=2`, so current environments will keep emitting `FutureWarning` until this import is updated. This is not an immediate runtime failure, but it is a forward-compatibility risk (likely breakage in a future PyArrow release).
Files changed
- `papermill/iorw.py` (modified)
Testing
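
A regression test for the missing-pyarrow case, as suggested in the review, might look roughly like the sketch below. It simulates an absent `pyarrow` by placing `None` in `sys.modules`, which makes the import raise `ImportError` whether or not PyArrow is actually installed. The names `OptionalDependencyError`, `load_hdfs_symbols`, and `hadoop_fs_raises_optional_dependency_error` are hypothetical and only mirror the try/except pattern shown in the diff; a real test would exercise `papermill/iorw.py` itself.

```python
import sys
from unittest import mock


class OptionalDependencyError(Exception):
    """Hypothetical stand-in for PapermillOptionalDependencyException."""


def load_hdfs_symbols():
    """Mirror the guarded import from the diff, with argument-tolerant stubs."""
    try:
        from pyarrow.fs import FileSelector, HadoopFileSystem
    except ImportError:

        def _missing(*args, **kwargs):
            raise OptionalDependencyError("pyarrow is required for HDFS support")

        FileSelector = HadoopFileSystem = _missing
    return FileSelector, HadoopFileSystem


def hadoop_fs_raises_optional_dependency_error():
    # A None entry in sys.modules halts the import with ModuleNotFoundError
    # (a subclass of ImportError); patch.dict restores sys.modules on exit.
    blocked = {"pyarrow": None, "pyarrow.fs": None}
    with mock.patch.dict(sys.modules, blocked):
        _, hadoop_fs = load_hdfs_symbols()
    try:
        hadoop_fs(host="default")
    except OptionalDependencyError:
        return True
    return False
```

Under pytest, the body of `hadoop_fs_raises_optional_dependency_error` could be rewritten around `pytest.raises`; the plain try/except is used here only to keep the sketch dependency-free.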
What does this PR do?
Fixes #<issue_number>
Closes #816