Problem description
I am passing custom formatters for specific columns as a dict.
If the frame is wide enough that some columns are truncated, the wrong formatters are applied to the displayed columns.
(In my case this leads to crashes, because the formatter receives data of the wrong type.)
Note that the behavior changes with the width of the console window, since different columns are displayed.
Problem investigation
I examined the code of my pandas version (1.0.5) and compared it with the latest version on GitHub; the bug still seems to be there.
The problem starts in DataFrameFormatter._to_str_columns, where frame is set to the truncated frame (frame = self.tr_frame) and self._format_col(i) is then called with the index of the column in the TRUNCATED frame:
def _to_str_columns(self) -> List[List[str]]:
    """
    Render a DataFrame to a list of columns (as lists of strings).
    """
    # this method is not used by to_html where self.col_space
    # could be a string so safe to cast
    self.col_space = cast(int, self.col_space)
    frame = self.tr_frame
    # may include levels names also
    str_index = self._get_formatted_index(frame)
    if not is_list_like(self.header) and not self.header:
        stringified = []
        for i, c in enumerate(frame):
            fmt_values = self._format_col(i)
Then this "truncated" column index is passed to self._get_formatter:
def _format_col(self, i: int) -> List[str]:
    frame = self.tr_frame
    formatter = self._get_formatter(i)  # the problem is HERE? _get_formatter(frame.columns[i]) ?
which uses the columns of the full frame to look up the formatter, even though index i refers to a column of the truncated frame:
    # ...
    else:
        if is_integer(i) and i not in self.columns:
            i = self.columns[i]
        return self.formatters.get(i, None)
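To illustrate the mismatch outside pandas, here is a minimal sketch in plain Python. It is a simplified model, not pandas code: the column names, the sample formatters, and the helpers get_formatter_buggy and get_formatter_fixed are all hypothetical, and the truncation is assumed to keep the first two and last two columns. The second helper mirrors the change I am guessing at above (resolving i against the truncated frame's columns).

```python
# Simplified model of the formatter lookup; NOT pandas code.
# All names below are hypothetical and exist only for illustration.

full_columns = ["a", "b", "c", "d", "e", "f"]

# Assumed truncation: keep the first two and the last two columns.
truncated_columns = full_columns[:2] + full_columns[-2:]

# Formatters keyed by column name, as in a formatters dict.
formatters = {
    "c": str.upper,                # expects a str
    "e": lambda v: f"{v:.2f}",     # expects a float
}


def get_formatter_buggy(i):
    """Resolve positional index i against the FULL column list."""
    return formatters.get(full_columns[i])


def get_formatter_fixed(i):
    """Resolve positional index i against the TRUNCATED column list."""
    return formatters.get(truncated_columns[i])


# Position 2 in the truncated frame is column "e" ...
displayed = truncated_columns[2]
value = 3.14159  # a value taken from column "e"

# ... but the buggy lookup resolves position 2 to column "c".
try:
    get_formatter_buggy(2)(value)
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

print(displayed, "buggy lookup:", outcome)
print(displayed, "fixed lookup:", get_formatter_fixed(2)(value))
```

In this model the buggy lookup hands a float from column "e" to the string formatter registered for column "c", which is the kind of crash I am seeing. Which columns are affected depends on how many columns survive truncation, which matches the dependence on console width.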
Details
INSTALLED VERSIONS
commit : None
python : 3.6.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-37-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.5
numpy : 1.18.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.19.3
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.4.4
tabulate : 0.8.3
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.50.1