Issue
I have an excel sheet with more than 1000 columns and 300 rows. Some these cells have normal data, while some cells have a background color of red and some cells have normal white background but the text is red in color. For example, my excel sheet looks like below:
I am reading this excel sheet into Python (pandas) to use it as a dataframe and perform further actions on it. However, the red text and red cells need to be treated differently than the normal cells.
Therefore, I would like to split the above table into 3 tables, such that: Table one has all the cells but the red background cells are empty. Table 2 has only those rows and columns where the text is red. Table 3 has only those rows and columns where the background is red.
I guess it cannot be done in Pandas. I tried using StyleFrame but failed.
Can anyone help in this regard? Is there any python package that is helpful in this case?
Solution
This is pretty much the way to achieve this. It is not pretty as StyleFrame wasn't really designed to be used this way.
Reading the source Excel file
import numpy as np
from styleframe import StyleFrame, utils
sf = StyleFrame.read_excel('test.xlsx', read_style=True, use_openpyxl_styles=False)
1) All cells but cells with red background are empty
def empty_red_background_cells(cell):
if cell.style.bg_color in {utils.colors.red, 'FFFF0000'}:
cell.value = np.nan
return cell
sf_1 = StyleFrame(sf.applymap(empty_red_background_cells))
print(sf_1)
# C1 C2 C3 C4 C5 C6
# 0 a1 1.0 s nan 1001.0 1234.0
# 1 a2 12.0 s nan 1001.0 4322.0
# 2 a3 nan s nan 1001.0 4432.0
# 3 a4 232.0 s nan 1001.0 4432.0
# 4 a5 343.0 s 99.0 nan nan
# 5 a6 3.0 s 99.0 1001.0 4432.0
# 6 a7 34.0 s 99.0 1001.0 4432.0
# 7 a8 5.0 s nan 1001.0 4432.0
# 8 a9 6.0 s 99.0 1001.0 4432.0
# 9 a10 565.0 s 99.0 nan 4432.0
# 10 a11 5543.0 s 99.0 1001.0 4432.0
# 11 a12 112.0 s 99.0 1001.0 nan
# 12 a13 34345.0 s 99.0 1001.0 4432.0
# 13 a14 0.0 s 99.0 nan nan
# 14 a15 453.0 s 99.0 1001.0 nan
2) Only cells with red text
def only_cells_with_red_text(cell):
return cell if cell.style.font_color in {utils.colors.red, 'FFFF0000'} else np.nan
sf_2 = StyleFrame(sf.applymap(only_cells_with_red_text).dropna(axis=(0, 1), how='all'))
# passing a tuple to pandas.dropna is deprecated since pandas 0.23.0, but this can be
# avoided by simply calling dropna twice, once with axis=0 and once with axis=1
print(sf_2)
# C2 C6
# 7 nan 4432.0
# 8 nan 4432.0
# 9 565.0 nan
# 10 5543.0 nan
# 11 112.0 nan
3) Only cells with red background
def only_cells_with_red_background(cell):
return cell if cell.style.bg_color in {utils.colors.red, 'FFFF0000'} else np.nan
sf_3 = StyleFrame(sf.applymap(only_cells_with_red_background).dropna(axis=(0, 1), how='all'))
# passing a tuple to pandas.dropna is deprecated since pandas 0.23.0, but this can be
# avoided by simply calling dropna twice, once with axis=0 and once with axis=1
print(sf_3)
# C4 C6
# 0 99.0 nan
# 1 99.0 nan
# 2 99.0 nan
# 3 99.0 nan
# 13 nan 4432.0
# 14 nan 4432.0
Answered By - DeepSpace
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.