Issue
I need to create a Python bot that extracts column C from sheet 1 of Excel File 1 and catalogs it in File 2, counting how many numbers fall in each range: from 0.00 to 0.99, from 1.00 to 1.99, and so on up to 12. All numbers above 12 should be cataloged in the last row. Then I need to total how many numbers there are in all.
I tried to write some code, but it didn't write anything to the Excel file.
Solution
You can try something along the following lines:
- Read the Excel data file (Excel File 1) selecting just the required Column ('column C').
- Create a list of bin edges for the ranges 0.00 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99 and so on up to a max of 12, and use it to group the values read from the file into those ranges in a new dataframe (df_write). Get the count for each range.
- Make a count of values greater than 12 and add as a new row to df_write.
- Sum all the values in the dataframe and add as a new row to df_write.
- Write the dataframe to Excel. In the example xlsxwriter is used as the engine which means the workbook (catalogfile) will be created/overwritten each time the code is run.
- Additional data/formatting can be included in the sheet. For example changing the text in a cell and adding a formula to count the total of all the grouped range values, which should equal the total number of rows read from the Excel data file (datafile).
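The grouping and counting steps can be tried on in-memory data before any Excel file is involved. The following is only a minimal sketch: the sample values are made up, and it uses `value_counts` on the result of `pd.cut` rather than the `groupby` call in the full code, with the same bin edges.

```python
import pandas as pd

# Hypothetical sample values standing in for 'column C'
values = pd.Series([0.25, 0.80, 1.00, 1.50, 5.75, 11.50, 12.50, 30.00],
                   name="column C")

# Same bin edges as the full code: -0.01, 0.99, 1.99, ..., 11.99
edges = [i - 0.01 for i in range(0, 13)]

# Count how many values fall in each (left, right] interval
counts = pd.cut(values, edges).value_counts(sort=False)

over_12 = int((values > 12).sum())  # how many values are above 12
total_sum = float(values.sum())     # sum of the values themselves
row_count = int(values.count())     # number of non-empty rows

print(counts)
print(over_12, total_sum, row_count)
```

With these edges, a value above 11.99 and up to exactly 12.00 falls outside every bin and is not matched by `> 12` either, so the range counts plus the over-12 count only equal the row count when no such values are present.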
import pandas as pd
datafile = "Excel File 1.xlsx"
catalogfile = 'Excel File 2.xlsx'
column = 'column C'
### Read specific column (column) from Excel Sheet
df_read = pd.read_excel(datafile, index_col=None, na_values=['NA'], usecols=[column])
# print(df_read)
### Create the dataframe of values within specified ranges to write to Excel
### Group ranges 0.00 - 0.99 in increments of 1 and make a count of each up to a max (12)
df_write = df_read.groupby(pd.cut(df_read[column], [float(i) - 0.01 for i in range(0, 13)])).count()
### Count values greater than 12 and add as row to the dataframe
df_write.loc['12+'] = df_read[df_read > 12].count()
### Sum all values in the column and add as row to the dataframe
df_write.loc[len(df_write.index) + 1] = df_read.sum()
### Rename Index Header
df_write.index.name = 'Range Totals'
### Rename Column Header
df_write.columns = ['Values Count']
### Write dataframe to Excel
### Engine set explicitly to xlsxwriter (the formatting calls below are xlsxwriter-specific)
### A new workbook is created each run (any existing workbook is overwritten)
with pd.ExcelWriter(catalogfile, engine='xlsxwriter') as writer:
    df_write.to_excel(writer, sheet_name='Sheet1', index=True)
    ### Xlsxwriter formatting
    workbook = writer.book
    cell_format = workbook.add_format()
    cell_format.set_bold(True)
    ws = writer.sheets['Sheet1']
    ### Rename Row Header and add formula to count the totals for each range
    ### (should equal the total number of data rows read from Excel)
    ### df_write has one column, so df_write.size equals its number of rows
    ws.write(df_write.size, 0, 'Column Total', cell_format)
    ws.write_row(df_write.size + 1, 0, ['Total Rows', '=SUM(B2:B14)'], cell_format)
    ws.autofit()
Example of what the Excel Sheet would look like for a Column consisting of 100 rows of data (i.e. header excluded) read from the datafile.
The 'Range Totals' column is the Index column from the dataframe. The range text is as determined by the dataframe but actually covers the ranges 0.00 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99 etc.
If desired, the index column can be dropped from the dataframe when writing to Excel and custom text written to the column using xlsxwriter instead, or a template with existing headers can be used (in this case the ExcelWriter would need to use append mode and Openpyxl as the engine to write to an existing workbook).
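If the interval text generated by `pd.cut` is not wanted, one option is to pass explicit `labels`. The sketch below is an alternative to the binning in the answer, not the same thing: it assumes integer edges with `right=False`, so each bin is `[n, n + 1)`, and it assumes that a value of exactly 12.00 belongs in the final "12+" bin. The sample values are invented.

```python
import pandas as pd

# Hypothetical sample values
values = pd.Series([0.25, 0.99, 1.00, 1.50, 11.99, 12.00, 12.50, 30.00])

# Integer edges 0, 1, ..., 12 plus an open-ended final bin
edges = list(range(0, 13)) + [float("inf")]

# One label per bin: "0.00 - 0.99", ..., "11.00 - 11.99", then "12+"
labels = [f"{i}.00 - {i}.99" for i in range(0, 12)] + ["12+"]

# right=False makes each interval [left, right), so 1.00 lands in "1.00 - 1.99"
binned = pd.cut(values, bins=edges, labels=labels, right=False)
counts = binned.value_counts(sort=False)

print(counts)
```

Because the last bin is open-ended, every non-negative value lands in exactly one bin and the counts add up to the number of rows read. Negative values would still fall outside all bins.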
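As a rough sketch of the template approach, the snippet below writes counts under an existing header row using Openpyxl in append mode. The helper name `write_counts_into_template`, the throwaway template file and the sample counts are all hypothetical, and `if_sheet_exists="overlay"` requires pandas 1.4 or later.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def write_counts_into_template(template_path, counts, sheet_name="Sheet1"):
    """Write `counts` (labels in the index) below an existing header row.

    Assumes the workbook already exists and row 1 of the sheet holds headers.
    """
    frame = counts.rename_axis("label").reset_index(name="count")
    # mode="a" opens the existing file; "overlay" writes into the existing
    # sheet instead of replacing it or raising an error
    with pd.ExcelWriter(template_path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        frame.to_excel(writer, sheet_name=sheet_name, startrow=1,
                       header=False, index=False)


# Build a throwaway template containing only a header row
template_path = os.path.join(tempfile.mkdtemp(), "template.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["Range Totals", "Values Count"])
wb.save(template_path)

# Hypothetical counts, with custom label text as the index
counts = pd.Series([3, 1], index=["0.00 - 0.99", "12+"])
write_counts_into_template(template_path, counts)

# Read the sheet back to confirm the header row was kept
sheet = load_workbook(template_path)["Sheet1"]
rows = [[cell.value for cell in row] for row in sheet.iter_rows()]
print(rows)
```

Note that xlsxwriter-specific calls such as `workbook.add_format()` are not available with this engine, so any cell formatting would have to be done through Openpyxl instead.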
Answered By - moken