Automation to Read From Website and Write to Spreadsheet
A Simple Guide to Automate Your Excel Reporting with Python
Use openpyxl to automate your Excel reporting with Python
Let's face it; no matter what our job is, sooner or later, we will have to deal with repetitive tasks like updating a daily report in Excel. Things could get worse if you work for a company that doesn't use Python, because you wouldn't be able to solve this problem by using only Python.
But don't worry, you can still use your Python skills to automate your Excel reporting without having to convince your boss to migrate to Python! You simply have to use the Python module openpyxl to tell Excel what you want to do through Python. Unlike a previous article I wrote that encourages you to move from Excel to Python, with openpyxl you can stick to Excel while creating your reports with Python.
Table of Contents
1. The Dataset
2. Make a Pivot Table with Pandas
- Importing libraries
- Reading the Excel file
- Making a pivot table
- Exporting pivot table to Excel file
3. Make the Report with Openpyxl
- Creating row and column references
- Adding Excel charts through Python
- Applying Excel formulas through Python
- Formatting the report sheet
4. Automating the Report with a Python Function (Full code)
- Applying the function to a single Excel file
- Applying the function to multiple Excel files
5. Schedule the Python Script to Run Monthly, Weekly, or Daily
The Dataset
In this guide, we'll use an Excel file with sales data that is similar to the files you use as inputs to make reports at work. You can download this file on Kaggle; however, it has a .csv format, so you should save it as an .xlsx file (simply renaming the extension doesn't convert it), or just download it from this Google Drive link (I also changed the file name to supermarket_sales.xlsx).
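If you'd rather convert the Kaggle download yourself instead of using the Google Drive copy, a minimal sketch with pandas could look like this. The helper name csv_to_xlsx and its default file names are my own assumptions, so point them at wherever your CSV actually lives.

```python
import pandas as pd


def csv_to_xlsx(csv_path="supermarket_sales.csv", xlsx_path="supermarket_sales.xlsx"):
    """Convert a CSV file into a real .xlsx workbook (hypothetical helper)."""
    # Read the CSV into a DataFrame...
    df = pd.read_csv(csv_path)
    # ...and write it back out as an Excel workbook.
    # index=False avoids adding pandas' row index as an extra first column.
    df.to_excel(xlsx_path, index=False)
    return xlsx_path
```

After running csv_to_xlsx(), pd.read_excel('supermarket_sales.xlsx') reads the file just like the Google Drive version.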
Before writing any code, take a look at the file on Google Drive and familiarize yourself with it. That file is going to be the input to create the following report through Python.
Now let's make that report and automate it with Python!
Make a Pivot Table with Pandas
Importing libraries
Now that you've downloaded the Excel file, let's import the libraries we'll use in this guide.
import pandas as pd
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.chart import BarChart, Reference
import string
We'll use Pandas to read the Excel file, create a pivot table, and export it to Excel. Then we'll use the Openpyxl library to write Excel formulas, make charts, and format the spreadsheet through Python. Finally, we'll create a Python function to automate this process.
Note: If you don't have those libraries installed, you can easily install them by running pip install pandas and pip install openpyxl in your terminal or command prompt.
Reading the Excel file
Before we read the Excel file, make sure the file is in the same folder as your Python script. Then, read the Excel file with pd.read_excel() like in the following code.
excel_file = pd.read_excel('supermarket_sales.xlsx')
excel_file[['Gender', 'Product line', 'Total']]
The file has many columns, but we'll only use the Gender, Product line, and Total columns for the report we're going to create. To show you what they look like, I selected them using double brackets. If we print this in a Jupyter Notebook, you'll see the following dataframe, which looks like an Excel spreadsheet.
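The difference between double and single brackets is easy to miss. Here's a small sketch on a made-up DataFrame that reuses the dataset's column names. The values are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for supermarket_sales.xlsx (same column names, invented values)
sales = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Product line": ["Sports and travel", "Food and beverages", "Sports and travel"],
    "Total": [100.0, 250.5, 80.0],
    "City": ["Yangon", "Mandalay", "Yangon"],
})

# Double brackets pass a *list* of column names, so the result is still a DataFrame
subset = sales[["Gender", "Product line", "Total"]]

# Single brackets with one name return a pandas Series instead
totals = sales["Total"]

print(type(subset).__name__, subset.shape)  # DataFrame (3, 3)
print(type(totals).__name__, totals.shape)  # Series (3,)
```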
Making a pivot table
We can easily create a pivot table from the excel_file dataframe created previously. We just need to use the .pivot_table() method. Let's say we want to create a pivot table that shows the total money spent by males and females on the different product lines. To do so, we write the following code.
report_table = excel_file.pivot_table(index='Gender',
columns='Product line',
values='Total',
aggfunc='sum').round(0)
The report_table should look something like this.
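To see what pivot_table() actually computes, here's a sketch on a tiny made-up dataset. Passing aggfunc='sum' matters, because pivot_table() averages the values by default (aggfunc='mean'). The groupby() line is only a cross-check that builds the same table another way.

```python
import pandas as pd

# Made-up rows using the same column names as supermarket_sales.xlsx
sales = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Product line": ["Sports", "Food", "Sports", "Food", "Food"],
    "Total": [100.4, 250.5, 80.0, 40.2, 9.6],
})

# One row per gender, one column per product line, each cell = sum of Total
report_table = sales.pivot_table(index="Gender", columns="Product line",
                                 values="Total", aggfunc="sum").round(0)
print(report_table)

# Cross-check: the same table built with groupby + unstack
check = sales.groupby(["Gender", "Product line"])["Total"].sum().unstack().round(0)
print((report_table == check).all().all())  # True
```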
Exporting pivot table to Excel file
To export the pivot table we just created, we use the .to_excel() method. Inside the parentheses, we have to write the name of the output Excel file. In this case, I'll name this file report_2021.xlsx. We can also specify the name of the sheet we want to create and the row where the pivot table should start (startrow is zero-based, so startrow=4 puts the table's header on Excel row 5).
report_table.to_excel('report_2021.xlsx',
                      sheet_name='Report',
                      startrow=4)
Now the Excel file is exported to the same folder where your Python script is located.
Make the Report with Openpyxl
Every time we want to access a workbook, we'll use the load_workbook function imported from openpyxl and then save it with the .save() method. In the following sections, I'll be loading and saving the workbook every time we modify it. However, you only need to do this once (like in the full code shown at the end of this guide).
Creating row and column references
To automate the report, we need to get the minimum and maximum active columns and rows, so the code we're going to write keeps working even if we add more data.
To obtain the references in the workbook, we first load the workbook with load_workbook() and locate the sheet we want to work with using wb['name_of_sheet']. Then we read the sheet's min_column, max_column, min_row, and max_row attributes, which describe the range of cells that contain data.
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# cell references (original spreadsheet)
min_column = sheet.min_column
max_column = sheet.max_column
min_row = sheet.min_row
max_row = sheet.max_row
You can print these variables to get an idea of what they mean. For this example, we get these numbers.
Min Columns: 1
Max Columns: 7
Min Rows: 5
Max Rows: 7
Open the report_2021.xlsx file we exported before to verify this.
As you can see in the picture above, the minimum row is 5 and the maximum row is 7. Also, the minimum column is A (1) and the maximum column is G (7). These references will be extremely useful for the following sections.
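If you'd like to check these references without opening Excel, here's a sketch that builds a made-up pivot table, exports it with the same startrow=4 to a throwaway file in a temporary folder (the name report_demo.xlsx is hypothetical), and reads the references back with openpyxl. With 2 genders and 3 product lines, the numbers shrink accordingly.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Made-up sales rows: 2 genders x 3 product lines
sales = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Product line": ["Food", "Sports", "Health", "Food", "Sports", "Health"],
    "Total": [120.0, 80.0, 45.0, 60.0, 150.0, 30.0],
})
report_table = sales.pivot_table(index="Gender", columns="Product line",
                                 values="Total", aggfunc="sum").round(0)

# Export to a temporary file instead of report_2021.xlsx
path = os.path.join(tempfile.mkdtemp(), "report_demo.xlsx")
report_table.to_excel(path, sheet_name="Report", startrow=4)

sheet = load_workbook(path)["Report"]
print("Min Columns:", sheet.min_column)  # 1 (Gender labels in column A)
print("Max Columns:", sheet.max_column)  # 4 (column A plus 3 product lines)
print("Min Rows:", sheet.min_row)        # 5 (startrow=4 counts from 0)
print("Max Rows:", sheet.max_row)        # 7 (header row + Female + Male)
```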
Adding Excel charts through Python
To create an Excel chart from the pivot table we created, we need to use the BarChart class we imported earlier. To identify the position of the data and category values, we use the Reference class from openpyxl (we imported Reference at the beginning of this article).
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# barchart
barchart = BarChart()
# locate data and categories
data = Reference(sheet,
                 min_col=min_column+1,
                 max_col=max_column,
                 min_row=min_row,
                 max_row=max_row)  # including headers
categories = Reference(sheet,
                       min_col=min_column,
                       max_col=min_column,
                       min_row=min_row+1,
                       max_row=max_row)  # not including headers
# adding data and categories
barchart.add_data(data, titles_from_data=True)
barchart.set_categories(categories)
# location of the chart
sheet.add_chart(barchart, "B12")
barchart.title = 'Sales by Product line'
barchart.style = 5  # choose the chart style
wb.save('report_2021.xlsx')
After running that code, the report_2021.xlsx file should look like this.
Breaking down the code:
- barchart = BarChart() initializes a barchart variable from the BarChart class
- data and categories are variables that represent where that data is located. We're using the column and row references we defined above to automate this. Also, keep in mind that I'm including the headers in data but not in categories
- We use add_data and set_categories to add the necessary data to the barchart. Inside add_data I'm adding titles_from_data=True because I included the headers for data
- We use sheet.add_chart to specify what we want to add to the "Report" sheet and in which cell we want to add it
- We can modify the default title and chart style using barchart.title and barchart.style
- We save all the changes with wb.save()
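Here's a self-contained sketch of the same chart steps that runs without the report file. It writes a small made-up block of numbers starting at row 5 (mirroring the layout pandas exported) into a new Workbook, so every value in it is an assumption for illustration.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
sheet = wb.active
sheet.title = "Report"

# Made-up pivot-table block starting at row 5 (header row + one row per gender)
block = [
    ["Gender", "Food and beverages", "Sports and travel"],
    ["Female", 260, 80],
    ["Male", 40, 100],
]
for row_offset, values in enumerate(block):
    for col, value in enumerate(values, start=1):
        sheet.cell(row=5 + row_offset, column=col, value=value)

# Same references as in the guide, read from the sheet itself
min_column, max_column = sheet.min_column, sheet.max_column
min_row, max_row = sheet.min_row, sheet.max_row

barchart = BarChart()
# Data: every numeric column, header row included so each series gets a name
data = Reference(sheet, min_col=min_column + 1, max_col=max_column,
                 min_row=min_row, max_row=max_row)
# Categories: the Gender labels, header row excluded
categories = Reference(sheet, min_col=min_column, max_col=min_column,
                       min_row=min_row + 1, max_row=max_row)
barchart.add_data(data, titles_from_data=True)
barchart.set_categories(categories)
barchart.title = "Sales by Product line"
barchart.style = 5
sheet.add_chart(barchart, "B12")

# add_data creates one series per data column (one per product line here)
print(len(barchart.series))  # 2
# wb.save("chart_demo.xlsx") would write it to disk (hypothetical file name)
```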
Applying Excel formulas through Python
You can write Excel formulas through Python the same way you'd write them in an Excel sheet. For example, let's say we want to sum the data in cells B5 and B6 and show it in cell B7 with the currency style.
sheet['B7'] = '=SUM(B5:B6)'
sheet['B7'].style = 'Currency'
That's pretty simple, right? We can repeat that from column B to G or use a for loop to automate it. But first, we need to get the alphabet to use it as a reference for the names that columns have in Excel (A, B, C, …). To do so, we use the string library and write the following code.
import string
alphabet = list(string.ascii_uppercase)
excel_alphabet = alphabet[0:max_column]
print(excel_alphabet)
If we print this, we'll get a list from A to G.
This happens because we first created an alphabet list from A to Z, and then took a slice [0:max_column] to match the length of this list (7) with the first seven letters of the alphabet (A-G).
Note: Python lists start at index 0, so A is at index 0, B at index 1, C at index 2, and so on. Also, the [a:b] slice notation takes b-a elements (starting at index a and ending at index b-1).
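One caveat with slicing string.ascii_uppercase is that it only covers 26 columns. A pivot table wider than column Z would break the loop. Here's a sketch of a hypothetical helper, column_letter, that converts a 1-based column number to Excel's naming scheme (after Z come AA, AB, and so on). openpyxl also ships openpyxl.utils.get_column_letter for the same job.

```python
import string


def column_letter(n):
    """Convert a 1-based column number to an Excel column name (1 -> 'A', 27 -> 'AA')."""
    letters = ""
    while n > 0:
        # Excel names are "bijective base 26": there is no zero digit,
        # so shift by one before taking the remainder.
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# The alphabet slice from the guide matches the helper for the first 7 columns...
alphabet = list(string.ascii_uppercase)
print(alphabet[0:7])                            # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print([column_letter(n) for n in range(1, 8)])  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
# ...but only the helper keeps going past Z
print(column_letter(27), column_letter(703))    # AA AAA
```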
After this, we can loop through the columns and apply the sum formula, but now with column references. So instead of writing this,
sheet['B7'] = '=SUM(B5:B6)'
sheet['B7'].style = 'Currency'
we now include references and put them inside a for loop.
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# sum in columns B-G
for i in excel_alphabet:
    if i != 'A':
        sheet[f'{i}{max_row+1}'] = f'=SUM({i}{min_row+1}:{i}{max_row})'
        sheet[f'{i}{max_row+1}'].style = 'Currency'
# adding total label
sheet[f'{excel_alphabet[0]}{max_row+1}'] = 'Total'
wb.save('report_2021.xlsx')
After running the code, we get the =SUM formula in the "Total" row for columns B to G.
Breaking down the code:
- for i in excel_alphabet loops through all the active columns, but then we exclude column A with if i!='A' because column A doesn't contain numeric data
- sheet[f'{i}{max_row+1}'] = f'=SUM({i}{min_row+1}:{i}{max_row})' does the same job as writing sheet['B7'] = '=SUM(B5:B6)', but the cell addresses now come from min_row and max_row (for column B in our report, it writes =SUM(B6:B7) into B8), and we do that for columns B to G
- sheet[f'{i}{max_row+1}'].style = 'Currency' gives the currency style to the cells below the maximum row
- We add the 'Total' label to column A below the maximum row with sheet[f'{excel_alphabet[0]}{max_row+1}'] = 'Total'
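An alternative is to loop over column numbers instead of letters and let openpyxl.utils.get_column_letter build each name. Starting at min_column + 1 removes the need for the if i!='A' check. This is a sketch on a made-up block of data in a new Workbook, not the code used in the rest of this guide.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
sheet = wb.active
sheet.title = "Report"

# Made-up block: header in row 5, data in rows 6-7, columns A-C
block = [
    ["Gender", "Food and beverages", "Sports and travel"],
    ["Female", 260, 80],
    ["Male", 40, 100],
]
for row_offset, values in enumerate(block):
    for col, value in enumerate(values, start=1):
        sheet.cell(row=5 + row_offset, column=col, value=value)

min_column, max_column = sheet.min_column, sheet.max_column
min_row, max_row = sheet.min_row, sheet.max_row
total_row = max_row + 1

# Start one column to the right of the labels, so no "if i != 'A'" check is needed
for col in range(min_column + 1, max_column + 1):
    letter = get_column_letter(col)
    sheet[f"{letter}{total_row}"] = f"=SUM({letter}{min_row + 1}:{letter}{max_row})"
    sheet[f"{letter}{total_row}"].style = "Currency"

# Label the totals row in the first (label) column
sheet[f"{get_column_letter(min_column)}{total_row}"] = "Total"

print(sheet["B8"].value)  # =SUM(B6:B7)
print(sheet["C8"].value)  # =SUM(C6:C7)
```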
Formatting the report sheet
To finish the report, we can add a title and a subtitle and also customize their font.
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
sheet['A1'] = 'Sales Report'
sheet['A2'] = '2021'
sheet['A1'].font = Font('Arial', bold=True, size=20)
sheet['A2'].font = Font('Arial', bold=True, size=10)
wb.save('report_2021.xlsx')
You can add other parameters inside Font(). On this website, you can find a list of the available styles.
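To illustrate a few of those extra Font() parameters, this sketch adds a hypothetical note in A3 with italic, underlined, grey text next to the title and subtitle. The note text and color are my own choices. The color is an ARGB hex string.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
sheet = wb.active
sheet.title = "Report"

# Title and subtitle, as in the guide
sheet["A1"] = "Sales Report"
sheet["A2"] = "2021"
sheet["A1"].font = Font("Arial", bold=True, size=20)
sheet["A2"].font = Font("Arial", bold=True, size=10)

# Hypothetical note: italic text, single underline, opaque grey color (ARGB)
sheet["A3"] = "Source: supermarket sales data"
sheet["A3"].font = Font(name="Arial", size=9, italic=True,
                        underline="single", color="FF808080")

print(sheet["A3"].font.italic, sheet["A3"].font.color.rgb)  # True FF808080
```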
The final report should look like the following picture.
Automating the Report with a Python Function
Now that the report is ready, we can put all the code we've written so far inside a function that automates the report. The next time we want to make this report, we only have to pass in the file name and run it.
Notes: For this function to work, the file name should have the structure "sales_month.xlsx". Also, I added a few lines of code that use the name of the month/year of the sales file as a variable, so we can reuse it in the output file name and the subtitle of the report.
The full function might look intimidating, but it's just what we've written so far plus the new variables file_name, month_name, and month_and_extension.
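Putting the previous steps together, a minimal sketch of automate_excel could look like the following. It's a reconstruction, not necessarily the author's exact code. It assumes a bare file name with a single underscore, such as "sales_2021.xlsx". It loads and saves the workbook only once, and it uses openpyxl.utils.get_column_letter instead of the alphabet slice.

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.chart import BarChart, Reference
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter


def automate_excel(file_name):
    """Build report_<month>.xlsx from a file named like 'sales_<month>.xlsx' (sketch)."""
    # 'sales_2021.xlsx' -> month_and_extension='2021.xlsx' -> month_name='2021'
    month_and_extension = file_name.split("_")[1]
    month_name = month_and_extension.split(".")[0]
    report_name = f"report_{month_name}.xlsx"

    # Pivot table with pandas, exported so its header lands on Excel row 5
    excel_file = pd.read_excel(file_name)
    report_table = excel_file.pivot_table(index="Gender", columns="Product line",
                                          values="Total", aggfunc="sum").round(0)
    report_table.to_excel(report_name, sheet_name="Report", startrow=4)

    # Load once, make every change, save once
    wb = load_workbook(report_name)
    sheet = wb["Report"]
    min_column, max_column = sheet.min_column, sheet.max_column
    min_row, max_row = sheet.min_row, sheet.max_row

    # Bar chart: data includes the header row, categories do not
    barchart = BarChart()
    data = Reference(sheet, min_col=min_column + 1, max_col=max_column,
                     min_row=min_row, max_row=max_row)
    categories = Reference(sheet, min_col=min_column, max_col=min_column,
                           min_row=min_row + 1, max_row=max_row)
    barchart.add_data(data, titles_from_data=True)
    barchart.set_categories(categories)
    barchart.title = "Sales by Product line"
    barchart.style = 5
    sheet.add_chart(barchart, "B12")

    # Totals row below the pivot table
    for col in range(min_column + 1, max_column + 1):
        letter = get_column_letter(col)
        sheet[f"{letter}{max_row + 1}"] = f"=SUM({letter}{min_row + 1}:{letter}{max_row})"
        sheet[f"{letter}{max_row + 1}"].style = "Currency"
    sheet[f"{get_column_letter(min_column)}{max_row + 1}"] = "Total"

    # Title and subtitle (the subtitle reuses the month/year from the file name)
    sheet["A1"] = "Sales Report"
    sheet["A2"] = month_name
    sheet["A1"].font = Font("Arial", bold=True, size=20)
    sheet["A2"].font = Font("Arial", bold=True, size=10)

    wb.save(report_name)
    return report_name
```

Splitting on "_" is the fragile part. A path such as "my_reports/sales_2021.xlsx" would pick the wrong piece, so pass a bare file name or adapt the parsing.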
Applying the function to a unmarried Excel file
Let's imagine the original file we downloaded is named "sales_2021.xlsx" instead of "supermarket_sales.xlsx". We can then apply the function to the file by writing the following.
automate_excel('sales_2021.xlsx')
After running this code, you'll see an Excel file named "report_2021.xlsx" in the same folder where your Python script is located.
Applying the function to multiple Excel files
Now let's imagine we only have the monthly Excel files "sales_january.xlsx", "sales_february.xlsx", and "sales_march.xlsx" (you can find those files on my GitHub to test them).
You can either apply the function to each file one by one to get 3 reports
automate_excel('sales_january.xlsx')
automate_excel('sales_february.xlsx')
automate_excel('sales_march.xlsx')
or you could concatenate them first using pd.concat() and then apply the function only once.
# read excel files
excel_file_1 = pd.read_excel('sales_january.xlsx')
excel_file_2 = pd.read_excel('sales_february.xlsx')
excel_file_3 = pd.read_excel('sales_march.xlsx')
# concatenate files
new_file = pd.concat([excel_file_1,
                      excel_file_2,
                      excel_file_3], ignore_index=True)
# export file (index=False avoids writing the row index as an extra column)
new_file.to_excel('sales_2021.xlsx', index=False)
# apply function
automate_excel('sales_2021.xlsx')
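If the number of monthly files keeps growing, listing each one by hand gets tedious. A sketch using the standard glob module could pick up every file that matches a pattern instead. The helper name combine_monthly_files and the pattern are assumptions. Note that sorted() orders the files alphabetically, not by month, and that pd.concat() raises a ValueError if no file matches.

```python
import glob

import pandas as pd


def combine_monthly_files(pattern="sales_*.xlsx", output="sales_2021.xlsx"):
    """Concatenate every file matching `pattern` into `output` (hypothetical helper)."""
    # Skip the output file itself, otherwise a second run would fold
    # the previous combined file back into the new one
    files = sorted(f for f in glob.glob(pattern) if f != output)
    frames = [pd.read_excel(f) for f in files]
    combined = pd.concat(frames, ignore_index=True)
    # index=False keeps pandas from writing its row index as an extra column
    combined.to_excel(output, index=False)
    return files
```

Calling combine_monthly_files() followed by automate_excel('sales_2021.xlsx') gives the same result as the three read_excel() calls above.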
Schedule the Python Script to Run Monthly, Weekly, or Daily
You can schedule the Python script we've written in this guide to run whenever you want on your computer. You just need to use Task Scheduler on Windows or crontab on Mac.
If you don't know how to schedule a task, click on the guide below to learn how to do it.
Source: https://towardsdatascience.com/a-simple-guide-to-automate-your-excel-reporting-with-python-9d35f143ef7