Python Libraries for Excel Report Generation
This guide explains each library used in NetSuite Python report development and how they work together to generate Excel files.
Overview: How Libraries Work Together
┌─────────────────────────────────────────────────────────────────────────────┐
│ EXCEL REPORT GENERATION FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌─────────────┐ ┌──────────────────┐ │
│ │ NumPy │───►│ Pandas │───►│ XlsxWriter │───►│ Excel File │ │
│ │ │ │ │ │ │ │ (.xlsx) │ │
│ └──────────┘ └──────────┘ └─────────────┘ └──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Calculations Data Processing Formatting │
│ & Arrays & Transformation & Styling │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ OpenPyXL │ │ Jupyter │ │
│ │ │ │ Notebook │ │
│ └──────────┘ └──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ Reading Excel Development │
│ Files Environment │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Jupyter Notebook
What is Jupyter?
Jupyter Notebook is an interactive development environment that runs in your browser. It allows you to write and execute Python code in "cells" and see results immediately.
Why Use Jupyter for Report Development?
| Feature | Benefit |
|---|---|
| Cell-based execution | Run code piece by piece, not all at once |
| Immediate feedback | See results instantly after each cell |
| Data visualization | View DataFrames as formatted tables |
| Easy debugging | Test and fix code interactively |
| Documentation | Mix code with markdown explanations |
Jupyter Workflow Example
# Cell 1: Load your data
import pandas as pd
import json
with open('test_data.json', 'r') as f:
data = json.load(f)
print(f"Loaded {len(data)} records")
# Cell 2: Process data (run separately)
df = pd.DataFrame(data)
df.head() # See first 5 rows immediately
# Cell 3: Check calculations
df['total'] = df['amount1'] + df['amount2']
df[['amount1', 'amount2', 'total']] # Verify calculation
# Cell 4: Generate Excel (only run when ready)
df.to_excel('output.xlsx', index=False)
print("Excel file created!")
Key Jupyter Shortcuts
| Shortcut | Action |
|---|---|
Shift+Enter | Run cell and move to next |
Ctrl+Enter | Run cell and stay |
B | Insert cell below |
A | Insert cell above |
DD | Delete cell |
M | Change to Markdown |
Y | Change to Code |
Pandas
What is Pandas?
Pandas is the core library for data manipulation. Think of it as "Excel formulas and pivot tables in Python".
Core Concept: DataFrame
A DataFrame is like an Excel spreadsheet - it has rows and columns.
import pandas as pd
# Creating a DataFrame from NetSuite data (JSON format)
data = [
{"customer": "PT. ABC", "currency": "IDR", "amount": 5000000},
{"customer": "XYZ Corp", "currency": "USD", "amount": 1000},
{"customer": "PT. DEF", "currency": "IDR", "amount": 3000000}
]
df = pd.DataFrame(data)
print(df)
Output:
customer currency amount
0 PT. ABC IDR 5000000
1 XYZ Corp USD 1000
2 PT. DEF IDR 3000000
Common Pandas Operations for Reports
1. Filtering Data (Like Excel Filter)
# Filter rows where currency is IDR
idr_customers = df[df['currency'] == 'IDR']
# Filter with multiple conditions
large_idr = df[(df['currency'] == 'IDR') & (df['amount'] > 4000000)]
2. Adding Calculated Columns (Like Excel Formulas)
# Simple calculation
df['amount_with_tax'] = df['amount'] * 1.11
# Conditional calculation (like IF formula)
import numpy as np
df['category'] = np.where(df['amount'] > 1000000, 'Large', 'Small')
3. Grouping and Aggregation (Like Pivot Tables)
# Sum by currency
by_currency = df.groupby('currency')['amount'].sum()
# Multiple aggregations
summary = df.groupby('currency').agg({
'amount': ['sum', 'mean', 'count']
})
4. Sorting Data
# Sort by amount descending
df_sorted = df.sort_values('amount', ascending=False)
# Sort by multiple columns
df_sorted = df.sort_values(['currency', 'amount'], ascending=[True, False])
5. Handling Missing Data
# Check for missing values
print(df.isnull().sum())
# Fill missing values
df = df.fillna(0) # Replace NaN with 0
df = df.fillna({'amount': 0, 'customer': 'Unknown'})
Pandas to Excel Connection
# Basic export
df.to_excel('report.xlsx', index=False)
# Export with XlsxWriter engine for formatting
with pd.ExcelWriter('report.xlsx', engine='xlsxwriter') as writer:
df.to_excel(writer, sheet_name='Data', index=False)
# Access workbook for formatting
workbook = writer.book
worksheet = writer.sheets['Data']
NumPy
What is NumPy?
NumPy (Numerical Python) provides fast mathematical operations on arrays. Pandas is built on top of NumPy.
Why NumPy for Reports?
| Use Case | NumPy Function |
|---|---|
| Simple IF-ELSE | np.where() |
| Multiple conditions | np.select() |
| Mathematical operations | np.sum(), np.mean() |
| Handling NaN values | np.nan, np.isnan() |
Common NumPy Operations
1. Simple Conditional Logic (IF-ELSE)
import numpy as np
import pandas as pd
df = pd.DataFrame({
'amount': [500000, 2000000, 800000, 5000000]
})
# Like Excel: =IF(A1>1000000, "Large", "Small")
df['size'] = np.where(df['amount'] > 1000000, 'Large', 'Small')
Result:
amount size
0 500000 Small
1 2000000 Large
2 800000 Small
3 5000000 Large
2. Multiple Conditions (Nested IF)
# Like Excel: =IF(A1>5M, "Very Large", IF(A1>1M, "Large", IF(A1>500K, "Medium", "Small")))
conditions = [
df['amount'] > 5000000,
df['amount'] > 1000000,
df['amount'] > 500000
]
choices = ['Very Large', 'Large', 'Medium']
df['category'] = np.select(conditions, choices, default='Small')
3. Currency Conversion
# Convert foreign currency to IDR
conditions = [
df['currency'] == 'USD',
df['currency'] == 'SGD',
df['currency'] == 'EUR'
]
rates = [15500, 11500, 17000]
df['amount_idr'] = np.select(
conditions,
[df['amount'] * rate for rate in rates],
default=df['amount'] # Already IDR
)
4. Handling NaN in Calculations
# Replace NaN with 0 for calculations
df['amount'] = np.nan_to_num(df['amount'], nan=0)
# Check if value is NaN
df['is_missing'] = np.isnan(df['amount'])
XlsxWriter
What is XlsxWriter?
XlsxWriter creates Excel files with full formatting control - fonts, colors, borders, merged cells, conditional formatting, and more.
Why XlsxWriter?
| Feature | Description |
|---|---|
| Cell Formatting | Fonts, colors, borders, alignment |
| Number Formats | Currency, percentages, dates |
| Merged Cells | For headers and titles |
| Conditional Formatting | Highlight based on values |
| Column Widths | Auto or manual sizing |
| Multiple Sheets | Create workbooks with many sheets |
XlsxWriter Workflow
import xlsxwriter
import io
# Create workbook in memory (for Pyodide/download)
buffer = io.BytesIO()
workbook = xlsxwriter.Workbook(buffer)
worksheet = workbook.add_worksheet('Report')
# Define formats
header_format = workbook.add_format({
'bold': True,
'font_size': 12,
'font_color': 'white',
'bg_color': '#2563eb',
'align': 'center',
'valign': 'vcenter',
'border': 1
})
currency_format = workbook.add_format({
'num_format': '#,##0',
'align': 'right',
'border': 1
})
# Write data with formatting
worksheet.write('A1', 'Customer', header_format)
worksheet.write('B1', 'Amount', header_format)
worksheet.write('A2', 'PT. ABC')
worksheet.write('B2', 5000000, currency_format)
# Set column widths
worksheet.set_column('A:A', 25)
worksheet.set_column('B:B', 15)
workbook.close()
Common Format Properties
# Font formatting
font_format = workbook.add_format({
'bold': True,
'italic': True,
'font_size': 14,
'font_name': 'Arial',
'font_color': '#FF0000'
})
# Alignment
align_format = workbook.add_format({
'align': 'center', # horizontal: left, center, right
'valign': 'vcenter', # vertical: top, vcenter, bottom
'text_wrap': True
})
# Borders
border_format = workbook.add_format({
'border': 1, # thin border all sides
'top': 2, # medium top border
'bottom': 1,
'left': 1,
'right': 1
})
# Number formats
number_formats = {
'integer': '#,##0',
'decimal': '#,##0.00',
'currency_idr': '"Rp" #,##0',
'currency_usd': '"$" #,##0.00',
'percentage': '0.00%',
'date': 'dd/mm/yyyy'
}
Merging Cells for Headers
title_format = workbook.add_format({
'bold': True,
'font_size': 16,
'align': 'center',
'valign': 'vcenter'
})
# Merge A1:E1 for title
worksheet.merge_range('A1:E1', 'PT. COMPANY NAME', title_format)
worksheet.merge_range('A2:E2', 'Monthly Sales Report', title_format)
worksheet.merge_range('A3:E3', 'December 2024', title_format)
Conditional Formatting
# Highlight values > 1,000,000 in green
worksheet.conditional_format('B2:B100', {
'type': 'cell',
'criteria': '>',
'value': 1000000,
'format': workbook.add_format({'bg_color': '#C6EFCE'})
})
# Highlight negative values in red
worksheet.conditional_format('B2:B100', {
'type': 'cell',
'criteria': '<',
'value': 0,
'format': workbook.add_format({
'bg_color': '#FFC7CE',
'font_color': '#9C0006'
})
})
OpenPyXL
What is OpenPyXL?
OpenPyXL is used for reading existing Excel files. While XlsxWriter creates new files, OpenPyXL can read and modify existing ones.
When to Use OpenPyXL
| Scenario | Use |
|---|---|
| Create new Excel from scratch | XlsxWriter |
| Read existing Excel file | OpenPyXL |
| Modify existing Excel | OpenPyXL |
| Template-based reports | OpenPyXL |
Reading Excel Files
import pandas as pd
# Read Excel file into DataFrame
df = pd.read_excel('existing_file.xlsx', sheet_name='Sheet1')
# Read specific columns
df = pd.read_excel('file.xlsx', usecols=['A', 'B', 'C'])
# Read specific rows
df = pd.read_excel('file.xlsx', skiprows=5, nrows=100)
Reading with OpenPyXL Directly
from openpyxl import load_workbook
# Load existing workbook
wb = load_workbook('template.xlsx')
ws = wb.active
# Read cell value
value = ws['A1'].value
# Iterate through rows
for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=5):
for cell in row:
print(cell.value)
Complete Example: All Libraries Together
# Cell 1: Imports
import pandas as pd
import numpy as np
import xlsxwriter
import io
import json
# Cell 2: Load Data (simulating NetSuite data)
data = [
{"customer": "PT. ABC", "currency": "IDR", "current": 5000000, "overdue_30": 2000000, "overdue_60": 0},
{"customer": "XYZ Corp", "currency": "USD", "current": 1000, "overdue_30": 500, "overdue_60": 200},
{"customer": "PT. DEF", "currency": "IDR", "current": 3000000, "overdue_30": 0, "overdue_60": 1500000},
]
df = pd.DataFrame(data)
print(f"Loaded {len(df)} records")
df.head()
# Cell 3: Process Data with Pandas & NumPy
# Fill any missing values
df = df.fillna(0)
# Calculate totals
df['total_overdue'] = df['overdue_30'] + df['overdue_60']
df['total'] = df['current'] + df['total_overdue']
# Add category using NumPy
df['risk'] = np.where(df['total_overdue'] > 1000000, 'High',
np.where(df['total_overdue'] > 0, 'Medium', 'Low'))
# Convert USD to IDR
df['total_idr'] = np.where(
df['currency'] == 'USD',
df['total'] * 15500,
df['total']
)
print("Processed data:")
df
# Cell 4: Generate Excel with XlsxWriter
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine='xlsxwriter') as writer:
# Write data starting at row 5
df.to_excel(writer, sheet_name='AR Aging', startrow=5, index=False)
workbook = writer.book
worksheet = writer.sheets['AR Aging']
# Define formats
title_fmt = workbook.add_format({
'bold': True, 'font_size': 16, 'align': 'center'
})
header_fmt = workbook.add_format({
'bold': True, 'bg_color': '#2563eb', 'font_color': 'white',
'align': 'center', 'border': 1
})
currency_fmt = workbook.add_format({
'num_format': '#,##0', 'border': 1
})
high_risk_fmt = workbook.add_format({
'bg_color': '#FFC7CE', 'border': 1
})
# Write title (merged cells)
worksheet.merge_range('A1:H1', 'PT. COMPANY NAME', title_fmt)
worksheet.merge_range('A2:H2', 'AR Aging Report', title_fmt)
worksheet.merge_range('A3:H3', 'As of December 2024', title_fmt)
# Format header row
for col_num, column in enumerate(df.columns):
worksheet.write(5, col_num, column, header_fmt)
# Set column widths
worksheet.set_column('A:A', 20) # Customer
worksheet.set_column('B:B', 10) # Currency
worksheet.set_column('C:H', 15) # Numbers
# Conditional formatting for high risk
worksheet.conditional_format('G7:G100', {
'type': 'text',
'criteria': 'containing',
'value': 'High',
'format': high_risk_fmt
})
# Save to file
with open('ar_aging_report.xlsx', 'wb') as f:
buffer.seek(0)
f.write(buffer.read())
print("Excel report generated: ar_aging_report.xlsx")
Summary: When to Use Each Library
| Library | Primary Purpose | Key Functions |
|---|---|---|
| Jupyter | Development environment | Interactive coding, debugging |
| Pandas | Data manipulation | DataFrame, filtering, grouping, aggregation |
| NumPy | Calculations & logic | np.where(), np.select(), math operations |
| XlsxWriter | Excel creation & formatting | Formats, merged cells, conditional formatting |
| OpenPyXL | Reading Excel files | Load templates, read existing data |
Quick Reference
# Pandas: Data processing
df = pd.DataFrame(data)
df = df.fillna(0)
df_filtered = df[df['amount'] > 1000]
df_grouped = df.groupby('category').sum()
# NumPy: Calculations
df['status'] = np.where(condition, 'Yes', 'No')
df['category'] = np.select(conditions, choices, default='Other')
# XlsxWriter: Formatting
format = workbook.add_format({'bold': True, 'bg_color': '#FFFF00'})
worksheet.write('A1', 'Header', format)
worksheet.merge_range('A1:C1', 'Title', format)
worksheet.set_column('A:A', 20)