Overview
Customer Segmentation
Customer segmentation divides customers into groups based on shared characteristics, such as:
Psychographic
Focuses on psychological attributes and how they affect sales. It resembles demographic segmentation, but replaces age, gender, and marital status with psychological attributes.
Geographic
Divides customers based on location, climate, culture, and other geographic factors.
Behavioral
Categorizes customers based on how they interact with a business. This can include looking at behavioral data to understand what customers do versus what they say.
Technographic
Uses data to understand how comfortable customers are with technology and what types of technology they prefer. This information can be gathered through customer surveys.
Customer status
Groups customers based on their place in the customer lifecycle, such as leads, new customers, loyal customers, at-risk customers, and churned customers.
In financial analytics, this project was used to:
Market products and services: divide a large customer base into smaller groups that are similar in ways relevant to marketing.
Develop marketing strategies: create a specific marketing strategy for each customer segment.
Source
The dataset was obtained from Kaggle.
Cleaning
The data had over a million rows and was not very messy.
A few of the changes I made using SQL:
Locations such as Pune, Bhiwandi, Mumbai, and Delhi sometimes had extra text attached (e.g., “1 Office Close PUNE”). I mapped every location containing one of these city keywords to the city name itself to keep the values consistent.
Transaction Amount and Customer Balance values that were initially “null” were set to “zero”.
Some people did not state their gender, so all null genders were set to “Unknown”.
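The cleaning steps above were done in SQL, and the original queries are not shown. As a rough illustration only, the same three rules could be sketched in pandas as follows. The column name `CustGender`, the `CITY_KEYWORDS` list, and the helper `clean_transactions` are assumptions made for this example, not names confirmed by the project.

```python
import pandas as pd

# Hypothetical keyword list: the four cities the write-up gives as examples.
CITY_KEYWORDS = ["PUNE", "BHIWANDI", "MUMBAI", "DELHI"]


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (column names are assumed for illustration)."""
    out = df.copy()

    # Collapse any location that contains a city keyword to the city name.
    upper = out["CustLocation"].str.upper()
    for city in CITY_KEYWORDS:
        mask = upper.str.contains(city, regex=False, na=False)
        out.loc[mask, "CustLocation"] = city

    # Missing transaction amounts and balances become zero.
    amount_cols = ["TransactionAmount (INR)", "CustAccountBalance"]
    out[amount_cols] = out[amount_cols].fillna(0)

    # Missing genders become "Unknown".
    out["CustGender"] = out["CustGender"].fillna("Unknown")
    return out


sample = pd.DataFrame({
    "CustLocation": ["1 Office Close PUNE", "Mumbai", None],
    "TransactionAmount (INR)": [250.0, None, 40.0],
    "CustAccountBalance": [None, 1200.0, 300.0],
    "CustGender": ["F", None, "M"],
})
cleaned = clean_transactions(sample)
print(cleaned)
```

Note that plain substring matching also folds values such as "NEW DELHI" into "DELHI"; whether that is desirable depends on how fine-grained the location analysis needs to be.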
Findings
EDA (Exploratory Data Analysis) was performed, with the following results:
1048567 unique Transaction IDs
884265 Customer IDs
Unknown Genders - 1101
Female Customers - 281936
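Male Customers - 765530
The three gender counts sum to 1048567, the number of transactions, so they are counted per transaction row rather than per unique customer.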
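Counts like these can be reproduced with a few pandas calls. The sketch below is illustrative only: `summarize` is a hypothetical helper, `CustGender` is an assumed column name, and the toy frame stands in for the real dataset.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Basic EDA counts; column names are assumptions for this example."""
    return {
        "unique_transactions": df["TransactionID"].nunique(),
        "unique_customers": df["CustomerID"].nunique(),
        # value_counts() counts rows, so a customer with several
        # transactions contributes once per transaction.
        "rows_by_gender": df["CustGender"].value_counts().to_dict(),
    }


toy = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4"],
    "CustomerID": ["C1", "C1", "C2", "C3"],
    "CustGender": ["M", "M", "F", "Unknown"],
})
summary = summarize(toy)
print(summary)
```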
I used Python and SQL for the customer segmentation.
I used SQL to aggregate the data before exporting it to Python.
This summarized the data to one row per customer:
SELECT
CustomerID,
SUM(TransactionAmount_INR) AS TotalTransactionAmount,
COUNT(TransactionID) AS TransactionFrequency,
MAX(CustAccountBalance) AS MaxAccountBalance
FROM bank
GROUP BY CustomerID;
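The write-up does not show how the `Segment` column used in the plotting code was derived from this aggregate. One possible approach, offered purely as an assumption, is to cut `TotalTransactionAmount` at its 25th and 75th percentiles. The helper `assign_segments`, the cut points, and the label strings are all hypothetical.

```python
import numpy as np
import pandas as pd


def assign_segments(agg: pd.DataFrame, low_q: float = 0.25,
                    high_q: float = 0.75) -> pd.DataFrame:
    """Label each customer Low/Medium/High Value by total transaction amount.

    The quantile thresholds are an illustrative choice, not the author's rule.
    """
    out = agg.copy()
    total = out["TotalTransactionAmount"]
    low_cut = total.quantile(low_q)
    high_cut = total.quantile(high_q)

    # At or below the low cut is Low Value, above the high cut is High Value,
    # and everything in between is Medium Value.
    out["Segment"] = np.select(
        [total <= low_cut, total > high_cut],
        ["Low Value", "High Value"],
        default="Medium Value",
    )
    return out


toy_agg = pd.DataFrame({
    "CustomerID": [f"C{i}" for i in range(1, 9)],
    "TotalTransactionAmount": [10, 20, 30, 40, 50, 60, 70, 80],
})
segmented = assign_segments(toy_agg)
print(segmented["Segment"].value_counts())
```

With quartile cut points the middle group is the largest by construction, so a different rule (for example fixed INR thresholds) would be needed if segment sizes are meant to reflect the data rather than the cut points.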
# Summary of segments
print(df['Segment'].value_counts())
# Visualization
import matplotlib.pyplot as plt
df['Segment'].value_counts().plot(kind='bar', color=['green', 'blue', 'red'])
plt.title('Customer Segmentation')
plt.ylabel('Number of Customers')
plt.show()
From the chart:
Medium Value Customers form the largest segment, indicating a balanced distribution of moderate transactions in the dataset.
Low Value Customers and High Value Customers are nearly equal in size, showing smaller groups with extreme transaction behaviors.
This segmentation can help the bank:
Focus on Medium Value Customers for upselling opportunities.
Pay attention to High Value Customers for loyalty programs.
Top 5 Regions: Transaction Amount Distribution
grouped_data = data.groupby('CustLocation').agg({
'TransactionAmount (INR)': ['sum', 'mean'],
'TransactionID': 'count',
'CustomerID': 'nunique'
})
# Renaming columns for clarity
grouped_data.columns = ['TotalTransactionAmount', 'AvgTransactionAmount', 'TransactionCount', 'UniqueCustomerCount']
# Sort by TotalTransactionAmount in descending order
grouped_data = grouped_data.sort_values(by='TotalTransactionAmount', ascending=False)
# Get the top 5 regions
top_5_regions = grouped_data.head(5)
# Print the results
print(top_5_regions)
# Creating the bar chart
plt.figure(figsize=(10, 6))
plt.bar(top_5_regions.index, top_5_regions['TotalTransactionAmount'])
plt.xlabel('Region')
plt.ylabel('Total Transaction Amount (INR)')
plt.title('Top 5 Regions by Total Transaction Amount')
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for better readability
plt.tight_layout()
plt.show()
Using SQL,
Seasonal and Trend Analysis
I analyzed transactions by month, quarter and year. I identified spikes during festive seasons or end-of-year periods.
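A month, quarter, and year roll-up of this kind can be sketched in pandas as shown here. The original queries are not included in this write-up, so the helper `totals_by_period` and the toy data are assumptions for illustration.

```python
import pandas as pd


def totals_by_period(df: pd.DataFrame,
                     date_col: str = "TransactionDate",
                     amount_col: str = "TransactionAmount (INR)") -> dict:
    """Sum transaction amounts by month, quarter, and year."""
    dates = pd.to_datetime(df[date_col])
    keyed = pd.DataFrame({
        "Year": dates.dt.year,
        "Quarter": dates.dt.quarter,
        "Month": dates.dt.month,
        "Amount": df[amount_col],
    })
    return {
        "monthly": keyed.groupby(["Year", "Month"])["Amount"].sum(),
        "quarterly": keyed.groupby(["Year", "Quarter"])["Amount"].sum(),
        "yearly": keyed.groupby("Year")["Amount"].sum(),
    }


toy_tx = pd.DataFrame({
    "TransactionDate": ["2016-08-01", "2016-08-15", "2016-09-10", "2016-10-02"],
    "TransactionAmount (INR)": [100, 50, 30, 20],
})
totals = totals_by_period(toy_tx)
print(totals["monthly"])
print(totals["quarterly"])
```

Comparing the monthly totals against the quarterly and yearly ones is a simple way to see whether a spike is confined to a single month or reflects a longer shift.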
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Convert TransactionDate to datetime format
data['TransactionDate'] = pd.to_datetime(data['TransactionDate'])
# Aggregate data by date
daily_transactions = data.groupby('TransactionDate')['TransactionAmount (INR)'].sum().reset_index()
daily_transactions = daily_transactions.set_index('TransactionDate')
# Perform seasonal decomposition with a smaller period
result = seasonal_decompose(daily_transactions['TransactionAmount (INR)'], model='additive', period=7) # Weekly
# Plot the decomposition
result.plot()
plt.show()
From the graph:
Overall Trend
Observation: The trend component shows a gradual decline in transaction amounts over time (especially after mid-September 2016).
Insight: Factors such as reduced customer activity, lower engagement, or external market conditions might be driving the drop in transaction amounts. External events such as holidays or economic changes could also affect transaction amounts.
Seasonality
Insight: Customers exhibit consistent transaction behaviors on specific days of the week. There may be higher transaction amounts on certain days (such as weekends or paydays). This could inform marketing strategies or operational planning.
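One way to check which days drive the weekly pattern is to average the daily totals by day of week. This is a minimal sketch under the assumption that the daily totals are a pandas Series with a DatetimeIndex; `mean_by_weekday` and the toy series are hypothetical.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def mean_by_weekday(daily: pd.Series) -> pd.Series:
    """Average a daily series by day name, ordered Monday to Sunday."""
    by_day = daily.groupby(daily.index.day_name()).mean()
    return by_day.reindex(WEEKDAY_ORDER)


# Toy data: two weeks where weekends carry twice the weekday amount.
idx = pd.date_range("2016-08-01", periods=14, freq="D")
toy_daily = pd.Series(
    [200.0 if d >= 5 else 100.0 for d in idx.dayofweek], index=idx
)
weekday_means = mean_by_weekday(toy_daily)
print(weekday_means)
```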
Residuals
Insight: Unexplained spikes or drops in the residuals may indicate anomalies or outliers. These may reflect unexpected events such as a promotional campaign, system errors, or fraudulent activity.
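Residual anomalies can be flagged with a simple z-score rule. The sketch below is one possible approach, not the method used in this project: `flag_residual_outliers` and the threshold of 3 standard deviations are assumptions, and the input stands in for the `resid` component of a decomposition result.

```python
import pandas as pd


def flag_residual_outliers(resid: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return residuals whose absolute z-score exceeds the threshold."""
    # Moving-average decompositions can leave NaN at the series edges.
    clean = resid.dropna()
    z = (clean - clean.mean()) / clean.std()
    return clean[z.abs() > threshold]


# Toy residuals: small alternating noise with one large spike.
values = [1.0, -1.0] * 15
values[10] = 50.0
toy_idx = pd.date_range("2016-08-01", periods=32, freq="D")
toy_resid = pd.Series([float("nan")] + values + [float("nan")], index=toy_idx)

outliers = flag_residual_outliers(toy_resid)
print(outliers)
```

Because a large spike inflates the standard deviation it is measured against, a median-based rule may be more robust on real data.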
Variability in Seasonality
Observation: The seasonal amplitude (the height of peaks and valleys) seems consistent initially but becomes irregular toward the end.
Insight: The irregularity in seasonal patterns might indicate shifts in customer behavior or interruptions in normal transaction patterns. Investigating this further could reveal changing customer preferences.
Conclusion
This project used customer segmentation and seasonal trend analysis to derive actionable insights from financial transaction data. Grouping customers by transaction behavior and geography revealed key customer groups, such as Medium Value Customers, who present upselling opportunities, and High Value Customers, who are candidates for loyalty programs.
The seasonal trend analysis highlighted consistent transaction patterns influenced by specific days or festive periods, while anomalies in the residuals suggested areas for further investigation, such as system errors or promotional impacts. Applying these findings to marketing strategies, operational planning, and customer retention efforts could provide significant value to the bank.