Customer Segmentation

Overview

Customer segmentation is the process of dividing customers into groups based on common characteristics, such as:

  1. Psychographic

    Focuses on psychological attributes and how they affect sales. This is similar to demographic segmentation, but instead of looking at age, gender, and marital status, it looks at psychological attributes.

  2. Geographic

    Divides customers based on location, climate, culture, and other geographic factors.

  3. Behavioral

    Categorizes customers based on how they interact with a business. This can include looking at behavioral data to understand what customers do versus what they say.

  4. Technographic

    Uses data to understand how comfortable customers are with technology and what types of technology they prefer. This information can be gathered through customer surveys.

  5. Customer status

    Groups customers based on their place in the customer lifecycle, such as leads, new customers, loyal customers, at-risk customers, and churned customers.

For Financial Analytics, this project was used to:

  • Market products and services: Divided a large customer base into smaller groups that are similar in ways relevant to marketing.

  • Develop marketing strategies: Developed specific marketing strategies for each customer segment.

Source

The dataset was obtained from Kaggle.

Kaggle Dataset

Cleaning

The dataset has over a million rows and was not very messy.

A few of the changes I made using SQL:

  1. Locations like Pune, Bhiwandi, Mumbai, and Delhi had extra text attached, e.g. “1 Office Close PUNE”. I standardized all locations containing these keywords to keep the data orderly.

  2. Transaction Amount and Customer Balance values that were initially “null” were set to zero.

  3. Some people didn’t state their gender, so all null genders were set to “Unknown”.
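The cleaning itself was done in SQL and the exact queries are not shown here. As a rough pandas equivalent of the three steps, here is a minimal sketch. It assumes that matching locations are collapsed to the bare city keyword, uses a hypothetical `CustGender` column name, and runs on a tiny made-up sample rather than the real data.

```python
import pandas as pd

# Tiny made-up sample; the real rows come from the Kaggle export.
sample = pd.DataFrame({
    "CustLocation": ["1 Office Close PUNE", "MUMBAI", "Near Station BHIWANDI", "JAIPUR"],
    "CustGender": ["M", None, "F", None],  # hypothetical column name
    "TransactionAmount (INR)": [250.0, None, 1200.0, 75.0],
    "CustAccountBalance": [None, 5000.0, 320.5, 100.0],
})

# Keywords to collapse onto (the cities named in step 1).
CITY_KEYWORDS = ["PUNE", "BHIWANDI", "MUMBAI", "DELHI"]


def normalize_location(location):
    """Return the bare city keyword if the location contains one, else the input unchanged."""
    if not isinstance(location, str):
        return location
    upper = location.upper()
    for city in CITY_KEYWORDS:
        if city in upper:
            return city
    return location


cleaned = sample.copy()

# Step 1: standardize locations that contain a known city keyword.
cleaned["CustLocation"] = cleaned["CustLocation"].apply(normalize_location)

# Step 2: null transaction amounts and balances become zero.
amount_cols = ["TransactionAmount (INR)", "CustAccountBalance"]
cleaned[amount_cols] = cleaned[amount_cols].fillna(0)

# Step 3: null genders become "Unknown".
cleaned["CustGender"] = cleaned["CustGender"].fillna("Unknown")

print(cleaned)
```

Locations without a known keyword (such as the made-up "JAIPUR" row) are left untouched, which keeps the rule conservative.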

Findings

Exploratory data analysis (EDA) was performed, with the following results:

  1. 1048567 unique Transaction IDs

  2. 884265 Customer IDs

  3. Unknown Genders - 1101

  4. Female Customers - 281936

  5. Male Customers - 765530
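Counts like these could be reproduced in pandas along the following lines. This is only a sketch on made-up rows, and `CustGender` is an assumed column name. Note that the gender figures in the list add up to the number of transaction IDs, so they appear to be counted per transaction row rather than per unique customer; the sketch does the same.

```python
import pandas as pd

# Made-up rows standing in for the cleaned transaction table.
transactions = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3", "T4", "T5"],
    "CustomerID": ["C1", "C2", "C1", "C3", "C4"],
    "CustGender": ["M", "F", "M", "Unknown", "M"],  # hypothetical column name
})

unique_transactions = transactions["TransactionID"].nunique()
unique_customers = transactions["CustomerID"].nunique()

# Counted per transaction row, not per unique customer.
gender_counts = transactions["CustGender"].value_counts()

print(f"Unique transaction IDs: {unique_transactions}")
print(f"Unique customer IDs: {unique_customers}")
print(gender_counts)
```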

I used Python and SQL for customer segmentation.

I used SQL for the data aggregation before exporting to Python.

This summarized the data to one row per customer.

-- Aggregate transactions to one row per customer
SELECT
    CustomerID,
    SUM(TransactionAmount_INR) AS TotalTransactionAmount,
    COUNT(TransactionID) AS TransactionFrequency,
    MAX(CustAccountBalance) AS MaxAccountBalance
FROM bank
GROUP BY CustomerID;

import matplotlib.pyplot as plt

# Summary of segments (df holds the aggregated data with a 'Segment' column)
print(df['Segment'].value_counts())

# Visualization: one bar per segment
df['Segment'].value_counts().plot(kind='bar', color=['green', 'blue', 'red'])
plt.title('Customer Segmentation')
plt.ylabel('Number of Customers')
plt.show()
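The Python snippet above assumes `df` already carries a `Segment` column, but the rule used to assign segments is not shown. One possible way to derive it is sketched below. The quantile cutoffs (bottom 25% as Low Value, top 25% as High Value) are an assumption chosen purely for illustration, not necessarily the rule used in this project, and the rows are made up.

```python
import pandas as pd

# Made-up aggregated rows in the shape produced by the SQL query.
df = pd.DataFrame({
    "CustomerID": [f"C{i}" for i in range(1, 9)],
    "TotalTransactionAmount": [100, 250, 400, 800, 1200, 1500, 5000, 20000],
    "TransactionFrequency": [1, 1, 2, 1, 3, 2, 4, 6],
    "MaxAccountBalance": [500, 900, 1500, 3000, 8000, 7000, 40000, 150000],
})

# Assumed cutoffs: bottom 25% = Low, top 25% = High, everything between = Medium.
low_cut = df["TotalTransactionAmount"].quantile(0.25)
high_cut = df["TotalTransactionAmount"].quantile(0.75)


def label_segment(total):
    """Map a customer's total transaction amount to a segment label."""
    if total <= low_cut:
        return "Low Value"
    if total >= high_cut:
        return "High Value"
    return "Medium Value"


df["Segment"] = df["TotalTransactionAmount"].apply(label_segment)
print(df["Segment"].value_counts())
```

Other rules, such as fixed INR thresholds or clustering on several columns, would fit the same `Segment` column without changing the plotting code.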

From the chart:

  • Medium Value Customers form the largest segment, indicating a balanced distribution of moderate transactions in the dataset.

  • Low Value Customers and High Value Customers are nearly equal in size, showing smaller groups with extreme transaction behaviors.

This segmentation can help the bank:

  • Focus on Medium Value Customers for upselling opportunities.

  • Pay attention to High Value Customers for loyalty programs.

Top 5 Regions: Transaction Amount Distribution


grouped_data = data.groupby('CustLocation').agg({
    'TransactionAmount (INR)': ['sum', 'mean'],
    'TransactionID': 'count',
    'CustomerID': 'nunique'
})

# Renaming columns for clarity
grouped_data.columns = ['TotalTransactionAmount', 'AvgTransactionAmount', 'TransactionCount', 'UniqueCustomerCount']

# Sort by TotalTransactionAmount in descending order
grouped_data = grouped_data.sort_values(by='TotalTransactionAmount', ascending=False)

# Get the top 5 regions
top_5_regions = grouped_data.head(5)

# Print the results
print(top_5_regions)

# Creating the bar chart
plt.figure(figsize=(10, 6))
plt.bar(top_5_regions.index, top_5_regions['TotalTransactionAmount'])
plt.xlabel('Region')
plt.ylabel('Total Transaction Amount (INR)')
plt.title('Top 5 Regions by Total Transaction Amount')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability
plt.tight_layout()
plt.show()

Using SQL,

Seasonal and Trend Analysis

I analyzed transactions by month, quarter and year. I identified spikes during festive seasons or end-of-year periods.
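The month, quarter, and year breakdown could be computed in pandas roughly as follows. This is a minimal sketch on made-up rows; the column names match the ones used elsewhere in this write-up, but the dates and amounts are invented.

```python
import pandas as pd

# Made-up rows; the real dates and amounts come from the dataset.
data_sample = pd.DataFrame({
    "TransactionDate": pd.to_datetime(
        ["2016-08-01", "2016-08-15", "2016-09-03", "2016-09-20", "2016-10-05"]
    ),
    "TransactionAmount (INR)": [100.0, 200.0, 50.0, 150.0, 400.0],
})

dates = data_sample["TransactionDate"]
amounts = data_sample["TransactionAmount (INR)"]

# Group the amounts by calendar period derived from the transaction date.
monthly_totals = amounts.groupby(dates.dt.to_period("M")).sum()
quarterly_totals = amounts.groupby(dates.dt.to_period("Q")).sum()
yearly_totals = amounts.groupby(dates.dt.year).sum()

print(monthly_totals)
print(quarterly_totals)
print(yearly_totals)
```

Comparing each period's total against its neighbours is one simple way to spot spikes around festive or end-of-year periods.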

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Convert TransactionDate to datetime format
data['TransactionDate'] = pd.to_datetime(data['TransactionDate'])

# Aggregate data by date
daily_transactions = data.groupby('TransactionDate')['TransactionAmount (INR)'].sum().reset_index()
daily_transactions = daily_transactions.set_index('TransactionDate')

# Perform seasonal decomposition with a 7-day (weekly) period
result = seasonal_decompose(daily_transactions['TransactionAmount (INR)'], model='additive', period=7)

# Plot the decomposition
result.plot()
plt.show()

From the graph:

Overall Trend

  • Observation: The trend component shows a gradual decline in transaction amounts over time (especially after mid-September 2016).

  • Insight: There might be factors like reduced customer activity, lower engagement, or external market conditions influencing the drop in transaction amounts. External events such as holidays or economic changes could also contribute.

Seasonality

  • Insight: Customers exhibit consistent transaction behaviors during specific days of the week. There may be higher transactions on certain days (like weekends or paydays). This could also inform marketing strategies or operational planning.
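To check which days actually carry the higher totals, the daily series can be averaged by weekday. The sketch below uses made-up daily totals for two weeks; in practice the input would be the daily transaction totals built for the decomposition.

```python
import pandas as pd

# Made-up daily totals covering two weeks (2016-08-01 was a Monday).
daily = pd.Series(
    [100, 110, 105, 120, 150, 300, 280, 90, 115, 100, 125, 160, 320, 260],
    index=pd.date_range("2016-08-01", periods=14, freq="D"),
    name="TransactionAmount (INR)",
)

# Average daily total for each weekday name.
weekday_avg = daily.groupby(daily.index.day_name()).mean()
busiest_day = weekday_avg.idxmax()

print(weekday_avg.sort_values(ascending=False))
print(f"Busiest day on average: {busiest_day}")
```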

Residuals

  • Insight: Unexplained spikes or drops in residuals may indicate anomalies or outliers. These may represent unexpected events like a promotional campaign, system errors, or fraudulent activities.
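One simple way to surface such spikes is to flag residuals that sit far from the mean. The sketch below assumes a cutoff of 2 standard deviations, which is an illustrative choice rather than a rule from this project, and uses made-up residuals; in practice the input would be something like `result.resid.dropna()` from the decomposition.

```python
import pandas as pd

# Made-up residuals standing in for the decomposition's residual component.
residuals = pd.Series(
    [5.0, -3.0, 2.0, -4.0, 60.0, 1.0, -2.0, 3.0, -5.0, 4.0],
    index=pd.date_range("2016-09-01", periods=10, freq="D"),
)

# Assumed rule: flag points more than 2 standard deviations from the mean.
THRESHOLD = 2.0
z_scores = (residuals - residuals.mean()) / residuals.std()
anomalies = residuals[z_scores.abs() > THRESHOLD]

print(anomalies)
```

Flagged dates are only candidates for review; whether a spike reflects a promotion, a system error, or fraud still has to be checked against other records.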

Variability in Seasonality

  • Observation: The seasonal amplitude (the height of peaks and valleys) seems consistent initially but becomes irregular toward the end.

  • Insight: The irregularity in seasonal patterns might indicate shifts in customer behavior or interruptions in normal transaction patterns. Investigating this further could reveal changing customer preferences.

Conclusion

This project used customer segmentation and seasonal trend analysis to derive actionable insights from financial transaction data. By categorizing customers based on characteristics like psychographics, geography, and behavior, the segmentation revealed key customer groups, such as Medium Value Customers, who present upselling opportunities, and High Value Customers, who are good candidates for loyalty programs.

The seasonal trend analysis highlighted consistent transaction patterns influenced by specific days or festive periods, while anomalies in residuals suggested potential areas for further investigation, such as system errors or promotional impacts. Applying these insights to marketing strategies, operational planning, and customer retention efforts could provide significant value to the bank.