DATA-DRIVEN SALES FORECASTING AND CUSTOMER SEGMENTATION FOR RETAIL OPTIMIZATION USING MACHINE LEARNING TECHNIQUES
Keywords:
Sales Forecasting, Customer Segmentation, Machine Learning, Retail Analytics, Data-Driven Decision Making
Abstract
This study presents a data-driven framework for retail optimization that integrates sales forecasting and customer segmentation using machine learning techniques. The increasing availability of retail data has created opportunities to improve decision-making through predictive analytics and customer behaviour analysis. A retail transaction dataset obtained from Kaggle was used to analyse sales patterns and customer purchasing behaviour. Data preprocessing, including missing-value handling, outlier treatment, and feature engineering, was applied to ensure data quality and improve model performance. For sales forecasting, Linear Regression, Random Forest, and Gradient Boosting models were implemented and compared; Gradient Boosting performed best on evaluation metrics including mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Customer segmentation was performed using K-Means clustering on Recency, Frequency, and Monetary (RFM) features, which identified distinct customer groups such as high-value customers, frequent buyers, and low-engagement customers. Cluster quality was validated using the Silhouette Score and the Davies-Bouldin Index, both of which indicated well-defined clusters. Integrating forecasting with segmentation gives a combined view of demand patterns and customer behaviour, supporting improved inventory management and targeted marketing strategies. Despite certain limitations, including reliance on a single dataset, the study demonstrates that machine learning-based approaches can enhance retail analytics and offer practical solutions for improving operational efficiency and strategic decision-making.
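As an illustration of two quantities named in the abstract, the following minimal sketch computes the forecast error metrics (MAE, RMSE, MAPE) and per-customer RFM features using only the Python standard library. It is not the study's implementation. The function names `forecast_errors` and `rfm_features`, the `(customer_id, purchase_date, amount)` record layout, and the choice to count each record as one purchase are assumptions made here for illustration; the Kaggle dataset's actual columns are not specified in this section.

```python
import math
from datetime import date


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined where the actual value is zero; those points are
    # skipped here (an assumption, other conventions exist).
    ratios = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


def rfm_features(transactions, snapshot):
    """Aggregate (customer_id, purchase_date, amount) records into RFM features.

    Recency is the number of days between the customer's last purchase and
    `snapshot`; frequency counts records (assumed: one record per purchase);
    monetary is the total amount spent.
    """
    totals = {}
    for customer_id, purchase_date, amount in transactions:
        row = totals.setdefault(
            customer_id, {"last": purchase_date, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], purchase_date)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        customer_id: {
            "recency": (snapshot - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer_id, row in totals.items()
    }


# Hypothetical toy data, not taken from the study's dataset.
sample_errors = forecast_errors([100, 200, 400], [110, 190, 380])
sample_rfm = rfm_features(
    [
        ("A", date(2024, 1, 5), 50.0),
        ("A", date(2024, 1, 20), 30.0),
        ("B", date(2024, 1, 10), 20.0),
    ],
    snapshot=date(2024, 2, 1),
)
```

The resulting RFM dictionary is the kind of feature table that would typically be scaled and passed to a clustering algorithm such as K-Means, since recency, frequency, and monetary values sit on very different ranges.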