In my recent analysis, I explored the critical challenge faced by the pharmaceutical industry: ensuring the continuous availability of products. Stockouts of essential medications pose significant risks to public health and can lead to severe consequences, particularly for life-saving drugs. Given the unique characteristics of pharmaceutical inventories—where each Stock Keeping Unit (SKU) generally has limited inventory at the retail level, frequent replenishments, and considerable variability in pricing and criticality—accurate predictions of potential stockouts become essential.

To effectively anticipate and mitigate stockouts, I applied advanced Machine Learning (ML) techniques. I specifically chose classification methods due to the binary nature of stockout predictions—distinguishing SKUs likely to experience stockouts from those with adequate inventory.

To predict stockouts accurately, it is critical to analyze numerous variables reflecting demand patterns, inventory status, supply chain efficiency, financial implications, and external market dynamics. Because these factors span historical and forecasted demand, inventory levels, supply lead times, financial risks, SKU criticality, and external influences such as seasonality and market events, the dataset can become high-dimensional.

Initially, I used Principal Component Analysis (PCA) to reduce the feature space and to see which variables weigh most heavily on stockout risk. PCA reduces dimensionality by transforming the original correlated features into a smaller set of uncorrelated variables called principal components. Mathematically, PCA finds new orthogonal axes (the principal components) that capture as much variance as possible and projects the data onto them:

Z=XV

where:

  • Z represents the transformed data matrix,
  • X denotes the original standardized feature matrix,
  • V is the matrix whose columns are the eigenvectors of the covariance matrix of X, ordered by decreasing eigenvalue.
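To make the projection Z = XV concrete, here is a minimal NumPy sketch. It is an illustration under my own assumptions rather than the exact pipeline from the analysis: the function name `pca_transform`, the three feature columns, and the six toy SKUs are all hypothetical.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its top principal components (Z = X V)."""
    # Standardize each feature: zero mean, unit (sample) variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]
    Z = X_std @ V
    # Share of total variance captured by each retained component.
    explained = eigvals[order] / eigvals.sum()
    return Z, V, explained

# Hypothetical toy data: 6 SKUs x 3 features
# (units on hand, forecasted weekly demand, supplier lead time in days).
X = np.array([
    [12.0, 30.0, 5.0],
    [40.0, 25.0, 3.0],
    [8.0, 45.0, 7.0],
    [55.0, 20.0, 2.0],
    [20.0, 35.0, 6.0],
    [33.0, 28.0, 4.0],
])

Z, V, explained = pca_transform(X, n_components=2)
print("Explained variance ratio:", explained)
print("Loadings (rows = original features):")
print(V)
```

The columns of V (the loadings) show how strongly each original feature contributes to a component, which is one common way to judge which variables matter most.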

The PCA also pointed to inventory levels and demand forecasts as the features most strongly associated with the risk of a stockout, as distinct from the stockout event itself. This finding guided my subsequent predictive modeling.

I evaluated several classification techniques, including K-Nearest Neighbors (KNN), Logistic Regression, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA), each offering unique advantages:

  • K-Nearest Neighbors (KNN) compares new SKUs with historically similar data points based on proximity in feature space using Euclidean distance:
\[ d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]

Although intuitive, the KNN model achieved a moderate cross-validation accuracy of 77%.
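The KNN idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the model behind the 77% figure: the helper names, the two standardized features, the labels, and the choice of k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query SKU by majority vote among its k nearest neighbors."""
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical standardized features: (inventory level, forecasted demand).
# Label 1 = stockout occurred, 0 = no stockout.
train_X = [(-1.2, 1.0), (-0.8, 0.7), (-1.0, 1.3),
           (1.1, -0.9), (0.9, -0.4), (1.3, -1.1)]
train_y = [1, 1, 1, 0, 0, 0]

# A new SKU with low inventory and high forecasted demand.
prediction = knn_predict(train_X, train_y, (-0.9, 0.8), k=3)
print("Predicted stockout label:", prediction)
```

Because the distance sums squared differences across features, the features should be on a comparable scale (for example, standardized) before KNN is applied.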

  • Logistic Regression estimates the probability P(y=1|X) of a stockout by passing a linear combination of the features through a sigmoid function. It offered better interpretability and slightly higher accuracy (80%) than KNN.
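  • Linear Discriminant Analysis (LDA) models each class as a multivariate Gaussian with a covariance matrix shared across classes, which yields linear decision boundaries. Each class k receives the discriminant score below, where \mu_k is the class mean, \Sigma is the shared covariance matrix, and \pi_k is the class prior; an SKU is assigned to the class with the highest score:
\[ \delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k \]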

LDA matched Logistic Regression's performance, providing clear and interpretable decision boundaries.
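As a rough sketch of how the LDA discriminant score is evaluated, the NumPy code below estimates class means, priors, and a pooled covariance matrix, then scores a new SKU for each class. The function names and toy data are hypothetical, and the pooled estimate divides by n minus the number of classes, which is one common convention and an assumption here.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, priors, and the inverse pooled covariance."""
    classes = np.unique(y).tolist()
    n, p = X.shape
    means, priors = {}, {}
    scatter = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        priors[k] = len(Xk) / n
        centered = Xk - means[k]
        scatter += centered.T @ centered
    # Shared (pooled) covariance across all classes.
    sigma = scatter / (n - len(classes))
    return classes, means, priors, np.linalg.inv(sigma)

def lda_scores(x, classes, means, priors, sigma_inv):
    """Linear discriminant score delta_k(x) for every class k."""
    return {
        k: x @ sigma_inv @ means[k]
           - 0.5 * means[k] @ sigma_inv @ means[k]
           + np.log(priors[k])
        for k in classes
    }

# Hypothetical standardized features: (inventory level, forecasted demand).
X = np.array([[-1.2, 1.0], [-0.8, 0.7], [-1.0, 1.3],
              [1.1, -0.9], [0.9, -0.4], [1.3, -1.1]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = stockout, 0 = no stockout

params = lda_fit(X, y)
scores = lda_scores(np.array([-0.9, 0.8]), *params)
predicted = max(scores, key=scores.get)
print("Scores per class:", scores)
print("Predicted class:", predicted)
```

The score is linear in x, which is why the boundary between any two classes is a straight line (or a hyperplane in higher dimensions).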

  • Quadratic Discriminant Analysis (QDA) relaxes LDA's equal covariance assumption, allowing quadratic decision boundaries. Despite theoretical flexibility, QDA achieved similar accuracy to KNN (77%) but added complexity.
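For comparison, the standard textbook form of the QDA discriminant is shown here. It is not taken from the analysis itself, and the class-specific covariance \Sigma_k is the only new symbol:

```latex
\[ \delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) + \log\pi_k \]
```

Because each class keeps its own \Sigma_k, the quadratic term in x no longer cancels between classes, so the boundaries become quadratic. The number of covariance parameters to estimate also grows with the number of classes, which is where the added complexity comes from.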

I also evaluated model performance using Receiver Operating Characteristic (ROC) curves, which summarize a model's discriminatory power across all classification thresholds rather than at a single cutoff. Interestingly, KNN showed strong differentiation ability, achieving an area under the curve (AUC) of 0.92. This highlights its potential despite its less distinct decision boundaries and its lower cross-validation accuracy (77%) relative to Logistic Regression and LDA.
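A model can reach a high AUC while showing only moderate accuracy because AUC measures how well the scores rank positives above negatives, whereas accuracy depends on one specific threshold. The plain-Python sketch below illustrates this with made-up labels and scores; the function names and the numbers are hypothetical and are not the results reported above.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute AUC.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy_at(y_true, scores, threshold):
    """Accuracy when predicting a stockout for scores >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

# Hypothetical predicted stockout scores for 8 SKUs (1 = stockout occurred).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]

auc = roc_auc(y_true, scores)
acc_default = accuracy_at(y_true, scores, threshold=0.5)
acc_tuned = accuracy_at(y_true, scores, threshold=0.33)
print("AUC:", auc)
print("Accuracy at 0.5:", acc_default)
print("Accuracy at 0.33:", acc_tuned)
```

In this toy case the ranking is perfect, yet accuracy at a 0.5 cutoff is poor because every score falls below it. Lowering the threshold recovers the accuracy, which suggests that threshold choice deserves attention when a model's AUC and accuracy disagree.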

In conclusion, while Logistic Regression and LDA demonstrated robust and consistent predictive capabilities, KNN's strong ROC performance suggests practical utility under specific conditions. From my analysis, pharmaceutical executives can confidently implement these ML techniques to enhance inventory management, ensuring critical medicines remain available when needed most, ultimately safeguarding both public health and corporate reputation.