Overview of core ML models used across projects, why each is chosen, and the types of insights you can extract from them.

XGBoost

Why: Fast, regularized gradient boosting with strong performance on tabular data and good handling of missing values.

Insights: Feature importance, SHAP values for per-prediction explanations, partial dependence to show marginal feature effects.
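
A minimal sketch of this workflow, assuming scikit-learn-style arrays and the `xgboost` and `shap` packages; the dataset and hyperparameters are illustrative placeholders, not project defaults.

```python
import shap
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regularized boosting: reg_lambda applies L2 regularization to leaf weights;
# missing values are routed down a learned default branch automatically.
model = xgb.XGBRegressor(n_estimators=200, max_depth=4, reg_lambda=1.0)
model.fit(X_train, y_train)

# Global importance scores (importance_type is configurable on the constructor).
print(model.feature_importances_)

# Per-prediction SHAP explanations: one attribution per feature per row.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```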

LightGBM

Why: Highly optimized gradient-boosting implementation that trains quickly on large datasets and supports categorical features efficiently.

Insights: Rapid hyperparameter-tuning experiments, feature importance, and sub-sample analyses for large-scale pipelines.
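
A minimal sketch, assuming a pandas DataFrame with hypothetical columns; with `categorical_feature` left at its `'auto'` default, LightGBM consumes pandas `category` columns natively.

```python
import lightgbm as lgb
import numpy as np
import pandas as pd

# Hypothetical frame: one numeric and one categorical feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "num_feat": rng.normal(size=1000),
    "cat_feat": pd.Categorical(rng.choice(["a", "b", "c"], size=1000)),
})
y = (df["num_feat"] + (df["cat_feat"] == "a") > 0.5).astype(int)

# The 'category' dtype is detected automatically; no one-hot encoding needed.
model = lgb.LGBMClassifier(n_estimators=100)
model.fit(df, y)

# Split-count importance by default; construct with importance_type="gain"
# for gain-based scores instead.
print(model.feature_importances_)
```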

Random Forest

Why: Robust ensemble baseline that reduces variance; useful where interpretability and stability matter.

Insights: Out-of-bag error estimates, permutation feature importance, and per-tree diagnostics to understand variance.
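
A minimal sketch using scikit-learn on a synthetic placeholder dataset: `oob_score=True` gives a built-in generalization estimate with no holdout, and permutation importance on held-out data sidesteps the bias of impurity-based importances.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Out-of-bag samples (rows a tree never saw) act as a free validation set.
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X_train, y_train)
print("OOB accuracy:", rf.oob_score_)

# Permutation importance: drop in score when each feature is shuffled.
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)
```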

Neural Networks

Why: Flexible models for learning complex, non-linear relationships; useful for time series, embeddings, or combining many inputs.

Insights: Layer activations and embeddings, saliency maps, and sequence-level attention weights for interpretability.
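
One way to surface activations and a simple gradient saliency map, sketched in PyTorch (an assumption; the document names no framework) with a hypothetical stand-in architecture.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer network; the hook pattern works on any nn.Module.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Capture the hidden-layer activations on every forward pass.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(4, 16, requires_grad=True)
logits = model(x)

# Simple saliency: gradient of the top class score with respect to the input.
logits.max(dim=1).values.sum().backward()
saliency = x.grad.abs()

print(activations["hidden_relu"].shape)  # torch.Size([4, 32])
print(saliency.shape)                    # torch.Size([4, 16])
```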

Ensembles

Why: Combines the strengths of multiple algorithms, improving robustness and accuracy while reducing model-specific biases.

Insights: Consensus/confidence across models, model-level contribution analysis, and ensemble calibration diagnostics.
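
A minimal sketch of consensus, disagreement, and calibration diagnostics using scikit-learn; the three base models and the synthetic data are illustrative choices, not a prescribed ensemble.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    GradientBoostingClassifier(random_state=0),
]
# Positive-class probability from each model, stacked model-by-row.
probs = np.stack(
    [m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models]
)

consensus = probs.mean(axis=0)     # soft-vote ensemble prediction
disagreement = probs.std(axis=0)   # high std flags rows the models dispute

# Calibration diagnostic: observed vs. predicted positive rate per bin.
frac_pos, mean_pred = calibration_curve(y_test, consensus, n_bins=10)
print(consensus[:5], disagreement[:5])
```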
