Poker Hand Prediction

AIM
The aim of this project is to develop a machine learning model using a Multi-Layer Perceptron (MLP) classifier to accurately classify different types of poker hands based on the suit and rank of five cards.
DATASET LINK
https://www.kaggle.com/datasets/dysphoria/poker-hand-classification
NOTEBOOK LINK
https://www.kaggle.com/code/supratikbhowal/poker-hand-prediction-model
LIBRARIES NEEDED
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
DESCRIPTION
This project involves building a classification model to predict poker hands using a Multi-Layer Perceptron (MLP) classifier. The dataset consists of features representing the suit and rank of five cards, with the target variable being the type of poker hand (e.g., one pair, two pair, royal flush). The model is trained on a standardized dataset, with class weights computed to address class imbalance. Performance is evaluated using metrics such as accuracy, classification report, confusion matrix, prediction error, ROC curve, and AUC, providing a comprehensive analysis of the model's effectiveness in classifying poker hands.
What is the requirement of the project?
- To accurately predict the Poker Hand type.
Why is it necessary?
- The project demonstrates how machine learning can solve structured data problems, bridging the gap between theoretical knowledge and practical implementation.
How is it beneficial and used?
- The project automates the classification of poker hands, enabling players to quickly and accurately identify the type of hand they have, such as a straight, flush, or royal flush, without manual effort.
- By understanding the probabilities and patterns of certain hands appearing, players can make informed decisions on whether to bet, raise, or fold, improving their gameplay strategy.
How did you start approaching this project?
- Analyzed the poker hand classification problem, reviewed the dataset structure (suits, ranks, and hand types), and identified the multi-class nature of the target variable. Studied the class imbalance issue and planned data preprocessing steps, including scaling and class weight computation.
- Chose the Multi-Layer Perceptron (MLP) Classifier for its capability to handle complex patterns in data. Defined model hyperparameters, trained the model using standardized features, and evaluated its performance using metrics like accuracy, ROC-AUC, and the confusion matrix.
Mention any additional resources used:
- Kaggle kernels and documentation for additional dataset understanding.
- Tutorials on machine learning classification techniques, particularly the MLP classifier.
EXPLANATION
DETAILS OF THE DIFFERENT FEATURES
Feature Name | Description | Type | Values/Range |
---|---|---|---|
S1 | Suit of card #1 | Ordinal | (1-4) representing {Hearts, Spades, Diamonds, Clubs} |
C1 | Rank of card #1 | Numerical | (1-13) representing (Ace, 2, 3, …, Queen, King) |
S2 | Suit of card #2 | Ordinal | (1-4) representing {Hearts, Spades, Diamonds, Clubs} |
C2 | Rank of card #2 | Numerical | (1-13) representing (Ace, 2, 3, …, Queen, King) |
S3 | Suit of card #3 | Ordinal | (1-4) representing {Hearts, Spades, Diamonds, Clubs} |
C3 | Rank of card #3 | Numerical | (1-13) representing (Ace, 2, 3, …, Queen, King) |
S4 | Suit of card #4 | Ordinal | (1-4) representing {Hearts, Spades, Diamonds, Clubs} |
C4 | Rank of card #4 | Numerical | (1-13) representing (Ace, 2, 3, …, Queen, King) |
S5 | Suit of card #5 | Ordinal | (1-4) representing {Hearts, Spades, Diamonds, Clubs} |
C5 | Rank of card #5 | Numerical | (1-13) representing (Ace, 2, 3, …, Queen, King) |
Poker Hand | Type of poker hand held | Ordinal | (0-9) representing the ten hand types listed below |
Poker Hands
0: Nothing in hand, not a recognized poker hand
1: One pair, one pair of equal ranks within five cards
2: Two pairs, two pairs of equal ranks within five cards
3: Three of a kind, three equal ranks within five cards
4: Straight, five cards, sequentially ranked with no gaps
5: Flush, five cards with the same suit
6: Full house, pair + different rank three of a kind
7: Four of a kind, four equal ranks within five cards
8: Straight flush, straight + flush
9: Royal flush, {Ace, King, Queen, Jack, Ten} + flush
PROJECT WORKFLOW
1. Dataset Loading and Inspection
The dataset is loaded into the environment, and its structure, features, and target classes are analyzed. Column names are assigned to enhance clarity and understanding.
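A minimal loading sketch, assuming UCI-style file names and the column order from the feature table above (the actual file names in the Kaggle download may differ):

```python
import pandas as pd

# Column order follows the feature table above; file names are assumptions.
columns = ["S1", "C1", "S2", "C2", "S3", "C3", "S4", "C4", "S5", "C5", "hand"]
train_df = pd.read_csv("poker-hand-training-true.data", header=None, names=columns)
test_df = pd.read_csv("poker-hand-testing.data", header=None, names=columns)

# Quick structural inspection: shape, dtypes, and class distribution.
print(train_df.shape)
print(train_df.dtypes)
print(train_df["hand"].value_counts())
```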
2. Data Preprocessing
Target variables are mapped to descriptive hand labels, features and target variables are separated, and data is standardized using 'StandardScaler' for uniform scaling.
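A sketch of this step, reusing the DataFrames from the loading sketch; the descriptive label names are taken from the evaluation table later in this document:

```python
from sklearn.preprocessing import StandardScaler

# Map the numeric target codes (0-9) to descriptive hand labels.
hand_labels = {
    0: "zilch", 1: "one_pair", 2: "two_pair", 3: "three_of_a_kind",
    4: "straight", 5: "flush", 6: "full_house", 7: "four_of_a_kind",
    8: "straight_flush", 9: "royal_flush",
}
train_df["hand"] = train_df["hand"].map(hand_labels)
test_df["hand"] = test_df["hand"].map(hand_labels)

# Separate features from the target.
X_train, y_train = train_df.drop(columns="hand"), train_df["hand"]
X_test, y_test = test_df.drop(columns="hand"), test_df["hand"]

# Standardize features so suits and ranks share a comparable scale.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```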
3. Handling Class Imbalance
Class weights are computed using 'compute_class_weight' to address the imbalance in target classes, ensuring fair training for all hand types.
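A sketch of the weight computation. Note that scikit-learn's MLPClassifier has no class_weight parameter, so the computed weights are best viewed as input to resampling decisions and error analysis rather than something passed directly into the model:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weights = dict(zip(classes, weights))
print(class_weights)  # rare hands such as royal_flush receive the largest weights
```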
4. Model Selection and Training
The MLPClassifier is selected for its capability to handle multi-class problems, configured with suitable hyperparameters, and trained on the preprocessed training data.
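A training sketch; the architecture, iteration budget, and random seed below are illustrative assumptions rather than the exact configuration used in the notebook:

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(128, 64),  # assumed architecture
    activation="relu",
    solver="adam",
    alpha=1e-4,                    # L2 regularization strength
    max_iter=300,
    random_state=42,
)
mlp.fit(X_train_scaled, y_train)
```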
5. Model Evaluation
The model's performance is assessed using metrics such as accuracy, classification reports, and confusion matrices. Predictions are generated for the test dataset.
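A minimal evaluation sketch with scikit-learn's standard metrics, assuming the fitted model and scaled test set from the previous steps:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = mlp.predict(X_test_scaled)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
cm = confusion_matrix(y_test, y_pred, labels=mlp.classes_)
```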
6. Visualization and Error Analysis
Class prediction errors are visualized using bar charts, and ROC curves with AUC values are generated for each hand type to evaluate classification effectiveness.
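One common way to obtain per-class ROC curves is a one-vs-rest binarization of the labels; this sketch assumes the fitted mlp and scaled test set from the earlier steps:

```python
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

classes = mlp.classes_
y_test_bin = label_binarize(y_test, classes=classes)
y_score = mlp.predict_proba(X_test_scaled)

plt.figure(figsize=(8, 6))
for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    plt.plot(fpr, tpr, label=f"{cls} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--")  # chance-level reference
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("One-vs-rest ROC curves per poker hand")
plt.legend(fontsize="small")
plt.show()
```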
7. Insights and Interpretation
Strengths and weaknesses of the model are identified through error analysis, and the findings are presented using visual tools like heatmaps for better understanding.
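A confusion-matrix heatmap is one such visual tool; this sketch reuses the cm computed in the evaluation step:

```python
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(9, 7))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=mlp.classes_, yticklabels=mlp.classes_)
plt.xlabel("Predicted hand")
plt.ylabel("True hand")
plt.title("Confusion matrix heatmap")
plt.show()
```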
8. Conclusion
The project outcomes are summarized, practical applications are discussed, and suggestions for further research or improvements are proposed.
PROJECT TRADE-OFFS AND SOLUTIONS
- Trade-off: Accuracy vs. Complexity
- Solution: The MLP's hidden layers and numerous parameters add complexity; while this improves accuracy, it demands more computational resources and training time than simpler models such as decision trees or logistic regression. The extra cost was accepted in exchange for the accuracy gain on this multi-class problem.
- Trade-off: Bias vs. Variance
- Solution: Hyperparameter tuning (e.g., learning rate, number of hidden layers) aims to balance bias and variance. A perfect balance is itself a trade-off: overly complex models may increase variance (overfitting), while overly simple models may increase bias (underfitting).
- Trade-off: Generalization vs. Overfitting
- Solution: The model's flexibility (e.g., hidden layers, activation functions) risks overfitting, especially on small or imbalanced datasets. Regularization techniques such as adjusting 'alpha' help mitigate this but may compromise some accuracy; a tuning sketch follows below.
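A sketch of how 'alpha' and the architecture could be tuned jointly; the grid values are illustrative assumptions, and a full search over the entire dataset would be computationally expensive:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    "hidden_layer_sizes": [(64,), (128, 64)],  # assumed candidates
    "alpha": [1e-4, 1e-3, 1e-2],               # L2 regularization strengths
}
search = GridSearchCV(
    MLPClassifier(max_iter=200, random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train_scaled, y_train)
print(search.best_params_, search.best_score_)
```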
SCREENSHOTS
Project workflow
```mermaid
graph LR
A[Start] --> B{Is data clean?};
B -->|Yes| C[Explore Data];
C --> D[Data Preprocessing];
D --> E[Define Models];
E --> F[Train the Model];
F --> G[Evaluate Performance];
G --> H[Visualize Evaluation Metrics];
H --> I[Model Testing];
I --> J[Conclusion and Observations];
B ---->|No| K[Clean Data];
K --> C;
```
MODEL EVALUATION METRICS
Poker Hand | precision | recall | f1-score | support |
---|---|---|---|---|
flush | 0.52 | 0.07 | 0.12 | 1996 |
four_of_a_kind | 0.00 | 0.00 | 0.00 | 230 |
full_house | 0.77 | 0.34 | 0.47 | 1424 |
one_pair | 1.00 | 1.00 | 1.00 | 422498 |
royal_flush | 0.00 | 0.00 | 0.00 | 3 |
straight | 0.96 | 0.44 | 0.60 | 3885 |
straight_flush | 0.00 | 0.00 | 0.00 | 12 |
three_of_a_kind | 0.95 | 0.00 | 0.00 | 21121 |
two_pair | 1.00 | 1.00 | 1.00 | 47622 |
zilch | 0.99 | 1.00 | 1.00 | 501209 |
Model | Accuracy |
---|---|
Random Forest Regression | 0.6190 |
Gradient Boosting Regression | 0.3204 |
MLP Classifier | 0.9924 |
StackingClassifier | 0.9382 |
---
CONCLUSION
KEY LEARNINGS
- Learned how different machine learning models perform on real-world data and gained insights into their strengths and weaknesses.
- Realized how important feature engineering and data preprocessing are for enhancing model performance.
Insights gained from the data:
- Class imbalance can bias the model, requiring class weighting for fair performance.
- Suits and ranks of cards are crucial features, necessitating preprocessing for better model input.
Improvements in understanding machine learning concepts:
- Learned how to effectively implement and optimize machine learning models using libraries like scikit-learn.
- Learned how to effectively use visualizations (e.g., ROC curves, confusion matrices, and heatmaps) to interpret and communicate model performance.
Challenges faced and how they were overcome:
- Some features had different scales, which could affect model performance. This was overcome by applying standardization to the features, ensuring they were on a similar scale for better convergence during training.
- Some rare hands, such as "royal_flush" and "straight_flush," had very low prediction accuracy. This was mitigated by analyzing misclassification patterns and considering potential improvements, such as generating synthetic samples (sketched below) or trying other models.
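One option for the synthetic-sample idea mentioned above is SMOTE from the imbalanced-learn package. That package is not among the project's listed dependencies, so this is only an assumption-laden sketch:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

# k_neighbors must be smaller than the sample count of the rarest class
# (royal flushes are extremely scarce), hence the small value here.
smote = SMOTE(k_neighbors=2, random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train_scaled, y_train)
print(pd.Series(y_resampled).value_counts())
```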
USE CASES OF THIS MODEL
Poker Strategy Development and Simulation
-> Developers or researchers studying poker strategies can use this model to simulate various hand combinations and evaluate strategic decisions. The model's classification can help assess the strength of different hands and optimize strategies.
Real-time Poker Hand Evaluation for Mobile Apps
-> Mobile apps that allow users to practice or play poker could incorporate this model to provide real-time hand evaluation, helping users understand the strength of their hands during gameplay.