Bangladesh Premier League Analysis

AIM
To analyze player performances in the Bangladesh Premier League by extracting insights from batsmen, bowlers, and match data, ranging from toss outcomes to overall match results.
DATASET LINK
https://www.kaggle.com/abdunnoor11/bpl-data
KAGGLE NOTEBOOK
https://www.kaggle.com/code/avdhesh15/bpl-analysis
TECH STACK
Category | Technologies |
---|---|
Languages | Python |
Libraries/Frameworks | Matplotlib, Pandas, Seaborn, NumPy |
Tools | Git, Jupyter, VS Code |
DESCRIPTION
What is the requirement of the project?
- This project aims to analyze player performance data from the Bangladesh Premier League (BPL) to classify players into categories such as best, good, average, and poor based on their performance.
- The analysis provides valuable insights for players and coaches, highlighting who needs more training and who requires less, which can aid in strategic planning for future matches.
How is it beneficial and used?
- For Players: Provides feedback on their performance, helping them to improve specific aspects of their game.
- For Coaches: Helps in identifying areas where players need improvement, which can be focused on during training sessions.
- For Team Management: Assists in strategic decision-making regarding player selection and match planning.
- For Fans and Analysts: Offers insights into player performances and trends over the league, enhancing the understanding and enjoyment of the game.
How did you start approaching this project? (Initial thoughts and planning)
- Perform initial data exploration to understand the structure and contents of the dataset.
- Learn about the topic by searching for related content, such as what a league is, the Bangladesh Premier League, and its players.
- Learn about the features in detail by searching on Google or Quora.
Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- Articles on cricket analytics from websites such as ESPNcricinfo and Cricbuzz.
- https://www.linkedin.com/pulse/premier-league-202223-data-analysis-part-i-ayomide-aremu-cole-iwn4e/
- https://analyisport.com/insights/how-is-data-used-in-the-premier-league/
PROJECT EXPLANATION
DATASET OVERVIEW & FEATURE DETAILS
bpl.csv
- There are 19 features in the BPL dataset.
Feature Name | Description | Datatype |
---|---|---|
id | Unique ID of each match | int64 |
season | Season of the match (20XX-XX) | object |
match_no | Match number or stage (e.g., 1st, 3rd, Final) | object |
date | Date of the match | object |
team_1 | Name of the first team | object |
team_1_score | Scoreboard of team 1 (runs/wickets) | object |
team_2 | Name of the second team | object |
team_2_score | Scoreboard of team 2 (runs/wickets) | object |
player_of_match | Player of the match | object |
toss_winner | Team that won the toss | object |
toss_decision | Decision of the toss winner: 'field first' or 'bat first' | object |
winner | Name of the team that won the match | object |
venue | Venue of the match | object |
city | City of the match | object |
win_by_wickets | Number of wickets the winning team won by | int64 |
win_by_runs | Number of runs the winning team won by | int64 |
result | Summary of win_by_wickets and win_by_runs | object |
umpire_1 | Name of the first umpire | object |
umpire_2 | Name of the second umpire | object |
Developed Features from bpl.csv
Feature Name | Description | Reason | Datatype |
---|---|---|---|
team_1_run | Runs scored by team 1 | To convert the categorical team_1_score feature into a numerical feature | int64 |
team_1_wicket | Wickets lost by team 1 | To convert the categorical team_1_score feature into a numerical feature | int64 |
team_2_run | Runs scored by team 2 | To convert the categorical team_2_score feature into a numerical feature | int64 |
team_2_wicket | Wickets lost by team 2 | To convert the categorical team_2_score feature into a numerical feature | int64 |
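The conversion described in the table can be sketched in a few lines. This is a minimal illustration, assuming scores are stored as `runs/wickets` strings such as `157/6`. The helper name `split_score` and the handling of a bare run total (treated here as all out) are assumptions made for the example, not taken from the notebook.

```python
import re

# Assumed scoreboard format: "runs/wickets", e.g. "157/6".
# A bare "157" (no wickets part) is treated as all out (10 wickets);
# that convention is an assumption of this sketch, not documented by the dataset.
SCORE_PATTERN = re.compile(r"^\s*(\d+)(?:\s*/\s*(\d+))?\s*$")


def split_score(score):
    """Return (runs, wickets) parsed from a scoreboard string."""
    match = SCORE_PATTERN.match(str(score))
    if match is None:
        raise ValueError(f"Unrecognized score format: {score!r}")
    runs = int(match.group(1))
    wickets = int(match.group(2)) if match.group(2) is not None else 10
    return runs, wickets


team_1_run, team_1_wicket = split_score("157/6")
print(team_1_run, team_1_wicket)  # 157 6
```

With pandas, the same helper could be applied column-wise, for example via `df["team_1_score"].apply(split_score)`, with the resulting tuples unpacked into `team_1_run` and `team_1_wicket`.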
batsman.csv
- There are 12 features in the Batsman dataset.
Feature Name | Description | Datatype |
---|---|---|
id | Unique ID of each match | int64 |
season | Season of the match (20XX-XX) | object |
match_no | Match number or stage (e.g., 1st, 3rd, Final) | object |
date | Date of the match | object |
player_name | Player name | object |
comment | How the batsman got out | object |
R | Runs scored by the batsman | int64 |
B | Balls faced by the batsman | int64 |
M | Duration of the innings in minutes | int64 |
fours | No. of fours | int64 |
sixs | No. of sixes | int64 |
SR | Strike rate: (R/B)*100 | float64 |
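The strike-rate formula in the table, (R/B)*100, can be checked with a small helper. The function name, the rounding to two decimals, and returning `None` when no balls were faced are choices made for this sketch rather than behavior confirmed by the dataset.

```python
def strike_rate(runs, balls):
    """Strike rate as defined in the feature table: (R / B) * 100."""
    if balls == 0:
        # No balls faced: the ratio is undefined, so this sketch returns None.
        return None
    return round(runs / balls * 100, 2)


print(strike_rate(45, 30))  # 150.0
```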
bowler.csv
- There are 12 features in the Bowler dataset.
Feature Name | Description | Datatype |
---|---|---|
id | Unique ID of each match | int64 |
season | Season of the match (20XX-XX) | object |
match_no | Match number or stage (e.g., 1st, 3rd, Final) | object |
date | Date of the match | object |
player_name | Player name | object |
O | No. of overs bowled | float64 |
M | No. of maiden overs bowled | int64 |
R | No. of runs conceded | int64 |
W | No. of wickets taken | int64 |
ECON | Economy rate: average number of runs conceded per over bowled | float64 |
WD | No. of wides | int64 |
NB | No. of no-balls | int64 |
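Economy rate is runs conceded per over bowled. One detail worth illustrating: scorecards conventionally write overs as `overs.balls`, so `3.4` means 3 overs and 4 balls rather than 3.4 overs. The sketch below assumes the `O` column follows that convention, which has not been verified against the dataset. The helper names are hypothetical.

```python
def overs_to_balls(overs):
    """Convert cricket overs notation (e.g. 3.4 = 3 overs, 4 balls) to balls."""
    whole_overs = int(overs)
    extra_balls = round((overs - whole_overs) * 10)
    if extra_balls > 5:
        # A completed over has 6 legal balls, so ".6" or higher is not valid notation.
        raise ValueError(f"Invalid overs value: {overs}")
    return whole_overs * 6 + extra_balls


def economy(runs_conceded, overs):
    """Runs conceded per over, using the true number of balls bowled."""
    balls = overs_to_balls(overs)
    if balls == 0:
        return None  # No legal balls bowled: economy is undefined in this sketch.
    return round(runs_conceded * 6 / balls, 2)


print(economy(28, 4.0))  # 7.0
print(economy(22, 3.4))  # 6.0
```

Dividing runs by the raw value `3.4` would give a different number, which is one possible reason a recomputed figure might not match the stored `ECON` column.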
PROJECT WORKFLOW
Project workflow
graph TD
A[Start] --> B[Load Dataset]
B -->|BPL Data| C[Preprocess BPL Data]
B -->|Batsman Data| D[Preprocess Batsman Data]
B -->|Bowler Data| E[Preprocess Bowler Data]
C --> F{Generate Queries?}
D --> F
E --> F
F -->|Yes| G[Graphical Visualizations]
G --> H[Insights & Interpretation]
H --> I[End]
F -->|No| I[End]
- Read and explore all datasets individually.
- Started with `bpl.csv`, analyzing its structure and features.
- Researched dataset attributes through Google and Bangladesh cricket series data.
- Reviewed relevant Kaggle notebooks to gain insights from existing work.
- Performed basic EDA to understand data distribution.
- Identified and handled missing values.
- Converted categorical features into numerical representations to enrich analysis.
- Examined dataset properties using the `info()` and `describe()` functions.
- Developed a custom function, `plotValueCounts(df, col, size=(10,5))`, to visualize feature distributions.
  - This function, built using Seaborn and Matplotlib, plays a key role in extracting insights.
  - Generates count plots with labeled bar values for better interpretability.
- Refined `bpl.csv` by merging teams with their respective divisions to prevent duplicate counting.
- Addressed inconsistencies:
  - Resolved two tie matches recorded in the dataset.
  - Extracted team runs and wickets from match scoreboards and transformed them into numerical features.
- Implemented automated querying using the `plotValueCounts()` function.
- Performed grouped analysis of winners by seasons.
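The team-merging and grouped-analysis steps might look like the following sketch. The sample rows and the team-to-division mapping are illustrative only and are not taken from `bpl.csv`. Only the column names `season` and `winner` come from the feature table.

```python
import pandas as pd

# Illustrative rows only; these are not real match results.
matches = pd.DataFrame({
    "season": ["2012-13", "2012-13", "2012-13", "2015-16", "2015-16"],
    "winner": ["Dhaka Gladiators", "Dhaka Gladiators", "Chittagong Kings",
               "Comilla Victorians", "Dhaka Dynamites"],
})

# Hypothetical mapping from franchise name to division, so that renamed
# franchises from the same division are counted together.
TEAM_TO_DIVISION = {
    "Dhaka Gladiators": "Dhaka",
    "Dhaka Dynamites": "Dhaka",
    "Chittagong Kings": "Chittagong",
    "Comilla Victorians": "Comilla",
}

matches["winner"] = matches["winner"].replace(TEAM_TO_DIVISION)

# Number of wins per division in each season.
wins_by_season = (
    matches.groupby(["season", "winner"])
    .size()
    .rename("wins")
    .reset_index()
)
print(wins_by_season)
```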
KEY QUERIES
- Key queries on the `bpl.csv` dataset:
  - Won the toss, chose to bat first, and won the match.
  - Won the toss and chose to bat first.
  - Winning rate when choosing `bat first` vs `field first`.
- Key queries on the `batsman.csv` dataset:
  - Batsmen who scored a century (100 runs or more) in a match.
  - Batsmen who faced 50 balls or more in a match.
  - Batsmen who hit more than 10 fours.
- Key queries on the `bowler.csv` dataset:
  - Bowlers who conceded 50 runs or more.
  - Bowlers who took more than 4 wickets.
  - Bowlers who delivered 4 wides or more.
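Queries like these translate naturally into boolean filters in pandas. The sketch below uses made-up rows with placeholder team and player names. The column names follow the feature tables, and reading the "winning rate" query as the toss winner's win rate per decision is an interpretation, not something the notebook confirms.

```python
import pandas as pd

# Made-up rows for illustration; not real BPL data.
matches = pd.DataFrame({
    "toss_winner": ["Team A", "Team B", "Team A", "Team C", "Team B"],
    "toss_decision": ["bat first", "field first", "bat first",
                      "field first", "field first"],
    "winner": ["Team A", "Team A", "Team B", "Team C", "Team B"],
})

chose_bat = matches["toss_decision"] == "bat first"
toss_winner_won = matches["toss_winner"] == matches["winner"]

bat_first = matches[chose_bat]                            # won toss, batted first
bat_first_and_won = matches[chose_bat & toss_winner_won]  # ...and won the match
# Share of matches won by the toss winner, split by toss decision.
win_rate = toss_winner_won.astype(int).groupby(matches["toss_decision"]).mean()

batsmen = pd.DataFrame({
    "player_name": ["Player 1", "Player 2", "Player 3"],
    "R": [104, 62, 18], "B": [58, 51, 12], "fours": [11, 6, 2],
})
centuries = batsmen[batsmen["R"] >= 100]      # century: 100 runs or more
faced_50 = batsmen[batsmen["B"] >= 50]        # faced 50 balls or more
many_fours = batsmen[batsmen["fours"] > 10]   # more than 10 fours

bowlers = pd.DataFrame({
    "player_name": ["Player 4", "Player 5"],
    "R": [52, 19], "W": [1, 5], "WD": [4, 0],
})
expensive = bowlers[bowlers["R"] >= 50]       # conceded 50 runs or more
big_hauls = bowlers[bowlers["W"] > 4]         # more than 4 wickets
wide_heavy = bowlers[bowlers["WD"] >= 4]      # 4 wides or more

print(win_rate)
```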
CODE EXPLANATION
import seaborn as sns
import matplotlib.pyplot as plt

def plotValueCounts(df, col, size=(10, 5)):
    # Order categories from most to least frequent
    val_counts = df[col].value_counts()
    plt.figure(figsize=size)
    # Horizontal count plot of the chosen column
    ax = sns.countplot(y=col, data=df, order=val_counts.index, palette="pastel")
    # Label each bar with its count
    for container in ax.containers:
        ax.bar_label(container, fmt='%d', label_type='edge', padding=3, fontsize=10, color='black')
    plt.title(f"Value Counts of {col}")
    plt.show()
- Displays a count plot of the value counts for any feature of the dataset.
PROJECT TRADE-OFFS AND SOLUTIONS
- Trade-off: The dataset includes multiple sources, but some features required external validation (e.g., team names, score formats).
- Solution: Standardized team names and resolved inconsistencies by merging duplicate team entries based on divisions.
- Trade-off: Performing detailed feature engineering (e.g., extracting runs, wickets, and match insights) can increase computational overhead.
- Solution: Optimized preprocessing by transforming scoreboard data into structured numerical features, reducing redundant calculations.
- Trade-off: Count plots with high-cardinality features could lead to cluttered or unreadable visualizations.
- Solution: Implemented `plotValueCounts()` with bar labels and sorting to enhance clarity while retaining key insights.
- Trade-off: Writing fixed queries could limit adaptability when new data is introduced.
- Solution: Developed a function-based querying approach (e.g., `plotValueCounts(df, col)`) to enable flexible, real-time insights.
SCREENSHOTS
Visualizations and EDA of different features
CONCLUSION
KEY LEARNINGS
Insights gained from the data
- Extracted meaningful player statistics by analyzing batting and bowling performances across multiple seasons.
- Identified trends in match outcomes based on factors like toss decisions, innings strategies, and scoring patterns.
- Enhanced data visualization techniques to present key insights effectively for better decision-making.
USE CASES
Team Strategy & Selection - Helps coaches analyze player performance for better team selection and match strategies.
Player Performance Tracking - Assists in monitoring player trends to improve training and development.