The premier_league package scrapes data from various sites to provide useful information on the top 5 European leagues. This includes advanced game statistics for ML training, rankings with European qualification spots, transfers, and statistical leaders for a given season. It also includes methods to expose this information quickly as an API via Flask or AWS Lambda.
- Match Statistics
- Ranking Table
- Player Leaders
- Transfers
- Flask API Docs
from premier_league import run_server
run_server()
Assuming a valid AWS account is configured:
- s3:PutObject
- s3:GetObject
npm install -g serverless
npm install -g serverless-python-requirements
python -m premier_league.lamdba_functions.deploy --aws-profile ${aws profile} --region ${region}
MatchStatistics is a class for retrieving and analyzing detailed match-level statistics, in the form of ML datasets, from Premier League games and other top European leagues. It provides access to extensive game data, including team performance metrics, player statistics, and match events, for ML training or analysis.
The data is stored in a SQLite database that is automatically initialized in the user's local directory upon first use. The database schema includes:
- League: Stores league information and update status
- Team: Contains team details and league associations
- Game: Records match details and scores
- GameStats: Stores detailed match statistics
from premier_league import MatchStatistics
# Initialize with default database
stats = MatchStatistics()
# The database is automatically initialized on first use
# Default location: user's data directory/premier_league_sqlite/premier_league.db
from premier_league import MatchStatistics
# Initialize match statistics
stats = MatchStatistics()
# Get specific team's match history
arsenal_games = stats.get_team_games("Arsenal")
# Get games for a specific season and match week
season_games = stats.get_games_by_season("2023-2024", match_week=5)
# Get recent games before a specific date
from datetime import datetime
recent_games = stats.get_games_before_date(
    date=datetime(2024, 2, 1),
    limit=10,
    team="Manchester City"
)
Retrieves complete match history for a specific team.
stats = MatchStatistics()
arsenal_games = stats.get_team_games("Arsenal")
Retrieves all games for a specific season and match week.
games = stats.get_games_by_season("2023-2024", match_week=15)
Retrieves games before a specific date with optional team filter.
recent_games = stats.get_games_before_date(
    date=datetime(2024, 2, 1),
    limit=5
)
get_game_stats_before_date(date: datetime, limit: int = 10, team: Optional[str] = None) -> List[dict]
Retrieves detailed game statistics before a specific date.
recent_stats = stats.get_game_stats_before_date(
    date=datetime(2024, 2, 1),
    team="Liverpool"
)
Updates the database with the latest available match data.
stats = MatchStatistics()
stats.update_data_set()
Exports the entire match statistics database to a CSV file.
stats.create_dataset("premier_league_stats.csv")
Each game statistics record includes detailed metrics broken down by position groups (FW, MF, DF):
- Expected Goals (xG)
- Expected Assisted Goals (xAG)
- Shots (Total and On Target)
- Shot Creating Actions
- Goal Creating Actions
- Passes Completed
- Pass Completion Percentage
- Key Passes
- Progressive Passes
- Passes into Final Third
- Passes into Penalty Area
- Tackles Won
- Blocks
- Interceptions
- Clearances
- Errors Leading to Goal
- Possession Rate
- Touches
- Take-ons (Attempted and Successful)
- Carries
- Carrying Distance
- Ball Control Metrics
- Fouls (Committed and Drawn)
- Aerial Duels (Won and Lost)
- Cards (Yellow and Red)
- Goalkeeper Statistics
{
"xG": 2.3,
"shots_total_FW": 8,
"shots_on_target_FW": 4,
"passes_completed_MF": 245,
"tackles_won_DF": 12,
"possession_rate": 58,
# ... many more statistics
}
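Because the position-specific metrics in the example above carry a position suffix (`_FW`, `_MF`, `_DF`), a record can be regrouped by position with plain dictionary handling. The sketch below is illustrative only: `group_by_position` is a hypothetical helper, not part of the package, and it assumes every position-specific key ends in one of those three suffixes.

```python
from collections import defaultdict

POSITION_GROUPS = ("FW", "MF", "DF")


def group_by_position(record):
    """Split a flat stats record into per-position and team-level metrics."""
    grouped = defaultdict(dict)
    for key, value in record.items():
        base, _, suffix = key.rpartition("_")
        if base and suffix in POSITION_GROUPS:
            grouped[suffix][base] = value
        else:
            # Keys without a position suffix are treated as team-level metrics.
            grouped["team"][key] = value
    return dict(grouped)


sample = {
    "xG": 2.3,
    "shots_total_FW": 8,
    "shots_on_target_FW": 4,
    "passes_completed_MF": 245,
    "tackles_won_DF": 12,
    "possession_rate": 58,
}
grouped = group_by_position(sample)
print(grouped["FW"])
print(grouped["team"])
```

Keeping team-level values under a separate `"team"` key is a design choice of this sketch, made so that no metric is silently dropped.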
The database is automatically initialized using init_db(), which:
- Creates a user-specific data directory
- Installs a pre-configured SQLite database
- Seeds initial league data
- Sets up all required tables and relationships
import os
import sqlite3
from importlib.resources import files

import appdirs
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker


def init_db(
    db_filename: str = "premier_league.db",
    db_directory: str = "premier_league_sqlite",
) -> Session:
    """
    Initialize the database and seed initial data
    """
    # Create the user data directory if it is missing
    data_dir = appdirs.user_data_dir(db_directory)
    os.makedirs(data_dir, exist_ok=True)
    # Set up the database if it does not exist yet
    db_path = os.path.join(data_dir, db_filename)
    if not os.path.exists(db_path):
        # Initialize from the SQL dump bundled with the package
        sql_path = files("premier_league").joinpath("data/premier_league.sql")
        conn = sqlite3.connect(db_path)
        conn.executescript(sql_path.read_text())
        conn.close()
    # Create SQLAlchemy session
    engine = create_engine(f"sqlite:///{db_path}")
    SessionLocal = sessionmaker(bind=engine)
    session = SessionLocal()
    # Seed initial league data (seed_initial_data is a helper defined in the package)
    seed_initial_data(session)
    return session
The database is seeded with these leagues by default:
- Premier League
- La Liga
- Serie A
- Bundesliga
- Ligue 1
- EFL Championship
- The database is automatically updated to include the latest available match data
- All statistics are sourced from official match reports
- Position groups (FW, MF, DF) are determined by primary player positions
- The database maintains complete match history since the 2017-2018 season
- Updates are rate-limited to respect data source restrictions
- Error handling includes automatic rollback of failed database operations
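The rollback behaviour mentioned in the last note can be illustrated with the standard library alone. This is a minimal sketch, not the package's code: the `game` table is a simplified stand-in for the real schema, and `insert_games` is a hypothetical helper.

```python
import sqlite3


def insert_games(conn, rows):
    """Insert all rows atomically; undo the whole batch if any row fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO game (id, home_team, away_team) VALUES (?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game (id INTEGER PRIMARY KEY, home_team TEXT, away_team TEXT)"
)

ok = insert_games(conn, [(1, "Arsenal", "Chelsea")])
# The second batch repeats id 1, so neither of its rows is kept.
failed = insert_games(conn, [(2, "Liverpool", "Everton"), (1, "Leeds", "Fulham")])
count = conn.execute("SELECT COUNT(*) FROM game").fetchone()[0]
print(ok, failed, count)
```

Wrapping each batch in a single transaction means a failed update leaves the database exactly as it was before the update started.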
RankingTable
Fetches the ranking table for a given league and season.
from premier_league import RankingTable
# Initialize the ranking table for the current season
ranking = RankingTable()
# Or specify a target season
ranking = RankingTable(target_season="1995-1996")
Retrieves the current Premier League ranking data in list format.
- Returns: List containing the processed ranking data.
- Example:
from premier_league import RankingTable
ranking = RankingTable().get_ranking_list()
Exports the ranking data to a CSV file.
- Parameters:
  - file_name (str): Name of the output file (without extension)
  - header (str, optional): Header to include in the CSV file
- Example:
from premier_league import RankingTable
ranking = RankingTable()
ranking.get_ranking_csv("premier_league_rankings", "Season 2023-24")
Exports the ranking data to a JSON file.
- Parameters:
  - file_name (str): Name of the output file (without extension)
  - header (str, optional): Header to use as the parent key in the JSON structure
- Example:
from premier_league import RankingTable
ranking = RankingTable(league="Serie A")
ranking.get_ranking_json("premier_league_rankings", "PL_Rankings")
Generates a formatted PDF file containing the Premier League ranking table.
- Parameters:
  - file_name (str): Name of the output file (without extension)
- Features:
  - Color-coded rows for European qualification spots
  - Relegation zone highlighting
  - Centered title with season information
- Example:
from premier_league import RankingTable
ranking = RankingTable()
ranking.get_ranking_pdf("premier_league_standings")
The ranking data is structured as a list of lists, where each inner list contains:
- Position
- Team name
- Matches played
- Wins
- Draws
- Losses
- Goals for
- Goals against
- Goal difference
- Points
Example:
[
["Position", "Team", "MP", "W", "D", "L", "GF", "GA", "GD", "Points"],
["1", "Manchester City", "38", "32", "4", "2", "102", "31", "71", "100"],
# ... more entries
]
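Since the first inner list is a header row, the structure above converts naturally into one dictionary per team. The following is a small sketch using only the standard library; `rows_to_dicts` is a hypothetical helper name, and the sample data simply repeats the example above.

```python
def rows_to_dicts(table):
    """Turn a header-first list of lists into a list of dictionaries."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


ranking = [
    ["Position", "Team", "MP", "W", "D", "L", "GF", "GA", "GD", "Points"],
    ["1", "Manchester City", "38", "32", "4", "2", "102", "31", "71", "100"],
]
records = rows_to_dicts(ranking)
# Values are strings in the example above, so convert before doing arithmetic.
points = int(records[0]["Points"])
print(records[0]["Team"], points)
```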
- The PDF generation includes color coding:
  - Green shades for European qualification spots
  - Red for relegation zones
  - Gray for header row
- European qualification rules are handled differently for seasons before and after 2019-20
- The class automatically handles special cases like the 1994-95 season when 4 teams were relegated
PlayerSeasonLeaders is a specialized scraper for retrieving and processing Premier League player statistics, focusing on either goals or assists for a specific season.
from premier_league import PlayerSeasonLeaders
# Initialize for current season's top scorers
scorers = PlayerSeasonLeaders(stat_type='G')
# Initialize for current season's top assisters
assists = PlayerSeasonLeaders(stat_type='A')
# For a specific season's data
scorers_2022 = PlayerSeasonLeaders(stat_type='G', target_season='2022-23')
Returns processed list of top players and their statistics.
- limit: Optional number of players to return (defaults to 100)
# Get top 10 scorers
scorers = PlayerSeasonLeaders(stat_type='G')
top_10 = scorers.get_top_stats_list(limit=10)
Exports statistics to CSV format.
scorers = PlayerSeasonLeaders(stat_type='G')
scorers.get_top_stats_csv("top_scorers", header="2023-24 Season", limit=20)
Exports statistics to JSON format.
scorers = PlayerSeasonLeaders(stat_type='A')
scorers.get_top_stats_json("top_scorers", header="PL_Scorers", limit=20)
Creates formatted PDF of top 20 players.
scorers = PlayerSeasonLeaders(stat_type='A')
scorers.get_top_stats_pdf("premier_league_top_scorers")
List of lists with the following columns:
- Name
- Country
- Club
- Goals
- Goals Breakdown (In Play Goals + Penalties)
Example:
[
["Name", "Country", "Club", "Goals", "In Play Goals+Penalty"],
["Erling Haaland", "Norway", "Manchester City", "36", "30+6"],
# ... more entries
]
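The goals breakdown column packs two numbers into one string. A minimal sketch for splitting it is shown below; it assumes the value always has the form `<in-play goals>+<penalties>` as in the example above, and `parse_goal_breakdown` is a hypothetical helper rather than a package function.

```python
def parse_goal_breakdown(breakdown):
    """Split a string such as "30+6" into (in_play_goals, penalties)."""
    in_play, _, penalties = breakdown.partition("+")
    # Treat a missing penalty part as zero penalties (an assumption of this sketch).
    return int(in_play), int(penalties or 0)


row = ["Erling Haaland", "Norway", "Manchester City", "36", "30+6"]
in_play, penalties = parse_goal_breakdown(row[4])
total_matches = in_play + penalties == int(row[3])
print(in_play, penalties, total_matches)
```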
List of lists with the following columns:
- Name
- Country
- Club
- Assists
Example:
[
["Name", "Country", "Club", "Assists"],
["Kevin De Bruyne", "Belgium", "Manchester City", "16"],
# ... more entries
]
- PDF export includes:
  - Gray header row
  - Gold highlighting for the top scorer/assister
  - Limited to top 20 players
  - A3 page size for better readability
- Data is scraped from worldfootball.net
- Default limit for data retrieval is 100 entries
- All export methods support optional headers and limits (except PDF which is fixed at top 20)
Transfers is a specialized scraper for retrieving and processing Premier League transfer data for teams in a specific season. It provides methods to fetch, display, and export both incoming and outgoing transfers.
from premier_league import Transfers
# Initialize for current season
transfers = Transfers()
# Initialize for specific season and league
transfers_2022 = Transfers(target_season="2022-23", league="La Liga")
# Print transfer table for a specific team
transfers.print_transfer_table("Arsenal")
# Get list of all teams in the specified season for referencing.
all_teams = transfers.get_all_current_teams()
Get incoming transfers for a specific team.
arsenal_ins = transfers.transfer_in_table("Arsenal FC")
Get outgoing transfers for a specific team.
arsenal_outs = transfers.transfer_out_table("Arsenal FC")
Display formatted transfer tables (both in and out) for a team.
transfers.print_transfer_table("Manchester United")
Get list of all teams in the current season.
teams = transfers.get_all_current_teams()
Export transfer data to CSV format.
# Export all transfers
transfers.transfer_csv("Chelsea", "chelsea_transfers")
# Export only incoming transfers
transfers.transfer_csv("Chelsea", "chelsea_incoming", transfer_type="in")
# Export only outgoing transfers
transfers.transfer_csv("Chelsea", "chelsea_outgoing", transfer_type="out")
Export transfer data to JSON format.
# Export all transfers
transfers.transfer_json("Liverpool", "liverpool_transfers")
# Export specific transfer type (in, out)
transfers.transfer_json("Liverpool", "liverpool_ins", transfer_type="in")
Each transfer record contains the following columns:
- Date (format: "DD/MM")
- Name (player name)
- Position
- Club (previous/new club)
Example Data Structure:
{
"arsenal in transfers": [
# Incoming transfers
[
["Date", "Name", "Position", "Club"],
["01/07", "Kai Havertz", "MF", "Chelsea FC"],
# ... more entries
]
],
"arsenal out transfers": [
[
["Date", "Name", "Position", "Club"],
["30/06", "Granit Xhaka", "MF", "Bayer Leverkusen"],
# ... more entries
]
],
# ... more teams
}
- Team names are case-insensitive but must match the official team name
- Raises TeamNotFoundError if the specified team isn't found in the season
- Data is scraped from worldfootball.net
- Transfer dates are in DD/MM format
- The print_transfer_table method uses PrettyTable for formatted console output
- Export methods support three modes:
  - "both": Exports both incoming and outgoing transfers (default)
  - "in": Exports only incoming transfers
  - "out": Exports only outgoing transfers
- Team names are stored in lowercase internally
- The class automatically handles clubs with special characters or extended names
- Transfer windows covered:
  - Summer transfer window
  - Winter transfer window
- Position abbreviations follow standard football notation (MF, FW, DF, GK)
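The notes on case-insensitive team names and the three export modes can be combined into one lookup. The sketch below is an illustration under stated assumptions, not the package's implementation: `select_transfers` is hypothetical, `TeamNotFoundError` is redefined locally as a stand-in, and each key is assumed to map to a header-first list of rows.

```python
class TeamNotFoundError(Exception):
    """Local stand-in for the package's exception of the same name."""


VALID_TYPES = ("both", "in", "out")


def select_transfers(data, team, transfer_type="both"):
    """Return the requested transfer tables for one team."""
    if transfer_type not in VALID_TYPES:
        raise ValueError(f"transfer_type must be one of {VALID_TYPES}")
    team_key = team.lower()  # team names are stored in lowercase
    wanted = ("in", "out") if transfer_type == "both" else (transfer_type,)
    result = {}
    for direction in wanted:
        key = f"{team_key} {direction} transfers"
        if key not in data:
            raise TeamNotFoundError(f"No team named {team!r} in this season")
        result[direction] = data[key]
    return result


header = ["Date", "Name", "Position", "Club"]
data = {
    "arsenal in transfers": [header, ["01/07", "Kai Havertz", "MF", "Chelsea FC"]],
    "arsenal out transfers": [header, ["30/06", "Granit Xhaka", "MF", "Bayer Leverkusen"]],
}
incoming = select_transfers(data, "Arsenal", "in")
print(incoming["in"][1][1])
```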
This API provides access to Premier League player statistics, including goals and assists data. It supports both direct data retrieval and file exports in CSV and JSON formats.
GET /players/goals
Retrieve a list of top goalscorers in JSON format.
- season (optional): Season identifier (e.g., "2023-2024")
- limit (optional): Maximum number of players to return
{
"data": [
{
"name": "Erling Haaland",
"country": "Norway",
"club": "Manchester City",
"goals": "36",
"goals_breakdown": "30+6"
}
]
}
GET /players/assists
Retrieve a list of top assist providers in JSON format.
- season (optional): Season identifier (e.g., "2023-2024")
- limit (optional): Maximum number of players to return
{
"data": [
{
"name": "Kevin De Bruyne",
"country": "Belgium",
"club": "Manchester City",
"assists": "16"
}
]
}
GET /players/goals/csv_file
Download top goalscorers data as a CSV file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
- header (optional): Custom header for the CSV file
- limit (optional): Maximum number of players to return
- league (optional): League name (defaults to "Premier League")
GET /players/assists/csv_file
Download top assist providers data as a CSV file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
- header (optional): Custom header for the CSV file
- limit (optional): Maximum number of players to return
- league (optional): League name (defaults to "Premier League")
GET /players/goals/json_file
Download top goalscorers data as a JSON file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
- header (optional): Custom metadata for the JSON file
- limit (optional): Maximum number of players to return
- league (optional): League name (defaults to "Premier League")
GET /players/assists/json_file
Download top assist providers data as a JSON file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
- header (optional): Custom metadata for the JSON file
- limit (optional): Maximum number of players to return
- league (optional): League name (defaults to "Premier League")
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
season | string | No | Premier League season identifier | "2023-2024" |
limit | integer | No | Maximum number of results to return | 10 |
filename | string | Yes* | Output filename for file exports | "top_scorers" |
header | string | No | Custom header/metadata for exports | "PL Stats" |
league | string | No | Target league (defaults to Premier League) | "Bundesliga" |
* Required only for file export endpoints
The API returns standard HTTP status codes:
Status Code | Description |
---|---|
200 | Success |
400 | Bad Request (invalid parameters) |
500 | Internal Server Error |
Common error responses:
{
"error": "Limit must be a number"
}
{
"error": "Missing filename parameter"
}
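The two error responses above suggest a simple validation step before a file export is produced. The sketch below shows one way such a check could look. It is hypothetical throughout: `validate_export_params`, the order of the checks, and the shape of the success value are assumptions, and only the two error messages come from the responses above.

```python
def validate_export_params(params):
    """Return (status_code, body) for the query parameters of an export request."""
    if not params.get("filename"):
        return 400, {"error": "Missing filename parameter"}
    limit = params.get("limit")
    if limit is not None and not str(limit).isdigit():
        return 400, {"error": "Limit must be a number"}
    parsed_limit = int(limit) if limit is not None else None
    return 200, {"filename": params["filename"], "limit": parsed_limit}


print(validate_export_params({"limit": "10"}))
print(validate_export_params({"filename": "top_scorers", "limit": "ten"}))
print(validate_export_params({"filename": "top_scorers", "limit": "10"}))
```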
GET /players/goals?season=2023-2024&limit=5
GET /players/assists/csv_file?limit=10&filename=top_assists&header=Premier%20League%20Assists
GET /players/goals/json_file?filename=goalscorers&header=Goal%20Statistics
# Get top scorers
curl "http://api.example.com/players/goals?limit=5"
# Download assists CSV
curl -O "http://api.example.com/players/assists/csv_file?filename=assists&limit=10"
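When calling these endpoints from Python, query values that contain spaces (such as a header) need to be percent-encoded. The following sketch only builds the URL and sends nothing over the network; `build_url` is a hypothetical helper, and `http://api.example.com` is the placeholder host from the curl examples above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.example.com"  # placeholder host, replace with your deployment


def build_url(path, **params):
    """Build a request URL, dropping unset parameters and percent-encoding the rest."""
    query = urlencode(
        {key: value for key, value in params.items() if value is not None},
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


url = build_url(
    "/players/assists/csv_file",
    filename="top_assists",
    limit=10,
    header="Premier League Assists",
    season=None,
)
print(url)
```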
This API provides access to Premier League standings and team rankings. It supports both detailed and simplified table views, along with multiple export formats including CSV, JSON, and PDF.
GET /ranking
Retrieve detailed Premier League standings with comprehensive team statistics.
- season (optional): Season identifier (e.g., "2023-2024")
- header (optional): Include additional metadata in response
{
"data": {
"season": "2023-2024",
"standings": [
{
"position": 1,
"team": "Arsenal",
"played": 38,
"won": 25,
"drawn": 8,
"lost": 5,
"goals_for": 88,
"goals_against": 43,
"goal_difference": 45,
"points": 83
}
]
}
}
GET /ranking/table
Retrieve a simplified version of the league standings.
- season (optional): Season identifier (e.g., "2023-2024")
{
"data": [
["Pos", "Team", "P", "W", "D", "L", "GF", "GA", "GD", "Pts"],
[1, "Arsenal", 38, 25, 8, 5, 88, 43, 45, 83]
]
}
GET /ranking/csv_file
Download Premier League standings as a CSV file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
GET /ranking/json_file
Download Premier League standings as a JSON file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
GET /ranking/pdf_file
Download Premier League standings as a formatted PDF file.
- season (optional): Season identifier (e.g., "2023-2024")
- filename (required): Name for the exported file (without extension)
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
season | string | No | Premier League season identifier | "2023-2024" |
filename | string | Yes* | Output filename for file exports | "standings" |
header | string | No | Custom metadata for response | "PL Rankings" |
* Required only for file export endpoints
The API returns standard HTTP status codes:
Status Code | Description |
---|---|
200 | Success |
400 | Bad Request |
500 | Internal Server Error |
Common error response:
{
"error": "Missing filename parameter"
}
GET /ranking
GET /ranking/table?season=2023-2024
# CSV Export
GET /ranking/csv_file?filename=premier_league_standings&season=2023-2024
# JSON Export
GET /ranking/json_file?filename=pl_rankings&season=2023-2024
# PDF Export
GET /ranking/pdf_file?filename=standings_report&season=2023-2024
# Get full standings
curl "http://api.example.com/ranking"
# Download PDF report
curl -O "http://api.example.com/ranking/pdf_file?filename=standings"
# Get simplified table for specific season
curl "http://api.example.com/ranking/table?season=2023-2024"
- position: Current league position
- team: Team name
- played: Games played
- won: Games won
- drawn: Games drawn
- lost: Games lost
- goals_for: Goals scored
- goals_against: Goals conceded
- goal_difference: Goal difference (GF - GA)
- points: Total points
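The field names above line up one-to-one with the columns of the simplified /ranking/table response, so the two forms are easy to convert between. The sketch below assumes the column order shown in that response example; `table_to_standings` is a hypothetical helper, and the consistency check simply applies the GF - GA definition.

```python
FIELDS = (
    "position", "team", "played", "won", "drawn", "lost",
    "goals_for", "goals_against", "goal_difference", "points",
)


def table_to_standings(table):
    """Map the rows of a header-first table onto the named fields."""
    standings = [dict(zip(FIELDS, row)) for row in table[1:]]
    for entry in standings:
        # goal_difference is defined as goals_for - goals_against
        expected = entry["goals_for"] - entry["goals_against"]
        if entry["goal_difference"] != expected:
            raise ValueError(f"Inconsistent goal difference for {entry['team']}")
    return standings


table = [
    ["Pos", "Team", "P", "W", "D", "L", "GF", "GA", "GD", "Pts"],
    [1, "Arsenal", 38, 25, 8, 5, 88, 43, 45, 83],
]
standings = table_to_standings(table)
print(standings[0]["team"], standings[0]["points"])
```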
This API provides access to Premier League transfer data, allowing you to retrieve information about player transfers for specific teams. It supports both incoming and outgoing transfers and offers multiple export formats.
GET /all_teams
Retrieve a list of all teams in the Premier League for a given season.
- season (optional): Season identifier (e.g., "2023-2024")
{
"data": [
"Arsenal",
"Aston Villa",
"Brighton",
"Burnley",
...
]
}
GET /transfers/in
Retrieve all incoming transfers for a specific team.
- season (optional): Season identifier (e.g., "2023-2024")
- team (required): Team name
{
"data": [
{
"date": "01/07",
"name": "Kai Havertz",
"position": "MF",
"previous_club": "Chelsea"
}
]
}
GET /transfers/out
Retrieve all outgoing transfers for a specific team.
- season (optional): Season identifier (e.g., "2023-2024")
- team (required): Team name
{
"data": [
{
"date": "30/06",
"name": "Granit Xhaka",
"position": "MF",
"new_club": "Bayer Leverkusen"
}
]
}
GET /transfers/csv_file
Download transfer data as a CSV file.
- season (optional): Season identifier (e.g., "2023-2024")
- team (required): Team name
- filename (required): Name for the exported file (without extension)
- transfer_type (optional): Type of transfers to include:
  - "in": Only incoming transfers
  - "out": Only outgoing transfers
  - "both": Both incoming and outgoing transfers (default)
- league (optional): League name (defaults to "Premier League")
GET /transfers/json_file
Download transfer data as a JSON file.
- season (optional): Season identifier (e.g., "2023-2024")
- team (required): Team name
- filename (required): Name for the exported file (without extension)
- transfer_type (optional): Type of transfers to include:
  - "in": Only incoming transfers
  - "out": Only outgoing transfers
  - "both": Both incoming and outgoing transfers (default)
- league (optional): League name (defaults to "Premier League")
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
season | string | No | Premier League season identifier | "2023-2024" |
team | string | Yes* | Team name | "Arsenal" |
filename | string | Yes** | Output filename for file exports | "transfers" |
transfer_type | string | No | Type of transfers to include | "both" |
league | string | No | Target league (defaults to Premier League) | "Serie A" |
* Required for all transfer-related endpoints except /all_teams
** Required only for file export endpoints
The API returns standard HTTP status codes:
Status Code | Description |
---|---|
200 | Success |
400 | Bad Request (missing or invalid parameters) |
500 | Internal Server Error |
Common error responses:
{
"error": "Missing team parameter"
}
{
"error": "Missing filename parameter"
}
{
"error": "Invalid type parameter"
}
GET /all_teams
GET /transfers/in?team=Arsenal
GET /transfers/csv_file?team=Manchester%20United&filename=united_transfers&transfer_type=both
# Get all teams
curl "http://api.example.com/all_teams"
# Get incoming transfers
curl "http://api.example.com/transfers/in?team=Chelsea"
# Download transfer data
curl -O "http://api.example.com/transfers/json_file?team=Liverpool&filename=liverpool_transfers"
- date: Transfer date (DD/MM format)
- name: Player name
- position: Player position (e.g., MF, FW, DF, GK)
- previous_club/new_club: Club involved in the transfer
- CSV exports include headers
- JSON exports are properly formatted
- Filenames are sanitized for security
- Support for splitting incoming/outgoing transfers