📊 Update social expenditure in the long run dataset #4079

Open
wants to merge 11 commits into
base: master
from
26 changes: 26 additions & 0 deletions dag/redistribution.yml
Original file line number Diff line number Diff line change
@@ -29,6 +29,32 @@ steps:
data://grapher/oecd/2025-02-25/social_expenditure:
- data://garden/oecd/2025-02-25/social_expenditure

#
# Social expenditure OMM
#
data://garden/social_expenditure/2025-03-07/social_expenditure_omm:
- data://garden/oecd/2025-02-25/social_expenditure
- data://garden/oecd/2025-03-07/social_expenditure_1985
- data://garden/social_expenditure/2025-03-07/lindert
data://grapher/social_expenditure/2025-03-07/social_expenditure_omm:
- data://garden/social_expenditure/2025-03-07/social_expenditure_omm

#
# Social transfers 1880-1930 (Lindert, 1994)
#
data://meadow/social_expenditure/2025-03-07/lindert:
- snapshot://social_expenditure/2025-03-07/lindert.csv
data://garden/social_expenditure/2025-03-07/lindert:
- data://meadow/social_expenditure/2025-03-07/lindert

#
# OECD social expenditure data (1985)
#
data://meadow/oecd/2025-03-07/social_expenditure_1985:
- snapshot://oecd/2025-03-07/social_expenditure_1985.xlsx
data://garden/oecd/2025-03-07/social_expenditure_1985:
- data://meadow/oecd/2025-03-07/social_expenditure_1985

############################################################
# HEALTH
############################################################
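The steps added to `dag/redistribution.yml` above form a small dependency graph, and the order in which they would have to run can be sketched with the standard library. This is only an illustration and not how the ETL itself resolves its DAG; the `DAG` dictionary and `build_order` are hypothetical names, and `data://garden/oecd/2025-02-25/social_expenditure` appears as a leaf only because its own dependencies fall outside this hunk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on, copied from the hunk above.
DAG = {
    "data://garden/social_expenditure/2025-03-07/social_expenditure_omm": {
        "data://garden/oecd/2025-02-25/social_expenditure",
        "data://garden/oecd/2025-03-07/social_expenditure_1985",
        "data://garden/social_expenditure/2025-03-07/lindert",
    },
    "data://grapher/social_expenditure/2025-03-07/social_expenditure_omm": {
        "data://garden/social_expenditure/2025-03-07/social_expenditure_omm",
    },
    "data://meadow/social_expenditure/2025-03-07/lindert": {
        "snapshot://social_expenditure/2025-03-07/lindert.csv",
    },
    "data://garden/social_expenditure/2025-03-07/lindert": {
        "data://meadow/social_expenditure/2025-03-07/lindert",
    },
    "data://meadow/oecd/2025-03-07/social_expenditure_1985": {
        "snapshot://oecd/2025-03-07/social_expenditure_1985.xlsx",
    },
    "data://garden/oecd/2025-03-07/social_expenditure_1985": {
        "data://meadow/oecd/2025-03-07/social_expenditure_1985",
    },
}


def build_order(dag):
    """Return the steps in an order where every dependency comes first."""
    return list(TopologicalSorter(dag).static_order())


order = build_order(DAG)
for step in order:
    print(step)
```

Any valid order runs the snapshot, meadow and garden steps of each source before `social_expenditure_omm`, and the grapher step last.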
Original file line number Diff line number Diff line change
@@ -114,7 +114,7 @@ dataset:
tables:
social_expenditure:
variables:
-share_of_gdp:
+share_gdp:
title: Social expenditure as a share of GDP - <<expenditure_source>> - <<spending_type>> - <<programme_type_category>> programs (<<programme_type>>)
unit: "% of GDP"
short_unit: "%"
@@ -131,7 +131,7 @@ tables:
numDecimalPlaces: 1
tolerance: 5

-share_of_gov_expenditure:
+share_gov_expenditure:
title: Social expenditure as a share of government expenditure - <<expenditure_source>> - <<spending_type>> - <<programme_type_category>> programs (<<programme_type>>)
unit: "% of government expenditure"
short_unit: "%"
4 changes: 2 additions & 2 deletions etl/steps/data/garden/oecd/2025-02-25/social_expenditure.py
Original file line number Diff line number Diff line change
@@ -8,8 +8,8 @@

# Define indicator columns and their new names.
INDICATOR_COLUMNS = {
-"Percentage of GDP": "share_of_gdp",
-"Percentage of general government expenditure": "share_of_gov_expenditure",
+"Percentage of GDP": "share_gdp",
+"Percentage of general government expenditure": "share_gov_expenditure",
"US dollars per person, PPP converted": "usd_per_person_ppp",
}

Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"AUSTRALIA": "Australia",
"BELGIUM": "Belgium",
"CANADA": "Canada",
"DENMARK": "Denmark",
"FRANCE": "France",
"GERMANY": "Germany",
"GREECE": "Greece",
"ITALY": "Italy",
"Ireland": "Ireland",
"JAPAN": "Japan",
"NETHERLANDS": "Netherlands",
"NEW ZEALAND": "New Zealand",
"NORWAY": "Norway",
"SWITZERLAND": "Switzerland",
"UNITED KINGDOM": "United Kingdom",
"UNITED STATES OF AMERICA": "United States",
"AUSTRIA ": "Austria",
"SWEDEN": "Sweden",
"FINLAND": "Finland"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
common:
presentation:
topic_tags:
- Government Spending


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
update_period_days: 0


tables:
social_expenditure_1985:
variables:
share_gdp:
title: Public social expenditure as a share of GDP
unit: "% of GDP"
short_unit: "%"
description_short: Public social expenditure divided by gross domestic product, expressed as a percentage.
processing_level: major
description_processing: |-
We calculated this indicator by subtracting education expenditure from the total social expenditure published by OECD (1985) and dividing the result by GDP.

We do this to ensure that the indicator uses the same definition of social expenditure as the OECD Social Expenditure Database (SOCX).
presentation:
attribution_short: OECD
title_public: Public social expenditure as a share of GDP
title_variant: Data from 1960 to 1981
display:
name: Public social expenditure as a share of GDP
numDecimalPlaces: 1
tolerance: 5

45 changes: 45 additions & 0 deletions etl/steps/data/garden/oecd/2025-03-07/social_expenditure_1985.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
"""Load a meadow dataset and create a garden dataset."""

from etl.data_helpers import geo
from etl.helpers import PathFinder

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)


def run() -> None:
#
# Load inputs.
#
# Load meadow dataset.
ds_meadow = paths.load_dataset("social_expenditure_1985")

# Read table from meadow dataset.
tb = ds_meadow.read("social_expenditure_1985")

#
# Process data.
#
# Harmonize country names.
tb = geo.harmonize_countries(
df=tb,
countries_file=paths.country_mapping_path,
)

# Calculate the share of social expenditure in GDP.
tb["share_gdp"] = (tb["total_social_expenditure_with_education"] - tb["education"]) / tb["gdp"] * 100

# Improve table format.
tb = tb.format(["country", "year"])

# Keep only share_gdp column.
tb = tb[["share_gdp"]]

#
# Save outputs.
#
# Initialize a new garden dataset.
ds_garden = paths.create_dataset(tables=[tb], default_metadata=ds_meadow.metadata)

# Save garden dataset.
ds_garden.save()
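As a worked example of the `share_gdp` calculation in this step, here is a minimal sketch on plain numbers. The function name `social_share_of_gdp` and the sample figures are hypothetical and are not taken from the OECD (1985) data; the sketch only assumes that all three inputs are in the same currency unit.

```python
def social_share_of_gdp(total_with_education, education, gdp):
    """Return social expenditure net of education as a percentage of GDP."""
    if gdp == 0:
        raise ValueError("GDP must be non-zero")
    return (total_with_education - education) / gdp * 100


# Hypothetical figures: total of 300 including 50 of education, GDP of 1000.
share = social_share_of_gdp(total_with_education=300.0, education=50.0, gdp=1000.0)
print(share)  # 25.0
```

Removing education before dividing keeps the result comparable with the SOCX definition of social expenditure mentioned in the metadata for this step.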
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{
"Australia": "Australia",
"Austria": "Austria",
"Belgium": "Belgium",
"Canada": "Canada",
"Denmark": "Denmark",
"Finland": "Finland",
"France": "France",
"Germany": "Germany",
"Ireland": "Ireland",
"Italy": "Italy",
"Japan": "Japan",
"Netherlands": "Netherlands",
"New Zealand": "New Zealand",
"Norway": "Norway",
"Sweden": "Sweden",
"Switzerland": "Switzerland",
"United Kingdom": "United Kingdom",
"United States": "United States"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
common:
presentation:
topic_tags:
- Government Spending


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
update_period_days: 0


tables:
lindert:
variables:
share_gdp:
title: Public social expenditure as a share of GDP
unit: "% of GDP"
short_unit: "%"
description_short: Public social expenditure divided by gross domestic product, expressed as a percentage.
description_from_producer: "Social transfers, 1880–1930, as percentages of national product at current prices: all four kinds of government social spending (welfare–unemployment, pensions, health, and housing)."
processing_level: minor
presentation:
attribution_short: Lindert
title_public: Public social expenditure as a share of GDP
title_variant: Data from 1880 to 1930
display:
name: Public social expenditure as a share of GDP
numDecimalPlaces: 1
tolerance: 5

39 changes: 39 additions & 0 deletions etl/steps/data/garden/social_expenditure/2025-03-07/lindert.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
"""Load a meadow dataset and create a garden dataset."""

from etl.data_helpers import geo
from etl.helpers import PathFinder

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)


def run() -> None:
#
# Load inputs.
#
# Load meadow dataset.
ds_meadow = paths.load_dataset("lindert")

# Read table from meadow dataset.
tb = ds_meadow.read("lindert")

#
# Process data.
#
# Harmonize country names.
tb = geo.harmonize_countries(
df=tb,
countries_file=paths.country_mapping_path,
)

# Improve table format.
tb = tb.format(["country", "year"])

#
# Save outputs.
#
# Initialize a new garden dataset.
ds_garden = paths.create_dataset(tables=[tb], default_metadata=ds_meadow.metadata)

# Save garden dataset.
ds_garden.save()
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
common:
presentation:
topic_tags:
- Government Spending


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
update_period_days: 365
title: Social expenditure in the long run


tables:
social_expenditure_omm:
variables:
share_gdp:
title: Public social expenditure as a share of GDP
unit: "% of GDP"
short_unit: "%"
description_short: Public social expenditure divided by gross domestic product, expressed as a percentage.
description_key:
- "This indicator combines three different datasets: Lindert (2004), OECD (1985), and the OECD Social Expenditure Database (SOCX). We link the two OECD datasets by applying the growth rates implied by the older series to the earliest SOCX observation, which extends the series back to 1960. We then use the data from Lindert (2004) to extend the series back to 1880."
description_from_producer: ""
processing_level: major
description_processing: |-
We extrapolated the data available from the OECD Social Expenditure Database (public, in-cash and in-kind spending, all programs) backwards, starting from the earliest available observation in this dataset and applying the growth rates implied by the OECD (1985) data to obtain a series starting in 1960. These steps are necessary because the two datasets do not report exactly the same values for the years they have in common, owing to changes in definitions and measurement. We nevertheless assume that the trends are the same in both datasets.

We don't transform the data from Lindert (2004); the values are the same as in the original source.
presentation:
attribution_short: Lindert, OECD
title_public: Public social expenditure as a share of GDP
title_variant: Historical data
display:
name: Public social expenditure as a share of GDP
numDecimalPlaces: 1
tolerance: 5
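The backward extrapolation described in `description_processing` can be sketched for a single country as follows. This is a minimal illustration and not the code in this PR: `splice_backwards` and the sample values are hypothetical, and the sketch assumes the older series has an observation in the earliest year of the recent one. Applying the older series' growth rates backwards from that anchor year is algebraically the same as rescaling the older values by the ratio of the two series in the anchor year, which is what the function does.

```python
def splice_backwards(recent, older):
    """Extend `recent` back in time using the growth rates implied by `older`.

    Both arguments map year -> value (here, % of GDP) for one country.
    """
    anchor_year = min(recent)  # earliest observation of the recent series
    if anchor_year not in older:
        raise ValueError("older series has no observation in the anchor year")

    # Chaining growth rates back from the anchor telescopes to this ratio.
    scale = recent[anchor_year] / older[anchor_year]

    spliced = dict(recent)  # recent values are kept unchanged
    for year, value in older.items():
        if year < anchor_year:
            spliced[year] = value * scale
    return dict(sorted(spliced.items()))


# Hypothetical values, chosen only to make the arithmetic easy to follow.
older_series = {1960: 8.0, 1970: 12.0, 1980: 16.0}
recent_series = {1980: 20.0, 1990: 24.0}

combined = splice_backwards(recent_series, older_series)
print(combined)  # {1960: 10.0, 1970: 15.0, 1980: 20.0, 1990: 24.0}
```

The extended values keep the growth pattern of the older series (12/8 = 15/10) while matching the level of the recent series in the anchor year, which reflects the stated assumption that trends are the same in both datasets.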
