Skip to content

datainpoint/final-visualization-with-modern-data-science

Repository files navigation

final-visualization-with-modern-data-science

Final: Visualization with Modern Data Science.

01. Define a function import_time_series_confirmed_deaths() which imports time_series_covid19_confirmed_global.csv and time_series_covid19_deaths_global.csv in working directory.

def import_time_series_confirmed_deaths() -> tuple:
    """
    >>> time_series_confirmed, time_series_deaths = import_time_series_confirmed_deaths()
    >>> type(time_series_confirmed)
    pandas.core.frame.DataFrame
    >>> type(time_series_deaths)
    pandas.core.frame.DataFrame
    >>> time_series_confirmed.shape
    (289, 1147)
    >>> time_series_deaths.shape
    (289, 1147)
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

02. Define a function transform_time_series_confirmed_deaths() which transforms both time_series_covid19_confirmed_global.csv and time_series_covid19_deaths_global.csv from wide format into long format.

       Province/State        Country/Region        Lat        Long     Date  \
0                 NaN           Afghanistan  33.939110   67.709953  1/22/20   
1                 NaN               Albania  41.153300   20.168300  1/22/20   
2                 NaN               Algeria  28.033900    1.659600  1/22/20   
3                 NaN               Andorra  42.506300    1.521800  1/22/20   
4                 NaN                Angola -11.202700   17.873900  1/22/20   
...               ...                   ...        ...         ...      ...   
330322            NaN    West Bank and Gaza  31.952200   35.233200   3/9/23   
330323            NaN  Winter Olympics 2022  39.904200  116.407400   3/9/23   
330324            NaN                 Yemen  15.552727   48.516388   3/9/23   
330325            NaN                Zambia -13.133897   27.849332   3/9/23   
330326            NaN              Zimbabwe -19.015438   29.154857   3/9/23   

        Confirmed  
0               0  
1               0  
2               0  
3               0  
4               0  
...           ...  
330322     703228  
330323        535  
330324      11945  
330325     343135  
330326     264276  

[330327 rows x 6 columns]
       Province/State        Country/Region        Lat        Long     Date  \
0                 NaN           Afghanistan  33.939110   67.709953  1/22/20   
1                 NaN               Albania  41.153300   20.168300  1/22/20   
2                 NaN               Algeria  28.033900    1.659600  1/22/20   
3                 NaN               Andorra  42.506300    1.521800  1/22/20   
4                 NaN                Angola -11.202700   17.873900  1/22/20   
...               ...                   ...        ...         ...      ...   
330322            NaN    West Bank and Gaza  31.952200   35.233200   3/9/23   
330323            NaN  Winter Olympics 2022  39.904200  116.407400   3/9/23   
330324            NaN                 Yemen  15.552727   48.516388   3/9/23   
330325            NaN                Zambia -13.133897   27.849332   3/9/23   
330326            NaN              Zimbabwe -19.015438   29.154857   3/9/23   

        Deaths  
0            0  
1            0  
2            0  
3            0  
4            0  
...        ...  
330322    5708  
330323       0  
330324    2159  
330325    4057  
330326    5671  

[330327 rows x 6 columns]
def transform_time_series_confirmed_deaths() -> tuple:
    """
    >>> time_series_confirmed_long, time_series_deaths_long = transform_time_series_confirmed_deaths()
    >>> type(time_series_confirmed_long)
    pandas.core.frame.DataFrame
    >>> type(time_series_deaths_long)
    pandas.core.frame.DataFrame
    >>> time_series_confirmed_long.shape
    (330327, 6)
    >>> time_series_deaths_long.shape
    (330327, 6)
    """
    ### BEGIN SOLUTION
    time_series_confirmed, time_series_deaths = import_time_series_confirmed_deaths()
    idVars = ["Province/State", "Country/Region" , "Lat", "Long"]
    time_series_confirmed_long = pd.melt(time_series_confirmed, id_vars=idVars, var_name="Date", value_name="Confirmed")
    time_series_deaths_long = pd.melt(time_series_deaths, id_vars=idVars, var_name="Date", value_name="Deaths")
    return time_series_confirmed_long, time_series_deaths_long
    ### END SOLUTION

03. Define a function merge_time_series() which merges the long formatted time_series_covid19_confirmed_global.csv and time_series_covid19_deaths_global.csv into one data frame.

           Date Province/State        Country/Region        Lat        Long  \
0       1/22/20            NaN           Afghanistan  33.939110   67.709953   
1       1/22/20            NaN               Albania  41.153300   20.168300   
2       1/22/20            NaN               Algeria  28.033900    1.659600   
3       1/22/20            NaN               Andorra  42.506300    1.521800   
4       1/22/20            NaN                Angola -11.202700   17.873900   
...         ...            ...                   ...        ...         ...   
330322   3/9/23            NaN    West Bank and Gaza  31.952200   35.233200   
330323   3/9/23            NaN  Winter Olympics 2022  39.904200  116.407400   
330324   3/9/23            NaN                 Yemen  15.552727   48.516388   
330325   3/9/23            NaN                Zambia -13.133897   27.849332   
330326   3/9/23            NaN              Zimbabwe -19.015438   29.154857   

        Confirmed  Deaths  
0               0       0  
1               0       0  
2               0       0  
3               0       0  
4               0       0  
...           ...     ...  
330322     703228    5708  
330323        535       0  
330324      11945    2159  
330325     343135    4057  
330326     264276    5671  

[330327 rows x 7 columns]
def merge_time_series() -> pd.core.frame.DataFrame:
    """
    >>> merged_time_series = merge_time_series()
    >>> type(merged_time_series)
    pandas.core.frame.DataFrame
    >>> merged_time_series.shape
    (330327, 7)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

04. Define a function groupby_date_country() which summarizes Confirmed and Death by Date and Country/Region given merged, long formatted time_series dataframe.

          Date        Country/Region  Confirmed  Deaths
0       1/1/21           Afghanistan      52513    2201
1       1/1/21               Albania      58316    1181
2       1/1/21               Algeria      99897    2762
3       1/1/21               Andorra       8117      84
4       1/1/21                Angola      17568     405
...        ...                   ...        ...     ...
229738  9/9/22    West Bank and Gaza     702591    5706
229739  9/9/22  Winter Olympics 2022        535       0
229740  9/9/22                 Yemen      11932    2155
229741  9/9/22                Zambia     333204    4017
229742  9/9/22              Zimbabwe     256859    5596

[229743 rows x 4 columns]
def groupby_date_country() -> pd.core.frame.DataFrame:
    """
    >>> groupby_time_series = groupby_date_country()
    >>> type(groupby_time_series)
    pandas.core.frame.DataFrame
    >>> groupby_time_series.shape
    (229743, 4)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

05. Define a function format_date_column() which formats Date column with ISO-8601 format yyyy-mm-dd(%Y-%m-%d) given summarized time_series dataframe.

             Date        Country/Region  Confirmed  Deaths
0      2020-01-22           Afghanistan          0       0
1      2020-01-22               Albania          0       0
2      2020-01-22               Algeria          0       0
3      2020-01-22               Andorra          0       0
4      2020-01-22                Angola          0       0
...           ...                   ...        ...     ...
229738 2023-03-09    West Bank and Gaza     703228    5708
229739 2023-03-09  Winter Olympics 2022        535       0
229740 2023-03-09                 Yemen      11945    2159
229741 2023-03-09                Zambia     343135    4057
229742 2023-03-09              Zimbabwe     264276    5671

[229743 rows x 4 columns]
def format_date_column() -> pd.core.frame.DataFrame:
    """
    >>> formatted_time_series = format_date_column()
    >>> type(formatted_time_series)
    pandas.core.frame.DataFrame
    >>> formatted_time_series.shape
    (229743, 4)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

06. Define a function snapshot_last_day() which retrieved the cummulative confirmed and deaths by country as of 2023-03-09 given the formatted time_series dataframe.

             Date        Country/Region  Confirmed  Deaths
229542 2023-03-09           Afghanistan     209451    7896
229543 2023-03-09               Albania     334457    3598
229544 2023-03-09               Algeria     271496    6881
229545 2023-03-09               Andorra      47890     165
229546 2023-03-09                Angola     105288    1933
...           ...                   ...        ...     ...
229738 2023-03-09    West Bank and Gaza     703228    5708
229739 2023-03-09  Winter Olympics 2022        535       0
229740 2023-03-09                 Yemen      11945    2159
229741 2023-03-09                Zambia     343135    4057
229742 2023-03-09              Zimbabwe     264276    5671

[201 rows x 4 columns]
def snapshot_last_day() -> pd.core.frame.DataFrame:
    """
    >>> last_day = snapshot_last_day()
    >>> type(last_day)
    pandas.core.frame.DataFrame
    >>> last_day.shape
    (201, 4)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

07. Define a function summarize_table_shapes() which summarizes the shapes of table in taiwan_election_2024.db in working directory as a dataframe. Given table names as following: ["aboriginal_legislators", "candidates", "districts", "election_types", "parties", "party_legislators", "polling_places", "presidents", "regional_legislators", "villages"]

                    table    rows  columns
0  aboriginal_legislators  338105        7
1              candidates     331        4
2               districts   17795        5
3          election_types       5        2
4                 parties      35        2
5       party_legislators  284720        7
6          polling_places   88975        8
7              presidents   53385        7
8    regional_legislators   77032        8
9                villages    5227        2
def summarize_table_shapes() -> pd.core.frame.DataFrame:
    """
    >>> table_shapes = summarize_table_shapes()
    >>> type(table_shapes)
    pandas.core.frame.DataFrame
    >>> table_shapes.shape
    (10, 3)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

08. Define a function summarize_presidential_votes() which summarizes the votes of president/vice president election.

   party candidates    votes
0  民主進步黨    賴清德/蕭美琴  5586019
1  中國國民黨    侯友宜/趙少康  4671021
2  台灣民眾黨    柯文哲/吳欣盈  3690466
def summarize_presidential_votes() -> pd.core.frame.DataFrame:
    """
    >>> presidential_votes = summarize_presidential_votes()
    >>> type(presidential_votes)
    pandas.core.frame.DataFrame
    >>> presidential_votes.shape
    (3, 3)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

09. Define a function summarize_regional_aborinal_legislator_votes() which summarizes the votes of regional and aboriginal legislator election by party.

   election_type      party    votes
0       區域及原住民立委      民主進步黨  6095276
1       區域及原住民立委      中國國民黨  5510850
2       區域及原住民立委          無  1069758
3       區域及原住民立委      台灣民眾黨   403357
4       區域及原住民立委       時代力量    96589
5       區域及原住民立委  小民參政歐巴桑聯盟    78138
6       區域及原住民立委      社會民主黨    74375
7       區域及原住民立委    臺灣雙語無法黨    55937
8       區域及原住民立委      司法改革黨    38685
9       區域及原住民立委       台灣維新    33347
10      區域及原住民立委       台灣基進    32583
11      區域及原住民立委      人民最大黨    25216
12      區域及原住民立委       台灣綠黨    15557
13      區域及原住民立委      喜樂島聯盟    15433
14      區域及原住民立委    中華統一促進黨    13203
15      區域及原住民立委      制度救世島    11260
16      區域及原住民立委         新黨     9143
17      區域及原住民立委    台灣麻將最大黨     7839
18      區域及原住民立委      復康聯盟黨     7441
19      區域及原住民立委        勞動黨     6453
20      區域及原住民立委      司法正義黨     6331
21      區域及原住民立委      中華聯合黨     4139
22      區域及原住民立委      台灣國民黨     3293
23      區域及原住民立委      台灣革命黨     3072
24      區域及原住民立委    中華文化共和黨     2583
25      區域及原住民立委     家庭基本收入     2361
26      區域及原住民立委  台灣整復師聯盟工黨     1881
27      區域及原住民立委      中華婦女黨     1029
28      區域及原住民立委      人民民主黨      836
29      區域及原住民立委        經濟黨      472
30      區域及原住民立委    中華愛國同心黨      310
31      區域及原住民立委     合一行動聯盟      174
32      區域及原住民立委      興中同盟會      136
def summarize_regional_aborinal_legislator_votes() -> pd.core.frame.DataFrame:
    """
    >>> regional_aborinal_legislator_votes = summarize_regional_aborinal_legislator_votes()
    >>> type(regional_aborinal_legislator_votes)
    pandas.core.frame.DataFrame
    >>> regional_aborinal_legislator_votes.shape
    (33, 3)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

10. Define a function summarize_party_legislator_votes() which summarizes the votes of party legislator election.

   election_type      party    votes
0          不分區立委      民主進步黨  4982062
1          不分區立委      中國國民黨  4764576
2          不分區立委      台灣民眾黨  3040615
3          不分區立委       時代力量   353412
4          不分區立委  小民參政歐巴桑聯盟   128613
5          不分區立委       台灣綠黨   117298
6          不分區立委       台灣基進    95078
7          不分區立委        親民黨    69818
8          不分區立委    臺灣雙語無法黨    44852
9          不分區立委     台灣團結聯盟    43375
10         不分區立委         新黨    40288
11         不分區立委      司法改革黨    37615
12         不分區立委      制度救世島    19665
13         不分區立委    中華統一促進黨    17423
14         不分區立委      人民最大黨    11746
15         不分區立委       台灣維新    10300
def summarize_party_legislator_votes() -> pd.core.frame.DataFrame:
    """
    >>> party_legislator_votes = summarize_party_legislator_votes()
    >>> type(party_legislator_votes)
    pandas.core.frame.DataFrame
    >>> party_legislator_votes.shape
    (16, 3)
    """
    ### BEGIN SOLUTION

    ### END SOLUTION

About

Final: Visualization with Modern Data Science.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages