In [70]:
import numpy as np
import pandas as pd
In [71]:
vaccine_asean = pd.read_csv('/Users/thandarmoe/Library/Mobile Documents/com~apple~CloudDocs/teaching/Cloud/me/Python/Dataset/wuenic2023rev_unicef.csv')
Data Exploration
The immunization dataset was downloaded from https://data.unicef.org/resources/dataset/immunization/ and filtered for only ASEAN countries.
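The notebook does not show how the ASEAN filter was applied. One possible approach, assuming the full UNICEF file also carries the `iso3` column seen in the filtered data, is to keep only rows whose ISO 3166-1 alpha-3 code belongs to an ASEAN member. The names `ASEAN_ISO3`, `toy_full`, and `toy_asean` below are hypothetical, and the toy frame is a made-up stand-in for the full download.

```python
import pandas as pd

# ISO 3166-1 alpha-3 codes of the ten ASEAN countries that appear in this notebook
ASEAN_ISO3 = ["BRN", "KHM", "IDN", "LAO", "MYS", "MMR", "PHL", "SGP", "THA", "VNM"]

# Made-up stand-in for the unfiltered dataset (illustration only)
toy_full = pd.DataFrame({
    "iso3": ["BRN", "JPN", "VNM", "AUS"],
    "vaccine": ["BCG"] * 4,
})

# Keep only rows whose country code is in the ASEAN list
toy_asean = toy_full[toy_full["iso3"].isin(ASEAN_ISO3)].reset_index(drop=True)
print(toy_asean["iso3"].tolist())
```

Filtering on the country code rather than the country name avoids mismatches from spelling variants such as "Viet Nam" and "Vietnam".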
In [74]:
print(vaccine_asean.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 124 entries, 0 to 123
Data columns (total 28 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   unicef_region  124 non-null    object
 1   iso3           124 non-null    object
 2   country        124 non-null    object
 3   vaccine        124 non-null    object
 4   2023           123 non-null    float64
 5   2022           122 non-null    float64
 6   2021           120 non-null    float64
 7   2020           117 non-null    float64
 8   2019           114 non-null    float64
 9   2018           114 non-null    float64
 10  2017           112 non-null    float64
 11  2016           105 non-null    float64
 12  2015           102 non-null    float64
 13  2014           94 non-null     float64
 14  2013           92 non-null     float64
 15  2012           88 non-null     float64
 16  2011           85 non-null     float64
 17  2010           85 non-null     float64
 18  2009           80 non-null     float64
 19  2008           78 non-null     float64
 20  2007           77 non-null     float64
 21  2006           73 non-null     float64
 22  2005           72 non-null     float64
 23  2004           71 non-null     float64
 24  2003           70 non-null     float64
 25  2002           65 non-null     float64
 26  2001           64 non-null     float64
 27  2000           64 non-null     float64
dtypes: float64(24), object(4)
memory usage: 27.3+ KB
None
In [75]:
print(vaccine_asean['country'].unique())
print(vaccine_asean['vaccine'].unique())
['Brunei Darussalam' 'Cambodia' 'Indonesia'
 "Lao People's Democratic Republic" 'Malaysia' 'Myanmar' 'Philippines'
 'Singapore' 'Thailand' 'Viet Nam']
['BCG' 'DTP1' 'DTP3' 'HEPB3' 'HEPBB' 'HIB3' 'IPV1' 'IPV2' 'MCV1' 'MCV2'
 'PCV3' 'POL3' 'RCV1' 'ROTAC']
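The dataset includes vaccination coverage data for 14 vaccines across 10 ASEAN countries, spanning the years 2000 to 2023. It consists of 124 records and 28 variables: four identifier columns and one column per year. Since 10 countries × 14 vaccines would give 140 rows, not every vaccine is reported for every country. Earlier years are also sparser, with 64 non-null values in 2000 compared with 123 in 2023.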
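With 10 countries and 14 vaccines but only 124 rows, some country-vaccine pairs must be absent from the data. A minimal sketch of how those gaps could be listed is shown below. It runs on a made-up toy frame (`toy`, hypothetical values) rather than the real file, and the same calls would apply to `vaccine_asean` assuming its `country` and `vaccine` columns are as printed above.

```python
import pandas as pd

# Toy stand-in for vaccine_asean; the values are invented for illustration only
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "vaccine": ["BCG", "DTP1", "BCG"],
    "2022": [99.0, 98.0, None],
    "2023": [99.0, 97.0, 90.0],
})

# Count rows for every country/vaccine combination (0 means the pair has no row)
presence = pd.crosstab(toy["country"], toy["vaccine"])

# Collect the combinations that never occur
missing_pairs = [
    (country, vaccine)
    for country, row in presence.iterrows()
    for vaccine, count in row.items()
    if count == 0
]
print(missing_pairs)
```

A pair can be missing entirely (no row) or present with `NaN` in some years. The crosstab only detects the first case, while the non-null counts from `info()` reflect the second.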
Data Manipulation
In [78]:
# Step 1: Melt the year columns
id_vars = ['unicef_region', 'iso3', 'country', 'vaccine']
value_vars = [str(year) for year in range(2000, 2024)]
vaccine_asean_long = pd.melt(
    vaccine_asean,
    id_vars=id_vars,
    value_vars=value_vars,
    var_name='year',
    value_name='value'
)
# Step 2: Add a 'year_date' column
vaccine_asean_long['year_date'] = pd.to_datetime(vaccine_asean_long['year'], format='%Y')
# Step 3: Sort by country, vaccine, and year
vaccine_asean_long = vaccine_asean_long.sort_values(by=['country', 'vaccine', 'year'])
print(vaccine_asean_long.head())
    unicef_region iso3            country vaccine  year  value  year_date
0            EAPR  BRN  Brunei Darussalam     BCG  2000   99.0 2000-01-01
124          EAPR  BRN  Brunei Darussalam     BCG  2001   99.0 2001-01-01
248          EAPR  BRN  Brunei Darussalam     BCG  2002   95.0 2002-01-01
372          EAPR  BRN  Brunei Darussalam     BCG  2003   95.0 2003-01-01
496          EAPR  BRN  Brunei Darussalam     BCG  2004   99.0 2004-01-01
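The wide-to-long reshape can be sanity-checked on a small example. The sketch below uses a made-up two-row frame (`wide`, hypothetical values) to illustrate that `pd.melt` produces one output row per input row and year column, and that missing estimates are kept as `NaN` rather than dropped.

```python
import pandas as pd

# Made-up wide table: one row per country/vaccine, one column per year
wide = pd.DataFrame({
    "country": ["A", "B"],
    "vaccine": ["BCG", "BCG"],
    "2000": [90.0, None],
    "2001": [92.0, 85.0],
})
year_cols = ["2000", "2001"]

# Reshape to long format: one row per country/vaccine/year
long = pd.melt(
    wide,
    id_vars=["country", "vaccine"],
    value_vars=year_cols,
    var_name="year",
    value_name="value",
)

# Each input row contributes one output row per year column
expected_rows = len(wide) * len(year_cols)
n_missing = int(long["value"].isna().sum())
print(len(long), expected_rows, n_missing)
```

By the same arithmetic, the 124 rows and 24 year columns of the real data would be expected to give 124 × 24 = 2,976 rows in `vaccine_asean_long`. This is consistent with the index jumping by 124 between consecutive years in the output above.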
Data Visualization
In [80]:
import matplotlib.pyplot as plt
import seaborn as sns
In [81]:
g = sns.relplot(
    data=vaccine_asean_long,
    x='year_date',
    y='value',
    kind='line',
    col='country',  # one subplot per country
    col_wrap=4,     # wrap after 4 plots per row
    hue='vaccine',  # color by vaccine
    height=3,
    aspect=1.5
)
# Remove x and y axis labels for all subplots
for ax in g.axes.flat:
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.tick_params(axis='x', rotation=0)  # keep x-axis tick labels horizontal
# Add a main title for the whole grid
g.fig.suptitle('Vaccination Trends by ASEAN Countries', fontsize=16, y=1.02)
plt.show()
- Brunei, Malaysia, and Thailand show little fluctuation in coverage across the 14 vaccines compared with the other countries.
- Myanmar suffered the largest dip in vaccination coverage in 2021, which coincides with the COVID-19 pandemic and the military coup.
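The observations above are read off the plot. A rough way to back them with numbers, sketched here on made-up data (`toy_long`, hypothetical values) with the same columns as `vaccine_asean_long`, is to compare the standard deviation of coverage per country and to look for the largest year-over-year drop within each country/vaccine series. This is one possible check, not the method used in the notebook.

```python
import pandas as pd

# Made-up long-format data: country A is stable, country B dips sharply in 2021
toy_long = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "vaccine": ["BCG"] * 6,
    "year": ["2019", "2020", "2021"] * 2,
    "value": [99.0, 99.0, 98.0, 90.0, 88.0, 50.0],
})
toy_long = toy_long.sort_values(["country", "vaccine", "year"])

# Spread of coverage values per country (higher means more fluctuation)
spread = toy_long.groupby("country")["value"].std()

# Year-over-year change within each country/vaccine series
toy_long["change"] = toy_long.groupby(["country", "vaccine"])["value"].diff()

# Row holding the largest single-year drop (NaN changes are skipped by idxmin)
worst = toy_long.loc[toy_long["change"].idxmin()]
print(spread)
print(worst["country"], worst["year"], worst["change"])
```

On the real data, years with missing estimates would yield `NaN` changes, so gaps in a series should be inspected before interpreting a drop.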