The dataset was downloaded from healthdata.gov and provided by U.S. Department of Health & Human Services. It was collected from hospitals across US, started from July 31, to understand more on the impact on patient and hospital capacity of four different facilities due to COVID-19. It has one million observations and 127 fields of data. Additionally, we conducted two analysis on influenza data.
To understand more about the data, check the following:
Info about data - COVID-19 Reported Patient Impact and Hospital Capacity by Facility
# Define the file path
file_path <- "/Users/thandarmoe/Library/Mobile Documents/com~apple~CloudDocs/teaching/Cloud/me/R-Language/Practice/Dataset/COVID-19_Reported_Patient_Impact_and_Hospital_Capacity_by_Facility_20241005.csv"
# Load the CSV file
data <- read.csv(file_path)
# Set the collection_week to Date
data$collection_week <- as.Date(data$collection_week)
# View the structure of the dataset
str(data)
## 'data.frame': 1045406 obs. of 128 variables:
## $ hospital_pk : chr "050778" "051325" "060001" "061324" ...
## $ collection_week : Date, format: "2020-04-12" "2020-05-24" ...
## $ state : chr "CA" "CA" "CO" "CO" ...
## $ ccn : chr "050778" "051325" "060001" "061324" ...
## $ hospital_name : chr "LOMA LINDA UNIVERSITY CHILDREN'S HOSPITAL" "ADVENTIST HEALTH MENDOCINO COAST" "BANNER NORTH COLORADO MEDICAL CENTER" "ASPEN VALLEY HOSPITAL" ...
## $ address : chr "11234 ANDERSON STREET SUITE A" "700 RIVER DRIVE" "1801 16TH ST" "401 CASTLE CREEK RD" ...
## $ city : chr "LOMA LINDA" "FORT BRAGG" "GREELEY" "ASPEN" ...
## $ zip : int 92354 95437 80631 81611 19973 19958 32763 32164 33606 32932 ...
## $ hospital_subtype : chr "Short Term" "Critical Access Hospitals" "Short Term" "Critical Access Hospitals" ...
## $ fips_code : int 6071 6045 8123 8097 10005 10005 12127 12035 12057 12009 ...
## $ is_metro_micro : chr "true" "true" "true" "true" ...
## $ total_beds_7_day_avg : num NA 29 1081 NA 99 ...
## $ all_adult_hospital_beds_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ all_adult_hospital_inpatient_beds_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_used_7_day_avg : num NA 10.4 171.4 NA 62.7 ...
## $ all_adult_hospital_inpatient_bed_occupied_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_used_covid_7_day_avg : num -999999 0 18.7 0 5.7 ...
## $ total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_adult_patients_hospitalized_confirmed_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_pediatric_patients_hospitalized_confirmed_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_7_day_avg : num NA 22 321 NA 99 ...
## $ total_icu_beds_7_day_avg : num 99 4 78 4 16 20 30.4 18 127 NA ...
## $ total_staffed_adult_icu_beds_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ icu_beds_used_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_adult_icu_bed_occupancy_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_icu_adult_patients_confirmed_and_suspected_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_icu_adult_patients_confirmed_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_patients_hospitalized_confirmed_influenza_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ icu_patients_confirmed_influenza_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_patients_hospitalized_confirmed_influenza_and_covid_7_day_avg : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_beds_7_day_sum : int NA 203 7565 NA 693 1470 NA NA NA NA ...
## $ all_adult_hospital_beds_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ all_adult_hospital_inpatient_beds_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_used_7_day_sum : int NA 73 1200 NA 439 982 NA NA NA NA ...
## $ all_adult_hospital_inpatient_bed_occupied_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_used_covid_7_day_sum : int 16 0 131 0 40 13 43 13 8 24 ...
## $ total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_adult_patients_hospitalized_confirmed_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_pediatric_patients_hospitalized_confirmed_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ inpatient_beds_7_day_sum : int NA 154 2249 NA 693 1365 NA NA NA NA ...
## $ total_icu_beds_7_day_sum : int 693 28 546 4 112 140 213 126 254 NA ...
## $ total_staffed_adult_icu_beds_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ icu_beds_used_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_adult_icu_bed_occupancy_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_icu_adult_patients_confirmed_and_suspected_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ staffed_icu_adult_patients_confirmed_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_patients_hospitalized_confirmed_influenza_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ icu_patients_confirmed_influenza_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_patients_hospitalized_confirmed_influenza_and_covid_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ total_beds_7_day_coverage : int 0 7 7 0 7 7 0 0 0 0 ...
## $ all_adult_hospital_beds_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ all_adult_hospital_inpatient_beds_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ inpatient_beds_used_7_day_coverage : int 0 7 7 0 7 7 0 0 0 0 ...
## $ all_adult_hospital_inpatient_bed_occupied_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ inpatient_beds_used_covid_7_day_coverage : int 7 7 7 1 7 7 7 7 1 7 ...
## $ total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_adult_patients_hospitalized_confirmed_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_pediatric_patients_hospitalized_confirmed_and_suspected_covid_7_day_coverage: int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_pediatric_patients_hospitalized_confirmed_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ inpatient_beds_7_day_coverage : int 0 7 7 0 7 7 0 0 0 0 ...
## $ total_icu_beds_7_day_coverage : int 7 7 7 1 7 7 7 7 2 0 ...
## $ total_staffed_adult_icu_beds_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ icu_beds_used_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ staffed_adult_icu_bed_occupancy_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ staffed_icu_adult_patients_confirmed_and_suspected_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ staffed_icu_adult_patients_confirmed_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_patients_hospitalized_confirmed_influenza_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ icu_patients_confirmed_influenza_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_patients_hospitalized_confirmed_influenza_and_covid_7_day_coverage : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_day_admission_adult_covid_confirmed_7_day_sum : int NA NA NA NA NA -999999 NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_18.19_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_20.29_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_30.39_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_40.49_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_50.59_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_60.69_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_70.79_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_80._7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_confirmed_unknown_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_pediatric_covid_confirmed_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_covid_ED_visits_7_day_sum : int NA NA NA NA NA 158 NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_7_day_sum : int NA NA NA NA NA 111 NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_18.19_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_20.29_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_30.39_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_40.49_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_50.59_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_60.69_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_70.79_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_80._7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_adult_covid_suspected_unknown_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_admission_pediatric_covid_suspected_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ previous_day_total_ED_visits_7_day_sum : int NA NA NA NA NA 920 NA NA NA NA ...
## $ previous_day_admission_influenza_confirmed_7_day_sum : int NA NA NA NA NA NA NA NA NA NA ...
## $ geocoded_hospital_address : chr "POINT (-117.264076004 34.049546981)" "POINT (-123.800331961 39.431600999)" "POINT (-104.709703002 40.414700005)" "POINT (-106.839321892 39.190483484)" ...
## $ hhs_ids : chr "[C050778-A]" "[C051325-A]" "[C060001-A]" "[C061324-A]" ...
## $ previous_day_admission_adult_covid_confirmed_7_day_coverage : int 0 0 0 0 0 7 0 0 0 0 ...
## [list output truncated]
#Load tidyverse
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#Clean the data
data <- data %>%
mutate(across(where(is.numeric), ~ na_if(., -999999.0)))
Variables
Average of total number of staffed inpatient beds that are occupied reported during the 7-day period.
Average of total number of all staffed inpatient and outpatient beds in the hospital, including all overflow, observation, and active surge/expansion beds used for inpatients and for outpatients (including all ICU, ED, and observation) reported during the 7-day period.
# Calculate average occupancy rates
bed_occupancy_data <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_inpatient_beds_used = mean(inpatient_beds_used_7_day_avg, na.rm = TRUE),
avg_total_beds = mean(total_beds_7_day_avg, na.rm = TRUE),
avg_occupancy_rate = avg_inpatient_beds_used / avg_total_beds * 100,
.groups = 'drop'
) %>%
filter(!is.na(avg_occupancy_rate))
# Plot the bed occupancy rates over time
ggplot(bed_occupancy_data, aes(x = collection_week, y = avg_occupancy_rate)) +
geom_line(color = "steelblue") +
labs(title = "Average Bed Occupancy Rate Over Time per Hospital Subtype",
y = "Occupancy Rate (%)") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~hospital_subtype)
This chart shows the average bed occupancy rates from 2020 to 2024:
Children’s Hospitals: Bed occupancy increased to around 70-80%, with some fluctuations and a slight decline by 2024.
Critical Access Hospitals: Occupancy remained steady around 40-50% after a rise in 2020.
Long-Term Hospitals: Stable around 70-80%, with minor dips after an early drop in 2020.
Short-Term Hospitals: Similar pattern to long-term, stabilizing around 60-70% after an initial drop.
Overall, occupancy trends stabilized after early disruptions in 2020.
Variables
Average number of total number of staffed inpatient ICU beds that are occupied reported in the 7-day period.
Average number of total number of staffed inpatient ICU beds reported in the 7-day period.
# Calculate ICU bed usage
icu_data <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_icu_beds_used = mean(icu_beds_used_7_day_avg, na.rm = TRUE),
avg_total_icu_beds = mean(total_icu_beds_7_day_avg, na.rm = TRUE),
avg_icu_occupancy_rate = avg_icu_beds_used / avg_total_icu_beds * 100,
.groups = "drop"
) %>%
filter(!is.na(avg_icu_occupancy_rate))
# Plot the ICU occupancy rates
ggplot(icu_data, aes(x = collection_week, y = avg_icu_occupancy_rate)) +
geom_line(color = "red") +
labs(title = "Weekly Average ICU Bed Occupancy Rate Over Time per Hospital Subtype",
y = "ICU Occupancy Rate (%)") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~hospital_subtype)
Children’s Hospitals:
ICU occupancy starts high around early 2020, likely reflecting a spike (possibly due to the COVID-19 pandemic).
After the initial peak, the occupancy stabilizes around 60-80% and remains relatively stable through 2024, with minor fluctuations.
Critical Access Hospitals:
ICU occupancy starts near zero in early 2020, then spikes significantly, peaking above 80%.
Afterward, there’s a fluctuating trend, with the occupancy rates varying between 40% to 80% over time, and it stabilizes somewhat by 2023 and 2024.
Long-Term Hospitals:
Short-Term Hospitals:
Similar to long-term hospitals, there’s a sharp drop early in 2020, followed by a steady recovery.
After 2020, ICU occupancy remains relatively stable around 60-70% for the rest of the period.
In general, the graph reflects the trends in ICU bed occupancy across different hospital types, likely influenced by external events such as the COVID-19 pandemic. The early peaks, followed by stabilization, suggest that after initial surges in demand, the healthcare system adapted to stabilize ICU occupancy across different hospital types.
Variable
Average number of patients currently hospitalized in an adult inpatient bed who have laboratory-confirmed or suspected COVID19, including those in observation beds reported during the 7-day period.
# Calculate weekly average of hospitalized COVID-19 patients
weekly_data <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_patients = round(mean(total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg, na.rm = TRUE)
), .groups = 'drop') %>%
filter(!is.nan(avg_patients), collection_week > as.Date("2020-01-01"))
str(weekly_data)
## tibble [855 × 3] (S3: tbl_df/tbl/data.frame)
## $ hospital_subtype: chr [1:855] "Childrens Hospitals" "Childrens Hospitals" "Childrens Hospitals" "Childrens Hospitals" ...
## $ collection_week : Date[1:855], format: "2020-03-29" "2020-04-05" ...
## $ avg_patients : num [1:855] 6 4 2 2 0 0 0 1 0 0 ...
# Plot the average number of hospitalized COVID-19 patients over time
ggplot(weekly_data, aes(x = collection_week, y = avg_patients)) +
geom_line(color = "blue") +
labs(title = "Average Number of Hospitalized COVID-19 Adult Patients Over Time",
y = "Weekly Average Patients") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1))
This chart represents the weekly average number of hospitalized COVID-19 adult patients over time, spanning from 2020 to 2024. Here’s an interpretation:
Initial Peaks (2020-2021):
Mid-2021 to Early 2022:
Late 2022 to 2024:
General Trend:
The pattern shows cyclical waves of hospitalizations, with sharp peaks followed by declines. The waves are consistent with known patterns of COVID-19 infection surges during colder months or as new variants emerged.
The trend appears to be moderating over time, with peaks becoming smaller, likely reflecting the effects of vaccinations, improved treatments, and perhaps milder variants.
Overall, the chart reflects the ebb and flow of COVID-19 hospitalizations over several years, with the highest surges occurring in 2021 and early 2022, followed by a gradual decline.
# Plot the average number of hospitalized COVID-19 patients over time per hospital subtype
ggplot(weekly_data, aes(x = collection_week, y = avg_patients)) +
geom_line(color = "blue") +
labs(title = "Avg Hospitalized COVID-19 Adult Patients Over Time per Hospital Subtype",
y = "Weekly Average Patients") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1))+
facet_wrap(~hospital_subtype)
Children’s Hospitals:
Critical Access Hospitals:
Long-Term Hospitals:
Short-Term Hospitals:
Overall Observations:
The burden of adult COVID-19 hospitalizations has been highest in Short-Term Hospitals, while other hospital subtypes have seen very few adult COVID-19 patients.
There is a clear decline in hospitalizations after 2022, with the peaks during the height of the pandemic’s waves.
# Average hospitalized COVID-19 patients by state
state_data <- data %>%
group_by(state) %>%
summarise(
avg_patients = round(mean(total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg, na.rm = TRUE)))
str(state_data)
## tibble [56 × 2] (S3: tbl_df/tbl/data.frame)
## $ state : chr [1:56] "AK" "AL" "AR" "AS" ...
## $ avg_patients: num [1:56] 4 12 9 1 19 17 8 17 17 26 ...
#Chose the Highest and Lowest 10 states
lowest_states <- state_data %>%
arrange(avg_patients) %>%
slice_head(n = 10)
highest_states <- state_data %>%
arrange(desc(avg_patients)) %>%
slice_head(n = 10)
result_states <- bind_rows(
mutate(highest_states, category = "Highest"),
mutate(lowest_states, category = "Lowest")
)
# Plot
ggplot(result_states, aes(x = reorder(state, avg_patients), y = avg_patients, fill = category)) +
geom_bar(stat = "identity") +
labs(title = "Weekly Average Adult COVID-19 Hospitalizations by State",
x = "State",
y = "Weekly Average Patients") +
coord_flip() +
theme_minimal() +
scale_fill_manual(values = c("Highest" = "pink", "Lowest" = "steelblue"))
Variable
Sum of number of patients age 18-19 who were admitted to an adult inpatient bed on the previous calendar day who had confirmed COVID-19 at the time of admission reported in the 7-day period.
Sum of number of patients age unknown who were admitted to an adult inpatient bed on the previous calendar day who had confirmed COVID-19 at the time of admission reported in 7-day period.
Sum of number of pediatric patients who were admitted to an inpatient bed, including NICU, PICU, newborn, and nursery, on the previous calendar day who had confirmed COVID-19 at the time of admission.
# Average admissions by age group
admissions_by_age <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_admissions_18_29 = mean(previous_day_admission_adult_covid_confirmed_18.19_7_day_sum, na.rm = TRUE),
avg_admissions_30_39 = mean(previous_day_admission_adult_covid_confirmed_30.39_7_day_sum, na.rm = TRUE),
avg_admissions_40_49 = mean(previous_day_admission_adult_covid_confirmed_40.49_7_day_sum, na.rm = TRUE),
avg_admissions_50_59 = mean(previous_day_admission_adult_covid_confirmed_50.59_7_day_sum, na.rm = TRUE),
avg_admissions_60_69 = mean(previous_day_admission_adult_covid_confirmed_60.69_7_day_sum, na.rm = TRUE),
avg_admissions_70_79 = mean(previous_day_admission_adult_covid_confirmed_70.79_7_day_sum, na.rm = TRUE),
avg_admissions_80_plus = mean(previous_day_admission_adult_covid_confirmed_80._7_day_sum, na.rm = TRUE),
avg_admissions_unknown = mean(previous_day_admission_adult_covid_confirmed_unknown_7_day_sum, na.rm = TRUE),
avg_admissions_pediatric = mean(previous_day_admission_pediatric_covid_confirmed_7_day_sum, na.rm = TRUE),
.groups = "drop"
)
# Reshape for easier plotting
admissions_long <- admissions_by_age %>%
pivot_longer(cols = -c(hospital_subtype, collection_week),
names_to = "age_group",
values_to = "avg_admissions") %>%
filter(!is.na(avg_admissions))
# Define shorter labels for the age groups
short_labels <- c(
"avg_admissions_18_29" = "18-29",
"avg_admissions_30_39" = "30-39",
"avg_admissions_40_49" = "40-49",
"avg_admissions_50_59" = "50-59",
"avg_admissions_60_69" = "60-69",
"avg_admissions_70_79" = "70-79",
"avg_admissions_80_plus" = "80+",
"avg_admissions_unknown" = "Unknown",
"avg_admissions_pediatric" = "Pediatric"
)
# Define a more extensive custom color palette (adjust to match your age groups)
color_palette <- c("darkred", "red", "orange", "lightgreen", "lightblue", "gray", "blue", "purple", "steelblue") # Add more colors as needed
# Plot admissions by age group with custom colors
ggplot(admissions_long, aes(x = collection_week, y = avg_admissions, color = age_group)) +
geom_line(linewidth = 1.2) + # Use linewidth instead of size
scale_color_manual(values = color_palette, labels = short_labels) + # Use custom colors
labs(title = "Avg COVID-19 Admissions by Age Group Over Time Per Hospital Subtype",
y = "Average Admissions") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~hospital_subtype)
Children’s Hospitals:
The pediatric group (purple) dominates admissions, peaking significantly in early 2022 with over 20 average admissions. The trend shows a rise and fall in 2021 and 2022, followed by a gradual decline in 2023 and 2024.
Other age groups (e.g., 18-29, 30-39) have minimal to no admissions recorded.
Critical Access Hospitals:
Long Term Hospitals:
Short Term Hospitals:
The 80+ age group (blue) again stands out, peaking multiple times around 2021-2022.
Younger age groups (e.g., 60-69 and 70-79) also show minor peaks, indicating that elderly patients were more commonly hospitalized in short-term facilities.
In summary, pediatric admissions were particularly high in Children’s Hospitals, while the elderly population (80+) had the most notable admissions across Short Term and Critical Access Hospitals.
Variable
Average of total number of staffed inpatient beds that are occupied reported during the 7-day period.
Average of reported patients currently hospitalized in an inpatient bed who have suspected or confirmed COVID-19 reported during the 7-day period.
# Analyze hospital subtype impact
hospital_type <- data %>%
group_by(hospital_subtype, year = year(collection_week)) %>%
summarise(
avg_beds_used = mean(inpatient_beds_used_7_day_avg, na.rm = TRUE),
avg_beds_used_covid = mean(inpatient_beds_used_covid_7_day_avg, na.rm = TRUE)
, .groups = "drop")
# Pivoting
hospital_type_long <- hospital_type %>%
pivot_longer(cols = c(avg_beds_used, avg_beds_used_covid), names_to = "Legend", values_to = "value")
# Plot impact of hospital type with 100% stacked column
ggplot(hospital_type_long, aes(x = hospital_subtype, y = value, fill = Legend)) +
geom_bar(stat = "identity", position = "fill") + # Use "fill" for 100% stacked column
labs(title = "Proportion of Staffed Bed Usage by COVID-19 Patients per Year",
x = "Hospital Type",
y = "Proportion of Avg Weekly Patients") + # Adjust y-axis label to reflect proportions
facet_wrap(~year) +
scale_y_continuous(labels = scales::percent_format()) + # Show percentages on y-axis
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # Set angle to 90 for vertical labels
Variable
Average of reported patients currently hospitalized in an inpatient bed who have suspected or confirmed COVID-19 reported during the 7-day period
Average of total number of staffed inpatient beds that are occupied reported during the 7-day period.
Average number of patients currently hospitalized in a designated adult ICU bed who have suspected or laboratory-confirmed COVID-19 reported in the 7-day period.
Average of total number of staffed inpatient adult ICU beds that are occupied reported in the 7-day period.
inpatent_icu_percent <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
percent_inpatient_covid_beds = mean(
ifelse(inpatient_beds_used_7_day_avg == 0, NA, (inpatient_beds_used_covid_7_day_avg / inpatient_beds_used_7_day_avg) * 100),
na.rm = TRUE
),
percent_icu_covid_beds = mean(
ifelse(staffed_adult_icu_bed_occupancy_7_day_avg == 0, NA, (staffed_icu_adult_patients_confirmed_and_suspected_covid_7_day_avg / staffed_adult_icu_bed_occupancy_7_day_avg) * 100),
na.rm = TRUE
),
.groups = "drop" # Add this line to suppress the message
)
# Reshape the data to long format
inpatient_icu_percent_long <- inpatent_icu_percent %>%
pivot_longer(cols = c(percent_inpatient_covid_beds, percent_icu_covid_beds),
names_to = "metric",
values_to = "percentage")%>%
filter(!is.na(percentage))
# Create the line plot
ggplot(inpatient_icu_percent_long, aes(x = collection_week, y = percentage, color = metric)) +
geom_line(linewidth = 1) +
labs(
title = "% of Inpatient & ICU Beds Used by COVID-19 Patients by Hospital Subtype",
x = "Year",
y = "Percentage (%)",
color = "Metric"
) +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1)
) + facet_wrap(~hospital_subtype)
The graph presents the percentage of inpatient and ICU beds used by COVID-19 patients across four different types of hospitals: Children’s Hospitals, Critical Access Hospitals, Long Term Care Hospitals, and Short Term Hospitals. The data spans from 2020 through 2024, and key insights include:
Critical Access Hospitals: There is a significant peak in ICU bed usage (red line) around 2021, where the percentage of ICU beds dedicated to COVID-19 patients exceeded 100%, especially in 2022. This suggests that the critical care capacity was heavily burdened during that period, likely due to a surge in severe COVID-19 cases.
Short Term and Long Term Hospitals: Both show relatively consistent usage over time, with occasional fluctuations. Short Term Hospitals had a slightly higher ICU bed usage during certain periods, particularly in early 2021 and 2022, but overall, the percentages remained below 50%. Long Term Hospitals show a low percentage of both ICU and inpatient beds used, likely because these facilities may not specialize in acute care.
Children’s Hospitals: COVID-19 bed usage was minimal here, with little to no strain observed in both ICU and inpatient capacities. This could reflect the lower hospitalization rates for children during the pandemic.
Inpatient vs. ICU Beds: Across all hospital types, inpatient bed usage (blue line) remained relatively stable and low compared to ICU bed usage. ICU beds seem to have experienced more stress during COVID-19 peaks, particularly in Critical Access and Short Term hospitals.
In summary, Critical Access and Short Term hospitals faced the greatest strain in terms of COVID-19 patient care, especially in ICUs, while Children’s and Long Term hospitals were less impacted. This reflects the varying demand placed on different healthcare facilities during the pandemic based on their patient demographics and care capabilities.
Variable
The current maximum number of healthcare personnel who have received at least one dose of COVID-19 vaccination that is administered in a multi-dose series. This field is meant to represent those who have begun but not completed the vaccination process. Does not include those who received a single-dose vaccine in this field. (This field is only reported on Wednesdays of a given week)
The current number of healthcare personnel who have received a complete series of a COVID-19vaccination.Includes those who have received all doses in a multi-dose series as well as those who received a single-dose vaccine. (This field is only reported on Wednesdays of a given week)
The number reported of healthcare personnel who have not yet received a single vaccine dose. (This field is only reported on Wednesdays of a given week)
The number of patients who received the first dose of a COVID-19 vaccine that is administered in a multi-dose series. (This field is only reported on Wednesdays of a given week)
The number of patients who received the final dose in a COVID-19 vaccination series. (This field is only reported on Wednesdays of a given week)
#Preparing the data for vaccination status of healthcare personnel and patients
avg_weekly_vaccination_data <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_weekly_personnel_doses_one_administered = mean(
total_personnel_covid_vaccinated_doses_one_7_day, na.rm = TRUE),
avg_weekly_personnel_doses_all_administered = mean(
total_personnel_covid_vaccinated_doses_all_7_day, na.rm = TRUE),
avg_weekly_personnel_doses_none_administered = mean(
total_personnel_covid_vaccinated_doses_none_7_day, na.rm = TRUE),
avg_weekly_patient_doses_one_administered = mean(
previous_week_patients_covid_vaccinated_doses_one_7_day, na.rm = TRUE),
avg_weekly_patient_doses_all_administered = mean(
previous_week_patients_covid_vaccinated_doses_all_7_day, na.rm = TRUE),
.groups = 'drop'
) %>%
filter(
!is.na(avg_weekly_personnel_doses_one_administered) &
!is.na(avg_weekly_personnel_doses_all_administered) &
!is.na(avg_weekly_patient_doses_one_administered) &
!is.na(avg_weekly_patient_doses_all_administered) &
!is.na(avg_weekly_personnel_doses_none_administered)
)
# Pivoting the data longer for easier plotting
avg_weekly_vaccination_data_long <- avg_weekly_vaccination_data %>%
pivot_longer(cols = starts_with("avg_weekly"),
names_to = "vaccination_type",
values_to = "avg_doses")
#For healthcare personnel
avg_weekly_vaccination_data_long_personnel <- avg_weekly_vaccination_data_long %>%
filter(vaccination_type %in% c("avg_weekly_personnel_doses_one_administered",
"avg_weekly_personnel_doses_all_administered",
"avg_weekly_personnel_doses_none_administered"))
#For Patients
avg_weekly_vaccination_data_long_patients <- avg_weekly_vaccination_data_long %>%
filter(vaccination_type %in% c("avg_weekly_patient_doses_one_administered",
"avg_weekly_patient_doses_all_administered"))
Weekly Average Vaccination administered to Healthcare Personnel by Hospital Subtype
# Creating the line graph
ggplot(avg_weekly_vaccination_data_long_personnel, aes(x = collection_week, y = avg_doses, color = vaccination_type)) +
geom_line(linewidth = 1) +
labs(title = "Avg Weekly Vacc_Doses Administered to Health Personnel by Hospital Subtype",
x = "Collection Week",
y = "Average Doses Administered (Weekly)",
color = "Vaccination Type") +
scale_color_manual(values = c("blue", "darkgreen", "red"),
labels = c("One Dose (Personnel)", "All Doses (Personnel)", "None (Personnel)"),
guide = guide_legend(nrow = 2)) + # Set legend to 2 rows
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(
legend.position = "top",
legend.box = "horizontal",
legend.title = element_blank(), # Remove legend title if not needed
legend.text = element_text(size = 10), # Adjust the size of the text
axis.text.x = element_text(angle = 45, hjust = 1)
) +
facet_wrap(~ hospital_subtype)
Children’s Hospitals:
They exhibit a consistent and relatively high number of doses administered to health personnel, particularly with a noticeable increase in both one-dose and no-dose vaccinations (the red and blue lines).
The red line (“None Doses Administered”) is especially notable, indicating a significant proportion of health personnel were reported to have not received any dose in recent weeks.
Critical Access Hospitals:
Long-Term Care Facilities:
Short-Term Care Hospitals:
Weekly Average Vaccination administered to Patients by Hospital Subtype
# Creating the line graph
ggplot(avg_weekly_vaccination_data_long_patients, aes(x = collection_week, y = avg_doses, color = vaccination_type)) +
geom_line(linewidth = 1) +
labs(title = "Avg Weekly Vacc_Doses Administered to Patients by Hospital Subtype",
x = "Collection Week",
y = "Average Doses Administered (Weekly)",
color = "Vaccination Type") +
scale_color_manual(values = c("blue", "darkgreen"),
labels = c("One Dose (Patient)", "All Doses (Patient)"),
guide = guide_legend(nrow = 2)) + # Set legend to 2 rows
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(
legend.position = "top",
legend.box = "horizontal",
legend.title = element_blank(), # Remove legend title if not needed
legend.text = element_text(size = 10), # Adjust the size of the text
axis.text.x = element_text(angle = 45, hjust = 1)
) +
facet_wrap(~ hospital_subtype)
Children’s Hospitals:
Critical Access Hospitals:
Long Term Hospitals:
Short Term Hospitals:
General Trend
Vaccination rates spiked early on during the vaccine rollout (early to mid-2021), reflecting the initial surge in vaccine distribution.
After this initial spike, the number of doses administered declined significantly across all hospital subtypes, with particularly sharp declines in Children’s, Critical Access, and Long Term hospitals.
Short Term hospitals showed more sustained vaccination efforts over a longer period compared to the others.
By 2023, all subtypes exhibit very low or near-zero vaccination activity.
vaccination_compare <- avg_weekly_vaccination_data %>%
mutate(vaccination_ratio = (avg_weekly_personnel_doses_all_administered + avg_weekly_personnel_doses_one_administered) /
(avg_weekly_patient_doses_all_administered + avg_weekly_patient_doses_one_administered))
# Visualizing the ratio
ggplot(vaccination_compare, aes(x = collection_week, y = vaccination_ratio)) +
geom_line(color = "purple") +
labs(title = "Ratio of Personnel to Patient COVID Vaccination Doses",
y = "Ratio",
x = "Collection Week") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(
legend.position = "top",
legend.box = "horizontal",
legend.title = element_blank(), # Remove legend title if not needed
legend.text = element_text(size = 10), # Adjust the size of the text
axis.text.x = element_text(angle = 45, hjust = 1)
) +
facet_wrap(~hospital_subtype)
Children’s Hospitals:
There is a sharp drop early in the vaccination period (early 2021) followed by a relatively flat period.
Beginning in late 2022, the ratio of personnel to patient doses sharply rises, leveling off at a very high ratio (~7500) and remaining consistent through 2023 and into 2024.
This likely indicates a large number of personnel vaccinations compared to patient vaccinations, especially after the initial push of patient vaccines declined.
Critical Access Hospitals:
The ratio remains consistently flat, suggesting little or no significant difference between personnel and patient vaccinations throughout the time period.
This may suggest relatively balanced vaccination rates or low overall activity.
Long Term Hospitals:
Short Term Hospitals:
The ratio is very low early in the vaccination campaign but begins to show small fluctuations starting in late 2022 and continuing into 2023, though still much lower compared to Children’s Hospitals.
There are mild increases in the ratio, which may indicate a higher proportion of personnel vaccinations compared to patient doses during this period.
Variable
total_patients_hospitalized_confirmed_influenza_7_day_avg
Average number of patients (all ages) currently hospitalized in an inpatient bed who have laboratory-confirmed influenza. Including those in observation beds reported in the 7-day period.
# Calculate weekly average of hospitalized influenza patients per hospital subtype
inf_weekly_data <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
avg_patients = round(mean(total_patients_hospitalized_confirmed_influenza_7_day_avg, na.rm = TRUE)
), .groups = 'drop') %>%
filter(!is.nan(avg_patients), collection_week > as.Date("2020-01-01"))
str(inf_weekly_data)
## tibble [803 × 3] (S3: tbl_df/tbl/data.frame)
## $ hospital_subtype: chr [1:803] "Childrens Hospitals" "Childrens Hospitals" "Childrens Hospitals" "Childrens Hospitals" ...
## $ collection_week : Date[1:803], format: "2020-09-20" "2020-10-11" ...
## $ avg_patients : num [1:803] 0 0 0 0 0 0 0 0 0 0 ...
# Plot the average number of hospitalized influenza patients over time
ggplot(inf_weekly_data, aes(x = collection_week, y = avg_patients)) +
geom_line(color = "black") +
labs(title = "Average Number of Hospitalized influenza Patients Over Time",
y = "Weekly Average Patients") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1))
July 2020 to January 2022: No reported influenza hospitalizations during this period, as indicated by the flat line at zero.
Mid-2022: A brief spike in hospitalized influenza patients is visible but relatively small.
Late 2022 to Early 2023 (Seasonal Peak): A significant surge in hospitalizations, reaching a peak in early 2023 before gradually decreasing.
Late 2023 to Early 2024 (Another Peak): Another wave of hospitalizations occurs, peaking similarly to the previous flu season in 2023, though potentially slightly lower in magnitude.
Mid-2024: A decline in hospitalizations follows, indicating the end of the second observed flu season.
The graph shows a clear seasonal pattern of influenza, with peaks in late fall/winter and declines by the spring, typical of influenza virus behavior.
# Plot the average number of hospitalized influenza patients over time per hospital subtype
ggplot(inf_weekly_data, aes(x = collection_week, y = avg_patients)) +
geom_line(color = "black") +
labs(title = "Avg Hospitalized Influenza Patients Over Time per Hospital Subtype",
y = "Weekly Average Patients",
x = "Collection Week") +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
facet_wrap(~hospital_subtype) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Children’s Hospitals:
Significant spikes are seen during two flu seasons: one around early 2023 and another during late 2023 to early 2024.
The peak in early 2023 reaches over 6 average patients per week, with the second peak slightly lower.
No cases are recorded before mid-2022, suggesting the flu was under control during that time.
Critical Access Hospitals:
Long-Term Care:
Short-Term Care:
Like Children’s Hospitals, Short-Term Care also experiences two distinct flu seasons, with a peak around early 2023 and another in late 2023/early 2024.
The peaks in this category are lower in magnitude compared to Children’s Hospitals, with fewer than 5 patients per week at the highest point.
Conclusion:
Children’s and Short-Term Care Hospitals seem to be the most affected by influenza during flu seasons (early 2023 and late 2023/early 2024).
Variables
Average of patients (all ages) currently hospitalized in a designated ICU bed with laboratory-confirmed influenza in the 7-day period.
Average number of total number of staffed inpatient ICU beds that are occupied reported in the 7-day period.
inf_inpatient_icu_percent <- data %>%
group_by(hospital_subtype, collection_week) %>%
summarise(
percent_icu_inf_beds = mean(
ifelse(icu_beds_used_7_day_avg == 0, NA, (icu_patients_confirmed_influenza_7_day_avg / icu_beds_used_7_day_avg) * 100),
na.rm = TRUE
),
.groups = "drop"
) %>% filter (!is.na(percent_icu_inf_beds))
# Create the line plot
ggplot(inf_inpatient_icu_percent, aes(x = collection_week, y = percent_icu_inf_beds)) +
geom_line(linewidth = 0.5, color = "black") +
labs(
title = "% of Inpatient & ICU Beds Used by Influenza Patients by Hospital Subtype",
x = "Year",
y = "Percentage (%)"
) +
theme_minimal() +
scale_x_date(date_breaks = "6 months", date_labels = "%b %Y") +
theme(
legend.position = "top",
axis.text.x = element_text(angle = 45, hjust = 1)
) + facet_wrap(~hospital_subtype)
Children’s Hospitals:
There are two distinct peaks, one around late 2020 and another in late 2022, reaching about 3% bed usage for influenza patients.
After these peaks, bed usage drops sharply to zero and stays relatively flat with minor fluctuations through 2023 and 2024.
Critical Access Hospitals:
Long Term Hospitals:
Short Term Hospitals:
Data for Dashboard
# Separate latitude and longitude
data2 <- data %>%
mutate(
# Extract longitude (first number after POINT)
longitude = as.numeric(str_extract(geocoded_hospital_address, "-?\\d+\\.\\d+")),
# Extract latitude (second number after POINT)
latitude = as.numeric(str_extract(geocoded_hospital_address, "-?\\d+\\.\\d+(?=\\))"))
)
data_icu_inpatient_covid <- data2 %>%
mutate(
percent_inpatient_covid_beds = ifelse(
inpatient_beds_used_7_day_avg == 0,
NA,
(inpatient_beds_used_covid_7_day_avg / inpatient_beds_used_7_day_avg) * 100
),
percent_icu_covid_beds = ifelse(
staffed_adult_icu_bed_occupancy_7_day_avg == 0,
NA,
(staffed_icu_adult_patients_confirmed_and_suspected_covid_7_day_avg / staffed_adult_icu_bed_occupancy_7_day_avg) * 100
)
)
test <- data_icu_inpatient_covid %>% select(collection_week, percent_inpatient_covid_beds, percent_icu_covid_beds, longitude, latitude)
# Filtering rows where either percent_inpatient_covid_beds or percent_icu_covid_beds is not NA
test <- test %>%
filter(!is.na(percent_inpatient_covid_beds) | !is.na(percent_icu_covid_beds))
# Convert to numeric, coerce non-numeric values to NA
test$longitude <- as.numeric(test$longitude)
test$latitude <- as.numeric(test$latitude)
# Merge vaccination data with weekly average of hospitalized COVID-19 patients
merged_data <- weekly_data %>%
left_join(avg_weekly_vaccination_data, by = c("hospital_subtype", "collection_week")) %>%
drop_na()
# Ensure hospital_subtype is a factor
merged_data$hospital_subtype <- as.factor(merged_data$hospital_subtype)
str(merged_data)
## tibble [689 × 8] (S3: tbl_df/tbl/data.frame)
## $ hospital_subtype : Factor w/ 4 levels "Childrens Hospitals",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ collection_week : Date[1:689], format: "2021-01-10" "2021-01-17" ...
## $ avg_patients : num [1:689] 1 1 1 1 1 0 1 0 0 0 ...
## $ avg_weekly_personnel_doses_one_administered : num [1:689] 513 1218 1375 1541 1349 ...
## $ avg_weekly_personnel_doses_all_administered : num [1:689] 929 519 697 1011 1141 ...
## $ avg_weekly_personnel_doses_none_administered: num [1:689] 384 915 1101 1221 1170 ...
## $ avg_weekly_patient_doses_one_administered : num [1:689] 0.167 8.659 147.074 168.517 150.581 ...
## $ avg_weekly_patient_doses_all_administered : num [1:689] 0 2.73 39.63 182.82 17.7 ...
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
# Train-test split (70% training, 30% testing)
set.seed(371)
train_index <- createDataPartition(merged_data$avg_patients, p = 0.7, list = FALSE)
train_data <- merged_data[train_index, ]
test_data <- merged_data[-train_index, ]
# Train linear regression model including hospital_subtype
model <- lm(avg_patients ~ hospital_subtype +
avg_weekly_personnel_doses_one_administered +
avg_weekly_personnel_doses_all_administered +
avg_weekly_patient_doses_one_administered +
avg_weekly_patient_doses_all_administered, data = train_data)
# Model summary
summary(model)
##
## Call:
## lm(formula = avg_patients ~ hospital_subtype + avg_weekly_personnel_doses_one_administered +
## avg_weekly_personnel_doses_all_administered + avg_weekly_patient_doses_one_administered +
## avg_weekly_patient_doses_all_administered, data = train_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.0957 -0.8758 -0.0675 0.4344 30.0040
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -0.1065270 0.7996070 -0.133
## hospital_subtypeCritical Access Hospitals 0.0725926 0.8310922 0.087
## hospital_subtypeLong Term 0.9662296 0.8380373 1.153
## hospital_subtypeShort Term 14.1024975 0.7454990 18.917
## avg_weekly_personnel_doses_one_administered 0.0008411 0.0009947 0.846
## avg_weekly_personnel_doses_all_administered -0.0007781 0.0009228 -0.843
## avg_weekly_patient_doses_one_administered 0.0227548 0.0034182 6.657
## avg_weekly_patient_doses_all_administered -0.0101100 0.0033595 -3.009
## Pr(>|t|)
## (Intercept) 0.89407
## hospital_subtypeCritical Access Hospitals 0.93043
## hospital_subtypeLong Term 0.24950
## hospital_subtypeShort Term < 2e-16 ***
## avg_weekly_personnel_doses_one_administered 0.39819
## avg_weekly_personnel_doses_all_administered 0.39956
## avg_weekly_patient_doses_one_administered 7.7e-11 ***
## avg_weekly_patient_doses_all_administered 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.852 on 477 degrees of freedom
## Multiple R-squared: 0.7305, Adjusted R-squared: 0.7265
## F-statistic: 184.7 on 7 and 477 DF, p-value: < 2.2e-16
Short Term: Estimate = 14.1025, p-value < 2e-16 (highly significant). This indicates that, on average, short-term hospitals have 14.1025 more patients compared to the reference category (Children Hospitals).
avg_weekly_patient_doses_one_administered: Estimate = 0.0228, p-value = 7.7e-11 (highly significant). Each additional dose administered to patients is associated with an increase in the average number of patients.
avg_weekly_patient_doses_all_administered: Estimate = -0.0101, p-value = 0.00276 (significant). This suggests that more doses administered (overall) may be associated with a decrease in average patients, which could indicate a negative correlation in specific contexts.
Overall, the model suggests that
avg_weekly_patient_doses_one_administered, avg_weekly_patient_doses_all_administered
and hospital_subtypeShort Term
are significant predictors
of avg_patients
. The model explains a substantial portion
of the variance in the outcome, but some predictors are not
statistically significant, indicating they may not contribute
meaningfully to the model.
# Predict on test data
predictions <- predict(model, newdata = test_data)
# Calculate RMSE and R-squared
rmse <- sqrt(mean((predictions - test_data$avg_patients)^2))
r_squared <- 1 - (sum((test_data$avg_patients - predictions)^2) / sum((test_data$avg_patients - mean(test_data$avg_patients))^2))
cat("RMSE:", rmse, "\n")
## RMSE: 5.548382
cat("R-squared:", r_squared, "\n")
## R-squared: 0.5990908
Moderate Predictive Power: The R-squared value indicates that the model has a reasonable amount of predictive power, but there is still a significant portion of variability (about 40%) unexplained by the model.
Prediction Accuracy: The RMSE provides a measure
of how accurate the model’s predictions are. An RMSE of approximately
5.55 means that, on average, the model’s predictions
deviate from the actual values of avg_patients
by about
5.55 patients.
# Step 1: Create a data frame with actual and predicted values
results <- data.frame(
Actual = test_data$avg_patients,
Predicted = predictions
)
# Step 2: Plot the prediction graph
ggplot(results, aes(x = Actual, y = Predicted)) +
geom_point(alpha = 0.6) + # Scatter plot of actual vs. predicted
geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") + # 45-degree line
labs(title = "Actual vs. Predicted Patients",
x = "Actual Patients",
y = "Predicted Patients") +
theme_minimal() +
xlim(min(results$Actual, results$Predicted), max(results$Actual, results$Predicted)) + # Set limits for x-axis
ylim(min(results$Actual, results$Predicted), max(results$Actual, results$Predicted)) # Set limits for y-axis
Lower Patient Counts : The model’s predictions are more accurate, with many points close to the diagonal red line (ideal prediction line).
Higher Patient Counts: Predictions become less accurate, with the model consistently underestimating the actual number of patients. This can be seen as points move farther away from the red line, particularly for actual patient counts above 20.