Project:
First of all, I am not familiar with medical data, which made this project more interesting to complete. The absence of certain data (such as cost variables) also made it more challenging, as I wanted to find potential solutions using only the data provided. The general goal of this project was to find out how to reduce Emergency Department overcrowding in a particular hospital. The specific goal was to discover whether the Emergency Department needed more beds, whether Inpatient Care needed more beds, or whether other factors, such as staff increases, could help reduce overcrowding. The data set provided has three (3) sheets with information about the transfer of patients in an Emergency Department. All private information was removed from the data sets. Because no cost data was included, the solutions had to be devised using only the available data, which limited the scope of the project.
Idea:
The idea was to take an existing data set, analyze it in the statistical programming language R, and then create a visual presentation of the findings. To do this, I would build both simple and multiple regression models in R. My goal was to build a model that helps identify the causes of the overcrowding using the provided data set. Once the problem is identified, it becomes possible to develop a plan to address it.
Tools:
- R Studio
- Microsoft Excel
- Microsoft PowerPoint
- Google Image Search
Data:
The data set provided covers a working hospital's Emergency Department transfers over a one-month period, May 2016. The name of the specific hospital was withheld for privacy reasons. The data was provided in three (3) sheets, and each sheet gives different insight into the overcrowding situation.
Datasets
- NEDOC Score (Microsoft Excel File) (CSV file) (Google Sheets)
- Components (Microsoft Excel File) (CSV file) (Google Sheets)
- Patient Move (Microsoft Excel File) (CSV file) (Google Sheets)
Data Cleaning:
In order to clean the data, you first want to review all of it to understand what the various fields mean. For this project, this required some basic research into how Emergency Department overcrowding is defined. A term used throughout the data set is the 'NEDOC Score,' so research was also needed to understand this and other necessary terms. I summarize this briefly below:
What is Emergency Department Overcrowding:
Emergency Department Overcrowding, defined by Mohammad H Yarmohammadian, Fatemeh Rezaei, Abbas Haghshenas, and Nahid Tavakoli in an article written for the Journal of Research in Medical Sciences, is "the situation in which ED function is impeded primarily because of the excessive number of patients waiting to be seen, undergoing assessment and treatment, or waiting for departure comparing to the physical or staffing capacity of the ED" (2017).
To summarize for us non-medical folks, 'Emergency Department Overcrowding' is anything that causes a larger-than-normal number of patients to be waiting for Emergency Department care, including check-in, receiving care, and check-out of the Emergency Department. For example, when you go to the emergency room with a broken arm and end up waiting two hours to be seen, an hour in the doctor's 'room' to get the x-ray and cast, and another hour just to get the paperwork to go home, this is considered ED overcrowding.
With this understanding in mind, our goal for this analysis is to identify what causes overcrowding in the Emergency Department and potential ways to address the issue effectively. On a side note, overcrowding in the Emergency Department is a worldwide epidemic. It is worth a search on Google for some interesting background knowledge if you are curious and have a few spare moments.
What does NEDOC Score Mean?
A commonly used term in the data set is the 'NEDOC Score.' This is a foreign term for anyone with no background in the medical field. So back to Google we go! (Side note: it is important to make sure that any research done through Google or other sources relies on peer-reviewed, reputable material. For example, a Men's Health article would not be a sufficient source for this research, whereas the Journal of Community Medicine & Health Education would be a perfect resource.)
The NEDOC Score is a standardized way to measure the severity of Emergency Department Overcrowding. Essentially, the level of overcrowding is scored based on specific NEDOC criteria, giving an overall picture of the severity of the problem. Here is a great article to give a basic understanding of the NEDOC Score and its various components. The PDF file breaks it down into different types of category labels. A NEDOC Score over 100 is classified as an Overcrowded status.
Combining Datasets
Now that we have a basic understanding of the data and the necessary terms, we can move on to cleaning the three (3) sheets. This process was completed in Microsoft Excel, which makes it quick and easy to prepare the data for input into R. The goal of cleaning this particular data set was to get all of the data into one sheet. To do that, the data on each individual sheet needs to match. Unfortunately, the sheets use different time intervals, so it was important to adjust the data to a standard time measurement without affecting the integrity of the data. The Components sheet is broken down into one (1) hour increments. The NEDOC Score sheet is broken down into intervals of 15 minutes. The Patient Move sheet records the time of each check-in. Since the Components sheet has the largest time intervals (one hour) and includes most of the required data, it can serve as the primary data sheet, and the other sheets can be adjusted to match it.
Starting with the NEDOC Score sheet, I needed to find a way to convert the data to one-hour intervals. This was accomplished by combining the four (4) separate fifteen (15) minute entries into one (1) single hour entry. To do this effectively, I used the worst-case entry among the four 15-minute periods in a given hour as the representative measurement for that hour. It was important to use this method in particular, because taking the average of all four intervals (for example) may compromise the data and overlook an overcrowded status within that period. The data was thereby reduced to hourly intervals, each assigned the highest NEDOC Score in that hour. After this was completed, the data was copied over to the Components sheet.
The Patient Move sheet was also combined into hourly intervals. This sheet shows the patient check-in time. The adjustment process was much simpler than with the NEDOC Score: I counted the number of check-ins completed within each one-hour period. I chose to include this information because it has a direct tie to Emergency Department overcrowding.
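The worst-case hourly reduction described above was done in Excel, but the same idea can be sketched in R. The data frame, its column names (timestamp, score, hour, over_100), and the readings below are made up for illustration and are not taken from the hospital data.

```r
# Hypothetical 15-minute NEDOC readings covering two hours (values are invented).
nedoc <- data.frame(
  timestamp = as.POSIXct("2016-05-01 00:00:00", tz = "UTC") + 15 * 60 * 0:7,
  score     = c(80, 95, 120, 90, 60, 70, 65, 75)
)

# Label each reading with the hour it falls in.
nedoc$hour <- format(nedoc$timestamp, "%Y-%m-%d %H:00")

# Keep the worst (highest) score observed within each hour.
hourly <- aggregate(score ~ hour, data = nedoc, FUN = max)

# Flag hours whose worst reading is above 100 (the overcrowded threshold).
hourly$over_100 <- as.integer(hourly$score > 100)

print(hourly)  # first hour keeps 120, second hour keeps 75
```

In this toy example the first hour averages 96.25, so an hourly mean would have hidden the one reading above 100, which is the reason given above for using the maximum.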
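The hourly check-in count can be sketched in R as well. This is only an illustration with invented check-in times; the names moves, check_in, all_hours, and check_ins are assumptions, and the original work was done in Excel.

```r
# Hypothetical check-in times (invented for illustration).
moves <- data.frame(
  check_in = as.POSIXct(c("2016-05-01 00:05:00", "2016-05-01 00:40:00",
                          "2016-05-01 00:55:00", "2016-05-01 02:10:00"),
                        tz = "UTC")
)

# Label each check-in with the hour it falls in, then count per hour.
moves$hour <- format(moves$check_in, "%Y-%m-%d %H:00")
counts <- as.data.frame(table(hour = moves$hour),
                        responseName = "check_ins",
                        stringsAsFactors = FALSE)

# Hours with no check-ins are absent from the count, so join against the
# full list of hours and fill the gaps with zero.
all_hours <- data.frame(
  hour = format(seq(as.POSIXct("2016-05-01 00:00:00", tz = "UTC"),
                    by = "hour", length.out = 3),
                "%Y-%m-%d %H:00"),
  stringsAsFactors = FALSE
)
merged <- merge(all_hours, counts, by = "hour", all.x = TRUE)
merged$check_ins[is.na(merged$check_ins)] <- 0

print(merged)
```

Joining against the complete list of hours matters because an hour with no arrivals would otherwise silently drop out of the combined sheet.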
Assumptions
It must be noted that the data provided was quite limited. As I stated above, there was not any cost information. For example, what does the average hospital bed cost? What does the average hospital staff worker cost? While it would be nice to have more inclusive data to help strengthen the models, as an analyst you need to work with the data you were given. (In some cases, you can request more information from your data source or client, but this is not always available.) To overcome this 'lack of data' problem, an analysis needs to have some general assumptions.
The assumptions of this data:
- Only one person can check in to the Emergency Department at a time.
- Patients waiting for an Inpatient Bed means that the Inpatient Care unit is full.
Process:
Is Overcrowding A Problem?
The first step was to discover whether an overcrowding problem even existed. This was done by reviewing how many times a NEDOC Score over 100 was assigned.
percentageofOvercrowded <- sum(HosData$NEDOC.Score.Over.100) / length(HosData$NEDOC.Score.Over.100)
This resulted in an overcrowded status 41.8% of the time. To find this percentage, I divided the number of hours qualifying as an overcrowded status by the total number of hours in the month. The month of May had a total of 744 hours in 2016.
It is safe to say that this hospital data set has an overcrowding problem in the Emergency Department.
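As a quick check on the arithmetic, the share of overcrowded hours can be converted back into an hour count. The sketch below uses only the two figures reported above (41.8% and 744 hours); the small flag vector is invented to show that the share of a 0/1 indicator is simply its mean.

```r
# Figures reported in the text.
hours_in_may      <- 31 * 24   # 744 hours in May
overcrowded_share <- 0.418     # share of hours with a NEDOC Score over 100

# Approximate number of overcrowded hours implied by that share.
approx_hours <- round(overcrowded_share * hours_in_may)
print(approx_hours)  # roughly 311 of the 744 hours

# For a 0/1 indicator, sum / length is the same as the mean (toy values).
flag <- c(1, 0, 0, 1, 1)
share_by_sum  <- sum(flag) / length(flag)
share_by_mean <- mean(flag)
```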
Problem Areas
The second part of the analysis is to find out which factors are statistically significant predictors of the NEDOC Score exceeding 100. (Which changes are most likely to affect or reduce the NEDOC Score?) This was done by fitting linear regression models in R with the over-100 indicator as the response.
Code:
summary(lm(HosData$NEDOC.Score.Over.100~ HosData$X..of.ED.Pts.Waiting.IP.Bed))
Results:
Residuals:
    Min      1Q  Median      3Q     Max
-0.7765 -0.4041 -0.2946  0.5301  0.8149
Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                         0.185107   0.039851   4.645 4.02e-06 ***
HosData$X..of.ED.Pts.Waiting.IP.Bed 0.021904   0.003362   6.515 1.34e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4803 on 742 degrees of freedom
Multiple R-squared: 0.05411, Adjusted R-squared: 0.05283
F-statistic: 42.44 on 1 and 742 DF, p-value: 1.341e-10
Code:
summary(lm(HosData$NEDOC.Score.Over.100~ HosData$X..of.ED.Pts))
Results:
Residuals:
     Min       1Q   Median       3Q      Max
-0.90931 -0.30983 -0.03864  0.30836  1.03273
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -0.4466515  0.0485897  -9.192   <2e-16 ***
HosData$X..of.ED.Pts  0.0142732  0.0007633  18.700   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4072 on 742 degrees of freedom
Multiple R-squared: 0.3203, Adjusted R-squared: 0.3194
F-statistic: 349.7 on 1 and 742 DF, p-value: < 2.2e-16
Code:
summary(lm(HosData$NEDOC.Score.Over.100~ HosData$X..of.ED.Pts + HosData$X..of.ED.Pts.Waiting.IP.Bed))
Results:
Residuals:
    Min      1Q  Median      3Q     Max
-0.9080 -0.3118 -0.0374  0.3101  1.0313
Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -0.4482021  0.0502455  -8.920   <2e-16 ***
HosData$X..of.ED.Pts                 0.0142318  0.0008354  17.036   <2e-16 ***
HosData$X..of.ED.Pts.Waiting.IP.Bed  0.0003818  0.0031191   0.122    0.903
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4075 on 741 degrees of freedom
Multiple R-squared: 0.3203, Adjusted R-squared: 0.3185
F-statistic: 174.6 on 2 and 741 DF, p-value: < 2.2e-16
After running these regression models, we can see that the number of ED patients waiting for an Inpatient Bed is significant on its own (p = 1.34e-10), but once the number of ED patients is included in the same model, its p-value rises to 0.903 and it adds no explanatory power. The number of ED patients keeps an extremely low p-value (< 2e-16) in both models in which it appears. For this reason, the analysis focuses on the Emergency Department Beds and sets aside the data regarding Inpatient Beds.
In non-medical terms, when a patient checks into the Emergency Department, they are given an Emergency Department Bed. Patients are then either checked out and released, or checked out to an Inpatient Bed in another section of the hospital, leaving the Emergency Department in either case. Therefore, the number of Inpatient Beds does not greatly affect the issue of overcrowding in the Emergency Department, since Inpatient Care is a separate division of the hospital. The Emergency Department Beds, however, DO have a significant relationship with the NEDOC Score according to these regression models, allowing us to focus on that factor for the purposes of this project.
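The pattern in the three regressions above, where a predictor looks significant alone but not alongside a second, related predictor, is common when the two predictors are correlated. The sketch below reproduces that pattern on simulated data only; the variable names (ed_pts, waiting, over_100) and every number in it are assumptions chosen for illustration, not values from the hospital data.

```r
set.seed(1)
n <- 744  # same number of hourly rows as a 31-day month

# Simulated predictors: 'waiting' is built to be correlated with 'ed_pts'.
ed_pts  <- rnorm(n, mean = 60, sd = 15)
waiting <- 0.2 * ed_pts + rnorm(n, sd = 3)

# Simulated 0/1 outcome driven by 'ed_pts' only.
over_100 <- as.integer(ed_pts + rnorm(n, sd = 10) > 65)
sim <- data.frame(over_100, ed_pts, waiting)

# Fit 'waiting' alone, then together with 'ed_pts'.
fit_waiting <- lm(over_100 ~ waiting, data = sim)
fit_both    <- lm(over_100 ~ ed_pts + waiting, data = sim)

# Pull the p-values out of the coefficient tables by name.
p_waiting_alone <- summary(fit_waiting)$coefficients["waiting", "Pr(>|t|)"]
p_waiting_joint <- summary(fit_both)$coefficients["waiting", "Pr(>|t|)"]
p_ed_joint      <- summary(fit_both)$coefficients["ed_pts", "Pr(>|t|)"]

print(c(alone = p_waiting_alone, joint = p_waiting_joint, ed = p_ed_joint))
```

In the simulation, 'waiting' is significant alone only because it carries information about 'ed_pts'. Whether the same explanation holds for the hospital data is an assumption that the regression output alone cannot confirm.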
Model Building
Lastly, we need to build models that can be used to help address the problem of overcrowding. The goal would be to reduce the overall percentage of time that the NEDOC Score falls within an overcrowded status. Since no monetary value was provided with this dataset, a range of suggestions will work best to explain what would happen in a "what-if" scenario with the models using the data provided.
Code:
# Final model: linear regression of the over-100 indicator on the selected predictors.
ModelFinal <- lm(HosData$NEDOC.Score.Over.100 ~ HosData$EDBedsRemain + HosData$X..of.Critical.Care.Pts...display + HosData$Longest.Admit.Time.Waiting.in.ED + HosData$Count.of.Patient.Entered.in.Hour + HosData$Sunday + HosData$AfternoonShif)
# Coefficient estimates (column 1 of the coefficient table), in model order.
intercept <- summary(ModelFinal)$coefficients[1, 1]
CofRemainBeds <- summary(ModelFinal)$coefficients[2, 1]
CofCritialCare <- summary(ModelFinal)$coefficients[3, 1]
CofWaitTime <- summary(ModelFinal)$coefficients[4, 1]
CofPatEntred <- summary(ModelFinal)$coefficients[5, 1]
CofSunday <- summary(ModelFinal)$coefficients[6, 1]
CofAfternoon <- summary(ModelFinal)$coefficients[7, 1]
# "What-if" scenarios: remaining ED beds with 5, 10, or 15 beds added.
HosData$FiveNewEDBeds <- HosData$EDBedsRemain + 5
HosData$TenNewEDBeds <- HosData$EDBedsRemain + 10
HosData$FifteenNewEDBeds <- HosData$EDBedsRemain + 15
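The code above stops at creating the scenario columns. One way the "what-if" comparison could be carried out is sketched below, under loud assumptions: the data is simulated, the toy model has a single predictor instead of six, and clamping predictions to the range 0 to 1 is my own choice, not something stated in the original analysis. The names toy, Over100, and what_if are hypothetical.

```r
set.seed(42)
n <- 744

# Simulated data: fewer remaining beds makes an overcrowded hour more likely.
toy <- data.frame(EDBedsRemain = sample(0:20, n, replace = TRUE))
toy$Over100 <- as.integer(toy$EDBedsRemain + rnorm(n, sd = 4) < 8)

# Toy one-predictor version of the final model (assumption: the real one has six).
fit <- lm(Over100 ~ EDBedsRemain, data = toy)

# Predicted share of overcrowded hours if 'extra_beds' beds were added.
what_if <- function(extra_beds) {
  scenario <- toy
  scenario$EDBedsRemain <- scenario$EDBedsRemain + extra_beds
  preds <- predict(fit, newdata = scenario)
  # A linear model on a 0/1 outcome can predict outside [0, 1]; clamp it.
  mean(pmin(pmax(preds, 0), 1))
}

# Compare the current situation with 5, 10, and 15 additional beds.
shares <- sapply(c(0, 5, 10, 15), what_if)
names(shares) <- c("current", "plus_5", "plus_10", "plus_15")
print(round(shares, 3))
```

Because no cost data is available, a comparison like this can only show the predicted change in the overcrowded share per scenario, which matches the range-of-suggestions approach described above.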
Data Viz:
The visual of this analysis was best prepared as a presentation. Theoretically, the goal would be to present these findings to an executive of the hospital for review. This presentation was completed as a Microsoft PowerPoint presentation for ease of access and use by a wider variety of viewers who may not be familiar with other platforms. The presentation was recorded and saved as a video to be sent to the hospital stakeholders.
Insight:
The final insight was given in more detail during the presentation. The main findings were that the Emergency Department Beds were the largest cause of the overcrowding problem and that increasing the number of beds above 72 would help reduce the NEDOC Score.
Additionally, we explored the factor of available staff during busy periods in the Emergency Department. The goal was to build an understanding of when more staff would be needed to help offset the overcrowding. The dataset shows that the afternoon periods (4 pm to 12 am) each day and Sundays (all day) would greatly benefit from more staff. These time periods were associated with a statistically significant increase in the NEDOC Score.
Suggestions to Reduce Overcrowding:
- Increase staff during afternoons
- Increase staff during Sundays
- Increase the total number of Emergency Department Beds