• The pandemic has created a need to gain information from published literature quickly.
• Text mining and natural language processing (NLP) have been used to help parse high-level information from the COVID-19 Open Research Dataset (CORD-19) Challenge, a data set containing over 100,000 articles.
• The Nursing COVID and Historical Epidemic Literature Repository contains 770 published articles specific to nursing and uses a variety of text mining and NLP methods to summarize the information.
• Using TextRank, psychological support for nurses and the need for rapid high-impact education were identified as key topics.
• Nurse leaders and health care leaders need to note the importance of providing psychological support for their nurses, as it is an important topic in the nursing literature related to COVID-19.
Abstract
Background
During COVID-19, a Kaggle challenge was issued to data scientists to leverage text mining to provide high-level summaries of full-text articles in the COVID-19 Open Research Dataset (CORD-19) data set, a data set containing articles around COVID-19 and other epidemics. A question was asked: “What if nursing had something similar?”
Purpose
Describe the development and function of the Nursing COVID and Historical Epidemic Literature Repository and describe high-level summaries of abstracts within the repository.
Method
Nurse-specific literature was extracted from two data sets: CORD-19 and LitCovid. LitCovid is a data set containing the most up-to-date literature around COVID-19. Multiple text mining algorithms were utilized to provide summaries of the articles.
Discussion
As of July 2020, the repository contains 760 articles. Summaries indicate the importance of psychological support for nurses and of high-impact rapid education.
Conclusion
To our knowledge, this repository is the only repository specific for nursing that utilizes text mining to provide summaries.
The COVID-19 pandemic has necessitated increased scholarly work around the virus, from the virus's characteristics to the treatment and management of the virus. Published work around the virus is increasing at a rapid pace. The scientific community, in response to the pandemic, has provided open access to this published literature. Two literature repositories, the COVID-19 Open Research Dataset Challenge (CORD-19) and LitCovid, contain over 230,000 published manuscripts related to COVID-19 and historical epidemics. LitCovid was developed to house current published literature around COVID-19 and is updated daily to help researchers and health leaders keep current with the latest research. The CORD-19 data set contains over 199,000 published papers, with over 88,000 full-text articles, on both COVID-19 and other historical epidemics.
With the advent of the pandemic, a Kaggle challenge, a data science and machine learning competition, was issued to data scientists to use text mining and natural language processing to mine the CORD-19 data set. The purpose of this challenge was to provide researchers and health leaders with insight into issues and topics regarding the current management of COVID-19. This challenge was a collaboration among the Allen Institute, the White House, the National Institutes of Health, and other institutions. Text mining uses natural language processing (NLP) for unstructured data, such as the text of abstracts or full-text papers, to summarize information using various methods. This body of methodologies provides multiple ways to analyze unstructured data, including visualization through word clouds and knowledge maps, and ranking top keywords, phrases, and sentences by importance. NLP is a type of text mining that scans through large amounts of unstructured data to extract meaning and information from the data. NLP methods can include text classification, clustering, and sentiment analysis. Text mining and NLP transform and characterize text by using statistical algorithms to provide quality information from unstructured text data.
Within health care, there have been some demonstrations of the use of these text mining techniques. Using text mining and NLP, NimbleMiner mined through thousands of patient and nurse documents to identify clinical notes on alcohol and substance abuse, enabling users to search for and find documentation specific to alcohol and substance abuse. Text mining has also been used to parse and annotate unstructured notes in electronic health records to discover risk factors for patient deterioration. Further, text mining and NLP techniques have been utilized on literature databases, such as Web of Science, or for specific topics, such as cardiovascular disease, to parse out high-level information from abstracts of the databases or topics.
For nursing, evidence-based practice (EBP) is the standard in developing practice guidelines and is of particular importance in the presence of a pandemic. However, a current challenge for the discipline is that nurses’ primary responsibility is often at the bedside. Potential time constraints and competing priorities are barriers to nurses’ ability to do robust literature gathering and appraisals of evidence to implement EBP to address issues presented by the COVID-19 pandemic. Text mining and NLP techniques could help remove this barrier by providing high-level summaries of published literature about nursing topics around the historical epidemics and the COVID-19 pandemic. These summaries could assist in readily evaluating the applicability and meaningfulness of literature to a nurse’s clinical question, potentially improving access to literature reviews. The question was asked: “What if there was a tool that provided nurses access to high-level information on a particular topic within published literature applicable to nursing around historical epidemics and the COVID-19 pandemic?” In response to this question, a nurse scientist at a children's hospital in the western United States created a resource that houses nursing-specific literature around pandemics and epidemics. The intent of this resource was to provide a literature repository for nurses that houses abstracts with links to literature and provides preliminary assistance in accessing and visualizing the information through text mining and NLP. The repository can be accessed freely at this link: childrenscolorado.shinyapps.io/RN_COVID_Lit/. This repository has been shared with the University of Colorado College of Nursing, Case Western Reserve University College of Nursing, the American Nurses Association, and others.
The aims of this paper are as follows: (a) describe the development, maintenance, and function of the Nursing COVID and Historical Epidemic Literature Repository (NCHELR) and (b) provide a case that demonstrates the use of the repository. For the use case, an algorithm called TextRank was utilized to provide high-level summaries for the abstracts housed in the repository.
Methods
Data Sources
The repository utilizes the CORD-19 and the LitCovid literature data sets. The CORD-19 contains literature around historical epidemics such as Severe Acute Respiratory Syndrome, Middle East Respiratory Syndrome, and Ebola, providing a breadth of information useful for nursing that crosses several countries and continents. As there are over 230,000 published papers between these two repositories, it was essential to restrict the abstracts and corresponding links to the full-text papers contained within the NCHELR to relevant nursing topics. The keywords “nursing” and “nurse” were applied to both data sets as a broad approach to selecting topics relevant to nursing. CORD-19 was primarily used for historical epidemic literature, while LitCovid was utilized for the latest COVID-19 literature.
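The keyword screen described above can be illustrated with a short sketch. This is not the authors' code (the repository itself is coded in R); it is a minimal Python illustration that assumes each record is a dictionary with hypothetical `title` and `abstract` fields and that matching is done on whole words, with the plural "nurses" added as an assumption.

```python
import re

# Assumption: whole-word, case-insensitive matching on "nurse", "nurses", "nursing".
# The real CORD-19 and LitCovid exports use their own column names.
KEYWORD_PATTERN = re.compile(r"\b(nurse|nurses|nursing)\b", re.IGNORECASE)


def is_nursing_related(record):
    """Return True when the title or abstract mentions a nursing keyword."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}"
    return bool(KEYWORD_PATTERN.search(text))


def select_nursing_records(records):
    """Keep only the records that pass the keyword screen."""
    return [record for record in records if is_nursing_related(record)]


# Made-up records for illustration only.
sample_records = [
    {"title": "Nursing care during SARS", "abstract": "A qualitative study."},
    {"title": "Bat coronavirus genomics", "abstract": "Sequencing results."},
    {"title": "Triage workflow", "abstract": "Each nurse screened patients."},
]
selected_records = select_nursing_records(sample_records)
print([record["title"] for record in selected_records])
```

A screen like this is deliberately broad, which is why the manual review of abstracts described in the next section is still needed.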
NCHELR Development
Initial development of the NCHELR began in March 2020. For development, a team was formed to mine through the CORD-19 and LitCovid databases. Using “nurse” and “nursing” as keywords, the abstracts of published literature and corresponding links to the full-text articles were collected to be placed in the repository. A primary goal of the NCHELR was to provide a database of nursing literature abstracts; thus, all published articles regarding nurses or nursing care, regardless of type (e.g., quantitative, qualitative), were included. An experienced medical librarian and two nurse scientists collected the literature from these databases. They reviewed the abstracts to ensure that the literature was related to nursing and that links to full-text articles were present. They characterized the literature by population (adult vs. children) and by COVID or historical epidemic. Then, a research nurse and a research assistant ensured that the following characteristics were collected: title, author, year, complete abstract, and DOI or link to the article. Lastly, the NCHELR developer, who is a nurse scientist, examined the dataset to remove duplicates and to ensure completeness of the dataset. Based on completeness of the data or applicability to nursing, some literature was not included in the repository (e.g., some articles were animal studies).
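The completeness check and duplicate removal described above were performed by the developer; the text does not say how. Purely as an illustration, the sketch below shows one way such a step could be automated in Python. The field names and the matching rule (same link or same normalized title) are assumptions.

```python
# Hypothetical field names mirroring the characteristics listed in the text:
# title, author, year, complete abstract, and DOI or link.
REQUIRED_FIELDS = ("title", "author", "year", "abstract", "link")


def normalize_title(title):
    """Lowercase and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.lower().split())


def clean_records(records):
    """Drop incomplete records, then drop duplicates by link or normalized title."""
    seen = set()
    cleaned = []
    for record in records:
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            continue  # incomplete record: a required field is missing or empty
        keys = {
            ("link", record["link"].lower()),
            ("title", normalize_title(record["title"])),
        }
        if keys & seen:
            continue  # duplicate of a record that was already kept
        seen |= keys
        cleaned.append(record)
    return cleaned


# Made-up records for illustration only.
raw_records = [
    {"title": "Nurse stress in ICUs", "author": "A", "year": 2020,
     "abstract": "Text.", "link": "doi:10.0000/example-1"},
    {"title": "Nurse  Stress in ICUs", "author": "A", "year": 2020,
     "abstract": "Text.", "link": "doi:10.0000/example-1b"},
    {"title": "Telehealth visits", "author": "B", "year": 2020,
     "abstract": "", "link": "doi:10.0000/example-2"},
    {"title": "Respiratory hygiene", "author": "C", "year": 2010,
     "abstract": "Text.", "link": "doi:10.0000/example-3"},
]
clean = clean_records(raw_records)
print(len(clean))
```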
Once the developer completed the data preparation for each new round of articles, the data fed into the repository application. Multiple text mining and NLP algorithms were implemented on the abstracts to summarize the unstructured data into high-level information. The repository foundation is coded within the R language and utilizes multiple R packages within ShinyApps as the web-based application for user access (Figure 1).
Figure 1. NCHELR Process Map. Describes the processes for extracting the literature from the CORD-19 and LitCovid data sets, for cleaning the data, and for how the algorithms are utilized.
The first version of the repository was deployed at the end of March 2020. This version contained only 49 abstracts of published articles with corresponding links to the full-text articles, and a few text mining and NLP algorithms. A beta test was conducted using nurse scientist colleagues employed at various organizations, nurse leaders, a research assistant, and a biostatistician to assess usefulness (how useful is the repository?), meaningfulness (how meaningful is this repository?), and communicability (how well does the repository communicate the information?). Overall, feedback from this round of beta testing was generally positive on all three questions. One leader and a clinical education specialist commented that the current repository was accessible for researchers but not necessarily for nurse leaders. The feedback indicated that it would be useful to have a summary of the information and a table that contained the links to the full article. These requested changes were incorporated into the next version of NCHELR. Another beta test was conducted with the same group using the same questions; feedback was positive for this round, in which testers found the changes valuable.
NCHELR Overview
The NCHELR is divided into seven different pages that incorporate different algorithms to summarize the information. Supplementary material A provides screenshots of each page. Table 1 provides a brief description of the algorithm and an example of using that algorithm within published literature across different fields. Further, the NCHELR is searchable through sub-selection of the population or epi- and/or pandemic. There is also a Boolean search term option to further sub-select the literature by a specific word. The search term option mines through the titles and the abstracts for that specific word. Within the repository, the papers containing that search term in the title or abstract and/or subselection of the categories are collected in a dataset. The algorithms then utilize the abstracts of these papers to provide a high-level summary.
Table 1. Text Mining Algorithms Utilized in the Nurse COVID and Historic Epidemic Literature Repository
Algorithm | Brief Description | Examples of Use
TextRank | Network-based algorithm that identifies important sentences and keywords | Determine factors from EHR notes associated with patient deterioration
The second page of the repository application contains a table of the abstracts as defined by search parameters. The table lists the subselection of papers with their authors, title, year, summary sentence, and DOI or link to the full-text articles. Further, an overall summary consisting of the five most important sentences from the selection of papers is displayed. The TextRank algorithm is utilized to provide this summary. This high-level summary provides the end-user a snapshot of the top entries and a short summary of each abstract, with corresponding links to the full-text articles, allowing end-users to select literature they would like to further investigate and appraise. The third page of the repository application groups “similar” papers into clusters. A clustering algorithm involving “Euclidean” distance between words across the literature is implemented across the selection of abstracts and groups the papers into three clusters. This technique allows the user to further sub-select the papers into similar clusters, potentially enabling more intentional digestion of information. Further, a table, similar to what is found on the landing page of the repository application, is displayed to deliver information on the select papers for each cluster. The utility of this function is that the user has the information and links for papers that are similar to each other, potentially reducing time spent browsing papers.
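The text does not name the clustering algorithm, only that it relies on Euclidean distance and produces three clusters. As one possible illustration, and not the repository's method, the Python sketch below represents each abstract as a vector of word counts and merges the closest groups (single-linkage agglomerative clustering) until three clusters remain. The function names and the toy abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_abstracts(abstracts, n_clusters=3):
    """Group abstracts by single-linkage agglomerative clustering on word counts.

    Returns a list of clusters, each a sorted list of abstract indices.
    """
    vocabulary = sorted({word for text in abstracts for word in tokenize(text)})
    vectors = []
    for text in abstracts:
        counts = Counter(tokenize(text))
        vectors.append([counts[word] for word in vocabulary])

    clusters = [[i] for i in range(len(abstracts))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                distance = min(
                    euclidean(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Toy abstracts for illustration only.
toy_abstracts = [
    "nurse stress anxiety stress",
    "nurse stress anxiety",
    "telehealth visit telehealth",
    "telehealth visit",
    "mask respirator training",
    "mask respirator",
]
print(cluster_abstracts(toy_abstracts))
```

Raw word counts are sensitive to abstract length, so a production version would likely weight or normalize the vectors first; that choice is outside what the text specifies.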
Word frequencies and word clouds summarize the selection of papers on the fourth page of the repository application. Word frequencies are simple counts of words arranged by most common, while word clouds are visual representations of the most common words as determined by word frequencies. The word frequencies bar plot is restricted to the top 10 most common words, while the word cloud is restricted to the top 100 words. Word clouds provide the end-user words that are emphasized across the select abstracts and help to visualize key words that can be used for additional searches and can provide a global sense of the text across abstracts. The fifth page of the repository application utilizes sentiment analysis (an algorithmic methodology that assigns words as positive or negative) to display the top 10 most common positive words and negative words. This page provides end-users a sense of positive or negative topics within the selection of papers.
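The word-frequency and sentiment pages described above both reduce to counting words. The Python sketch below is a minimal illustration under stated assumptions: the stop-word list and the positive and negative word lists are tiny placeholders invented for this example, not the lexicon used by the repository.

```python
import re
from collections import Counter

# Placeholder lists for illustration only; a real analysis would use a
# published stop-word list and sentiment lexicon.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with",
              "was", "were", "among"}
POSITIVE_WORDS = {"support", "effective", "improved", "safe"}
NEGATIVE_WORDS = {"stress", "anxiety", "fear", "shortage"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_words(abstracts, n=10):
    """Return the n most common words across all abstracts with their counts."""
    counts = Counter(word for text in abstracts for word in tokenize(text))
    return counts.most_common(n)


def top_sentiment_words(abstracts, n=10):
    """Return the n most common positive words and the n most common negative words."""
    counts = Counter(word for text in abstracts for word in tokenize(text))
    positive = Counter({w: c for w, c in counts.items() if w in POSITIVE_WORDS})
    negative = Counter({w: c for w, c in counts.items() if w in NEGATIVE_WORDS})
    return positive.most_common(n), negative.most_common(n)


# Made-up abstracts for illustration only.
example_abstracts = [
    "Stress and anxiety were high among nurses.",
    "Peer support reduced stress for nurses.",
]
print(top_words(example_abstracts, n=2))
print(top_sentiment_words(example_abstracts))
```

The same counts that feed the bar plot could feed a word cloud; only the cutoff (10 versus 100 words in the text) and the rendering differ.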
The co-occurrence network of the most common trigrams (groups of three words) from selected papers is displayed on the sixth page of the repository application. This page provides the end-user a sense of how the words are related through the co-occurrence network, adding some summative information to the selected papers. The last page, page seven, clusters words from the selected abstracts into three topics. Latent Dirichlet Allocation is an algorithm that uses probabilities to assign words to the three topics. These topics are the themes for the selected papers, providing the end-user a sense of the themes being discussed within the selected papers. All of these features provide nurse end-users fast guidance on which full-text articles to further explore based on their topic of interest.
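For the trigram co-occurrence network on the sixth page, the text does not give construction details. One plausible reading, sketched below in Python as an assumption, is to count consecutive three-word sequences, keep the most common ones, and link adjacent words within each kept trigram, weighting each link by the trigram count. The function names are hypothetical.

```python
import re
from collections import Counter


def trigrams(text):
    """Return every run of three consecutive words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:], words[2:]))


def cooccurrence_edges(abstracts, top_n=5):
    """Count trigrams and convert the top_n into weighted word-to-word edges.

    Returns (trigram_counts, edge_weights). Each kept trigram (w1, w2, w3)
    adds its count to the edges (w1, w2) and (w2, w3).
    """
    counts = Counter(t for text in abstracts for t in trigrams(text))
    edges = Counter()
    for (w1, w2, w3), n in counts.most_common(top_n):
        edges[(w1, w2)] += n
        edges[(w2, w3)] += n
    return counts, edges


# Made-up abstracts for illustration only.
ppe_abstracts = [
    "personal protective equipment shortage",
    "personal protective equipment training",
]
trigram_counts, edge_weights = cooccurrence_edges(ppe_abstracts, top_n=1)
print(trigram_counts.most_common(1))
print(dict(edge_weights))
```

The resulting edge list is the input a graph-drawing library would need to render the network.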
NCHELR Example
As a use case of the NCHELR, we provide summaries of (a) all abstracts of literature housed within the repository, (b) the COVID-19 papers, and (c) the historical epidemic papers. We currently use TextRank, an extractive text summarization technique, to generate these summaries. TextRank is a graph-based ranking algorithm that can be applied to a variety of natural language processing applications. Derived from Google PageRank and other graph-based algorithms, it provides a mechanism to rank sentences by importance. Graph-based ranking algorithms decide the importance of a vertex within a graph based on global information recursively drawn from the entire graph. For example, a vertex that links to another casts a vote for that vertex, contributing to its score, and the vertices are then ranked by score. For text, sentences serve as the vertices: related sentences are linked, and the sentences are then ranked by importance. Thus, the recommendations made by TextRank provide a sensible summary of a selection of abstracts. We looked at the top five most important sentences for each group to assess an overall summary of the topics discussed.
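To make the ranking mechanism concrete, the following Python sketch implements a small version of TextRank for sentences. It is a teaching example rather than the repository's R implementation: sentences are linked by a word-overlap similarity normalized by the log of each sentence's length, and scores are updated with a PageRank-style iteration. The damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text):
    """Split text into sentences at ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    """Return the set of lowercase alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(s1, s2):
    """Shared-word count normalized by the log lengths of both sentences."""
    w1, w2 = word_set(s1), word_set(s2)
    if len(w1) < 2 or len(w2) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences: each sentence 'votes' for the sentences it resembles."""
    n = len(sentences)
    weights = [
        [similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
        for i, a in enumerate(sentences)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_k=5):
    """Return the top_k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


# Made-up text for illustration only.
demo_text = (
    "Nurses need psychological support during the pandemic. "
    "Psychological support reduces stress for nurses. "
    "The pandemic increased stress for nurses. "
    "Bats carry many viruses."
)
for sentence in summarize(demo_text, top_k=3):
    print(sentence)
```

In this toy input, the sentence about bats shares no words with the others, so it receives no votes and is left out of the three-sentence summary.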
NCHELR Maintenance
COVID-19 literature is constantly produced and published; therefore, a maintenance strategy was implemented to keep the NCHELR updated with the growing literature. The maintenance team consists of the medical librarian and the nurse scientist developer. The structure developed to create the repository is also utilized for maintenance (see Figure 1). The medical librarian searches the LitCovid database on a biweekly basis to collect recently published abstracts and links to the full-text papers. The nurse scientist developer ensures the completeness of the papers and updates the repository. The NCHELR is updated monthly.
Findings
A total of 760 published papers related to nursing are housed in the repository as of July 2020. COVID-19 published papers numbered approximately 511, while published papers classified as historical numbered 302. There were 56 papers that data extractors considered applicable to both the COVID-19 and historical categories.
Table 2 summarizes the results of the TextRank analysis. When examining all 760 papers within the repository, important sentences emphasized topics germane to nursing, such as the psychological state of nurses, the necessity of rapid continuing education, and health care delivery changes. For the COVID-19 papers, the emphasis was on the changes in health care delivery, specifically for vulnerable populations and utilization of telehealth. For the historical papers, topics highlighted mostly discussed training and adherence to respiratory guidelines, specifically relating to the H1N1 outbreak. By having the abstracts of these articles in one location, nurse leaders and clinicians can further select articles and seek full-text articles to inform practice.
Table 2. Five Most Important Sentences from the Nurse COVID Literature Repository by Category
All Papers
COVID Papers
Historic Papers
“Understanding nurses’ psychological change process during the care for patients with COVID-19 is imperative for healthcare leaders”
“Our objective was to determine the compliance with respiratory hygiene of triage nurses at 2 university hospital centers and to identify factors influencing compliance to the respiratory hygiene principles of emergency health care workers”
“The current COVID-19 pandemic has affected every one, but presents profound consequences for patients with kidney disease, health care providers, and biomedical researchers”
“The current COVID-19 pandemic has affected every one, but presents profound consequences for patients with kidney disease, health care providers, and biomedical researchers”
“Conclusion: The study points out the need to provide in-service training for professionals on the transmission of microorganism in primary health care to ensure adequate level of risk perception and knowledge”
“The COVID-19 pandemic has created the need for rapid development and implementation of nursing continuing professional development (NCPD) to scale up nurses and other health care providers to meet a surge in critically ill patients”
“Introduction: The primary aim of this study was to explore the perception of Hong Kong emergency nurses regarding their work during the human swine influenza pandemic outbreak”
“Methods: The study examined health care worker adherence to CDC recommended respiratory infection control practices in primary care clinics and emergency departments of 5 medical centers in King County, Washington, using a self-administered questionnaire”
“Results: The study revealed a high prevalence of stress, anxiety, and poor psychological well-being, especially among females, young health care workers, and those who interacted with known or suspected COVID-19 patients”
“Therefore, this study was conducted to identify nursing students’ knowledge, attitudes, practices, and risk perceptions of infection prevention related to occupational exposure to Zika virus infection, and to identify correlations among the related variables”
This paper describes the development and function of the Nurse COVID and Historical Epidemic Literature Repository. The repository is a tool to link frontline nurses, nurse researchers, and nurse leaders to evidence and information applicable to nursing during their initial stages of investigation in a less time-intensive way. The NCHELR, with its use of text mining algorithms and NLP, can provide nurses access to published papers to help make immediate evidence-based decisions in the field. Text mining and NLP have been known to provide valuable knowledge and information useful for an organization. It should provide some guidance on what is discussed within the selection of papers and lower the barriers of literature reviews during the pandemic.
Another limitation worth noting is that abstracts are utilized for the algorithms. Abstracts vary widely in quality, in format, and in how well they communicate the methods and findings of the full-text article. Therefore, poor abstracts will produce poor results from the algorithms; however, abstracts were chosen for the following reasons. First, abstracts of full-text articles are freely available without a fee. The development of this application was done in a hospital setting and not an academic institution; thus, the in-house library has limited access to full-text papers. Second, the computational resources required were more manageable for abstracts than for full-text articles. There are several ways to reduce this computational burden, such as implementing parallel computing within the code; however, the authors felt that this repository needed to be developed quickly to respond to the need for access to COVID-19 literature, and implementing parallel computing would have delayed the development. Though this limitation may be a disadvantage to the validity of the results of the algorithms, the repository was intended first and foremost as an initial tool for end-users to use the results of the algorithms to narrow down COVID-19 literature specific to their clinical question.
As an application example of the repository, the TextRank algorithm was utilized on all the literature housed within the repository to provide a high-level summary. It was found that frontline nurses’ psychological status is of utmost relevance. Leaders should be aware of assessing their staff's resiliency and the psychological impact of a pandemic to provide support resources for their staff. Researchers could further study the impact of COVID-19 on the nursing workforce, particularly their long-term psychological status. Further, high-impact rapid education is necessary for nurses’ safety for both COVID-19 and historical epidemics. Investing in these educational resources could potentially provide better management of the pandemic. Anecdotally, these conclusions are sensible and are readily noted. However, the algorithms do indicate that these are a few of the most important topics for nursing.
To our knowledge, this is the only literature repository that utilizes text mining and NLP algorithms on published literature related to nursing and specifically for frontline nurses, nurse researchers, and nurse leaders. This repository is intended as a tool for nurses to gain knowledge specific to COVID-19 and historical epidemics in an easily accessible way. The advantages of this repository compared to other databases are as follows: (a) it is a repository specific to nursing literature; and (b) it utilizes text mining and NLP to provide high-level summaries. This link, childrenscolorado.shinyapps.io/RN_COVID_Lit/, provides access to the repository.
Authors’ Contributions
Figaro Loresto: Conceptualization, Methodology, Software, Formal Analysis, Visualization, Writing-Original Draft, Writing-Review & Editing; Lisa Nunez: Writing-Original Draft, Writing-Review & Editing; Lindsey Tarasenko: Conceptualization, Investigation, Data Curation, Writing-Review & Editing; Marie St. Pierre: Investigation, Data Curation, Writing-Review & Editing; Kenneth Oja: Investigation, Data Curation, Writing-Review & Editing; Mallory Mueller: Conceptualization, Writing-Review & Editing; Bailey Switzer: Investigation, Data Curation; Katherine Marroquin: Investigation, Data Curation; Catherine Kleiner: Conceptualization, Resources, Supervision, Writing-Review & Editing.