QUICK REVIEW

[Paper Review] A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models

Dianbo Liu, Leonardo Clemente | PubMed | 2020-04-08

COVID-19 epidemiological studies | References: 35 | Citations: 102

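One-line summary

This paper introduces Augmented ARGONet, a real-time framework that combines China CDC reports, Baidu searches, Media Cloud news, and outputs from the GLEAM mechanistic model, and uses clustering and data augmentation to forecast province-level COVID-19 activity in China 2 days ahead.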

ABSTRACT

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

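Motivation and objectives

Motivated by the need for real-time forecasting of emerging outbreaks when historical data are sparse.
Develop a data-driven, geospatially aware model that draws on diverse data streams.
Mitigate data scarcity through data augmentation and clustering, training predictors on clusters of provinces.
Assess the added value of including mechanistic model estimates in a data-driven forecasting framework.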

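Proposed method

Define province clusters based on spatio-temporal COVID-19 patterns and retrain the model at every forecast date.
For each cluster, augment the data via bootstrap resampling with added Gaussian noise.
Fit a LASSO-regularized multivariable linear model to predict case counts over the next 2 days, using past cases, Baidu search activity, Media Cloud articles, deaths, and cumulative cases as inputs.
Include mechanistic model estimates (GLEAM) as inputs to Augmented ARGONet before clustering and augmentation.
To assess predictive gains, compare against PES (baseline) persistence, an autoregressive model, and ARGONet without mechanistic inputs.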
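
The augmentation and LASSO steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`augment`, `lasso_fit`), the noise scale, the penalty `alpha`, and the synthetic five-column feature matrix are all hypothetical, and the LASSO is solved with a plain coordinate-descent loop in NumPy rather than whatever solver the paper used. Clustering is omitted; `X` stands in for the pooled observations of one cluster.

```python
import numpy as np

def augment(X, y, n_copies=10, noise_sd=0.01, seed=0):
    """Bootstrap-resample rows and jitter them with Gaussian noise.

    n_copies and noise_sd are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.integers(0, n, size=n * n_copies)  # sample rows with replacement
    X_new = X[idx] + rng.normal(0.0, noise_sd, size=(len(idx), X.shape[1]))
    y_new = y[idx] + rng.normal(0.0, noise_sd, size=len(idx))
    # Keep the original observations and append the noisy resampled copies.
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, alpha=0.1, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + alpha*||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean              # centering handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                          # constant column carries no signal
            # Residual with feature j's current contribution removed.
            resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic stand-in for one cluster: 30 observations, 5 hypothetical features
# (e.g. past cases, search volume, news volume, deaths, cumulative cases).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
true_beta = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
y = 0.5 + X @ true_beta + rng.normal(0.0, 0.05, size=30)

X_aug, y_aug = augment(X, y)
intercept, beta = lasso_fit(X_aug, y_aug, alpha=0.05)
print(X_aug.shape, np.round(beta, 2))
```

In a real-time setting, the fit would be repeated at every forecast date using only data available up to that date, as the first step of the method describes.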

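Experimental results

Research questions

RQ1: Can multi-source data (official reports, internet searches, news, and mechanistic forecasts) predict province-level COVID-19 activity in near real time?
RQ2: Do clustering and data augmentation improve forecasting performance when historical observations are limited?
RQ3: What added value does including mechanistic model estimates bring to a data-driven forecasting framework?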

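Key findings

In 2-day-ahead forecasts, Augmented ARGONet outperforms the baseline models, including persistence, in 27 of the 32 Chinese provinces.
ARGONet with clustering and augmentation improves RMSE in 25 of the 32 provinces and improves correlation in 18 provinces.
Including mechanistic model estimates improves predictive power in most provinces.
Some regions (Taiwan, Hong Kong, Guangxi, 산시 (Shanxi or Shaanxi), and Hunan) showed no RMSE improvement, which the review attributes to differences in administrative and health systems.
Models that used only a province's own local data generally did not outperform the baseline, and ARGO-type models showed only limited improvement.
Overall, the method demonstrates that real-time forecasting is feasible during an emerging outbreak with limited data.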
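
The RMSE and correlation comparisons reported above can be illustrated with a small sketch. It assumes a persistence baseline that forecasts the value at time t + 2 as the value observed at time t; the case series and the "model" forecasts below are made-up numbers for illustration only, and the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and forecasts."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def pearson(y_true, y_pred):
    """Pearson correlation between targets and forecasts."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def persistence_forecast(series, horizon=2):
    """Forecast the value at t + horizon as the value observed at t.

    Returns (predictions, targets) aligned element by element.
    """
    series = np.asarray(series, dtype=float)
    return series[:-horizon], series[horizon:]

# Made-up case counts for one province (illustration only).
cases = [10, 12, 15, 19, 24, 30, 37, 45]
baseline_pred, target = persistence_forecast(cases, horizon=2)

# Made-up forecasts from a hypothetical model for the same target dates.
model_pred = np.array([14, 20, 23, 31, 36, 46], dtype=float)

baseline_rmse = rmse(target, baseline_pred)
model_rmse = rmse(target, model_pred)
print("persistence RMSE:", round(baseline_rmse, 3))
print("model RMSE:", round(model_rmse, 3))
print("model beats persistence on RMSE:", model_rmse < baseline_rmse)
print("model correlation:", round(pearson(target, model_pred), 3))
```

Counting the provinces for which `model_rmse < baseline_rmse` holds gives a tally of the kind the findings report, such as 27 of 32.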


This review was generated by AI and reviewed by a human editor.