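Paper information
Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study
Sasank Chilamkurthy, Qure.ai, India
The Lancet, 2018 (journal article)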
Impact Score: 79.321
Motivation
- We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from head CT scans: intracranial haemorrhage and its five types (ie, intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect.
- Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility to use these algorithms to automate the triage process.
Contribution
- We retrospectively collected a dataset containing 313,318 head CT scans together with their clinical reports from around 20 centres in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop algorithms. An additional validation dataset (CQ500 dataset) was collected in two batches from centres that were different from those used for the development and Qure25k datasets. The Qure25k dataset contained 21,095 scans (mean age 43 years; 9030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients).
- On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91–0·93) for detecting intracranial haemorrhage (0·90 [0·89–0·91] for intraparenchymal, 0·96 [0·94–0·97] for intraventricular, 0·92 [0·90–0·93] for subdural, 0·93 [0·91–0·95] for extradural, and 0·90 [0·89–0·92] for subarachnoid). On the CQ500 dataset, AUC was 0·94 (0·92–0·97) for intracranial haemorrhage (0·95 [0·93–0·98], 0·93 [0·87–1·00], 0·95 [0·91–0·99], 0·97 [0·91–1·00], and 0·96 [0·92–0·99], respectively). AUCs on the Qure25k dataset were 0·92 (0·91–0·94) for calvarial fractures, 0·93 (0·91–0·94) for midline shift, and 0·86 (0·85–0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92–1·00), 0·97 (0·94–1·00), and 0·92 (0·89–0·95), respectively.
- To our knowledge, our study is the first to describe the development of a system that separately identifies critical abnormalities on head CT scans.
Approach
- First, a natural language processing (NLP) algorithm was used to detect intraparenchymal, intraventricular, subdural, extradural, and subarachnoid haemorrhages, and calvarial fractures from clinical radiology reports. Second, reports were randomly selected so that there were around 80 scans with each of intraparenchymal, subdural, extradural, and subarachnoid haemorrhages, and calvarial fractures. Each of the selected scans was then screened for the following exclusion criteria: postoperative defect; absence of a non-contrast (plain) axial series covering the complete brain; and patient younger than 7 years (estimated from cranial sutures [19] if data were unavailable).
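The selection steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' NLP algorithm: the keyword patterns, the naive negation check, and the metadata field names (`postoperative`, `has_plain_axial_series`, `age_years`) are all assumptions made for the example.

```python
import random
import re

# Hypothetical keyword patterns; the notes do not describe the actual NLP rules.
_BLEED = r"\s+(?:ha?emorrhage|ha?ematoma)"
FINDING_PATTERNS = {
    "intraparenchymal": re.compile(r"intraparenchymal" + _BLEED),
    "intraventricular": re.compile(r"intraventricular" + _BLEED),
    "subdural": re.compile(r"subdural" + _BLEED),
    "extradural": re.compile(r"(?:extradural|epidural)" + _BLEED),
    "subarachnoid": re.compile(r"subarachnoid" + _BLEED),
    "calvarial_fracture": re.compile(r"fracture"),
}
# Naive negation cue (assumption): a cue earlier in the same sentence negates the finding.
NEGATION = re.compile(r"\b(?:no|not|without)\b")
# Findings used for the "around 80 scans each" sampling step.
SAMPLED_FINDINGS = ["intraparenchymal", "subdural", "extradural",
                    "subarachnoid", "calvarial_fracture"]


def findings_in_report(report):
    """Return the set of findings mentioned, and not negated, in a report."""
    found = set()
    for sentence in re.split(r"[.\n]", report.lower()):
        for finding, pattern in FINDING_PATTERNS.items():
            match = pattern.search(sentence)
            if match and not NEGATION.search(sentence[:match.start()]):
                found.add(finding)
    return found


def select_scans(reports, per_finding=80, seed=0):
    """Randomly pick up to `per_finding` scan ids for each sampled finding."""
    rng = random.Random(seed)
    labels = {sid: findings_in_report(text) for sid, text in reports.items()}
    selected = set()
    for finding in SAMPLED_FINDINGS:
        candidates = sorted(sid for sid, f in labels.items() if finding in f)
        selected.update(rng.sample(candidates, min(per_finding, len(candidates))))
    return selected


def is_excluded(scan):
    """Apply the three exclusion criteria to a scan's (hypothetical) metadata."""
    return (scan["postoperative"]
            or not scan["has_plain_axial_series"]
            or scan["age_years"] < 7)
```

A scan can be picked for more than one finding, so the selected set may be smaller than five times `per_finding`.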
Experiment
- The original clinical radiology report and the consensus of three independent radiologists were considered the gold standard for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were primarily used to assess the algorithms.
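As an illustration of how a three-reader consensus label could be formed, the sketch below assumes that consensus means a simple majority vote per finding; the notes above do not spell out the rule, and the function name and data layout are hypothetical.

```python
from typing import Dict, List


def majority_consensus(reads: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Combine per-radiologist reads into one label per finding.

    `reads` holds one dict per radiologist, mapping finding name to whether
    that radiologist reported it. A finding is positive when more than half
    of the readers reported it (assumed consensus rule).
    """
    findings = reads[0].keys()
    return {f: sum(r[f] for r in reads) * 2 > len(reads) for f in findings}


# Example: three reads of a single scan.
reads = [
    {"intracranial_haemorrhage": True, "midline_shift": False},
    {"intracranial_haemorrhage": True, "midline_shift": True},
    {"intracranial_haemorrhage": False, "midline_shift": False},
]
gold = majority_consensus(reads)
```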
Evaluation metrics
For both CQ500 and Qure25k datasets, receiver operating characteristic (ROC) curves [20] were obtained for each of the target findings by varying the threshold and plotting the true positive rate (ie, sensitivity) and false positive rate (ie, 1 − specificity) at each threshold. Two operating points were chosen on the ROC curve so that sensitivity was approximately 0·9 (high sensitivity point) and specificity approximately 0·9 (high specificity point; see appendix p 5 for the algorithm for operating point choice). Areas under the ROC curves (AUCs) and sensitivities and specificities at these two operating points were used to assess the algorithms.
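The evaluation described here can be sketched in plain Python. The ROC sweep and trapezoidal AUC are standard; the operating-point rule is an assumption, since the appendix algorithm is not reproduced in these notes: the high sensitivity point is taken as the threshold with the best specificity among those reaching the target sensitivity, and the high specificity point symmetrically.

```python
def roc_points(labels, scores):
    """Return (threshold, tpr, fpr) for each distinct score, highest first.

    A scan is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    xs = [0.0] + [fpr for _, _, fpr in points]
    ys = [0.0] + [tpr for _, tpr, _ in points]
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))


def choose_operating_points(points, target=0.9):
    """Pick the two operating points (assumed rule, see lead-in)."""
    high_sens = max((p for p in points if p[1] >= target),
                    key=lambda p: 1 - p[2], default=None)
    high_spec = max((p for p in points if 1 - p[2] >= target),
                    key=lambda p: p[1], default=None)
    return {"high_sensitivity": high_sens, "high_specificity": high_spec}


# Toy example: 4 positive and 6 negative scans with made-up scores.
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
points = roc_points(labels, scores)
ops = choose_operating_points(points)
```

On this toy data the AUC is 22/24, because 22 of the 24 positive-negative pairs are ranked correctly. With so few scans, neither operating point lands close to 0·9 on the other axis; on datasets the size of Qure25k the curve is much finer.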