Establishment of a Prediction Model for Determining the Timing of LC after PTGBD in Elderly Patients with Acute Cholecystitis Based on Machine Learning Technology
The clinical data of elderly patients with acute cholecystitis who were admitted to the Central Hospital of Jinzhou from January 2013 to June 2024 and underwent laparoscopic cholecystectomy (LC) after percutaneous transhepatic gallbladder drainage (PTGBD) were retrospectively analyzed. Logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and artificial neural network (ANN) algorithms were used to build prediction models on the resulting dataset. Internal validation was performed by cross-validation, and the discrimination and calibration of the models were evaluated and compared using the area under the receiver operating characteristic curve (AUC) and the Brier score. Six risk factors were identified: age, body temperature, white blood cell count, gallbladder wall thickness, alkaline phosphatase, and blood urea nitrogen, and prediction models were built on this basis. Comparison of the sensitivity, specificity, positive predictive value, negative predictive value, decision curve analysis (DCA), and calibration curves of the models showed that all five had good predictive performance and stability and can serve as an aid to clinical decision-making.
Acute Cholecystitis
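The dataset was randomly divided into a training set (84 cases) and a test set (36 cases) at a ratio of 7:3 for model construction and validation. Because the sample size was small, 10-fold cross-validation was used to check the validity of the machine learning models. Finally, the optimized models were evaluated on the test set. Based on the preoperative clinical data of the included patients (intraoperative and postoperative indicators were considered to have a controllable effect on the outcome and were therefore not included), the 32 candidate risk factors obtained by screening were categorized and coded. Categorical variables were coded on two levels, "0" and "1", with level "1" scored as 1 point and level "0" scored as 0 points (see the coding table that follows).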
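| Variable | Coding |
| --- | --- |
| Sex | 0 = female; 1 = male |
| Timing of surgery | 0 = early (within 6 weeks); 1 = delayed (after 6 weeks) |
| Smoking | 0 = yes; 1 = no |
| Alcohol use | 0 = yes; 1 = no |
| Diabetes | 0 = yes; 1 = no |
| Hypertension | 0 = yes; 1 = no |
| Coronary heart disease | 0 = yes; 1 = no |
| Chronic obstructive pulmonary disease | 0 = yes; 1 = no |
| Cerebrovascular disease | 0 = yes; 1 = no |
| History of abdominal surgery | 0 = yes; 1 = no |
| History of cholecystitis | 0 = yes; 1 = no |
| Murphy's sign | 0 = positive; 1 = negative |
| Tokyo grade | 0 = moderate; 1 = mild |
| Gallstones | 0 = yes; 1 = no |
| Pericholecystic fluid | 0 = yes; 1 = no |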
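The 7:3 split and 10-fold cross-validation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the seed value, the function names `split_train_test` and `k_fold_indices`, and the use of a simple unstratified shuffle are hypothetical choices, since the source does not say how the random split was generated.

```python
import random

def split_train_test(n_samples, train_fraction=0.7, seed=42):
    """Shuffle sample indices and split them into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # seed is an arbitrary, hypothetical choice
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

def k_fold_indices(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]     # deal indices round-robin into k folds
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

train_idx, test_idx = split_train_test(120)       # 120 patients in total, as in the study
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))  # 84 36 10
```

Each of the 10 folds holds out 8 or 9 training patients for validation while the model is fitted on the rest; the 36 test patients are touched only once, for the final evaluation.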
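1. Baseline Characteristics
Patients were grouped by the timing of surgery: those whose interval between PTGBD and LC was less than 6 weeks formed the early group, and those whose interval was more than 6 weeks formed the delayed group. Baseline characteristics were compared between the two groups, and the Kolmogorov-Smirnov test was used to assess the normality of quantitative data (see the baseline table that follows).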
| Variable | Early group (n = 59) | Delayed group (n = 61) | p (early vs. delayed) | Training set (n = 84) | Test set (n = 36) | p (training vs. test) |
| --- | --- | --- | --- | --- | --- | --- |
| Sex | | | 0.86 | | | 0.75 |
|   Female | 31 (53%) | 33 (54%) | | 44 (52%) | 20 (56%) | |
|   Male | 28 (47%) | 28 (46%) | | 40 (48%) | 16 (44%) | |
| Age | 67 (64, 71) | 73 (64, 78) | 0.003* | 69 (64, 75) | 71 (65, 78) | 0.22 |
| Smoking | | | 0.61 | | | 0.40 |
|   No | 38 (64%) | 42 (69%) | | 54 (64%) | 26 (72%) | |
|   Yes | 21 (36%) | 19 (31%) | | 30 (36%) | 10 (28%) | |
| Alcohol use | | | 0.066 | | | 0.54 |
|   No | 32 (54%) | 43 (70%) | | 51 (61%) | 24 (67%) | |
|   Yes | 27 (46%) | 18 (30%) | | 33 (39%) | 12 (33%) | |
| Murphy's sign | | | 0.90 | | | 0.46 |
|   Positive | 47 (80%) | 48 (79%) | | 65 (77%) | 30 (83%) | |
|   Negative | 12 (20%) | 13 (21%) | | 19 (23%) | 6 (17%) | |
| History of cholecystitis | | | 0.051 | | | 0.86 |
|   No | 48 (81%) | 40 (66%) | | 62 (74%) | 26 (72%) | |
|   Yes | 11 (19%) | 21 (34%) | | 22 (26%) | 10 (28%) | |
| History of abdominal surgery | | | 0.51 | | | 0.80 |
|   No | 42 (71%) | 40 (66%) | | 58 (69%) | 24 (67%) | |
|   Yes | 17 (29%) | 21 (34%) | | 26 (31%) | 12 (33%) | |
| Diabetes | | | 0.60 | | | 0.24 |
|   No | 42 (71%) | 46 (75%) | | 59 (70%) | 29 (81%) | |
|   Yes | 17 (29%) | 15 (25%) | | 25 (30%) | 7 (19%) | |
| Hypertension | | | 0.33 | | | 0.74 |
|   No | 39 (66%) | 35 (57%) | | 51 (61%) | 23 (64%) | |
|   Yes | 20 (34%) | 26 (43%) | | 33 (39%) | 13 (36%) | |
| Coronary heart disease | | | 0.47 | | | 0.96 |
|   No | 41 (69%) | 46 (75%) | | 61 (73%) | 26 (72%) | |
|   Yes | 18 (31%) | 15 (25%) | | 23 (27%) | 10 (28%) | |
| Chronic obstructive pulmonary disease | | | 0.85 | | | 0.95 |
|   No | 51 (86%) | 52 (85%) | | 72 (86%) | 31 (86%) | |
|   Yes | 8 (14%) | 9 (15%) | | 12 (14%) | 5 (14%) | |
| Cerebrovascular disease | | | 0.20 | | | 0.88 |
|   No | 46 (78%) | 53 (87%) | | 69 (82%) | 30 (83%) | |
|   Yes | 13 (22%) | 8 (13%) | | 15 (18%) | 6 (17%) | |
| Body temperature | 37.80 (37.20, 38.30) | 37.90 (37.50, 38.60) | 0.15 | 37.80 (37.30, 38.45) | 37.90 (37.50, 38.40) | 0.68 |
| White blood cell count | 13.0 (8.7, 16.7) | 15.9 (14.4, 17.8) | 0.003* | 15.4 (10.2, 17.2) | 15.3 (11.1, 16.9) | 0.99 |
| Gallstones | | | 0.017* | | | 0.38 |
|   No | 21 (36%) | 35 (57%) | | 37 (44%) | 19 (53%) | |
|   Yes | 38 (64%) | 26 (43%) | | 47 (56%) | 17 (47%) | |
| Gallbladder wall thickness | 0.29 (0.26, 0.50) | 0.60 (0.32, 0.80) | <0.001* | 0.40 (0.28, 0.69) | 0.42 (0.27, 0.80) | 0.81 |
| Gallbladder size | 32 (28, 42) | 31 (28, 43) | 0.95 | 32 (29, 43) | 31 (28, 40) | 0.41 |
| Pericholecystic fluid | | | 0.89 | | | 0.12 |
|   No | 22 (37%) | 22 (36%) | | 27 (32%) | 17 (47%) | |
|   Yes | 37 (63%) | 39 (64%) | | 57 (68%) | 19 (53%) | |
| Tokyo grade | | | 0.43 | | | 0.21 |
|   Moderate | 21 (36%) | 26 (43%) | | 36 (43%) | 11 (31%) | |
|   Mild | 38 (64%) | 35 (57%) | | 48 (57%) | 25 (69%) | |
| HB | 142 (133, 151) | 147 (138, 152) | 0.24 | 146 (138, 152) | 146 (133, 151) | 0.77 |
| PT | 13.10 (11.80, 14.40) | 12.80 (11.90, 14.20) | 0.68 | 12.90 (11.90, 14.20) | 13.15 (11.95, 14.30) | 0.91 |
| INR | 1.19 (1.10, 1.33) | 1.23 (1.12, 1.28) | 0.90 | 1.20 (1.11, 1.31) | 1.24 (1.12, 1.29) | 0.60 |
| TBIL | 26 (18, 65) | 19 (13, 52) | 0.075 | 26 (16, 64) | 20 (16, 34) | 0.30 |
| AST | 38 (23, 57) | 35 (23, 54) | 0.68 | 36 (22, 52) | 37 (26, 68) | 0.40 |
| ALT | 45 (26, 63) | 47 (25, 62) | 0.80 | 46 (26, 62) | 46 (23, 63) | 0.87 |
| γ-Glutamyl transpeptidase | 58 (43, 93) | 58 (41, 74) | 0.26 | 57 (43, 85) | 59 (43, 90) | 0.51 |
| Alkaline phosphatase | 101 (81, 138) | 94 (68, 121) | 0.20 | 95 (77, 126) | 106 (71, 142) | 0.61 |
| Blood urea nitrogen | 6.40 (4.80, 7.20) | 7.60 (6.40, 8.30) | <0.001* | 6.60 (5.05, 7.65) | 7.20 (6.45, 8.45) | 0.011* |
| Creatinine | 79 (65, 96) | 86 (63, 104) | 0.51 | 83 (62, 98) | 86 (74, 99) | 0.28 |
| PLR | 226 (170, 275) | 257 (215, 316) | 0.008* | 245 (177, 309) | 235 (183, 306) | >0.99 |
| NLR | 9 (5, 16) | 12 (8, 18) | 0.018* | 11 (6, 16) | 10 (6, 18) | 0.92 |
| Disease duration | 6.00 (3.00, 8.00) | 4.00 (3.00, 6.00) | 0.024* | 4.00 (3.00, 7.00) | 4.00 (3.00, 6.50) | 0.63 |

Note: *p < 0.05.
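For readers who want to check a categorical comparison in the baseline table, the sketch below computes a chi-square test on a 2x2 table using only the standard library. It assumes a Pearson chi-square test without continuity correction; the source does not state which test was applied to categorical variables, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Gallstones row of the baseline table (early group vs. delayed group):
# absent 21 vs. 35, present 38 vs. 26.
stat, p = chi_square_2x2(21, 35, 38, 26)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Under this assumption the gallstones comparison gives a p-value of about 0.017, in line with the value reported in the table.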
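2. Feature Selection
The Boruta package in R was used to rank the importance of the included variables. The results showed that age, alkaline phosphatase, white blood cell count, blood urea nitrogen, and body temperature had the greatest influence on the choice of timing of LC after PTGBD in elderly patients with acute cholecystitis.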
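3. Logistic Regression Model
4. Support Vector Machine Model
The variables were modeled with a support vector machine. Because the SVM algorithm forms support vectors from different combinations of variables and uses them to predict the outcome, its results are difficult to present intuitively. The ROC curve derived from the results showed an AUC of 0.769 and an accuracy of 66.7%. The sensitivity and specificity at the optimal threshold were 0.818 and 0.643, respectively.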
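Odds ratios from the logistic regression model:

| Variable | OR | 95% CI | p |
| --- | --- | --- | --- |
| Age | 1.08 | 1.58~4.02 | 0.03* |
| White blood cell count | 1.13 | 1.32~9.80 | 0.1 |
| Alkaline phosphatase | 0.99 | 1.00~9.80 | 0.18 |
| Blood urea nitrogen | 1.18 | 1.55~9.29 | 0.19 |
| Body temperature | 0.93 | 2.13~3.95 | 0.86 |

Note: *p < 0.05.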
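As a reading aid for the odds-ratio table above, the sketch below shows how an odds ratio and a Wald 95% confidence interval are usually obtained from a logistic regression coefficient and its standard error. The coefficient and standard error are hypothetical values chosen for illustration, not figures from the study, and whether the authors used Wald intervals is an assumption.

```python
import math

def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald 95% confidence interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return odds_ratio, lower, upper

# Hypothetical coefficient and standard error (not taken from the study).
or_, lo, hi = odds_ratio_with_ci(beta=0.077, std_err=0.035)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Because the interval is built symmetrically around the coefficient on the log scale, an interval computed this way always contains the odds ratio itself, which is a useful sanity check when reading such tables.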
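5. Decision Tree Model
6. Random Forest Model
The ROC curve generated from the results showed a prediction accuracy of 75% (95% CI 0.578~0.878) and an AUC of 0.858. The sensitivity and specificity at the optimal threshold were 0.864 and 0.571, respectively.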
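The confidence interval attached to an accuracy such as the 75% above is typically a binomial interval on the number of correctly classified test patients. The sketch below computes an exact (Clopper-Pearson) interval by bisection; that the authors used this particular interval is an assumption, and the helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target, tol=1e-9):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower limit solves P(X >= successes | p) = alpha / 2; this tail grows with p.
        lower = solve_increasing(lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2)
    if successes == n:
        upper = 1.0
    else:
        # Upper limit solves P(X <= successes | p) = alpha / 2,
        # i.e. 1 - cdf = 1 - alpha / 2, and 1 - cdf grows with p.
        upper = solve_increasing(lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2)
    return lower, upper

# 27 correct predictions out of 36 test patients corresponds to 75% accuracy.
low, high = clopper_pearson(27, 36)
print(f"accuracy = {27 / 36:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With only 36 test patients the interval is wide, which is worth keeping in mind when comparing accuracies that differ by a few percentage points.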
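7. Artificial Neural Network Model
At the optimal threshold, the AUC was 0.698 and the accuracy was 66.7% (95% CI 0.490~0.814). The sensitivity and specificity of the model were 0.636 and 0.714, respectively.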
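The AUC and the "optimal threshold" quoted for each model can both be derived from predicted probabilities. The sketch below computes the AUC as a rank statistic and picks the threshold that maximizes Youden's index (sensitivity + specificity - 1). The source does not say how the optimal threshold was defined, so Youden's index is an assumption here, and the labels and scores are toy values.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold_youden(y_true, y_score):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J,
    classifying a case as positive when its score is >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best

# Toy labels (1 = delayed surgery, 0 = early) and predicted probabilities,
# for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.10, 0.30, 0.35, 0.60, 0.40, 0.55, 0.70, 0.90]

print(roc_auc(y_true, y_score))                 # 0.875
print(best_threshold_youden(y_true, y_score))   # (0.4, 1.0, 0.75)
```

Sensitivity and specificity reported "at the optimal threshold" depend on this choice of criterion, so they are not directly comparable with values computed at a fixed 0.5 cut-off.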
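8. Comparison of Model Performance
The prediction models obtained with the five algorithms were compared at their optimal thresholds; the results are summarized in the table that follows.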
| Model | Cut-off | Sen | Spe | PPV | NPV | Accuracy (95% CI) | AUC (95% CI) | BS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LR | 0.572 | 0.636 | 0.786 | 0.823 | 0.579 | 0.694 (0.519, 0.836) | 0.731 (0.561, 0.900) | 0.215 |
| SVM | 0.451 | 0.643 | 0.818 | 0.750 | 0.562 | 0.667 (0.490, 0.814) | 0.769 (0.602, 0.937) | 0.201 |
| DT | 0.742 | 0.636 | 0.714 | 0.778 | 0.556 | 0.667 (0.490, 0.814) | 0.633 (0.439, 0.827) | 0.242 |
| RF | 0.308 | 0.864 | 0.571 | 0.760 | 0.727 | 0.750 (0.578, 0.879) | 0.724 (0.541, 0.907) | 0.209 |
| ANN | 0.499 | 0.636 | 0.714 | 0.778 | 0.556 | 0.667 (0.490, 0.814) | 0.698 (0.522, 0.874) | 0.223 |

Sen, sensitivity; Spe, specificity; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; BS, Brier score.
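The columns of the comparison table all follow from a confusion matrix and the predicted probabilities. The sketch below shows the arithmetic. The confusion-matrix counts are inferred from the LR row under the assumption of 36 test patients (22 positive, 14 negative); the source does not report them, and the probabilities passed to `brier_score` are toy values.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Counts inferred from the LR row; they are not reported in the source.
lr = classification_metrics(tp=14, fn=8, tn=11, fp=3)
print({name: round(value, 3) for name, value in lr.items()})

# Toy outcomes and probabilities, for illustration only.
print(round(brier_score([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1]), 3))
```

These counts reproduce the LR row to within rounding. A lower Brier score indicates better-calibrated probabilities, and a model that always predicts 0.5 scores 0.25, which gives a reference point for the BS column.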
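With the rapid development of modern science and technology, statistical analysis of large-sample data has become feasible, and such analysis yields more accurate information. Logistic regression is currently the most widely used method for analyzing associated factors.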
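Among these five independent risk factors, white blood cell count acted as a protective factor for delayed surgery, probably because it reflects the course of cholecystitis: as a key indicator of the inflammatory response, changes in white blood cell count directly reflect how well a patient's inflammation is controlled. After PTGBD, as the gallbladder is drained, inflammation gradually subsides and the white blood cell count returns toward normal, which provides an important reference for timing delayed surgery. Studies have shown that a fall in white blood cell count is closely associated with relief of the systemic inflammatory response and can effectively reduce surgical risk. This is especially relevant for elderly patients, whose immune function is weaker and whose tolerance of surgery is poorer, so that recovery of the white blood cell count better reflects an improvement in their physiological status.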
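Blood urea nitrogen is a waste product of human metabolism that is excreted by the kidneys in the urine. In cholecystitis, inflammatory irritation dilates the gallbladder and bile ducts and impairs bile outflow, which raises the pressure inside the gallbladder and bile ducts and in turn raises the blood urea nitrogen level.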
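Age acted as a protective factor for delayed surgery because older patients are generally in poorer condition, have more comorbidities, and face a higher risk of postoperative complications.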
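An elevated alkaline phosphatase level in the blood is of value for diagnosing bacterial infection and assessing its severity: the more severe or the more frequent the infection, the more severe the denaturation of the enzyme protein and the higher the alkaline phosphatase value, which also indicates a stronger capacity of the body to respond to infection.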
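In summary, univariate and multivariate analyses identified five independent factors associated with the choice of timing of LC after PTGBD in elderly patients with acute cholecystitis: age, white blood cell count, body temperature, alkaline phosphatase, and blood urea nitrogen. On this basis, a prediction model for the choice of timing of surgery was built and validated, and receiver operating characteristic curve analysis showed that the model had good predictive performance.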
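In this study, five machine learning classification algorithms (logistic regression, support vector machine, decision tree, random forest, and artificial neural network) were used to model the data of elderly patients with acute cholecystitis who underwent LC after PTGBD, and the fit of the models was compared. The results showed that all models had good diagnostic performance; the support vector machine model had an AUC of 0.760. The logistic regression and random forest models allow the results to be visualized well and are simple to apply clinically. We recommend the support vector machine model for predicting the optimal timing of LC after PTGBD in elderly patients with acute cholecystitis.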
Informed consent was obtained from the patients for this case report.
*Co-first authors.
#Corresponding author.