bayesian - Weka machine learning - Interpreting naive Bayes
I have a training dataset of ill horses. The data contains surgeries, diseases and a number of clinical fields such as the temperature of the horse, age, pulse, respiratory rate, etc.
What I want is a classifier on the live/dead/euthanized column of every row. What I am asked to check is:
- think about the hypothesis of independence of the variables
- check whether I have a large enough number of elements to obtain reliable probabilities
The dataset had 25% missing values, and I imputed them using MIMMI imputation.
Thinking about the possibility of getting reliable probabilities, I can see that the training dataset is a little unbalanced: 179 horses live and 121 die (dead + euthanized). I'm not sure about that. Any help with these 2 questions would be helpful to me.
=== Run information ===

Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     horsecolic-weka.filters.unsupervised.attribute.Remove-R25-27
Instances:    300
Attributes:   24
              surgery
              age
              id
              temp
              pulse
              resprate
              tempextrem
              peripulse
              mucmemb
              capreft
              pain
              peri
              abddist
              ngtube
              ngreflux
              ngrph
              feces
              abd
              pcellvol
              totprot
              abdcentapp
              abdcenttotprot
              outc
              surgles
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                                 Class
Attribute                        lived         died   euthanized
                                (0.59)       (0.26)       (0.15)
================================================================
surgery
  yes                             97.0         59.0         28.0
  no                              84.0         20.0         18.0
  [total]                        181.0         79.0         46.0

age
  adult                          168.0         67.0         44.0
  young                           13.0         12.0          2.0
  [total]                        181.0         79.0         46.0

id
  mean                    1009274.0202 1452556.3598  751596.8611
  std. dev.               1431022.1677 1887025.7703  989556.6807
  weight sum                       179           77           44
  precision                  16915.735    16915.735    16915.735

temp
  mean                         34.8733      35.0055       33.054
  std. dev.                    10.2335      13.0545      14.9588
  weight sum                       179           77           44
  precision                     0.9275       0.9275       0.9275

pulse
  mean                         29.2039      33.2115      29.0187
  std. dev.                    10.8578      14.6404      16.7248
  weight sum                       179           77           44
  precision                     0.9107       0.9107       0.9107

resprate
  mean                         15.0771      16.9169      15.9348
  std. dev.                     8.9803       7.0278       8.1221
  weight sum                       179           77           44
  precision                     0.8667       0.8667       0.8667

tempextrem
  normal                          82.0         16.0         12.0
  warm                            36.0          7.0          3.0
  cool                            53.0         48.0         25.0
  cold                            12.0         10.0          8.0
  [total]                        183.0         81.0         48.0

peripulse
  normal                         133.0         22.0         11.0
  increased                        5.0          8.0          7.0
  reduced                         43.0         47.0         25.0
  absent                           2.0          4.0          5.0
  [total]                        183.0         81.0         48.0

mucmemb
  normal-pink                     95.0          9.0          7.0
  bright-pink                     23.0         13.0          6.0
  pale-pink                       37.0         19.0         12.0
  pale-cyanotic                   16.0         17.0         12.0
  bright-red                       7.0         14.0          8.0
  dark-cyanotic                    7.0         11.0          5.0
  [total]                        185.0         83.0         50.0

capreft
  short                          153.0         46.0         23.0
  long                            28.0         33.0         23.0
  long2                            1.0          1.0          1.0
  [total]                        182.0         80.0         47.0

pain
  no-pain                         53.0          6.0          8.0
  depressed                       42.0         21.0         14.0
  inte-mild-pain                  64.0         10.0          8.0
  inte-severe-pain                12.0         18.0         12.0
  cont-severe-pain                13.0         27.0          7.0
  [total]                        184.0         82.0         49.0

peri
  hypermotile                     42.0          7.0          7.0
  normal                          22.0          8.0          5.0
  hypomotile                      90.0         37.0         17.0
  absent                          29.0         29.0         19.0
  [total]                        183.0         81.0         48.0

abddist
  none                            88.0         17.0         13.0
  slight                          53.0         18.0          8.0
  moderate                        28.0         30.0         14.0
  severe                          14.0         16.0         13.0
  [total]                        183.0         81.0         48.0

ngtube
  none                            79.0         40.0         27.0
  slight                          90.0         32.0         15.0
  significant                     13.0          8.0          5.0
  [total]                        182.0         80.0         47.0

ngreflux
  none                           149.0         50.0         30.0
                                  17.0         15.0          6.0
  less                            16.0         15.0         11.0
  [total]                        182.0         80.0         47.0

ngrph
  mean                         11.3797      13.0882       8.0606
  std. dev.                     2.3535       3.2916       5.1673
  weight sum                       179           77           44
  precision                     0.7917       0.7917       0.7917

feces
  normal                          77.0         14.0         10.0
  increased                       16.0         14.0          8.0
  decreased                       44.0         15.0         11.0
  absent                          46.0         38.0         19.0
  [total]                        183.0         81.0         48.0

abd
  normal                          48.0         13.0          4.0
  other                           39.0          5.0          7.0
  firm-large-intestine            18.0          8.0          6.0
  dist-small-intest               32.0         24.0          8.0
  distended-large-intest          47.0         32.0         24.0
  [total]                        184.0         82.0         49.0

pcellvol
  mean                         31.0162      47.0465      46.0112
  std. dev.                    14.1207      18.5468       17.672
  weight sum                       179           77           44
  precision                     0.9518       0.9518       0.9518

totprot
  mean                         42.6539       41.451      43.7936
  std. dev.                    16.9138      18.6362      19.3247
  weight sum                       179           77           44
  precision                     0.9432       0.9432       0.9432

abdcentapp
  clear                          112.0         25.0         10.0
  cloudy                          54.0         22.0         20.0
  serosanguinous                  16.0         33.0         17.0
  [total]                        182.0         80.0         47.0

abdcenttotprot
  mean                         16.1341      21.1634      14.3203
  std. dev.                     6.8038       4.9109       8.6619
  weight sum                       179           77           44
  precision                     0.8837       0.8837       0.8837

surgles
  yes                             94.0         70.0         30.0
  no                              87.0          9.0         16.0
  [total]                        181.0         79.0         46.0

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         216               72      %
Incorrectly Classified Instances        84               28      %
Kappa statistic                          0.5134
Mean absolute error                      0.1965
Root mean squared error                  0.3803
Relative absolute error                 52.8451 %
Root relative squared error             88.2672 %
Total Number of Instances              300

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.777     0.198       0.853    0.777       0.813      0.873   lived
                 0.675     0.175       0.571    0.675       0.619      0.871   died
                 0.568     0.082       0.543    0.568       0.556      0.824   euthanized
Weighted Avg.    0.72      0.175       0.735    0.72        0.725      0.865

=== Confusion Matrix ===

   a   b   c   <-- classified as
 139  28  12 |   a = lived
  16  52   9 |   b = died
   8  11  25 |   c = euthanized
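As a cross-check on the evaluation section above, the summary figures can be recomputed from the confusion matrix alone. The sketch below is plain Python and not part of Weka; the names `LABELS`, `CONFUSION` and `summarize` are made up for illustration, and the matrix is copied from the output with rows read as actual classes and columns as predicted classes.

```python
# Confusion matrix copied from the Weka output above.
# Assumption: rows are the actual class, columns the predicted class.
LABELS = ["lived", "died", "euthanized"]
CONFUSION = [
    [139, 28, 12],  # actual lived
    [16, 52, 9],    # actual died
    [8, 11, 25],    # actual euthanized
]


def summarize(matrix):
    """Return accuracy, Cohen's kappa and per-class (precision, recall)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    correct = sum(matrix[i][i] for i in range(k))

    accuracy = correct / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = [
        (matrix[i][i] / col_totals[i], matrix[i][i] / row_totals[i])
        for i in range(k)
    ]
    return accuracy, kappa, per_class


if __name__ == "__main__":
    accuracy, kappa, per_class = summarize(CONFUSION)
    print(f"accuracy={accuracy:.3f} kappa={kappa:.4f}")
    for label, (precision, recall) in zip(LABELS, per_class):
        print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

`summarize(CONFUSION)` gives an accuracy of 0.72 and a kappa of about 0.513, in line with the Weka summary. The per-class recall also shows where the errors are: `euthanized`, the smallest class with 44 horses, is recovered least often.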
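Naive Bayes makes the prominent assumption that the attributes are independent (strictly speaking, conditionally independent given the class). In your case this means that age, surgery, temp and the other attributes are taken to be mutually independent. That may not be the case, though, and in many instances it is not. Naive Bayes can still obtain decent results with little training, though not as good as a model whose assumptions are more correct. Finding such models takes time and effort, though, and a Naive Bayes model may already reach adequate accuracy. I'm not sure about the sample size; you'll have to look at the statistical power of the dataset.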
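To make the independence assumption concrete, here is a minimal Python sketch of how the per-class counts in the model combine into a prediction. It is an illustration, not Weka's implementation: it uses only three of the nominal attributes (`surgery`, `peripulse`, `pain`), ignores the numeric ones, and the names `COUNTS`, `conditional` and `posterior` are invented for the example. It also assumes the counts are add-one smoothed, which is what the `[total]` rows suggest (181 = 179 horses + 2 values for `surgery`).

```python
CLASSES = ["lived", "died", "euthanized"]
CLASS_COUNTS = [179, 77, 44]  # "weight sum" per class in the Weka output

# Per-class counts copied from the Weka model (assumed add-one smoothed).
COUNTS = {
    "surgery": {
        "yes": [97.0, 59.0, 28.0],
        "no": [84.0, 20.0, 18.0],
    },
    "peripulse": {
        "normal": [133.0, 22.0, 11.0],
        "increased": [5.0, 8.0, 7.0],
        "reduced": [43.0, 47.0, 25.0],
        "absent": [2.0, 4.0, 5.0],
    },
    "pain": {
        "no-pain": [53.0, 6.0, 8.0],
        "depressed": [42.0, 21.0, 14.0],
        "inte-mild-pain": [64.0, 10.0, 8.0],
        "inte-severe-pain": [12.0, 18.0, 12.0],
        "cont-severe-pain": [13.0, 27.0, 7.0],
    },
}


def conditional(attribute, value, class_index):
    """P(attribute = value | class), estimated from the counts."""
    table = COUNTS[attribute]
    total = sum(counts[class_index] for counts in table.values())
    return table[value][class_index] / total


def posterior(observation):
    """Normalized class probabilities under the independence assumption."""
    n = sum(CLASS_COUNTS)
    scores = []
    for c in range(len(CLASSES)):
        # Smoothed class prior, e.g. (179 + 1) / (300 + 3) for "lived".
        score = (CLASS_COUNTS[c] + 1) / (n + len(CLASSES))
        for attribute, value in observation.items():
            # Independence given the class: the likelihood is a plain product.
            score *= conditional(attribute, value, c)
        scores.append(score)
    norm = sum(scores)
    return {name: s / norm for name, s in zip(CLASSES, scores)}


if __name__ == "__main__":
    horse = {"surgery": "yes", "peripulse": "absent", "pain": "cont-severe-pain"}
    print(posterior(horse))
```

The product inside `posterior` is the whole independence assumption: each attribute contributes its own factor and nothing accounts for attributes that move together. On the sample size question, under the smoothing assumption a count of 2.0, as for `peripulse = absent` among the horses that lived, corresponds to a single observed horse, so probabilities from such cells rest on very little data.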