Military planners must consider the undesirable secondary effects of military operations, such as civilian casualties, physical infrastructure damage, and societal disruption, when comparing courses of action. As modern military operations become less likely to be massive force-on-force conflicts and more likely to be asymmetric, counter-insurgency, and nation-building operations, such considerations become even more important. Therefore, a quantitative model that can be used to evaluate and compare the intentional harm, or “evil,” caused by alternative courses of action would be useful to military analysts and planners. Beginning with an initial concept and model provided by the U.S. Army System Simulation and Development Directorate (SSDD), the University of Alabama in Huntsville (UAH) has developed and tested a “Metric of Evil,” a quantitative model of the harm associated with military courses of action, intended to allow courses of action to be compared on an ethical basis. The model considers both the results of a course of action and the intentions of the actor. UAH surveyed the research literature to identify other relevant research and to place the model in an intellectual context. Then, after implementing the SSDD-provided version of the model, UAH designed and implemented a second model that expanded upon the initial concept in two ways. First, the weightings associated with specific types of harm and the degree to which intention affects the calculations can both be adjusted, e.g., to reflect different value systems. Second, the model’s inputs were defined to be as quantitative and objective as possible, minimizing the number of subjective value judgments needed from the users. The model was experimentally validated (or calibrated) by comparing its assessments with those of human experts.
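The model's actual formulas are not given here. Purely as an illustration of what adjustable harm weightings and an adjustable degree of intention could look like, the sketch below assumes a weighted sum of harm quantities scaled by an intention factor. The class `HarmWeights`, the function `harm_score`, the three harm categories used as inputs, the 0-to-1 `intent` scale, and every numeric default are hypothetical choices made for this example, not the SSDD or UAH formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmWeights:
    """Hypothetical, adjustable weights; not the values used by SSDD or UAH."""
    civilian_casualties: float = 1.0
    infrastructure_damage: float = 0.5
    societal_disruption: float = 0.3
    # 0.0 = intention is ignored, 1.0 = unintended harm counts for nothing.
    intention_influence: float = 0.5


def harm_score(casualties, infrastructure, disruption, intent,
               weights=HarmWeights()):
    """Score one course of action under the assumed weighted-sum form.

    `intent` is an assumed scale from 0.0 (harm entirely unintended)
    to 1.0 (harm fully intended).
    """
    if not 0.0 <= intent <= 1.0:
        raise ValueError("intent must lie in [0, 1]")
    # Outcome term: weighted sum of the harm quantities.
    outcome = (weights.civilian_casualties * casualties
               + weights.infrastructure_damage * infrastructure
               + weights.societal_disruption * disruption)
    # Intention term: discounts unintended harm by the configured influence.
    intention_factor = 1.0 - weights.intention_influence * (1.0 - intent)
    return outcome * intention_factor


# Two hypothetical value systems applied to the same course of action.
default_view = harm_score(10, 4, 2, intent=0.0)
results_only = harm_score(10, 4, 2, intent=0.0,
                          weights=HarmWeights(intention_influence=0.0))
```

Under these assumptions, a value system that ignores intention (`intention_influence=0.0`) scores the unintended harm at its full outcome value, while the default weights halve it. Changing the weights changes the ranking of courses of action without changing the inputs.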
The courses of action in a set of four specially selected historical scenarios were assessed (“rated”) by four quantitative models (the initial SSDD model, a calibrated version of the SSDD model, and two versions of the UAH model that differed only in certain weighting parameters), three groups of human raters (military experts, non-military experts, and non-experts), and a group of random raters. The four models were compared to a total of 50 human and 12 random raters in the experiment. The human raters were recruited from backgrounds in ethics, religion, psychology, political science, military history, and military command. Accuracy in a rater’s assessments was defined as cumulative agreement with the two groups of human experts, i.e., the more often a rater agreed with the experts, the more accurate that rater was deemed to be. Each rater’s agreement with the experts was computed, and two evaluation statistics were derived from it: the rater’s relative rank in terms of agreement with the experts (how often the rater agrees compared to the other raters) and the rater’s proportion of agreement with the experts (how often the rater agrees compared to perfect agreement). One of the UAH versions of the model agreed with the human experts over all situations approximately as well as each of the three groups of human raters. In fact, the agreement rank for the model is slightly better than the average rank of two of the three groups of human raters. Once calibrated, the SSDD model also performed very well. Overall, the human raters and the best models are closely comparable, which is a highly satisfying outcome given how new the model is.
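The two evaluation statistics can be illustrated with a short sketch. The exact agreement rule is not stated here, so the sketch makes several assumptions: ratings are ordinal integers, the expert consensus for each course of action is the median expert rating, and a rater "agrees" when its rating equals that consensus. The function names and the toy ratings are hypothetical and are not data from the experiment.

```python
from statistics import median


def expert_consensus(expert_ratings):
    """Assumed consensus rule: per-item median of the experts' ratings."""
    per_item = zip(*expert_ratings.values())
    return [median(item) for item in per_item]


def agreement_count(ratings, consensus):
    """Number of items on which a rater matches the expert consensus."""
    return sum(1 for r, c in zip(ratings, consensus) if r == c)


def evaluate(raters, consensus):
    """Return agreement count, relative rank, and proportion per rater."""
    counts = {name: agreement_count(r, consensus)
              for name, r in raters.items()}
    n_items = len(consensus)
    results = {}
    for name, count in counts.items():
        # Rank 1 is best; tied raters share the same rank.
        rank = 1 + sum(1 for other in counts.values() if other > count)
        results[name] = {
            "agreements": count,
            "rank": rank,
            # Proportion relative to perfect agreement on every item.
            "proportion": count / n_items,
        }
    return results


# Toy ratings for four courses of action (not data from the experiment).
experts = {"e1": [3, 1, 2, 4], "e2": [3, 2, 2, 4], "e3": [2, 1, 2, 5]}
raters = {
    "model_a": [3, 1, 2, 4],
    "human_1": [3, 1, 1, 4],
    "random_1": [1, 1, 5, 2],
}
stats = evaluate(raters, expert_consensus(experts))
```

The rank compares a rater with the other raters, while the proportion compares a rater with perfect agreement. A rater can therefore rank first and still have a proportion well below 1 if every rater agrees with the experts only occasionally.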
The results strongly suggest that a model’s formulas and weights can quantitatively capture the criteria and sensibilities used by the human experts in assessing the evil associated with military courses of action, and that a model able to assess that evil quantitatively is viable and practical. Moreover, using a model of the type developed and tested can provide a valuable secondary benefit by improving the analyst’s understanding of a course of action’s possible consequences.