In the medical literature, the C-statistic is commonly used to summarize classifier performance, though it is often misinterpreted.

The C-statistic is as much a function of the population on which it is evaluated as it is of the algorithm itself. Suppose I have a congestive heart failure (CHF) disease-identification algorithm with a C-statistic of 0.7. For argument's sake, let's say I really want to improve that to 0.8 so that I can publish it. To accomplish this, I could simply choose an "easier" population on which to run my algorithm. To make this concrete, suppose I first run the algorithm against the Medicine population and get a C-statistic of 0.7. Then I add the Psych ward and Ob-Gyn populations, where CHF is virtually non-existent. As long as my model appropriately places these new patients at the bottom of the ranked list, I will show a remarkable increase in the C-statistic. It's not that the algorithm got better; it's that the population got "easier" with the addition of "obvious negatives."

This highlights an important property of the C-statistic: it tends to look better when the incidence of the "in-class" condition is low. This was particularly evident in some previous work I did in fraud detection. Imagine trying to find 100 fraud cases in a set of 1 million transactions, and someone gives you an algorithm guaranteeing that all of the fraud cases will fall within the top 10% most risky results. Such an algorithm is nearly useless in a practical setting; no bank wants to contact 100,000 customers to prevent 100 cases of fraud. Yet because every fraud case is ranked above at least 90% of the legitimate transactions, the C-statistic for this algorithm will be greater than 0.9.

The algorithm described above has little practical utility because its PPV (a.k.a. precision) is very low: flagging the top 10% yields roughly 100 true fraud cases out of 100,000 alerts, a PPV of about 0.001. In most practical settings, PPV is the relevant quantity to trade off against sensitivity. For this reason, at PCCI we often plot curves of sensitivity vs. PPV when assessing our classifiers, in addition to traditional ROC curves of sensitivity vs. (1 − specificity). Operationally, when we alert someone to a potential medical problem, we want to know how often we expect to be right and how often we might be raising a false alarm.
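To make the fraud-detection arithmetic concrete, here is a minimal sketch in Python (assuming only NumPy; the synthetic score distributions are illustrative assumptions, not the actual algorithm or any PCCI code). It places 100 "fraud" cases among 1 million transactions, scores them so every fraud case lands in the top 10% of risk, and then computes the C-statistic and the PPV at a 10% alert rate.

```python
# A minimal, self-contained sketch of the fraud example above (synthetic data;
# the score distributions are illustrative assumptions, not a real model).
import numpy as np

rng = np.random.default_rng(0)

n_neg, n_pos = 999_900, 100   # 100 fraud cases among 1,000,000 transactions
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

# Legitimate transactions score uniformly on [0, 1); fraud cases score on
# [0.9, 1), so every fraud case lands among the top ~10% "most risky".
scores = np.concatenate([rng.uniform(0.0, 1.0, n_neg),
                         rng.uniform(0.9, 1.0, n_pos)])

def c_statistic(y_true, y_score):
    """C-statistic (ROC AUC) via the Mann-Whitney rank formula."""
    ranks = np.empty(len(y_score))
    ranks[np.argsort(y_score)] = np.arange(1, len(y_score) + 1)  # rank 1 = lowest score
    pos = y_true == 1
    n_p, n_n = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_p * (n_p + 1) / 2) / (n_p * n_n)

print("C-statistic:", round(c_statistic(y, scores), 3))   # ~0.95, i.e. "looks great"

# Operationally: raise an alert on the top 10% most risky transactions.
n_alerts = len(scores) // 10
alerted = np.argsort(scores)[-n_alerts:]    # indices of the 100,000 highest scores
ppv = y[alerted].mean()                     # fraction of alerts that are real fraud
sensitivity = y[alerted].sum() / n_pos      # fraction of fraud cases caught
print(f"alerts: {n_alerts}   PPV: {ppv:.4f}   sensitivity: {sensitivity:.2f}")
# ~100,000 alerts, PPV ~ 0.001, sensitivity ~ 1.0: a high C-statistic with little practical value.
```

With these synthetic scores, the C-statistic comes out around 0.95 while the PPV at a 10% alert rate is roughly 0.001, which is exactly the gap described above.

–written by Brian Lucena, PhD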