Neural networks trained to classify images do so by identifying features that allow them to distinguish between classes. These feature sets are either causal or context dependent. Grad-CAM is a popular method for visualizing both sets of features. In this paper, we formalize this feature divide and provide a methodology to extract causal features from Grad-CAM. We do so by defining context features as those features that allow contrast between the predicted class and any contrast class. We then apply a set-theoretic approach to separate causal from contrast features on COVID-19 CT scans. We show that, on average, the image regions containing the proposed causal features require 15% fewer bits when encoded using Huffman coding than Grad-CAM regions, while providing an average 3% increase in classification accuracy over Grad-CAM. Moreover, we validate the transferability of causal features between networks and comment on the non-human-interpretable causal nature of current networks.
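The set-theoretic separation described above can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's exact algorithm: it assumes Grad-CAM and contrastive maps are given as 2-D activation arrays, treats pixels above a threshold as set members, and defines causal features as the Grad-CAM set minus the union of all contrastive (context) sets. The function name, threshold, and array conventions are illustrative assumptions.

```python
import numpy as np

def causal_from_gradcam(gradcam_map, contrast_maps, threshold=0.5):
    """Hypothetical set-theoretic split of causal vs. context features.

    gradcam_map: 2-D array of Grad-CAM activations for the predicted class.
    contrast_maps: iterable of 2-D contrastive maps (one per contrast class).
    Returns a boolean mask of the assumed causal regions.
    """
    # Binarize the predicted-class Grad-CAM map into a pixel set.
    causal = gradcam_map >= threshold
    # Union of all contrastive regions, i.e. the context feature set.
    context = np.zeros_like(causal, dtype=bool)
    for cmap in contrast_maps:
        context |= (cmap >= threshold)
    # Causal features = Grad-CAM set minus the context set.
    return causal & ~context
```

Under this sketch, the causal mask is by construction a subset of the Grad-CAM mask, which is consistent with the abstract's claim that causal regions are cheaper to encode than full Grad-CAM regions.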