We propose a novel pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data. The model copes with a key challenge of this task: the heterogeneous appearance of the colon caused by several types of disease. It is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data, using video-level labels instead of detailed frame-by-frame annotation. Spatial and temporal features are obtained through ResNet50 and residual long short-term memory (residual LSTM) blocks, respectively. In addition, the learned temporal attention module indicates the importance of each frame to the final label prediction. We also developed a self-supervision method to maximize the distance between classes of pathologies. We demonstrate through qualitative and quantitative experiments that our weakly supervised model achieves superior precision and F1-score, reaching 61.6% and 55.1%, respectively, compared with three state-of-the-art video analysis methods. We also show that the model can temporally localize frames with pathologies without any frame annotation information during training. Furthermore, we collected and annotated the first and largest VCE dataset with only video labels. The dataset contains 455 short video segments with 28,304 frames and 14 classes of colorectal diseases and artifacts. The dataset and code supporting this publication will be made available on our home page.
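To make the attention-based multiple instance learning idea concrete, the following is a minimal sketch of temporal attention pooling over per-frame features followed by multi-label prediction. It is not the PS-DeVCEM implementation: the function names, the tanh scoring form, the dimensions, and the random weights are all assumptions chosen for illustration, and the per-frame features stand in for what a ResNet50 and residual LSTM pipeline might produce.

```python
import math
import random


def attention_mil_pool(frames, V, w):
    """Pool per-frame feature vectors into one video-level vector.

    frames: list of T feature vectors, each of length D
    V:      D x H projection matrix (hypothetical learned parameter)
    w:      length-H scoring vector (hypothetical learned parameter)
    Returns (video_embedding, attention_weights).
    """
    # Score each frame: s_t = w^T tanh(V^T h_t)
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(h[d] * V[d][j] for d in range(len(h))))
                  for j in range(len(w))]
        scores.append(sum(wj * hj for wj, hj in zip(w, hidden)))

    # Softmax over time (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Video embedding is the attention-weighted sum of frame features
    dim = len(frames[0])
    embedding = [sum(a * h[d] for a, h in zip(weights, frames))
                 for d in range(dim)]
    return embedding, weights


def predict_labels(embedding, W_cls, b_cls):
    """Multi-label head: one independent sigmoid per class."""
    probs = []
    for c in range(len(b_cls)):
        logit = b_cls[c] + sum(embedding[d] * W_cls[d][c]
                               for d in range(len(embedding)))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy example with random stand-in features and weights (all sizes assumed,
# except the 14 classes, which matches the dataset description).
random.seed(0)
T, D, H, C = 16, 32, 8, 14
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
V = [[random.gauss(0, 0.1) for _ in range(H)] for _ in range(D)]
w = [random.gauss(0, 0.1) for _ in range(H)]
W_cls = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(D)]
b_cls = [0.0] * C

embedding, attn = attention_mil_pool(frames, V, w)
probs = predict_labels(embedding, W_cls, b_cls)

# The frame with the largest attention weight is the one this sketch would
# flag as most relevant to the video-level prediction.
top_frame = max(range(T), key=lambda t: attn[t])
```

Because only the video-level output `probs` would be compared against labels during training, the attention weights `attn` are learned without frame annotations; inspecting them afterwards is one way such a model could localize pathological frames in time.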