29 Apr 2021
Owing to the widespread use of tools and the development of text processing techniques, clinical data are no longer limited in size or range to structured data. The rapid growth of recorded information has led to big data platforms in healthcare that can be used to improve patients' primary care and to serve various secondary purposes. Patient similarity assessment is one such secondary task: it identifies patients who are similar to a given patient and helps derive insights from their records to provide better treatment. This type of assessment is based on calculating the distance between patients. Because representing patients and calculating their similarity play an essential role in many secondary uses of electronic records, this article examines a new data representation method for Electronic Medical Records (EMRs) that takes the information in clinical narratives into account when computing similarity. Some previous works rely only on structured data, while others use only unstructured data. However, a comprehensive representation of the information contained in an EMR requires the effective aggregation of both. To address the limitations of previous methods, we propose a method that captures the co-occurrence of different medical events, including signs, symptoms, and diseases extracted from unstructured data, together with events from structured data. It integrates these data as discriminative features to construct a temporal tree, taking into account the difference between events that have short-term and long-term impacts. Our results show that considering signs, symptoms, and diseases in every time interval leads to lower mean squared error (MSE) and higher precision than baseline representations that either ignore this information or consider it separately from structured data.
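To make the idea of distance-based patient similarity over co-occurring events concrete, the following is a minimal sketch and not the method proposed in this article. It assumes each patient is a list of (day, event) pairs in which events from structured fields and from clinical narratives have already been merged, counts pairs of events that fall in the same time interval, and compares patients with cosine distance. The function names, the 30-day interval, the event labels, and the choice of cosine distance are all illustrative assumptions; the temporal tree itself is not modeled here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_features(events, interval_days=30):
    """Count event pairs that occur within the same time interval.

    events: list of (day, event_label) pairs, with day counted from a
    common reference point such as admission (an assumption of this sketch).
    """
    by_interval = {}
    for day, label in events:
        by_interval.setdefault(day // interval_days, set()).add(label)

    features = Counter()
    for labels in by_interval.values():
        # Sort so that each unordered pair always maps to the same key.
        for a, b in combinations(sorted(labels), 2):
            features[(a, b)] += 1
    return features


def cosine_distance(f1, f2):
    """Cosine distance between two sparse count vectors (0 = same direction)."""
    dot = sum(v * f2.get(k, 0) for k, v in f1.items())
    n1 = sqrt(sum(v * v for v in f1.values()))
    n2 = sqrt(sum(v * v for v in f2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0  # no co-occurrences recorded: treat as maximally distant
    return 1.0 - dot / (n1 * n2)


# Hypothetical patients; labels are illustrative only.
patient_a = [(0, "fever"), (2, "cough"), (40, "pneumonia")]
patient_b = [(1, "cough"), (3, "fever"), (35, "pneumonia")]
patient_c = [(0, "diabetes"), (5, "hypertension")]

dist_ab = cosine_distance(cooccurrence_features(patient_a),
                          cooccurrence_features(patient_b))
dist_ac = cosine_distance(cooccurrence_features(patient_a),
                          cooccurrence_features(patient_c))
```

In this toy example, patients A and B share the same co-occurring pair in the first interval and so end up closer to each other than A is to C. A fuller representation would also need to distinguish events with short-term impact from those with long-term impact, which this sketch does not attempt.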