At the beginning of the 21st century, the brain, its functioning and its pathologies remain largely a mystery. Yet brain dysfunction can have a tremendous impact on a person's health, as the brain is the center of the nervous system. Among the frequently seen problems are cerebral bleeds, or subarachnoid hemorrhages (SAH), caused by the rupture of an aneurysm. An aneurysm is a weakness in the wall of an arterial blood vessel that leads to local dilation of the vessel. If the aneurysm ruptures, the blood supply to the downstream cortical region is interrupted, with potential brain damage or even death as a consequence. Aneurysms usually cause no symptoms unless they rupture and lead to SAH. Quite often, aneurysms are detected incidentally in images acquired for other reasons, whether Computed Tomography (CT) or Magnetic Resonance Imaging (MRI). Sometimes an aneurysm swells enough to compress nearby structures, which produces symptoms and leads to its detection.
In Chile, subarachnoid hemorrhage has been found to be the fourth most common cause of cerebrovascular disease, which translates into an estimated 700 new cases each year (
There is a need to understand the mechanisms leading to the appearance of aneurysms, to their growth and, more importantly, to their rupture. Many efforts have been made, each exploring one direction at a time: broadly speaking, either biochemical or biomechanical factors (
Aneurysms with an irregular or multilobulated shape
Aneurysms with higher aspect ratios (height to neck)
Anomalies in the vascular tree around the circle of Willis
Larger aneurysm sizes
Hemodynamic factors (flow impingement, pressure, wall shear stress)
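As an illustration of the morphological factors listed above, the aspect ratio can be computed from two measurements of the aneurysm. The following is a minimal Python sketch; the function name and the example measurements are hypothetical and are not taken from the study data.

```python
def aspect_ratio(height_mm: float, neck_width_mm: float) -> float:
    """Aspect ratio: aneurysm height divided by neck width (dimensionless)."""
    if height_mm <= 0 or neck_width_mm <= 0:
        raise ValueError("height and neck width must be positive")
    return height_mm / neck_width_mm


# Hypothetical measurements, for illustration only.
print(aspect_ratio(6.0, 4.0))
```

Both measurements must be expressed in the same unit, since the ratio itself is dimensionless.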
Some known risk factors derived from the patient's clinical history are, according to (
Direct blood relatives with SAH in their health histories
History of previous SAH
Polycystic kidney disease
Fibromuscular dysplasia
Some connective tissue disorders
Smoking habits
Arterial hypertension
The question of which factors, and in which combination, take part in aneurysm rupture becomes relevant in the context of the complex decision of which therapy to apply. The options are: therapeutic abstention (no intervention); surgical intervention to clip the aneurysm; or endovascular intervention to treat the aneurysm with coils. Each intervention bears its own risks and requires a complex set of resources (staff, operating rooms, etc.) in hospitals, especially public hospitals, which are always under a very high workload. In short, the decision is not easy and must be taken with care, so additional information would help to support it. Up to now, decisions are essentially based on the patient’s clinical condition and age and on the aneurysm location and size, besides considering the patient’s choice (
Machine learning is an interdisciplinary field combining computer science and mathematics to develop models with the intent of delivering maximal predictive accuracy. This is done by detecting patterns in an incomplete set of examples composed of past data, which makes it a data-driven discovery process. The quality of the predictions of machine learning algorithms relies mainly on the number of samples and on the quality (amount of information contained) of the variables used to describe each sample of the phenomenon (
With medical data, there are several practical, technical and ethical obstacles to acquiring large numbers of examples, compared with other research fields such as information retrieval, where millions of examples are freely available. Therefore, it is of utmost importance to extract the maximum amount of information from each example. This is why an exhaustive feature extraction phase is carried out.
Several machine learning techniques have been used in medicine in the past to diagnose patients (
Depending on the problem, different kinds of features are needed. Nowadays, features learned automatically from raw data are increasingly used, as deep learning approaches have been very successful. However, the interpretation of those features remains an open problem, and a clear understanding of the role of each feature is precisely what could give deeper insight into the causes and risk factors of aneurysms.
Up to now, feature extraction methods based on angiographic image analysis have proceeded as follows. The first step is the creation of the vascular geometry: from the angiographic images, a three-dimensional triangulated surface is obtained using an automatic segmentation method based on geodesic active regions in combination with an image standardization technique (
A possible contribution to the state of the art would be a better aneurysm neck detection method. Also, it has been noted in (
Once the features are available to feed a machine learning algorithm, the predicted rupture risk can be computed. However, to understand the underlying principles involved in the rupture process, having a large number of features is counterproductive, since most features may not be involved in the process. The process of finding the most relevant features is known as feature selection.
At present, the causes of aneurysm formation and rupture are unknown. Therefore, even if the feature selection process only yields a set of statistically important features without any further description, it may give some hints about which variables are correlated with rupture, either independently or dependently (that is, only when other variables fall within a certain range of values).
The identification and combined use of relevant variables extracted from clinical, demographic, environmental and medical imaging data sources will improve the estimation of the intracranial aneurysm rupture risk with respect to the method currently in practice.
To identify relevant variables that may help in the process of predicting the risk of intracranial aneurysm rupture using machine learning and image processing techniques based on structured and non-structured data from multiple sources.
Collection and storage of data from multiple sources
Feature extraction from multiple sources: medical images; demographic, environmental and epidemic information; and the patient history record.
Use every feature available to build a rupture risk prediction model, taking into account difficulties such as missing data, error cost, imbalanced classes and the use of features in different feature spaces.
Identify the relevant features for the prediction model and their respective correlations, that is, which sets of variables are correlated and how relevant they are in the model.
The research methodology will be experimental. This study corresponds to an analytic cross-sectional study with prospective and retrospective characteristics, focused on evaluating new solutions to problems. Two main phases are distinguished. The first is an exploratory phase in which the problem is studied in search of relevant questions about the system. The second phase will attempt to answer these questions with carefully designed experiments.
In this project, the object of study is the rupture risk of aneurysms and its relationship with the observable variables.
The steps involved in the methodology are:
Up-to-date review and study of the state of the art.
Question identification and hypothesis proposal.
Design and implement a method to test the hypothesis.
Design, implement and run adequate experiments.
Analyze, discuss and document the results.
Additional practices that cut across the whole study include testing every piece of software within a Test-Driven Development framework to avoid small errors, and using a Concurrent Versions System to be able to reproduce past results and to have better diagnostic tools in case software bugs appear. Moreover, not only the experimental results should be documented, but also the experimental setup (data, parameters, software revision) and the software itself.
We will use publicly available weather data together with clinical and demographic data and the intracranial aneurysm images obtained from the angiograph of the Hospital Carlos van Buren of Valparaíso.
The time to accomplish the proposal is 3 years. The project is divided into four phases, closely related to the specific objectives:
Phase 1 – Data collection and storage (March 2017 – July 2017)
Phase 2 – Feature extraction from multiple sources (May 2017 – November 2018)
Phase 3 – Build a model describing aneurysm rupture through the combination of the extracted features (November 2018 – September 2019)
Phase 4 – Identify relevant variables in the rupture process (September 2019 – February 2020)
The following is a brief summary of the phases’ activities (described in greater depth in the Proposal description section) with their respective scheduling.
At the beginning, the first task is to confirm that our prior conception of the information that is hypothesized to be relevant to estimate the risk of rupture is exhaustive. This information will be used to guide the data collection and feature extraction processes.
The first step will focus on data collection from the Hospital Carlos van Buren (HCvB). It is important to mention that this project is embedded in a work team at the HCvB: Dr. Pablo Cox and Dr. Rodrigo Riveros, interventional neuroradiologists, and RT Maximiliano Godoy. HCvB is the public hospital of the Valparaíso region and a reference center for neurology and neurosurgery in Chile.
The proposal is to undertake a retrospective and prospective study that includes patients from 2014 to 2016 retrospectively and from June 2017 to December 2019 prospectively, with the approval of the local Ethical Committee. The inclusion criterion will be a confirmed diagnosis of cerebral aneurysm. Patients will be enrolled through the Angiography Department of the hospital by the interventional neuroradiologists of this project. No change will be made to the way patients are diagnosed or treated; only their data will be included in the present study. According to the registries from previous years, the Angiography Department of HCvB receives 100 patients per year with a diagnosis of cerebral aneurysm, 83% of them ruptured. Specifically, from 2014 to 2015, 238 patients with aneurysms were diagnosed in the Angiography Department of the HCvB. In the prospective period of 2.5 years, we expect to include an additional 250 patients.
The data to be collected are:
Angiographic images (already in DICOM format)
Reports: the radiologist's diagnostic report and the radiologist's intervention report (already in digital format)
Health records: these data are already accessible in a custom-built digital database in the Angiography Department of HCvB, based on FileMaker, developed and administered by RT Maximiliano Godoy
Patient demographic information. Age, gender, weight, height, BMI, city of residence (to be related to epidemic data and weather data)
Diagnosis (ICD-10) and comorbidities
Clinical and treatment information: Glasgow score, drugs used (which ones, at which concentrations, and which were suspended), allergies, number of punctures, etc.
Laboratory analysis, in particular PCR, among others
Epidemic data, from the Ministry of Health (Departamento de Estadísticas e Información de Salud, DEIS, http://www.deis.cl): in particular, the registry of seasonal variation (syncytial virus and others).
Weather data, in combination with the epidemic information. The “Centro de Ciencia del Clima y la Resiliencia” (http://www.cr2.cl) released a tool in 2016 to access the Chilean historical weather data, including variables such as mean, maximum and minimum temperatures and precipitation across the country, from at least 229 weather stations. In this step, all the information available since 2014 must be obtained so that it can later be used jointly with the aneurysm rupture dates.
The inclusion of epidemic and weather data is motivated by recent observations by neuroradiologists that there seems to be a peak of aneurysm ruptures related to seasonal fluctuations (syncytial virus, for instance). The underlying hypothesis is that rupture might be influenced by mechanisms of inflammation, in combination with hemodynamic and biomechanical stresses exerted on the arterial wall.
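To make the joint use of weather records and rupture dates more concrete, the sketch below summarises the daily mean temperature over the days preceding a rupture. It is only an illustration under stated assumptions: the dictionary of daily values stands in for whatever format the weather data are finally stored in, and the function name, the window length and the sample values are hypothetical.

```python
from datetime import date, timedelta


def temperature_features(daily_mean_temp, rupture_date, window_days=3):
    """Summarise the mean temperature in the days leading up to a rupture.

    daily_mean_temp maps a date to the mean temperature (deg C) of one station.
    Returns None when any day of the window is missing from the records.
    """
    # Days from (rupture_date - window_days) up to the rupture date, in order.
    days = [rupture_date - timedelta(days=k) for k in range(window_days, -1, -1)]
    if any(d not in daily_mean_temp for d in days):
        return None
    temps = [daily_mean_temp[d] for d in days]
    return {
        "mean": sum(temps) / len(temps),
        # Temperature on the rupture day minus temperature at the window start.
        "change": temps[-1] - temps[0],
        # Largest day-to-day decrease (negative if temperatures only rose).
        "max_daily_drop": max(a - b for a, b in zip(temps, temps[1:])),
    }


# Hypothetical station records, for illustration only.
records = {
    date(2016, 7, 1): 12.0,
    date(2016, 7, 2): 10.0,
    date(2016, 7, 3): 9.0,
    date(2016, 7, 4): 5.0,
}
print(temperature_features(records, date(2016, 7, 4)))
```

Returning None for incomplete windows keeps missing weather records explicit, so that they can be handled later together with the other missing values.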
The
The initial computational framework implementation (client-side) will also be considered for the retrieval and the processing of the data.
Write a report to summarize the aneurysm rupture information related to each resource and a detailed data description from each source for future reference.
The initial problems detected in the feature extraction stage of the image characterization problem will be further explored, such as robust aneurysm neck detection, fast and robust three-dimensional mesh computation and new feature engineering to describe the aneurysm, since features like the non-sphericity index have been shown to have a significant correlation with the aneurysm rupture (
The first step, after briefly exploring the existing problems and their solutions, will be the initial build of a flexible and modular computational framework for image processing. It should integrate external libraries easily, so that existing software solutions such as ITK (Insight Segmentation and Registration Toolkit) and VTK (The Visualization Toolkit) can be used whenever possible, avoiding the time-consuming re-implementation of algorithms. Naturally, the integration of different external solutions will be done as needed, but it is important to lay a robust foundation to allow this to happen seamlessly.
Considering that, the methodology steps will be applied to each of the following tasks:
Initial computational framework implementation for image processing
Robust aneurysm neck detection
Vascular tree description
Fast and robust three-dimensional mesh computation
Morphological aneurysm description
Additional feature consideration
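As an example of a morphological descriptor, the non-sphericity index mentioned above is commonly defined from the aneurysm volume V and surface area S as NSI = 1 - (18π)^(1/3) V^(2/3) / S, which is zero for a hemisphere and grows as the shape departs from it. The proposal does not state the formula, so this definition should be read as an assumption; the function name and test values below are hypothetical.

```python
import math


def non_sphericity_index(volume: float, surface_area: float) -> float:
    """NSI = 1 - (18*pi)^(1/3) * V^(2/3) / S (assumed definition, 0 for a hemisphere)."""
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface area must be positive")
    return 1.0 - (18.0 * math.pi) ** (1.0 / 3.0) * volume ** (2.0 / 3.0) / surface_area


# Sanity check on a hemisphere of radius r (dome surface only, neck plane excluded).
r = 2.0
hemisphere_volume = (2.0 / 3.0) * math.pi * r ** 3
hemisphere_area = 2.0 * math.pi * r ** 2
print(abs(non_sphericity_index(hemisphere_volume, hemisphere_area)) < 1e-12)
```

In practice, V and S would be computed from the triangulated aneurysm surface once the neck has been detected, which is why robust neck detection directly affects this feature.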
Additionally, a first paper will be written to be published in a conference of level A or A/B in the CORE system or in a journal (April 2019).
The rest of the information is composed of a plethora of different features, such as time series (weather and epidemic data), categorical (smoking habit) and numerical information (age, height). To take advantage of this diversity, it is important to carefully extract relevant features, so they provide additional information.
The following tasks are identified for which the proposed methodology will be applied:
Extraction of relevant features from weather data, such as humidity, temperature and their changes over short time intervals at specific locations
Epidemic data feature extraction
Health records information extraction
Dimension reduction by finding correlated variables
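One simple way to approach the last task is to discard features that are almost perfectly correlated with a feature already kept. The sketch below uses the Pearson coefficient, which only captures linear relationships, and a greedy filter; the threshold, the function names and the sample values are hypothetical and serve only as an illustration, not as the method the project will adopt.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear correlation
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.95):
    """Keep features in insertion order, skipping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table, for illustration only.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_m": [1.5, 1.6, 1.7, 1.8],
    "age": [70.0, 45.0, 62.0, 38.0],
}
print(drop_correlated(features))
```

Because the filter is greedy, the result depends on the order in which the features are listed; a feature placed earlier is preferred over a redundant one placed later.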
Using all the extracted features, the aneurysm rupture risk will be modeled using a data-driven approach. It will be very important to take into account the different kinds of features and the missing values, as it has been shown that these can have a very significant impact on the classification process (
Specifically, the following tasks have been identified:
Missing value estimation (November 2018 – January 2019)
Different feature space combination (February 2019 – April 2019)
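As a baseline for the missing value estimation task, a numerical variable can be filled with the median of its observed values while keeping a flag that records which entries were missing. This is only a minimal sketch of one possible baseline, not the estimation method the project will settle on; the function name and the sample values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None by the median of the observed values.

    Returns the imputed list and a parallel list of 'was missing' flags,
    so that a classifier can still use the fact that a value was absent.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Hypothetical patient weights in kg, for illustration only.
weights, flags = impute_median([70.0, None, 82.0, 64.0])
print(weights, flags)
```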
To make a good estimation, it is important to use or modify a classifier that is able to take into account the risks of each decision and give a confidence on its estimation.
Quantification and handling of the impact of imbalanced classes (May 2019 – June 2019)
Error cost inclusion (June 2019 – July 2019)
Classifier confidence estimation (July 2019 – August 2019)
Rupture risk prediction (August 2019 – September 2019)
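Two of the tasks above, class imbalance and error cost, can be illustrated with standard textbook formulas. The sketch below computes class weights inversely proportional to class frequency and the probability threshold that minimises the expected cost of a decision. It assumes a classifier that outputs calibrated rupture probabilities and zero cost for correct decisions; the cost values and function names are hypothetical, while the 83% proportion of ruptured cases is the figure reported for the HCvB registries.

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def decision_threshold(cost_false_positive, cost_false_negative):
    """Probability above which predicting 'ruptured' has the lower expected cost.

    Predicting 'ruptured' costs (1 - p) * C_FP on average and predicting
    'unruptured' costs p * C_FN, so the former is preferred when
    p >= C_FP / (C_FP + C_FN).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def predict_rupture(probability, cost_false_positive=1.0, cost_false_negative=5.0):
    """Cost-sensitive decision; the default costs are hypothetical."""
    return probability >= decision_threshold(cost_false_positive, cost_false_negative)


# 83 ruptured (1) and 17 unruptured (0) cases, mirroring the reported proportion.
labels = [1] * 83 + [0] * 17
print(balanced_class_weights(labels))
print(predict_rupture(0.2), predict_rupture(0.1))
```

With these example costs, a missed rupture is treated as five times worse than a false alarm, which lowers the decision threshold from 0.5 to 1/6.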
The experimental results and the insights acquired will be published as an article in an ISI-indexed journal of the area (April 2019 – September 2019).
After obtaining a good classification accuracy compared to the existing literature, the objective is to identify the subset of the most relevant features and their correlations that could explain the rupture of cerebral aneurysms. The tasks to achieve this objective are:
Find linear and nonlinear correlations between different sets of variables and the predicted aneurysm rupture risk. This should take into account variables from the same source and from multiple sources (September 2019 – November 2019)
Reduce the number of features used to a minimum while maintaining the classifier’s predictive precision (December 2019 – January 2020)
The experimental results and the insights acquired will be published as an article in an ISI-indexed journal of the area (October 2019 – February 2020)
The sponsoring institution is executing the PMI (
Physical space, access to the university library, an internet connection and subscriptions to relevant journals are also available.
2017 Postdoctoral Grant
Applying machine learning and image feature extraction techniques to the problem of cerebral aneurysm rupture
Universidad de Valparaiso, Valparaiso, Chile
Hospital Carlos van Buren, Valparaiso, Chile
The proposal is to undertake a retrospective and prospective study, to include patients from 2014 to 2016 retrospectively and from June 2017 up to December 2019 prospectively, counting on the approval by the local Ethical Committee.
The authors have no conflicts of interest to report.