Medical Engineering & Physics 37 (2015) 126–131

Contents lists available at ScienceDirect

Medical Engineering & Physics journal homepage: www.elsevier.com/locate/medengphy

Technical note

On feature extraction and classification in prostate cancer radiotherapy using tensor decompositions Auréline Fargeas a,b , Laurent Albera a,b,∗ , Amar Kachenoura a,b , Gaël Dréan a,b , Juan-David Ospina a,b,d , Julie Coloigner a,b , Caroline Lafond c , Jean-Bernard Delobel c , Renaud De Crevoisier a,b,c , Oscar Acosta a,b a

INSERM, U1099, Rennes F-35000, France Université de Rennes 1, LTSI, Rennes F-35000, France c Département de Radiotherapie, Centre Eugène Marquis, Rennes F-35000, France d Escuela de Estadística, Universidad Nacional de Colombia – Sede Medellín, Colombia b

a r t i c l e

i n f o

Article history: Received 30 July 2013 Received in revised form 11 August 2014 Accepted 25 August 2014 Keywords: Classification Canonical polyadic decomposition Deterministic multi-way analysis DIAG Prediction of toxicity Radiotherapy Prostate cancer

a b s t r a c t External beam radiotherapy is commonly prescribed for prostate cancer. Although new radiation techniques allow high doses to be delivered to the target, the surrounding healthy organs (rectum and bladder) may suffer from irradiation, which might produce undesirable side-effects. Hence, the understanding of the complex toxicity dose–volume effect relationships is crucial to adapt the treatment, thereby decreasing the risk of toxicity. In this paper, we introduce a novel method to classify patients at risk of presenting rectal bleeding based on a Deterministic Multi-way Analysis (DMA) of three-dimensional planned dose distributions across a population. After a non-rigid spatial alignment of the anatomies applied to the dose distributions, the proposed method seeks for two bases of vectors representing bleeding and non bleeding patients by using the Canonical Polyadic (CP) decomposition of two fourth order arrays of the planned doses. A patient is then classified according to its distance to the subspaces spanned by both bases. A total of 99 patients treated for prostate cancer were used to analyze and test the performance of the proposed approach, named CP-DMA, in a leave-one-out cross validation scheme. Results were compared with supervised (linear discriminant analysis, support vector machine, K-means, K-nearest neighbor) and unsupervised (recent principal component analysis-based algorithm, and multidimensional classification method) approaches based on the registered dose distribution. Moreover, CP-DMA was also compared with the Normal Tissue Complication Probability (NTCP) model. The CP-DMA method allowed rectal bleeding patients to be classified with good specificity and sensitivity values, outperforming the classical approaches. © 2014 IPEM. Published by Elsevier Ltd. All rights reserved.

1. Introduction and related work External beam radiotherapy is one of the standard treatment for localized prostate cancer. The main challenge in Prostate Cancer RadioTherapy (PCRT) is to deliver the prescribed dose to the clinical target (prostate and seminal vesicles) while minimizing the dose to the neighboring Organs At Risk (OAR), namely the rectum and the bladder, and thus avoiding subsequent toxicity-related events. The precision of the radiotherapy has been recently increased by new techniques such as Intensity Modulated RT (IMRT) and Image

∗ Corresponding author at: Université de Rennes 1, LTSI, Rennes F-35000, France. Tel.: +33 223235058. E-mail address: [email protected] (L. Albera). http://dx.doi.org/10.1016/j.medengphy.2014.08.009 1350-4533/© 2014 IPEM. Published by Elsevier Ltd. All rights reserved.

Guided RT (IGRT) [7], which allows the dose to be increased to the target. However, the potential secondary effects due to the delivered dose to the OAR are far from being completely explained [11]. Therefore, unraveling the underlying local dose–volume effect toxicity relationships and identifying patients at higher risk, appears as a cornerstone in further definitions of constraints for personalized IMRT planning. In case of rectal toxicity, different studies have shown a correlation between dose, volume, and secondary effects [10,23]. However, most of the proposed models have been solely based on the dose–volume histograms such as the Normal Tissue Complication Probability (NTCP) model [20,26,8], thereby loosing spatial information. These methods do not perform a formal classification exploiting the spatial characteristics of the dose distributions since they considered the organs as having homogeneous radio-sensitivity. Buettner et al. [4,3] addressed the issue of spatial

A. Fargeas et al. / Medical Engineering & Physics 37 (2015) 126–131

information loss. In [3], a classification approach based on locally connected neural network using a two-dimensional dose-surface maps was performed. In [4], they proposed a parameterized representation of the dose to describe its geometrical properties, such as the eccentricity, and its lateral and longitudinal extent which still remains approximative in terms of spatial location. New methods aimed at jointly taking advantage of the Three-Dimensional (3D) dose distributions, unraveling the subtle correlation between local dose and toxicity at a voxel level to classify patients at risk, are still to be devised. Performing classification by simultaneously exploiting the 3D signal across a population is challenging because the interindividual anatomical variability leading to a misalignment of information. To cope with this issue, non-rigid registration methods have been employed in order to map all the data to a common coordinate system where voxel analysis may be meaningful in terms of spatial localization [1,31]. Following this idea, previous classification approaches exploiting the 3D signal across a given population have been proposed. For instance, Principal Component Analysis (PCA) was used by Fripp et al. [12] to discriminate Alzheimer’s disease and normal elderly control participants based on non-rigidly registered 3D Positron Emission Tomography (PET) images. With the same objective, Higdon et al. [15] performed a comparison of different classification methods (logistic regression, Linear Discriminant Analysis (LDA) and quadratic discriminant analysis) which were used after a data reduction algorithm (PCA and Partial Least-Squares (PLS)) using FDG-PET images. A leaveone-out cross validation showed that better results could be obtained when PLS and LDA are used for data reduction and classification, respectively. Nevertheless, it appears that methods based on PLS slightly outperform PCA-based methods on diagnostic accuracy. In the context of rectal bleeding in PCRT, authors proposed in [9] to use PCA to analyze non-rigidly registered dose distributions. The authors identified one basis of orthogonal vectors from 3D dose distributions of the whole database (patients with and without rectal bleeding) allowing for classification. More precisely, the new patient to be classified is projected on subspace spanned by the more discriminant features (eigenvectors selected in the training step) that better divide the two classes. Another paper, proposed by Phan et al. [25], has considered the problem of feature extraction and classification based on orthogonal or nonnegative tensor (multi-way) decompositions and higher order (multilinear) discriminant analysis using TUCKER decompositions, whereby input data are considered as tensors instead of more conventional vector or matrix representations. The algorithms are verified on three different datasets (images of objects, handwritten digits and EEG). Hunyadi et al. [16] proposed to detect seizure using the nuclear norm regularization, that allows the conveying of the structural information of the multichannel EEG matrix. More recently, Signoretto et al. [28] developed a learning framework and extends regularization via nuclear norm to the case of higher order arrays. We introduce, in this paper, a novel method based on a Deterministic Multi-way Analysis (DMA) of 3D planned dose distributions across a population. The proposed technique aims at finding two bases of vectors from 3D non-rigidly registered dose distributions of patients with and without rectal bleeding, respectively, from a Canonical Polyadic (CP) decomposition [29,6,18] of two appropriate Fourth Order (FO) arrays. Unlike [4], it takes advantage of complete 3D information when performing decomposition without any parametrization, so that each voxel is independently considered. The uniqueness of the subspaces spanned by both computed bases is ensured by looking for rank-1 basis vectors. The orthogonality constraint imposed by PCA is then relaxed. A new patient is thus classified according to its distance to the subspaces spanned by both bases. Tests on real clinical data demonstrate an improved sensitivity and reliability for group analysis. The novel

127

method, named CP-DMA, opens the way for potential applications in IMRT planning. 2. Materials and methods The different steps of the proposed CP-DMA method are summarized in Fig. 1. After a pre-processing step, where 3D doses are non-rigidly aligned, the training of the CP-DMA consists in learning (1) (1) (2) (2) from the data two vector bases, (ε1 , . . ., ε (1) ) and (ε1 , . . ., ε (2) ), R

R

spanning two vector subspaces, E(1) and E(2) , characteristic of patients with and without toxicity, respectively. Then a new patient is classified by evaluating its distance to E(1) and E(2) . 2.1. Data and pre-processing A total of 99 patients treated for localized prostate cancer with IMRT were included in the study. The used treatment planning system was Pinnacle V7.4 (Philips Medical System, Madison, WI). The total prescribed dose was 46 Gy to the seminal vesicles delivered in 4.6 weeks, and 80 Gy to the prostate delivered in 8 weeks, with a standard fractionation of 2 Gy per fraction. The patient positioning, CT acquisition, volume delineations and dose constraints complied with GETUG 06 recommendations as described in [5]. For the rectal, the constraints were the maximal dose had to be lower than 76 Gy (measured within at least 1.8 cm3 ) and a V72 Gy1 lower than 25%. The size of the planning CT images in the axial plane was 512 × 512 pixels, with 1 mm image resolution, and 2 mm slice thickness. The median follow-up was 38 months, with a minimum of 24 months for all patients. Rectal toxicity events were prospectively collected and scored according to the Common Terminology Criteria for Adverse Events (CTCAE) version 3.0. The events were defined as rectal bleeding (≥ Grade 1), at least one episode occurring between 6 months and 2 years after RT. Patients with a history of hemorrhoids were not allowed to be scored as Grade 1 bleeding. A total of 17 patients presented at least a Grade 1 late rectal bleeding event. Patient’s planned CT and dose distributions were elastically registered with the demons algorithm [31], on a single coordinate system (template) by combining the CTs and organs delineations as explained in [1]. A representative individual was selected as a template to be used as the common coordinate system. This typical individual maximized a similarity criterion which is the sum of squared differences computed after rigid registration. In this paper, we focused on rectal bleeding and the dose received by the rectum. 2.2. Subspace identification Multi-dimensional arrays are those with more than two entries. The CP-DMA method uses two particular FO arrays in order to identify both vector subspaces E(1) and E(2) . The first (N1 × N2 × N3 × P(1) ) array, denoted by T (1) , is built by concatenating, in the fourth dimension, the 3D dose distributions of the patients with rectal bleeding. Similarly, the second (N1 × N2 × N3 × P(2) ) array, denoted by T (2) , is obtained by concatenating the 3D dose distributions of the patients without rectal bleeding. Let S(i) be the F(i) -dimensional (i) subspace spanned by the P(i) 3D dose distributions T:,:,:,p concatenated in the FO array T (i) . The dimension F(i) of S(i) is necessarily lower than P(i) . We obviously assume that the R(i) -dimensional subspace E(i) is included in (and not necessarily equal to) S(i) . Consequently, a vector basis of S(i) can be obtained by completing the (i) (i) (i) (i) (i) (i) vector basis (ε1 , . . ., ε (i) ) of E(i) : let (ε1 , . . ., ε (i) , ε (i) , . . ., ε (i) ) R

1

(volume receiving at least 72 Gy)

R

R +1

F

128

A. Fargeas et al. / Medical Engineering & Physics 37 (2015) 126–131

Fig. 1. Overview diagram of the CP-DMA method.

be this completed vector basis. Then there is a weight real matrix (i) D(i) = (Dp,r ) of size (P(i) × F(i) ) such that: F  (i)

∀i ∈ {1, 2}, ∀p ∈ {1, . . ., P }, (i)

(i) T:,:,:,p

=

(i)

(i)

Dp,f εf

(1)

f =1 (i)

(i) ) F (i) (i)

Now, we can compute the vector basis (ε1 , . . ., ε

(i) 3D dose distributions T:,:,:,p (i.e. from the FO array T (i) (i) ing for F rank-1 Third Order (TO) arrays εf . Indeed, (i)

P(i)

rank-1 assumption, the CP decomposition of T



from the ) by look-

under this

:

F (i)

∀i ∈ {1, 2},

T (i) =

(i) n(1) ,f

A

(i) n(2) ,f

B

(i) n(3) ,f

C

(i)

Dp,f

(2)

f =1

2.3. Classification of a new 3D dose distribution (i)

(i) ). F (i)

allows us to identify the vector basis (ε1 , . . ., ε

The CP decom-

position of the FO array T (i) consists in computing the minimal linear combination of FO rank-1 arrays that yields T (i) exactly and (i) more particularly the four loading matrices A(i) = (A (1) ), B(i) = n

(i) ), n(2) ,f

(B

C (i) = (C

(i) ) n(3) ,f

,f

(i)

and D(i) = (Dp,f ). This minimal number of

FO rank-1 arrays, denoted by F(i) in Eq. (2), is generally called the rank of the decomposed array. It is noteworthy that Kruskal’s lemma [19,27] provides non necessary but sufficient conditions for a CP decomposition to be unique. But in our practical context the uniqueness of the CP decomposition is not absolutely required: we just want to find one vector basis of subspaces S(1) and S(2) . Thus, (i) the F(i) basis vectors ␧f can be derived from the CP decomposition of T (i) and more particularly from the loading matrices A(i) , B(i) and C (i) as follows:

∀i ∈ {1, 2},

(i)

By contrast with the classical Alternating Least Square (ALS) technique, the DIAG algorithm belongs to the class of direct methods. Indeed, it algebraically formulates the CP problem as the combination of classical matrix decomposition problems for which efficient numerical solutions exist. For instance, DIAG resorts to one joint eigenvalue decomposition and several singular value decompositions. In practice, direct CP decomposition methods appear more robust than fully iterative techniques, especially with respect to an overestimation of the rank [21]. As opposed to the matrix case, all the tensor decomposition methods require the rank to be known, and some of these methods such as the ALS technique can be very affected by a misestimation of this parameter. Note that the estimation of the rank F(i) of the FO array T (i) will be discussed in Section 2.4.

(i)

(i)

(i)

εf = A:,f ◦ B:,f ◦ C :,f

(3)

where ◦ denotes the outer product operator and U :,f is the f-th column vector of any matrix U. In other words, the (n1 , n2 , n3 )(i) (i) (i) (i) th component of the TO array εf is computed as An ,f Bn ,f Cn ,f . 1

(i)

2

3

(i) ) is computed, we have to extract F (i) (i) εr spanning the subspace E(i) . A way

Once the vector basis (ε1 , . . ., ε

from it the R(i) basis vectors of proceeding is proposed in section 2.4. Regarding the numerical computation of the CP decomposition, several algorithms were proposed [6,18]. We used the DIAG (DIrect AlGorithm for canonical polyadic decomposition) approach [22,21].

Let’s consider a single patient represented by its 3D dose distribution G. We propose to compute its minimal euclidean distance to both subspaces E(1) and E(2) . More particularly, we compute the (i) euclidean distance dG (E(i) ) between G and its orthogonal projec(1)

tion on subspace E(i) for both values of i. If dG (E(1) ) is lower than (2)

dG (E(2) ), we will decide to allocate G to E(1) and we will deduce that the patient represented by the 3D dose distribution G belongs (1) to the rectal bleeding group. Otherwise, i.e. if dG (E(1) ) is strictly (2)

greater than dG (E(2) ), the patient is classified as not presenting rectal bleeding. (i) The computation of the dG (E(i) ) distance uses the vectorization operator vec(.) transforming a (N1 × N2 × N3 ) array U into an (N1 N2 N3 )-dimensional column vector u. More especially, the m-th component um of u is given by: um = Un1 ,n2 ,n3 ,

with m = N1 (N2 (n1 − 1) + (n2 − 1)) + n3 (i)

(4)

(i)

Now, let’s define the R(i) vectors er = vec(εr ) as the vectorized (i) versions of the TO arrays εr . Then, we have:

∀i ∈ {1, 2},

(i)

(i)

(i)

(i)

er = A:,r ⊗ B:,r ⊗ C :,r

(5)

where ⊗ denotes the Kronecker product operator. The R(i) (i) (N1 N2 N3 )-dimensional column vectors er can thus be concate(i) (i) nated in a (N1 N2 N3 × R ) matrix E as follows:

∀i ∈ {1, 2},

(i)

(i)

E (i) = [e1 , e2 , . . ., e

(i) ] R(i)

(6)

A. Fargeas et al. / Medical Engineering & Physics 37 (2015) 126–131

We deduce from Eqs. (5) and (6) that the E (i) matrix can also be computed as follows:

∀i ∈ {1, 2},

(i) :,1:R(i)

E (i) = A

B

(i) :,1:R(i)

C

(i) :,1:R(i)

(7)

where  denotes the Khatri-Rao product operator [6] and where U :,1:R is the submatrix defined by the R first column vectors of ⊥ matrix U. The orthogonal matrix projector E(i) on subspace E(i) can then be computed as follows:

∀i ∈ {1, 2},

T



E(i) = E (i) (E (i) E (i) )

−1

E (i)

T

(8)

∀i ∈ {1, 2},



(i)

dG (E(i) ) = ||g − E(i) g||

2.4. Rank estimation and basis vector extraction First, we estimate the rank F(i) of each T (i) using the scree graph technique [17]. More particularly, it is estimated by looking for a straight line in the curve representing the ordered singular values of the unfolding matrix of T (i) used in the DIAG algorithm [22,21]. The rank F(i) is automatically chosen as the number of singular values which cannot be fitted to an observed straight line. (i) Next, we compute the F(i) basis vectors εf of S(i) by canonically decomposing each T (i) as explained in Section 2.2. Then, both vector subspaces E(i) are jointly identified using the following solution: argmax (H(1) ,H(2) )⊂(S(1) ,S(2) )

(1)

Methods

Se

CP-DMA PCA [9] LDA SVM K-means

0.76 0.76 0.35 0.18 0.35 0 0 0 0.18

kNN

k=1 k=3 k=5

Sp 0.89 0.71 0.72 0.79 0.77 0.79 0.93 0.99 0.82

Patients well classified 86/99 71/99 65/99 68/99 69/99 65/99 76/99 81/99 70/99

(9)

where || . || is the euclidean norm operator and where g = vec(G) is the vectorized version of the 3D dose distribution G.

R=

Table 1 Performance comparison of unsupervised classification techniques such as CP-DMA and PCA [9] with supervised algorithms.

orth. TUCKER-3 [25]

(i)

Consequently, the dG (E(i) ) distance is given by:

129

(2)

(|dG (H(1) ) − dG (H(2) )|)

(10)

where G is the 3D planned dose distribution of the patient to be classified. It is noteworthy that this 3D planned dose distribution is acquired before the beginning of the treatment. In other words, although the choice of (E(1) , E(2) ) depends on the patient to be classified, no a posteriori information about the occurrence of rectal bleeding events of this patient is used to identify (E(1) , E(2) ). 3. Results We propose to split the sample of data into two complementary subsets: the training and testing sets. From the testing set, we can then compute the Sensitivity (Se ) and the Specificity (Sp ) of the classification procedure described in Section 2. The sensitivity assesses the probability of patients suffering from rectal bleeding who are correctly identified as rectal bleeders. Likewise, the specificity assesses the probability of patients who are correctly identified as no rectal bleeders. The Se and Sp values were computed within a leave-one-out cross-validation scheme because of the size of the cohort and the low number of positive events (i.e. 17 bleeding patients out of 99). More precisely, at each round, a 3D dose distribution G was removed from either T (1) or T (2) and used as testing sample while the 98 remaining dose distributions were used as the training set. The couple (F(1) , F(2) ) = (5, 5) was generally found from one round to another. Note that the F(1) and F(2) values were estimated by only considering the training set. Then, the Se and Sp values were computed by considering all the rounds of the leave-one-out cross-validation from the testing set. The Se and Sp values obtained for (R(1) , R(2) ) = (F(1) , F(2) ) (i.e. by (i)

using all the vectors ␧f derived from the CP decomposition of T (i) to form a vector basis of E(i) , in other words by taking E(i) = S(i) are Se = 0.29 and Sp = 0.96. The Se and Sp values obtained for (R(1) , R(2) ) = (1, 1) (i.e. by exploiting only one vector derived from the CP

decomposition of T (i) to form a vector basis of E(i) as described in Section 2.4) are Se = 0.76 and Sp = 0.89. The proposed CP-DMA method were also compared to the PCAbased unsupervised classification approach presented in [9] and several supervised classification techniques such as the LDA, Support Vector Machine (SVM) [13], K-means and K-Nearest Neighbors (KNN) algorithms, which also take the advantage of the spatial information through the use of the registered 3D planned dose distribution. All of these comparative classification techniques were performed in a leave-one-out cross-validation scheme. Concerning the structuring of the data, a vectorization of the 3D planned dose distributions was performed. Each individual’s 3D planned dose distributions can be represented as a 1D vector by vertically concatenating the transpose rows of all the slices. Thus, we obtained a (N × M) matrix X for all the individuals, where M represents the number of individuals (the variables) and N the number of voxels (the number of observations carried out). As we considered the data as a multi-way array, we also performed a comparison with one of the approaches proposed by Phan et al. [25] (orth. TUCKER3). In this approach, the data are arranged in a 4-way tensors, and a TUCKER-3 decomposition with orthogonality constraints is computed to find the set of basis matrices and corresponding features for the training data. Then, they performed a feature extraction for test samples using the basis factors found from the training data. Finally, they performed classification by comparing the test features with the training features using an SVM classifier. As for all the other comparative approaches, we performed it in a leave-one-out scheme. Results are given in Table 1. For a thorough evaluation, we also compared CP-DMA with the Lyman–Kurcher–Burman (LKB) model [20,26,8]. One of the issue for this comparison to be carried out lies in the need of large cohorts to estimate the population specific parameters. Recent paper proposed by Michalski et al. [24] identified these parameters judged to best describe the existing published data. Nevertheless, in our study, we computed the LKB parameters using the DVHs of our 99 patients and those of 202 additional patients treated by IMRT with the same protocol explained in Section 2.1. The reason is that the Quantec LKB parameters of [24] are defined for the late rectal bleeding of Grade ≥2 could not be include in our study which focused on rectal bleeding Grade ≥1 at 2 years. Besides, the total prescribed dose, the different techniques of irradiation and the used recommendations (in this study GETUG 06 recommendations) will influence the value of the LKB parameters by having an effect on the patient positioning, CT acquisition, volume delineations and dose constraints. It is noteworthy that the 3D planned dose distributions for these 202 additional patients were not available, so we could not use them to evaluate the numerical performance of CP-DMA. The Receiver Operating Characteristic curve (ROC) and the Area Under the Curve (AUC) at the output of the CP-DMA algorithm and the NTCP model using the LKB parameters given in Table 2 are shown in Fig. 2. By searching for the optimal cut-off point that more

130

A. Fargeas et al. / Medical Engineering & Physics 37 (2015) 126–131

Table 2 Parameters for LKB models TD50(Gy)

n

m

AUC

100.38

0.00038

0.16

0.56

discriminates rectal bleeding to non-rectal bleeding patients, the ROC curve of the LKB-NTCP model provided 0.59 sensitivity and 0.40 specificity which means that only 43/99 patients were correctly classified by means of the LKB-NTCP model. Finally, we highlighted characteristics of the basis vectors ε(1) and ε(2) , which represent the two groups, respectively. First, for each group (rectal bleeders and non-rectal bleeders), we measured the similarity between the different basis vectors derived from all the rounds of the leave-one out process through the Pearson correlation. We then observed a median similarity score of 0.6 and 0.9 for the rectal and non-rectal bleeding groups, respectively, versus 0.75 and 0.9, respectively, when basis vectors leading to a bad classification were not considered to compute both scores. Secondly, for each group, we computed the median of the basis vectors derived from all the rounds of the leave-one-out process, and we reshaped it as a 3D image. Obtained results suggest that spatial patterns representative of the rectal bleeding group may be identified. A dominant pattern located in the anterior wall showed up. The anterior wall is closed to the prostate with a higher irradiated dose. 4. Discussion and future work In this paper, we proposed an original method, aimed at characterizing the dose–toxicity relationships across a population of patients treated with radiotherapy for prostate cancer. Although the proposed CP-DMA method extracts group characteristics from non-rigidly registered 3D dose distributions which better explains links between patients with / without side effects after treatment, it can also be directly applied to different classification problems of non-rigidly aligned 3D voxel data. In our context, CP-DMA provides good results particularly when using an optimal combination of basis vectors to span each subspace E(i) compared to the results obtained by (R(1) , R(2) ) = (F(1) , F(2) ). Indeed, we obtained 0.76 sensitivity and 0.89 specificity. In other words, 13 rectal bleeders out of 17 and 73 non-rectal bleeders out of 86 were well classified. As shown in Table 1 and Fig. 2, CP-DMA outperforms seven other approaches, including LDA, the well-known LKB-NTCP model and a multiway analysis approach proposed by Phan et al. [25]. It is also interesting to note that our classification criterion is based on the euclidean norm and does not require any evolutive classification method.

The use of the CP decomposition in order to extract rank-1 basis vectors (features) of both subspaces of interest mainly contributes to the good classification results. The simple classification scheme that uses straightforwardly the population data T (1) and T (2) without any vector basis identification, provides weak results (Se = 0 and Sp = 1). In the same way, using others decompositions such as the Singular Value Decomposition (SVD) and HOSVD (orth. TUCKER3) leading to the identification of orthogonal basis vectors, also gives poor results (Se = 0 and Sp = 1 for the SVD approach and Se = 0.18 and Sp = 0.82 for the HOSVD technique). Identification of basis vectors seems then to be necessary, and more particularly basis vectors without orthogonality constraint and with a specific structure such as the rank-1 structure. The observed misclassifications when using CP-DMA may be due to the fact that for some patients the relationship between dose and side effect is not directly established. Indeed, there exist individual specificities related with the occurrence of toxicity such as individual radio-sensitivity [32,2], the clinical history or other factors determined by genetic and/or epigenetic mechanisms [20,26,8], that were not investigated in this paper. In addition, the proposed CP-DMA method also depends on the dose mapping, which aligns the whole population to a single coordinate system (template) to be representative of a given population. Thus, further work will be investigated to enhance: the alignment of all structures and the selection of the optimal template from the database to be representative to a given population. The good performance provided by CP-DMA with respect to the raised question, suggests however that when the dose–toxicity relationship exists an increased classification can be achieved by exploiting the 3D planned dose distributions. Although the obtained results are promising in terms of performance, results shall be confirmed with larger cohorts. Besides, future work will be focused on the inclusion of clinical factors in our classification procedure in order to increase the efficiency of the CP-DMA method. Eventually, we can wonder if the basis vectors computed by means of the CP-DMA method could highlight regions where rectal bleeding would be related to different dose patterns. A recent study [30] analyzed the segmental dose volume histogram of the rectum using a regression model. This work revealed associations between bowel quality of life and inferior rectal dose that could significantly influence radiation planning and prognostic models. In other words, the dominant pattern observed with CP-DMA is in line with the results given in [30]. Obviously, all these observations should be confirmed with a deeper statistical analysis based on a larger database which will be the object of forthcoming work. Identifying such spatial patterns is crucial to adapt the treatment taking into account individual specificities. Thus, new dose-volume constraints would be proposed for the identified sub-regions in the IMRT inverse planning system. 5. Conclusion

Fig. 2. ROC and AUC to predict 2-year grade>1 rectal bleeding for LKB-NTCP model and CP-DMA.

This preliminary study focused on the analysis of non-rigidly registered planned rectal 3D dose distributions across a population. The goal was here to classify patients at risk of suffering from rectal bleeding after prostate radiotherapy. The originality of the proposed CP-DMA approach is its jointly exploits the 3D spatial patterns of dose from patients sharing characteristics by means of a particular tensor decomposition, namely the canonical polyadic decomposition. Thus, representative features are extracted and used as inputs of a low computational classifier, which compares two Euclidean distances. The robustness of the CP-DMA approach was demonstrated by comparing its results with seven other approaches including LDA, the LKB NTCP model and a recently published method which is based on the orthogonal TUCKER3

A. Fargeas et al. / Medical Engineering & Physics 37 (2015) 126–131

(HOSVD) tensor decomposition. The obtained results shed some light on the difficult problem of understanding dose–toxicity relationships. Thus, the CP-DMA emerges as a very promising approach for feature extraction and classification in toxicity studies after prostate cancer radiotherapy and paves the way to other image processing applications. Ethical approval Since the patients have been included in a multicentric prospective randomized trial (IGRT-P, registration number = RECF044), the patients have signed an informed consent form. This trial has been granted by the French National Cancer Institute (INCa). Acknowledgments This work was supported by “Region Bretagne” and has received a French government support granted to the CominLabs excellence laboratory and managed by the National Research Agency in the “Investing for the Future” program under reference ANR-10-LABX07-01. Conflict of interests None declared. References [1] Acosta O, Drean G, Ospina JD, Simon A, Haigron P, Lafond C, et al. Voxel-based population analysis for correlating local dose and rectal toxicity in prostate cancer radiotherapy. Phys Med Biol 2013;58(March (8)):2581–95. [2] Bentzen SM, Overgaard J. Patient-to-patient variability in the expression of radiation-induced normal tissue injury. Sem Radiat Oncol 1994;4(2):68–80. [3] Buettner F, Gulliford SL, Webb S, Partridge M. Using dose-surface maps to predict radiation-induced rectal bleeding: a neural network approach. Phys Med Biol 2009;54(17):5139. [4] Buettner F, Gulliford SL, Webb S, Partridge M. Modeling late rectal toxicities based on a parameterized representation of the 3D dose distribution. Phys Med Biol 2011;56(7):2103. [5] Beckendorf V, Guerif S, Le Pris E, Cosset J-M, Bougnoux A, Chauvet B, et al. 70 Gy versus 80 Gy in localized prostate cancer: 5-year results of getug 06 randomized trial. Int J Radiat Oncol Biol Phys 2011;80(July (4)):1056–63. [6] Comon P, Luciani X, de Almeida ALF. Tensor decompositions, alternating least squares and other thales. J Chemom April 2009;23. [7] de Crevoisier R, Pommier P, Bachaud J, Crehange G, Boutry C, Chauvet B, et al. Image-guided radiation therapy (IGRT) in prostate cancer: preliminary results in prostate registration and acute toxicity of a randomized study. Int J Radiat Oncol Biol Phys 2009;75(3 Suppl.):S99. [8] Defraene G, Van den Bergh L, Al-Mamgani A, Haustermans K, Heemsbergen W, Van den Heuvel F, et al. The benefits of including clinical factors in rectal normal tissue complication probability modeling after radiotherapy for prostate cancer. Int J Radiat Oncol Biol Phys 2012;82(3):1233–42. [9] Fargeas A, Kachenoura A, Acosta O, Albera L, Drean G, De Crevoisier R. Feature extraction and classification for rectal bleeding in prostate cancer radiotherapy: a PCA based method. IRBM 2013;34(4–5):296–9.

131

[10] Fiorino C, Cozzarinib C, Vavassoric V, Sanguinetid G, Bianchie C, Mauro Cattaneoa G, et al. Relationships between DVHs and late rectal bleeding after radiotherapy for prostate cancer: analysis of a large group of patients pooled from three institutions. Radiother Oncol 2002;64(July (1)):1–12. [11] Fiorino C, Rancati T, Valdagni R. Predictive models of toxicity in external radiotherapy: dosimetric issues. Cancer 2009 Jul;115(13 Suppl.):3135–40. [12] Fripp J, Bourgeata P, Acosta O, Ranigaa P, Modata MK, Piked E, et al. Appearance modeling of 11C PiB PET images: characterizing amyloid deposition in Alzheimer’s disease mild, cognitive impairment and healthy aging. Neuroimage 2008;43(3):430. [13] Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000;16(10):906–14. [15] Higdon R, Foster N, Koeppe R, DeCarli C, Jagust W, Clark C, et al. A comparison of classification methods for differentiating fronto-temporal dementia from alzheimer’s disease using FDG-PET imaging. Stat Med 2004;3(2): 315–26. [16] Hunyadi B, Signoretto M, Van Paesschen WJ, Suykens AK, Van Huffel S, De Vos M. Incorporating structural information from the multichannel EEG improves patient-specific seizure detection. Clin Neurophysiol 2012;123(12):2352–61. [17] Jolliffe IT. Principal component analysis. 2nd ed. Springer Series in Statistics; 2005. [18] Karfoul A, Albera L, Lathauwer LD. Iterative methods for the canonical decomposition of multi-way arrays: application to blind underdetermined mixture identification. Elsevier Signal Process 2011;91(8):1789–802. [19] Kruskal JB. Three-way arrays: rank and uniqueness of trilinear decompositions. Linear Algebra Appl 1977;18:95–138. [20] Liu M, Moiseenko V, Agranovich A, Karvat A, Kwan W, Saleh ZH, et al. Normal tissue complication probability (NTCP) modeling of late rectal bleeding following external beam radiotherapy for prostate cancer: a test of the QUANTECrecommended NTCP model. Acta Oncol 2010;49(7):1040–4. [21] Luciani X, Albera L. Canonical Polyadic Decomposition based on joint eigenvalue decomposition. Chemom Int Lab Sys 2014;132:152–67. [22] Luciani X, Albera L. Semi-algebraic canonical decomposition of multi-way arrays and joint eigenvalues decomposition. IEEE ICASSP 2011:4104–7. [23] Marzi S, Arcangeli G, Saracino B, Petrongari MG, Bruzzaniti V, Iaccarino G, et al. Relationships between rectal wall dose–volume constraints and radiobiologic indices of toxicity for patients with prostate cancer. Int J Radiat Oncol Biol Phys 2007;May (68)(1):41–9. [24] Michalski JM, Gay H, Jackson A, Tucker SL, Deasy JO. Radiation dose–volume effects in radiation-induced rectal injury. Int J Radiat Oncol Biol Phys 2010;76(3):S123–9. [25] Phan AH, Cichocki A. Tensor decompositions for feature extraction and classification of high dimensional datasets. Nonlinear Theory Appl, IEICE 2010;1(1):37–68. [26] Rancati T, Fiorino C, Fellin G, Vavassori V, Cagna E, Casanova Borca V, et al. Inclusion of clinical risk factors into NTCP modelling of late rectal toxicity after high dose radiotherapy for prostate cancer. Radiat Oncol 2011;100(1):124–30. [27] Sidiropoulos ND, Bro R. On the uniqueness of the multilinear decomposition of n-way arrays. J Chemom 2000;14:229–39. [28] Signoretto M, Dinh QT, De Lathauwer LJ, Suykens AK. Learning with tensors: a framework based on convex optimization and spectral regularization. Mach Learn 2013:1–49. [29] Smilde A, Geladi RBP. Multi-way analysis. England: John Wiley & Sons Ltd.; 2004. [30] Stenmark MH, Conlon ASC, Johnson S, Daignault S, Litzenberg D, et al. Dose to the inferior rectum is strongly associated with patient reported bowel quality of life after radiation therapy for prostate cancer. Radiother Oncol 2014;110(2):291–7. [31] Thirion J. Image matching as a diffusion process: an analogy with Maxwell’s demons. Med Image Anal 1998;2(3):243–60. [32] Turesson I, Nyman J, Holmberg E, Odén A. Prognostic factors for acute and late skin reactions in radiotherapy patients. Int J Radiat Oncol Biol Phys 1996;36:1065–75.

On feature extraction and classification in prostate cancer radiotherapy using tensor decompositions.

External beam radiotherapy is commonly prescribed for prostate cancer. Although new radiation techniques allow high doses to be delivered to the targe...
834KB Sizes 0 Downloads 8 Views