We applied a machine learning approach to explore tissue morphology in hematoxylin and eosin (H&E)-stained breast cancer tissue microarray (TMA) samples. We then investigated whether the resulting morphological categories were associated with clinically relevant molecular biomarkers and 10-year overall survival.
The data set comprises digitized (0.22 µm/pixel) H&E-stained TMA spots from tumor samples of 490 women diagnosed with primary breast cancer and included in a Finnish breast cancer database (FinProg) collected in 1991 and 1992. To describe the tissue morphologies of the TMA spots quantitatively, we divided the tissue images into square sub-images (224 × 224 pixels) and extracted features from each with a pre-trained convolutional neural network. We then embedded the sub-images (n=147,266) with a non-linear data embedding algorithm that creates a two-dimensional map of the tissue morphologies. Lastly, we defined a quantitative profile for each tumor, describing the morphologies within its tissue spot image, by dividing the two-dimensional map of morphologies into 128 separate clusters with k-nearest neighbor clustering.
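The tiling and profiling steps can be illustrated with a minimal sketch. It assumes non-overlapping tiles that discard partial tiles at the image border, and it assumes the per-tumor profile is the fraction of a tumor's sub-images assigned to each of the 128 clusters; neither detail is stated in the abstract. The function names and the example inputs are hypothetical, and feature extraction and embedding are left out.

```python
from collections import Counter

TILE = 224        # sub-image side length in pixels (from the text)
N_CLUSTERS = 128  # number of morphology clusters (from the text)


def tile_origins(width, height, tile=TILE):
    """Top-left (x, y) corners of non-overlapping tiles that fit fully in the image.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def morphology_profile(cluster_ids, n_clusters=N_CLUSTERS):
    """Fraction of one tumor's sub-images that fall in each cluster.

    Assumption: the profile is a normalized histogram of cluster labels.
    """
    if not cluster_ids:
        raise ValueError("tumor has no sub-images")
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


# Hypothetical inputs: a 1000 x 700 pixel spot and made-up cluster labels.
origins = tile_origins(1000, 700)
profile = morphology_profile([3, 3, 17, 127])
```

A fixed-length profile of this kind lets tumors with different numbers of sub-images be compared directly, which is what makes the association analyses with biomarkers and survival possible.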
Visual inspection of the two-dimensional embedding of tissue spot images verified that the morphologies clustered coherently, i.e., similar-looking sub-images formed distinct clusters in the map. Some morphological patterns were strongly associated with tumor estrogen receptor content, progesterone receptor content, human epidermal growth factor receptor 2 status, and the proliferation marker Ki-67 status (p<0.0001 for each comparison). In exploratory analyses, we identified one morphological category that was associated with favorable 10-year overall survival, with a risk ratio of 0.68 (95% CI 0.53-0.89, p=0.002, power = 0.87).
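For readers unfamiliar with the statistic, a risk ratio and its confidence interval can be computed from a 2 × 2 table as sketched below. The abstract does not state how its interval was obtained, so the Wald interval on the log scale used here is an assumption, as are the function name and the counts, which are purely illustrative and not the study's data.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald interval on the log scale.

    z=1.96 corresponds to a two-sided 95% confidence level.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts, not the study's data:
# 20 deaths among 100 patients in group A, 40 among 100 in group B.
rr, lower, upper = risk_ratio_ci(20, 100, 40, 100)
```

A risk ratio below 1 whose interval excludes 1, as reported above, indicates a lower risk of death in the group carrying the morphological category.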
Our work demonstrates that unsupervised machine learning can be applied to explore and better understand the role of morphological patterns in breast cancer. Methods that quantitatively assess the morphology of cancer tissue may complement molecular biomarkers and potentially reveal novel prognostic and predictive factors.