Preprint / Version 1

Transfer Learning of Histology Slides Improved CNN Performance on Lung Cancer by Pretraining on Colon Cancer


  • Wesley Hu



Transfer Learning, Cancer, CNN Performance


A fully automated digital pathology workflow would save 49.4% of the time spent on a histology case study, and automating that workflow requires trained machine learning models. A major factor limiting the training of such models is the lack of large datasets, because datasets in the medical field are private and confidential. To address this problem, I used transfer learning to share knowledge between two similar oncology datasets of histology images, one of colon cancer and one of lung cancer. Transfer learning reduces the risk of overfitting by exposing the model to more than one dataset. In this study, I examined how transfer learning between the colon cancer and lung cancer datasets affects model performance at different dataset sizes. Remarkably, a model pretrained on colon cancer images and then trained on 3,750 lung cancer images outperformed a model trained from scratch on twice as many images. The lowest validation loss the scratch model achieved was about 0.35, while the transfer learning model reached about 0.125, a loss roughly 2.8 times lower (a reduction of about 64%). On extremely small datasets (1,000 colon cancer images and 1,500 lung cancer images), transfer learning showed no performance improvement and in some cases degraded performance. All models trained on the extremely small datasets overfit, regardless of whether they were pretrained.
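The size of the gain can be checked directly from the two approximate validation losses reported in the abstract. The short Python sketch below is only an arithmetic illustration; the function name `compare_losses` is hypothetical and not part of the study. It computes both the ratio of the scratch loss to the transfer learning loss and the relative reduction, which are two different ways of expressing the same comparison.

```python
def compare_losses(scratch_loss: float, transfer_loss: float) -> tuple[float, float]:
    """Return (ratio, relative_reduction) for two validation losses.

    ratio: how many times larger the scratch loss is than the transfer loss.
    relative_reduction: fraction of the scratch loss removed by transfer learning.
    """
    ratio = scratch_loss / transfer_loss
    relative_reduction = (scratch_loss - transfer_loss) / scratch_loss
    return ratio, relative_reduction


# Approximate lowest validation losses quoted in the abstract.
ratio, reduction = compare_losses(0.35, 0.125)
print(f"scratch/transfer ratio: {ratio:.1f}x")    # prints "scratch/transfer ratio: 2.8x"
print(f"relative reduction: {reduction:.0%}")     # prints "relative reduction: 64%"
```

Stating the result as a 2.8-fold lower loss, or a reduction of about 64%, avoids describing it as a percentage "improvement" above 100%, since a loss cannot be reduced by more than 100% of its value.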