Using Machine Learning to determine the most important features in exoplanet verification
DOI:
https://doi.org/10.47611/jsrhs.v11i3.2821
Keywords:
Exoplanet, Machine Learning, Random Forests, Feature Importance, Kepler
Abstract
Over a decade ago, NASA launched the Kepler Space Telescope to find Earth-like planets orbiting Sun-like stars, in the hope of finding habitable exoplanets. The Kepler pipeline collected data on over 9000 astronomical bodies, of which 52% were determined to be false positives, while the remaining 48% were candidates to be classified as exoplanets. The data collected from this mission can be used to assess and automatically classify Kepler Objects of Interest (KOIs) as exoplanets or false positives. Our goal in this work is to determine whether some data features are more important than others in classifying an object as an exoplanet. To this end, we built 5 Machine Learning classification models (namely, logistic regression, support vector classifier, gradient boosting classifier, random forest classifier, and multilayer perceptron) and used 15 features to train and test them. We included explainable Machine Learning models to help attain this goal. Our best predictor (the random forest classifier) achieved a prediction accuracy of 99% when evaluated with k-fold cross-validation. We evaluated the feature importances of this model and found that 5 of the 15 selected features (Not Transit-like Flag, Centroid Offset Flag, Stellar Eclipse Flag, Ephemeris Match Indicates Contamination Flag, and Planetary Radius) make up roughly 75% of the overall feature importance. We hope that our findings can guide the selection of appropriate data to accurately predict exoplanet candidacy in future missions.
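The workflow summarized in the abstract (five classifiers, k-fold cross-validation, then a ranking of random forest feature importances) can be sketched roughly as follows with scikit-learn. This is an illustrative sketch, not the authors' code. It assumes a synthetic 15-feature dataset in place of the KOI table, 5 folds (the abstract does not state k), and default hyperparameters. The names feature_0 to feature_14 and the helper top_features_covering are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the KOI table (15 features, binary label).
X, y = make_classification(n_samples=600, n_features=15, n_informative=5,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# The five model families named in the abstract; hyperparameters are assumed.
# Scale-sensitive models are wrapped in a pipeline with a StandardScaler.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "support_vector_classifier": make_pipeline(StandardScaler(), SVC()),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "multilayer_perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}

# ASSUMPTION: 5 stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}


def top_features_covering(importances, names, share):
    """Return the highest-ranked features whose importances sum to >= share."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1],
                    reverse=True)
    selected, running_total = [], 0.0
    for name, importance in ranked:
        selected.append(name)
        running_total += importance
        if running_total >= share:
            break
    return selected


# Fit the random forest on all data and read its impurity-based importances.
forest = models["random_forest"].fit(X, y)
importances = forest.feature_importances_
dominant = top_features_covering(importances, feature_names, 0.75)

for name, accuracy in cv_accuracy.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
print("Features covering 75% of importance:", dominant)
```

With real KOI data, X would hold the 15 selected columns and y the exoplanet/false-positive label. The final line then lists the smallest set of top-ranked features that together reach the 75% share discussed in the abstract.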
Copyright (c) 2022 Ved Srivathsa; Rida Assaf
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.