[Paper Review] Identifications of RR Lyrae stars and Quasars from the simulated data of Mephisto-W Survey
This study proposes a Random Forest Classifier (RFC) framework to identify RR Lyrae stars and quasars from simulated multi-band (uvgriz) light curves of the Mephisto-W Survey. Using light curves from SDSS Stripe 82 and simulated observing cadences, the method achieves 95.4% purity and 96.9% completeness for RR Lyrae stars, and 91.4% purity and 90.2% completeness for quasars, demonstrating high accuracy with minimal data requirements.
We have investigated the feasibility and accuracy of identifying RR Lyrae stars and quasars from the simulated data of the Multi-channel Photometric Survey Telescope (Mephisto) W Survey. Based on the variable-source light-curve libraries from the Sloan Digital Sky Survey (SDSS) Stripe 82 data and the observation history simulation from the Mephisto-W Survey Scheduler, we have simulated the $uvgriz$ multi-band light curves of RR Lyrae stars, quasars and other variable sources for the first year of observations of the Mephisto-W Survey. We have applied the ensemble machine learning algorithm Random Forest Classifier (RFC) to identify RR Lyrae stars and quasars. We build training and test samples, extract ~ 150 features from the simulated light curves, and train two RFCs, one for RR Lyrae star classification and one for quasar classification. We find that our RFCs are able to select the RR Lyrae stars and quasars with remarkably high precision and completeness, with $purity$ = 95.4 per cent and $completeness$ = 96.9 per cent for the RR Lyrae RFC and $purity$ = 91.4 per cent and $completeness$ = 90.2 per cent for the quasar RFC. We have also derived the relative importance of the extracted features used to classify RR Lyrae stars and quasars.
Motivation & Objective
- To evaluate the feasibility and accuracy of identifying RR Lyrae stars and quasars in the Mephisto-W Survey's first-year data.
- To develop a robust machine learning pipeline for classifying variable stars and quasars from sparse, multi-band photometric light curves.
- To enable efficient, large-scale source identification in time-domain surveys with limited observations.
- To support key science goals of the Mephisto-W Survey, including Galactic archaeology and cosmological studies.
Proposed method
- Simulated uvgriz multi-band light curves for RR Lyrae stars, quasars, and other variables using SDSS Stripe 82 light curves and the Mephisto-W Survey Scheduler’s cadence model.
- Transformed SDSS ugriz magnitudes to SkyMapper-like uvgriz magnitudes using linear calibration relations derived from cross-matching with SMSS DR2.
- Applied phase-folding and linear interpolation to simulate periodic light curves for RR Lyrae stars based on their known periods and phases.
- For non-periodic quasars, selected five random SDSS observations within a year and re-timed them to 2022 calendar dates to simulate realistic observation epochs.
- Injected realistic Gaussian photometric noise based on observing conditions to produce 'observed' light curves.
- Trained two separate Random Forest Classifiers (RFCs) on roughly 150 features extracted from the simulated light curves, using 483 RR Lyrae stars and 9,107 quasars as positive samples, each paired with an equal-sized negative sample.
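The periodic-source steps in the list above (phase-folding, linear interpolation, and Gaussian noise injection) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the template arrays, and the single constant `sigma` are all hypothetical. The paper derives its noise from the observing conditions, which is replaced here by one fixed standard deviation.

```python
import bisect
import random


def fold_phase(t, period, t0=0.0):
    """Return the phase in [0, 1) of time t for a given period and reference epoch t0."""
    return ((t - t0) / period) % 1.0


def interp_periodic(phase, tmpl_phase, tmpl_mag):
    """Linearly interpolate a template sorted by phase, wrapping around at phase 1.

    Assumes tmpl_phase is strictly increasing with all values in [0, 1).
    """
    # Pad the template with one point on each side so that interpolation
    # works across the phase 0/1 boundary.
    xs = [tmpl_phase[-1] - 1.0] + list(tmpl_phase) + [tmpl_phase[0] + 1.0]
    ys = [tmpl_mag[-1]] + list(tmpl_mag) + [tmpl_mag[0]]
    i = bisect.bisect_right(xs, phase)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (phase - x0) / (x1 - x0)


def simulate_light_curve(epochs, period, tmpl_phase, tmpl_mag, sigma, rng=None):
    """Sample a periodic template at the given epochs and add Gaussian noise.

    sigma is a hypothetical constant magnitude error (an assumption of this sketch).
    """
    rng = rng or random.Random()
    curve = []
    for t in epochs:
        true_mag = interp_periodic(fold_phase(t, period), tmpl_phase, tmpl_mag)
        curve.append((t, true_mag + rng.gauss(0.0, sigma)))
    return curve
```

Calling `simulate_light_curve` once per band with the scheduler's epochs for that band would give one "observed" multi-band light curve; the quasar case, which re-times real SDSS epochs instead of folding a template, is not covered by this sketch.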
Experimental results
Research questions
- RQ1: Can machine learning effectively identify RR Lyrae stars from sparse, multi-band light curves in the Mephisto-W Survey?
- RQ2: What is the classification accuracy and completeness of RFCs for quasars in the simulated Mephisto-W data?
- RQ3: How do feature importance rankings differ between RR Lyrae star and quasar classification?
- RQ4: To what extent is the RFC method robust to label noise in training data?
- RQ5: Can this method be efficiently applied to large-scale time-domain surveys with limited observation epochs?
Key findings
- The RR Lyrae star RFC achieved a purity of 95.4% and completeness of 96.9%, indicating high-precision and high-efficiency identification.
- The quasar RFC achieved a purity of 91.4% and completeness of 90.2%, demonstrating strong performance on quasar classification.
- The RFC method was robust to label noise: under 5% label noise, the error increased by only 0.04% for the RR Lyrae classifier and 0.4% for the quasar classifier.
- The most important features for RR Lyrae classification were g−r and mrcolor, while u−g and g−r were most influential for quasar identification.
- The method requires only four to five observations per field per year, making it efficient and suitable for wide-area surveys with limited telescope time.
- The framework is transferable to other time-domain surveys such as ZTF, WFST, LSST, and CSST, enabling scalable source classification.
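For readers who want to reproduce the purity and completeness figures quoted above on their own classifier output, the following is a minimal sketch. It assumes the usual definitions, purity = TP / (TP + FP) and completeness = TP / (TP + FN), which the review does not spell out; the function name and the label encoding (1 for the target class, 0 otherwise) are hypothetical.

```python
def purity_completeness(y_true, y_pred, positive=1):
    """Return (purity, completeness) for the class labelled `positive`.

    purity       = TP / (TP + FP), the fraction of selected sources that are genuine
    completeness = TP / (TP + FN), the fraction of genuine sources that are selected
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Return NaN instead of raising when a denominator is zero.
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    return purity, completeness


# Toy example: 4 true RR Lyrae stars, 3 sources selected, 2 of them genuine.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
purity, completeness = purity_completeness(y_true, y_pred)
```

In the toy example the selection is two-thirds pure and 50% complete, which shows why both numbers need to be reported: a classifier can score well on one while doing poorly on the other.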
This review was created by AI and reviewed by human editors.