Given Y = VA + Z, where Z is noise, how can the unknown dimension of V and A (i.e. the number of sources) be estimated optimally, without overfitting? This is a 50-year-old challenge for the popular PCA model (e.g. factor analysis, dimensionality reduction, etc.).
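A minimal sketch of the model, assuming i.i.d. real Gaussian V, A and Z (an illustrative choice, not necessarily the authors' simulation setup). The function name `simulate_observations` and the per-entry SNR definition are hypothetical. The unknown dimension is the number of columns of V (rows of A); in the sample-covariance spectrum it typically shows up as that many eigenvalues standing above a flat noise floor.

```python
import numpy as np

def simulate_observations(n_sensors, n_sources, n_snapshots, snr_db, seed=0):
    """Draw Y = V A + Z with i.i.d. standard Gaussian V and A (illustrative choice).

    Shapes: V is (n_sensors, n_sources), A is (n_sources, n_snapshots),
    Z and Y are (n_sensors, n_snapshots). The noise variance is chosen so that
    the average power of V A over that of Z equals snr_db (in dB).
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n_sensors, n_sources))
    A = rng.standard_normal((n_sources, n_snapshots))
    signal = V @ A
    noise_power = np.mean(signal ** 2) / 10 ** (snr_db / 10)
    Z = np.sqrt(noise_power) * rng.standard_normal((n_sensors, n_snapshots))
    return signal + Z, V, A, Z

# The unknown "dimension" is n_sources, the number of columns of V (rows of A).
Y, V, A, Z = simulate_observations(n_sensors=8, n_sources=3, n_snapshots=500, snr_db=0.0)
# Sample-covariance eigenvalues in descending order.
eigvals = np.linalg.eigvalsh(Y @ Y.T / Y.shape[1])[::-1]
print(Y.shape, np.round(eigvals, 2))
```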
For the first time, I have found a closed-form solution to this challenge via the maximum-a-posteriori (MAP) estimate in a Bayesian framework (i.e. the estimation is fast, with linear complexity). To solve this problem, I ended up deriving completely new probability distributions (namely the Double-gamma and Double-inverse-gamma distributions), given in the Appendix.
In simulations, we found that SNR = -10 dB is the limit of accurate (i.e. non-overfitting) estimation for independent sources.
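By the central limit theorem, averaged random variables are approximately Gaussian, so three standard deviations is effectively their limit (about 99.7% of the probability mass lies within three standard deviations of the mean).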
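One way to check the three-standard-deviation statement, assuming it refers to the Gaussian "three-sigma rule": the exact Gaussian coverage, plus a small Monte Carlo experiment showing that averages of (here) uniform variables obey it. Both functions and the uniform-averaging experiment are hypothetical illustrations, not part of the original code.

```python
import math
import random

def gaussian_coverage(k):
    """P(|X - mu| < k * sigma) for a Gaussian X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

def clt_three_sigma_fraction(m=30, trials=20000, seed=1):
    """Fraction of averages of m Uniform(0, 1) draws lying within 3 std of the mean."""
    rng = random.Random(seed)
    mu = 0.5
    sigma = math.sqrt(1.0 / 12.0 / m)  # std of the average of m Uniform(0, 1) variables
    inside = 0
    for _ in range(trials):
        avg = sum(rng.random() for _ in range(m)) / m
        inside += abs(avg - mu) < 3 * sigma
    return inside / trials

for k in (1, 2, 3):
    print(k, round(gaussian_coverage(k), 4))  # 1 0.6827 / 2 0.9545 / 3 0.9973
print(clt_three_sigma_fraction())  # close to 0.9973
```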
Hence, this SNR limit can be estimated from the data Y via \tau(Y), the noise's percentage of the signal-plus-noise power (i.e. SNR > -10 dB <=> \tau(Y) < 90%), which means the limit of non-overfitting for independent sources is:
SNR > -10 dB <=> "noise's deviation < 3 * source's deviation" (more precisely, √10 ≈ 3.16 times)
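The dB arithmetic behind these equivalences can be sketched as follows. It assumes that \tau(Y) is the noise's share of the total (signal-plus-noise) power and that SNR is a ratio of powers (variances); the function names are hypothetical.

```python
import math

def noise_share(snr_db):
    """Assumed meaning of tau: noise power / (signal power + noise power)."""
    return 1.0 / (1.0 + 10 ** (snr_db / 10))

def snr_db_from_noise_share(tau):
    """Inverse of noise_share: SNR in dB = 10 log10(signal power / noise power)."""
    return 10 * math.log10((1.0 - tau) / tau)

def noise_to_source_deviation_ratio(snr_db):
    """sigma_noise / sigma_source; SNR is a power ratio, so deviations scale by 20 log10."""
    return 10 ** (-snr_db / 20)

print(round(noise_share(-10.0), 4))                      # 0.9091
print(round(noise_to_source_deviation_ratio(-10.0), 4))  # 3.1623
print(round(snr_db_from_noise_share(0.9), 2))            # -9.54
```

Under these assumptions, the thresholds 90% and 3 are rounded values of 1/1.1 ≈ 90.9% and √10 ≈ 3.16, and a threshold of exactly \tau = 90% corresponds to SNR ≈ -9.54 dB.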
P.S.: We compared our MAP method with standard MATLAB routines (music and aictest). Everything should be clear in the code. All feedback is very welcome!
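For readers without MATLAB, a Python sketch of the classical Wax–Kailath information-theoretic criteria (AIC and MDL) for the number of sources, i.e. the kind of baseline that aictest provides. It is an assumption that aictest follows exactly these formulas, and this is not the authors' MAP estimator. The function names `information_criteria` and `complex_gaussian`, and the circular complex Gaussian simulation, are hypothetical choices for illustration.

```python
import numpy as np

def information_criteria(Y):
    """Return (k_aic, k_mdl), the Wax-Kailath AIC and MDL source-number estimates.

    Y: (p, N) array of p sensors by N snapshots, assumed circular complex Gaussian.
    """
    p, N = Y.shape
    R = Y @ Y.conj().T / N                   # sample covariance (Hermitian)
    lam = np.linalg.eigvalsh(R)[::-1]        # real eigenvalues, descending
    lam = np.clip(lam, 1e-12, None)          # guard log() against round-off
    aic, mdl = [], []
    for k in range(p):                       # candidate number of sources
        tail = lam[k:]                       # the p - k presumed noise eigenvalues
        # log(geometric mean / arithmetic mean) <= 0, with equality iff the tail is flat
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        fit = -N * (p - k) * log_ratio       # negative log-likelihood term (up to constants)
        dof = k * (2 * p - k)                # free parameters for k sources
        aic.append(2 * fit + 2 * dof)
        mdl.append(fit + 0.5 * dof * np.log(N))
    return int(np.argmin(aic)), int(np.argmin(mdl))

def complex_gaussian(rng, shape, power=1.0):
    """Circular complex Gaussian samples with E|x|^2 = power."""
    return np.sqrt(power / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

rng = np.random.default_rng(0)
p, K, N, snr_db = 8, 3, 1000, 10.0
signal = complex_gaussian(rng, (p, K)) @ complex_gaussian(rng, (K, N))
noise_power = np.mean(np.abs(signal) ** 2) / 10 ** (snr_db / 10)
Y = signal + complex_gaussian(rng, (p, N), power=noise_power)
k_aic, k_mdl = information_criteria(Y)
print(k_aic, k_mdl)  # MDL should recover K = 3 here; AIC can occasionally overestimate
```

MDL is consistent as the number of snapshots grows, whereas AIC is known to overestimate the number of sources with non-vanishing probability.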
V. H. Tran and W. Wang, "Bayesian inference for PCA and MUSIC algorithms with unknown number of sources," submitted to IEEE Transactions on Signal Processing, 2018. https://arxiv.org/abs/1809.10168
V. H. Tran, W. Wang, Y. Luo and J. Chambers, "Bayesian Inference for Multi-Line Spectra in Linear Sensor Array," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. https://ieeexplore.ieee.org/document/8461844