HOW TO CHOOSE A GOOD NUMBER OF COMPONENTS
In the section above I explained PCA, but I did not explain how to choose the number of components. If we choose the wrong number, we either throw away too much of the data's variance or keep more dimensions than we need, and that can lead to a poorly trained model.
Let's say we have our array and do not know how many components will be useful. In that case we use the explained variance ratio: fit PCA with all components and look at the fraction of the total variance that each component captures.
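import numpy as np
from sklearn.decomposition import PCA
# Fit PCA with all components kept (n_components is not set)
pca = PCA()
new_feature = pca.fit_transform(array)
np.var(new_feature)  # variance over all transformed values (not used below)
# Fraction of the total variance captured by each component
pca.explained_variance_ratio_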
Return
array([0.92461872, 0.05306648, 0.01710261, 0.00521218])
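To make the ratio above less of a black box, here is a minimal sketch of where such numbers come from, using NumPy only. The toy data, the seed, and the variable names are assumptions made for illustration; they are not the array used in this post. The idea is that each ratio is one eigenvalue of the covariance matrix divided by the sum of all eigenvalues.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features with very different spreads.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# Eigenvalues of the covariance matrix are the variances along each principal axis.
cov = np.cov(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending, so reverse it

# Each ratio is one component's share of the total variance.
ratio = eigenvalues / eigenvalues.sum()
print(ratio)
```

The ratios always sum to 1 and come out sorted from largest to smallest, which is why a cumulative sum over them is meaningful.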
Next we take the cumulative sum, which makes the result easier to read and to plot:
# Running total of the variance explained by the first k components
cumsum = np.cumsum(pca.explained_variance_ratio_)
cumsum
Return
array([0.92461872, 0.97768521, 0.99478782, 1. ])
Then we plot it:
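import matplotlib.pyplot as plt
# The x-axis is the zero-based index, so x = 1 means 2 components
plt.plot(cumsum)
plt.axhline(y=0.97, c='r', linestyle='-')  # 97% variance threshold
plt.grid(True)
plt.show()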
[Figure: cumulative explained variance per component, with a red horizontal line at 0.97]
After that we find the smallest number of components whose cumulative explained variance is above 97% of the original data's variance:
d = np.argmax(cumsum > 0.97) + 1
print(d)
Return
2
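One caveat with the argmax step above: if no entry of the boolean array is True, np.argmax returns 0, so the expression would quietly report 1 component. A minimal sketch of a helper that guards against this is shown below. The function name components_for_variance is hypothetical, and the ratios are the ones printed earlier in this post.

```python
import numpy as np

def components_for_variance(ratios, threshold):
    """Smallest number of components whose cumulative ratio exceeds threshold."""
    cumulative = np.cumsum(ratios)
    reached = cumulative > threshold
    if not reached.any():
        # Without this check, argmax on an all-False array would return 0.
        raise ValueError("threshold is never exceeded by the cumulative sum")
    return int(np.argmax(reached)) + 1

# Ratios copied from the explained_variance_ratio_ output shown earlier.
ratios = [0.92461872, 0.05306648, 0.01710261, 0.00521218]
for t in (0.90, 0.97, 0.99):
    print(t, components_for_variance(ratios, t))
```

With these ratios the helper gives 1 component for 0.90, 2 for 0.97, and 3 for 0.99.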
That is how we choose n_components = 2: it keeps more than 97% of the variance while cutting the features from four to two, which keeps the model efficient.
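As a shortcut, scikit-learn's PCA also accepts a float between 0 and 1 for n_components and then keeps just enough components to exceed that fraction of the variance. The sketch below compares that shortcut with the manual route from this post. The random data is a hypothetical stand-in for `array`, and svd_solver="full" is set explicitly as a cautious assumption, since the float form is documented for the full solver.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for `array`: 150 samples, 4 features with unequal spread.
rng = np.random.default_rng(42)
array = rng.normal(size=(150, 4)) * np.array([4.0, 1.0, 0.5, 0.25])

# Manual route: fit all components, then take the first count that exceeds 97%.
full = PCA().fit(array)
d = int(np.argmax(np.cumsum(full.explained_variance_ratio_) > 0.97)) + 1

# Shortcut: a float in (0, 1) asks PCA to keep just enough components.
pca = PCA(n_components=0.97, svd_solver="full")
reduced = pca.fit_transform(array)

print(d, pca.n_components_, reduced.shape)
```

Both routes should agree on the number of components. The manual route is still useful when you want to see the whole curve before committing to a threshold.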