I am trying to build a simple sklearn pipeline with a StandardScaler for the numerical values and a OneHotEncoder for the categorical ones. As shown here, I use the CastTransformer to reduce discrepancies ...
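For reference, a minimal sketch of this kind of pipeline could look like the following. The toy data, the column names (`age`, `income`, `city`) and the `LogisticRegression` final step are assumptions made for illustration only, and the CastTransformer step is left out to keep the example limited to scikit-learn and pandas.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numerical columns and one categorical column.
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "income": [40.0, 55.0, 80.0, 62.0],
    "city": ["A", "B", "A", "C"],
})
y = [0, 1, 1, 0]

# Scale the numerical columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# The classifier is only a placeholder for whatever model is actually used.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(df, y)

# Apply every step except the final estimator to inspect the features.
transformed = model[:-1].transform(df)
print(transformed.shape)
```

The transformed matrix has one column per numerical feature plus one column per category seen during fit.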
I have a dataset with a huge number of duplicates, so to speed up the learning process I prefer to remove these duplicates and pass sample_weight=duplicate_counts to the fit(..) method of ...
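A rough sketch of the deduplication idea, assuming the data sits in a pandas DataFrame and the estimator accepts `sample_weight` in `fit`. The toy data, the column names and the choice of `LogisticRegression` are hypothetical, since the excerpt does not say which estimator is involved.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data with many repeated rows.
df = pd.DataFrame({
    "x1": [0, 0, 0, 1, 1, 1, 1, 0],
    "x2": [0, 0, 0, 1, 1, 1, 1, 1],
    "y":  [0, 0, 0, 1, 1, 1, 1, 1],
})
feature_cols = ["x1", "x2"]

# Collapse identical (features, target) rows and count how often each occurs.
dedup = (
    df.groupby(feature_cols + ["y"])
      .size()
      .reset_index(name="duplicate_counts")
)

# Fit on the unique rows, weighting each one by its number of occurrences.
weighted = LogisticRegression()
weighted.fit(
    dedup[feature_cols],
    dedup["y"],
    sample_weight=dedup["duplicate_counts"].to_numpy(),
)

# For comparison: fit on the full data set with the duplicates left in.
full = LogisticRegression()
full.fit(df[feature_cols], df["y"])

# Largest absolute difference between the two sets of coefficients.
print(np.abs(weighted.coef_ - full.coef_).max())
```

If the estimator is the last step of a `Pipeline`, the weights are usually passed to the pipeline's `fit` with the step name as a prefix, for example `clf__sample_weight=...` for a step named `clf`. Note that earlier transformer steps such as a scaler will not see those weights unless they are given their own prefixed `sample_weight` argument.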