How do I create test and train samples from one dataframe with pandas?
I have a fairly large dataset in a pandas DataFrame, and I'd like to split it into two random samples (80% and 20%) for training and testing.
I would just use NumPy to draw a random boolean mask:
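    In [1]: import numpy as np
    In [2]: import pandas as pd
    In [3]: df = pd.DataFrame(np.random.randn(100, 2))
    In [4]: msk = np.random.rand(len(df)) < 0.8  # True for roughly 80% of the rows
    In [5]: train = df[msk]
    In [6]: test = df[~msk]  # ~ inverts the mask, selecting the remaining rows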
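If you need the same split every time you run the script, the mask can be drawn from a seeded generator instead. This is a minimal sketch assuming NumPy 1.17 or later (for `np.random.default_rng`); the seed `42` is an arbitrary choice, and the technique is the same boolean-mask split as above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the split is reproducible (42 is an arbitrary seed).
rng = np.random.default_rng(42)

df = pd.DataFrame(rng.standard_normal((100, 2)))

# Each row goes to train with probability 0.8, otherwise to test.
msk = rng.random(len(df)) < 0.8
train = df[msk]
test = df[~msk]

# Every row ends up in exactly one of the two frames.
print(len(train), len(test))
```

Re-running with the same seed gives the same rows in `train` and `test`.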
And to check that it worked:
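    In [7]: len(test)
    Out[7]: 21

    In [8]: len(train)
    Out[8]: 79

Because each row lands in `train` independently with probability 0.8, the sizes are only approximately 80/20 and will vary from run to run.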
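If you need exactly 80% of the rows in the training set, a different technique (not the mask approach above) is pandas' `DataFrame.sample` followed by `DataFrame.drop`. A minimal sketch, assuming the DataFrame has a unique index (with duplicate labels, `drop` would remove extra rows, so call `reset_index(drop=True)` first); `random_state=0` is an arbitrary seed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 2))

# Draw exactly 80% of the rows without replacement; random_state makes it repeatable.
train = df.sample(frac=0.8, random_state=0)

# Everything not sampled becomes the test set (relies on a unique index).
test = df.drop(train.index)

print(len(train), len(test))  # 80 20
```

Note that `sample` also shuffles the row order of `train`, whereas `test` keeps the original order.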