What rules does Pandas use to generate a view vs a copy?

I'm confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.

If I have, for example,

    df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))

I understand that a query returns a copy so that something like

    foo = df.query('2 < index <= 5')
    foo.loc[:,'E'] = 40

will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as

    df.iloc[3] = 70

or

    df.ix[1,'B':'E'] = 222

will change df. But I'm lost when it comes to more complicated cases. For example,

    df[df.C <= df.B]  = 7654321

changes df, but

    df[df.C <= df.B].ix[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I'm just missing? What's going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I'm attempting to do in the last example above)?

Note: This is not the same as this question; and I have read the documentation, but am not enlightened by it. I've also read through the "Related" questions on this topic, but I'm still missing the simple rule Pandas is using, and how I'd apply it to, for example, modify the values (or a subset of values) in a dataframe that satisfy a particular query.

Here are the rules; each later rule overrides the earlier ones:

  • All operations generate a copy

  • If inplace=True is provided, it will modify in-place; only some operations support this

  • An indexer that sets, e.g. .loc/.ix/.iloc/.iat/.at, will set in place.

  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The .query example from above will always return a copy, as it's evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.
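As a rough illustration of the first rules, here is a minimal sketch using a deterministic frame in place of the random one from the question. It uses .loc rather than .ix, since .ix was removed in pandas 1.0, and the explicit .copy() on the query result is my addition to keep the example free of warnings.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame in the question.
df = pd.DataFrame(np.arange(64, dtype=float).reshape(8, 8),
                  columns=list("ABCDEFGH"), index=range(1, 9))

# Setting through an indexer writes into df itself.
df.loc[1, "B":"E"] = 222.0   # label-based: row 1, columns B through E
df.iloc[3] = 70.0            # position-based: fourth row (label 4)

# query() hands back a new object, so changing it leaves df alone.
# The .copy() is an assumption added here to make that intent explicit.
foo = df.query("2 < index <= 5").copy()
foo["E"] = 40.0

print(df.loc[3, "E"])   # still the original value for row 3
print(foo["E"].tolist())
```

Rows 3 and 5 of df keep their original E values, while every row of foo has E set to 40.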

Your example of chained indexing

    df[df.C <= df.B].ix[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

    df.ix[df.C <= df.B, 'B':'E']

as this is faster and will always work.
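To answer the "how do I change the values" part concretely, here is a small sketch of the single-indexer form used for assignment. The frame and its values are made up for illustration, and .loc stands in for .ix, which was removed in pandas 1.0.

```python
import pandas as pd

# Small illustrative frame; the values are hypothetical.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [5, 1, 7, 2],
    "C": [3, 4, 6, 9],
    "D": [0, 0, 0, 0],
    "E": [1, 1, 1, 1],
})

# Boolean row mask: True where C <= B (rows 0 and 2 here).
mask = df["C"] <= df["B"]

# One indexer call selects rows and columns together, then sets in place.
df.loc[mask, "B":"E"] = 7654321

print(df)
```

Only columns B through E of the matching rows change; column A and the non-matching rows are left as they were.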

The chained indexing is 2 separate Python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
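A sketch of why the chained form fails for assignment, with the two calls written out separately. The frame is hypothetical, the name `selected` is mine, and the warning filter is only there because some pandas versions emit SettingWithCopyWarning at the second step.

```python
import warnings
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"B": [5, 1, 7], "C": [3, 4, 6], "E": [0, 0, 0]})
mask = df["C"] <= df["B"]   # True, False, True

# Call 1: boolean-mask selection builds a new DataFrame.
selected = df[mask]

# Call 2: the assignment lands on that temporary object, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence a possible SettingWithCopyWarning
    selected.loc[:, "B":"E"] = -1

print(df)        # original values are untouched
print(selected)  # only the temporary was modified
```

By the time the second call runs, pandas only sees `selected`, so it has no dependable way to route the write back to df.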

From: stackoverflow.com/q/23296282
