When should I be using classes in Python?

I have been programming in Python for about two years; mostly data stuff (pandas, mpl, numpy), but also automation scripts and small web apps. I'm trying to become a better programmer and increase my Python knowledge, and one of the things that bothers me is that I have never used a class (outside of copying random Flask code for small web apps). I generally understand what they are, but I can't seem to wrap my head around why I would need them over a simple function.

To add specificity to my question: I write tons of automated reports, which always involve pulling data from multiple data sources (mongo, sql, postgres, apis), performing a lot or a little data munging and formatting, writing the data to csv/excel/html, and sending it out in an email. The scripts range from ~250 lines to ~600 lines. Would there be any reason for me to use classes to do this, and why?

Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.

First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or, surely, in most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.

However, there are a lot of reasons to use OOP. Some of them:

  • Organization: OOP defines well known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts, lists of dicts, lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.

  • State: OOP helps you define and keep track of state. In a classic example, if you're creating a program that processes students (say, a grade program), you can keep all the info you need about each of them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.). This data persists as long as the object is alive, and is easily accessible.

  • Encapsulation: With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. As a result, if you need or want to change code, you can do whatever you want to the implementation, as long as you keep the public API the same.

  • Inheritance: Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the behavior that throws an exception when a key that doesn't exist is requested from a dictionary, so that a default value is returned for the unknown key instead. This allows you to extend your own code now or later, allows others to extend your code, and allows you to extend other people's code.

  • Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse it over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and override the existing behavior. If you need to change something, you can change it while maintaining the existing public method signatures, and no one is the wiser (hopefully).
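
As a minimal sketch of the dict-subclass idea from the inheritance point, assuming Python 3: the class name DefaultingDict and its fallback parameter are made up for illustration. It relies on the __missing__ hook, which dict lookups with square brackets call on subclasses when a key is absent.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass that returns a fallback for unknown keys."""

    def __init__(self, *args, fallback=None, **kwargs):
        # Everything except `fallback` is handed to the normal dict constructor
        super().__init__(*args, **kwargs)
        self.fallback = fallback

    def __missing__(self, key):
        # Called by dict.__getitem__ on subclasses instead of raising KeyError
        return self.fallback


grades = DefaultingDict({"math": 3.3}, fallback=0.0)
print(grades["math"])     # key exists, so this is a normal lookup
print(grades["history"])  # key is missing, so __missing__ supplies the fallback
```

Note that the missing key is not stored, and that .get() does not go through __missing__. For this particular need the standard library already has collections.defaultdict, so treat the subclass as an illustration of overriding rather than a recommendation.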

Again, there are several reasons not to use OOP, and you don't need to. But luckily, with a language like Python, you can use just a little bit or a lot; it's up to you.

An example of the student use case (no guarantee on code quality, just an example):

Object Oriented

    class Student(object):
        def __init__(self, name, age, gender, level, grades=None):
            self.name = name
            self.age = age
            self.gender = gender
            self.level = level
            self.grades = grades or {}

        def setGrade(self, course, grade):
            self.grades[course] = grade

        def getGrade(self, course):
            return self.grades[course]

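        def getGPA(self):
            # A student with no grades yet would otherwise raise ZeroDivisionError
            if not self.grades:
                return 0.0
            return sum(self.grades.values())/len(self.grades)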

    # Define some students
    john = Student("John", 12, "male", 6, {"math":3.3})
    jane = Student("Jane", 12, "female", 6, {"math":3.5})

    # Now we can get to the grades easily
    print(john.getGPA())
    print(jane.getGPA())

Standard Dict

    def calculateGPA(gradeDict):
        # An empty grade dict would otherwise raise ZeroDivisionError
        if not gradeDict:
            return 0.0
        return sum(gradeDict.values())/len(gradeDict)

    students = {}
    # We can set the keys to variables so we might minimize typos
    name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
    john, jane = "john", "jane"
    math = "math"
    students[john] = {}
    students[john][age] = 12
    students[john][gender] = "male"
    students[john][level] = 6
    students[john][grades] = {math:3.3}

    students[jane] = {}
    students[jane][age] = 12
    students[jane][gender] = "female"
    students[jane][level] = 6
    students[jane][grades] = {math:3.5}

    # At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
    print(calculateGPA(students[john][grades]))
    print(calculateGPA(students[jane][grades]))
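
To tie this back to the reports in the question, here is a rough sketch of how the same ideas might apply, assuming Python 3. The names Report, SalesReport, fetch, transform and to_csv are all hypothetical, and the hard-coded rows stand in for a real database or API call. The shared steps live in one base class, and each report only overrides what differs.

```python
import csv
import io


class Report:
    """Hypothetical base class: shared steps live here, subclasses supply data."""

    def __init__(self, title):
        self.title = title

    def fetch(self):
        # Subclasses override this to pull rows from their own data source
        raise NotImplementedError

    def transform(self, rows):
        # Default munging step: pass the rows through unchanged
        return rows

    def to_csv(self):
        # Assumes fetch() returns a non-empty list of dicts with identical keys
        rows = self.transform(self.fetch())
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()


class SalesReport(Report):
    def fetch(self):
        # Stand-in for a real query; hard-coded rows keep the sketch runnable
        return [{"region": "east", "total": 10}, {"region": "west", "total": 5}]

    def transform(self, rows):
        # Report-specific munging: sort ascending by total
        return sorted(rows, key=lambda row: row["total"])


report = SalesReport("Weekly sales")
print(report.to_csv())
```

Whether this is worth it depends on how much the scripts share. If every report is truly one of a kind, plain functions are probably still the simpler choice.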

From: stackoverflow.com/q/33072570