Python dataclasses, introduced in Python 3.7 via the dataclasses module, streamline class definitions for data-heavy objects by auto-generating boilerplate methods like __init__, __repr__, __eq__, and more. They promote cleaner code, type safety, and IDE integration without sacrificing flexibility. This article covers basics to advanced usage, drawing from official docs and practical patterns.python

Defining a Dataclass

Start by importing and decorating a class with @dataclass. Fields require type annotations; the decorator handles the rest.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(1.5, 2.5)
print(p)  # Point(x=1.5, y=2.5)

Customization via parameters: @dataclass(eq=True, order=False, frozen=False, slots=False) toggles comparisons, immutability (frozen=True prevents attribute changes), and memory-efficient slots (Python 3.10+).

Field Defaults and Customization

Use assignment for immutables; field(default_factory=...) for mutables to avoid shared state.

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    dept: str = "Engineering"
    skills: list[str] = field(default_factory=list)
    id: int = field(init=False, default=0)  # Skipped in __init__, set later

Post-init logic: Define __post_init__ for validation or computed fields.

def __post_init__(self):
    self.id = hash(self.name)

Other field() options: repr=False, compare=False, hash=None, metadata={...} for extras, kw_only=True (3.10+) for keyword-only args.

Inheritance and Composition

Dataclasses support single/multiple inheritance; parent fields prepend in __init__.

@dataclass
class Employee(Person):  # Assuming Person from earlier
    salary: float

Nested dataclasses work seamlessly; use InitVar for init-only vars.

from dataclasses import dataclass, InitVar

@dataclass
class Logger:
    name: str
    level: str = "INFO"
    log_file: str = None  # Computed during init

    config: InitVar[dict] = None

    def __post_init__(self, config):
        if config:
            self.level = config.get('default_level', self.level)
            self.log_file = config.get('log_path', f"{self.name}.log")
        else:
            self.log_file = f"{self.name}.log"

app_config = {'default_level': 'DEBUG', 'log_path': '/var/logs/app.log'}
logger = Logger("web_server", config=app_config)
print(logger)  # Logger(name='web_server', level='DEBUG', log_file='/var/logs/app.log')
logger = Logger("web_server")
print(logger)  # Logger(name='web_server', level='INFO', log_file='web_server.log')

Field order via __dataclass_fields__ aids debugging.

Utilities and Patterns

  • replace(): Immutable updates: new_p = replace(p, age=31).
  • Exports: asdict(p), astuple(p) for serialization.
  • Introspection: fields(p), is_dataclass(p), make_dataclass(...).
Feature Use Case Python Version
frozen=True Immutable data 3.7+
slots=True Memory/attr speed 3.10+
kw_only=True Keyword args 3.10+
field(metadata=...) Annotations 3.7+

Best Practices and Gotchas

Prefer dataclasses over namedtuples for mutability needs; use frozen=True for hashable configs. Avoid overriding generated methods unless necessary—extend via __post_init__. For production, validate inputs and consider slots=True for perf gains.