Python dataclasses, introduced in Python 3.7 via the dataclasses module, streamline class definitions for data-heavy objects by auto-generating boilerplate methods such as __init__, __repr__, and __eq__. They promote cleaner code and work well with type checkers and IDEs without sacrificing flexibility, although the field annotations are not enforced at runtime. This article covers basics to advanced usage, drawing from official docs and practical patterns.
Defining a Dataclass
Start by importing and decorating a class with @dataclass. Fields require type annotations; the decorator handles the rest.
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(1.5, 2.5)
print(p)  # Point(x=1.5, y=2.5)
Customization via parameters: @dataclass(eq=True, order=False, frozen=False, slots=False) shows the defaults. eq and order control the generated comparison methods, frozen=True makes attribute assignment raise FrozenInstanceError, and slots=True (Python 3.10+) generates __slots__ for lower memory use.
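As a minimal sketch of the decorator parameters, the example below combines frozen=True and order=True. The Version class is a hypothetical name chosen for illustration, not something from the standard library.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Version:  # hypothetical example class
    major: int
    minor: int

v1 = Version(1, 2)
v2 = Version(1, 10)

# order=True compares instances as tuples of their fields, in definition order.
print(v1 < v2)  # True
print(sorted([v2, v1]) == [v1, v2])  # True

# frozen=True makes attribute assignment raise FrozenInstanceError.
try:
    v1.major = 2
except FrozenInstanceError:
    frozen_error_raised = True
else:
    frozen_error_raised = False

print(frozen_error_raised)  # True
```

Note that order=True requires eq=True, which is the default.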
Field Defaults and Customization
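Assign defaults directly for immutable values; for mutable ones, use field(default_factory=...) so that each instance gets its own object instead of sharing one. A bare list, dict, or set default is rejected with a ValueError when the class is defined.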
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    dept: str = "Engineering"
    skills: list[str] = field(default_factory=list)
    id: int = field(init=False, default=0)  # Skipped in __init__, set later
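To illustrate why default_factory matters, here is a small sketch using a hypothetical Team class. It shows that each instance receives its own list, and that a bare mutable default is refused at class-creation time.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Team:  # hypothetical example class
    name: str
    members: List[str] = field(default_factory=list)

a = Team("core")
b = Team("infra")
a.members.append("Ada")

print(a.members)  # ['Ada']
print(b.members)  # []  (b has its own list)

# A bare mutable default is rejected when the class is created.
try:
    @dataclass
    class BrokenTeam:
        members: List[str] = []
except ValueError:
    rejected = True
else:
    rejected = False

print(rejected)  # True
```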
Post-init logic: Define __post_init__ for validation or computed fields.
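    def __post_init__(self):
        # Runs right after the generated __init__.
        # Note: hash() of a str differs between interpreter runs unless PYTHONHASHSEED is fixed.
        self.id = hash(self.name)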
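The following sketch shows __post_init__ doing both jobs mentioned above: validating inputs and filling in a computed field. Rectangle and its area field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float
    area: float = field(init=False)  # computed, so excluded from __init__

    def __post_init__(self):
        # Validate first, then derive the computed field.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        self.area = self.width * self.height

r = Rectangle(3.0, 4.0)
print(r)  # Rectangle(width=3.0, height=4.0, area=12.0)

try:
    Rectangle(-1.0, 2.0)
except ValueError:
    invalid_rejected = True
else:
    invalid_rejected = False

print(invalid_rejected)  # True
```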
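Other field() options: repr=False (omit the field from __repr__), compare=False (ignore it in comparisons), hash=None (the default, which follows compare), metadata={...} for arbitrary per-field data, and kw_only=True (3.10+) for keyword-only arguments.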
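A short sketch of three of these options in use. The Account class, its field names, and the sample values are all made up for illustration.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Account:  # hypothetical example class
    username: str
    password: str = field(repr=False)                   # hidden from repr
    last_login: str = field(default="", compare=False)  # ignored by ==
    quota_mb: int = field(default=100, metadata={"unit": "megabytes"})

a = Account("ada", "s3cret", last_login="2024-01-01")
b = Account("ada", "s3cret", last_login="2024-06-30")

print(a)       # Account(username='ada', last_login='2024-01-01', quota_mb=100)
print(a == b)  # True, because last_login is excluded from comparison

# metadata is not used by dataclasses itself; it is read back via fields().
quota_field = next(f for f in fields(Account) if f.name == "quota_mb")
print(quota_field.metadata["unit"])  # megabytes
```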
Inheritance and Composition
Dataclasses support single and multiple inheritance; fields from parent classes come first in the generated __init__, followed by the subclass's own fields.
@dataclass
class Employee(Person):  # Assumes a Person dataclass is already defined
    salary: float
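Since Person is not defined in this article, the sketch below supplies a hypothetical one (with name and age fields, an assumption) to make the inheritance example runnable. It also shows a common pitfall: a subclass field without a default cannot follow a parent field that has one.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:  # hypothetical base class, assumed for illustration
    name: str
    age: int

@dataclass
class Employee(Person):
    salary: float

e = Employee("Ada", 36, 120000.0)
print(e)  # Employee(name='Ada', age=36, salary=120000.0)
print([f.name for f in fields(Employee)])  # ['name', 'age', 'salary']

# Pitfall: a non-default field after an inherited default field is rejected.
try:
    @dataclass
    class Base:
        tag: str = "x"

    @dataclass
    class Child(Base):
        value: int
except TypeError:
    ordering_error = True
else:
    ordering_error = False

print(ordering_error)  # True
```

On Python 3.10+, marking the subclass fields keyword-only is one way around this ordering restriction.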
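A dataclass field can itself be a dataclass, so nesting works without extra effort. For values needed only during initialization, use InitVar: they are passed to __init__ and forwarded to __post_init__, but are not stored as fields.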
from dataclasses import dataclass, InitVar
from typing import Optional

@dataclass
class Logger:
    name: str
    level: str = "INFO"
    log_file: Optional[str] = None  # Computed during init
    config: InitVar[Optional[dict]] = None  # Init-only: forwarded to __post_init__, not stored

    def __post_init__(self, config):
        if config:
            # Values from config override the defaults
            self.level = config.get('default_level', self.level)
            self.log_file = config.get('log_path', f"{self.name}.log")
        else:
            self.log_file = f"{self.name}.log"

app_config = {'default_level': 'DEBUG', 'log_path': '/var/logs/app.log'}
logger = Logger("web_server", config=app_config)
print(logger)  # Logger(name='web_server', level='DEBUG', log_file='/var/logs/app.log')

logger = Logger("web_server")
print(logger)  # Logger(name='web_server', level='INFO', log_file='web_server.log')
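To check field order while debugging, inspect the class's __dataclass_fields__ mapping or, preferably, call fields(), which returns the fields in definition order.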
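As a rough illustration of the difference between the two, the sketch below uses a hypothetical Job class with an InitVar. fields() lists only real fields, while __dataclass_fields__ also carries the init-only pseudo-field.

```python
from dataclasses import dataclass, fields, InitVar
from typing import Optional

@dataclass
class Job:  # hypothetical example class
    name: str
    retries: int = 3
    options: InitVar[Optional[dict]] = None  # init-only, not a stored field

    def __post_init__(self, options):
        if options:
            self.retries = options.get("retries", self.retries)

# fields() returns real fields only, in definition order.
print([f.name for f in fields(Job)])   # ['name', 'retries']

# The raw mapping also includes the InitVar pseudo-field.
print(list(Job.__dataclass_fields__))  # ['name', 'retries', 'options']

job = Job("backup", options={"retries": 5})
print(job)  # Job(name='backup', retries=5)
```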
Utilities and Patterns
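- replace(): Returns a modified copy and leaves the original untouched, which suits frozen instances: new_p = replace(p, age=31), where p is an instance with an age field.
- Exports: asdict(p), astuple(p) for serialization.
- Introspection: fields(p), is_dataclass(p), make_dataclass(...).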
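The utilities listed above can be sketched together as follows. Person, Address, and Pair are hypothetical classes; the nested Address also shows that asdict() and astuple() recurse into nested dataclasses.

```python
from dataclasses import (
    asdict, astuple, dataclass, field, fields,
    is_dataclass, make_dataclass, replace,
)

@dataclass(frozen=True)
class Address:  # hypothetical nested dataclass
    city: str

@dataclass(frozen=True)
class Person:  # hypothetical example class
    name: str
    age: int
    address: Address

p = Person("Ada", 30, Address("London"))

# replace() builds a new instance; the frozen original is unchanged.
new_p = replace(p, age=31)
print(new_p.age, p.age)  # 31 30

# asdict() and astuple() convert nested dataclasses recursively.
print(asdict(p))   # {'name': 'Ada', 'age': 30, 'address': {'city': 'London'}}
print(astuple(p))  # ('Ada', 30, ('London',))

print([f.name for f in fields(p)])        # ['name', 'age', 'address']
print(is_dataclass(p), is_dataclass(42))  # True False

# make_dataclass() creates a dataclass dynamically from a field spec.
Pair = make_dataclass("Pair", [("left", int), ("right", int, field(default=0))])
print(Pair(1))  # Pair(left=1, right=0)
```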
| Feature | Use Case | Python Version |
|---|---|---|
| frozen=True | Immutable data | 3.7+ |
| slots=True | Lower memory use, faster attribute access | 3.10+ |
| kw_only=True | Keyword-only arguments | 3.10+ |
| field(metadata=...) | Attaching extra per-field information | 3.7+ |
Best Practices and Gotchas
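Prefer dataclasses over namedtuples when instances need to be mutable; use frozen=True when they should be hashable, for example configuration objects used as dictionary keys. Avoid overriding generated methods unless necessary, and put extra logic in __post_init__ instead. Because annotations are not checked at runtime, validate inputs explicitly in production code, and consider slots=True (3.10+) to reduce memory use and speed up attribute access.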
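The hashability point can be sketched as below, with hypothetical CacheKey and MutableKey classes. With the default eq=True, a frozen dataclass gets a generated __hash__, whereas a non-frozen one has __hash__ set to None.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CacheKey:  # hypothetical: frozen, therefore hashable
    host: str
    port: int

@dataclass
class MutableKey:  # hypothetical: eq=True without frozen, therefore unhashable
    host: str
    port: int

# Equal frozen instances hash alike, so they work as dictionary keys.
cache = {CacheKey("localhost", 8080): "cached response"}
print(cache[CacheKey("localhost", 8080)])  # cached response

try:
    hash(MutableKey("localhost", 8080))
except TypeError:
    unhashable = True
else:
    unhashable = False

print(unhashable)  # True
```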