Data specifications via type hints
Typespecs is a lightweight Python library that leverages typing.Annotated to manage metadata (category, description, units, ...) within the type hints of your data structures.
It offers a dedicated read-only dictionary called a type specification to attach your metadata to your type hints.
This approach keeps your code clean and seamlessly coexists with other Annotated-based libraries such as Pydantic.
Finally, the attached metadata can be extracted and aggregated into a pandas.DataFrame object called a specification DataFrame, which makes the metadata easier to manage with the rich PyData ecosystem.

pip install typespecs

You can create a type specification, typespecs.Spec(key=value, ...), and attach it to a type hint in a data structure such as a Python dataclass or a Pydantic model.
The Spec object acts as a read-only dictionary, ensuring your metadata remains immutable and safe from runtime modifications.
Once your data structure is defined, use typespecs.from_annotated(obj) to extract and aggregate the attached metadata into a specification DataFrame.
By default, the actual data and the metadata-stripped type hints will also be stored in the data and type columns, respectively (you can control this behavior using the data and type parameters in from_annotated).
import typespecs as ts
from dataclasses import dataclass
from typing import Annotated as Ann, TypeVar


@dataclass
class Weather:
    temp: Ann[list[float], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[float], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data               name           type  units
temp      data  [273.15, 280.15]        Temperature    list[float]      K
wind      data       [5.0, 10.0]         Wind speed    list[float]    m/s
loc       info             Tokyo  Observed location  <class 'str'>   <NA>
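For readers curious about the mechanism, the following is a minimal standard-library sketch of how metadata can ride along in a type hint and be read back. It is not the typespecs implementation: `meta` and `collect` are hypothetical helpers, and `types.MappingProxyType` stands in for the read-only behavior described for Spec. The sketch assumes exactly one metadata mapping per hint.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Annotated, get_args, get_origin, get_type_hints


def meta(**items):
    """Hypothetical stand-in for a read-only metadata mapping (not ts.Spec)."""
    return MappingProxyType(dict(items))


@dataclass
class Weather:
    temp: Annotated[list[float], meta(name="Temperature", units="K")]
    loc: Annotated[str, meta(name="Observed location")]


def collect(cls):
    """Return {field: {"type": bare hint, **metadata}} for Annotated fields."""
    rows = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for field, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # get_args returns the bare type first, then the metadata objects.
            bare, *extras = get_args(hint)
            rows[field] = {"type": bare, **extras[0]}
    return rows


rows = collect(Weather)
print(rows["temp"])
```

Assigning to a key of the mapping returned by `meta(...)` raises a TypeError, which is the kind of protection a read-only dictionary provides.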
You can attach multiple Spec objects to a single type hint.
If metadata overlaps between them, the last one will take precedence.
Temp = Ann[list[float], ts.Spec(category="data", name="Temperature")]
Wind = Ann[list[float], ts.Spec(category="data", name="Wind speed")]
Loc = Ann[str, ts.Spec(category="info", name="Observed Location")]


@dataclass
class Weather:
    temp: Ann[Temp, ts.Spec(units="K")]
    wind: Ann[Wind, ts.Spec(units="m/s")]
    loc: Ann[Loc, ts.Spec(name="City")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data         name           type  units
temp      data  [273.15, 280.15]  Temperature    list[float]      K
wind      data       [5.0, 10.0]   Wind speed    list[float]    m/s
loc       info             Tokyo         City  <class 'str'>   <NA>
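The "last one wins" rule lines up with how typing.Annotated itself behaves: nested Annotated hints are flattened, with metadata ordered from the innermost annotation outward. The sketch below illustrates that with plain dicts in place of Spec objects. `merge_metadata` is a hypothetical helper, and whether typespecs merges in exactly this way is an assumption.

```python
from typing import Annotated, get_args

# Plain dicts stand in for ts.Spec objects in this sketch.
Loc = Annotated[str, {"category": "info", "name": "Observed Location"}]
loc_hint = Annotated[Loc, {"name": "City"}]


def merge_metadata(hint):
    """Merge all metadata dicts of an Annotated hint; later entries win."""
    # Nested Annotated is flattened, so both dicts appear here in order.
    bare, *extras = get_args(hint)
    merged = {}
    for extra in extras:
        merged.update(extra)
    return bare, merged


bare, merged = merge_metadata(loc_hint)
print(bare, merged)
```

Because the outer annotation comes last in the flattened metadata, its `name` replaces the inner one while `category` is carried over unchanged.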
Typespecs simplifies working with nested types. By default, the metadata attached to nested types will be merged into a single parent row.
Float = Ann[float, ts.Spec(dtype="f8")]


@dataclass
class Weather:
    temp: Ann[list[Float], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[Float], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data  dtype               name           type  units
temp      data  [273.15, 280.15]     f8        Temperature    list[float]      K
wind      data       [5.0, 10.0]     f8         Wind speed    list[float]    m/s
loc       info             Tokyo   <NA>  Observed location  <class 'str'>   <NA>
You can disable this merging behavior using merge=False in from_annotated.
specs = ts.from_annotated(weather, merge=False)
print(specs)

        category              data  dtype               name             type  units
temp        data  [273.15, 280.15]   <NA>        Temperature      list[float]      K
temp/0      <NA>              <NA>     f8               <NA>  <class 'float'>   <NA>
wind        data       [5.0, 10.0]   <NA>         Wind speed      list[float]    m/s
wind/0      <NA>              <NA>     f8               <NA>  <class 'float'>   <NA>
loc         info             Tokyo   <NA>  Observed location    <class 'str'>   <NA>
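Row labels such as temp/0 can be read as a path into the type hint: the field name, then the position of each nested type argument. The sketch below reproduces that labeling with the standard library only. `walk` is a hypothetical helper, plain dicts stand in for Spec objects, and the traversal order is an assumption rather than the library's documented behavior.

```python
from typing import Annotated, get_args, get_origin

# Plain dicts stand in for ts.Spec objects in this sketch.
Float = Annotated[float, {"dtype": "f8"}]
temp_hint = Annotated[list[Float], {"name": "Temperature", "units": "K"}]


def walk(hint, path):
    """Yield (path, bare hint, metadata) for a hint and each nested argument."""
    metadata = {}
    if get_origin(hint) is Annotated:
        # Unwrap Annotated: first argument is the bare hint, the rest is metadata.
        hint, *extras = get_args(hint)
        for extra in extras:
            metadata.update(extra)
    yield path, hint, metadata
    # Recurse into type arguments, extending the path with their position.
    for index, arg in enumerate(get_args(hint)):
        yield from walk(arg, f"{path}/{index}")


rows = list(walk(temp_hint, "temp"))
for path, _, metadata in rows:
    print(path, metadata)

# A merged view folds every nested row into the parent row.
merged = {}
for _, _, metadata in rows:
    merged.update(metadata)
```

In this example the parent and nested metadata share no keys, so the merged view is unambiguous. How overlapping keys are resolved is not covered by this sketch.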
Finally, you can include the nested type itself as part of the metadata using the special typespecs.ITSELF object.
This is useful when you want to handle the inner type alongside other metadata within the specification DataFrame.
Dtype = Ann[TypeVar("T"), ts.Spec(dtype=ts.ITSELF)]


@dataclass
class Weather:
    temp: Ann[list[Dtype[float]], ts.Spec(category="data", name="Temperature", units="K")]
    wind: Ann[list[Dtype[float]], ts.Spec(category="data", name="Wind speed", units="m/s")]
    loc: Ann[str, ts.Spec(category="info", name="Observed location")]


weather = Weather([273.15, 280.15], [5.0, 10.0], "Tokyo")
specs = ts.from_annotated(weather)
print(specs)

      category              data            dtype               name           type  units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]      K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]    m/s
loc       info             Tokyo             <NA>  Observed location  <class 'str'>   <NA>
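The Dtype alias works because an Annotated hint built around a TypeVar is a generic alias: subscripting it substitutes the type variable and keeps the metadata. The sketch below shows that substitution and one way a placeholder could be swapped for the annotated type. The `ITSELF` sentinel and `resolve` helper here are hypothetical stand-ins, not the typespecs objects, and plain dicts replace Spec.

```python
from typing import Annotated, TypeVar, get_args

T = TypeVar("T")
ITSELF = object()  # hypothetical sentinel, standing in for ts.ITSELF

# Generic alias: Dtype[float] becomes Annotated[float, {"dtype": ITSELF}].
Dtype = Annotated[T, {"dtype": ITSELF}]


def resolve(hint):
    """Replace sentinel values in the metadata with the annotated type itself."""
    bare, *extras = get_args(hint)
    resolved = {}
    for extra in extras:
        for key, value in extra.items():
            resolved[key] = bare if value is ITSELF else value
    return bare, resolved


bare, resolved = resolve(Dtype[float])
print(bare, resolved)
```

The same alias can be reused with any type, for example `Dtype[int]`, without writing a separate metadata entry for each.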
By default, missing metadata is filled with pandas.NA in a specification DataFrame.
You can specify custom fallback values by using the default parameter in from_annotated.
specs = ts.from_annotated(weather, default={"dtype": None, "units": "1"})
print(specs)

      category              data            dtype               name           type  units
temp      data  [273.15, 280.15]  <class 'float'>        Temperature    list[float]      K
wind      data       [5.0, 10.0]  <class 'float'>         Wind speed    list[float]    m/s
loc       info             Tokyo             None  Observed location  <class 'str'>      1
You can also create a specification DataFrame directly from type hints, using typespecs.from_annotation for a single hint and typespecs.from_annotations for a dictionary of hints.
This is useful when you want to work with type hints without defining them inside a data structure.
annotations = {
    "temp": Ann[list[Dtype[float]], ts.Spec(category="data", name="Temperature", units="K")],
    "wind": Ann[list[Dtype[float]], ts.Spec(category="data", name="Wind speed", units="m/s")],
    "loc": Ann[str, ts.Spec(category="info", name="Observed location")],
}
specs = ts.from_annotations(annotations)
print(specs)

      category            dtype               name           type  units
temp      data  <class 'float'>        Temperature    list[float]      K
wind      data  <class 'float'>         Wind speed    list[float]    m/s
loc       info             <NA>  Observed location  <class 'str'>   <NA>

specs = ts.from_annotation(annotations["temp"])
print(specs)

      category            dtype         name         type  units
root      data  <class 'float'>  Temperature  list[float]      K