Issue
I have data from a web API request that can return one of two different dictionaries:
Wh version: [{'energy_wh': int, 'month': int, 'year': int}]
kWh version: [{'energy': float, 'month': int, 'year': int}]
I only know which format by checking for the presence of the corresponding key.
I'd like to create a class with the following properties:
class EmeterMonth:
    year: int
    month: int
    energy: float
    energy_wh: float
And convert an array of dictionaries that could be the Wh or kWh versions to an array of EmeterMonth class instances (or at least provide a common API where either can be requested).
For the Wh version: energy_wh = float(energy_wh) and energy = energy_wh / 1000
For the kWh version: energy_wh = energy * 1000 and energy = energy
I'd like to make the code as clear and simple to understand as possible, and allow for other formats to come along later that supply the data in some similar but different form.
At the moment I have a class that overrides __getitem__ and does the conversion depending on which members the instance has, but this obscures the different versions and is a bit of a hack.
I note that there are actually more data members with different possible units to convert that I've left out for simplicity.
It's been suggested that I could use pydantic to declare the formats but I'm not sure how this would help me with the handling of the different formats.
I'm using Python 3.11
Solution
Pydantic validators are meant for exactly this kind of task. You can use a root_validator with pre=True to write out the conversion logic, so that it runs before the values are passed to the individual field validators. (The examples here use the pydantic v1 API; v2 replaces root_validator with model_validator.)
If you want this validation to also run every time a value is assigned to an existing model instance, set the config option validate_assignment = True as well.
Here is a full working example:
from typing import Any

from pydantic import BaseModel, ValidationError, root_validator


class EmeterMonth(BaseModel):
    year: int
    month: int
    energy: float
    energy_wh: float

    class Config:
        validate_assignment = True

    @root_validator(pre=True)
    def convert_energy(cls, values: dict[str, Any]) -> dict[str, Any]:
        energy = values.get("energy")
        energy_wh = values.get("energy_wh")
        if energy is None:
            if energy_wh is None:
                return values  # regular field validation will raise the errors
            values["energy"] = energy_wh / 1000
        if energy_wh is None:
            assert energy is not None  # sanity check
            values["energy_wh"] = energy * 1000
        if values["energy_wh"] / 1000 != values["energy"]:
            raise ValueError("`energy_wh` must be equal to `energy` * 1000")
        return values


def test() -> None:
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 100.5}))
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 20000}))
    print()
    try:
        EmeterMonth.parse_obj({"year": 2022, "month": 11})
    except ValidationError as e:
        print(e)
    print()
    try:
        EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 2, "energy": 1})
    except ValidationError as e:
        print(e)
    print()
    obj = EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 2})
    print(obj)
    try:
        obj.energy_wh = 5
    except ValidationError as e:
        print(e)


if __name__ == "__main__":
    test()
Output:
year=22 month=11 energy=100.5 energy_wh=100500.0
year=22 month=11 energy=20.0 energy_wh=20000.0

2 validation errors for EmeterMonth
energy
  field required (type=value_error.missing)
energy_wh
  field required (type=value_error.missing)

1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)

year=22 month=11 energy=2.0 energy_wh=2000.0
1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)
As you can see, the conversion is done as expected. If both fields are omitted, their respective field validators automatically raise errors. If for some reason both values are passed and they are not consistent, an error is raised. If assignment causes inconsistency among the two fields, that same error is raised.
Note also that we don't need to ensure proper coercion of int to float in our custom validator, because the default field validators already in place for float fields do that afterwards.
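One caveat on the consistency check in the validator above: it compares floats with `!=`, so a value that is off only by floating-point rounding would be rejected. A minimal sketch of a tolerance-based comparison using the standard library's `math.isclose` follows. The helper name `is_consistent` and the default tolerance are assumptions for illustration, not part of the original example.

```python
import math


def is_consistent(energy_kwh: float, energy_wh: float, rel_tol: float = 1e-9) -> bool:
    """Return True if the Wh value matches the kWh value within a relative tolerance.

    Hypothetical helper; `rel_tol` is an illustrative default.
    """
    return math.isclose(energy_wh / 1000, energy_kwh, rel_tol=rel_tol)


# An exact comparison rejects a value that is off only by float rounding ...
computed_kwh = 0.1 + 0.2  # 0.30000000000000004 rather than 0.3
print(300.0 / 1000 != computed_kwh)  # True

# ... while the tolerant check accepts it and still rejects real mismatches.
print(is_consistent(computed_kwh, 300.0))  # True
print(is_consistent(1, 2))  # False
```

Inside the validator, the condition could then read `if not is_consistent(values["energy"], values["energy_wh"])`.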
Personally, I would recommend against having redundancy in your data model. I doubt there is a point to having both energy values there. As a compromise, you could leave the energy_wh field off your model but still allow it to be parsed and converted to the equivalent energy value. This can also be done with a root_validator(pre=True):
from typing import Any

from pydantic import BaseModel, ValidationError, root_validator


class EmeterMonth(BaseModel):
    year: int
    month: int
    energy: float

    @root_validator(pre=True)
    def convert_energy(cls, values: dict[str, Any]) -> dict[str, Any]:
        energy = values.get("energy")
        energy_wh = values.pop("energy_wh", None)
        if energy is None:
            if energy_wh is None:
                return values  # regular field validation will raise the error
            values["energy"] = energy_wh / 1000
        elif energy_wh is not None and energy_wh / 1000 != energy:
            raise ValueError("`energy_wh` must be equal to `energy` * 1000")
        return values


def test() -> None:
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 100.5}))
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 20000}))
    print()
    try:
        EmeterMonth.parse_obj({"year": 2022, "month": 11})
    except ValidationError as e:
        print(e)
    print()
    try:
        EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 2, "energy": 1})
    except ValidationError as e:
        print(e)


if __name__ == "__main__":
    test()
Output:
year=22 month=11 energy=100.5
year=22 month=11 energy=20.0

1 validation error for EmeterMonth
energy
  field required (type=value_error.missing)

1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)
If you want, you can still have a convenience property on that class to get the Wh value:
    @property
    def energy_wh(self) -> float:
        return self.energy * 1000
Obviously you could also have it the other way around (i.e. storing the Wh value), depending on which format you use more often and other requirements/constraints you may have.
Storing only one of the two values is generally less error-prone and, IMHO, the more sensible approach. Otherwise you'll always have to be careful to ensure consistency of the two values.
You mentioned that the data comes from some web API. This way, you just perform the conversion once, if necessary, when you fetch the data, but after that you always have the data in a standardized format for yourself.
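As a sketch of that fetch-time conversion, the snippet below parses a mixed list of Wh and kWh records into `EmeterMonth` instances. It repeats the single-field model (pydantic v1-style API) with the convenience property so that it is self-contained, and it leaves out the consistency check for brevity. The sample `records` list stands in for the API response and is made up for illustration, as is the defensive copy at the top of the validator.

```python
from typing import Any

from pydantic import BaseModel, root_validator


class EmeterMonth(BaseModel):
    year: int
    month: int
    energy: float  # stored in kWh

    @root_validator(pre=True)
    def convert_energy(cls, values: dict[str, Any]) -> dict[str, Any]:
        # Work on a copy so the caller's dictionary is left untouched.
        values = dict(values)
        energy_wh = values.pop("energy_wh", None)
        if values.get("energy") is None and energy_wh is not None:
            values["energy"] = energy_wh / 1000
        return values

    @property
    def energy_wh(self) -> float:
        # Derived on demand, so it cannot drift out of sync with `energy`.
        return self.energy * 1000


# Hypothetical API response mixing both formats.
records = [
    {"energy_wh": 20000, "month": 10, "year": 2022},
    {"energy": 100.5, "month": 11, "year": 2022},
]

# Convert once, right after fetching; everything downstream sees one format.
months = [EmeterMonth.parse_obj(record) for record in records]
for m in months:
    print(m.year, m.month, m.energy, m.energy_wh)
```

Either unit can then be read from every instance, regardless of which format the API returned for that record.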
Answered By - Daniil Fajnberg