Monday, November 7, 2022

[FIXED] How do I use pydantic to standardise import of different formats of the same information?

Labels: pydantic, python, python-3.x

Issue

I have data from a web API request that can return one of two different dictionaries:

Wh version:  [{'energy_wh': int, 'month': int, 'year': int}]
kWh version: [{'energy': float, 'month': int, 'year': int}]

I only know which format by checking for the presence of the corresponding key.

I'd like to create a class with the following properties:

class EmeterMonth:
    year: int
    month: int
    energy: float
    energy_wh: float

And convert an array of dictionaries that could be the Wh or kWh versions to an array of EmeterMonth class instances (or at least provide a common API where either can be requested).

For the Wh version: energy_wh = float(energy_wh) and energy = energy_wh / 1000.
For the kWh version: energy_wh = energy * 1000 and energy = energy.

I'd like to make the code as clear and simple to understand as possible, and allow for other formats to come along later that supply the data in some similar but different form.

At the moment I have a class that overrides __getitem__ and does the conversion depending on which members the instance has, but this obscures the different versions and is a bit of a hack.

I note that there are actually more data members with different possible units to convert that I've left out for simplicity.

It's been suggested that I could use pydantic to declare the formats but I'm not sure how this would help me with the handling of the different formats.

I'm using Python 3.11

Solution

Pydantic handles this kind of normalisation through validators. You can use a root_validator with pre=True to convert the incoming values before they are passed to the individual field validators. (root_validator is the pydantic v1 API; pydantic v2 replaces it with model_validator(mode="before").)

If you want this validation to also occur every time values are assigned to an existing model instance, you should set the config option validate_assignment = True as well.

Here is a full working example:

import math
from typing import Any

from pydantic import BaseModel, ValidationError, root_validator


class EmeterMonth(BaseModel):
    year: int
    month: int
    energy: float  # kWh
    energy_wh: float  # Wh

    class Config:
        validate_assignment = True

    @root_validator(pre=True)
    def convert_energy(cls, values: dict[str, Any]) -> dict[str, Any]:
        energy = values.get("energy")
        energy_wh = values.get("energy_wh")
        if energy is None:
            if energy_wh is None:
                return values  # regular field validation will raise the errors
            values["energy"] = energy_wh / 1000
        if energy_wh is None:
            assert energy is not None  # sanity check
            values["energy_wh"] = energy * 1000
        # Compare with a tolerance: `energy * 1000 / 1000` may not round-trip
        # exactly in floating point.
        if not math.isclose(values["energy_wh"] / 1000, values["energy"]):
            raise ValueError("`energy_wh` must be equal to `energy` * 1000")
        return values


def test() -> None:
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 100.5}))
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 20000}))
    print()

    try:
        EmeterMonth.parse_obj({"year": 2022, "month": 11})
    except ValidationError as e:
        print(e)
    print()

    try:
        EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 2, "energy": 1})
    except ValidationError as e:
        print(e)
    print()

    obj = EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 2})
    print(obj)
    try:
        obj.energy_wh = 5
    except ValidationError as e:
        print(e)


if __name__ == "__main__":
    test()

Output:

year=22 month=11 energy=100.5 energy_wh=100500.0
year=22 month=11 energy=20.0 energy_wh=20000.0

2 validation errors for EmeterMonth
energy
  field required (type=value_error.missing)
energy_wh
  field required (type=value_error.missing)

1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)

year=22 month=11 energy=2.0 energy_wh=2000.0
1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)

As you can see, the conversion is done as expected. If both fields are omitted, their respective field validators automatically raise errors. If for some reason both values are passed and they are not consistent, an error is raised. If assignment causes inconsistency among the two fields, that same error is raised.

Note also that we don't need to coerce int to float in our custom validator, because the default field validators for float fields do that afterwards.

Personally, I would recommend against having redundancy in your data model. I doubt there is a point to having both energy values there. You could go for somewhat of a compromise by not declaring an energy_wh field on your model, but still allowing it to be parsed properly and converted to the equivalent energy value. This can also be done with a root_validator(pre=True):

from typing import Any

from pydantic import BaseModel, ValidationError, root_validator


class EmeterMonth(BaseModel):
    year: int
    month: int
    energy: float

    @root_validator(pre=True)
    def convert_energy(cls, values: dict[str, Any]) -> dict[str, Any]:
        energy = values.get("energy")
        energy_wh = values.pop("energy_wh", None)
        if energy is None:
            if energy_wh is None:
                return values  # regular field validation will raise the error
            values["energy"] = energy_wh / 1000
        elif energy_wh is not None and energy_wh / 1000 != energy:
            raise ValueError("`energy_wh` must be equal to `energy` * 1000")
        return values


def test() -> None:
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy": 100.5}))
    print(EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 20000}))
    print()

    try:
        EmeterMonth.parse_obj({"year": 2022, "month": 11})
    except ValidationError as e:
        print(e)
    print()

    try:
        EmeterMonth.parse_obj({"year": 22, "month": 11, "energy_wh": 2, "energy": 1})
    except ValidationError as e:
        print(e)


if __name__ == "__main__":
    test()

Output:

year=22 month=11 energy=100.5
year=22 month=11 energy=20.0

1 validation error for EmeterMonth
energy
  field required (type=value_error.missing)

1 validation error for EmeterMonth
__root__
  `energy_wh` must be equal to `energy` * 1000 (type=value_error)

If you want, you can still have a convenience property on that class to get the Wh value:

    @property
    def energy_wh(self) -> float:
        return self.energy * 1000

Obviously you could also have it the other way around (i.e. storing the Wh value), depending on which format you use more often and other requirements/constraints you may have.

This is generally less error-prone and a more sensible approach IMHO. Otherwise you'll always have to be careful to ensure consistency of the two values.
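To illustrate what "ensuring consistency" involves if you do keep both values: comparing floats for exact equality after a unit conversion can be fragile, so a tolerance-based check is usually safer. Here is a minimal sketch using only the standard library; the helper name energies_consistent is made up for this example and is not part of pydantic.

```python
import math


def energies_consistent(energy_kwh: float, energy_wh: float) -> bool:
    """Return True if both readings describe the same amount of energy."""
    # math.isclose applies a relative tolerance (1e-09 by default), so tiny
    # rounding differences introduced by the conversion are not flagged.
    return math.isclose(energy_wh / 1000, energy_kwh)


print(energies_consistent(2.0, 2000.0))  # True
print(energies_consistent(1.0, 2.0))     # False
```

A check like this could be called from a validator in place of a direct != comparison.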

You mentioned that the data comes from some web API. This way, you just perform the conversion once, if necessary, when you fetch the data, but after that you always have the data in a standardized format for yourself.
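The same "convert once when you fetch" idea can be sketched without pydantic, for comparison. This swaps the pydantic model for a plain dataclass, so it does none of pydantic's validation or coercion beyond the explicit int()/float() calls. The names EmeterMonthRecord, _UNITS_PER_KWH, from_api and parse_months are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical lookup table: source key -> how many of that unit make 1 kWh.
# Supporting another format would mean adding one entry here.
_UNITS_PER_KWH = {"energy": 1, "energy_wh": 1000}


@dataclass(frozen=True)
class EmeterMonthRecord:
    year: int
    month: int
    energy: float  # always stored in kWh

    @property
    def energy_wh(self) -> float:
        # Derived on demand, so the two values cannot drift apart.
        return self.energy * 1000

    @classmethod
    def from_api(cls, raw: dict[str, Any]) -> "EmeterMonthRecord":
        # Use the first known energy key that is present in the record.
        for key, units_per_kwh in _UNITS_PER_KWH.items():
            if key in raw:
                energy = float(raw[key]) / units_per_kwh
                return cls(year=int(raw["year"]), month=int(raw["month"]), energy=energy)
        raise ValueError(f"no known energy key in {sorted(raw)}")


def parse_months(payload: list[dict[str, Any]]) -> list[EmeterMonthRecord]:
    """Convert a whole API response (Wh or kWh records) in one pass."""
    return [EmeterMonthRecord.from_api(item) for item in payload]


payload = [
    {"energy_wh": 20000, "month": 11, "year": 2022},
    {"energy": 100.5, "month": 10, "year": 2022},
]
for record in parse_months(payload):
    print(record, record.energy_wh)
```

With the pydantic model from the answer, the equivalent step would be calling EmeterMonth.parse_obj on each dictionary in the list.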

Answered By - Daniil Fajnberg

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
