Issue
I'm working on an internal tool built using Django. Part of what we're doing with this tool is digitizing content from old PDFs. The PDFs have been through some kind of OCR process, but it has left them with frequent doubled or tripled spaces and hard returns at the end of every line. We're using this text to create database objects in Django's built-in admin.
Imagine I have a data model like this:
import re

from django.db import models
from django.db.models import CharField, TextField

class Widget(models.Model):
    name = CharField()
    description = TextField()

    def fix_description(self):
        # collapse every run of whitespace (including hard returns) into one space
        self.description = re.sub(r"\s+", " ", self.description)
        # turn a typed backslash-n marker into a real line break
        self.description = re.sub(r"\\n", "\n", self.description)
        # trim spaces left around the real line breaks
        self.description = re.sub(r" *\n *", "\n", self.description)
Most of the time, the text in description will be a single paragraph. Occasionally it should contain actual line breaks, which I'd like to denote by typing \n into the description field in Django admin when creating the object. The fix_description method cleans up unintended whitespace exactly as we'd like, and keeps line breaks entered manually.
What I'd like is to have fix_description run exactly once when the object is created through the Admin form, and then never again. I don't want this code to run when the description is updated, because if that happens it will remove line breaks we want to be there. What's the best way to do this?
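To make the run-once requirement concrete, here is a minimal sketch outside Django. The helper name clean_description and the sample text are assumptions made up for illustration; the three substitutions are the ones from fix_description above.

```python
import re


def clean_description(text):
    """Hypothetical standalone version of the three substitutions."""
    # collapse every run of whitespace (including hard returns) into one space
    text = re.sub(r"\s+", " ", text)
    # turn a typed backslash-n marker into a real line break
    text = re.sub(r"\\n", "\n", text)
    # trim spaces left around the real line breaks
    return re.sub(r" *\n *", "\n", text)


# Made-up OCR-style input: doubled spaces, hard returns, and one typed \n marker
raw = "An  old   widget\nfrom the  archive. \\n Second  paragraph\nhere."

once = clean_description(raw)
twice = clean_description(once)

print(repr(once))   # 'An old widget from the archive.\nSecond paragraph here.'
print(repr(twice))  # 'An old widget from the archive. Second paragraph here.'
```

The first pass keeps the deliberate line break; a second pass collapses it into a space, which is exactly what should not happen on later edits.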
Solution
The answer is here; I just adjusted it to your specific needs.
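Instead of creating the function fix_description, override the save() method. Since you want the cleanup to run only the first time you save the Widget instance, not every time, you can check whether the object is already in the database by seeing if it has a primary key, pk. If it does, the object is already in the database, so don't run the cleanup. This relies on the default auto-generated primary key, which stays empty until the first save; if the model assigned its own primary key before saving, checking self._state.adding would be the safer test.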
class Widget(models.Model):
    name = CharField()
    description = TextField()

    def save(self, *args, **kwargs):
        # run the following only if the object is not in the database
        if not self.pk:
            self.description = re.sub(r"\s+", " ", self.description)
            self.description = re.sub(r"\\n", "\n", self.description)
            self.description = re.sub(r" *\n *", "\n", self.description)
        # you have to call the super method to actually save the object
        super().save(*args, **kwargs)
The other option is to override the __init__ method instead of the save() method, since __init__ runs when the object is instantiated rather than when you change it and save again. Be aware, though, that Django also calls __init__ each time it builds an instance from a database row, so the cleanup would run again whenever the object is loaded. For that reason, and because overriding __init__ is not recommended, I would stick with save().
The docs give other methods for handling this situation instead of overriding __init__ (a classmethod on the model, or a method on a custom manager), but I think what I gave already handles your situation.
Answered By - raphael