Issue
I'm a newbie to Python, Django, Scrapy and MongoDB.
What am I trying to do?
I'm trying to persist data from Scrapy to a MongoDB collection created via Django, so that the data can be read from this collection and displayed on a page.
What have I done so far?
- Model in Django:
from django.db import models

class Project(models.Model):
    title = models.CharField(max_length=100)
    desc = models.CharField(max_length=100)
    urls = models.CharField(max_length=100)
- After running the migration for the project, the following 0001_initial.py was generated, which shows that Django auto-generated the 'id' field:
# Generated by Django 2.2.8 on 2019-12-27 03:09
from django.db import migrations, models

class Migration(migrations.Migration):
    initial = True
    dependencies = [
    ]
    operations = [
        migrations.CreateModel(
            name='Project',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('title', models.CharField(max_length=100)),
                ('desc', models.CharField(max_length=100)),
                ('urls', models.CharField(max_length=100)),
                # ('image', models.FilePathField(path='/img')),
            ],
        ),
    ]
- The following is my spider's pipeline.py file:
import pymongo

class ProjectspiderPipeline(object):
    def __init__(self):
        self.conn = pymongo.MongoClient('localhost', 27017)
        db = self.conn['djangodb']
        #self.collection = db['spiderCollection']
        self.collection = db['projects_project']

    def process_item(self, item, spider):
        # Note: insert() is deprecated since PyMongo 3.0; insert_one() is the current equivalent.
        self.collection.insert(dict(item))
        return item
- This is my items.py:
import scrapy

class ProjectspiderItem(scrapy.Item):
    _id = scrapy.Field()
    title = scrapy.Field()
    desc = scrapy.Field()
    url = scrapy.Field()
- Now, when I run my spider with
    self.collection = db['spiderCollection']
  in my pipeline, it runs successfully.
- However, when I change the collection to
    self.collection = db['projects_project']
  it fails with the following error:
error raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: djangodb.projects_project index: __primary_key__ dup key: { id: null }
I would appreciate it if anyone could guide me on either of these two options:
- How can I add an auto-generated value for _id in my spider?
- Can we bypass the auto-generated id in Django, or migrate without generating auto-generated ids as in the 0001_initial.py file?
Thanks heaps.
Solution
I would suggest not generating the MongoDB ObjectId yourself and letting the database generate it for you. A MongoDB-generated _id is unique, and you can retrieve it after saving your item. Just send the document to the database without an _id field and MongoDB will generate one.
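As a rough illustration of that suggestion, here is a minimal sketch of a pipeline that drops any _id from the item before inserting and keeps the id MongoDB returns. The class name MongoPipeline, the collection_name parameter, the injectable collection argument and the last_inserted_id attribute are assumptions made for this example and are not part of the original code; the host, port and database name are taken from the question.

```python
class MongoPipeline:
    """Hypothetical Scrapy pipeline that lets MongoDB assign _id."""

    def __init__(self, collection=None, collection_name="projects_project"):
        # A collection object may be injected (handy for testing);
        # otherwise one is opened in open_spider().
        self.collection = collection
        self.collection_name = collection_name
        self.client = None
        self.last_inserted_id = None

    def open_spider(self, spider):
        if self.collection is None:
            # Imported lazily so the class can be exercised without a server.
            import pymongo
            self.client = pymongo.MongoClient("localhost", 27017)
            self.collection = self.client["djangodb"][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Do not send an _id: MongoDB then generates an ObjectId itself.
        doc.pop("_id", None)
        result = self.collection.insert_one(doc)
        # The generated id is available on the insert result.
        self.last_inserted_id = result.inserted_id
        return item
```

One caveat: the error message in the question names the field id (index __primary_key__), not _id. A unique index treats documents that lack the indexed field as having a null value, so if that index exists on the collection, each inserted document would probably also need its own unique id value.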
Answered By - Taimur Ahmed Qureshi