W3Cschool
恭喜您成為首批注冊(cè)用戶(hù)
獲得88經(jīng)驗(yàn)值獎(jiǎng)勵(lì)
查找重復(fù)項(xiàng)并刪除已處理的項(xiàng)的篩選器。假設(shè)我們的項(xiàng)目有一個(gè)唯一的ID,但是我們的spider返回具有相同ID的多個(gè)項(xiàng)目:
from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem
class DuplicatesPipeline:
def __init__(self):
self.ids_seen = set()
def process_item(self, item, spider):
adapter = ItemAdapter(item)
if adapter['id'] in self.ids_seen:
raise DropItem(f"Duplicate item found: {item!r}")
else:
self.ids_seen.add(adapter['id'])
return item
Copyright©2021 w3cschool編程獅|閩ICP備15016281號(hào)-3|閩公網(wǎng)安備35020302033924號(hào)
違法和不良信息舉報(bào)電話:173-0602-2364|舉報(bào)郵箱:jubao@eeedong.com
掃描二維碼
下載編程獅App
編程獅公眾號(hào)
聯(lián)系方式:
更多建議: