python - How to delete entities not found in feed on GAE


I'm updating items from a feed into the datastore 200 items at a time (the feed can hold roughly 40,000 items). The problem is that the feed can change, and some items may be removed from it. I have this code:

    from google.appengine.ext import db

    class FeedEntry(db.Model):
        name = db.StringProperty(required=True)

    def updateFeed(offset, number=200):
        # fetchFeed and parseFeed are my own helpers (not shown)
        response = fetchFeed(offset, number)
        feedItems = parseFeed(response)
        feedEntriesToAdd = []
        for item in feedItems:
            feedEntriesToAdd.append(
                FeedEntry(key_name=item.id, name=item.name))
        db.put(feedEntriesToAdd)

How can I find out which items were not in the feed and remove them from the datastore? I thought about building a list of the items in the datastore, removing from that list every item I updated, and then deleting whatever is left.

PS: Every item.id is unique to its feed item and is consistent.

If you add a DateTimeProperty with auto_now=True, it will record the last modified time of each entity. Since you update every item in the feed, by the time you have finished they will all have times after the moment you started, so anything with a date before that moment is no longer in the feed.
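To illustrate the timestamp idea, here is a minimal sketch that uses a plain dict as a stand-in for the datastore, so it runs without App Engine. The field name `updated` and the helpers `refresh` and `sweep_stale` are made up for this example. On App Engine with the `db` API, the sweep would presumably be a query along the lines of `FeedEntry.all().filter('updated <', refresh_started)` followed by `db.delete(...)`, with `auto_now=True` setting the timestamp on each put.

```python
from datetime import datetime, timedelta

def refresh(store, feed_items, now):
    """Upsert every feed item, stamping it with the time of the write.

    The dict is only a stand-in; auto_now=True would do the stamping on GAE.
    """
    for item_id, name in feed_items:
        store[item_id] = {"name": name, "updated": now}

def sweep_stale(store, refresh_started):
    """Delete entries last written before the refresh began; return their ids."""
    stale = sorted(k for k, v in store.items() if v["updated"] < refresh_started)
    for k in stale:
        del store[k]
    return stale

# Hypothetical data: "b" was in yesterday's feed but is missing from today's.
store = {}
yesterday = datetime(2024, 1, 1, 12, 0)
refresh(store, [("a", "Alpha"), ("b", "Beta")], yesterday)

started = yesterday + timedelta(days=1)  # note the time BEFORE updating
refresh(store, [("a", "Alpha v2"), ("c", "Gamma")], started + timedelta(seconds=5))
removed = sweep_stale(store, started)    # removed == ["b"]
```

The important detail is to record the start time before the first batch is written, not after the last one.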

Javier's generation counter is just as good. All that is needed is something guaranteed to increase between refreshes and never decrease during a refresh.
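A generation counter could look roughly like the sketch below. It again uses a dict in place of the datastore, and the names `begin_refresh`, `put_batch`, `stale_ids`, and the `generation` field are assumptions for illustration only. The counter is bumped once per full refresh and stays constant across all the 200-item batches of that refresh.

```python
def begin_refresh(state):
    """Bump the counter once, before the first batch of a refresh."""
    state["generation"] += 1
    return state["generation"]

def put_batch(store, batch, generation):
    """Write one batch, tagging each entry with the current generation."""
    for item_id, name in batch:
        store[item_id] = {"name": name, "generation": generation}

def stale_ids(store, generation):
    """Ids whose generation is older than the refresh that just finished."""
    return sorted(k for k, v in store.items() if v["generation"] < generation)

state = {"generation": 0}
store = {}

gen = begin_refresh(state)  # first refresh, written in two batches
put_batch(store, [("a", "Alpha"), ("b", "Beta")], gen)
put_batch(store, [("c", "Gamma")], gen)

gen = begin_refresh(state)  # second refresh: "b" has left the feed
put_batch(store, [("a", "Alpha")], gen)
put_batch(store, [("c", "Gamma")], gen)
to_delete = stale_ids(store, gen)  # to_delete == ["b"]
```

Unlike a timestamp, the counter does not depend on clocks, but it does need to be stored somewhere that survives between batches.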

I'm not sure from the docs, but I expect a DateTimeProperty is bigger than an IntegerProperty. The latter is a 64-bit integer, so they may be the same size, or DateTimeProperty may store several integers, in which case it might take 10 bytes as opposed to 8.

But remember that by adding an extra property that you query on, you are adding another index anyway, so the difference in the size of the field is diluted as a proportion of the overhead. Besides, a few bytes times some 40,000 items does not cost much at $0.24/GB/month.

With either a generation or a datetime, you don't have to delete the data immediately. Your other queries can filter on the date or generation of the latest refresh instead. If the feed (or your parsing of it) goes funny and fails to produce any items, or produces only a few, the previous refresh can then serve as a backup. Whether that is worth having depends entirely on the application.
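One way to picture the "filter instead of delete" variant is sketched below, under the assumption that readers filter on a stored `live_generation` value and that a refresh is only promoted when it yields at least `min_items` entries. Both names, and the threshold itself, are hypothetical; the dict again stands in for the datastore.

```python
def finish_refresh(state, store, new_generation, min_items):
    """Promote the new generation only if the refresh produced enough items."""
    refreshed = sum(1 for v in store.values() if v["generation"] == new_generation)
    if refreshed >= min_items:
        state["live_generation"] = new_generation
    return state["live_generation"]

def visible_ids(store, state):
    """What readers see: entries at or after the last good generation."""
    live = state["live_generation"]
    return sorted(k for k, v in store.items() if v["generation"] >= live)

store = {k: {"generation": 1} for k in ("a", "b", "c")}
state = {"live_generation": 1}

store["a"]["generation"] = 2  # broken refresh: only one item came back
after_bad = (finish_refresh(state, store, 2, min_items=2), visible_ids(store, state))

store["a"]["generation"] = 3  # healthy refresh: "c" has left the feed
store["b"]["generation"] = 3
after_good = (finish_refresh(state, store, 3, min_items=2), visible_ids(store, state))
```

After the broken refresh readers still see all three items; after the healthy one they see only "a" and "b", while "c" stays in the store until you decide to delete it.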
