
High Memory Usage #635

Closed
darthShadow opened this issue Dec 31, 2020 · 7 comments

@darthShadow
Contributor

darthShadow commented Dec 31, 2020

Describe the issue

This is not a bug report but more of a support issue.

I have a script that iterates over all the media of all the items in a huge movie library (45k+ movies) to find items that are missing metadata due to previous Plex bugs, so it can refresh them. However, as the script progresses, the memory usage climbs considerably, reaching 3+ GB by the time it has gone through only a third of the list (13k+ items).

Are there any optimizations I could make to improve this usage?

Code snippets

import time

import plexapi.exceptions
import plexapi.server

# PLEX_URL, PLEX_TOKEN, PLEX_REQUESTS_SLEEP, section and logger are defined
# elsewhere in the script.
plex = plexapi.server.PlexServer(PLEX_URL, PLEX_TOKEN, timeout=300)
items = plex.library.section(section).all()

missing_metadata_items = []

for item_index, item in enumerate(items):

    try:
        if not item.thumb or len(item.guids) == 0 or len(item.media) == 0 \
                or item.media[0].bitrate == 0:
            logger.info(f"Metadata Missing for Item : {item.title}"
                        f" ({item.year})")
            missing_metadata_items.append(item)
    except plexapi.exceptions.BadRequest:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
        missing_metadata_items.append(item)
    except plexapi.exceptions.NotFound:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
    except plexapi.exceptions.PlexApiException:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
    finally:
        time.sleep(PLEX_REQUESTS_SLEEP)

total_items = len(items)
total_missing_metadata_items = len(missing_metadata_items)

Expected behavior

Lower memory usage.

Environment (please complete the following information):

Additional context

N/A

@Hellowlol
Collaborator

I don't understand why the memory usage is so high, but you can use .search with the arguments container_start and maxresults. Use .totalSize to get the number of items in your library, and manually fetch, say, 1000 items at a time.

@Hellowlol
Collaborator

Hellowlol commented Dec 31, 2020

Just an example of how you can do this.

def batch(section, libtype="movie", maxresults=100):
    curr = 0

    while True:
        res = section.search(
            libtype=libtype,
            container_start=curr,
            container_size=maxresults,
            maxresults=maxresults,
        )
        # container_start is zero-based, so advance by exactly one page.
        curr += maxresults
        yield from res

        if curr >= section.totalSize:
            break


n = []
for i in batch(pms.library.section("Movies")):
    i.reload()
    n.append(i)
print(len(n))

This uses less than 500 MB for 13k items with a reload. Depending on how you handle the rest, you could just store the ratingKey of each movie that is missing some info and update them one by one later in the script.
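
A minimal sketch of that ratingKey approach (reusing the batch helper above; the missing-metadata check is abbreviated here):

# Keep only the integer ratingKeys so the full Movie objects (and their
# parsed responses) can be garbage-collected between batches.
missing_keys = [
    movie.ratingKey
    for movie in batch(pms.library.section("Movies"))
    if not movie.thumb  # abbreviated check; expand as needed
]

# Later in the script, rebuild each item on demand and refresh it.
for key in missing_keys:
    pms.fetchItem(key).refresh()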

@darthShadow
Contributor Author

darthShadow commented Jan 3, 2021

Thanks for the suggestion about using search instead. I did try your sample code, but it was still too memory-expensive for a 12k+ TV show library (350k+ episodes): it got killed after using 6+ GB of memory while going through half the collection.

This is what worked for me finally:

BATCH_SIZE = 100

def _item_iterator(plex_section, start, batch_size):

    items = plex_section.search(
        container_start=start,
        maxresults=batch_size,
    )

    for item in items:
        item.reload(checkFiles=True)
        yield item


def _batch_get(plex_section, batch_size):

    start = 0

    while True:
        if start >= plex_section.totalSize:
            break

        yield from _item_iterator(plex_section, start, batch_size)

        # container_start is zero-based, so advance by exactly one batch.
        start += batch_size


def main():
    plex = plexapi.server.PlexServer(PLEX_URL, PLEX_TOKEN, timeout=300)
    plex_section = plex.library.section(section)
    total_items = plex_section.totalSize

    missing_metadata_items = []

    for item in _batch_get(plex_section, BATCH_SIZE):

        try:

            if not item.thumb or len(item.guids) == 0 or len(item.media) == 0 \
                    or item.media[0].bitrate == 0:
                logger.info(f"Metadata Missing for Item : {item.title}"
                            f" ({item.year})")
                missing_metadata_items.append(item)

        except plexapi.exceptions.BadRequest:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
            missing_metadata_items.append(item)
        except plexapi.exceptions.NotFound:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
        except plexapi.exceptions.PlexApiException:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
        finally:
            time.sleep(PLEX_REQUESTS_SLEEP)

    total_missing_metadata_items = len(missing_metadata_items)

It's using less than 50 MB of memory for the whole collection now.


Any reason all() doesn't use a similar iterator approach? Or is it just that nobody has needed it until now, and hence no one has sent a PR for it?


Depending on how you handle the rest, you could just store the ratingKey of each movie that is missing some info and update them one by one later in the script.

Is there some way to instantiate an item with just the ratingKey, for calling refresh on it?

@Hellowlol
Collaborator

Hellowlol commented Jan 3, 2021

You should pass libtype, as this allows you to iterate over episodes directly.
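
For example (a sketch reusing the batch helper from my earlier comment; the library name is illustrative):

# Page through episodes directly, so each batch holds only episode objects.
for ep in batch(pms.library.section("TV Shows"), libtype="episode"):
    print(ep.grandparentTitle, ep.seasonEpisode, ep.title)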

You can use .fetchItem(ratingKey) to build an item directly. But if you don't need the object, you could just use .query and create the refresh URL yourself.

The quickest solution might be to download the database and run SQL queries on a copy of the DB. This plus .query would be the fastest for the number of items you want to check.
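
A rough sketch of the .query route (this assumes the standard refresh endpoint, PUT /library/metadata/<ratingKey>/refresh, which is what plexapi's own refresh() calls internally):

# Trigger a metadata refresh without building the full item object.
plex.query(f"/library/metadata/{rating_key}/refresh",
           method=plex._session.put)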

As for your question: I don't think anybody has tried this on such a massive library before. How many TBs of storage do you need for this thing?

Too lazy to add formatting on mobile.

@darthShadow
Contributor Author

darthShadow commented Jan 3, 2021

Thanks for the reply.


You should pass libtype, as this allows you to iterate over episodes directly.

Unfortunately, I do need the show object too, so it's easier for me to fetch the shows and then iterate over the individual episodes of each show. And by default it's behaving correctly, i.e. returning movies for movie libraries and shows for TV libraries, so that's fine for now.

You can use .fetchItem(ratingKey) to build an item directly. But if you don't need the object, you could just use .query and create the refresh URL yourself.

I would prefer the object, since I log which item is being refreshed, etc., so fetchItem is perfect for me. On which object should the function be called? The server or the library section?

The quickest solution might be to download the database and run SQL queries on a copy of the DB. This plus .query would be the fastest for the number of items you want to check.

The bottleneck here doesn't seem to be the querying part; it's that there is a lot of missing metadata, which Plex updates as I fetch the items. That metadata update would be required anyway when I go to refresh an item after querying the DB, so it's fine with me.

How many TBs of storage do you need for this thing?

This is in the range of PBs now 😉, not TBs.

@Hellowlol
Collaborator

🤩

This method is part of the base class, so you can use pretty much anything. I would use the section in your script.
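
For example (sketch):

# fetchItem is inherited from the PlexObject base class, so calling it
# on the section works:
item = plex_section.fetchItem(rating_key)
item.refresh()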

@darthShadow
Contributor Author

Thanks for the help.
