
High Memory Usage #635

Closed
darthShadow opened this issue Dec 31, 2020 · 7 comments

@darthShadow
Contributor

darthShadow commented Dec 31, 2020

Describe the issue

This is not a bug report but more of a support issue.

I have a script that iterates over all the media of all the items in a huge movie library (45k+ movies) to find items that are missing metadata due to previous Plex bugs, so it can refresh them. However, as the script progresses, the memory usage climbs considerably, reaching 3+ GB by the time it has gone through only a third of the list (13k+ items).

Are there any optimizations I could make to improve this usage?

Code snippets

import time

import plexapi.exceptions
import plexapi.server

# PLEX_URL, PLEX_TOKEN, PLEX_REQUESTS_SLEEP, section and logger are defined
# elsewhere in the script.
plex = plexapi.server.PlexServer(PLEX_URL, PLEX_TOKEN, timeout=300)
items = plex.library.section(section).all()

missing_metadata_items = []

for item_index, item in enumerate(items):

    try:
        if not item.thumb or len(item.guids) == 0 or len(item.media) == 0 \
                or item.media[0].bitrate == 0:
            logger.info(f"Metadata Missing for Item : {item.title}"
                        f" ({item.year})")
            missing_metadata_items.append(item)
    except plexapi.exceptions.BadRequest:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
        missing_metadata_items.append(item)
    except plexapi.exceptions.NotFound:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
    except plexapi.exceptions.PlexApiException:
        logger.exception(f"Fetching Item : {item.title} ({item.year})")
    finally:
        time.sleep(PLEX_REQUESTS_SLEEP)

total_items = len(items)
total_missing_metadata_items = len(missing_metadata_items)

Expected behavior

Lower memory usage.

Environment (please complete the following information):

Additional context

N/A

@Hellowlol
Collaborator

I don't understand why the memory usage is so high, but you can use .search with the arguments container_start and maxresults. Use .totalSize to get the number of items in your library, and manually fetch, say, 1000 items at a time.

@Hellowlol
Collaborator

Hellowlol commented Dec 31, 2020

Just an example of how you can do this.

def batch(section, libtype="movie", maxresults=100):
    curr = 0

    while True:
        res = section.search(
            libtype=libtype,
            container_start=curr,
            container_size=maxresults,
            maxresults=maxresults,
        )
        # container_start is zero-based, so advance by exactly one page.
        curr += maxresults
        yield from res

        if curr >= section.totalSize:
            break


n = []
for i in batch(pms.library.section("Movies")):
    i.reload()
    n.append(i)
print(len(n))

This uses less than 500 MB for 13k items with a reload. Depending on how you handle the rest, you could just store the ratingKey of each movie that is missing some info and update them one by one later in the script.
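
A minimal sketch of that ratingKey approach (reusing the batch helper above; the missing-metadata check is abbreviated here):

# Keep only the integer ratingKeys so the full Movie objects (and their
# parsed responses) can be garbage-collected between batches.
missing_keys = [
    movie.ratingKey
    for movie in batch(pms.library.section("Movies"))
    if not movie.thumb  # abbreviated check; expand as needed
]

# Later in the script, rebuild each item on demand and refresh it.
for key in missing_keys:
    pms.fetchItem(key).refresh()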

@darthShadow
Contributor Author

darthShadow commented Jan 3, 2021

Thanks for the suggestion about using search instead. I did try your sample code, but it was still too memory-expensive for a 12k+ TV show library (350k+ episodes): it got killed after using 6+ GB of memory while going through half the collection.

This is what worked for me finally:

BATCH_SIZE = 100

def _item_iterator(plex_section, start, batch_size):

    items = plex_section.search(
        container_start=start,
        maxresults=batch_size,
    )

    for item in items:
        item.reload(checkFiles=True)
        yield item


def _batch_get(plex_section, batch_size):

    start = 0

    while True:
        if start >= plex_section.totalSize:
            break

        yield from _item_iterator(plex_section, start, batch_size)

        # container_start is zero-based, so advance by exactly one batch.
        start += batch_size


def main():
    plex = plexapi.server.PlexServer(PLEX_URL, PLEX_TOKEN, timeout=300)
    plex_section = plex.library.section(section)
    total_items = plex_section.totalSize

    missing_metadata_items = []

    for item in _batch_get(plex_section, BATCH_SIZE):

        try:

            if not item.thumb or len(item.guids) == 0 or len(item.media) == 0 \
                    or item.media[0].bitrate == 0:
                logger.info(f"Metadata Missing for Item : {item.title}"
                            f" ({item.year})")
                missing_metadata_items.append(item)

        except plexapi.exceptions.BadRequest:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
            missing_metadata_items.append(item)
        except plexapi.exceptions.NotFound:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
        except plexapi.exceptions.PlexApiException:
            logger.exception(f"Fetching Item : {item.title} ({item.year})")
        finally:
            time.sleep(PLEX_REQUESTS_SLEEP)

    total_missing_metadata_items = len(missing_metadata_items)

It's using less than 50 MB of memory for the whole collection now.


Any reason all() doesn't use a similar iterator approach? Or is it just that nobody has needed it until now, and hence no one has sent a PR for it?


Depending on how you handle the rest, you could just store the ratingKey of each movie that is missing some info and update them one by one later in the script.

Is there some way to instantiate an item with just the ratingKey, for calling refresh on it?

@Hellowlol
Collaborator

Hellowlol commented Jan 3, 2021

You should pass libtype, as this allows you to iterate over episodes directly.
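
For example (a sketch reusing the batch helper from my earlier comment; the library name is illustrative):

# Page through episodes directly, so each batch holds only episode objects.
for ep in batch(pms.library.section("TV Shows"), libtype="episode"):
    print(ep.grandparentTitle, ep.seasonEpisode, ep.title)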

You can use .fetchItem(ratingKey) to build an item directly. But if you don't need the object, you could just use .query and create the refresh URL yourself.

The quickest solution might be to download the database and run SQL queries on a copy of the DB. This plus .query would be the fastest for the number of items you want to check.
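
A rough sketch of the .query route (this assumes the standard refresh endpoint, PUT /library/metadata/<ratingKey>/refresh, which is what plexapi's own refresh() calls internally):

# Trigger a metadata refresh without building the full item object.
plex.query(f"/library/metadata/{rating_key}/refresh",
           method=plex._session.put)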

As for your question: I don't think anybody has tried this on such a massive library before. How many TBs of storage do you need for this thing?

Too lazy to add formatting on mobile.

@darthShadow
Contributor Author

darthShadow commented Jan 3, 2021

Thanks for the reply.


You should pass libtype, as this allows you to iterate over episodes directly.

Unfortunately, I do need the show object too, so it's easier for me to fetch the shows and then iterate over the individual episodes of each show. And by default it's behaving correctly, i.e. returning movies for movie libraries and shows for TV libraries, so that's fine for now.

You can use .fetchItem(ratingKey) to build an item directly. But if you don't need the object, you could just use .query and create the refresh URL yourself.

I would prefer the object, since I log which item is being refreshed, etc., so fetchItem is perfect for me. On which object should the function be called? The server or the library section?

The quickest solution might be to download the database and run SQL queries on a copy of the DB. This plus .query would be the fastest for the number of items you want to check.

The bottleneck here doesn't seem to be the querying part; it's that there is a lot of missing metadata, which Plex updates as I fetch the items. That metadata update would be required anyway when I go to refresh an item after querying the DB, so it's fine with me.

How many TBs of storage do you need for this thing?

This is in the range of PBs now 😉, not TBs.

@Hellowlol
Collaborator

🤩

This method is part of the base class, so you can use pretty much anything. I would use the section in your script.
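
For example (sketch):

# fetchItem is inherited from the PlexObject base class, so calling it
# on the section works:
item = plex_section.fetchItem(rating_key)
item.refresh()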

@darthShadow
Contributor Author

Thanks for the help.
