CWHer/PixivCrawler

Pixiv Utils

The Chinese version of README can be found here.

About

Pixiv Utils is implemented in Python and includes Pixiv Crawler and Mosaic Puzzles. It supports rankings, personal bookmarks, artist works, and keyword search for personalized filtering, and provides high-performance multi-threaded parallel downloads. 🤗

This GIF depicts a sample run at normal speed.

Features

  • Pixiv Crawler

    • Ranking lists (daily, weekly, monthly, ...)

    • Personal bookmarks

    • Specific artist's artworks

    • Artworks matching a specific keyword (supports advanced keyword search, e.g., (Lucy OR 边缘行者) AND (5000users OR 10000users))

    • Parallel download with multi-threading

  • Mosaic Puzzles

Installation

Install from PyPI (Recommended)

pip install pixiv-utils

Install from source

git clone git@github.com:CWHer/PixivCrawler.git
cd PixivCrawler
pip install -v .

Quick Start

Please refer to the tutorial for comprehensive instructions.

Note: This section only contains the usage of Pixiv Crawler. For the usage of Mosaic Puzzles, please refer to Mosaic Puzzles Doc.

import datetime

from pixiv_utils.pixiv_crawler import (
    RankingCrawler,
    checkDir,
    displayAllConfig,
    download_config,
    network_config,
    ranking_config,
    user_config,
)

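if __name__ == "__main__":
    # Local proxy address; set it to "" if you do not need a proxy
    network_config.proxy["https"] = "127.0.0.1:7890"
    # Only required for personal bookmarks or R18 content
    user_config.user_id = ""
    user_config.cookie = ""
    download_config.with_tag = False  # do not save image tags to tags.json
    # Weekly illustration rankings for 2024-05-01 and 2024-05-02, 50 artworks each
    ranking_config.start_date = datetime.date(2024, 5, 1)
    ranking_config.range = 2
    ranking_config.mode = "weekly"
    ranking_config.content_mode = "illust"
    ranking_config.num_artwork = 50

    # Print all configurations and check the storage directory
    displayAllConfig()
    checkDir(download_config.store_path)

    app = RankingCrawler(capacity=200)  # flow capacity in MB
    app.run()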

Learn about the configurations

The configurations are located in config.py. Items that you will most likely need to modify are marked with ⚠️. You can import these configurations, modify them as in the example above, and call displayAllConfig() to check that they are correct.

  • RankingConfig

    from pixiv_utils.pixiv_crawler import ranking_config

    NOTE: This config is only activated when downloading the ranking list.

    • ranking_config.start_date: datetime.date: The start date of the ranking list ⚠️

    • ranking_config.range: int: The length of the date range ⚠️. Rankings are downloaded for every date in [start_date, start_date + range - 1].

    • ranking_config.mode: str: The type of ranking list ⚠️, which can be chosen from

      ranking_modes: Tuple = (
          "daily",
          "weekly",
          "monthly",
          "male",
          "female",
          "daily_ai",
          "daily_r18",
          "weekly_r18",
          "male_r18",
          "female_r18",
          "daily_r18_ai",
      )

    • ranking_config.content_mode: str: The type of content in the ranking list ⚠️, which can be chosen from

      content_modes: Tuple = ("all", "illust", "manga", "ugoira")

    • ranking_config.num_artwork: int: The number of artworks to be downloaded in each ranking list ⚠️
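
To make the interval [start_date, start_date + range - 1] concrete, here is a minimal sketch that lists the dates it covers. It assumes that range counts days; ranking_dates is a hypothetical helper written for this illustration and is not part of pixiv_utils.

```python
import datetime
from typing import List


def ranking_dates(start_date: datetime.date, date_range: int) -> List[datetime.date]:
    """Return every date in [start_date, start_date + date_range - 1]."""
    if date_range < 1:
        raise ValueError("date_range must be at least 1")
    return [start_date + datetime.timedelta(days=offset) for offset in range(date_range)]


# Same values as the Quick Start example: start_date = 2024-05-01, range = 2
dates = ranking_dates(datetime.date(2024, 5, 1), 2)
print([d.isoformat() for d in dates])  # ['2024-05-01', '2024-05-02']
```

Under this reading, the Quick Start settings would cover two ranking lists, one for each date.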

  • NetworkConfig

    from pixiv_utils.pixiv_crawler import network_config

    • network_config.proxy: Dict: The proxy configuration ⚠️

      # For example, to turn off the proxy
      network_config.proxy["https"] = ""

      The default proxy["https"] value is 127.0.0.1:7890, where 7890 is the default proxy port of Clash. Change it to match your actual proxy settings. If you do not need a proxy, set proxy["https"] to "".

    • network_config.headers: Dict: The headers used in the request.
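
A mistyped proxy address is an easy mistake to make, so a quick format check before assigning it can help. The sketch below assumes the value is either an empty string (proxy off) or a host:port pair like the default; check_proxy_address is a hypothetical helper and not part of pixiv_utils.

```python
def check_proxy_address(address: str) -> bool:
    """Return True if address is "" (proxy off) or looks like host:port."""
    if address == "":
        return True
    # Split on the last colon so that the port is always the final component
    host, sep, port = address.rpartition(":")
    return bool(host) and sep == ":" and port.isdigit() and 0 < int(port) < 65536


print(check_proxy_address("127.0.0.1:7890"))  # True
print(check_proxy_address(""))  # True
print(check_proxy_address("127.0.0.1"))  # False
```

This only checks the shape of the string. It does not test whether a proxy is actually listening at that address.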

  • UserConfig

    from pixiv_utils.pixiv_crawler import user_config

    NOTE: User-specific configurations are required when downloading personal bookmarks or R18 content.

    • user_config.user_id: str: The user ID of the Pixiv account ⚠️. You can find it in the URL of your profile page, https://www.pixiv.net/users/{UID}.

    • user_config.cookie: str: The cookie of your Pixiv account ⚠️

      1. Open the browser's DevTools (usually F12) and switch to the Network tab.

      2. Visit the ranking list and refresh the page. Find ranking.php in DevTools.

      3. Copy all characters after cookie: in the request headers and assign them to user_config.cookie.
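
Copying a cookie by hand can go wrong, for example when the selection is truncated or includes the cookie: label itself. The sketch below splits a raw cookie string into name/value pairs so you can inspect what was copied. parse_cookie_header is a hypothetical helper, not part of pixiv_utils, and the cookie values are made-up placeholders.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Split a raw 'name=value; name2=value2' cookie string into a dict."""
    raw = raw.strip()
    # Drop an accidentally copied "cookie:" label (case-insensitive)
    if raw.lower().startswith("cookie:"):
        raw = raw[len("cookie:"):].strip()
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        # Split on the first "=" only, since values may contain "=" themselves
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


raw_cookie = "cookie: first_visit=1; session=abc123"  # placeholder values
parsed = parse_cookie_header(raw_cookie)
print(sorted(parsed))  # ['first_visit', 'session']
```

An empty result suggests that nothing usable was copied.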

  • DownloadConfig

    from pixiv_utils.pixiv_crawler import download_config

    • download_config.timeout: float: The timeout of the request.

    • download_config.retry_times: int: The number of retries after a request fails.

    • download_config.fail_delay: float: The delay after a request fails.

    • download_config.store_path: str: The path to store the downloaded images ⚠️

    • download_config.with_tag: bool: Whether to download image tags to tags.json. ⚠️

    • download_config.url_only: bool: Whether to collect image URLs only, without downloading the images. The URLs are returned by app.run(). ⚠️

      ...
      download_config.url_only = True
      
      ...
      urls = app.run()  # a set of image URLs

    • download_config.num_threads: int: The number of threads for parallel download ⚠️

    • download_config.thread_delay: float: The delay for each thread to start.
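
To illustrate how settings such as retry_times, fail_delay, and num_threads typically interact, here is a minimal sketch of a retrying, multi-threaded downloader. It is not the implementation used by pixiv_utils. The names fetch_with_retry, download_all, and flaky_fetch are hypothetical, and the sketch assumes that retry_times means extra attempts after the first failure. A real fetch function would also pass timeout to its HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def fetch_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    retry_times: int = 3,
    fail_delay: float = 0.1,
) -> Optional[bytes]:
    """Try fetch(url) once, then retry up to retry_times more times."""
    for attempt in range(retry_times + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt < retry_times:
                time.sleep(fail_delay)  # wait before the next attempt
    return None  # every attempt failed


def download_all(
    fetch: Callable[[str], bytes], urls: Iterable[str], num_threads: int = 4
) -> List[Optional[bytes]]:
    """Fetch all URLs with a thread pool; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda u: fetch_with_retry(fetch, u), urls))


attempts: Dict[str, int] = {}


def flaky_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request: fails on the first call for each URL."""
    attempts[url] = attempts.get(url, 0) + 1
    if attempts[url] == 1:
        raise ConnectionError("simulated failure")
    return url.encode()


print(download_all(flaky_fetch, ["a", "b"], num_threads=2))  # [b'a', b'b']
```

Each URL fails once and succeeds on the retry, so both results arrive despite the simulated errors.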

  • DebugConfig

    from pixiv_utils.pixiv_crawler import debug_config

    • debug_config.verbose: bool: Whether to print debug information.

    • debug_config.show_error: bool: Whether to print detailed error information.

Create a crawler instance

  • RankingCrawler

    """
    Download artworks from rankings
    
    NOTE: Require cookie for R18 images!
    
    Args:
        capacity (int): flow capacity, default is 1024MB
    """
    app = RankingCrawler(capacity=200)
    app.run()

  • BookmarkCrawler

    """
    Download artworks from public bookmarks
    
    NOTE: Require cookie!
    
    Args:
        n_images (int): max download number, default is 200
        capacity (int): flow capacity, default is 1024MB
    """
    app = BookmarkCrawler(n_images=20, capacity=200)
    app.run()

  • UserCrawler

    """
    Download artworks from a single artist
    
    NOTE: Require cookie for R18 images!
    
    Args:
        artist_id (str): artist id
        capacity (int): flow capacity, default is 1024MB
    """
    app = UserCrawler(artist_id="32548944", capacity=200)
    app.run()

  • KeywordCrawler

    NOTE: Popularity sorting requires a premium account.

    """
    Download search results of a keyword (sorted by popularity if order=True)
    Support advanced search, e.g. "(Lucy OR 边缘行者) AND (5000users OR 10000users)", refer to https://www.pixiv.help/hc/en-us/articles/235646387-I-would-like-to-know-how-to-search-for-content-on-pixiv
    
    NOTE: Require cookie for R18 images!
    NOTE: Require premium account for popularity sorting!
    
    Args:
        keyword (str): search keyword
        order (bool): order by popularity or not, default is False
        mode (str): content mode, default is "safe", support ["safe", "r18", "all"]
        n_images (int): max download number, default is 200
        capacity (int): flow capacity, default is 1024MB
    """
    app = KeywordCrawler(
        keyword="(Lucy OR 边缘行者) AND (5000users OR 10000users)",
        order=False,
        mode="all",  # one of "safe", "r18", "all"
        n_images=20,
        capacity=200,
    )
    app.run()
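
Advanced search strings like the one above are easy to mistype when written by hand. The sketch below assembles them from lists of terms. any_of and all_of are hypothetical helpers, not part of pixiv_utils; the resulting string would be passed as the keyword argument of KeywordCrawler.

```python
from typing import Iterable


def any_of(terms: Iterable[str]) -> str:
    """Join terms into a parenthesised OR group, e.g. (A OR B)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(groups: Iterable[str]) -> str:
    """Join groups with AND so that every group must match."""
    return " AND ".join(groups)


keyword = all_of([any_of(["Lucy", "边缘行者"]), any_of(["5000users", "10000users"])])
print(keyword)  # (Lucy OR 边缘行者) AND (5000users OR 10000users)
```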

Run

Just run your script. 😆

Tips

  • The cookie takes a relatively long time to expire, so it can be reused for a few days.

  • Use displayAllConfig() to display all configurations and check if they are correct.
