Suggestion for Enhancing Project Functionality #8

Open
pyccino opened this issue Nov 24, 2023 · 2 comments

@pyccino

pyccino commented Nov 24, 2023

Hello, I recently came across your project and found it to be quite impressive!

Upon analyzing packets from the Amazon iOS app today, I discovered that it uses an endpoint that returns valuable information such as:

  • Product availability
  • Product price
  • Prime availability
  • Non-discounted product price (if applicable)

What's particularly intriguing is that each request accepts up to 100 ASINs, so far fewer requests are needed, which should make it less prone to bans (fingers crossed). The endpoint is: "https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&asinList=B0BG8F7PCX&vs=1"

While the app employs a few other parameters, I have yet to find any of them particularly interesting.

To request several ASINs, pass them in the "asinList" parameter separated by commas, as demonstrated here: "https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&asinList=B0BG8F7PCX,B0CG7JG7N3&vs=1"

Note that the other parameters ("isDimensionSlotsAjax" and "vs") are required: changing their values results in an empty response (I'm still trying to figure out why).
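
As a minimal sketch of the URL construction described above, the helper below joins ASINs with commas and splits longer lists into batches of 100. The names `build_dimension_urls`, `BASE_URL`, and `MAX_ASINS_PER_REQUEST` are illustrative and not part of the project, and the 100-ASIN limit is only what was observed in the app traffic, not a documented guarantee.

```python
from typing import Iterable, List

# Fixed part of the observed endpoint; "isDimensionSlotsAjax" and "vs" are left unchanged
BASE_URL = "https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&vs=1&asinList="
MAX_ASINS_PER_REQUEST = 100  # observed limit, not an official one


def build_dimension_urls(asins: Iterable[str], base_url: str = BASE_URL) -> List[str]:
    """Return one URL per batch of up to MAX_ASINS_PER_REQUEST ASINs (hypothetical helper)."""
    asin_list = [a.strip() for a in asins if a.strip()]
    urls = []
    for start in range(0, len(asin_list), MAX_ASINS_PER_REQUEST):
        batch = asin_list[start:start + MAX_ASINS_PER_REQUEST]
        # ASINs are passed as a single comma-separated "asinList" value
        urls.append(base_url + ",".join(batch))
    return urls
```

Calling `build_dimension_urls(["B0BG8F7PCX", "B0CG7JG7N3"])` yields a single URL ending in `asinList=B0BG8F7PCX,B0CG7JG7N3`.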

Although the provided endpoint is for Amazon.it, I believe it could potentially work for other Amazon countries as well.

Here is an example output:

{
    "ASIN": "B0BG8F7PCX",
    "Type": "JSON",
    "sortOfferInfo": "",
    "isPrimeEligible": "false",
    "Value": {
        "content": {
            "twisterSlotJson": {"price": "49.49"},
            "twisterSlotDiv": "<span id=\"_price\" class=\"a-color-secondary twister_swatch_price unified-price\"><span class=\"a-size-mini twisterSwatchPrice\"> 49,49 € </span></span>"
        }
    }
}
&&&
{
    "ASIN": "B0CG7JG7N3",
    "Type": "JSON",
    "sortOfferInfo": "",
    "isPrimeEligible": "false",
    "Value": {
        "content": {
            "twisterSlotJson": {"price": "69.99"},
            "twisterSlotDiv": "<span id=\"_price\" class=\"a-color-secondary twister_swatch_price unified-price\"><span class=\"a-size-mini twisterSwatchPrice\"> 69,99 € </span></span>"
        }
    }
}

If the non-discounted price is present, it will be embedded in the HTML inside "content" (the "twisterSlotDiv" value).
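
A rough sketch of how a response like the one above could be parsed: split on the "&&&" separator, decode each chunk as JSON, and keep the ASIN, price, and Prime flag. The function name `parse_dimension_response` is hypothetical. It is also an assumption that eligible items return the string "true" (only "false" appears in the sample) and that "price" may be absent for some items.

```python
import json
from decimal import Decimal
from typing import Dict, List


def parse_dimension_response(body: str) -> List[Dict]:
    """Split an '&&&'-separated response into one record per ASIN (hypothetical helper)."""
    records = []
    for chunk in body.split("&&&"):
        if not chunk.strip():
            continue  # skip empty chunks, e.g. after a trailing separator
        item = json.loads(chunk)
        # Assumption: "price" may be missing for some items, so use .get()
        price = item["Value"]["content"]["twisterSlotJson"].get("price")
        records.append({
            "asin": item["ASIN"],
            # Decimal avoids float rounding on currency values
            "price": Decimal(price) if price else None,
            # The flag arrives as a string; "true" for eligible items is assumed
            "prime": item["isPrimeEligible"] == "true",
        })
    return records


# Trimmed copy of the example output above (the HTML field is omitted)
sample = (
    '{"ASIN": "B0BG8F7PCX", "Type": "JSON", "sortOfferInfo": "", "isPrimeEligible": "false", '
    '"Value": {"content": {"twisterSlotJson": {"price": "49.49"}}}}'
    '&&&'
    '{"ASIN": "B0CG7JG7N3", "Type": "JSON", "sortOfferInfo": "", "isPrimeEligible": "false", '
    '"Value": {"content": {"twisterSlotJson": {"price": "69.99"}}}}'
)
records = parse_dimension_response(sample)
```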

@sushil-rgb
Owner

Hey @pyccino. I appreciate your insights. I've been thinking about implementing price alert functionalities, and it looks like the endpoint you found will be beneficial for that. Thanks once again for your valuable feedback and suggestions.

@pyccino
Author

pyccino commented Dec 1, 2023

I propose the addition of a new method, api_scraping, to leverage an API endpoint for scraping ASIN and price information. This solution aims to enhance the efficiency and reliability of retrieving data compared to other methods.

Code Proposal:

# Module-level imports required by this method
import asyncio
import json

async def api_scraping(self, asin_list):
    # The endpoint expects all ASINs in a single comma-separated "asinList" value
    api_url = self.config["Ascraper"]['api_url'] + ",".join(str(asin) for asin in asin_list)

    for retry in range(self.max_retries):
        try:
            # Download the raw response body; it holds one JSON object per ASIN, separated by '&&&'
            content = await Response(api_url).content()

            # Parse each non-empty chunk as JSON
            items = [json.loads(c) for c in content.split(b'&&&') if c.strip()]

            # Build one dictionary per ASIN with its price and Prime eligibility
            return [
                {
                    'ASIN': item['ASIN'],
                    'Price': item['Value']['content']['twisterSlotJson']['price'],
                    'isPrimeEligible': item['isPrimeEligible'],
                }
                for item in items
            ]
        except ConnectionResetError as se:
            print(f"Connection lost: {str(se)}. Retrying... ({retry + 1} / {self.max_retries})")
            if retry < self.max_retries - 1:
                await asyncio.sleep(5)

    # Every attempt failed: return an empty list instead of an implicit None
    return []

Configuration File:

Ascraper:
    max_retries: 3
    api_url: https://www.amazon.it/gp/twister/dimension?isDimensionSlotsAjax=1&vs=1&asinList=

Implementation Notes:

  • The proposed method utilizes the specified API endpoint for retrieving ASIN and price data.
  • The max_retries configuration parameter determines the maximum number of retry attempts in case of connection issues.
  • The API response is processed using JSON for efficient extraction of relevant information.
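
The retry behaviour described in these notes could also be factored into a small reusable helper. The sketch below uses only the standard library. The name `with_retries` and its parameters are hypothetical, and it catches only `ConnectionResetError`, mirroring the proposal above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    fetch: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    delay: float = 5.0,
) -> Optional[T]:
    """Call fetch() up to max_retries times, sleeping between failed attempts (hypothetical helper)."""
    for attempt in range(1, max_retries + 1):
        try:
            return await fetch()
        except ConnectionResetError as exc:
            print(f"Connection lost: {exc}. Retrying... ({attempt} / {max_retries})")
            if attempt < max_retries:
                await asyncio.sleep(delay)
    return None  # every attempt failed
```

A caller would pass a zero-argument coroutine function, for example `await with_retries(lambda: Response(api_url).content(), max_retries=3)`, where `Response` is the project's own class as used in the proposal.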

Note: While this code may not fully align with the intended use of the API, it provides a simple starting point that can be expanded upon and customized based on future requirements. Further enhancements and adjustments can be made to better align with the desired functionality.

I welcome your feedback on this proposal and am available for further discussions or clarifications.

Thank you.
