Exception: yfinance failed to decrypt Yahoo data response #1407

Closed
robertluisw opened this issue Feb 7, 2023 · 96 comments

Comments

@robertluisw

robertluisw commented Feb 7, 2023

Looks like more encryption issues from yahoo.com

import yfinance as yf
ticker = 'PENN'
stock_info = yf.Ticker(ticker).balance_sheet

Exception: yfinance failed to decrypt Yahoo data response

[ Basically affects everything except price history @ValueRaider ]

Using Python version 3.11.0
yf version 0.2.9

@ValueRaider hijacking top post

[2023-06-23] Update! Latest release fixes financials tables (and removes decryption code).

What is happening? In December 2022 Yahoo began encrypting webpage data, maybe to block scraping. Now Yahoo is regularly changing their encryption key, we think every day (and maybe multiple times a day). Without an automated system to extract the key from their webpage (work in progress), fixing decryption requires a volunteer to manually extract the new key and provide it to developers to upload to yfinance.

Help needed

Need a Javascript dev to write a script that extracts the AES decryption key from the obfuscated JS that Yahoo uses to en/decrypt. The key is there in plaintext, just need to automate extraction. The JS changes every day, so there is limited scope to hardcode (use Git branch hotfix/decryption to print today's JS url). Don't worry about sandboxing etc, end users won't execute this.

Script should be separate from the yfinance codebase. I expect your only interaction with yfinance is testing that the extracted key works by putting it in yfinance/data.py

Useful comments:

Progress updates

2023-06-21

Update your yfinance! Latest release fixes financials tables and removes decryption code.

2023-06-04

Obvious that the decryption won't be fixed. See last message for plan.

2023-03-25

Ticker.info fixed by fetching from API. Financials still broken.

2023-02-17

Yahoo finally started using a new encryption key not in yfinance backup list of keys, so decryption failing. Inevitable. Surprised it took 4 days.

2023-02-13

What is the "backup decryption method"? This is simply yfinance fetching decryption keys from this GitHub project website instead of extracting from Yahoo.com. Was broken in 0.2.9 but fixed in 0.2.10. Today worked for many thanks to a key uploaded yesterday. Discussion continues on a decent system for extracting & sharing decryption key.
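A minimal sketch of the "backup list of keys" idea described above: try each candidate key in turn and keep the first one that decrypts without an exception. The function name decrypt_with_first_working_key and the stand-in fake_decrypt are hypothetical and exist only to make the example runnable; they are not yfinance functions.

```python
from typing import Callable, Iterable, Optional


def decrypt_with_first_working_key(
    decrypt: Callable[[bytes, str], dict],
    encrypted: bytes,
    candidate_keys: Iterable[str],
) -> Optional[dict]:
    """Return the first successful decryption, or None if every key fails."""
    for key in candidate_keys:
        try:
            return decrypt(encrypted, key)
        except Exception:
            # A wrong key normally surfaces as a padding or JSON error,
            # so move on to the next candidate.
            continue
    return None


# Hypothetical stand-in for a real decryption routine (assumption, not yfinance code).
def fake_decrypt(encrypted: bytes, key: str) -> dict:
    if key != "todays-key":
        raise ValueError("Invalid padding bytes.")
    return {"stores": encrypted.decode()}


result = decrypt_with_first_working_key(
    fake_decrypt, b"payload", ["old-key", "todays-key"]
)
print(result)
```

Once Yahoo rotates to a key that is not in the list, every candidate fails and the function returns None, which matches the failure reported on 2023-02-17.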

workaround - yahooquery

Python module yahooquery is a functional alternative to yfinance. Instead of scraping webpages it accesses Yahoo's undocumented API. Not encrypted and faster, but lacks earnings_dates. GitHub Documentation

@ValueRaider
Collaborator

ValueRaider commented Feb 7, 2023

Agreed. I noticed 12 hours ago Yahoo was more sensitive to spam, but only now a total block.

FYI I've just released 0.2.10 which fixes the backup decrypt methods but doesn't help (I hoped it would), so don't feel pressured to upgrade 0.2.9. Unless you want to debug and fix, then definitely upgrade.

@ValueRaider ValueRaider pinned this issue Feb 7, 2023
@ValueRaider ValueRaider changed the title from "yfinance failed to decrypt Yahoo data" to "Exception: yfinance failed to decrypt Yahoo data response" Feb 7, 2023
Repository owner deleted a comment from robertluisw Feb 7, 2023
Repository owner deleted a comment from robertluisw Feb 7, 2023
Repository owner deleted a comment from onesoned Feb 7, 2023
@ValueRaider
Collaborator

ValueRaider commented Feb 7, 2023

If you came to report same issue, just upvote the top comment. Keep this thread clean and constructive.

@ValueRaider
Collaborator

ValueRaider commented Feb 8, 2023

Don't see any obvious change to dict structure - still 10004 extra items just like before. Maybe they've upgraded their obfuscation from simply changing key to changing other encryption parameters.

This is the Javascript we think they use to encrypt: https://s.yimg.com/uc/finance/dd-site/js/main.e0c853d8cea2b75a5208.min.js
Reading compressed Javascript not my expertise, maybe someone can extract the encryption parameters and cross-check against yfinance/data.py::decrypt_cryptojs_aes_stores()

Repository owner deleted a comment from bspallholtz Feb 8, 2023
Repository owner deleted a comment from entzyeung Feb 8, 2023
Repository owner deleted a comment from alimjahagirdar Feb 8, 2023
Repository owner deleted a comment from sanyearng Feb 8, 2023
Repository owner deleted a comment from sanyearng Feb 8, 2023
@jasmohan-narula

Why are the comments being deleted?

@ValueRaider
Collaborator

ValueRaider commented Feb 9, 2023

@jasmohan-narula Because all they essentially say is "I have issue too", contributing nothing. Thread quickly gets messy, some of us want to discuss problem.

@aetmezgu

aetmezgu commented Feb 9, 2023

Hello, I have investigated and noticed a few things while testing with:
python test_yfinance.py (script run in a Python 3.11.2 Docker container)

_get_decryption_keys_from_yahoo_js(self, soup) always returns an empty list of keys for me, and I get the warning:
WARNING: No decryption keys could be extracted from JS file. Falling back to backup decrypt methods.

In _get_decryption_keys_from_yahoo_js in data.py, line 218:
if len(sub_keys) == key_count: => always evaluates to False for me, because key_count == 4 and len(sub_keys) is always 10004. So the code inside the if has never executed since the latest Yahoo changes?

So I tried to make this if pass by replacing the preceding statement:
sub_keys = key_list[ind+1:]

with:
sub_keys = key_list[len(key_list)-4:] => to really take the last 4 keys, as explained in the comment of the first attempt

The method now returns the concatenation of the last 4 keys:

# Gather decryption keys:
soup = BeautifulSoup(response.content, "html.parser")
keys = self._get_decryption_keys_from_yahoo_js(soup)
print(keys)
# => ['2ecbf885a68605aaf0ee8a8b9529fc80c6458ff25278cb981aa69b8103c18471c9219387b538643252eea3e8938c99b078e05ff7589994b974efc3fa8fcf505b']

I guess the code can now try to decrypt the store with the non-empty keys:
stores = decrypt_cryptojs_aes_stores(data, keys)
But I'm still getting the exception:
Exception: yfinance failed to decrypt Yahoo data response

when decrypt_cryptojs_aes_stores(data, keys) is called.

It seems that the keys contained in the plugin object don't work anymore?

I hope this helps. I'll try to go deeper into the code to see what makes the decryption fail.
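For anyone who wants to reproduce the slicing change described in this comment outside yfinance, here is a minimal sketch. The helper name join_last_sub_keys and the filler values are hypothetical; only the idea of taking the last 4 entries of key_list and concatenating them comes from the comment. As the exception above shows, this alone did not produce a working key.

```python
def join_last_sub_keys(key_list, key_count=4):
    """Concatenate the final `key_count` entries of `key_list` into one password."""
    if len(key_list) < key_count:
        raise ValueError(f"expected at least {key_count} keys, got {len(key_list)}")
    # key_list[-key_count:] is equivalent to key_list[len(key_list) - key_count:]
    return "".join(key_list[-key_count:])


# Made-up data with the same shape as reported: 10004 entries in total.
values = [f"filler{i}" for i in range(10000)] + ["aa", "bb", "cc", "dd"]
password = join_last_sub_keys(values)
print(len(values), password)
```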

@ValueRaider
Collaborator

ValueRaider commented Feb 9, 2023

I've created branch hotfix/decryption for people to collab on. You'll still need to Pull Request but I'll merge with minimal review - proper review can happen later. Just make sure your fork is on that branch not main.

@annis-souames

Since the encryption method changed on Yahoo Finance's backend, does this mean that the whole yfinance package is unusable, including previous versions?

@jessysu

jessysu commented Feb 11, 2023

@snowgato Off topic but it works. Just pip upgrade your requests and urllib3. dpguthrie/yahooquery#143

Repository owner deleted a comment from snowgato Feb 11, 2023
Repository owner deleted a comment from dkim777 Feb 11, 2023
@Meborl

Meborl commented Feb 11, 2023

The json loaded from root.App.main always comprises 10004 key/value pairs, but simply joining the last 4 values is no longer working.

The password needed to disentangle "stores" is generated by a javascript function supplied in "main.xxxxxxxxxxxxxxxxxx.modern.js". The version of this file is indicated by the hash "xxxxxxxxxxxxxxxxxx". The javascript code in this file changes with every version and seems to be heavily obfuscated. I got the same version of "main.xxxxxxxxxxxxxxxxxx.modern.js" for all pages I called on the same day, and another version on the next day. All pages delivered with a certain version of "main.xxxxxxxxxxxxxxxxxx.modern.js" are including the same 10004 key/value pairs in root.App.main, but the order of these 10004 key/values is changed with each page call.

I loaded a stock page in a web browser and then opened the developer console (F12). After setting a breakpoint in "main.xxxxxxxxxxxxxxxxxx.modern.js" I could scrape the password from an internal variable. The password is still a concatenation of 4 of the values contained in root.App.main and it is 128 bytes long. After manually copying the password into Python code, I could read the "stores" dict.

The javascript code in "main.xxxxxxxxxxxxxxxxxx.modern.js" is obfuscated. Variable and function names seem to change between versions. The decryption of the JSON string is done in this function call:

return s.context.dispatcher.stores=JSON.parse(function(e,t){return c().decrypt(e,t).toString(...

In this case, the variable "e" holds the entangled content of "stores" and the variable "t" holds the 128-byte password. This password can be used to decrypt the "stores" in all pages delivered with that particular version of "main.xxxxxxxxxxxxxxxxxx.modern.js".

I have no idea how to automate the generation of the password with "main.xxxxxxxxxxxxxxxxxx.modern.js". Maybe someone experienced in javascript will find a solution.

The way Yahoo is wrapping their data is by no means proper encryption. It is just a kind of obfuscation by misusing standard functions from cryptography.
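To make the "password" step concrete: when CryptoJS AES is given a passphrase string, it uses the OpenSSL-compatible format, where the payload starts with the marker "Salted__" followed by an 8-byte salt, and the AES key and IV are derived from passphrase and salt with MD5 (EVP_BytesToKey, one iteration). The sketch below shows only that derivation using the standard library. That Yahoo's "stores" payload and yfinance's decrypt_cryptojs_aes_stores() follow exactly this layout is an assumption here, and the function names and demo payload are made up for illustration.

```python
import hashlib
from base64 import b64decode, b64encode


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int = 32, iv_len: int = 16):
    """OpenSSL-style EVP_BytesToKey with MD5 and a single iteration."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous digest together with password and salt.
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


def split_salted_payload(encoded: str):
    """Split a base64 'Salted__' payload into (salt, ciphertext)."""
    raw = b64decode(encoded)
    if raw[:8] != b"Salted__":
        raise ValueError("payload does not start with the 'Salted__' marker")
    return raw[8:16], raw[16:]


# Made-up payload and password, purely for illustration (not real Yahoo data).
demo_payload = b64encode(b"Salted__" + b"12345678" + b"\x00" * 32).decode()
salt, ciphertext = split_salted_payload(demo_payload)
key, iv = evp_bytes_to_key(b"p" * 128, salt)
print(len(key), len(iv), len(ciphertext))
```

The resulting key and iv are what a call such as Cipher(algorithms.AES(key), modes.CBC(iv)) would take. Only the password changes from one JS version to the next, which fits the observation that a manually scraped password works for every page served with the same JS version.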

@dwmanikandan

dwmanikandan commented Mar 8, 2023

Could anyone please share screenshots or data files of the decrypted store data? I need to know what it actually contains. With that as a reference, I will do browser automation if the data is available in the UI.

@ValueRaider
Collaborator

I need to know what it actually contains. with the reference of that.

You don't, trust me. If decrypt_cryptojs_aes_stores() completes without exception you probably have the correct key.

@dwmanikandan

I need to know what it actually contains. with the reference of that.

You don't, trust me. If decrypt_cryptojs_aes_stores() completes without exception you probably have the correct key.

Do you have any sample store decrypted data?

@ValueRaider
Collaborator

@dwmanikandan No

@ValueRaider
Collaborator

@asafravid Can you move your yahooquery work into a separate Issue/Discussion, keep this Issue focused on decryption.

@asafravid
Collaborator

asafravid commented Mar 8, 2023

@ValueRaider Sure

@qianyun210603
Contributor

@ValueRaider
What's the latest status of this? Calling ticker.info fails for me with the latest version (as well as the main branch).

Has anyone succeeded in getting info using any version?

I found that https://query1.finance.yahoo.com/v7/finance/quote provides part of the original info data that the current fast_info does not provide, and I modified the current main branch in my fork to fit my needs. Happy to create a PR if that may help.
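A rough sketch of how the quote endpoint mentioned above could be used. The symbols query parameter and the quoteResponse / result layout of the JSON body are assumptions about an undocumented API and may change; the helper names and the sample body are hypothetical. No request is sent here, the example only builds the URL and parses a made-up response.

```python
import json
from urllib.parse import urlencode

QUOTE_URL = "https://query1.finance.yahoo.com/v7/finance/quote"


def build_quote_url(symbols):
    """Build the request URL for a list of ticker symbols (assumed 'symbols' parameter)."""
    return f"{QUOTE_URL}?{urlencode({'symbols': ','.join(symbols)})}"


def parse_quote_response(body: str) -> dict:
    """Map each symbol to its quote dict, assuming a quoteResponse/result layout."""
    payload = json.loads(body)
    results = payload.get("quoteResponse", {}).get("result") or []
    return {item["symbol"]: item for item in results if "symbol" in item}


# Made-up response body for illustration only, not real Yahoo data.
sample_body = json.dumps(
    {"quoteResponse": {"result": [{"symbol": "PENN", "shortName": "Example"}], "error": None}}
)
url = build_quote_url(["PENN", "AAPL"])
quotes = parse_quote_response(sample_body)
print(url)
print(quotes["PENN"]["shortName"])
```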

Repository owner deleted a comment from iukea1 Mar 21, 2023
Repository owner deleted a comment from asafravid Mar 21, 2023
@ValueRaider
Collaborator

ValueRaider commented Mar 21, 2023

@qianyun210603 The current status is top post of this thread, there isn't a secret work group. yfinance is community-maintained, all contributions welcome #1084.

@LuluYui

LuluYui commented Mar 23, 2023

Is it possible that the decrypt_cryptojs_aes_stores() function is no longer valid to decrypt the yahoo finance page after the update ?

I tried hardcoding the input, using the last 4 key/value pairs combined into a 128-byte password, and ran the decrypt_cryptojs_aes_stores()._decrypt() function. An error arises from line 133, cipher = Cipher(algorithms.AES(key), modes.CBC(iv)):

Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "<stdin>", line 11, in _decrypt

File "/home/luluyip/miniconda3/envs/revenge/lib/python3.8/site-packages/cryptography/hazmat/primitives/padding.py", line 159, in finalize
result = _byte_unpadding_check(

File "/home/luluyip/miniconda3/envs/revenge/lib/python3.8/site-packages/cryptography/hazmat/primitives/padding.py", line 101, in _byte_unpadding_check
raise ValueError("Invalid padding bytes.")
ValueError: Invalid padding bytes.

This raises the following questions:

  1. Did Yahoo's update to the encryption method change the length of the Initialization Vector, so that decryption with the current 16-byte IV fails?

  2. Does the invalid-padding error come from obfuscation layered on the standard encryption method? What can we tell from the error to help us fix the issue?

  3. How do we pick the correct 4 consecutive key/values out of the 10004 key/value pairs?
    As mentioned in @Meborl's comment, the order of these key/value pairs changes with each page call.

As a beginner, I am here to learn. Please bear with me for my lack of understanding of the repo and lack of knowledge of modern cryptography.
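On what "Invalid padding bytes" can tell us: AES always works on 16-byte blocks, so the IV for CBC mode is 16 bytes regardless of what Yahoo changes. After decryption the last block is checked for PKCS7 padding, and a wrong key produces random-looking bytes that almost never pass that check. So this error most likely points to a wrong key or password rather than a changed IV length. A minimal pure-Python sketch of the padding check (the function name and test data are made up, this is not the cryptography library's implementation):

```python
def pkcs7_unpad(data: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS7 padding, raising ValueError when the padding is malformed."""
    if not data or len(data) % block_size:
        raise ValueError("Invalid padded length.")
    pad_len = data[-1]
    # Valid padding is pad_len copies of the byte value pad_len.
    if pad_len < 1 or pad_len > block_size or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("Invalid padding bytes.")
    return data[:-pad_len]


# Correctly padded plaintext: 11 bytes of data plus 5 padding bytes of value 5.
good = b"hello world" + bytes([5]) * 5
print(pkcs7_unpad(good))

# Stand-in for what decryption with a wrong key yields: bytes with no valid padding.
garbage = bytes(range(16))
try:
    pkcs7_unpad(garbage)
except ValueError as exc:
    print("wrong key symptom:", exc)
```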

@ValueRaider
Collaborator

Is it possible that the decrypt_cryptojs_aes_stores() function is no longer valid to decrypt the yahoo finance page after the update ?

No, it still works, people have manually extracted key from the Javascript and successfully decrypted. Read the top post, and maybe the thread.

@cmjordan42

To the next person who asks when will this be fixed?

I assume that this will never get fixed. It doesn't seem to be the direction that the broader community wants to go, and it makes sense.

I finally had the time to go rewrite all of my stuff and took an opportunity to re-architect everything. yahooquery was trivially easy to use - I would say simpler than yfinance - and leaps and bounds faster for data that would have come from info in yfinance.

I had implemented my own caching layer on top of yfinance which I kept for yahooquery (this was mostly to minimize any unneeded requests to yahoo) and ported that over without issue. For anyone concerned about speed because lack of caching... assuming you know how to build a cache, this is only like 50 lines of code in python to create a basic symbol + data segment cache.
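A minimal sketch of what such a symbol + data segment cache could look like, assuming a simple time-to-live policy. The class name SymbolSegmentCache, the fetch callback, and fake_fetch are hypothetical and not taken from the cache described above; in real use fetch would wrap a yahooquery or yfinance call.

```python
import time
from typing import Any, Callable, Dict, Tuple


class SymbolSegmentCache:
    """Cache values per (symbol, segment) pair and refetch after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, symbol: str, segment: str, fetch: Callable[[str, str], Any]) -> Any:
        key = (symbol.upper(), segment)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, no request needed
        value = fetch(symbol, segment)
        self._entries[key] = (now, value)
        return value


# Hypothetical fetcher that records calls instead of contacting Yahoo.
calls = []

def fake_fetch(symbol: str, segment: str) -> dict:
    calls.append((symbol, segment))
    return {"symbol": symbol, "segment": segment}


cache = SymbolSegmentCache(ttl_seconds=60)
first = cache.get("PENN", "balance_sheet", fake_fetch)
second = cache.get("penn", "balance_sheet", fake_fetch)  # served from cache
print(len(calls))
```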

All in all, it took me on the order of 10 hours to reimplement the code that deals with sourcing data from yahoo. Thanks for all of the past work building and maintaining yfinance, @ValueRaider et al.

@JotaSe

JotaSe commented Mar 26, 2023

To the next person who asks when will this be fixed?

I assume that this will never get fixed. It doesn't seem to be the direction that the broader community wants to go, and it makes sense.

I finally had the time to go rewrite all of my stuff and took an opportunity to re-architect everything. yahooquery was trivially easy to use - I would say simpler than yfinance - and leaps and bounds faster for data that would have come from info in yfinance.

I had implemented my own caching layer on top of yfinance which I kept for yahooquery (this was mostly to minimize any unneeded requests to yahoo) and ported that over without issue. For anyone concerned about speed because lack of caching... assuming you know how to build a cache, this is only like 50 lines of code in python to create a basic symbol + data segment cache.

All in all, it took me on the order of 10 hours to reimplement the code that deals with sourcing data from yahoo. Thanks for all of the past work building and maintaining yfinance, @ValueRaider et al.

Great! Would you share it?

@ValueRaider
Collaborator

It doesn't seem to be the direction that the broader community wants to go, and it makes sense.

I have no problem with community wanting to go in a different direction, whether that's moving yfinance to Yahoo's API or simply jumping ship to yahooquery - this is your project not mine. My only enforcement here is keeping this specific Issue focused on the decryption - any significant API discussion should occur in a separate Issue / Discussion, otherwise this thread gets messy.

@AhmedThahir

Any updates?

@xiaozhe76

xiaozhe76 commented Jun 4, 2023

I have the following problem when using yfinance. The code is:

import yfinance as yf
msft = yf.Ticker("aapl")
print(msft.info)

Output is:

- AAPL: No summary info found, symbol may be delisted

None

my yfinance version is 0.2.10
python : 3.7

print(yf.__version__)
0.2.10

I don't know what is happening. Can anyone help solve this issue? Thanks.

@ValueRaider
Collaborator

ValueRaider commented Jun 4, 2023

I don't see the decryption being fixed - too difficult, and probably Yahoo don't want us to fix scraping.

So what next? @ranaroussi's preference is replacing the scraping with API requests like yahooquery does #1420 - if anyone is interested in helping implement drop a message in #1546.

@ValueRaider
Collaborator

Closing this as decryption will never be fixed.

Financials tables now fully ported to use API - was already 95% done just had to stop scraping the keys, 5 minute fix.

@ValueRaider ValueRaider closed this as not planned Jun 23, 2023
@ValueRaider ValueRaider unpinned this issue Jul 13, 2023