HTTP Error 401 when called on large amount of tickers #360

Closed
rpfreitasxyz opened this issue Apr 30, 2022 · 15 comments

Comments

@rpfreitasxyz

rpfreitasxyz commented Apr 30, 2022

Hi, thank you very much for the fix and comments here. getSymbols.yahoo() now works for me, but I have a different problem. When I run getSymbols.yahoo() in a loop for more than about 300-400 tickers, I start to get "HTTP error 401" for all subsequent downloads. There are some failed downloads of invalid tickers in between, though. Does anyone know what the issue is? Maybe I have the same problem as @rhamo.

Here is an example of the subsequent download:

> getSymbols.yahoo("AAPL",from="2022-04-27",to="2022-04-29",auto.assign=FALSE)
Warning: AAPL download failed; trying again.
Warning: Unable to import "AAPL".
AAPL download failed after two attempts. Error message:
HTTP error 401.
[1] "Error in open.connection(file, \"rt\") : HTTP error 401.\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in open.connection(file, "rt"): HTTP error 401.>

Thank you!

Originally posted by @edwinhung in #358 (comment)

Thank you all for the quick fix in regards to tq_get()! However, I believe that this new issue has arisen.

@Knxd3

Knxd3 commented Apr 30, 2022

Is a fix planned for tq_get? getSymbols works with many tickers because of the 1 second pause. Thanks.
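
The pause mentioned above can be reproduced around any downloader. The following is a minimal sketch, not quantmod's or tidyquant's actual implementation: `fetch_with_pause()`, `fetch_fun`, and `pause_seconds` are hypothetical names, and `fetch_fun` stands in for a call such as `function(s) quantmod::getSymbols(s, auto.assign = FALSE)`.

```r
# Hypothetical helper: download symbols one at a time, pausing between requests.
# `fetch_fun` is any function that takes one symbol and returns its data.
fetch_with_pause <- function(symbols, fetch_fun, pause_seconds = 1) {
  results <- vector("list", length(symbols))
  names(results) <- symbols
  for (i in seq_along(symbols)) {
    sym <- symbols[[i]]
    # Store the error object instead of stopping, so one bad ticker
    # (for example an invalid symbol) does not abort the whole loop.
    value <- tryCatch(fetch_fun(sym), error = function(e) e)
    results[i] <- list(value)
    # Pause between requests, but not after the last one.
    if (i < length(symbols)) Sys.sleep(pause_seconds)
  }
  results
}
```

Failed downloads can be picked out afterwards with `Filter(function(x) inherits(x, "error"), results)`. Whether a one-second pause is enough to stay under Yahoo's limit is not established in this thread.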

@msperlin

msperlin commented May 1, 2022

The limits do not seem to be too restrictive. After hitting the 401 error, I was able to make successful API calls again within a few minutes.

After that, I downloaded all sp500 stocks (2010-today) in a single call:
df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

If anyone can confirm this in your own R session, please do.

It looks as if the restrictions are based on the time between API calls from the same IP. This could rule out any parallel computation, which is what I'm testing now.

@msperlin

msperlin commented May 1, 2022

As expected, any parallel use of quantmod::getSymbols() reaches the limit very easily. As such, I'm removing the parallel option from yfR and BatchGetSymbols.

When using a single session (non-parallel), yfR runs fine for any large sample of stocks.

@msperlin

msperlin commented May 1, 2022

If you can, please confirm if the code below runs fine:

remotes::install_github("msperlin/yfR")

df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

@rpfreitasxyz
Author

If you can, please confirm if the code below runs fine:

remotes::install_github("msperlin/yfR")

df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

The code runs fine, with a caveat:

The issue really seems to be the number of API calls. Even though they're not in parallel, they are still 500+ sequential calls to the Yahoo! Finance API, so whether the whole data frame gets downloaded is quite inconsistent. I believe this has already been worked around with your implementation of a cache system.

Thus, if it hasn't been downloaded completely at once, I suggest users wait for a few minutes, then run the code again, until completion.
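
The "wait, then run again until completion" advice can be automated. This is a rough sketch under stated assumptions, not yfR's cache system: `fetch_until_complete()`, `fetch_fun`, `wait_seconds`, and `max_passes` are hypothetical names, and the five-minute default wait is a guess based on the "few minutes" mentioned in this thread.

```r
# Hypothetical helper: repeat passes over the tickers that are still missing,
# waiting between passes, until all succeed or `max_passes` is reached.
fetch_until_complete <- function(symbols, fetch_fun,
                                 wait_seconds = 300, max_passes = 5) {
  done <- list()       # successful downloads, kept across passes
  pending <- symbols   # tickers not yet downloaded
  for (pass in seq_len(max_passes)) {
    for (sym in pending) {
      # Treat any error (rate limit, invalid ticker) as "try again later".
      res <- tryCatch(fetch_fun(sym), error = function(e) NULL)
      if (!is.null(res)) done[[sym]] <- res
    }
    pending <- setdiff(symbols, names(done))
    if (length(pending) == 0L) break
    # Wait before the next pass, but not after the final one.
    if (pass < max_passes) Sys.sleep(wait_seconds)
  }
  list(data = done, failed = pending)
}
```

Tickers that never succeed, such as invalid symbols, end up in `failed` rather than being retried forever.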

Either way, thank you, @msperlin!

@msperlin

msperlin commented May 2, 2022

Yes, the lack of consistency across equivalent calls to BatchGetSymbols/yf_get can be troublesome. I'll see how I can control for this, at least by letting the user know about the 401 error.

joshuaulrich added a commit that referenced this issue May 22, 2022
There seems to be a rate limit for the number of tickers you can
request via the CSV endpoint. The yfinance python library [1] uses the
JSON endpoint and doesn't seem to have rate limit issues.

[1] https://github.com/ranaroussi/yfinance

Closes #362. See #360.

@joshuaulrich
Owner

joshuaulrich commented May 22, 2022

I just added an option to use the JSON endpoint instead of the CSV endpoint. Can you try that and see if you still get the 401 responses? You can install the patch via: remotes::install_github("joshuaulrich/quantmod@362-yahoo-json-endpoint"). Then call quantmod::getSymbols("SPY", use.json.api = TRUE)

@msperlin

Sure, let me try..

@msperlin

I just added an option to use the JSON endpoint instead of the CSV endpoint. Can you try that and see if you still get the 401 responses? You can install the patch via: remotes::install_github("joshuaulrich/quantmod@362-yahoo-json-endpoint"). Then call quantmod::getSymbols("SPY", use.json.api = TRUE)

I changed the call to getSymbols and tried my best to trigger the 401 response, with no success. I ran yfR with parallel execution (14 cores) and it worked as expected.

@msperlin

looks good..

@msperlin

msperlin commented May 22, 2022

anyone can test it here:

remotes::install_github("msperlin/yfR@testing-json-entry")

library(yfR)

n_workers <- parallel::detectCores() - 1
future::plan(future::multisession, workers = n_workers)
available_collections <- yf_get_available_collections()

df <- yf_collection_get(collection = "SP500",
                          first_date = Sys.Date() - 10*365,
                          last_date = Sys.Date(),
                          do_parallel = TRUE)

dplyr::n_distinct(df$ticker)

@joshuaulrich please let me know if and when you're incorporating these changes.. I'll wait for your update in CRAN.

thanks.

@joshuaulrich
Owner

I'm considering whether or not to make the JSON endpoint the default for getSymbols.yahoo(). It seems like it's a better endpoint, but I don't have any experience with it. @msperlin, what do you think about switching from the CSV endpoint to the JSON endpoint for the default?

@msperlin

msperlin commented May 22, 2022

To be honest, I'm not sure. I'll have to think about that. Quality-wise, I suspect the YF data comes from the same source and, whether it is JSON or CSV, the output should be the same.

But the CSV endpoint is restricted by IP, which forces users to behave better, which is good. The restriction is also not that bad (I can still download everything I need for my classes, for example). While I would prefer to allow parallel computing with yfR, I also know that we should be thankful to YF for still keeping the API open...

what do you think?

@ethanbsmith
Contributor

some thoughts:

since you've done the bulk of the work of moving to the v8 API, you may as well loosen the validation on period to support intra-day and kill #351

also, why not just remove the v7 code path entirely? in principle, i think supporting code to work around throttling (if that's what yahoo is doing) is not really a worthwhile battle. moving to a higher rev makes sense to me, but if yahoo is really throttling and serious about it, it's going to come up again sooner or later. i'd just see this as another notch in the growing list of issues w/ yahoo data in general

@joshuaulrich
Owner

joshuaulrich commented May 23, 2022

I suspect the YF data comes from the same source and, whether it is json or csv, the output should be the same.

I would hope so, but I wouldn't be surprised if there are some differences... because data is awful. ;)

Also, why not just remove the v7 code path entirely.

I'm thinking the same thing. Thanks for mentioning the intra-day issue. That's a great point.

joshuaulrich added a commit that referenced this issue May 29, 2022
The v7 endpoint seems to be rate-limited, and the v8 endpoint includes
intra-day data.

See #360. See #362.

joshuaulrich added this to the Release 0.4.21 milestone Mar 27, 2023

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 12, 2023
### Changes in 0.4.22 (2023-04-05)

1. Move jsonlite from Suggests to Imports so it doesn't cause a problem
    when a package that doesn't also Suggest jsonlite uses getSymbols().
    Thanks to Kurt Hornik for the report and fix!
    [#380](joshuaulrich/quantmod#380)

### Changes in 0.4.21 (2023-03-29)

1. Fix S3 method issues. R-devel (83995-ish) added a check for possible S3
    method issues. Register methods it found that were not registered:
    `str.replot()`, `seriesHi.timeSeries()`, and `seriesLo.timeSeries()`.

    It was also confused by `range.bars()` and `unique.formula.names()`. Remove
    `unique.formula.names()` because it wasn't exported or used internally.
    Rename `range.bars()` to `rangeBars()`, which isn't exported.

    Thanks to Kurt Hornik for the report!
    [#375](joshuaulrich/quantmod#375)

1. Remove "^" prefix from `getSymbols()` return value. When the 'Symbols'
    argument has a "^" prefix and `auto.assign = TRUE`:

    * `getSymbols()` removes the "^" from the object it creates, but
    * returns the 'Symbols' argument unchanged, and
    * removes the "^" from the column names of the object it creates.

    The example below will create an object named `IXIC` but the value of
    `sym` will be "^IXIC".

        sym <- getSymbols("^IXIC")

    That means `x <- get(sym)` will not work because an object named `^IXIC`
    doesn't exist.
    [#371](joshuaulrich/quantmod#371)

1. Add 'from' and 'to' arguments to `getSymbols.FRED()`. Users expect to be
    able to set the 'from' and 'to' arguments for FRED data like they can for
    Yahoo data. Those values were ignored and the entire series was always
    returned.
    [#368](joshuaulrich/quantmod#368)

1. Change interval to 1d for `getDividends()` and `getSplits()`. The "3mo"
    setting caused some dividends to be missing for companies that issued monthly
    dividends. Note that the response to this request also includes all the OHLCV
    data. But it's small (less than 1MB for 60+ years of daily data).
    [#372](joshuaulrich/quantmod#372)

1. Handle errors in `getSplits()` and `getDividends()`. `getDividends()` didn't
    handle cases where the download failed, or when dividends needed to be
    split-adjusted but there were no splits. It also tried to set colnames
    on the empty xts object that's returned when there are no dividends.
    `getSplits()` had the same colnames issue. Check for no splits by testing
    for `NULL` because that's more explicit. Thanks to Chris Cheung for the
    report!
    [#366](joshuaulrich/quantmod#366)

1. Export `HL()`, `is.HL()`, and `has.HL()` functions and add documentation.
    These were added in 0.4.18 but not exported or included in the documentation.

1. Use Yahoo Finance v8 JSON endpoint and remove the v7 CSV endpoint. There
    seems to be a rate limit for the number of tickers you can request via the CSV
    endpoint. The [yfinance python library](https://github.com/ranaroussi/yfinance)
    uses the JSON endpoint and doesn't seem to have rate limit issues.
    [#360](joshuaulrich/quantmod#360)
    [#362](joshuaulrich/quantmod#362)
    [#364](joshuaulrich/quantmod#364)
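
For quantmod versions before 0.4.21, where `getSymbols("^IXIC")` returned the symbol with its "^" prefix intact (the `getSymbols()` return value item in the changelog above), one possible workaround is to strip the prefix before looking up the object. A minimal sketch in base R; `object_name` is an illustrative name, and no download is performed here:

```r
# Value that getSymbols("^IXIC") returned before the fix in 0.4.21.
sym <- "^IXIC"

# Strip a single leading "^" to recover the name of the object that was created.
# The first "^" in the pattern anchors to the start; "\\^" matches a literal caret.
object_name <- sub("^\\^", "", sym)

# With the data loaded, get(object_name) would then find the IXIC object.
print(object_name)
```

Symbols without a "^" prefix pass through unchanged, so the same line can be applied to every return value.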