
Does anybody else have this problem when using lncrawl in 69xinshu #2269

Closed
itsyahma opened this issue Feb 13, 2024 · 12 comments

Comments

@itsyahma

Let us know

Novel URL: https://www.69xinshu.com/book/9969673.htm
App Location: PIP | EXE | Discord | Telegram
App Version: x.y.z

Describe this issue

@itsyahma
Author

(screenshot attached: image_2024-02-13_201534450)

@itsyahma
Author

this is what appears

@camp00000

I fixed 69xinshu in #2256, which is currently not released yet. That branch fixes the first error; about the 403 you got, I'm not sure.
You can try running lncrawl with the --auto-proxy arg to see if that helps with the 403, but downloading won't work unless you use the dev branch or wait for the next release.
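
For reference, a minimal sketch of that invocation, using the URL from this issue (the -s and --auto-proxy flags also appear in the logs further down this thread):

$ lncrawl -s https://www.69xinshu.com/book/9969673.htm --auto-proxy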

@ncuxie

ncuxie commented Feb 13, 2024


I used the dev branch to download from this website and found a problem when downloading more than 650 chapters at once.
(screenshots attached)
The website denies access after that. It seems the anti-crawler protection has been upgraded. (:з」∠)

@camp00000

camp00000 commented Feb 13, 2024

Darn, that's not too good. I'll see if there's anything that can be done there.
I suppose for now you can try downloading in batches: you can select by chapter range, so enter 1-500, then 501-1000, and so on, which will probably bypass this if it's just a simple check (a sketch below).

And maybe combine that with --auto-proxy to get new source IPs for each download batch.

Let me know if that works. @ncuxie
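
A sketch of that batch approach, assuming chapter ranges can be selected non-interactively via a --range flag (check lncrawl --help for the exact option name in your version; otherwise just enter the same ranges at the interactive prompt):

$ lncrawl -s https://www.69xinshu.com/book/9969673.htm --range 1 500 --auto-proxy
$ lncrawl -s https://www.69xinshu.com/book/9969673.htm --range 501 1000 --auto-proxy
(and so on for the remaining chapters)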

@ncuxie

ncuxie commented Feb 15, 2024

(screenshot attached)
I found that it started getting errors from chapter 251, so I tried downloading only chapters 1-250 and didn't encounter any problems. However, when I tried to download a second time:

(screenshot attached)
it couldn't even fetch the chapter list, so I opened the website in a browser:

(screenshot attached)
This looks a bit troublesome. (:з」∠)

If the verification isn't completed within a certain period of time, the website becomes inaccessible.

But when reading novels normally you won't hit the verification even after reading more than 250 chapters in a day, so I guess the downloads may be too frequent. 🤔

I will try --auto-proxy later.

@camp00000

@camp00000

There's rate-limiting that can be done on the downloader side, but no way to enforce downloading only X chapters per day.

My hopes are currently on the --auto-proxy approach; IP reputation may or may not break that, but we'll see, I guess.

To note: if I understood correctly, auto-proxy makes the crawler cycle through proxies while downloading, so it may be possible to download an entire novel with lots of chapters at once with the auto-proxy option, given that this is actually what it does and the IPs aren't all/mostly banned already.

Let me know how it goes.

@ncuxie

ncuxie commented Feb 27, 2024

I can get chapters without --auto-proxy but not with --auto-proxy.

$ lncrawl -s https://www.69xinshu.com/book/40107.htm

===================================================

                 [#] Lightnovel Crawler v3.4.2
         https://github.com/dipu-bd/lightnovel-crawler

---------------------------------------------------------------------------------------

-> Press Ctrl + C to exit

Retrieving novel info...

📒 从时间停止开始纵横诸天
14 volumes and 1357 chapters found.
🔗 https://www.69xinshu.com/book/40107.htm

? Enter output directory: C:\Users\XIE\Lightnovels\www-69xinshu-com\Cong Shi Jian Ting Zhi Kai Shi Zong Heng Zhu Tian


$ lncrawl -s https://www.69xinshu.com/book/40107.htm --auto-proxy

===================================================

                 [#] Lightnovel Crawler v3.4.2
         https://github.com/dipu-bd/lightnovel-crawler

---------------------------------------------------------------------------------------
Sources: 100%|█████████████████████| 24/24 [00:03<00:00, 6.20file/s]

-> Press Ctrl + C to exit

Retrieving novel info...
Exception in thread Thread-4:
Traceback (most recent call last):
  File "D:\anaconda3\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "D:\anaconda3\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\XIE\.lncrawl\sources\zh\69shuba.py", line 70, in read_novel_info
    soup = self.get_soup(self.novel_url, encoding="gbk")
  File "D:\anaconda3\lib\site-packages\lncrawl\core\scraper.py", line 304, in get_soup
    response = self.get_response(url, **kwargs)
  File "D:\anaconda3\lib\site-packages\lncrawl\core\scraper.py", line 201, in get_response
    return self.__process_request(
  File "D:\anaconda3\lib\site-packages\lncrawl\core\scraper.py", line 130, in __process_request
    raise e
  File "D:\anaconda3\lib\site-packages\lncrawl\core\scraper.py", line 123, in __process_request
    response.raise_for_status()
  File "D:\anaconda3\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://www.69xinshu.com/book/40107.htm

! Error: No chapters found
<class 'Exception'>
File "D:\anaconda3\lib\site-packages\lncrawl\bots\console\integration.py", line 107, in start
raise e
File "D:\anaconda3\lib\site-packages\lncrawl\bots\console\integration.py", line 101, in start
_download_novel()
File "D:\anaconda3\lib\site-packages\lncrawl\bots\console\integration.py", line 85, in _download_novel
self.app.get_novel_info()
File "D:\anaconda3\lib\site-packages\lncrawl\core\app.py", line 137, in get_novel_info
raise Exception("No chapters found")

----------------------------------------------------------------------
- https://github.com/dipu-bd/lightnovel-crawler/issues
======================================================================

@camp00000

It looks like some of the proxies are likely already on a blacklist or have a very bad IP reputation.

So the other somewhat simple way forward would be to find working proxies for 69xinshu and test them. Once you have a few suitable ones, you can put them in a custom proxies file and use it as described in the lncrawl help section:

--proxy-file FILE    Proxies as SCHEME://HOST:PORT@USER:PASSWORD format in each line. All except HOST are optional

to hopefully download everything at once (a sketch below).
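
A sketch of such a proxies file, with placeholder addresses (203.0.113.x is a reserved documentation range, not real proxies; per the help text above, everything except HOST is optional):

http://203.0.113.10:8080
socks5://203.0.113.11:1080
203.0.113.12:3128

$ lncrawl -s https://www.69xinshu.com/book/9969673.htm --proxy-file proxies.txt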

Otherwise you can slowly download part by part with your own IP; that might work given enough time, selecting at most a few hundred chapters per day. I suggest this route if you're fine with waiting a bit and downloading in parts. The EPUBs can always be concatenated into one big file with some tool later if you prefer it that way.

To make --auto-proxy viable as-is for this source, I think the whole proxy handling would need to be reworked to treat certain status codes (like the 401 Unauthorized above) as potential proxy issues instead of server/request issues. So that's not very feasible.

@wizerdo37

This site hosting the raws does not rate-limit downloads: https://www.ddxsss.com/

@camp00000

I checked, and lncrawl doesn't currently support this source, but if it indeed has no rate-limiting like 69xinshu's, it would be a viable alternative. The site structure looks relatively similar as well, so adding it shouldn't be too big of an issue.

I even found a novel with the same title as in the logs above, https://www.ddxsss.com/book/46000/ so the catalogs seem to overlap in that part as well.

If someone wants to create an issue to add this source I'll look into doing that later this week.

@camp00000

I actually went ahead and added the crawler already; it's currently a pull request, so once it's merged into dev you can test it by installing the newest dev version locally. #2287 (One way to do that is sketched below.)
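
A sketch of installing the dev branch, assuming a pip-based install (adjust for your environment; "dev" is the branch name mentioned above):

$ pip install -U git+https://github.com/dipu-bd/lightnovel-crawler.git@dev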

I was able to download 1.3k chapters at once without any significant issues. The chapters reported with HTTP 503 did have their content available in the output, so in those instances a request seems to have failed once out of the few retries each chapter gets, but there was no blocking from Cloudflare, captchas, or the like.

Retrieving novel info...

📒 从时间停止开始纵横诸天
14 volumes and 1357 chapters found.
🔗 https://www.ddxsss.com/book/46000

? Enter output directory: /home/.../lightnovel-crawler/Lightnovels/www-ddxsss-com/Cong Shi Jian Ting Zhi Kai Shi Zong Heng Zhu Tian
? Which chapters to download? Everything! (1357 chapters)
? 1357 chapters selected Continue
? Which output formats to create? [epub]
? How many files to generate? Pack everything into a single file
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/148.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/433.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/451.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/457.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/927.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/1135.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/1150.html
HTTPError: 503 Server Error: Service Unavailable for url: https://www.ddxsss.com/book/46000/1225.html
Chapters: 100%|███████████████| 1357/1357 [00:42<00:00, 32.19item/s]
  Images: 100%|███████████████| 1/1 [00:00<00:00,  8.48item/s]
Created: Cong Shi Jian Ting Zhi Kai Shi Zong Heng Zhu Tian c1-1357.epub
✨ Task completed  

@dipu-bd dipu-bd closed this as completed Feb 29, 2024