-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Popularity community #6950
Conversation
732bbd6
to
928d970
Compare
1f89155
to
7605843
Compare
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important and good PR! Requires some additional fixes. In my opinion it should be merged to the main
branch and not to the release
branch, as it is a new feature that can affect the release stability.
total_torrents = self.mds.TorrentState.select().count() | ||
random_index = self._select_randomly_likely_popular_torrent_index(total_torrents) | ||
|
||
random_popular_torrent = self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time)\ | ||
.order_by(lambda g: (desc(g.seeders), g.last_check))\ | ||
.fetch(limit=1, offset=random_index)[:] | ||
|
||
old_torrents = list(self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time). | ||
order_by(lambda g: (g.last_check, desc(g.seeders))).limit(TORRENT_SELECTION_POOL_SIZE)) | ||
random_old_torrent = self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time)\ | ||
.order_by(lambda g: (g.last_check, desc(g.seeders)))\ | ||
.fetch(limit=1, offset=random_index)[:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is not a good idea to use normally distributed offsets when selecting random torrents from the database for the check.
Let's consider a real-life situation when a user subscribed to several popular channels and has 3.5 million torrents in the database. In this case, the average offset will be approximately 1 million. So, to select a random torrent from the database, the SQL engine should sort torrents by last_check
and seeders
columns and then skip the first one million torrents to select a random one for the check. On a decent machine with an SSD drive, it took around 2 seconds just to select a single random torrent. During that time, the event loop is blocked, and it may be much slower on a machine with a traditional HDD drive.
It is not as necessary to select random torrents in a perfectly normalized way. Instead, we can select N completely random torrents from the database and choose the most popular from these torrents. It should be much faster. On the same machine, the query self.mds.TorrentState.select_random(100)
takes four milliseconds for execution, much better than 2 seconds for the original query.
So, I suggest doing the following instead:
- Select N random torrents
- From these torrents, in Python, filter torrents with
torrent.last_check < last_fresh_time
(most torrents from the database should satisfy this criteria) - Order the resulting list by the number of seeders, and choose the most popular one.
Something like that (not tested):
random_torrents = self.mds.TorrentState.select_random(100)
old_torrents = [t for t in random_torrents if t.last_check < last_fresh_time]
result = []
if old_torrents:
random_old_torrent = old_torrents[0]
result.append(random_old_torrent)
if len(old_torrents) > 1:
popular_random_torrents = sorted(old_torrents[1:], key = lambda torrent: torrrent.seeders, reverse=True)
most_popular_random_torrent = popular_random_torrents[0]
result.append(most_popular_random_torrent)
return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the opinion and the suggestion. I think this needs more in-depth thinking and thus can be its own separate PR that only deals with the changes to the torrent checker and not with the Popularity community. Therefore, I'll remove these changes from PR.
src/tribler/core/components/popularity/community/popularity_community.py
Outdated
Show resolved
Hide resolved
7605843
to
b6289b0
Compare
4c79e26
to
8b46066
Compare
Seems src/tribler/gui/tests/test_gui.py::test_add_deeptorrent Windows fatal exception: stack overflow
Thread 0x000020ec (most recent call first):
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 306 in wait
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 558 in wait
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 1252 in run
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932 in _bootstrap_inner
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 890 in _bootstrap
Current thread 0x00001bf8 (most recent call first):
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 253 in <lambda>
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 72 in subtree
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 252 in get_selected_items
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 225 in fill_entries
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\dialogs\startdownloaddialog.py", line 198 in on_received_metainfo
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\utilities.py", line 411 in trackback_wrapper
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tribler_request_manager.py", line 228 in on_finished
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tribler_request_manager.py", line 122 in <lambda>
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\utilities.py", line 411 in trackback_wrapper
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tests\test_gui.py", line 157 in wait_for_list_populated
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tests\test_gui.py", line 442 in test_add_deeptorrent
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\python.py", line 192 in pytest_pyfunc_call
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\python.py", line 1761 in runtest
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 166 in pytest_runtest_call
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 259 in <lambda>
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 338 in from_call
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 258 in call_runtest_hook
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 219 in call_and_report
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 130 in runtestprotocol
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 111 in pytest_runtest_protocol
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 347 in pytest_runtestloop
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 322 in _main
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 268 in wrap_session
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 315 in pytest_cmdline_main
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\config\__init__.py", line 164 in main
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\config\__init__.py", line 187 in console_main
File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\Scripts\pytest.exe\__main__.py", line 7 in <module>
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87 in _run_code
File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194 in _run_module_as_main |
retest this please |
1 similar comment
retest this please |
#7052 should fix the issue |
Fix normal distributed positive int utility function
8b46066
to
03bb3c8
Compare
retest this please |
1 similar comment
retest this please |
Add missing super call in introduction request callback in Popularity community
03bb3c8
to
dfa1901
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that sometimes get_normally_distributed_positive_integers
function can be really slow. Like, executing get_normally_distributed_positive_integers(9999, 10000)
requires more than one minute on my PC.
However, with a low value of size argument, like GOSSIP_RANDOM_TORRENT_COUNT
which is 10, it is not a problem.
Major changes
3. Torrents selection for random check also uses random normal variate to find random torrent.