Match on module aliases for auto import suggestions #730

MrBago · 2023-11-07T01:47:57Z

Description

This PR adds a table of aliases for AutoImport to use. It joins the alias with the names table to find available modules with matching aliases.

This PR is in the initial draft to verify the approach and get feedback. It is still missing:

A way to specify a list of import aliases in a config file
Documentation updates

Fixes #712

Checklist (delete if not relevant):

I have added tests that prove my fix is effective or that my feature works
I have updated CHANGELOG.md
I have made corresponding changes to user documentation for new features

tkrabel-db

Overall, I like the direction! I have some minor concerns regarding performance.

tkrabel-db · 2023-11-07T18:38:53Z

rope/contrib/autoimport/models.py

+        connection.execute("CREATE INDEX IF NOT EXISTS alias ON aliases(alias)")
+
+    modules = Query(
+        "(SELECT DISTINCT aliases.*, package, source, type FROM aliases INNER JOIN names on aliases.module = names.module)",


Not a DB expert, but the names table can comprise 10,000 to 100,000 rows, so I am wondering whether we should run this inner join on every autoimport request (which can happen with every keystroke when rope is run inside a language server).
Can we quickly test how much adding alias support slows down search?
Alternatively, I'd make sure aliases only contains aliases for modules that exist in names.

The joins are pretty fast, I made a notebook to test it out.

The join time should be dominated by the Alias table, not the Names table, because the Names table has an index on the module column. Also, here we're including a where clause which makes the left side of the join even smaller. Most DB engines are pretty good about pushing down the filter past the join and sqlite3 seems to handle it well.
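For readers who want to poke at the join themselves, here is a minimal sketch using the standard `sqlite3` module. The table layouts, the `names_module` index name, the sample rows, and the `search_alias` helper are all simplified assumptions made for illustration; they are not rope's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schemas; rope's real tables may differ.
conn.execute(
    "CREATE TABLE names (name TEXT, module TEXT, package TEXT, source INTEGER, type INTEGER)"
)
conn.execute("CREATE TABLE aliases (alias TEXT, module TEXT)")
conn.execute("CREATE INDEX names_module ON names(module)")
conn.execute("CREATE INDEX IF NOT EXISTS alias ON aliases(alias)")

conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?, ?)",
    [
        ("array", "numpy", "numpy", 4, 7),
        ("zeros", "numpy", "numpy", 4, 3),
        ("ABC", "abc", "abc", 5, 7),
    ],
)
conn.executemany(
    "INSERT INTO aliases VALUES (?, ?)",
    [
        ("np", "numpy"),
        ("pd", "pandas"),  # pandas is deliberately absent from names
    ],
)


def search_alias(conn, prefix):
    """Return alias rows whose module is present in the names table."""
    query = (
        "SELECT DISTINCT aliases.alias, aliases.module, package, source, type "
        "FROM aliases INNER JOIN names ON aliases.module = names.module "
        "WHERE aliases.alias LIKE ?"
    )
    return conn.execute(query, (prefix + "%",)).fetchall()


np_matches = search_alias(conn, "np")  # numpy is installed, so this matches
pd_matches = search_alias(conn, "pd")  # pandas is not in names, so no rows
```

Because the join is an inner join, an alias only shows up when its module exists in `names`, which is the availability check the PR relies on.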

I thought about this a little bit before testing out this implementation. I see 3 main paths forward:

The join approach

Materialize the availability information in the Aliases table as a column. We'd need to be careful to always update the Aliases table whenever updating the cache. This would probably be the fastest approach, but more work.

Keep the aliases in memory as a list or dict. We'd basically be implementing the join logic manually, but it might be really fast if the # of Aliases is very small. Then again, if the # of Aliases is very small the join should also be very fast.
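To make the third option concrete, here is a rough sketch of what the manual join could look like with a plain dict. The `match_aliases` function and its arguments are hypothetical names for illustration only, and the set of available modules is assumed to have been read from the names table beforehand.

```python
from typing import Dict, Iterable, List, Tuple


def match_aliases(
    prefix: str,
    aliases: Dict[str, str],
    available_modules: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (alias, module) pairs matching the prefix for available modules."""
    available = set(available_modules)
    return sorted(
        (alias, module)
        for alias, module in aliases.items()
        # Manual equivalent of the join: keep only modules we know about.
        if alias.startswith(prefix) and module in available
    )


aliases = {"np": "numpy", "pd": "pandas", "plt": "matplotlib.pyplot"}
available = {"numpy", "matplotlib.pyplot"}  # assumed contents of the names table

p_matches = match_aliases("p", aliases, available)
n_matches = match_aliases("n", aliases, available)
```

The trade-off is that the set of available modules has to be kept in sync with the cache, which the join approach gets for free.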

@tkrabel what do you think?

The current approach has the benefit that we never have to do any updates on the aliases table. The names table is the source of truth for what is available to the user.
If you're happy with the performance, then let's go with the current approach.

@MrBago thanks for testing the performance in a notebook. The notebook brings up something that is interesting/surprising to me: the module search_by_name_like query is much slower than I was expecting. A prefix search using an index should not have been that slow.

That is unrelated to this PR, so I've created a separate ticket for it, #736, but with the fixed index the Alias query should hopefully become faster as well. 883ms for an inner join between a large table and a very small table doesn't smell right to me; that seems to indicate a full table scan as well.

I'll see if I can fix this tomorrow. In the meantime, apologies, but I'll be holding off on merging this PR until that is fixed, and then we can see the new performance impact.

lieryan · 2023-11-09T11:55:53Z

@MrBago thanks for making this PR. From a quick look this looks great to me. I'm on a trip right now, so my availability to review this is quite limited for the next few weeks. Once I'm back, I'll look into this properly, but please continue the conversation for now.

MrBago · 2023-12-01T00:19:36Z

@lieryan I added the aliases to prefs so that they can be configured. When you have a minute, can you take a look at the PR? Also, I think I need to be given some kind of permission so the CI will run tests on my PRs.

lieryan

Thanks @MrBago, this looks good to me on a first pass. I am going to have to hold off on merging for now for reasons mentioned below, but once I've resolved that issue I'll take a second look at this.

rope/base/prefs.py

ropetest/contrib/autoimporttest.py

lieryan · 2023-12-17T16:25:56Z

lieryan · 2023-12-17T16:29:48Z

rope/contrib/autoimport/sqlite.py

            models.Package.create_table(self.connection)
            models.Metadata.create_table(self.connection)
+            self.add_aliases(self.project.prefs.import_aliases)
            data = (


So if I understand this correctly, this will add the aliases into the database only when the database is created. IIUC, this would need to depend on the database being re-created when preferences change.

You're right. I didn't look at the different ways that prefs can change. We could add a method to clear the aliases table and reset it, and invoke that when the prefs are updated.
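A minimal sketch of such a reset, assuming a two-column `aliases` table; `reset_aliases` is a hypothetical helper name and not an existing rope method.

```python
import sqlite3
from typing import Iterable, Tuple


def reset_aliases(
    conn: sqlite3.Connection, aliases: Iterable[Tuple[str, str]]
) -> None:
    """Replace the contents of the aliases table with the given pairs."""
    # Using the connection as a context manager wraps both statements in one
    # transaction, so readers never observe a half-updated table.
    with conn:
        conn.execute("DELETE FROM aliases")
        conn.executemany(
            "INSERT INTO aliases (alias, module) VALUES (?, ?)", aliases
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aliases (alias TEXT, module TEXT)")  # assumed schema

reset_aliases(conn, [("np", "numpy")])
reset_aliases(conn, [("pd", "pandas")])  # e.g. after the prefs changed
current = conn.execute("SELECT alias, module FROM aliases").fetchall()
```

After the second call only the new aliases remain, so the table always mirrors the current preferences.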

MrBago · 2023-12-20T23:29:19Z

@lieryan let me know if I can help with looking at the timing. One key thing to notice in my notebook is that I intentionally bloated the database to get the timing for an extreme case.

When doing the timing, one thing that I found odd was that including DISTINCT in the query seemed to make a bigger difference than I expected. I expected the database to be more efficient than Python at removing duplicates, but I found that removing the "DISTINCT" and using a set in Python was more efficient than pushing it down to the database :/.
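For anyone who wants to compare the two variants, this is a small sketch of the equivalence, using a made-up single-table schema and sample data. It only shows that both paths yield the same rows; it makes no claim about which one is faster on a given database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, module TEXT)")  # assumed schema

# Insert every row three times to simulate a bloated table with duplicates.
rows = [("ABC", "abc"), ("abstractmethod", "abc"), ("array", "numpy")]
conn.executemany("INSERT INTO names VALUES (?, ?)", rows * 3)

# Variant 1: let SQLite remove the duplicates.
via_distinct = set(conn.execute("SELECT DISTINCT name, module FROM names"))

# Variant 2: fetch everything and deduplicate with a Python set.
via_python_set = set(conn.execute("SELECT name, module FROM names"))
```

Wrapping each variant in `%time` or `timeit` against a realistic cache would show which one wins in practice.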

lieryan · 2024-01-06T19:45:09Z

Hi @tkrabel-db, apologies for the delay. I got sidetracked because I didn't reach the performance goal I was expecting a few weeks ago when I initially experimented with fixing the index. But I tried again today with a fresh pair of eyes, and it now works as I expected, after I found a couple of silly errors in how I created the index in my original attempt. Now, after applying PR #739, this is more in line with the performance that I was expecting for these operations:

In [2]: from rope.base.project import Project
   ...: from rope.contrib.autoimport.defs import SearchResult
   ...: from rope.contrib.autoimport.sqlite import AutoImport
   ...: 
   ...: import os; os.makedirs('/tmp/bagoD/rope', exist_ok=True)
   ...: project = Project('/tmp/bagoD/rope')
   ...: autoimport = AutoImport(project, memory=False)
   ...: 
   ...: autoimport.generate_cache()  # Generates a cache of the local modules, from the project you're working on
   ...: autoimport.generate_modules_cache()  # Generates a cache of external modules

In [4]: import rope.contrib.autoimport.models as m

In [5]: %time aa = list(autoimport._execute(m.FinalQuery("SELECT * FROM names"), ()))
CPU times: user 23.2 ms, sys: 2.33 ms, total: 25.5 ms
Wall time: 25.1 ms

In [6]: autoimport._executemany(m.Name.objects.insert_into(), aa * 300)
Out[6]: <sqlite3.Cursor at 0x105251110>

In [7]: !du -sh /tmp/bagoD/
1.4G	/tmp/bagoD/

In [8]: %time set(autoimport._execute(m.Name.search_module_like.select_star(), ('abc',)))
    ...: 
CPU times: user 3.26 ms, sys: 2.43 ms, total: 5.7 ms
Wall time: 4.62 ms
Out[8]: 
{('ABC', 'abc', 'abc', 5, 7),
 ('abstractclassmethod', 'abc', 'abc', 5, 7),
 ('abstractmethod', 'abc', 'abc', 5, 3),
 ('abstractproperty', 'abc', 'abc', 5, 7),
 ('abstractstaticmethod', 'abc', 'abc', 5, 7)}

With just the old case-sensitive index, the LIKE operations would've been more like a 400ms operation.

If you update your PR to include the new index as well, I can review it again. I think you may need to add an index that looks like this for the alias table (untested):

connection.execute("CREATE INDEX IF NOT EXISTS aliases_alias_nocase ON aliases(alias COLLATE NOCASE)")
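A quick way to check whether SQLite uses such an index is `EXPLAIN QUERY PLAN`. The sketch below assumes a two-column `aliases` table with made-up rows; the plan text varies between SQLite versions, so it is printed rather than asserted on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aliases (alias TEXT, module TEXT)")  # assumed schema
conn.execute(
    "CREATE INDEX IF NOT EXISTS aliases_alias_nocase ON aliases(alias COLLATE NOCASE)"
)
conn.executemany(
    "INSERT INTO aliases VALUES (?, ?)",
    [("np", "numpy"), ("pd", "pandas")],
)

# Inspect the plan for a prefix search; look for the index name in the output.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT alias, module FROM aliases WHERE alias LIKE 'np%'"
).fetchall()
for step in plan:
    print(step)

# LIKE is case-insensitive for ASCII by default, so an upper-case prefix
# still finds the lower-case alias.
matches = conn.execute(
    "SELECT alias, module FROM aliases WHERE alias LIKE ?", ("NP%",)
).fetchall()
```

If the plan reports a scan of the whole table instead of a search using `aliases_alias_nocase`, the index is not being picked up for the LIKE query.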

tkrabel-db · 2024-01-08T08:47:51Z

@lieryan thanks!
@MrBago this is unblocked

bagel897 · 2024-01-09T18:57:46Z

rope/base/prefs.py

@@ -140,6 +140,22 @@ class Prefs:
        """),
    )

+    import_aliases: List[Tuple[str, str]] = field(


I'm adding an autoimport prefs table in #516, can we move this there?

@lieryan do you have a preference for where the "import_aliases" option goes? I tried moving it, but I was having trouble with the nested Prefs. Specifically, I wasn't able to set the prefs for testing here.

MrBago · 2024-01-26T02:14:00Z

@lieryan updated my PR and added the new index. The aliases table should be small so I'm not sure how much of an impact this index will have, but it shouldn't hurt. I tried to optimize the alias query a bit, but I wasn't really able to move the needle. Let me know if you think using an alias table and join like this might be an issue.

MrBago · 2024-01-29T19:25:30Z

@lieryan When you have a few minutes, can you take a look at this PR? I have some time and would love to move this across the finish line.

tkrabel-db · 2024-01-29T20:31:09Z

@lieryan can you prioritize this work so that we have closure? :)

docs/contributing.rst

lieryan · 2024-01-30T04:56:22Z

Thanks @MrBago for implementing this PR and @tkrabel-db, @bagel897 for contributing to the discussions.

I've made some changes to the preferences to align the autoimport preferences changes with #516.

@all-contributors add @MrBago for code

allcontributors · 2024-01-30T04:56:25Z

@lieryan

@MrBago already contributed before to code

Bago Amirbekian added 2 commits November 6, 2023 17:42

Add aliases to auto complete search

5b914ef

fixup

4702c3a

MrBago mentioned this pull request Nov 7, 2023

Support aliases in rope_autoimport #712

Closed

tkrabel-db reviewed Nov 7, 2023

add import aliases as to prefs config

dcd12ba

lieryan reviewed Dec 17, 2023

Fix typos

1665040

bagel897 reviewed Jan 9, 2024

Bago Amirbekian added 3 commits January 25, 2024 16:08

Merge branch 'master' into aliases

12c4bbe

minor fixes

fdfbeca

Merge branch 'aliases' of github.com:MrBago/rope into aliases

eda133a

MrBago changed the title ~~[WIP] Add aliases to AutoImport search~~ Match on module aliases for autoimport suggestions Jan 26, 2024

MrBago changed the title ~~Match on module aliases for autoimport suggestions~~ Match on module aliases for auto import suggestions Jan 26, 2024

Bago Amirbekian added 2 commits January 25, 2024 18:10

style

cf83daf

style

d2639bc

MrBago requested review from lieryan and tkrabel-db January 26, 2024 21:13

lieryan reviewed Jan 30, 2024

lieryan and others added 4 commits January 30, 2024 11:37

Remove stray character

1667b67

Document an example of how to configure import_aliases

253fdee

Black

9e494c3

Use the new AutoimportPrefs namespace, move config key to autoimport.…

9134d03

…aliases

Fix import_aliases config key in sample_project

b266add

lieryan merged commit e264c6f into python-rope:master Jan 30, 2024
18 checks passed

lieryan added this to the 1.13.0 milestone Mar 24, 2024
