feat: top level imports #3779

Merged
merged 19 commits into from
Feb 18, 2025

Conversation

dmadisetti
Collaborator

📝 Summary

Followup to #3755 for #2293 allowing for "top level imports"

For completion of #2293, I think UI changes are needed to enable this behavior. Notably:

  • Indicate when in function mode (maybe top level import too)
  • Provide hints when pushed out of function mode
  • Maybe allow the user to opt out of function mode?

+ docs

This also increases security risk, since code is run outside of the runtime. This was always possible, but now marimo can save in a format that could skip the marimo runtime altogether on restart.

There are opportunities here. marimo could lean into this, and leverage external code running as a chance to hook in (almost a plugin system for free)

But also issues, since a missing dep could stop the notebook from running at all (goes against the "batteries included" ethos). This can be mitigated with static analysis over just an import (markdown does this for instance), or marimo can re-serialize the notebook in the "safe" form, if it comes across issues in import.
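
A minimal sketch of the "static analysis over just an import" mitigation, using only the standard library. This is not marimo's implementation; `static_imports` and the sample source are hypothetical names for illustration. It reads module-level imports without executing the file, then reports which ones cannot be found in the current environment.

```python
import ast
import importlib.util


def static_imports(source: str) -> list[str]:
    """Collect top-level package names imported at module scope, without running the code."""
    found: list[str] = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            found.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append(node.module.split(".")[0])
    return found


# Hypothetical notebook header; the last import is deliberately not installable.
notebook_source = """\
import json
from pathlib import Path
import not_a_real_module_xyz

app = object()
"""

modules = static_imports(notebook_source)
# find_spec returns None for a top-level module that cannot be located.
missing = [name for name in modules if importlib.util.find_spec(name) is None]
```

With a list like `missing` in hand, a loader could choose to warn the user rather than fail on the import itself.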

🔍 Description of Changes

Includes a bit of a refactor to codegen since there were a fair amount of changes.
Allows top level imports of "import only" cells. The contents are pasted at the top of the file, with a bit of care not to break header extraction.

# Normal headers are retained
# Use a notice to denote where generated imports start
# Notice maybe needs some copy edit

# 👋 This file was generated by marimo. You can edit it, and tweak
# things- just be conscious that some changes may be overwritten if opened in
# the editor. For instance top level imports are derived from a cell, and not
# the top of the script. This notice signifies the beginning of the generated
# import section.

# Could also make this app.imports? But maybe increasing surface area for no reason
import numpy
# Note, import cells intentionally do not have a `return`
# for static analysis feature below

import marimo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)


@app.cell
def import_cell():
    # Could also make this app.imports? But maybe increasing surface area for no reason
    import numpy
    # Note, import cells intentionally do not have a `return`
    # for static analysis feature below

Top level refs (this includes @app.functions) are ignored in the signatures. E.g.

import marimo as mo

# ...

@app.cell
def md_cell():
    mo.md("Hi")
    return 

Since I was also in there, I added static analysis to ignore returning dangling defs.

@app.cell
def cell_with_dangling_def():
    a = 1
    b = 2
    return (a,) # No longer returns b since it's not used anywhere. Allowing for linters like ruff to complain.

@app.cell
def ref_cell(a):
    a + 1
    return 
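
The idea behind dropping dangling defs can be sketched roughly as follows. This is an assumption about the shape of the analysis, not the code in this PR: each cell is modeled as a pair of sets (names it defines, names it references), and a cell only returns the defs that some other cell references. The helper name `used_defs` is hypothetical.

```python
def used_defs(
    cells: dict[str, tuple[set[str], set[str]]],
) -> dict[str, tuple[str, ...]]:
    """Map each cell name to the defs that at least one other cell references."""
    returns: dict[str, tuple[str, ...]] = {}
    for name, (defs, _refs) in cells.items():
        refs_elsewhere: set[str] = set()
        for other, (_defs, refs) in cells.items():
            if other != name:
                refs_elsewhere |= refs
        # Sorted for a stable, diff-friendly return tuple.
        returns[name] = tuple(sorted(defs & refs_elsewhere))
    return returns


# Mirrors the example above: `b` is defined but never referenced.
cells = {
    "cell_with_dangling_def": ({"a", "b"}, set()),
    "ref_cell": (set(), {"a"}),
}
result = used_defs(cells)
```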

LMK if too far reaching and we can break it up/ refactor. A bit more opinionated than the last PR

The tests border on being smoke tests rather than unit tests, but they hit the key issues I was worried about. I can break them down more granularly if needed. Also LMK if you can think of more edge cases.

📜 Reviewers

@akshayka OR @mscolnick


vercel bot commented Feb 13, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Updated (UTC)
marimo-docs | ✅ Ready | Feb 18, 2025 3:21pm
marimo-storybook | ✅ Ready | Feb 18, 2025 3:21pm

@leventov

@dmadisetti what's the matter with hashing top-level functions in hash.py, Hasher class and surrounding logic? Will it be treated as "normal" top-level/"pure" functions whose code is hashed in the module_hash calculation, with no special treatment needed? Or maybe serialize_and_dequeue_content_refs() can add a check if the function is app.fn for fast-track instead of calling is_pure_function()? Or, is_pure_function() should be changed itself to add such a fast-track? Should/could the code hash of app.fns be saved in a field of the corresponding Cell such that it doesn't need to be re-computed?

@dmadisetti
Collaborator Author

@leventov caching is dependent on the runtime of the app. This PR is more about exposing cells as usable functions that can be imported from other modules, plus some tweaks to make notebooks look more "pythonic" for linters. When marimo first loads this file, no runtime has been initialized.

Cell level caching is going to be coupled more with changes in _runtime.executor

@dmadisetti
Collaborator Author

Eh. Just noticed import cells without a return are not liked by ruff. That was a bit of a last minute choice to try and clean up the whitespace- I'll put it back in

Contributor

@akshayka left a comment

Nice, getting closer to the ideal of reusable code!

Most of the below is discussion — doesn't need to be immediately addressed, but should be addressed before top level functions are enabled.

But also issues, since a missing dep could stop the notebook from running at all (goes against the "batteries included" ethos).

Yea, this is an issue for sure. As long as the user has marimo installed, marimo edit nb.py should always work, no matter if top-level imports are missing.

This can be mitigated with static analysis over just an import (markdown does this for instance), or marimo can re-serialize the notebook in the "safe" form, if it comes across issues in import.

The former option sounds better. I wonder if we should define the file format to consist of three sections:

  1. A user-defined section, containing arbitrary text (the "header"), except for perhaps a special delimiter token.
  2. A generated section containing top-level imports, if they are missing from the user-defined section, followed by a special delimiter.
  3. Today's generated section:
import marimo

__generated_with = ...
app = marimo.App()

@app.function
def foo():
  ...

@app.cell
def bar():
  ...

In this way, marimo's Python file reader would simply skip sections (1) and (2) (based on the presence of the delimiter token), and programmatically read section 3 as it does today. If the delimiter were missing (user edited the file, or wrote from scratch), marimo would try to read the file programmatically as it does today. Just one proposal, and maybe this is similar to what you've implemented, but I do think it's worth it to write a specification for this very concretely and to document it in the codebase.
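
To make the proposal concrete, here is a rough sketch of a reader that skips sections (1) and (2). The delimiter string and the function name are hypothetical; nothing here is marimo's actual file format. If no delimiter is present, the whole file is returned so it can be read as it is today.

```python
DELIMITER = "# ~~~ marimo ~~~"  # hypothetical delimiter token, for illustration only


def generated_section(source: str) -> str:
    """Return the text after the last delimiter line, or the whole file if none exists."""
    lines = source.splitlines()
    last = max(
        (i for i, line in enumerate(lines) if line.strip() == DELIMITER),
        default=None,
    )
    if last is None:
        # User edited the file or wrote it from scratch: fall back to the full text.
        return source
    return "\n".join(lines[last + 1 :])


sample = "\n".join(
    [
        "# user header",  # section 1
        DELIMITER,
        "import numpy",  # section 2
        DELIMITER,
        "import marimo",  # section 3
        "app = marimo.App()",
    ]
)
body = generated_section(sample)
```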

I think we should also very clearly define and document what is okay for the user to edit, and how, and what is not okay. One proposal: section 1 is fine to edit arbitrarily (except for a special delimiter?); section 2 should not be edited; section 3's cell and function definitions can be edited, cells and functions can be added, and cells and functions can be removed.

Comment on lines 291 to 302
if cell.import_workspace.is_import_block:
    # maybe a bug, but import_workspace.imported_defs does not
    # contain the information we need.
    toplevel_imports |= cell.defs
    if toplevel_fn:
        # TODO: Consider fn="imports" for @app.imports?
        # Distinguish that something is special about the block
        # Also remove the "return" in this case.
        definitions[idx] = to_general_functiondef(cell, names[idx])
    else:
        definitions[idx] = to_functiondef(cell, names[idx])
    import_blocks.append(code.strip())
Contributor

If only import blocks are used, then in the below, foo won't get saved as a function. I can see this being a bit confusing for users. I'm wondering if imports could be saved top-level even if they weren't in import blocks.

cell:

import random
...

Another cell

def foo():
  return random.randint(0, 43)

Collaborator Author

Yes they can- but I think restricting to import only blocks makes sense. Consider the following block:

@app.cell
def _(run_button):
    mo.stop(run_button.value)
    import something_very_expensive_with_side_effects
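
For illustration, an "import only" check could look something like the sketch below. It is an assumption about the idea, not the logic behind `is_import_block` in marimo; the function name `is_import_only` is hypothetical. A cell like the one above fails the check because it contains a non-import statement.

```python
import ast


def is_import_only(code: str) -> bool:
    """True when every top-level statement in the cell is an import statement."""
    body = ast.parse(code).body
    return bool(body) and all(
        isinstance(stmt, (ast.Import, ast.ImportFrom)) for stmt in body
    )


pure = is_import_only("import numpy\nfrom pathlib import Path")
# The guarded import from the example above: not safe to hoist to the top of the file.
mixed = is_import_only(
    "mo.stop(run_button.value)\nimport something_very_expensive_with_side_effects"
)
empty = is_import_only("# only a comment")
```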

# notice to separate the imports from the rest of the code.
filecontents = [NOTICE, ""]

filecontents.append("\n\n".join(import_blocks))
Contributor

Should imports be added unconditionally, as you've written, or should imports only be added if they are used in top-level functions?

One thought, if imports are added to the top of the file unconditionally, perhaps we should remove their corresponding defs from cell signatures, so that code completion in editors works better. However, maybe the right thing to do is just bite the bullet and write editor plugins / an LSP-like thing that handle completions for marimo notebook files, in which case my suggestion here is moot.
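
One way the conditional variant might be approximated, as a sketch only: collect the names bound by module-level imports and keep those that appear inside a top-level function (including its decorators). The helper name is hypothetical, and the check is deliberately naive; it ignores shadowing by local variables or parameters.

```python
import ast


def imports_used_by_functions(source: str) -> set[str]:
    """Names bound by module-level imports that are referenced in top-level functions."""
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            # `import a.b` binds `a`; `import a.b as c` binds `c`.
            imported |= {a.asname or a.name.split(".")[0] for a in node.names}
        elif isinstance(node, ast.ImportFrom):
            imported |= {a.asname or a.name for a in node.names}
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            used |= {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
    return imported & used


sample = """\
import random
import numpy as np


def foo():
    return random.randint(0, 43)
"""
needed = imports_used_by_functions(sample)
```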

Contributor

Also, can we ruff format the import section?

Collaborator Author

perhaps we should remove their corresponding defs from cell signatures, so that code completion in editors works better

Yep, this PR already does this

Also, can we ruff format the import section?

Yes, I'm leaning towards removing the statement block, stripping comments and formatting the imports.
Thoughts?

Contributor

I think that sounds good ... also see my response to your import guard idea.

@dmadisetti
Collaborator Author

I think we should also very clearly define and document what is okay for the user to edit, and how, and what is not okay. One proposal: section 1 is fine to edit arbitrarily (except for a special delimiter?); section 2 should not be edited; section 3's cell and function definitions can be edited, cells and functions can be added, and cells and functions can be removed.

I was struggling with this because I recognized having the many imports mixed with comments seemed to leave the notebook feeling a little messy, and more confusing to the intro user. I also think that for the most part, the current serialization is great.

I wonder if part of the UI is a "library mode" flag which is required before activating this. Means we don't have to communicate this information to the casual user, and the user looking for the functionality of exports, reuse, and linting will take the time to understand "library mode".

But also, here's another potential serialization that makes these "sections" a bit more evident:

# Header comments
"""Doc strings allowed too"""

import marimo

if marimo.import_guard():
    # Note these imports reflect the cell content below.
    # Editing this block will not change the notebook imports.
    import io
    import textwrap
    import typing
    from pathlib import Path

    import marimo as mo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)
                                          
...

Which also mitigates potential breakage, since marimo.import_guard() could always return False, and still keep linters happy.
I'm sold on reformatting the imports and stripping comments before serialization.

@akshayka
Contributor

Yea, appreciate your attention to the intro user.

Hmm, I'd prefer not to introduce a library mode if possible, but can consider it. As an alternative I think the import_guard() idea is interesting. But it would need to return True sometimes right? For example, given

# Header comments
"""Doc strings allowed too"""

import marimo

if marimo.import_guard():
    import numpy as np


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)

@app.function
def my_function():
    return np.random.randn(10, 10)
...

for

```python
from my_notebook import my_function
```

to work, import_guard() would need to evaluate as True. Maybe import_guard would by default be True, but perhaps when reading notebook files in marimo, we'd have a context manager:

with marimo._ast.block_imports():  # makes import_guard() evaluate to False.
  # load the notebook ...

Not sure yet if this is a good idea. Just brainstorming ...
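
A rough sketch of how a guard that defaults to True could be switched off while a notebook is being loaded. The names `import_guard` and `block_imports` mirror the ones floated in this thread, but the implementation below is only an assumption built on the standard library's `contextvars`, not code from marimo.

```python
import contextlib
import contextvars

# Hypothetical flag; False means imports inside the guard are allowed to run.
_imports_blocked: contextvars.ContextVar[bool] = contextvars.ContextVar(
    "imports_blocked", default=False
)


def import_guard() -> bool:
    """True by default, so `from my_notebook import my_function` still works."""
    return not _imports_blocked.get()


@contextlib.contextmanager
def block_imports():
    """Make import_guard() evaluate to False for the duration of the block."""
    token = _imports_blocked.set(True)
    try:
        yield
    finally:
        _imports_blocked.reset(token)  # restore the previous state, even on error


before = import_guard()
with block_imports():
    during = import_guard()
after = import_guard()
```

Using a context variable rather than a module-level global keeps the flag scoped to the current thread or async task, which may matter if notebooks are loaded concurrently.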

@dmadisetti
Collaborator Author

import_guard was relatively easy to put in, and we can strip it out. I have import_guard return True for now, but there are a few cases where False might make sense

Contributor

@akshayka left a comment

Think this is looking pretty good — left one comment.

Are you leaning toward keeping import_guard, or using the comments-imports-import-marimo-as-mo alternative? One thing to note is that as implemented, linters may complain when symbols imported in the guard are used later on. At least PyRight does:

[image: screenshot of the PyRight diagnostic]

@@ -57,3 +57,7 @@ def get_mode() -> Optional[RunMode]:
return "test"

return None


def import_guard() -> bool:
Contributor

@akshayka Feb 16, 2025

If we're keeping import guard:

  1. Can you document the cases in which you think it makes sense to have this return False?
  2. Can you add a TODO to the code where we load marimo notebooks to patch this function to return False? (Or just implement the patch, since it should be small.)

@mscolnick
Contributor

mscolnick commented Feb 16, 2025

Personally I like the comment instead of the import guard.

Although it's fragile, it looks a lot cleaner. Could it have "# magic comment: ..."? Or we can brainstorm something concise that also makes clear why it's there.

@dmadisetti
Collaborator Author

Stripping the imports of comments and using ruff means that we don't have to rely on the comment guard anymore; we can just use import marimo to separate. E.g.

# Normal headers are retained
# Use a notice to denote where generated imports start
# Notice maybe needs some copy edit

# this is still a header comment, `import marimo` signifies the import section

import marimo

# Notice can still be generated here, but not used to parse
import numpy
import ...

__generated_with = "0.11.2" # signifies the end of the import block
app = marimo.App(_toplevel_fn=True)

@akshayka
Contributor

To double check: in this proposal, none of the notices are used to parse, correct? And instead the notebook would be parsed by first excising the code between import marimo and __generated_with, then proceeding as today?
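
The excision step described here could be sketched as follows. This is only an illustration of the idea under discussion, with a hypothetical function name; it assumes `import marimo` appears on its own line before the generated imports and that `__generated_with` starts the line that ends them.

```python
def split_import_block(source: str) -> tuple[str, str]:
    """Split out the lines between `import marimo` and `__generated_with`.

    Returns (import_block, remaining_source). If either marker is missing,
    the import block is empty and the source is returned unchanged.
    """
    lines = source.splitlines()
    try:
        start = lines.index("import marimo")
        end = next(
            i
            for i in range(start + 1, len(lines))
            if lines[i].startswith("__generated_with")
        )
    except (ValueError, StopIteration):
        return "", source
    imports = "\n".join(lines[start + 1 : end]).strip()
    rest = "\n".join(lines[: start + 1] + lines[end:])
    return imports, rest


sample = "\n".join(
    [
        "# header",
        "import marimo",
        "import numpy",
        '__generated_with = "0.11.2"',
        "app = marimo.App(_toplevel_fn=True)",
    ]
)
imports, rest = split_import_block(sample)
```

Note that this depends entirely on line order: anything that moves `import marimo` relative to the other imports changes what gets excised.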

@akshayka
Contributor

Correct! See the commit I just pushed. Small change

I think static parsing should maybe be another blocker for release (re, batteries included)

I see, thanks.

Also saw your follow up comment on ruff reorganizing the import block, which would have broken our proposed static parsing.

In light of that, I think a conditional or context manager could make more sense. Robustness is important, and avoiding static parsing feels more robust. We could workshop naming of the import guard, but to me the construct feels relatively natural (reminds me of if TYPE_CHECKING).

@dmadisetti
Collaborator Author

One thing to note is that as implemented, linters may complain when symbols imported in the guard are used later on. At least PyRight does:

I checked this in pyright, and it's only if the module is used top level. For instance:

if bar():
    import my_context
    
@app.function
@my_context      # pyright will complain, and actually potentially dangerous to serialize (in case my_context fails or has side effects)
def fn(): ...

@app.function
def fn2():
    return mo.md("") # this is fine

I was considering restricting serialization for decorators, but the appeal of pytest I think is too strong.
Note, it's not smart enough to make this suggestion in a context block:

with marimo.import_notice:
    import my_context
    
@app.function
@my_context      # no complaints
def fn(): ...

and the with block can still be skipped, similar to the trick in cache

@akshayka
Contributor

I checked this in pyright, and it's only if the module is used top level. For instance:

Thanks for checking.

In that case, and in light of the tradeoffs we've discussed, should we revert to a programmatic import guard for now, then merge? We can keep the symbol private until we've finalized the API/ready to be released (if marimo._imports()).

@dmadisetti
Collaborator Author

I put in with marimo.input_guard() so there's a reference to a commit for the MEP.

I think it's fair to get something in to ensure there's not a blockage, but will freeze further work until there's consensus with the design doc

akshayka previously approved these changes Feb 18, 2025
Contributor

@akshayka left a comment

LGTM! modulo CI


🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.11.7-dev8
