-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Markdown Parser] Add CodeExtractor
trait for extracting source code from a file
#5123
Conversation
@charliermarsh would also like your thoughts on this, if you get the time. |
PR Check ResultsEcosystem✅ ecosystem check detected no changes. BenchmarkLinux
Windows
|
CodeExtractor
traitCodeExtractor
trait for extracting source code from a file
369fa25
to
ae0b684
Compare
ae0b684
to
146d346
Compare
CodeExtractor
trait for extracting source code from a fileCodeExtractor
trait for extracting source code from a file
CodeExtractor
trait for extracting source code from a fileCodeExtractor
trait for extracting source code from a file
Thanks, I'll try to take a look later today! |
|
||
pub trait CodeExtractor<T> { | ||
fn extract_code(path: &Path) -> Result<T, Box<Diagnostic>>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the generic can be omitted entirely here with:
pub trait CodeExtractor: Sized {
fn extract_code(path: &Path) -> Result<Self, Box<Diagnostic>>;
}
@@ -24,3 +28,7 @@ impl SourceKind { | |||
} | |||
} | |||
} | |||
|
|||
pub trait CodeExtractor<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm unable to see ahead to future usages, but I don't quite see the value in introducing this trait as-is, since it's really just defining a constructor for the implementing class -- there aren't any shared methods across trait implementors (e.g., nothing that takes &self
), and so implementing the trait doesn't help us very much. How do you see it evolving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning on storing the various mapping functions here as well (like mapping the initial/terminal offsets of Markdown code cells to the underlying document or Jupyter cells vs its JSON). I'm finding a lot of similarities between the Markdown parser and the existing Jupyter one, so wanted to consolidate some functionality.
Additionally, a lot of these functions will be applicable for embedded language support (mapping initial and terminal offsets of SQL code in Python, for example).
I felt that there was an opportunity to unify a lot of the stuff between the Markdown and Jupyter parsers which will also extend to other use cases in the larger "multi-language" use case. If you don't think this trait is worth it, though, I'm happy to remove it and just continue on the Markdown parser itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charliermarsh I think I'm going to close this for now - you're right. I'll keep working on the Markdown parser and then we can look for opportunities to unify stuff down the road. I'll try to design it in such a way that we can easily implement multi-language support for it down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, sorry for not getting back to your previous comment. I just want to avoid introducing abstractions before they're needed.
Summary
Will be used when implementing #3792.
This PR adds a
CodeExtractor
trait (and a Jupyter notebook implementation). This trait will be used to extract source code (language-agnostic, potentially range-agnostic once Open Question 1 is answered) from files. It will be used in aMarkdownParser
to extract Python code blocks.Open Questions
extract_code_from_range
method, or change the signature of the existing function to take astr
and pass the in Path contents that way. I'm open to either one, though the latter would obviously require a change in the existing implementation. I think the latter option (changing signature tostr
vs.Path
) would be better.CodeExtractor
trait onto theSourceKind
enum? I thought about it, but am not sure how that enum can be extended for embedded language support down the road. I think we'd have to change it to work at a "code block" level instead of aPath
level. My local branch contains the concept of aBoundedCodeBlock
, but I wonder if that's going to be too much change for now. If there's interest, I can submit the Markdown parser PR with that struct incorporated.Test Plan
cargo t
Note
I had a Markdown language parser PR earlier, but there've been a bunch of changes in main (specifically around Jupyter notebooks) that will prove helpful for the Markdown parser, so I'm essentially restarting the branch. Further development on Markdown parsing will be paused until this gets looked at (I think it's currently deprioritized, but just wanted to let it be known :) ).