-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-saving and loading modules at compile time #1065
base: main
Are you sure you want to change the base?
Conversation
std::string infile) { | ||
LFORTRAN_ASSERT(LFortran::asr_verify(u)); | ||
std::string modfile_binary = LFortran::save_pycfile(u); | ||
Allocator al(4*1024); | ||
LFortran::SymbolTable *symtab = | ||
al.make_new<LFortran::SymbolTable>(nullptr); | ||
std::vector<std::pair<ASR::Module_t*, SymbolTable*>> module_parent; | ||
for (auto &item : u.m_global_scope->get_scope()) { | ||
if (LFortran::ASR::is_a<LFortran::ASR::Module_t>(*item.second)) { | ||
LFortran::ASR::Module_t *m = LFortran::ASR::down_cast<LFortran::ASR::Module_t>(item.second); | ||
|
||
symtab->add_symbol(std::string(m->m_name), item.second); | ||
module_parent.push_back(std::make_pair(m, m->m_symtab->parent)); | ||
m->m_symtab->parent = symtab; | ||
} | ||
} | ||
|
||
LFortran::Location loc; | ||
LFortran::ASR::asr_t *asr = LFortran::ASR::make_TranslationUnit_t(al, loc, | ||
symtab, nullptr, 0); | ||
LFortran::ASR::TranslationUnit_t *tu = | ||
LFortran::ASR::down_cast2<LFortran::ASR::TranslationUnit_t>(asr); | ||
LFORTRAN_ASSERT(LFortran::asr_verify(*tu)); | ||
|
||
std::string modfile_binary = LFortran::save_pycfile(*tu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same approach as LFortran i.e., https://github.com/lfortran/lfortran/blob/f07bc66f885d4bc365be140b23a2b62f7e205d85/src/bin/lfortran.cpp#L609-L649 but still this issue persists. The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program. If we want to follow the auto-compile, load and save approach in LPython then probably we shouldn't compare Symtab ID for exact matches while comparing the reference tests and the ASR output. @certik Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you can add an internal option to not use pre-compiled modules (i.e., ignore pyc files) but compile every-time while generating reference tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I think is because we keep using the same process to compile the modules as well as the main program.
I made some changes (see in 8317729 which is an experimental commit) where I launched a separate lpython process to compile modules and save them in pyc files. And it worked on my mac. Between consecutive clean builds reference tests matched. So I think the issue is not the one in #992 (comment). The real issue is modules getting compiled with other programs because of which the module's symbol tables are affected. If modules are compiled by an isolated lpython
process then I don't see any such thing happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests pass with 8317729 (an experimental commit). So clearly the issue is what I explained in my above comment. Now system
is not a safe way to compile modules in an isolated lpython
process. Some better ways include,
popen
- See the comparison betweensystem
andpopen
in https://stackoverflow.com/a/8538345.spawn
- https://stackoverflow.com/a/55490499
I would say encapsulate the logic to safely compile a module in a API inside LPython and then call it everywhere. Current approach in main
is very non-deterministic so we need to change it anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be using any kind of isolated process. We simply can always compile using a new class for compilation. We just have to handle the symbol table ID correctly, for example as a local variable, etc.
Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.
The code and logic is exactly the same. The difference between LFortran and LPython is explained below,
The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program.
Basically LFortran compiles modules according to the instructions in the build scripts (CMakeLists.txt). However LPython compiles modules on the fly. So a module compiled with one program will give different ASR when compiled along with another program. This is not the issue in LFortran because there we compile the modules, save them in .mod
files before compiling the main program. In fact that's why such handling is not present in LFortran I think because LFortran doesn't compile modules on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So let's think what happens when we compile modules on the fly. Consider three files a.py
, b.py
and module.py
. a.py
and b.py
import module.py
.
Case 1
We execute, lpython a.py
first and then lpython b.py
. Since we compile modules on the fly during execution of lpython a.py
, when we will reach the following call to compile module.py
,
lpython/src/lpython/semantics/python_ast_to_asr.cpp
Lines 243 to 244 in d4e2f74
Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast, | |
diagnostics, false, true, false, infile); |
some symbol tables would have been created already by then. Say n
symbol tables were created. So the ID of first symbol table while compiling module.py
will be n
(symbol table counter starts from 0
and its global for a single lpython
process). The same thing will be stored in module.pyc
. So minimum symbol table ID in module.pyc
will be n
.
Now once the module.pyc
is generated we will continue compiling a.py
and its symbol table IDs will start from n + number_of_symtabs_in_module
.
Case 2
After cleaning our repository, we execute lpython b.py
first and then lpython a.py
. Now again we will reach the following call,
lpython/src/lpython/semantics/python_ast_to_asr.cpp
Lines 243 to 244 in d4e2f74
Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast, | |
diagnostics, false, true, false, infile); |
However this time (since we are compiling b.py) say m
symbol tables were created till the above call. So now minimum symbol table ID in module.pyc
will be m
.
Now once the module.pyc
is generated we will continue compiling b.py
and its symbol table IDs will start from m + number_of_symtabs_in_module
.
So you can see how compiling on the fly produces different ASRs for a.py
and b.py
because after compiling module.py
on the fly, SymbolTable::counter
continues from where it left after completing module.py
. This affects the program's ASR as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the symbol table ID will differ based on who compiled it. This is not a problem, as the deserialization renumbers the Symbol IDs already. LFortran also has to (and does) it.
What needs to be done to complete this PR? |
|
||
#ifdef _WIN32 | ||
#define stat _stat | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to a separate file, the python_ast_to_asr
should be platform independent.
mod1 = compile_module_till_asr(al, rl_path, infile, loc, err); | ||
// mod1 = compile_module_till_asr(al, rl_path, infile, loc, err); | ||
std::string cmd = "lpython -c --disable-main " + infile; | ||
system(cmd.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now this is fine, but it is fragile (it will fail if lpython
is not in path, or a different lpython
gets picked up). We will later rework it to simply call the appropriate C++ functions directly to do the compilation.
I think all that is needed is to rebase this on top of the latest master and address some of the minor points above and get everything polished, so that we can merge. |
66e23be
to
47726f0
Compare
It seems the tests pass locally, but fail at the |
#992 (comment)