Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-saving and loading modules at compile time #1065

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

czgdp1807
Copy link
Collaborator

Comment on lines +166 to +190
std::string infile) {
LFORTRAN_ASSERT(LFortran::asr_verify(u));
std::string modfile_binary = LFortran::save_pycfile(u);
Allocator al(4*1024);
LFortran::SymbolTable *symtab =
al.make_new<LFortran::SymbolTable>(nullptr);
std::vector<std::pair<ASR::Module_t*, SymbolTable*>> module_parent;
for (auto &item : u.m_global_scope->get_scope()) {
if (LFortran::ASR::is_a<LFortran::ASR::Module_t>(*item.second)) {
LFortran::ASR::Module_t *m = LFortran::ASR::down_cast<LFortran::ASR::Module_t>(item.second);

symtab->add_symbol(std::string(m->m_name), item.second);
module_parent.push_back(std::make_pair(m, m->m_symtab->parent));
m->m_symtab->parent = symtab;
}
}

LFortran::Location loc;
LFortran::ASR::asr_t *asr = LFortran::ASR::make_TranslationUnit_t(al, loc,
symtab, nullptr, 0);
LFortran::ASR::TranslationUnit_t *tu =
LFortran::ASR::down_cast2<LFortran::ASR::TranslationUnit_t>(asr);
LFORTRAN_ASSERT(LFortran::asr_verify(*tu));

std::string modfile_binary = LFortran::save_pycfile(*tu);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same approach as LFortran i.e., https://github.com/lfortran/lfortran/blob/f07bc66f885d4bc365be140b23a2b62f7e205d85/src/bin/lfortran.cpp#L609-L649 but still this issue persists. The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program. If we want to follow the auto-compile, load and save approach in LPython then probably we shouldn't compare Symtab ID for exact matches while comparing the reference tests and the ASR output. @certik Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can add an internal option to not use pre-compiled modules (i.e., ignore pyc files) but compile every-time while generating reference tests.

Copy link
Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I think is because we keep using the same process to compile the modules as well as the main program.

I made some changes (see in 8317729 which is an experimental commit) where I launched a separate lpython process to compile modules and save them in pyc files. And it worked on my mac. Between consecutive clean builds reference tests matched. So I think the issue is not the one in #992 (comment). The real issue is modules getting compiled with other programs because of which the module's symbol tables are affected. If modules are compiled by an isolated lpython process then I don't see any such thing happening.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests pass with 8317729 (an experimental commit). So clearly the issue is what I explained in my above comment. Now system is not a safe way to compile modules in an isolated lpython process. Some better ways include,

I would say encapsulate the logic to safely compile a module in a API inside LPython and then call it everywhere. Current approach in main is very non-deterministic so we need to change it anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be using any kind of isolated process. We simply can always compile using a new class for compilation. We just have to handle the symbol table ID correctly, for example as a local variable, etc.

Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.

Copy link
Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.

The code and logic is exactly the same. The difference between LFortran and LPython is explained below,

The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program.

Basically LFortran compiles modules according to the instructions in the build scripts (CMakeLists.txt). However LPython compiles modules on the fly. So a module compiled with one program will give different ASR when compiled along with another program. This is not the issue in LFortran because there we compile the modules, save them in .mod files before compiling the main program. In fact that's why such handling is not present in LFortran I think because LFortran doesn't compile modules on the fly.

Copy link
Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's think what happens when we compile modules on the fly. Consider three files a.py, b.py and module.py. a.py and b.py import module.py.

Case 1

We execute, lpython a.py first and then lpython b.py. Since we compile modules on the fly during execution of lpython a.py, when we will reach the following call to compile module.py,

Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast,
diagnostics, false, true, false, infile);

some symbol tables would have been created already by then. Say n symbol tables were created. So the ID of first symbol table while compiling module.py will be n (symbol table counter starts from 0 and its global for a single lpython process). The same thing will be stored in module.pyc. So minimum symbol table ID in module.pyc will be n.

Now once the module.pyc is generated we will continue compiling a.py and its symbol table IDs will start from n + number_of_symtabs_in_module.

Case 2
After cleaning our repository, we execute lpython b.py first and then lpython a.py. Now again we will reach the following call,

Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast,
diagnostics, false, true, false, infile);

However this time (since we are compiling b.py) say m symbol tables were created till the above call. So now minimum symbol table ID in module.pyc will be m.

Now once the module.pyc is generated we will continue compiling b.py and its symbol table IDs will start from m + number_of_symtabs_in_module.

So you can see how compiling on the fly produces different ASRs for a.py and b.py because after compiling module.py on the fly, SymbolTable::counter continues from where it left after completing module.py. This affects the program's ASR as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the symbol table ID will differ based on who compiled it. This is not a problem, as the deserialization renumbers the Symbol IDs already. LFortran also has to (and does) it.

@Thirumalai-Shaktivel
Copy link
Collaborator

Thirumalai-Shaktivel commented Oct 9, 2022

What needs to be done to complete this PR?


#ifdef _WIN32
#define stat _stat
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to a separate file, the python_ast_to_asr should be platform independent.

mod1 = compile_module_till_asr(al, rl_path, infile, loc, err);
// mod1 = compile_module_till_asr(al, rl_path, infile, loc, err);
std::string cmd = "lpython -c --disable-main " + infile;
system(cmd.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now this is fine, but it is fragile (it will fail if lpython is not in path, or a different lpython gets picked up). We will later rework it to simply call the appropriate C++ functions directly to do the compilation.

@certik
Copy link
Contributor

certik commented Oct 12, 2022

I think all that is needed is to rebase this on top of the latest master and address some of the minor points above and get everything polished, so that we can merge.

@Thirumalai-Shaktivel Thirumalai-Shaktivel force-pushed the mod02 branch 2 times, most recently from 66e23be to 47726f0 Compare October 14, 2022 13:45
@Shaikh-Ubaid
Copy link
Collaborator

It seems the tests pass locally, but fail at the CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants