[FEATURE] FunSQL Representation of Schemas #37

Open
TheCedarPrince opened this issue Sep 13, 2023 · 3 comments


@TheCedarPrince
Member

Hey @DilumAluthge ,

I was wondering if it would be alright with you if I added a new feature: when a particular OMOP CDM version is generated, there would also be a function to generate FunSQL representations of that version. It would be nice to have this sort of representative schema when developing JuliaHealth tools for OMOP CDM analysis where FunSQL is the backend.

We'd add a test suite and checker. Also we could make FunSQL a conditional dependency if that works for you.

What do you think? Use case was over here: JuliaHealth/OMOPCDMCohortCreator.jl#56

~ tcp 🌳

@DilumAluthge
Member

That sounds great!

@TheCedarPrince
Member Author

Hey @Jay-sanjay, to say more about this issue, here is how I see addressing it as straightforwardly as possible, with some additional background:

If you recall, within OMOPCDMCohortCreator, we always write at the beginning:

GenerateDatabaseDetails(:sqlite, "main")

GenerateTables(conn)

The GenerateTables function does something perhaps a little odd: it scans the database we are connecting to and generates the tables from there. It assumes that the database is an OMOP CDM formatted database, so that all the tables from the specification can be generated (see table below).

[Image: table of the OMOP CDM tables from the specification]

Here is how the GenerateTables function works:

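function GenerateTables(conn; inplace = true, exported = false)
    # `schema` and `dialect` are not arguments here; they are expected to
    # exist as globals (presumably set earlier by GenerateDatabaseDetails).
    db_info = reflect(conn; schema = schema, dialect = dialect)

    # Bind each reflected table to a lowercase global name inside the module.
    if inplace == true
        for key in keys(db_info.tables)
            @eval global $(Symbol(lowercase(string(key)))) = $(db_info[key])
            @info "$(lowercase(string(key))) table generated internally"
        end
    end

    # Alternatively, hand the tables back to the caller as a dictionary.
    if exported == true
        tables = Dict()
        for key in keys(db_info.tables)
            push!(tables, Symbol(lowercase(string(key))) => db_info[key])
            @info "$(lowercase(string(key))) table generated publicly"
        end
        return tables
    end

    return conn
end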

So instead, what we want is to have, optionally internal to OMOPCDMCohortCreator, FunSQL representations of the OMOP CDM tables, rather than having to call GenerateTables every single time. The goal is to reduce the boilerplate one has to write when using OMOPCDMCohortCreator and, hopefully, to make things a bit faster.

Here is how I imagine solving this:

First, we go back to the OMOPCommonDataModel package and add some new features to it. Right now, that package generates bindings for ONLY v5.3.1 of the OMOP CDM. This is not great, as v5.4 of the CDM also exists. To address this, I think we should re-download the SQL Server DDL for 5.3 from here: https://github.com/OHDSI/CommonDataModel/tree/main/inst/ddl and then test the package locally to produce the 5.3 bindings. You can do that by replacing the file at https://github.com/JuliaHealth/OMOPCommonDataModel.jl/tree/3f869a34cc6c1c7ac47f9911500830fd9a795590/assets with the DDL file you download. Then follow the instructions here: https://juliahealth.org/OMOPCommonDataModel.jl/stable/generate/ If everything works properly, v5.3.1 should be regenerated, and we will know that the current state of the package is good!

The next step would be to make sure that we can generate multiple versions of the OMOP CDM using the DDLs here: https://github.com/OHDSI/CommonDataModel/tree/main/inst/ddl As you can see, there are just 5.3 and 5.4, which is fine. The autogeneration pipeline within this package should, however, give one the ability to choose which version to generate. To get this set up, I would download the 5.4 version of the SQL Server DDL and run the autogeneration pipeline as normal. If everything works as expected (which it may not, since it is a different version), everything within the package should now be updated to 5.4 instead of 5.3. From there, you'd need to think about how to let the package generate information for either 5.3 or 5.4 on demand.
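To make "either 5.3 or 5.4 on demand" a bit more concrete, here is a minimal sketch of how the generator could pick a DDL file by version. Everything in it is an assumption for illustration: the `ddl_path` function, the `SUPPORTED_CDM_VERSIONS` constant, the `assets` directory layout, and the file name pattern are hypothetical and not part of the current package.

```julia
# Hypothetical list of CDM versions the generator knows how to handle.
const SUPPORTED_CDM_VERSIONS = (v"5.3", v"5.4")

# Hypothetical helper: return the path of the SQL Server DDL file for a
# given CDM version. The directory and file name pattern are assumptions.
function ddl_path(version::VersionNumber; assets_dir::AbstractString = "assets")
    if !(version in SUPPORTED_CDM_VERSIONS)
        supported = join(SUPPORTED_CDM_VERSIONS, ", ")
        throw(ArgumentError("unsupported OMOP CDM version $version; expected one of: $supported"))
    end
    filename = "OMOPCDM_sql_server_$(version.major).$(version.minor)_ddl.sql"
    return joinpath(assets_dir, filename)
end

path_53 = ddl_path(v"5.3")
path_54 = ddl_path(v"5.4")
```

The generation step could then take the version as an argument and read whichever file `ddl_path` returns, instead of relying on a single file being swapped by hand.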

Now, getting to FunSQL, we can use a package extension approach: this package could define a function, something like cdm2funsql, that takes all the generated structs and creates a FunSQL table representation of each struct. This part is probably the easiest.
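A rough sketch of what such a cdm2funsql function might look like is below. It assumes FunSQL is installed and uses its `SQLTable` constructor; the `Person` and `VisitOccurrence` structs are hypothetical stand-ins for the generated structs (the real ones have many more fields and may be named differently), and `table_name` and `cdm2funsql` themselves are illustrative names, not existing functions.

```julia
using FunSQL: SQLTable

# Hypothetical stand-ins for the generated OMOP CDM structs.
struct Person
    person_id::Int
    year_of_birth::Int
end

struct VisitOccurrence
    visit_occurrence_id::Int
    person_id::Int
end

# Convert a CamelCase struct name into a snake_case table name,
# e.g. VisitOccurrence becomes :visit_occurrence.
function table_name(T::Type)
    snake = replace(string(nameof(T)), r"(?<=[a-z0-9])([A-Z])" => s"_\1")
    return Symbol(lowercase(snake))
end

# Hypothetical cdm2funsql: build one SQLTable per struct, taking the
# column names from the struct's field names.
function cdm2funsql(types)
    tables = Dict{Symbol, SQLTable}()
    for T in types
        name = table_name(T)
        tables[name] = SQLTable(name, columns = collect(fieldnames(T)))
    end
    return tables
end

tables = cdm2funsql([Person, VisitOccurrence])
```

Since the column lists come straight from the struct definitions, this would not need a database connection, which is the main difference from reflecting the schema at runtime.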

Finally, we can include OMOPCommonDataModel as a dependency of OMOPCDMCohortCreator itself and replace most of the GenerateTables function with a call to this cdm2funsql function to generate the tables needed inside the package. That addresses both this issue and the issue within OMOPCDMCohortCreator.
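As a small illustration of why predefined tables would help, here is a sketch (assuming FunSQL is installed) of building and rendering a query against a statically defined table, with no database connection and no reflection step. The two-column `person` table is a trimmed-down assumption; the real OMOP CDM person table has more columns.

```julia
using FunSQL: SQLTable, From, Select, Get, render

# A statically defined table, standing in for what cdm2funsql would produce.
person = SQLTable(:person, columns = [:person_id, :year_of_birth])

# Build a query directly against the table definition.
q = From(person) |> Select(Get.person_id)

# Render to SQL text for the SQLite dialect; no connection is involved.
sql = render(q, dialect = :sqlite)
print(sql)
```

Inside OMOPCDMCohortCreator, the tables returned by cdm2funsql could be used in the same way wherever the reflected tables are used today.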

Hopefully that clarifies what needs to be done here!

@Jay-sanjay
Member

Hi @DilumAluthge, I was going through the issue the way @TheCedarPrince suggested, and I am unable to work out what needs to be done so that the generator function parses the new version of the file correctly. Can you give any insight or suggestions on how to update the generator piece? Thanks
