Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite: Enable Decimal #19635

Open
Tracked by #22950
bricelam opened this issue Jan 18, 2020 · 71 comments
Open
Tracked by #22950

SQLite: Enable Decimal #19635

bricelam opened this issue Jan 18, 2020 · 71 comments

Comments

@bricelam
Copy link
Contributor

bricelam commented Jan 18, 2020

Similar to PR #19617, we can enable more decimal operations by leveraging UDFs. Unlike TimeSpan, however, there isn't an acceptable type we can convert to to perform the operations, so it will require roughly one UDF per operation. For example:

Done .NET SQL
m1 + m2 ef_add($m1, $m2)
m1 / m2 ef_divide($m1, $m2)
m1 > m2 ef_compare($m1, $m2) > 0
m1 >= m2 ef_compare($m1, $m2) >= 0
m1 < m2 ef_compare($m1, $m2) < 0
m1 <= m2 ef_compare($m1, $m2) <= 0
m1 % m2 ef_mod($m1, $m2)
m1 * m2 ef_multiply($m1, $m2)
m1 - m2 ef_add($m1, ef_negate($m2))
-m ef_negate($m)
Average(t => t.Decimal) ef_avg(t.Decimal)
  Max(t => t.Decimal) ef_max(t.Decimal)
  Min(t => t.Decimal) ef_min(t.Decimal)
Sum(t => t.Decimal) ef_sum(t.Decimal)
  OrderBy(t => t.Decimal) ORDER BY t.Decimal COLLATE EF_DECIMAL
  OrderByDescending(t => t.Decimal) ORDER BY t.Decimal COLLATE EF_DECIMAL DESC
@bricelam
Copy link
Contributor Author

bricelam commented Jan 18, 2020

I don’t think there’s much we can do for OrderBy...

@ajcvickers ajcvickers added this to the Backlog milestone Jan 21, 2020
@bricelam bricelam added the good first issue This issue should be relatively straightforward to fix. label Jan 22, 2020
@lokalmatador
Copy link
Contributor

Hi, first time here looking to contribute to OSS projects. Being tagged as a good first issue, maybe I can give it a shot?

@bricelam
Copy link
Contributor Author

@lokalmatador Sure! You might want to start small on this one (e.g. just do ef_compare and it’s operators) in your first PR.

@lokalmatador
Copy link
Contributor

Great! I guess I start out by looking at your changes in the above mentioned PR (PR#19617) to get an idea on how to possibly achieve this. Also, I may ask one or another question the next days :)

@bricelam
Copy link
Contributor Author

Feel free to start a draft PR to ask questions about the implementation

@bricelam
Copy link
Contributor Author

(Added modulo as part of PR #20024)

@lokalmatador
Copy link
Contributor

First of, thanks for helping me in getting my first contribution done. I hope I wasn't too much of a burden. Would it be OK to continue on this task? I'd be happy to.

@bricelam
Copy link
Contributor Author

bricelam commented Mar 4, 2020

No burden at all! Please feel free to continue

@lokalmatador
Copy link
Contributor

Well, then I'd continue with ef_{add, divide, multiply, subtract} and ef_negate.

@lokalmatador
Copy link
Contributor

Draft PR is here

@bricelam
Copy link
Contributor Author

bricelam commented Apr 7, 2020

So, @JonPSmith got me thinking again about ORDER BY and I think we could actually do it via collation.

connection.CreateCollation(
    "EF_DECIMAL",
    (x, y) => decimal.Compare(decimal.Parse(x), decimal.Parse(y));
SELECT *
FROM MyTable
ORDER BY DecimalColumn COLLATE EF_DECIMAL

@JonPSmith
Copy link

JonPSmith commented Apr 8, 2020

Wow, I wouldn't have thought of that!

I did DM @ErikEJ about this and it put me on to #18593 and I saw the Sqlite limitations page.

I use Sqlite in-memory as much as possible for unit testing. That's because a lot of people use 'standard' EF Core, i.e. don't use any of the SQL-specific parts, which means Sqlite in-memory works really well. I used Sqlite in-memory in my last client's project and they had an Azure CI/CD pipeline that includes unit tests and Sqlite in-memory works great!

So, of course, I am interested in differences between Sqlite and say SQL Server (a typical database people use). Other than string case sensitivity and filter/order on decimal, then Sqlite in-memory is a good fit for unit testing. I suspect I will go with a ValueConvertion to double if the database provider is Sqlite to overcome the decimal limitation.

PS. If you guys have any clever way to manage a real database in a CI/CD pipeline then I would love to know. Clearly I could call EnsureDeleted and then EnsureCreated in every unit test, but that is so slow!

@ajcvickers
Copy link
Contributor

@JonPSmith In my experience, it's only the creation and dropping of databases that is really slow. (Also, dropping a database destabilizes SQL Server, so we avoid ever doing it when running tests on the C.I.) We have a database cleaner that uses our reverse engineering code to clear the database instead of dropping and re-creating it. We haven't productized it, but we're considering it. Also, it looks like @jbogard's Respawn does something similar.

@JonPSmith
Copy link

JonPSmith commented Apr 8, 2020

Hi @ajcvickers,

Yes, I have a cleaner too but I think yours might be better, but the real problem is migrations, i.e. the Model changes during development, which is going to happen say 2% (??) of the push to CI/CD pipeline.

I have a fix for handling changes to the Model that works well locally (and used it in real apps), i.e.

  1. Create unique database names for unit tests, but with a given prefix.
  2. If the Model changes the developer has to manually run a method that deletes all the databases with names with that prefix.

That works locally, but not for CI/CD. Here are two ideas I am thinking about.

  1. At the start of each CI/CD I use my method to delete all the unit test databases before running any unit tests in the pipeline. That simple(ish), but it will be slow for the 98% times there is no Model change. Maybe speed in CI/CD isn't so important(?)
  2. Add a test ahead of the 'delete all unit test databases' that checks if migrations are needed to a 'canary database'. If a migration was applied then it delete all the unit test databases. Quicker in 98%, but feels fragile.

Sorry, I'm unloading my ideas but I want to come up with something for an "article". If its off-topic then please ignore.

@ajcvickers
Copy link
Contributor

@JonPSmith The key thing about the cleaner is that it can't be based on the EF model otherwise, as you point out, when the model changes it may no longer match the database structure. That's why the EF one uses the reverse engineering code to figure out what is in the database and then drop it all. The process used in the EF C.I. is that every test database is created if it doesn't already exist, and then cleaned. The database is then never dropped.

@JonPSmith
Copy link

@ajcvickers Wow - that is so cool! I want it 😄

Obviously I hadn't looked at your cleaner properly before, but I have now. Its pretty complex so I want to reflect back what I think your cleaner does so you can correct me. I think it does the following:

  1. If there is an existing database it
    1. Uses the reverse engineering code to read the schema
    2. The sql code in the cleaner then deletes all the indexes, foreign key (constraints?), tables, and sequences
    3. Then it runs some code given by the caller - for SQL Server this has code to delete other things like views, functions etc.
  2. Then it creates the tables etc. defined by the current Model. I presume its uses the code in context.Database.EnsureCreated

Is that correct? If so that solves the C.I. problem, but actually makes the local unit testing much better too (no need to run a method when the Model changes). And I tried a few of your unit tests and the initial test is pretty quick too.

The next question is, can the SqlServerDatabaseCleaner be made available easily? The RelationalDatabaseCleaner doesn't access anything internal, but SqlServerDatabaseCleaner does.

Any comments?

@jbogard
Copy link

jbogard commented Apr 9, 2020 via email

@njkoopman
Copy link

We manly because we use SQLite for unit-testing and until now we are not able to unit-test any functions that sum, max, min or avg decimals in the database.

It's generally discouraged to use a different database for testing (e.g. swap SQL Server for SQLite), since databases - and their providers - have different capabilities; this is an example of such a difference. See our testing docs for more information on this.

I think most of us here know there are tradeoffs when using a database like SQLite for testing. If we didn't think some of those tradeoffs still made testing with SQLite a good choice for several scenarios, then we would not be requesting this feature.

I agree, this is also true in our case.

It's generally discouraged to use a different database for testing (e.g. swap SQL Server for SQLite), since databases - and their providers - have different capabilities; this is an example of such a difference. See our testing docs for more information on this.

Although there are some differences between capabilities, using SQLite makes it much easier for us to run test in parallel and faster. A lot of the time these differences are not even relevant for the unit-test. With support for decimals the differences between the data providers would be even less in our case.

@roji
Copy link
Member

roji commented Oct 26, 2023

We definitely don't think of SQLite as only a testing database - it's an absolutely important database in many scenarios. However, we do discourage using SQLite as a database fake (test double) when the application actually runs against a different database in production. This used to be much more necessary as testing against the real database used to be difficult, error-prone and slow; but today with containerization technologies, spinning up a containerized database is trivial and fast. I suggest people check out testcontainers, which really makes this trivial.

Beyond that, we do intend to improve decimal support for SQLite, but keep in mind that SQLite itself does not support a decimal data type, or provide any functions for working with them; that's why this isn't trivial. In addition, this issue has only received 13 votes, which means it is generally very low-priority for our users.

@bjowes
Copy link

bjowes commented Mar 15, 2024

We wanted to keep using SQLite with decimals in the DB, since in our test cases the lost precision was not critical. By applying these changes EF Core SQLite will happily convert your decimals to double and support all those missing operators:

In DbContext partial class:

public partial class MyEntities : DbContext 
{
    partial void OnModelCreatingPartial(ModelBuilder modelBuilder)
    {
        if (Database.ProviderName == "Microsoft.EntityFrameworkCore.Sqlite") {
            modelBuilder.Entity<MyTableEntity>(b =>
            {
                b.Property(e => e.OneDecimalProperty).HasConversion<double>();
                b.Property(e => e.AnotherDecimalProperty).HasConversion<double>();
            });
        }
    }
}

The condition ensures that the conversions are only applied when the DbContext is used by SQLite, meaning only in our unit tests.

A drawback is that you need to repeat this for all decimal properties that are included in non-supported operations (such as .Sum(), .Avg() ...).

@roji
Copy link
Member

roji commented Mar 15, 2024

@bjowes usually, the point in using decimal in the first place is to do high-accuracy math; if you're just converting all your decimals to double in the database, it may be better to just have double properties in the first place (i.e. what's the point of making them decimal?).

@bjowes
Copy link

bjowes commented Mar 19, 2024

Agreed @roji - for users with SQLite as their production database, my suggestion would make the behavior similar to using double in the database. However, I think the suggestion is still valuable for others who use SQLite for unit testing (or integration testing) only, and another provider for system tests and production where decimals are properly supported.

I see now that my comment was not clear on that this was the intended scenario.

@roji
Copy link
Member

roji commented Mar 19, 2024

@bjowes see my comment just above about it being a bad idea to use Sqlite (or any other database) as a test double. Integration testing - at least in my understanding of the term - would mean that you use Sqlite in production, in which case it again doesn't make much sense to me to use decimal but not care about precision.

@bjowes
Copy link

bjowes commented Mar 20, 2024

I have read that comment @roji but I don't agree. Even with testcontainers, spinning up an SQL Server database with isolation between the tests is still an operation in the order of seconds. That same operation for SQLite in-memory is in the order of milliseconds. This fact alone makes SQLite a feasible option for unit testing, while testcontainer is not - given that unit tests are often in the order of hundreds (or thousands).

As many others have pointed out in earlier comments, I think SQLite for testing provides a strong middle ground option. It is not exactly the same as the production database, but it is close enough to help and fast enough to be an option.

The alternative of mocking out the database completely (repository pattern, mocked context or some other way) is error prone in itself, since it leaves the implementation of acting as a database to the test developer. Detecting changes, resolving and updating references, checking constraints is all on you. Sure, it doesn't have to be fully SQL compliant. But when it isn't you can end up having plenty of tests that are testing against data which could never occur in production, or tests that pass although they will fail when constraints are properly checked.

@roji
Copy link
Member

roji commented Mar 20, 2024

Even with testcontainers, spinning up an SQL Server database with isolation between the tests is still an operation in the order of seconds. That same operation for SQLite in-memory is in the order of milliseconds.

The typical way to use a testcontainer database is indeed not to create a database for each test, but to reuse the same database across tests. That indeed raises the problem of test isolation, but there are well-known ways to deal with that which really aren't that hard (see our docs for discussion).

More importantly... Our repo is full of issues from people asking us to, simply put, make the InMemory provider (and the Sqlite provider) behave like SQL Server (or some other database). The pattern is always the same: people start out with the "minimal effort" solution of using InMemory/Sqlite as a database fake, because it's easy, fast and doesn't require thinking about isolation. In all but the most trivial cases, these people run into some place where InMemory/Sqlite simply isn't SQL Server, and their SQL Server queries either don't translate, or return different results. They ask us to make SQLite be case-insensitive with strings (because SQL Server is), implement SQL Server-style full-text search (which is impossible for us to do), or to somehow make the SQL Server-specific functions such as EF.Functions.DateDiff work. None of that is feasible, and it's certainly not the job of the EF SQLite provider to make SQLite behave like SQL Server; SQLite is a database in its own right, with its own behaviors and featureset.

The alternative is to invest a bit of time and effort up-front to handle test isolation; run all read-only tests against the same database (no isolation problems to begin with), run write tests within a transaction that can be rolled back, and for tests which require exercising code that manages transaction, clean and reseed the database between each test (tools such as Respawn can help there).

The alternative of mocking out the database completely (repository pattern, mocked context or some other way) is error prone in itself, since it leaves the implementation of acting as a database to the test developer. Detecting changes, resolving and updating references, checking constraints is all on you.

I think there may be a misunderstanding here; the alternative of a repository pattern isn't about mocking out the database (that's exactly what Sqlite/InMemory do), but rather mocking out your data layer. In other words, the idea is to cut EF out entirely from your application, and have high-level repository operations which encapsulate stuff like change detection (which is what EF does). That's the right pattern if you want pure unit tests, which verify that your code works correctly (and therefore EF shouldn't be involved at all, nor a fake database).

There's unfortunately no middle ground that really works (assuming any but the most trivial cases): you can either do pure unit tests with a repository pattern (fully mocked data access), or you can run integration tests involving your production database system, at which point you need to take care of isolation.

@selangley-wa
Copy link

In my opinion, one of the dumbest arguments in tech, regardless of the context, is that "My way is better than your way so we are going to do it my way".

There are always choices, tradeoffs, use cases, even experiments, that will satisfy some need.

If you are a vendor, you can choose your way. But in the world of open source, everyone is free to do things differently.

On the subject at hand, while I would like to see this request addressed, I perfectly understand the circumstances of a development team that has their own vision of how things should work as well as constraints on time and other resources that require them to make choices on what to support and enhance.

As long as the EF Core Team is willing to accept a reasonable patch or support an extension mechanism by which other parties can integrate their own solution, I see no problem.

Incidentally, there is somewhat of a hack for Sqlite to support a decimal type mentioned here:

NUMERIC and DECIMAL support for Sqlite #2887
launchbadge/sqlx#2887 (comment)

@njkoopman
Copy link

@bjowes The code that you posted code gave me an idea for our unit-test scenarios without having to mention all the columns specifically.

I put the code below at the end of OnModelCreating:

            if (Database.ProviderName == "Microsoft.EntityFrameworkCore.Sqlite")
            {
                foreach (Microsoft.EntityFrameworkCore.Metadata.IMutableEntityType entityType in builder.Model.GetEntityTypes())
                {
                    foreach (Microsoft.EntityFrameworkCore.Metadata.IMutableProperty property in entityType.GetProperties())
                    {
                        Type propertyType = Nullable.GetUnderlyingType(property.ClrType) ?? property.ClrType;
                        if (propertyType == typeof(Decimal))
                        {
                            property.SetProviderClrType(typeof(Double));//Decimals are treated as doubles
                        }
                    }
                }
            }

This means we cannot test with the complete precision in the unit-tests, but it means we can test all methods using sum, avg, etc. with certain precision, which is a big step forward and maybe even enough for our unit-testing purposes.

Thank you very much for this idea!

@ranma42
Copy link
Contributor

ranma42 commented May 5, 2024

Given that #26070 has been fixed, I would like to implement the aggregation functions as suggested in #19635 (comment)
Is that ok? Is anybody else already working on this?

@ajcvickers
Copy link
Contributor

@ranma42 I'll bring it up with the team.

@ajcvickers ajcvickers removed this from the Backlog milestone May 7, 2024
@ajcvickers ajcvickers removed the good first issue This issue should be relatively straightforward to fix. label May 13, 2024
@ajcvickers
Copy link
Contributor

@ranma42 We discussed this and we think you should be good to go, but @cincuranet is going to confirm.

ranma42 added a commit to ranma42/efcore that referenced this issue May 15, 2024
ranma42 added a commit to ranma42/efcore that referenced this issue May 15, 2024
ranma42 added a commit to ranma42/efcore that referenced this issue May 15, 2024
@cincuranet
Copy link
Contributor

I confirm.

ranma42 added a commit to ranma42/efcore that referenced this issue May 28, 2024
ranma42 added a commit to ranma42/efcore that referenced this issue May 28, 2024
@erwan-auror

This comment has been minimized.

@cincuranet

This comment has been minimized.

@ajcvickers ajcvickers added this to the Backlog milestone Jun 15, 2024
@JeffreySu

This comment has been minimized.

@roji

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests