Very slow deletion, multiple objects #3566

Closed
dkuaf opened this issue Apr 6, 2024 · 9 comments · Fixed by realm/realm-core#7582

Comments

@dkuaf

dkuaf commented Apr 6, 2024

What happened?

I am playing around with Realm and trying to delete 100k items at once.

The write takes around 20-25 seconds to execute.

Maybe my filtering is pretty bad. I followed this syntax: https://stackoverflow.com/questions/70905369/what-is-the-correct-syntax-to-write-a-filter-query-in-realm-net-using-the-in

Repro steps

Create a simple POCO class with a Guid primary key.

Create 100k instances and add to the realm.

Select all the ids and filter the realm on the ids.
Delete range with the result.

Expected result: SQLite does it in around 3-4 seconds
Actual result: 20-25 seconds

Version

11.7.0

What Atlas Services are you using?

Local Database only

What type of application is this?

Console/Server

Client OS and version

Windows 10 19045

Code snippets

public partial class MyObject : IRealmObject
{
    [PrimaryKey]
    public Guid Id { get; set; }
}

// Assumes a non-empty ids array and a _configuration field on the enclosing class.
void Delete(Guid[] ids)
{
    using Realm realm = Realm.GetInstance(_configuration);
    var realmDelete = new Stopwatch();
    realmDelete.Start();

    // Build one large query: Id == uuid(a) OR Id == uuid(b) OR ...
    var firstId = ids[0].ToString();

    var sb = new StringBuilder();
    sb.Append($"Id == uuid({firstId})");
    for (int i = 1; i < ids.Length; i++)
    {
        sb.Append(" OR ");
        sb.Append("Id == uuid(");
        sb.Append(ids[i]);
        sb.Append(")");
    }

    // Filter and delete all matching objects in a single write transaction.
    realm.Write(() =>
    {
        var toDelete = realm.All<MyObject>().Filter(sb.ToString());
        realm.RemoveRange(toDelete);
    });

    realmDelete.Stop();
    Console.WriteLine($"realm deleted. Elapsed time: {realmDelete.ElapsedMilliseconds} ms"); // I get about 20000 ms here
}

Stacktrace of the exception/crash you're getting

-

Relevant log output

-

sync-by-unito bot commented Apr 6, 2024

➤ PM Bot commented:

Jira ticket: RNET-1131

@nirinchev
Member

My guess is that just parsing the query is taking the majority of the time here. I don't expect this is a case we'd like to actively optimize for, but if you're curious, you could try and measure how long var toDelete = realm.All<MyObject>().Filter(sb.ToString()); takes vs realm.RemoveRange.

sync-by-unito bot added the Waiting-For-Reporter label Apr 7, 2024
@dkuaf
Author

dkuaf commented Apr 7, 2024

Thank you for a quick answer and sorry for the bad formatting in OP.

It does seem that it is actually realm.RemoveRange that is slow. I should have tested the filtering before saying that; the filtering itself performs quite well.

var removeRangeSw = new Stopwatch();
var filterSw = new Stopwatch();
filterSw.Start();

var firstId = ids[0].ToString();

var sbb = new StringBuilder();
sbb.Append($"Id == uuid({firstId})");
for (int i = 1; i < ids.Length; i++)
{
    sbb.Append(" OR ");
    sbb.Append("Id == uuid(");
    sbb.Append(ids[i]);
    sbb.Append(")");
}

realm.Write(() =>
{
    var toDelete = realm.All<MyObject>().Filter(sbb.ToString());
    filterSw.Stop();

    removeRangeSw.Start();
    realm.RemoveRange(toDelete);
    removeRangeSw.Stop();
});

Console.WriteLine($"Elapsed time filtering: {filterSw.ElapsedMilliseconds} ms");
Console.WriteLine($"Elapsed time remove range: {removeRangeSw.ElapsedMilliseconds} ms");
Console.WriteLine($"Total deletion time {filterSw.ElapsedMilliseconds + removeRangeSw.ElapsedMilliseconds} ms");

//Output:
//Elapsed time filtering: 451 ms
//Elapsed time remove range: 17655 ms
//Total deletion time 18106 ms

github-actions bot added the Needs-Attention label and removed the Waiting-For-Reporter label Apr 7, 2024
@dkuaf
Author

dkuaf commented Apr 7, 2024

Just a small correction: I had previously tested on a more complex class. I have now tested on the class I posted here, and I get a total deletion time of around 10000 ms.

@nirinchev
Member

I was able to somewhat reproduce this with the following code:

using System.Diagnostics;
using Realms;

using var realm = Realm.GetInstance("RemoveRange.realm");
var ids = Enumerable.Range(0, 100_000).Select(_ => Guid.NewGuid()).ToArray();

realm.Write(() =>
{
    foreach (var id in ids)
    {
        realm.Add(new PrimaryKeyGuidObject
        {
            Id = id
        });
    }
});

var sw = new Stopwatch();
sw.Start();

var query = string.Join(" OR ", ids.Select(i => $"Id == uuid({i})"));
Console.WriteLine($"Construct string query: {sw.ElapsedMilliseconds}");

var results = realm.All<PrimaryKeyGuidObject>().Filter(query);
Console.WriteLine($"Construct Realm query: {sw.ElapsedMilliseconds}");

realm.Write(() =>
{
    realm.RemoveRange(results);
    Console.WriteLine($"RemoveRange: {sw.ElapsedMilliseconds}");
});

Console.WriteLine($"Commit: {sw.ElapsedMilliseconds}");


public partial class PrimaryKeyGuidObject : IRealmObject
{
    [PrimaryKey]
    public Guid Id { get; set; }
}

On my M1 mac, this prints out:

Construct string query: 25
Construct Realm query: 288
RemoveRange: 7726
Commit: 7742

I.e. the removal takes about 7.5 seconds. It's not amazing, but it's not the end of the world either, so I don't expect we'll go out of our way to optimize this use case. I'll reach out to the core database team and see if they spot any low-hanging fruit that could speed it up.
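The timings printed above are cumulative readings from a single stopwatch, so the cost of each phase is the difference between consecutive values. A small sketch of that subtraction, using the numbers from the output above (the helper name PhaseDurations is made up for illustration):

```csharp
using System;

// Cumulative stopwatch readings in milliseconds, copied from the output above.
string[] phases = { "Construct string query", "Construct Realm query", "RemoveRange", "Commit" };
long[] cumulative = { 25, 288, 7726, 7742 };

long[] durations = PhaseDurations(cumulative);
for (var i = 0; i < phases.Length; i++)
{
    Console.WriteLine($"{phases[i]}: {durations[i]} ms");
}

// Hypothetical helper: turns cumulative readings into per-phase durations
// by subtracting the previous reading from each one.
static long[] PhaseDurations(long[] readings)
{
    var result = new long[readings.Length];
    long previous = 0;
    for (var i = 0; i < readings.Length; i++)
    {
        result[i] = readings[i] - previous;
        previous = readings[i];
    }
    return result;
}
```

With these readings, RemoveRange itself accounts for 7438 ms and the commit for only 16 ms.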

@nirinchev
Member

Another interesting observation - if instead of constructing a massive query and deleting all objects that match it, you delete the objects 1 by 1, this is much faster. Essentially, by replacing the second .Write with:

foreach (var id in ids)
{
    realm.Remove(realm.Find<PrimaryKeyGuidObject>(id)!);
}

I get the whole operation to complete in 350 ms.
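For anyone who wants to try this, here is a minimal sketch of the one-by-one approach. It assumes the loop runs inside a single write transaction, since Realm only allows removal during a write. The RealmBulkDelete class and the RemoveByIds method are hypothetical names and not part of the Realm SDK. Unlike the snippet above, it skips ids that are not found instead of using the null-forgiving operator.

```csharp
using System;
using System.Collections.Generic;
using Realms;

public static class RealmBulkDelete
{
    // Hypothetical helper, not part of the Realm SDK.
    // Deletes the objects with the given primary keys inside one write
    // transaction and returns how many were actually removed.
    public static int RemoveByIds(Realm realm, IEnumerable<Guid> ids)
    {
        var removed = 0;
        realm.Write(() =>
        {
            foreach (var id in ids)
            {
                // Find is a primary-key lookup; it returns null when no object has this key.
                var obj = realm.Find<PrimaryKeyGuidObject>(id);
                if (obj is null)
                {
                    continue; // unknown or already deleted id
                }

                realm.Remove(obj);
                removed++;
            }
        });
        return removed;
    }
}

public partial class PrimaryKeyGuidObject : IRealmObject
{
    [PrimaryKey]
    public Guid Id { get; set; }
}
```

Calling RealmBulkDelete.RemoveByIds(realm, ids) in place of the second Write block should reproduce the comparison, although the exact timings will depend on the machine and the object model.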

@dkuaf
Author

dkuaf commented Apr 7, 2024

Thank you for the reply.

I just tried this class ("PrimaryKeyGuidObject/MyObject") with SQLite (Microsoft.Data.Sqlite). It takes 682 ms, so Realm is about 11x slower at deleting a range.

Isn't it weird that inserts are much faster than deletes?

@nirinchev
Member

I agree - this appears to be hitting some weirdness with how the query is evaluated and maintained throughout the deletion. Not sure if you saw my follow-up comment, but looking up objects and deleting them one by one seems to be, unintuitively, much faster than passing down a query.

@dkuaf
Author

dkuaf commented Apr 7, 2024

Very nice, I can confirm this. Now we are seeing more reasonable timings, even faster than SQLite, which is consistent with other comparisons I have tried.

Thank you for this tip! Very useful.

However, I didn't expect RemoveRange to be so slow. Maybe there is some weirdness going on there.

sync-by-unito bot removed the Needs-Attention label Apr 22, 2024
sync-by-unito bot assigned ironage and unassigned nirinchev Apr 22, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 24, 2024