Article: How to get status of MongoDB operation (#9)
AxelUser authored Apr 20, 2022
1 parent aa698cf commit aa2f430
Showing 3 changed files with 149 additions and 11 deletions.
16 changes: 8 additions & 8 deletions content/005-dictionary-on-generics/index.md
@@ -13,15 +13,15 @@ legacy: true

## Disclaimer

1. This article shows how to simulate dictionary behavior with generic static classes. However, **the way to this solution goes through other examples with lots of design details** to make you familiar with the situation. If you're interested only in "hacking" part, you may go directly to the section [**Implementing a generic-based cached producer**](#implementing-a-generic-based-cached-producer).
1. This article shows how to simulate dictionary behavior with generic static classes. However, **the way to this solution goes through other examples with lots of design details** to make you familiar with the situation. If you're interested only in "hacking" part, you may go directly to the section [Implementing a generic-based cached producer](#implementing-a-generic-based-cached-producer).
2. In code examples I've used **Nullable Reference Types**, which is a new feature from **C# 8**. They don't affect the performance and definitely not a main point of the article. If you're curious, check the [documentation](https://docs.microsoft.com/en-us/dotnet/csharp/nullable-references).
3. All code is available on [GitHub](https://github.com/AxelUser/examples/tree/master/DotNet/DictionaryOfTypes).

## Task: create a factory for REST clients

When you are integrating different services into each other, it's always a very time-consuming process to write clients for all of them. Luckily, if those RESTful services provide their API schema in **OpenAPI** (or previously named **Swagger**) format, chances are great that there's a generator of clients for this common type of schema format.

.Net has several packages for client generation, for example [**NSwag**](https://github.com/RicoSuter/NSwag). There are different opinions on how generated clients should look, but let's assume that their constructors receive an *HttpClient* instance for sending requests and that the classes themselves implement generated interfaces containing all public methods of the API.
.Net has several packages for client generation, for example [NSwag](https://github.com/RicoSuter/NSwag). There are different opinions on how generated clients should look, but let's assume that their constructors receive an *HttpClient* instance for sending requests and that the classes themselves implement generated interfaces containing all public methods of the API.

The first requirement helps to manipulate *HttpClient* creation and lifetime, which means that we can even reuse one from the pool. The second requirement will be handy, when it's needed to write unit-tests for code, that uses service's clients - in that case they must be mocked and mocking in .Net's frameworks "mostly" requires passing an interface.

@@ -55,7 +55,7 @@ Although clients are implementing their own interfaces, it's still hard to test

However sometimes it's necessary to control base url to your service or to dynamically pass some values into request's headers, like authorization tokens or distributed tracing ids. So it may be preferred to pass a valid *HttpClient* manually and that's why for the sake of the article let's stick to this format.

The most appropriate way of extracting object construction into dedicated dependency is implementing a [**Factory**](https://refactoring.guru/design-patterns/factory-method) for clients. Unfortunately, all clients implement different interfaces and it isn't possible to write base interface as returned value for the factory method. However it's still possible to invoke the creation of specific client by redesigning the factory into generic class.
The most appropriate way of extracting object construction into dedicated dependency is implementing a [Factory](https://refactoring.guru/design-patterns/factory-method) for clients. Unfortunately, all clients implement different interfaces and it isn't possible to write base interface as returned value for the factory method. However it's still possible to invoke the creation of specific client by redesigning the factory into generic class.

Let's discuss possible interface:

@@ -66,7 +66,7 @@ public interface IClientFactory<out T> where T: class
}
```

Why it's preferred to make whole class as generic and not just the method `Create`? If it will be only a generic method, the factory will be similar to the [**Service Locator**](https://blog.ploeh.dk/2010/02/03/ServiceLocatorisanAnti-Pattern/), which has some maintainability issues and hides the information which clients the outer code depends on.
Why it's preferred to make whole class as generic and not just the method `Create`? If it will be only a generic method, the factory will be similar to the [Service Locator](https://blog.ploeh.dk/2010/02/03/ServiceLocatorisanAnti-Pattern/), which has some maintainability issues and hides the information which clients the outer code depends on.

Here is an example:

@@ -86,7 +86,7 @@ As factory should create clients of specific types, there are some more question
1. How factory should invoke a constructor of the concrete client?
2. How factory should effectively guess object of which class should be created, if only interface is passed to the generic type parameter?

The solution to the first question is quite trivial: invoke the constructor via the handy static helper [**Activator.CreateInstance**](https://docs.microsoft.com/en-gb/dotnet/api/system.activator.createinstance?view=netcore-3.1#System_Activator_CreateInstance_System_Type_System_Object___). Internally, our old friend reflection does all the work, but the activator provides a simpler API.
The solution to the first question is quite trivial: invoke the constructor via the handy static helper [Activator.CreateInstance](https://docs.microsoft.com/en-gb/dotnet/api/system.activator.createinstance?view=netcore-3.1#System_Activator_CreateInstance_System_Type_System_Object___). Internally, our old friend reflection does all the work, but the activator provides a simpler API.

For the second problem another reflection-based mechanism should be involved. As I mentioned above, mocking frameworks for .Net work better, if they create mocks that implement base interfaces. Thus the factory method should expose client's interface in returned value. It can be easily achieved with the help of generic type parameter, but nevertheless factory method should create an object of the concrete class.

@@ -168,7 +168,7 @@ Even so, what about making this mechanism by ourselves? If you're interested, I'

Before we dig into optimizations, **it's HIGHLY recommended to track the performance of made solutions**. As we are dealing with isolated modules, micro-benchmarking will suit our needs.

The easiest way to create benchmarks of that kind is using a popular nuget package [**BenchmarkDotNet**](https://benchmarkdotnet.org/). I won't include in the article how to write good benchmarks for every situation, because this theme is quite vast. However, if you're not familiar with benchmarking or BenchmarkDotNet, you may follow the links to BenchmarkDotNet documentation at the section **References**.
The easiest way to create benchmarks of that kind is using a popular nuget package [BenchmarkDotNet](https://benchmarkdotnet.org/). I won't include in the article how to write good benchmarks for every situation, because this theme is quite vast. However, if you're not familiar with benchmarking or BenchmarkDotNet, you may follow the links to BenchmarkDotNet documentation at the section **References**.

Frankly speaking, I shall mention that maintainers of the BenchmarkDotNet did a great job in providing an easy API for creating benchmarks, which gives ability to include lots of useful indicators and will be clear to the most of .Net developers.

@@ -191,7 +191,7 @@ To show how many attempts were performed, BenchmarkDotNet has an ability to use
public int Accesses { get; set; }
```

Another useful feature is making benchmark for original solution as [**baseline**](https://benchmarkdotnet.org/articles/features/baselines.html). It is used to display the ratio of how speed of other benchmarks differs from the baseline.
Another useful feature is making benchmark for original solution as [baseline](https://benchmarkdotnet.org/articles/features/baselines.html). It is used to display the ratio of how speed of other benchmarks differs from the baseline.

Alright, now everything is ready to write the code of the first benchmark:

@@ -341,7 +341,7 @@ This trick is mostly inspired by the way how ["Array.Empty<T>"](https://docs.mic

Empty arrays are best candidates for caching, because their construction doesn't require any parameters, but only a generic type parameter.

When you invoke `Array.Empty<MyClass>`, it internally invokes a static read-only field `Empty` of static generic class `EmptyArray<MyClass>`, which initializes and returns an empty array of type *MyClass (have a look at [sources](https://github.com/dotnet/runtime/blob/3705185af806e273ccef98e44699400f0416c452/src/libraries/System.Private.CoreLib/src/System/Array.cs#L694-L704))*. Static field is initialized during the time of a first access to the field of the class *EmptyArray*. This is guaranteed from the fact how generics and static classes work in **CLR** (Common Language Runtime). For your information, that's how you can implement a [simple thread-safe singleton](https://csharpindepth.com/articles/singleton) in .Net.
When you invoke `Array.Empty<MyClass>`, it internally invokes a static read-only field `Empty` of static generic class `EmptyArray<MyClass>`, which initializes and returns an empty array of type `MyClass` (have a look at [sources](https://github.com/dotnet/runtime/blob/3705185af806e273ccef98e44699400f0416c452/src/libraries/System.Private.CoreLib/src/System/Array.cs#L694-L704)). Static field is initialized during the time of a first access to the field of the class *EmptyArray*. This is guaranteed from the fact how generics and static classes work in **CLR** (Common Language Runtime). For your information, that's how you can implement a [simple thread-safe singleton](https://csharpindepth.com/articles/singleton) in .Net.

## How CLR compiles generic classes

123 changes: 123 additions & 0 deletions content/006-how-to-get-status-of-mongodb-operation/index.md
@@ -0,0 +1,123 @@
---
date: "2022-04-20"
tags:
- "MongoDB"
title: "How to get status of MongoDB operation"
preview: "Simple solution for polling status of long-running MongoDB queries."
draft: false
---

Sometimes you may need to inspect the status of a running DB query, whether for profiling or as part of a polling mechanism for asynchronous operations. My case was the second one, so let's discuss the details and the implementation.

My intention was to build a background service that handles data retention. Without going into much detail, it should handle multiple requests and tolerate failures during deletion. That's why it needs to keep the state of running operations, which can be checked during failure recovery or after a regular reboot or deployment.

A good way to receive requests is through a message broker, for example Kafka. The service receives messages containing a `JobId` and a condition describing which data to delete. After the deletion is completed, the service commits the message. If the service restarts or fails, it receives the uncommitted message again.

The straightforward solution is to store the state in another MongoDB collection, but I thought that might be redundant. The only purpose of that state is to tell whether the operation has completed and, if not, whether it is still running or has failed.

Most of the databases I've dealt with have special tables or views with information about all running queries. MongoDB is no exception: it has a special method, [db.currentOp()](https://www.mongodb.com/docs/manual/reference/method/db.currentOp/), which returns a document with all running operations.

This API has limitations caused by MongoDB specifics, so there's a more modern way of retrieving running queries: the [$currentOp](https://www.mongodb.com/docs/manual/reference/operator/aggregation/currentOp/) stage for the aggregation pipeline. It works as a regular stage and can be combined with other aggregation features, like projection, grouping, etc. So we will stick with this one.

There are several things to mention.

Firstly, an aggregation pipeline with this stage must be run on the `admin` database, so your application needs a user with access to it.

Secondly, this command returns operations that were started on a specific MongoDB node. I have a sharded cluster, so I need to run `$currentOp` on the router that started the specific delete operation. That's not a big deal either: you can run this query against all routers in parallel and check whether any of them has it.

The last thing is that you need to distinguish delete operations started by your service from normal operations. In my case all those data-retention tasks have a `JobId`, which is a unique key for the operation. All I need is a way to mark MongoDB queries with this key.
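The "ask every router in parallel" idea above can be sketched in plain JavaScript. This is only an illustration: `findRunningOp` and the `findOp` method on each router object are hypothetical names, and `findOp` is assumed to run the status query on one router and resolve to the matching operation document or `null`.

```js
// Ask every router in parallel whether it is running the operation for jobId.
// `routers` is an assumed array of objects exposing an async findOp(jobId)
// method (hypothetical; wire it to your driver of choice).
async function findRunningOp(routers, jobId) {
  // allSettled keeps going even if some routers are unreachable
  const settled = await Promise.allSettled(
    routers.map((router) => router.findOp(jobId))
  );

  // First router that answered with a non-null document wins
  const hit = settled.find(
    (result) => result.status === "fulfilled" && result.value != null
  );

  // Count routers that failed to answer, so "not found" is not
  // silently confused with "could not check"
  const unreachable = settled.filter(
    (result) => result.status === "rejected"
  ).length;

  return { op: hit ? hit.value : null, unreachable };
}
```

Returning the number of unreachable routers alongside the result lets the caller decide whether a `null` answer can be trusted or whether the check should be retried.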

If we look through the output format for `$currentOp`, we will notice that it has a [comment](https://www.mongodb.com/docs/manual/reference/command/currentOp/#mongodb-data-currentOp.command) field, which can be attached when a command is started. Some queries (e.g. `find`) support the [$comment](https://www.mongodb.com/docs/manual/reference/operator/query/comment/) operator, but the most universal way to pass a comment is to run the query via a [database command](https://www.mongodb.com/docs/manual/reference/command/#database-commands).

I advise you to have a look, because this API provides many interesting features. For example, each statement of the [delete command](https://www.mongodb.com/docs/manual/reference/command/delete/#mongodb-dbcommand-dbcmd.delete) takes a `limit` option, where `0` removes all matching documents and `1` removes only a single one.

Alright, let's try some MongoDB shell.

Now, when we start the delete operation, we can pass the `JobId` in the comment:
```js
db.runCommand({
  "delete": "Events",
  "ordered": false,
  "comment": "job:blog-test",
  "deletes": [{
    "q": {"clientId": 0},
    "limit": 0
  }]
})
```

And fetch the status via an aggregation query:
```js
db.aggregate([
  {"$currentOp": {"localOps": true}},
  {"$match": {"command.comment": "job:blog-test"}},
  {"$limit": 1}
])
```
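Since the same comment string has to appear in both the delete command and the status query, it may help to build both from the `JobId` in one place. The helpers below are a minimal sketch; the function names and the `job:` prefix convention are assumptions taken from the example above, not part of any MongoDB API.

```js
// Single source of truth for the comment format (assumed "job:<id>" convention)
function jobComment(jobId) {
  return `job:${jobId}`;
}

// Builds the document passed to db.runCommand() for a tagged delete.
// limit: 0 means "delete all documents matching the filter".
function buildDeleteCommand(collection, filter, jobId) {
  return {
    delete: collection,
    ordered: false,
    comment: jobComment(jobId),
    deletes: [{ q: filter, limit: 0 }],
  };
}

// Builds the aggregation pipeline that looks up the tagged operation
// among the operations running on the current node (or router).
function buildStatusPipeline(jobId) {
  return [
    { $currentOp: { localOps: true } },
    { $match: { "command.comment": jobComment(jobId) } },
    { $limit: 1 },
  ];
}
```

In the shell this would be used as `db.runCommand(buildDeleteCommand("Events", { clientId: 0 }, "blog-test"))` to start the job and `db.aggregate(buildStatusPipeline("blog-test"))` to poll it.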

If an operation with such a comment is running on the current node (or router), we will receive a single document like the one below:
```json
{
"type" : "op",
"host" : "f0fde895fb50:27017",
"desc" : "conn65",
"connectionId" : 65,
"client" : "172.18.0.1:57176",
"appName" : "MongoDB Shell",
"clientMetadata" : {
"application" : {
"name" : "MongoDB Shell"
},
"driver" : {
"name" : "MongoDB Internal Client",
"version" : "4.2.6-18-g6cdb6ab"
},
"os" : {
"type" : "Windows",
"name" : "Microsoft Windows 8",
"architecture" : "x86_64",
"version" : "6.2 (build 9200)"
},
"mongos" : {
"host" : "f0fde895fb50:27017",
"client" : "172.18.0.1:57176",
"version" : "4.4.11"
}
},
"active" : true,
"currentOpTime" : "2022-04-19T22:01:50.629+00:00",
"opid" : 996,
"lsid" : {
"id" : UUID("e42b457e-bc01-4ffa-83dc-343f1f6ea351"),
"uid" : { "$binary" : "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "$type" : "00" }
},
"secs_running" : NumberLong(3),
"microsecs_running" : NumberLong(3406911),
"op" : "remove",
"ns" : "testdb.Events",
"command" : {
"delete" : "Events",
"ordered" : false,
"comment" : "job:blog-test",
"lsid" : {
"id" : UUID("e42b457e-bc01-4ffa-83dc-343f1f6ea351")
},
"$clusterTime" : {
"clusterTime" : Timestamp(1650405661, 34),
"signature" : {
"hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" },
"keyId" : NumberLong(0)
}
},
"$readPreference" : {
"mode" : "secondaryPreferred"
},
"$db" : "testdb"
},
"numYields" : 0,
"waitingForLatch" : {
"timestamp" : ISODate("2022-04-19T22:01:47.323Z"),
"captureName" : "ProducerConsumerQueue::_mutex"
}
}
```
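A polling service rarely needs the whole document. As a rough sketch, the function below reduces it to the few fields that describe the job's progress. `summarizeOp` is a hypothetical name, and the code assumes that numeric fields such as `secs_running` arrive as plain JavaScript numbers, which depends on the driver and its settings.

```js
// Reduce a $currentOp result document to the fields a poller cares about.
// Pass null/undefined when the pipeline returned no document.
function summarizeOp(doc) {
  if (doc == null) {
    // Nothing with this comment is running on the queried node
    return { running: false };
  }
  return {
    running: doc.active === true,
    host: doc.host,
    opid: doc.opid,
    namespace: doc.ns,
    comment: doc.command ? doc.command.comment : undefined,
    // Assumption: the driver has already converted NumberLong to a number
    secondsRunning: doc.secs_running,
  };
}
```

Note that a missing document only means the operation is not running on that node right now; telling "completed" apart from "failed" still requires other information.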
21 changes: 18 additions & 3 deletions src/styles.module.css
@@ -4,6 +4,7 @@
--dark-3: rgb(120, 120, 120);
--white: rgb(210, 210, 210);
--red: rgb(175, 13, 13);
--yellow: rgb(218, 165, 32);
--main-width: 700px;
}

@@ -72,7 +73,11 @@ footer {
}

.bio a {
color: var(--red);
color: var(--yellow);
}

.bio a:hover {
border-bottom: 1px solid var(--yellow);
}

.bio p {
@@ -130,14 +135,19 @@ footer {

.article > h1 {
margin-top: 1.5rem;
color: var(--yellow);
}

.article .text {
border-bottom: 1px dotted var(--dark-2);
}

.article a {
color: var(--red);
color: var(--yellow);
}

.article a:hover {
border-bottom: 1px solid var(--yellow);
}

.meta {
@@ -184,7 +194,12 @@ footer {

.notFound a {
margin-top: 3rem;
color: var(--red);
color: var(--yellow);
border-bottom: 1px solid transparent;
}

.notFound a:hover {
border-bottom: 1px solid var(--yellow);
}

/* Announcement */