This service simulates a robot moving around a space.
Given this input to the web API:
POST /robots/commands/
{
"start": {
"x": 10,
"y": 22
},
"commands": [
{
"direction": "east",
"steps": 2
},
{
"direction": "north",
"steps": 1
}
]
}
Response:
{
"startPoint": {
"x": 10,
"y": 22
},
"commands": [
{
"direction": "East",
"steps": 2
},
{
"direction": "North",
"steps": 1
}
],
"id": "7450e353-6f62-403b-b7f4-fc5a9f22a603",
"createdAt": "2023-01-29T20:58:12.765874+00:00",
"modifiedAt": "2023-01-29T20:58:12.765874+00:00",
"deletedAt": null
}
The robot should move like this:
The robot covers 4 coordinates on its route.
You can then get result of the execution with:
GET /robots/commands/7450e353-6f62-403b-b7f4-fc5a9f22a603/executions
Response:
[
{
"id": "e60ffb2a-017b-45b9-aab0-1752f5a2572f",
"result": 4,
"duration": "00:00:00.0064170"
}
]
- DotNet 6.0
- Postgres 10.8 (for running the API)
- Docker Engine 17.09.0+ (if you want to build from the
docker-compose.yml
)
Either run natively if you have dotnet installed or use Docker:
dotnet run --configuration Release --project CleaningRobotService.Console
or:
docker compose run --rm console
This will also run any dependencies such as Postgres if you haven't already separately ran that service. You can comment out the depends_on
section in the compose file if need be.
dotnet run --configuration Release --project CleaningRobotService.Web
or
docker compose up web
App__DatabaseConnectionString
- configure the Postgres connection string
These will be run automatically if you run any services that depend on them.
docker compose up -d postgres
dotnet test
or
docker compose run --rm test
dotnet run --configuration Release --project CleaningRobotService.Benchmarks
or
docker compose run --rm benchmark
Methods like RobotGrid_CalculatePointsVisited_10
indicate that the grid size is 10. Different grid sizes will impact memory usage.
GenerateCommands_SpiralIn
generates a perfect spiral which hits every point in a 500 width/height grid exactly once
| Method | Mean | Error | StdDev | Ratio | RatioSD | Allocated | Alloc Ratio |
|------------------------------------- |----------:|----------:|---------:|------:|--------:|------------:|------------:|
| RobotPoints_CalculatePointsVisited | 47.82 ms | 2.970 ms | 1.965 ms | 1.00 | 0.00 | 12233.83 KB | 1.000 |
| RobotLines_CalculatePointsVisited | 648.68 ms | 12.627 ms | 7.514 ms | 13.55 | 0.62 | 20.53 KB | 0.002 |
| RobotGrid_CalculatePointsVisited_10 | 37.35 ms | 1.534 ms | 1.015 ms | 0.78 | 0.04 | 2398.87 KB | 0.196 |
| RobotGrid_CalculatePointsVisited_30 | 30.08 ms | 0.702 ms | 0.464 ms | 0.63 | 0.03 | 847.29 KB | 0.069 |
| RobotGrid_CalculatePointsVisited_100 | 29.34 ms | 0.436 ms | 0.288 ms | 0.61 | 0.03 | 414.74 KB | 0.034 |
| RobotGrid_CalculatePointsVisited_500 | 28.32 ms | 0.225 ms | 0.118 ms | 0.59 | 0.03 | 277.86 KB | 0.023 |
The RobotGrid_CalculatePointsVisited_500
uses very little memory because it has a Grid size of 500, so the entire Grid is used.
GenerateCommands_LoopOffset
moves in a loop which is offset by 1 point (north 3, east 3, south 2, west 2) x 10,000
| Method | Mean | Error | StdDev | Ratio | RatioSD | Allocated | Alloc Ratio |
|------------------------------------- |-----------:|-----------:|-----------:|-------:|--------:|-----------:|------------:|
| RobotPoints_CalculatePointsVisited | 3.028 ms | 0.1850 ms | 0.1224 ms | 1.00 | 0.00 | 657.47 KB | 1.00 |
| RobotLines_CalculatePointsVisited | 828.450 ms | 34.3731 ms | 22.7357 ms | 273.84 | 9.52 | 198.12 KB | 0.30 |
| RobotGrid_CalculatePointsVisited_10 | 4.306 ms | 0.2630 ms | 0.1740 ms | 1.43 | 0.11 | 580.72 KB | 0.88 |
| RobotGrid_CalculatePointsVisited_30 | 4.227 ms | 0.1495 ms | 0.0989 ms | 1.40 | 0.06 | 554.47 KB | 0.84 |
| RobotGrid_CalculatePointsVisited_100 | 4.779 ms | 0.2059 ms | 0.1362 ms | 1.58 | 0.07 | 864.62 KB | 1.32 |
| RobotGrid_CalculatePointsVisited_500 | 8.909 ms | 0.4659 ms | 0.3081 ms | 2.95 | 0.16 | 2809.36 KB | 4.27 |
I tried many different algorithms for calculating where the robot has been:
Simple algorithm which stores each Point the robot has been at and stores it in a HashSet<Point>
.
Pros:
- fast and simple with small datasets
Cons:
- memory usage is too high
- each point uses 16 bytes of memory (2 ints, 8 bytes each)
- if every possible point was visited and stored: 200,000² possible points * 16 bytes in a point = 640GB
- HashSet requires an extra 12 bytes for each item in the collection so the actual memory usage would be even higher
Rather than storing each Point, each path is stored as a Line which has a Start and End Point. Each Point the Robot then visits can be checked to see if it's on any other Line.
Pros:
- uses very little memory
- each Line requires 2 Points which requires 32 bytes of memory
- if there are 10,000 commands, there should be 10,000 Lines, which would use 0.32MB of memory (not including collection overhead)
Cons:
- very slow speed
- each Point has to be compared against each Line
Improvements?
- maybe
List<Line>
is not the best data structure for comparisons
Similar to RobotLines
but the search time of finding if a Point is on a pre-existing Line is reduced by putting the Lines into a Dictionary and only searching through Lines that are on the same x-axis or y-axis as this Point.
Pros:
- uses very little memory (same as RobotLines)
Cons:
- reasonably fast speed
- each Point only has to be checked against a subset of Lines
Rather than storing where the Robot has been using a Point, create a 2D array called a Grid
which stores booleans and uses the index of the columns/rows to know the X and Y coordinates.
This Grid has a width of 3:
0 1 2
0 [ ][ ][ ]
1 [ ][ ][ ]
2 [ ][ ][ ]
If the Robot starts at [0,0], goes to [2,0], then [2,2], the data structure would look like this:
0 1 2
0 [x][x][x]
1 [ ][ ][x]
2 [ ][ ][x]
However, objects in C# cannot have a size larger than ~2GB. To have a Grid which could store every Point on it, I would need at least 5GB of memory to store the booleans alone (200,000² bits).
Therefore, I can use multiple Grids, lined up in rows and columns. These Grids can be added as needed to avoid unnecessary memory usage.
If the Robot starts at [0,0], goes to [5,0], and my Grid size is 3, 2 Grids will be used:
0 1 2 0 1 2
0 [x][x][x] 0 [x][x][ ]
1 [ ][ ][ ] 1 [ ][ ][ ]
2 [ ][ ][ ] 2 [ ][ ][ ]
Each Grid can then store its offset.
Pros:
- tends to use less memory with large datasets
- reasonably fast
Cons:
- data structure unable to store the order of the points visited
- a good Grid width needs to be used to reduce memory usage
- in my tests, I found a width of 30 to fit well
- still wastes memory for storing unvisited points
Migrations are handled by EntityFramework.
First install the tool (the config for local tools is stored in .config/dotnet-tools.json
:
dotnet tool restore
To add a new migration:
dotnet ef migrations add MigrationName --project CleaningRobotService.DataPersistence --context ServiceDbContext
Migrations are run by this service as it first starts up. This means when you release a code change which also requires a database change, these will be released at the same time.
If you have multiple instances of this service running at the same time, and they're released one by one, older versions of code may run into errors if the database is updated by another instance. When you're releasing, you need to be careful to either:
- ensure that migrations don't take too long to run (which would stop so many errors in production from occurring)
- or your database changes should not break existing code (this may require multiple releases of migrations)
Patterns used:
- Repository pattern - repositories are exposed from the DataPersistence project so the rest of the application doesn't know which ORM is being used underneath (this is a facade)
- Service pattern - for business logic
- Dependency injection - easier developing and testing (if mocking is needed)
Patterns not used:
- Unit of Work - could consider using to allow for transactional updates to the db
POSTing some commands to the CommandController will store those commands in the database along with an Execution. The actual logic that then runs the robot around the space will then run in a separate thread as it could take some time - the Execution will be updated with the result once finished.
Benefits:
- the API won't hang during any long running logic
- if there's a bug in the calculations, they can be debugged easier (by looking at what commands were sent), they can then be fixed and a new run can be initiated
This ORM isn't the fastest (being about twice as slow as Dapper) but has some nice features:
- handles migrations
- no need to write SQL (and I love LINQ to SQL), so if your models update, the generated SQL updates too
- code is fairly agnostic about which db is really being used
Downsides:
- special features in specific databases can sometimes be hard to use
While running database queries against a real instance of Postgres will somewhat slow down the tests, this is the best way to simulate real conditions in a unit test.
You can use an in memory database, but it doesn't act the same way as a real database.
https://learn.microsoft.com/en-us/ef/core/testing/#involving-the-database-or-not
The in-memory provider will not behave like your real database in many important ways. Some features cannot be tested with it at all (e.g. transactions, raw SQL..), while other features may behave differently than your production database (e.g. case-sensitivity in queries). While in-memory can work for simple, constrained query scenarios, it is highly limited and we discourage its use.
Example: CommandRobotDto and Execution
Both of these classes are almost identical, but they're separate because the API does not need to reflect how the database is built. It's possible that the API may need to change in the future to be different to how the database is structured, so may as well create different classes now to allow this.
Mapping between the DTOs and db models is manually done. AutoMapper could be used instead, but I'm not a fan of how AutoMapper magically does this, and this can cause runtime errors rather than issues being caught at compile time.
Since this service would likely be deployed to system of microservices, it could be that other services need to be aware of Executions, so I would publish them to a queue to other services could asynchronously consume any events.
I'd also add a Uuid column to the executions
table which would be published instead of the integer ID.
Right now, anyone can command the robot.
Currently, each test which interacts with the database actually makes changes. This means if some tests ran in parallel, they could run into each other.
Running the application without properly setting up env vars can cause issues. This can happen quite often when a developer forgets to add new env vars to the production environment after testing a feature on a staging env and releasing. The application should fail to start if env vars are incorrectly configured. Systems like Kubernetes can also be configured to only take down older instances of services once the new instance has correctly started. This can avoid bad releases.
https://learn.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-6.0
Personally I like Drone CI but Travis CI is okay too.
- create some kind of system for putting benchmarks in the readme
- add large dataset for testing to repo
- add enforced styleguide
- better logging
- set CreateAt/ModifiedAt before save
- implement deletion of models and handle it in repo
- use async/await for db stuff
- fixing crazy mappings between all types