perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

jheredianet · 2024-04-01T14:46:57Z

I've realized that some queries from Grafana dashboard that shows information from the odometer are very slow specially on databases with too much data logged and for those who has more than one car.

This response speed is greatly improved with an index on the "positions" table based on the "odometer" and "car_id" fields.

netlify · 2024-04-01T14:47:15Z

✅ Deploy Preview for teslamate ready!

Name	Link
🔨 Latest commit	`b9f340b`
🔍 Latest deploy log	https://app.netlify.com/sites/teslamate/deploys/660add5a40231b0008a5b178
😎 Deploy Preview	https://deploy-preview-3801--teslamate.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

JakobLichterfeld · 2024-04-02T12:22:00Z

Thanks for the change. I appreciate the performance improvements as well.

I feel like we should change it in the first place and add your suggested migration in addition. @brianmay What do you think?

Dulanic · 2024-04-02T21:49:16Z

I had thought about this before, but adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI etc... Not a problem for most, but I know we have a lot of RPI users too. What dashboard specifically are you noticing this wtih? Im trying to think of use cases where odometer is a useful index, but I don't have all the SQL up to look.

Also there is already a car_id index.

brianmay · 2024-04-02T22:35:15Z

adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI

Some random thoughts:

"updated many times a second"? I don't think we get that many updates? I think most I have observed is about one update a second.
The rate of updates might go down even more if can't continue to use the streaming API.
We should be able to get some metrics that show just how much it will slow down on RPi. I think we can't make any performance based judgements without concrete before and after metrics (at least I have been to many conference talks that make this point...)

Dulanic · 2024-04-02T23:05:09Z

adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI

Some random thoughts:

"updated many times a second"? I don't think we get that many updates? I think most I have observed is about one update a second.

The rate of updates might go down even more if can't continue to use the streaming API.

We should be able to get some metrics that show just how much it will slow down on RPi. I think we can't make any performance based judgements without concrete before and after metrics (at least I have been to many conference talks that make this point...)

Agree, I mean you're probably right, but I'd love to test the code some. So Ill await what dashboard is that slow first.

jheredianet · 2024-04-02T23:55:31Z

Agree, I mean you're probably right, but I'd love to test the code some. So, I'll await what dashboard is that slow first.

Hi, the odometer is used on several dashboards, but it goes unnoticed because they are usually time filtered (30 days, 90 days, etc). But in the Battery Health dashboard there is a panel for the trips to get the distance logged in TeslaMate and it's extremely slow, specially this query:

SELECT max(odometer) - min(odometer) as "Mileage" from positions where car_id = 1;

I have a database with more than 16 millions of records in the table "positions" (of 3 cars as shown in the following screenshot):

That query without this suggested index takes around 26 secs to run and with the index just 55ms (tested on a cloud server with 4 cores and 8GB of RAM).

You said in the previous comment:

Also there is already a car_id index.

Yes, it is, and it's necessary (since all dashboards have to filter by car), but at the same time there is another composite index based on "date, drive_id" which is also used a lot (even having already an index based on date).

That is why I'm proposing this composite index based on "odometer, car_id", which will be very useful especially for those who have more than one car in the database. If a simple index is created based only on the odometer field, it would work, but it would not be enough when there are more cars registered.

Dulanic · 2024-04-03T16:33:39Z

Sorry to push back, I don't think you understand indexing. Indexing is used to FIND records. I doubt any of the dashboards have odometer in a join or in the where statement. Indexes are not used to provide data *with a few exceptions, B-tree indexes allow for efficient searching, insertion, and deletion of data.

In your case, since you have such a large data set, if you want to add an index, go for it. It doesn't hurt anything most likely for you. But I really don't see a benefit to adding it to everyone.

For my above comment of B-Tree indexing not helping most dashboards, it's prob accurate. But for a MIN/MAX situation it likely helps because it can just check the MIN/MAX of the index. But for returning 1:1 values and most use cases, it would not benefit anything.

Also, most of the dashboards should be using the drive data NOT the positions data for min/max odometer readings. If they are not, they should be reviewed and possibly enhanced. Also, you can show the explain analyze for more indexing details and how the query optimizer works.

EX: select min(start_km), max(end_km)
from drives

https://explain-postgresql.com/archive/explain/b4250b3cd7cbef4f2d81032ac1fcef85:0:2024-04-03

using positions:
https://explain-postgresql.com/archive/explain/3550d3af35c9c4e9e8eeab7f49edc7d7:0:2024-04-03

jheredianet · 2024-04-03T18:50:47Z

Sorry to push back, I don't think you understand indexing. Indexing is used to FIND records. I doubt any of the dashboards have odometer in a join or in the where statement. Indexes are not used to provide data *with a few exceptions, B-tree indexes allow for efficient searching, insertion, and deletion of data.

In your case, since you have such a large data set, if you want to add an index, go for it. It doesn't hurt anything most likely for you. But I really don't see a benefit to adding it to everyone.

For my above comment of B-Tree indexing not helping most dashboards, it's prob accurate. But for a MIN/MAX situation it likely helps because it can just check the MIN/MAX of the index. But for returning 1:1 value and most use cases, it would not benefit anything.

Also, most of the dashboards should be using the drive data NOT the positions data for min/max odometer readings. If they are not, they should be reviewed and possibly enhanced. Also, you can show the explain analyze for more indexing details and how the query optimizer works.

EX: select min(start_km), max(end_km) from drives

https://explain-postgresql.com/archive/explain/b4250b3cd7cbef4f2d81032ac1fcef85:0:2024-04-03

using positions: https://explain-postgresql.com/archive/explain/3550d3af35c9c4e9e8eeab7f49edc7d7:0:2024-04-03

Hi, I understand what is an Index for and agree with you, but this is the use case:

In the battery health dashboard, we have this information:

Where:

"Logged" is the distance traveled that is saved on Teslamate database, and "Mileage" is the distance the car has traveled since using Teslamate.

These are the queries:

select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives 
where car_id = 1;

Any problem here, fast and clean.

SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;

Without the index takes more than 20 seconds. With the index just milliseconds.
We could use this other query to get the same information (as in Overview dashboard), but it's still very slow if we don't filter the time:

SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer" 
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;

And the special one is this:

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;

Without the index takes almost a minute. With the index just milliseconds.

In both last queries we cannot use the drives table (of course it will be so fast), but, what about if there some unclosed drives? or if Teslamate was offline while driving. The real information about the odometer is only in the positions table, so we have to use it.

The case here is to show the user, about how much mileages have been not saved in the database, when for some reason a drive hasn't been fully recorded, for example due to a bug or an unexpected restart and that Teslamate has not been able to record, either due to lack of connection, areas without signal, or that it has been out of service.

I am open to hearing any improvements you can come up with to improve query performance for this data without the need of creating this proposed index.

brianmay · 2024-04-03T22:20:27Z

Explain is your friend here :-). Without the new index:

teslamate=# explain select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives
where car_id = 1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=299.62..299.64 rows=1 width=32)
   ->  Seq Scan on drives  (cost=0.00..278.51 rows=8441 width=8)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=305056.46..305056.48 rows=1 width=32)
   ->  Index Scan using positions_car_id_index on positions  (cost=0.44..304829.46 rows=90801 width=8)
         Index Cond: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer"
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..16.19 rows=1 width=40)
   ->  Index Scan Backward using positions_date_index on positions  (cost=0.44..1430761.48 rows=90801 width=40)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=305283.46..305283.49 rows=1 width=32)
   ->  Index Scan using positions_car_id_index on positions  (cost=0.44..304829.46 rows=90801 width=8)
         Index Cond: (car_id = 1)
(3 rows)

Adding the new index:

$ mix ecto.migrate
Compiling 2 files (.ex)

08:45:44.916 [info] == Running 202404161637000 TeslaMate.Repo.Migrations.AddCompositeIndexToPositionsOnOdometer.change/0 forward

08:45:44.917 [info] create index positions_car_id_odometer_index

08:45:50.516 [info] == Migrated 202404161637000 in 5.5s

Running the above again:

teslamate=# explain select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives
where car_id = 1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=299.62..299.64 rows=1 width=32)
   ->  Seq Scan on drives  (cost=0.00..278.51 rows=8441 width=8)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.60..0.62 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan Backward using positions_car_id_odometer_index on positions  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
(5 rows)

teslamate=# explain SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer"
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.49 rows=1 width=40)
   ->  Index Scan Backward using positions_date_index on positions  (cost=0.44..972265.94 rows=18160116 width=40)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.20..1.22 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan Backward using positions_car_id_odometer_index on positions  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan using positions_car_id_odometer_index on positions positions_1  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
(9 rows)

The 1st is the exact same. It only touches the drives table. Hardly surprising.

The 2nd one has changed the "index scan" to a "index only scan". I think this means we get get the max odometer value straight from the index, instead of having the search every car_id=1 entry. Cheap.

The third query we are looking for every odometer entry, finding one with the latest date. The new index isn't helping here. Instead we do a search on the date index for the latest date, and then scan all of these entries for car_id=1. Which is going to be expensive if you have a lot of cars, but cheap if you only have 1 car. To me this looks like it should be improved if we had an index on (car_id, date) or (date, car_id).

The 4th case I think instead of scanning each any every entry sequentially, we can get the max and min values straight from the index.

Dulanic · 2024-04-03T23:53:58Z

So I am going to go back to my previous statement, wouldn't it be far more efficient just to use the drives for the query instead? A quick PR with a simple query change and that long query time go away. It made little difference for me, but bet it will be for you.

The only difference may be the dashboard WHILE you are driving. But honestly, really who does that? I think thats a edge case that really shouldnt matter.

One thing I dont mind is the query that orders by date :) That's a good use of indexing as the date field is indexed.

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'mi'),0)|| ' mi' as "Logged"
from positions where car_id = 3;

SELECT ROUND(convert_km((max(end_km) - min(start_km))::numeric, 'mi'),0)|| ' mi' as "Logged"
from drives where car_id = 3;

And why can't we use the other one? It seems a weird way to get the MAX record basically. I'm sure it was created a while ago, but that doesn't mean it's the best way to do it. Same results, much faster /w the 2nd.

SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer" 
from positions WHERE car_id=3 ORDER BY "date" DESC LIMIT 1;

SELECT ROUND(convert_km(max(end_km)::numeric, 'km'),0) || ' km' as "Odometer" 
from drives WHERE car_id=3 ;

And the other can easily be done as:

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 3;


SELECT ROUND(convert_km((max(end_km) - min(start_km))::numeric, 'km'),0)|| ' km' as "Mileage"
from drives where car_id = 3;

Working with existing indexed can also make some of these queries MUCH faster.

EX: Changing

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
    
from positions where car_id = 3;

to:

SELECT
    ROUND(convert_km(maxodo::numeric - minodo::numeric, 'km'),0)|| ' km' as "Mileage"
FROM
    (SELECT odometer AS maxodo  FROM  positions
        WHERE car_id=3
        ORDER BY "date" DESC
        LIMIT 1
    ) a
    CROSS JOIN (
        SELECT odometer AS minodo FROM positions
        WHERE car_id=3
        ORDER BY  "date"
        LIMIT 1
    ) b

made that query a lot faster. I threw together the code, so it's poorly written, but it gives an example of working with the existing indexing can make much of our existing queries MUCH fast.

jheredianet · 2024-04-04T07:56:10Z

I threw together the code, so it's poorly written, but it gives an example of working with the existing indexing can make much of our existing queries MUCH fast.

Thank you. You're right, it only makes a little difference of data using only the "drives" Table instead of using positions.
I'll make a PR for the Battery Health dashboard. Then we can close and deny this proposed index.

jheredianet · 2024-04-04T08:30:42Z

Ok, I've managed to include these query changes on this PR #3806

cwanja · 2024-04-05T19:49:39Z

This is an interesting dialog considering the issue that was raised here #3727

swiffer · 2024-10-20T08:03:47Z

a similar index has been created (ideal_battery_range_km) in #4200.

after 2 weeks of usage this index is within top 3 indexes used within PG.

in addition it solves issues like #4225 (while this has also been addressed by filtering for date via #4228).

in general dashboarding should follow these practices:

#4253 (comment)

jheredianet added 2 commits April 1, 2024 16:34

Add composite index to positions on odometer and car_id

8a402d5

Update timestamp

29ebf54

jheredianet added 3 commits April 1, 2024 17:56

update order

e44702b

Update formatting

e09ad7b

Fix formatting

b9f340b

JakobLichterfeld added enhancement New feature or request area:dashboard Related to a Grafana dashboard area:teslamate Related to TeslaMate core and removed area:dashboard Related to a Grafana dashboard labels Apr 2, 2024

JakobLichterfeld changed the title ~~Positions table index on Odometer and Car_ID~~ perf: Positions table index on Odometer and Car_ID Apr 2, 2024

JakobLichterfeld changed the title ~~perf: Positions table index on Odometer and Car_ID~~ perf: Positions table index on Odometer and Car_ID to improve response time of queries Apr 2, 2024

Dulanic self-assigned this Apr 3, 2024

JakobLichterfeld added WIP Work in progress and removed enhancement New feature or request labels Apr 3, 2024

JakobLichterfeld changed the title ~~perf: Positions table index on Odometer and Car_ID to improve response time of queries~~ perf: Positions table index on Odometer and Car_ID to improve response time of some queries Apr 4, 2024

jheredianet closed this Apr 5, 2024

jheredianet deleted the positions-table-indexes branch April 5, 2024 09:21

swiffer mentioned this pull request Oct 20, 2024

perf: exclude streaming data in visited dashboard #4287

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

jheredianet commented Apr 1, 2024

netlify bot commented Apr 1, 2024 •

edited

Loading

JakobLichterfeld commented Apr 2, 2024

Dulanic commented Apr 2, 2024 •

edited

Loading

brianmay commented Apr 2, 2024

Dulanic commented Apr 2, 2024

jheredianet commented Apr 2, 2024

Dulanic commented Apr 3, 2024 •

edited

Loading

jheredianet commented Apr 3, 2024

brianmay commented Apr 3, 2024

Dulanic commented Apr 3, 2024 •

edited

Loading

jheredianet commented Apr 4, 2024

jheredianet commented Apr 4, 2024

cwanja commented Apr 5, 2024

swiffer commented Oct 20, 2024

perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

Conversation

jheredianet commented Apr 1, 2024

netlify bot commented Apr 1, 2024 • edited Loading

✅ Deploy Preview for teslamate ready!

JakobLichterfeld commented Apr 2, 2024

Dulanic commented Apr 2, 2024 • edited Loading

brianmay commented Apr 2, 2024

Dulanic commented Apr 2, 2024

jheredianet commented Apr 2, 2024

Dulanic commented Apr 3, 2024 • edited Loading

jheredianet commented Apr 3, 2024

brianmay commented Apr 3, 2024

Dulanic commented Apr 3, 2024 • edited Loading

jheredianet commented Apr 4, 2024

jheredianet commented Apr 4, 2024

cwanja commented Apr 5, 2024

swiffer commented Oct 20, 2024

netlify bot commented Apr 1, 2024 •

edited

Loading

Dulanic commented Apr 2, 2024 •

edited

Loading

Dulanic commented Apr 3, 2024 •

edited

Loading

Dulanic commented Apr 3, 2024 •

edited

Loading