Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add max speed & speed histogram to drive stats #4253

Merged
merged 7 commits into from
Oct 29, 2024

Conversation

js94x
Copy link
Contributor

@js94x js94x commented Oct 9, 2024

I have implemented long term statistics about maximum speed and a speed histogram (already known from "Drive Details" dashboard).

Imho a good place for that kind of statistics is the "Drive Stats" dashboard.

For speed histogram I added a "hack" to hide standing times (speed == 0) at traffic lights or while parking -> see last WHERE condition. Otherwise the bar/spike at the left side of histogram would be even bigger.

SELECT 
    speed_section_$length_unit AS "Speed",
    SUM(seconds_elapsed) * 100 / MAX(duration) as "Elapsed",
    TO_CHAR((SUM(seconds_elapsed) || ' second')::interval, 'HH24:MI:SS') AS "Time"
FROM (
    SELECT
        ROUND(convert_km(p.speed::numeric, '$length_unit') / 10,0) * 10 AS speed_section_$length_unit,
        EXTRACT(EPOCH FROM (LEAD(p."date") OVER (ORDER BY p."date") - p."date")) AS seconds_elapsed,
        EXTRACT(EPOCH FROM (end_date - start_date)) AS duration
    FROM drives d
    INNER JOIN positions p ON p.drive_id = d.id
    WHERE d.car_id = $car_id AND $__timeFilter(d.start_date)
) AS drivedata
WHERE speed_section_$length_unit > 0
GROUP BY 1
ORDER BY 1

See screenshot for more details:

skitch_export

Copy link

netlify bot commented Oct 9, 2024

Deploy Preview for teslamate ready!

Name Link
🔨 Latest commit 5d9dcb3
🔍 Latest deploy log https://app.netlify.com/sites/teslamate/deploys/6720074ff6751b0008e50af5
😎 Deploy Preview https://deploy-preview-4253--teslamate.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@swiffer
Copy link
Contributor

swiffer commented Oct 9, 2024

Hi @js94x - thanks for this suggestions. to ensure dashboards & panels are loading quick could you add an explain analyze for the query mentioned to see index usage & query execution time?

@js94x
Copy link
Contributor Author

js94x commented Oct 9, 2024

Hi @swiffer ,

sure, I just added explain analyze in front of the whole query and issued the query on psql CLI, find the outputs attached. fyi: I am running Teslamate on a docker host with 4 CPU cores & 4 GB RAM.

max speed query

teslamate=# explain analyze SELECT max(speed) AS speed_kmh
FROM positions
WHERE car_id = '1' AND date BETWEEN '2023-10-09T20:20:50.849Z' AND '2024-10-09T20:20:50.849Z';
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=39555.95..39555.96 rows=1 width=2) (actual time=119.718..123.784 rows=1 loops=1)
   ->  Gather  (cost=39555.73..39555.94 rows=2 width=2) (actual time=119.612..123.776 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=38555.73..38555.74 rows=1 width=2) (actual time=109.068..109.069 rows=1 loops=3)
               ->  Parallel Seq Scan on positions  (cost=0.00..36665.64 rows=756037 width=2) (actual time=0.008..87.262 rows=604294 loops=3)
                     Filter: ((date >= '2023-10-09 20:20:50.849'::timestamp without time zone) AND (date <= '2024-10-09 20:20:50.849'::timestamp without time zone) AND (car_id = '1'::smallint))
 Planning Time: 0.323 ms
 Execution Time: 123.813 ms
(9 rows)

histogram query

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=207701.97..207702.47 rows=200 width=96) (actual time=3389.331..3389.333 rows=20 loops=1)
   Sort Key: drivedata.speed_section_km
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=207687.83..207694.33 rows=200 width=96) (actual time=3389.295..3389.320 rows=20 loops=1)
         Group Key: drivedata.speed_section_km
         Batches: 1  Memory Usage: 48kB
         ->  Subquery Scan on drivedata  (cost=0.72..203151.61 rows=604829 width=96) (actual time=10.814..3013.453 rows=1697374 loops=1)
               Filter: (drivedata.speed_section_km > '0'::numeric)
               Rows Removed by Filter: 85841
               ->  WindowAgg  (cost=0.72..180470.51 rows=1814488 width=104) (actual time=10.806..2883.487 rows=1783215 loops=1)
                     ->  Nested Loop  (cost=0.72..116963.43 rows=1814488 width=26) (actual time=10.784..1254.866 rows=1783215 loops=1)
                           ->  Index Scan using positions_date_index on positions p  (cost=0.43..71381.04 rows=1814488 width=14) (actual time=0.014..611.027 rows=1812882 loops=1)
                           ->  Memoize  (cost=0.29..0.31 rows=1 width=20) (actual time=0.000..0.000 rows=1 loops=1812882)
                                 Cache Key: p.drive_id
                                 Cache Mode: logical
                                 Hits: 1811893  Misses: 989  Evictions: 0  Overflows: 0  Memory Usage: 116kB
                                 ->  Index Scan using trips_pkey on drives d  (cost=0.28..0.30 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=989)
                                       Index Cond: (id = p.drive_id)
                                       Filter: ((start_date >= '2023-10-09 20:16:27.717'::timestamp without time zone) AND (start_date <= '2024-10-09 20:16:27.717'::timestamp without time zone) AND (car_id = '1'::smallint))
 Planning Time: 0.388 ms
 JIT:
   Functions: 24
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.995 ms, Inlining 0.000 ms, Optimization 0.439 ms, Emission 10.316 ms, Total 11.751 ms
 Execution Time: 3390.428 ms
(25 rows)

@js94x
Copy link
Contributor Author

js94x commented Oct 10, 2024

I made a fix to the query to display real percentage values and changed color to a static one.

static_color

New query:

query
WITH TotalDuration AS (
    SELECT 
        SUM(seconds_elapsed) AS total_elapsed_time
    FROM (
        SELECT
            ROUND(convert_km(p.speed::numeric, '$length_unit') / 10, 0) * 10 AS speed_section_$length_unit,
            EXTRACT(EPOCH FROM (LEAD(p."date") OVER (ORDER BY p."date") - p."date")) AS seconds_elapsed
        FROM drives d
        INNER JOIN positions p ON p.drive_id = d.id
        WHERE d.car_id = $car_id AND $__timeFilter(d.start_date)
    ) AS drivedata
    WHERE speed_section_$length_unit > 0
)
SELECT 
    speed_section_$length_unit AS "Speed",
    SUM(seconds_elapsed) * 100 / (SELECT total_elapsed_time FROM TotalDuration) AS "Elapsed",  -- Normalized percentage
    TO_CHAR((SUM(seconds_elapsed) || ' second')::interval, 'HH24:MI:SS') AS "Time"
FROM (
    SELECT
        ROUND(convert_km(p.speed::numeric, '$length_unit') / 10, 0) * 10 AS speed_section_$length_unit,
        EXTRACT(EPOCH FROM (LEAD(p."date") OVER (ORDER BY p."date") - p."date")) AS seconds_elapsed,
        EXTRACT(EPOCH FROM (end_date - start_date)) AS duration
    FROM drives d
    INNER JOIN positions p ON p.drive_id = d.id
    WHERE d.car_id = $car_id AND $__timeFilter(d.start_date)
) AS drivedata
WHERE speed_section_$length_unit > 0
GROUP BY 1
ORDER BY 1

Unfortunately the overall query time increased a bit (checked with explain analyze). @swiffer I saw you tuned some queries here #4252, maybe you can give me a hint how to optimize my speed histogram query?

@js94x
Copy link
Contributor Author

js94x commented Oct 10, 2024

Just changed the query again (after asked chatgpt to optimize it); now it is much shorter and results in better performance.

WITH drivedata AS (
    SELECT
        ROUND(convert_km(p.speed::numeric, '$length_unit') / 10, 0) * 10 AS speed_section_$length_unit,
        EXTRACT(EPOCH FROM (LEAD(p."date") OVER (ORDER BY p."date") - p."date")) AS seconds_elapsed
    FROM drives d
    INNER JOIN positions p ON p.drive_id = d.id
    WHERE d.car_id = $car_id AND $__timeFilter(d.start_date)
),
TotalDuration AS (
    SELECT SUM(seconds_elapsed) AS total_elapsed_time
    FROM drivedata
    WHERE speed_section_$length_unit > 0
)
SELECT 
    speed_section_$length_unit AS "Speed",
    SUM(seconds_elapsed) * 100 / (SELECT total_elapsed_time FROM TotalDuration) AS "Elapsed", 
    TO_CHAR((SUM(seconds_elapsed) || ' second')::interval, 'HH24:MI:SS') AS "Time"
FROM drivedata
WHERE speed_section_$length_unit > 0
GROUP BY speed_section_$length_unit
ORDER BY speed_section_$length_unit;
explain analyze
                                                                                                      QUERY PLAN                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=257600.42..257600.92 rows=200 width=96) (actual time=3958.840..3958.844 rows=20 loops=1)
   Sort Key: drivedata.speed_section_km
   Sort Method: quicksort  Memory: 26kB
   CTE drivedata
     ->  WindowAgg  (cost=0.72..171398.07 rows=1814488 width=72) (actual time=11.939..2646.071 rows=1783215 loops=1)
           ->  Nested Loop  (cost=0.72..116963.43 rows=1814488 width=10) (actual time=11.921..1248.612 rows=1783215 loops=1)
                 ->  Index Scan using positions_date_index on positions p  (cost=0.43..71381.04 rows=1814488 width=14) (actual time=0.019..616.673 rows=1812882 loops=1)
                 ->  Memoize  (cost=0.29..0.31 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1812882)
                       Cache Key: p.drive_id
                       Cache Mode: logical
                       Hits: 1811893  Misses: 989  Evictions: 0  Overflows: 0  Memory Usage: 101kB
                       ->  Index Scan using trips_pkey on drives d  (cost=0.28..0.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=989)
                             Index Cond: (id = p.drive_id)
                             Filter: ((start_date >= '2023-10-10 05:09:50.766'::timestamp without time zone) AND (start_date <= '2024-10-10 05:09:50.766'::timestamp without time zone) AND (car_id = '1'::smallint))
   InitPlan 2 (returns $2)
     ->  Aggregate  (cost=42338.06..42338.07 rows=1 width=32) (actual time=349.595..349.595 rows=1 loops=1)
           ->  CTE Scan on drivedata drivedata_1  (cost=0.00..40825.98 rows=604829 width=32) (actual time=0.018..230.080 rows=1697374 loops=1)
                 Filter: (speed_section_km > '0'::numeric)
                 Rows Removed by Filter: 85841
   ->  HashAggregate  (cost=43850.12..43856.62 rows=200 width=96) (actual time=3958.802..3958.830 rows=20 loops=1)
         Group Key: drivedata.speed_section_km
         Batches: 1  Memory Usage: 48kB
         ->  CTE Scan on drivedata  (cost=0.00..40825.98 rows=604829 width=64) (actual time=11.950..3226.430 rows=1697374 loops=1)
               Filter: (speed_section_km > '0'::numeric)
               Rows Removed by Filter: 85841
 Planning Time: 0.385 ms
 JIT:
   Functions: 32
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.010 ms, Inlining 0.000 ms, Optimization 0.453 ms, Emission 11.505 ms, Total 12.969 ms
 Execution Time: 3964.892 ms
(31 rows)

@JakobLichterfeld JakobLichterfeld added the area:dashboard Related to a Grafana dashboard label Oct 12, 2024
@swiffer
Copy link
Contributor

swiffer commented Oct 12, 2024

@js94x - yes, the query provided by chatgpt is better cause within yours the positions are queried twice.

Basically, right now there are some rules to follow

  • querying from positions is expensive as this table is 98% of all collected data
    • there are indexes for car, drive and date that help improving query time (if only looking for detailed data per drive / car / short period)
    • if accurate data is needed (Streaming API / High Resolution) there is no other way to processing the data the way you currently do
    • you could add a condition that excludes Streaming API data (adding ideal_battery_range IS NULL) and therefore is way faster (but less accurate) - the index needed is in master but not yet released

I opened a discussion to evaluate TimescaleDB instead of PostgreSQL - this hopefully results in better performance and adds additional possibilities (continous aggregates / new time series related functions / ...).

if excluding streaming data - query exec time is ~36 times faster in my case

WITH drivedata AS (
    SELECT
        ROUND(convert_km(p.speed::numeric, 'km') / 10, 0) * 10 AS speed_section_km,
        EXTRACT(EPOCH FROM (LEAD(p."date") OVER (ORDER BY p."date") - p."date")) AS seconds_elapsed
    from positions p
    WHERE p.car_id = '2' AND p.date > '2023-10-12' and p.ideal_battery_range_km is not null
),
TotalDuration AS (
    SELECT SUM(seconds_elapsed) AS total_elapsed_time
    FROM drivedata
    WHERE speed_section_km > 0
)
SELECT 
    speed_section_km AS "Speed",
    SUM(seconds_elapsed) * 100 / (SELECT total_elapsed_time FROM TotalDuration) AS "Elapsed", 
    TO_CHAR((SUM(seconds_elapsed) || ' second')::interval, 'HH24:MI:SS') AS "Time"
FROM drivedata
WHERE speed_section_km > 0
GROUP BY speed_section_km
ORDER BY speed_section_km;

@js94x
Copy link
Contributor Author

js94x commented Oct 12, 2024

Thanks. I understand your concerns / rules.

you could add a condition that excludes Streaming API data (adding ideal_battery_range IS NULL) and therefore is way faster (but less accurate) - the index needed is in master but not yet released

Sounds like a good compromise. I currently do not plan to clone my database, switch all docker images to master branch and test the new column. So, would it possible for you to run both queries in psql with your data?

One query with p.ideal_battery_range_km is not null and the other without? Just to see how much both results differ in terms of accurancy.

@swiffer
Copy link
Contributor

swiffer commented Oct 13, 2024

you can validate using your dataset (simply add / remove the additional query condition in your drivedata CTE and p.ideal_battery_range_km is not null).

you can expect the query incl. the condition to be ~ 30x faster once the next release is out.
we've only added the index, the column and data is already there.

@js94x
Copy link
Contributor Author

js94x commented Oct 14, 2024

At my first try I got an error like column p.ideal_battery_range_km not found, this was the reason why I asked you to check for me. I´m a bit puzzled why it works now. I probably had a typo or something similar the first time i tried it.

comparison

Most of data is looking fine for me, so I will add the suggested when condition to the sql query soon.
But nevertheless, I can not explain why the bars at 30 and 120kmh has been swapped 😄

@JakobLichterfeld
Copy link
Collaborator

@swiffer what do you think about this PR after your changes?

@swiffer
Copy link
Contributor

swiffer commented Oct 22, 2024

values do differ (incl. vs excl. streaming data) - i would vote for dashboard loading speed over accuracy in this case.

  • is the graph still colorized? that would bring it in line with drive details speed histrogram.
  • please use "Speed Histogram ($speed_unit)" as panel title to bring it in line with Drive Details.

@js94x
Copy link
Contributor Author

js94x commented Oct 28, 2024

thanks @swiffer for the feedback. I agree and changed both little things.
final version:
Screenshot 2024-10-28 at 22 54 27

@js94x js94x marked this pull request as ready for review October 28, 2024 21:56
@swiffer
Copy link
Contributor

swiffer commented Oct 29, 2024

👍 looks good to me, approved

Copy link
Collaborator

@JakobLichterfeld JakobLichterfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much!

@JakobLichterfeld JakobLichterfeld merged commit 3f32617 into teslamate-org:master Oct 29, 2024
15 checks passed
@sdwalker
Copy link
Contributor

sdwalker commented Oct 30, 2024

Missing convert_km()s for mph units and speed_mih should be speed_mph imo

image

Fixed with #4323

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dashboard Related to a Grafana dashboard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants