
note 2: Archiving db tuyathome.messages


For my needs (climate, photovoltaic...) I have identified 3 basic levels of data reorganization and archiving: views, 5-minute historical series, and 1-hour historical series.

Views

A view groups data from different devices, concerning the same theme, with one record at regular intervals (5 minutes, 1 hour).
A view is recomputed from scratch at every access, starting from the tuyathome.messages data, so it is always up to date. It is the view's task to transform the data from the devices' DPs into tables of values that are easy to understand and use, discarding some data and processing others.
It is important that views are computed quickly, because they are also used as the basis for the other tables. Speed depends on several factors, which can be tuned to optimize performance:

  • the server hardware
  • the number of fields in the view, and the size of the view (I limit views to the last 24 hours)
  • the size of the tuyathome.messages table (see do_purge)

note: The SELECT in a view suffers from some limitations: for example, it is not possible to use user variables (@variable) inside a view, and this limits the data processing.
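
As a minimal sketch of the limitation, a definition like the following is rejected by MySQL/MariaDB with error 1351 ("View's SELECT contains a variable or parameter"):

    -- Sketch only: this CREATE VIEW fails with error 1351.
    CREATE VIEW bad_view AS
        SELECT @total := @total + `value` AS running_total
        FROM `tuyathome`.`messages`;
    -- Running totals and last-value propagation must therefore be done in
    -- stored procedures (see the Consolidate examples below).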

Currently, I use the following views:

EXAMPLE#1 meteo_data_view

This 5-minute view reorganizes the weather data of the last 24 hours from the "messages" table. The query uses various aggregation functions to calculate the maximum and average values of the meteorological parameters. Obviously it must be adapted to the devices present: use it as a model for custom views.

    -- Create a view for weather data analysis
      CREATE VIEW meteo_data_view AS
      SELECT
    -- Round the timestamp down to the start of its 5-minute interval (300 s).
         FROM_UNIXTIME((UNIX_TIMESTAMP(min(`tuyathome`.`messages`.`timestamp`)) DIV 300) * 300) AS `timestamp`,

    -- Map `device-id`/`dps` pairs to fields, using AVG() (or other functions) to process the data.
    -- The CASE WHEN statements filter and extract specific parameter values
    -- based on the dps column, plus the device-id column if required.
         MAX(CASE WHEN `tuyathome`.`messages`.`dps` = '_day' THEN `tuyathome`.`messages`.`value` END) AS `day`,
         MAX(CASE WHEN `tuyathome`.`messages`.`dps` = '_temperature' THEN `tuyathome`.`messages`.`value` END) AS `temp`,
    -- ROUND(AVG(...)) calculates the average values of the weather parameters, with control over the decimal digits.
         ROUND(AVG(CASE WHEN `tuyathome`.`messages`.`dps` = '207' THEN `tuyathome`.`messages`.`value` END), 1) AS `PMtemp`,
         ROUND(AVG(CASE WHEN `tuyathome`.`messages`.`dps` = '208' THEN `tuyathome`.`messages`.`value` END), 1) AS `PMhum`,

    -- more statements like the previous ones as required (17 in my case)...

         FROM
            `tuyathome`.`messages`
    -- The WHERE clause filters by device ID and keeps only the last 24 hours
         WHERE
            (`tuyathome`.`messages`.`device-id` = '_Auriol_IAN_114435' OR
             `tuyathome`.`messages`.`device-id` = '_PM1_USB') AND
             `tuyathome`.`messages`.`timestamp` > current_timestamp() - interval 1 day
    -- The GROUP BY clause groups the data by the rounded timestamp: 5m -> 300s
         GROUP BY
            unix_timestamp(`tuyathome`.`messages`.`timestamp`) DIV 300
         ORDER BY
            `tuyathome`.`messages`.`timestamp` DESC
    -- The LIMIT clause skips the first row (the current, still-incomplete interval) and limits the output to 24h of data.
             LIMIT 1, 290
  • The code has been optimized; tested with MariaDB 10.11.2 on an H96 MAX 3318 set-top box, with 17 columns in the view and 28000 records in the messages table: the query took 0.8647 seconds (not optimized: 2.2125 seconds).
  • In this case the selection is made on dps alone, because the two devices used here, i.e. '_Auriol_IAN_114435' and '_PM1_USB', have DPs with all-different IDs. Otherwise the WHEN condition must check both device-id and dps.
  • A peculiarity of tuyaDAEMON is that names can be changed at any time: therefore use only device-id and dps in the conditions (at most you can use the _names, where the leading underscore reminds you that they must not be changed).
  • The time indicated in each record (timestamp) is the initial one: the data refer to the following 5-minute interval.
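
As a usage sketch (the column names are those defined in the view above), the last hour of readings can be read directly from the view:

    -- Last hour of weather data: 12 x 5-minute rows.
    SELECT `timestamp`, `temp`, `PMtemp`, `PMhum`
        FROM meteo_data_view
        ORDER BY `timestamp` DESC
        LIMIT 12;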

EXAMPLE#2 Energy_data_view

I use this 5-minute view to reorganize the AC/DC energy data from 2 devices: an AC meter on the home panel (the billed energy) and a smart photovoltaic inverter (plug&play solar energy). The query uses various aggregation functions to calculate voltage, current, and power for GRID, PV_out, and PV_in, plus the cumulative energy (kWh). Obviously it must be adapted to the devices present: use it as a model for custom views.

-- Create a view to retrieve aggregated (5 min) energy data from the 'messages' table
CREATE VIEW `energy_data_view` AS

-- Select the following columns for the view:
SELECT 
    -- Convert the minimum timestamp to a rounded value in 5-minute intervals: 5m -> 300s
    FROM_UNIXTIME(((UNIX_TIMESTAMP(MIN(`msgs`.`timestamp`)) DIV 300) * 300)) AS `timestamp`,
    
    -- Aggregate and calculate average values for GRID_W, GRID_V, and GRID_I (dps created by [EXPLODE node](https://github.com/msillano/tuyaDAEMON/wiki/ver.-2.0--milestones#early-data-process-rtavg-explode))
    ROUND(AVG((CASE WHEN (`msgs`.`dps` = '_6.W') THEN `msgs`.`value` END)), 1) AS `GRID_W`,
    ROUND(AVG((CASE WHEN (`msgs`.`dps` = '_6.V') THEN `msgs`.`value` END)), 1) AS `GRID_V`,
    ROUND(AVG((CASE WHEN (`msgs`.`dps` = '_6.I') THEN `msgs`.`value` END)), 3) AS `GRID_I`,
    
    -- Find the maximum value of GRID_KWH for 'mainAC' device and '1' dps
    MAX((CASE WHEN ((`msgs`.`device-name` = 'mainAC') AND (`msgs`.`dps` = '1')) THEN `msgs`.`value` END)) AS `GRID_KWH`,
    
    -- Aggregate and calculate average values for PV_ACW. 
    -- The wattmeter in the inverter does not differentiate between input and output power; as a workaround I subtract 3.0 W (the inverter's own consumption).
    ROUND(AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '10') THEN `msgs`.`value` END)), 1) - 3.0 AS `PV_ACW`,

   -- Aggregate and calculate average values for PV_ACV and PV_ACI
     ROUND(AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '104') THEN `msgs`.`value` END)), 1) AS `PV_ACV`,
    ROUND(AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '105') THEN `msgs`.`value` END)), 3) AS `PV_ACI`,
    
    -- Find the maximum value of PV_KWH for 'Cumulative_kWh' dp-name
    MAX((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '2') THEN `msgs`.`value` END)) AS `PV_KWH`,

    -- Calculate the product of DC_V and DC_I as PV_DCW
    ROUND(( AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '103') THEN `msgs`.`value` END)) * AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '102') THEN `msgs`.`value` END))), 1) AS `PV_DCW`,
    
    -- Aggregate and calculate average values for PV_DCV and PV_DCI
    ROUND(AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '103') THEN `msgs`.`value` END)), 1) AS `PV_DCV`,
    ROUND(AVG((CASE WHEN (`msgs`.`device-name` = 'inverter_p&p') AND (`msgs`.`dps` = '102') THEN `msgs`.`value` END)), 3) AS `PV_DCI`

FROM `tuyathome`.`messages` `msgs`

-- Filter the messages based on 2 devices AND timestamp
WHERE (((`msgs`.`device-name` = 'mainAC') OR (`msgs`.`device-name` = 'inverter_p&p')) AND (`msgs`.`timestamp` > (NOW() - INTERVAL 1 DAY)))

-- Group the results by timestamp in 5-minute intervals
GROUP BY (UNIX_TIMESTAMP(`msgs`.`timestamp`) DIV 300)

-- Order the results by timestamp in descending order
ORDER BY `timestamp` DESC

-- Limit the number of results: eliminate the first row (incomplete) and cut to 290 (a day).
LIMIT 1, 290;
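
As a usage sketch (not part of the original view), the inverter conversion loss can be estimated directly from the view's DC and AC power columns:

    -- Estimate the inverter conversion loss (DC power in minus AC power out).
    SELECT `timestamp`,
           `PV_DCW`,
           `PV_ACW`,
           ROUND(`PV_DCW` - `PV_ACW`, 1) AS `loss_W`
        FROM energy_data_view
        LIMIT 12;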

Time series '5m'

I use them above all for daily charts: '5min' is a good compromise between detail and data size (288 records/day) for historical data series. They are standard tables containing the data present in a view; they can be updated at any time with a stored procedure, but at least once every 24 hours to obtain a complete series.

EXAMPLE#3 ConsolidateDataFromView

This is a simple general-purpose procedure that you can use with any view when no data processing is needed:

    -- This stored procedure consolidates data from a view into a table.
    -- Parameters:
    --   - `viewName`: The name of the view containing the data to be consolidated, schema-qualified, e.g. 'tuyathome.meteodata'
    --   - `tableName`: The name of the table where the consolidated data will be stored, e.g. 'historical.meteo_5min'
    -- Notes:
    --   - If `tableName` does not exist, it will be created by copying the `viewName` view.
    --   - The field 'timestamp' is required and must be UNIQUE.

DELIMITER //

CREATE PROCEDURE ConsolidateDataFromView(IN viewName VARCHAR(255), IN tableName VARCHAR(255))
BEGIN
    -- Check if the table exists in the target database
    -- COLLATE clause added to catch an error on my DB.
    SELECT COUNT(*) INTO @tableExists FROM information_schema.TABLES
          WHERE CONCAT(TABLE_SCHEMA,'.',TABLE_NAME) = tableName COLLATE utf8_general_ci;
    
    IF @tableExists = 0 THEN
        -- The table doesn't exist, create it in the target database
        -- Using prepared statements to allow parameters as table names in the SQL.
        SET @createTableSql := CONCAT('CREATE TABLE ', tableName, ' AS SELECT * FROM ', viewName);
        PREPARE stmt FROM @createTableSql;
        EXECUTE stmt;
        SELECT ROW_COUNT() AS "Added rows";
        DEALLOCATE PREPARE stmt;
    ELSE
        -- Get the maximum timestamp from the table
        SET @getMaxTimexSql := CONCAT('SELECT MAX(`timestamp`) INTO @maxTimex FROM ', tableName);
        PREPARE stmt FROM @getMaxTimexSql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;

        -- In case of an empty table:
        SET @maxTimex := IFNULL(@maxTimex, CURRENT_TIMESTAMP - INTERVAL 1 DAY);

        -- Copy only the records from the view with timestamp greater than maxTimex
        SET @selectSql := CONCAT('INSERT INTO ', tableName, ' SELECT * FROM ', viewName, ' WHERE `timestamp` > @maxTimex');
        PREPARE stmt FROM @selectSql;
        EXECUTE stmt;
        SELECT ROW_COUNT() AS "Added rows";
        DEALLOCATE PREPARE stmt;
    END IF;
END //

DELIMITER ;
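
Usage is a single CALL; the names below are illustrative, adapt them to your schemas:

    -- Consolidate the 5-minute weather view into a historical table.
    CALL ConsolidateDataFromView('tuyathome.meteo_data_view', 'storico.meteo5m');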

notes:

  • This procedure is for general use and very simple, limiting itself to copying the data.
  • Custom versions can perform more complex operations on the data if needed.
  • When a tableName is created from a view, it is advisable to edit the table structure, add at least one index, and redefine some fields, because the column types of a table created from a view are derived automatically, so they are not always optimal.
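
For example, a first adjustment after the automatic creation could be the following sketch (table and column names are hypothetical):

    -- Hypothetical cleanup of a table created from a view: proper types and a key.
    ALTER TABLE `storico`.`meteo5m`
        MODIFY `timestamp` TIMESTAMP NOT NULL,
        MODIFY `temp` DECIMAL(4,1),
        ADD PRIMARY KEY (`timestamp`);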

EXAMPLE#4 energyViewToPowerTable

A different example of a specialized Consolidate procedure that I wrote to process some data while consolidating. In this case some cumulative fields can be NULL in the view, and this procedure propagates the last known value, replacing the NULLs: it is important that the output cumulative energy values (GRID_KWH and PV_KWH) are monotonically increasing.

    -- This stored procedure consolidates data from the 'energy' view into the 'power5m' table
    -- Parameters:
    --   - This is a specialized procedure, parameters are hard coded.
    -- Notes:
    --   - If the `power5m` table does not exist, it will be created from the `energy` view.
    --   - The field 'timestamp' is required and must be UNIQUE.

DELIMITER //

CREATE PROCEDURE energyViewToPowerTable()
BEGIN
    -- Check if the table 'storico.power5m' exists in the target database
    SELECT COUNT(*) INTO @tableExists FROM information_schema.TABLES WHERE TABLE_SCHEMA = "storico" AND TABLE_NAME = "power5m";
    
    IF @tableExists = 0 THEN
        -- The table doesn't exist, create it in the target database
        CREATE TABLE `storico`.`power5m` AS SELECT * FROM `tuyathome`.`energy` LIMIT 0;
    END IF;
    
    -- Get the maximum timestamp (timex) from the table
    SELECT MAX(`timestamp`) INTO @maxTimex FROM `storico`.`power5m`;
    
    -- In case of an empty table...
    SET @maxTimex := IFNULL(@maxTimex, CURRENT_TIMESTAMP - INTERVAL 1 DAY);
    
    -- Define filling variables
    SET @lastGK := IFNULL((SELECT MAX(GRID_KWH) FROM `storico`.`power5m`), 0);
    SET @lastPK := IFNULL((SELECT MAX(PV_KWH) FROM `storico`.`power5m`), 0);
    
    -- Copy only the records from the view with a timestamp greater than maxTimex
    INSERT INTO `storico`.`power5m` (
        timestamp,
        GRID_W,
        GRID_V,
        GRID_I,
        GRID_KWH,
        PV_ACW,
        PV_ACV,
        PV_ACI,
        PV_KWH,
        PV_DCW,
        PV_DCV,
        PV_DCI
    )
    -- 2023-11-13 modified ISNULL to IF bigger
    SELECT
        t.timestamp,
        t.GRID_W,
        t.GRID_V,
        t.GRID_I,
        @lastGK := IF(t.GRID_KWH > @lastGK, t.GRID_KWH, @lastGK),
        t.PV_ACW,
        t.PV_ACV,
        t.PV_ACI,
        @lastPK := IF(t.PV_KWH > @lastPK, t.PV_KWH, @lastPK),
        t.PV_DCW,
        t.PV_DCV,
        t.PV_DCI
    FROM
    -- 2023-11-13 added  view preorder
        (SELECT * FROM `tuyathome`.`energy`  WHERE
        `timestamp` > @maxTimex  ORDER BY `timestamp` ASC LIMIT 10000) t
    ORDER BY
        t.`timestamp` ASC
    LIMIT 1000000;
    -- Return the number of added rows
    SELECT ROW_COUNT() AS "Added rows";
END //

DELIMITER ;
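
After a CALL, the monotonicity of the cumulative counters can be verified with a sketch like this (it needs LAG(), i.e. MySQL 8+ or MariaDB 10.2+):

    CALL energyViewToPowerTable();

    -- Sanity check: any returned row means a cumulative counter went backwards.
    SELECT `timestamp`, `GRID_KWH`, `PV_KWH`
        FROM (SELECT `timestamp`, `GRID_KWH`, `PV_KWH`,
                     LAG(`GRID_KWH`) OVER (ORDER BY `timestamp`) AS prevG,
                     LAG(`PV_KWH`)   OVER (ORDER BY `timestamp`) AS prevP
                FROM `storico`.`power5m`) x
        WHERE `GRID_KWH` < prevG OR `PV_KWH` < prevP;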

EXAMPLE#5 system.doConsolidate()

Here is an implementation for tuyaDAEMON and global.alldevices, using a share: it adds to SYSTEM a new property (or 'method', if you prefer OOP terminology), _doConsolidate(from, to), with parameter handling, as follows:

    {
      "dp": "_doConsolidate",
      "capability": "SKIP",
      "__comment": "uses the stored procedure `ConsolidateDataFromView` to update a table from a view",
      "__parameters": "'value':{'from':'dbfrom.view', 'to':'dbto.table'}",
      "share": [
        {
          "test": [
            "tuyastatus.core._DBase"
          ],
          "action": [
            {
              "device": "_system",
              "property": "_sqlDBlocal",
              "value": "@'CALL ConsolidateDataFromView(\"' + msg.info.value.from + '\",\"' + msg.info.value.to +'\")'"
            }
          ]
        }
      ]
    }

With a CRON job, this procedure can be invoked every 12 hours to ensure correct data consolidation. The code is similar to the previous one.
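
As an alternative to CRON, the scheduling can live in the DB itself; a possible sketch using a MySQL/MariaDB event (assuming the event scheduler is enabled, and with illustrative names):

    -- Requires event_scheduler = ON (SET GLOBAL event_scheduler = ON;).
    CREATE EVENT IF NOT EXISTS `consolidate_12h`
        ON SCHEDULE EVERY 12 HOUR
        DO CALL ConsolidateDataFromView('tuyathome.meteo_data_view', 'storico.meteo5m');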

Time series '1h'

Data series at 1-hour intervals allow you to analyze long periods of time while still appreciating the variations within the 24 hours. They are obtained from the '5min' series quite simply.

EXAMPLE#6 consolidate_toHour(useddb, topic)

Here's a general-purpose example:

   /*
    Synopsis:
    This stored procedure consolidates data from a table with records every 5 minutes
    into a new or existing table with records every hour.
    The procedure is parametric, i.e. it can be used for any table, with the following constraints:
       - Table names:  <useddb>.<topic>5m   and   <useddb>.<topic>1h 
       - A 'timestamp' column must exist in the <topic>5m table, better if indexed.  
       - The consolidation is done by calculating the average of each column for each hour.

    Parameters:
    - useddb: The database used
    - topic: The base-name to define actual table names.

    Notes:
    - The procedure checks if the target table already exists and creates it if not.
    - It retrieves the maximum timestamp from the target table and uses it to consolidate
      only the new records.
    - The consolidated data is inserted into the target table.
    - The procedure returns the number of added rows as a result.
    */

-- Create the stored procedure

DELIMITER //

-- Drop the procedure if it already exists
DROP PROCEDURE IF EXISTS consolidate_toHour //

-- Create the procedure
CREATE PROCEDURE consolidate_toHour(
    IN useddb VARCHAR(255), 
    IN topic VARCHAR(255)     
)
BEGIN
 
    -- Check if the table exists in the target database
    SELECT COUNT(*) INTO @tableExists FROM information_schema.TABLES
          WHERE CONCAT(TABLE_SCHEMA,'.',TABLE_NAME) = CONCAT(useddb,".",topic,"1h");
    
    IF @tableExists = 0 THEN
        -- The table doesn't exist, create it in the target database
        SET @createTableSql := CONCAT('CREATE TABLE ', useddb, '.',topic,'1h', ' AS SELECT * FROM ',useddb, '.',topic,'5m LIMIT 0');
        PREPARE stmt FROM @createTableSql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
   END IF;
    
    -- Get the maximum timestamp from the table
    SET @getMaxTimexSql := CONCAT('SELECT MAX(`timestamp`) INTO @maxTimex FROM ', useddb,'.',topic,'1h');
    PREPARE stmt FROM @getMaxTimexSql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    
    -- If the table is empty, start 2 days back; otherwise add 59 minutes to skip
    -- the rest of the last (already consolidated) hour.
    SET  @maxTimex := IFNULL(@maxTimex + INTERVAL 59 MINUTE, CURRENT_TIMESTAMP - INTERVAL 2 DAY);  

    -- Building the dynamic query step by step: 1h -> 3600s.
    SET @query1 := CONCAT('INSERT INTO `',useddb,'`.`',topic,'1h` SELECT
    from_unixtime(((unix_timestamp(min(`',topic,'5m`.`timestamp`)) DIV 3600) * 3600)) AS `timestamp`,');
 
    -- Build the column list from the 1h table definition itself.
    -- Note: with many columns, the group_concat_max_len limit may need to be raised.
    SET @query2 := (SELECT 
                     GROUP_CONCAT('avg(`', COLUMN_NAME, '`) AS `', COLUMN_NAME, '`')
                     FROM INFORMATION_SCHEMA.COLUMNS
                     WHERE (TABLE_SCHEMA = useddb) AND (TABLE_NAME = CONCAT(topic,'1h')) 
                     AND (COLUMN_NAME <> 'timestamp') );
                   
    SET @query3 := CONCAT(' FROM `',useddb,'`.`',topic,'5m`
               WHERE `',topic,'5m`.`timestamp` > @maxTimex
               GROUP BY (unix_timestamp(`',topic,'5m`.`timestamp`) DIV 3600) 
               ORDER BY `timestamp`');

    -- For debugging purposes, you can uncomment these lines to check the individual steps
    -- SELECT @query1 AS 'step1';
    -- SELECT @query2 AS 'step2';
    -- SELECT @query3 AS 'step3';
     
    -- Note: PREPARE from an expression works in MariaDB; stock MySQL requires
    -- assigning the concatenated string to a user variable first.
    PREPARE stmt FROM CONCAT(@query1,@query2,@query3);
    EXECUTE stmt;
    -- Echo the number of added rows back to the caller
    SELECT ROW_COUNT() AS "Added rows";
    DEALLOCATE PREPARE stmt;
  
END //

DELIMITER ;
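
Usage is a single CALL; for example, the 'daily-update' procedure below uses it like this:

    -- Consolidates storico.eloads5m into storico.eloads1h
    -- (the 1h table is created on the first run if missing).
    CALL consolidate_toHour('storico', 'eloads');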

EXAMPLE#7 meteo_5min_to_meteo_1hr

It is of course possible to perform other operations on the data during consolidation, using procedures specialized for a specific dataset. This example uses a mix of MIN(), MAX(), and ROUND(AVG(), n) to get better values:

    -- Stored Procedures: meteo_5min_to_meteo_1hr
    -- Description: This stored procedure consolidates the wwwenergy.meteo_5min table into the wwwenergy.meteo_1hr table. 
    -- Parameters:
    --   Since different expressions are needed for different fields, this procedure can't be generic: it is specialized for a dataset, hard-coded.
    -- Notes:


DELIMITER //

CREATE PROCEDURE meteo_5min_to_meteo_1hr()
BEGIN
    -- Check if the table exists in the target database
    SELECT COUNT(*) INTO @tableExists FROM information_schema.TABLES 
           WHERE TABLE_SCHEMA =  'wwwenergy' AND
                 TABLE_NAME   =  'meteo_1hr';
   
    IF @tableExists = 0 THEN
        -- The table doesn't exist, create it in the target database

        --  simple mode: copy structure from 'wwwenergy.meteo_5min'
          CREATE TABLE wwwenergy.meteo_1hr  LIKE wwwenergy.meteo_5min;

        --  long mode: full definition
           -- CREATE TABLE wwwenergy.meteo_1hr(
           --     timestamp timestamp NOT NULL,
           --     field1 varchar(255) NOT NULL,
           --     field2 INTEGER,
           --       ...more...
           --     PRIMARY KEY (timestamp)
           --     );
    END IF;

    SET @startts := (SELECT 
       max( `wwwenergy`.`meteo_1hr`.`timestamp`)
       FROM `wwwenergy`.`meteo_1hr`);
    
-- In case of an empty table:   
    SET  @startts := IFNULL(@startts + INTERVAL 59 MINUTE, CURRENT_TIMESTAMP - INTERVAL 1 DAY);  
   
    INSERT IGNORE INTO  `wwwenergy`.`meteo_1hr` SELECT 
    min(`t`.`timestamp`) AS `timestamp`,
    round(avg(`t`.`DIN_W`),1) AS `DIN_WH`,  --  round to limit decimal digits
    max(`t`.`DIN_T`) AS `DIN_TKWH`,
    round(avg(`t`.`PV_W`),1) AS `PV_WH`,
    max(`t`.`PV_T`) AS `PV_TKWH`
    -- more fields as required

    FROM `wwwenergy`.`meteo_5min` `t`
     
    WHERE (`t`.`timestamp` >= @startts)
    GROUP BY  (unix_timestamp(`t`.`timestamp`) DIV 3600)
    ORDER BY `t`.`timestamp`;
    
    END //

DELIMITER ;

EXAMPLE#8 'daily-update' consolidate procedure

As the views and tables to be updated increase, it becomes convenient to use a single stored procedure for all the requested activities. The advantage is that every change is made in MySQL and not in tuyaDAEMON, which is limited, as already seen, to calling this procedure every 24 hours (better every 12, to guarantee the update even if one run doesn't happen). This is what runs on my system today:

BEGIN
  -- messages
  CALL `tuyathome`.`do_purge`();
  DO SLEEP(2);
  -- clima (view 1h)
  CALL `tuyathome`.`ConsolidateDataFromView`('tuyathome.clima', 'storico.clima1h');
  DO SLEEP(2);
  -- eloads (view 5m)
  CALL `tuyathome`.`ConsolidateDataFromView`('tuyathome.eloads', 'storico.eloads5m');
  DO SLEEP(2);
  CALL `storico`.`consolidate_toHour`('storico','eloads');
  DO SLEEP(2);
  -- energy (view 5m)
  CALL `tuyathome`.`energyViewToEnergyTable`();
  DO SLEEP(2);
  CALL `storico`.`consolidate_toHour`('storico','energy');
  DO SLEEP(2);
  -- service (view 1h)
  CALL `tuyathome`.`ConsolidateDataFromView`('tuyathome.service', 'storico.service1h');
END

NOTE: When adding/deleting devices it is sometimes necessary to modify the views. This can lead to differences between the view schemas and the 5m and 1h table schemas: executing the 'daily-update' procedure then fails with the error "Column count doesn't match value count at row 1". All you need to do is rename the related table(s) (5m or 1h, e.g. storico.power1h to storico.power1h_a) when you update the view: the tables will be automatically recreated by the consolidation procedures with the new schema, and no data will be lost.
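
The rename trick, as a sketch:

    -- After changing the energy view, park the old table...
    RENAME TABLE `storico`.`power1h` TO `storico`.`power1h_a`;
    -- ...the next 'daily-update' run recreates storico.power1h with the
    -- new schema; the old data remains available in power1h_a.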


Applications

tuyaDAEMON's historical data series from devices are a useful starting point for creating graphs. However, sometimes the user needs more global statistics, such as monthly power data, the average temperature across all devices, or the most common error code. These global statistics can be built using views on the "data 1-hour" tables.

For example, to calculate the average temperature across all devices, the user could create a view that aggregates the "clima1h" table by time, then use it to create a graph or table showing the average temperature over time.
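
A possible sketch, assuming for illustration that 'clima1h' has one temperature column per device (T_room and T_attic are hypothetical names):

    -- Average temperature across devices, per hour (hypothetical columns).
    CREATE VIEW avg_temp_view AS
    SELECT `timestamp`,
           ROUND((`T_room` + `T_attic`) / 2, 1) AS `T_avg`
        FROM `storico`.`clima1h`
        ORDER BY `timestamp`;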

In general, views on "data 1-hour" tables can be used to calculate any global statistic that the user needs about the IoT system. This makes them a powerful tool for data analysis and visualization.

These global statistics can be used to monitor the health and performance of the IoT system, identify potential problems, and make informed decisions about how to improve the system.

EXAMPLE#9 monthly energy view

To evaluate the yield of the solar panels, a table like this can be useful:

Monthly power data

Let's say that tuyaDAEMON is reading the data from a total power meter. The tuyaDAEMON readings may be inaccurate due to network latency or other errors; however, the values of the total power meter itself will always be accurate, regardless of the tuyaDAEMON reading.

Therefore, the starting data for calculating power consumption should always be the cumulative power values (grid or PV) from the 'power1h' table. This ensures that the monthly power values are as accurate as possible.

/*
    Synopsis:
       This view calculates monthly energy data, including total energy consumption, grid energy, PV energy, and PV savings as percent.

     Notes:
        - All data are in [kWh].
        - The 'timestamp' field is redundant but required by the SQL algorithm.
*/

CREATE VIEW month_power AS
SELECT
  timestamp,
  -- Concatenate the year and (previous) month name for each month into a single string.
  CONCAT(
    YEAR(DATE_ADD(timestamp, INTERVAL -1 MONTH)),
    " ",
    MONTHNAME(DATE_ADD(timestamp, INTERVAL -1 MONTH))
  ) AS month,
  -- Calculate the total home energy [kWh] consumed for each month by subtracting the previous month's value from the current month's value and adding grid energy with PV energy.
  -- note: LAG() requires MySQL 8+
  (`GRID_KWH` - (LAG(`GRID_KWH`, 1) OVER (ORDER BY timestamp))) +
  (`PV_KWH` - (LAG(`PV_KWH`, 1) OVER (ORDER BY timestamp))) AS total,
  -- Calculate the grid energy [kWh] for each month by subtracting the previous month's grid value from the current month's grid value.
  (`GRID_KWH` - (LAG(`GRID_KWH`, 1) OVER (ORDER BY timestamp))) AS grid,
  -- Calculate the PV energy [kWh] for each month by subtracting the previous month's PV value from the current month's PV value.
  (`PV_KWH` - (LAG(`PV_KWH`, 1) OVER (ORDER BY timestamp))) AS PV,
  -- Calculate the PV savings percentage for each month by dividing the PV energy by the total energy and multiplying by 100.
  FORMAT(
    (((`PV_KWH` - (LAG(`PV_KWH`, 1) OVER (ORDER BY timestamp))) / ((`GRID_KWH` - (LAG(`GRID_KWH`, 1) OVER (ORDER BY timestamp))) +
    (`PV_KWH` - (LAG(`PV_KWH`, 1) OVER (ORDER BY timestamp)))))) * 100,
    2
  ) AS save
FROM
  power1h
GROUP BY
  YEAR(timestamp), MONTH(timestamp)
HAVING
  timestamp = MIN(timestamp)
ORDER BY
  timestamp ASC
-- The first row contains fake data. Skipped.
LIMIT
  1, 1200;

This table can easily be exported to OpenOffice (or Excel) using 'Export, CSV' in phpMyAdmin, or using a stored procedure if the export is requested frequently.
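
If you prefer a pure-SQL export, a sketch with SELECT ... INTO OUTFILE (it requires the FILE privilege, and the path must be allowed by the server's secure_file_priv setting):

    SELECT *
        INTO OUTFILE '/var/lib/mysql-files/month_power.csv'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        FROM month_power;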


The use of LAG() in the previous 'month_power' view imposes the constraint of using MySQL version 8 or greater.

On an older Mac that I use, I have MySQL v5.7.26 and I can't update it. I therefore wrote a 'month power2' view without the LAG() function: it is not as precise as 'month_power' (it ignores one hour of values per month), but it can be useful in some cases.

 CREATE VIEW `month power2` AS 
   select 
      min(`power1h`.`timestamp`) AS `timestamp`,
      concat(year(min(`power1h`.`timestamp`)),'  ', monthname(min(`power1h`.`timestamp`))) AS `month`,
      (max(`power1h`.`GRID_KWH`) - min(`power1h`.`GRID_KWH`)) AS `grid`,
      (max(`power1h`.`PV_KWH`) - min(`power1h`.`PV_KWH`)) AS `PV`,
      (((max(`power1h`.`GRID_KWH`) - min(`power1h`.`GRID_KWH`)) + max(`power1h`.`PV_KWH`)) - min(`power1h`.`PV_KWH`)) AS `total`,
      (((max(`power1h`.`PV_KWH`) - min(`power1h`.`PV_KWH`)) * 100) / (((max(`power1h`.`GRID_KWH`) - min(`power1h`.`GRID_KWH`)) + max(`power1h`.`PV_KWH`)) - min(`power1h`.`PV_KWH`))) AS `save` 
   from `power1h` 
   group by year(`power1h`.`timestamp`), month(`power1h`.`timestamp`)
   order by `timestamp`;

The related user interface is presented here.


Sometimes we need to use the 'messages' table to build new tables that contain information about the time elapsed between a SET event and a RESET event (similar to an SR flip-flop). In this case, the first SET event marks the start of the duration period, and the first RESET event after it marks the end.

EXAMPLE#10 device offline time

When you have many devices, it is not easy to keep track of their connections. It is normal for a device to be offline for a few seconds, and this is usually not a cause of problems. However, longer periods of offline time can indicate problems with the WiFi network or gateways.

  • The type of devices that are being used can also affect the likelihood of offline time. For example, devices that are constantly transmitting data, such as some sensors, are more likely to experience offline time than devices that are used less frequently.
  • The location of the devices can also play a role. Devices that are located in areas with a lot of interference, such as near microwaves or other electronic devices, are more likely to experience offline time.
  • The quality of the WiFi network can also affect the likelihood of offline time. A WiFi network with a weak signal, or one experiencing interference, is more likely to cause problems.

To calculate the total offline time of the devices, we can use a device_offline table, built from the 'messages' table. The device_offline table contains the duration between each device's 'disconnect' and 'reconnect' events, so we can simply sum the durations of all the periods for each device to get the total offline time.
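
Once the device_offline table is populated (see the steps below), the total is a simple aggregation:

    -- Total offline time and number of events per device.
    SELECT `deviceID`,
           MAX(`device`) AS `device`,
           SEC_TO_TIME(SUM(TIME_TO_SEC(`duration`))) AS `total_offline`,
           COUNT(*) AS `events`
        FROM `storico`.`device_offline`
        GROUP BY `deviceID`
        ORDER BY `total_offline` DESC;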

Here is an example of how to use the 'messages' table to build a view that contains the duration between SET and RESET events:

  • SET is a '_connected' = false tuyaDAEMON event of a device
  • RESET is the first '_connected' = true tuyaDAEMON event from the same device
SELECT 
    `t3`.`deviceID`, 
    MAX(`t3`.`device`) AS 'device', 
    MIN(`t3`.`start`) AS 'start', 
    `t3`.`end`, 
    MAX(`t3`.`duration`) AS 'duration'

FROM
(
  SELECT
    `t1`.`device-id` AS `deviceID`,
    `t1`.`device-name` AS `device`,
    `t1`.`timestamp` AS `start`,
    `t2`.`timestamp` AS `end`,
    TIMEDIFF(`t2`.`timestamp`, `t1`.`timestamp`) AS `duration`
  FROM
    -- Join the table to itself to find all pairs of messages where the device ID
    -- is the same, the `dps` field is '_connected', and the `data` field is
    -- 'false' for the first message and 'true' for the second message.
    `tuyathome`.`messages` `t1`
      JOIN `tuyathome`.`messages` `t2`
  WHERE
    (
 -- Filter the results to only include pairs of messages where the second message has a timestamp that is greater than the first message's timestamp.
       (`t1`.`device-id` = `t2`.`device-id`)
        AND(`t1`.`dps` = '_connected') 
        AND(`t1`.`data` = 'false') 

        AND(`t2`.`dps` = '_connected') 
        AND(`t2`.`data` = 'true') 

        AND(`t2`.`timestamp` > `t1`.`timestamp`) 
        AND(
   -- Filter the results to only include pairs of messages where the second message is the first RESET. This can give multiple pairs, if they are multiple SET
          NOT(
             EXISTS(
                SELECT
                    1
                FROM
                    `tuyathome`.`messages` `tx`
                WHERE
                    (
                        ( `tx`.`dps` = '_connected') 
                        AND(`tx`.`device-id` = `t1`.`device-id`) 
                        AND(`tx`.`data` = 'true')
                        AND(`t1`.`timestamp` < `tx`.`timestamp`) 
                        AND(`tx`.`timestamp` < `t2`.`timestamp` )
                    )
                )
            )
        )
    )
) `t3`
-- This groups all pairs (case of multiple SET) to select the first SET (see the top SELECT)
GROUP BY `t3`.`deviceID`, `t3`.`end`
ORDER BY `start`, `end`

This simple SQL query works correctly, but it is very slow, practically unusable with tuyaDAEMON.

The next implementation, using user variables, is instead very efficient (0.095 seconds to get 420 records from a 24h 'messages' table of 64000+ records on a slow Android server):

SELECT 
   MAX(d.p_deviceID) deviceID, 
   MAX(d.p_device) device, 
   MAX(d.p_start) OFFstart, 
   MAX(d.p_end) OFFend, 
   TIMEDIFF(MAX(d.p_end), MAX(d.p_start)) duration, 
   IF(TIMEDIFF(MAX(d.p_end), MAX(d.p_start)) > '00:01:00', '***', '') warning
FROM (
   SELECT 
       @start := CASE WHEN f.value = 'false' AND @value <> 'false' THEN f.`timestamp` ELSE null END p_start,
       @end   := CASE WHEN  f.value = 'true' AND @value = 'false'  THEN f.`timestamp` ELSE null END p_end,
       @num   := IF(@devid = f.`device-id` , @num +(@start IS NOT NULL), @num + 1)   p_num,
       @devid := f.`device-id` p_deviceID,
       @value := f.value p_value,
       f.`device-name` p_device
    FROM 
       -- initialize the user variables
       (SELECT   @value := '', @devid := '',  @start := 0, @end := 0, @num := 0) init_vars,
       -- ordered scan of the '_connected' events; the huge LIMIT forces the
       -- materialization of the ordered derived table, so the variables above
       -- see the rows in (device-id, timestamp) order
       (SELECT * FROM messages t
          WHERE t.`dps` = '_connected'
          ORDER BY  t.`device-id`, t.`timestamp`  ASC 
          LIMIT 18446744073709551615) f
      ) d
   GROUP BY d.p_num
   HAVING duration IS NOT NULL
   ORDER BY OFFstart, OFFend;

Unfortunately, this query cannot always be used in a view, due to the limitations of SELECT in views, so it must be wrapped in a procedure. This procedure can be used to update a 'device_offtime' table, at least once every 24 hours. For good performance, you need to define some indexes on the 'messages' table: dps, device-id, timestamp.
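
A possible sketch for those indexes (one-time; skip any that already exist):

    ALTER TABLE `tuyathome`.`messages`
        ADD INDEX `idx_dps` (`dps`),
        ADD INDEX `idx_device` (`device-id`),
        ADD INDEX `idx_ts` (`timestamp`);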

Here are the operations to carry out to create and update periodically a device_offline table.

Step1: 'storico.device_offline' table

-- Database: `storico`
-- table: `device_offline`

CREATE TABLE `device_offline` (
  `deviceID` char(30) NOT NULL,
  `device` char(40) DEFAULT NULL,
  `OFFstart` datetime NOT NULL,
  `OFFend` datetime DEFAULT NULL,
  `duration` time DEFAULT NULL,
  `warning` char(4) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- adding index

ALTER TABLE `device_offline`
  ADD PRIMARY KEY (`deviceID`,`OFFstart`) USING BTREE;

Step2: 'update_offline' procedure

-- optional delete
DROP PROCEDURE IF EXISTS `update_offline`;

CREATE PROCEDURE `update_offline`()  

INSERT INTO `storico`.`device_offline`(
  `deviceID`,
  `device` ,
  `OFFstart`,
  `OFFend` ,
  `duration` ,
  `warning`)

SELECT 
   MAX(d.p_deviceID) , 
   MAX(d.p_device) , 
   MAX(d.p_start) , 
   MAX(d.p_end) , 
   TIMEDIFF(MAX(d.p_end), MAX(d.p_start)) duration, 
   -- the warning (***) is for duration > 1 minute
   IF(TIMEDIFF(MAX(d.p_end), MAX(d.p_start)) > '00:01:00', '***', '') 
FROM (
   SELECT 
       @start := CASE WHEN f.value = 'false' AND @value <> 'false' THEN f.`timestamp` ELSE null END p_start,
       @end   := CASE WHEN  f.value = 'true' AND @value = 'false'  THEN f.`timestamp` ELSE null END p_end,
       @num   := IF(@devid = f.`device-id` , @num +(@start IS NOT NULL), @num + 1)   p_num,
       @devid := f.`device-id` p_deviceID,
       @value := f.value p_value,
       f.`device-name` p_device
    FROM 
       (SELECT   @value := '', @devid := '',  @start := 0, @end := 0, @num := 0) init_vars,
       (SELECT * FROM `tuyathome`.messages t
          WHERE t.`dps` = '_connected'
          ORDER BY  t.`device-id`, t.`timestamp`  ASC 
          LIMIT 18446744073709551615) f
      ) d
   GROUP BY d.p_num
   HAVING duration IS NOT NULL
   
   ON DUPLICATE KEY UPDATE  
      `OFFend`    =  VALUES(`OFFend`),
      `duration`  =  VALUES(`duration`),
      `warning`   =  VALUES(`warning`);

Step3: call 'daily-update' procedure

The 'daily-update' procedure is called periodically by tuyaDAEMON (see Consolidate above); append the new call to it:

  -- append this 
  DO SLEEP(2);
  CALL `tuyathome`.`update_offline`();

That's all folks!