How to edit historical data

Edit historical data using usage-event APIs

Edit historical data when billable metrics are derived using the COUNT SQL aggregate function

In this example, we rent out our premises on a per-day basis. The following data schema is used for sending usage events to Zenskar:

{
  "data": {
    "Id": "string",
    "Premises_used": "string"
  },
  "timestamp": "timestamp",
  "customer_id": "string"
}

Let us assume that the usage event APIs are used to send the following data to Zenskar:

data.Id | data.Premises_used | timestamp | customer_id
c01 | Yes | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | Yes | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | Yes | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
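
For reference, the first row above corresponds to a usage event payload such as the following (the payload mirrors the schema; the exact request details depend on your Zenskar API setup):

{
  "data": {
    "Id": "c01",
    "Premises_used": "Yes"
  },
  "timestamp": "2023-04-28T13:26:05.017000",
  "customer_id": "acc93335-aabb-43e9-aabb-138ac880b715"
}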

The billable metric, in this example, is the number of days the premises were used. The following SQL query uses the COUNT aggregate function on the data.Premises_used column to calculate the billable metric:

SELECT 
    COALESCE(COUNT(data.Premises_used), 0) AS "quantity"
FROM 
    your_table
WHERE 
    DATE("timestamp") >= DATE({{start_date}}) AND
    DATE("timestamp") <= DATE({{end_date}}) AND
    "customer_id" = CAST({{customer.external_id}} AS String)


Method 1: modify the data schema by introducing a new data field

🔖 Note: The examples in this document are for reference only. They are guidelines, not rules; adapt them to your use case.

In this example, we introduce a new boolean field, data.Error_in_entry_do_not_count, to flag entries that should not be counted:

{
  "data": {
    "Id": "string",
    "Premises_used": "float",
    "Error_in_entry_do_not_count": "bool"
  },
  "timestamp": "timestamp",
  "customer_id": "string"
}

Zenskar creates a corresponding database table for these events. After a correction event (row c04) is sent, the table contains the following rows:

data.Id | data.Premises_used | data.Error_in_entry_do_not_count | timestamp | customer_id
c01 | Yes | | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | Yes | | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | Yes | | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c04 | No | True | 2023-05-01T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
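
The correction event (row c04) is an ordinary usage event with the flag set. Built from the row above, its payload would look like the following (the flag is shown as a JSON true; your ingestion pipeline may represent it as a string instead):

{
  "data": {
    "Id": "c04",
    "Premises_used": "No",
    "Error_in_entry_do_not_count": true
  },
  "timestamp": "2023-05-01T13:26:05.017000",
  "customer_id": "acc93335-aabb-43e9-aabb-138ac880b715"
}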

The following SQL query calculates the adjusted count of the data.Premises_used column by subtracting the count of non-empty entries in the newly introduced data.Error_in_entry_do_not_count column:

SELECT 
    COALESCE(COUNT(data.Premises_used) - COUNT(NULLIF(data.Error_in_entry_do_not_count, '')), 0) AS "quantity"
FROM 
    your_table
WHERE 
    DATE("timestamp") >= DATE({{start_date}}) AND
    DATE("timestamp") <= DATE({{end_date}}) AND
    "customer_id" = CAST({{customer.external_id}} AS String)


Breakdown of the SQL query

SELECT clause:

  • COALESCE(...): This function returns the first non-null value in its list of arguments. Here, it is used to ensure that if the result of the subtraction is null, which can happen if there are no rows, it will return 0 instead.
  • COUNT(data.Premises_used): This counts the number of non-null entries in the data.Premises_used column.
  • COUNT(NULLIF(data.Error_in_entry_do_not_count, '')): The NULLIF function returns NULL if data.Error_in_entry_do_not_count is an empty string (''), effectively counting only non-empty entries. So, this counts the number of non-empty entries in data.Error_in_entry_do_not_count. The entire expression calculates the difference between the count of data.Premises_used and the count of non-empty data.Error_in_entry_do_not_count.

The FROM clause specifies the table from which the data is queried.

WHERE clause:

  • DATE("timestamp") >= DATE({{start_date}}) AND DATE("timestamp") <= DATE({{end_date}}): this filters the results to include only those records where the timestamp is within the specified date range, defined by the variables {{start_date}} and {{end_date}}.
  • "customer_id" = CAST({{customer.external_id}} AS String): This filters the results to include only rows where the customer_id matches the external_id, after converting external_id to a string.

Method 2: introduce a new usage event for deduction

Let us assume that you used the following data schema for sending usage events:

{
  "data": {
    "Id": "string",
    "Premises_used": "string"
  },
  "timestamp": "timestamp",
  "customer_id": "string"
}

Let us assume that the usage event APIs are used to send the following data to Zenskar:

data.Id | data.Premises_used | timestamp | customer_id
c01 | Yes | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | Yes | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | Yes | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715

You may define another schema that can be used for deduction:

{
  "data": {
    "Id": "string",
    "Error": "string"
  },
  "timestamp": "timestamp",
  "customer_id": "string"
}

Let us assume that the usage event APIs are used to send the following data to Zenskar for deduction:

data.Id | data.Error | timestamp | customer_id
c01 | Yes | 2023-05-15T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | Yes | 2023-05-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715

The following SQL query derives the billable quantity by subtracting the count of deduction events from the count of usage events:

SELECT 
    (SELECT COUNT(data.Premises_used) FROM your_table) - 
    (SELECT COUNT(data.Error) FROM your_another_table) AS "quantity";
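
Because neither subquery has a WHERE clause, this counts all rows in both tables: 3 usage events minus 2 deduction events, giving a quantity of 1. In practice, you will likely want to scope both counts to the billing period and customer, as in the earlier queries. The following is a minimal sketch, assuming the customer filter applies to both tables; note that the deduction events in this example carry May timestamps, so applying the same date filter to the deduction table would exclude them, and whether deductions should be date-filtered depends on your billing semantics:

SELECT 
    (SELECT COUNT(data.Premises_used) 
     FROM your_table
     WHERE DATE("timestamp") >= DATE({{start_date}}) AND
           DATE("timestamp") <= DATE({{end_date}}) AND
           "customer_id" = CAST({{customer.external_id}} AS String)) -
    (SELECT COUNT(data.Error) 
     FROM your_another_table
     WHERE "customer_id" = CAST({{customer.external_id}} AS String)) AS "quantity";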

Edit historical data when billable metrics are derived using the SUM aggregate function

Let us assume that you used the following data schema for sending usage events:

{
  "data": {
    "Id": "string",
    "Usage": "float"
  },
  "timestamp": "timestamp",
  "customer_id": "string"
}

Let us assume that the usage event APIs are used to send the following data to Zenskar:

data.Id | data.Usage | timestamp | customer_id
c01 | 301.4 | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 500 | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | 104.8 | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715

You realize that data.Usage for the row with data.Id c02 should be 475, not 500. In this case, you can make another usage-event API call to add a row with a new unique data.Id and a data.Usage of -25, as shown below.

data.Id | data.Usage | timestamp | customer_id
c01 | 301.4 | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 500 | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | 104.8 | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c04 | -25 | 2023-05-01T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
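
The corrective row c04 is sent as a regular usage event. Built from the row above, its payload would be:

{
  "data": {
    "Id": "c04",
    "Usage": -25
  },
  "timestamp": "2023-05-01T13:26:05.017000",
  "customer_id": "acc93335-aabb-43e9-aabb-138ac880b715"
}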

When aggregating using the SUM function, the negative entry in the data.Usage column will account for the data error.

SELECT 
    SUM(data.Usage) AS "quantity"
FROM 
    your_table
WHERE 
    DATE("timestamp") >= DATE({{start_date}}) AND
    DATE("timestamp") <= DATE({{end_date}}) AND
    "customer_id" = CAST({{customer.external_id}} AS String)

Edit historical data when billable metrics are derived using the MAX, AVG, MIN, and UNIQUE COUNT aggregate functions

Techniques similar to the ones described for the COUNT and SUM aggregate functions can be used for the MAX, AVG, MIN, and UNIQUE COUNT aggregate functions.
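
Note that a negative delta does not work for MAX: MAX(301.4, 500, 104.8, -25) is still 500. With the error-flag field from Method 1, a MAX-based metric can instead exclude flagged rows. The following is a minimal sketch, assuming a data.Usage column and the flag conventions of the earlier example (an empty or NULL flag means the row is valid):

SELECT 
    MAX(data.Usage) AS "quantity"
FROM 
    your_table
WHERE 
    (data.Error_in_entry_do_not_count IS NULL OR data.Error_in_entry_do_not_count = '') AND
    DATE("timestamp") >= DATE({{start_date}}) AND
    DATE("timestamp") <= DATE({{end_date}}) AND
    "customer_id" = CAST({{customer.external_id}} AS String)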

Edit historical data using the ROW_NUMBER() OVER (PARTITION BY) SQL function

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a partition of a result set. It is often used to uniquely identify rows within groups of data. The PARTITION BY clause is used to define how the rows are divided into groups.

Breakdown of ROW_NUMBER() OVER (PARTITION BY)

  • ROW_NUMBER(): generates a unique number for each row in the result set, starting from 1 for the first row in each partition.
  • OVER: specifies the window over which the function operates. It can include:
      • PARTITION BY: defines the groups (partitions) within the data. Each partition is treated independently when generating row numbers.

Let us assume that the usage event APIs are used to send the following data to Zenskar:

data.Id | data.Usage | timestamp | customer_id
c01 | 301.4 | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 500 | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | 104.8 | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715

You realize that data.Usage for data.Id c02 should be 475, not 500. In this case, you can make another usage-event API call to add a row with data.Id equal to c02, a data.Usage of 475, and a more recent timestamp, as shown below.

data.Id | data.Usage | timestamp | customer_id
c01 | 301.4 | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 500 | 2023-04-29T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 475 | 2023-05-01T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | 104.8 | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715

The following query deduplicates the readings based on the most recent timestamp.

WITH DeduplicatedReadings AS (
    SELECT 
        data.Id,
        data.Usage,
        timestamp,
        customer_id,
        ROW_NUMBER() OVER (PARTITION BY data.Id ORDER BY timestamp DESC) AS rn
    FROM 
        your_table
)
SELECT 
    data.Id,
    data.Usage,
    timestamp,
    customer_id
FROM 
    DeduplicatedReadings
WHERE 
    rn = 1;

The above SQL query will give the following result:

data.Id | data.Usage | timestamp | customer_id
c01 | 301.4 | 2023-04-28T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c02 | 475 | 2023-05-01T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
c03 | 104.8 | 2023-04-30T13:26:05.017000 | acc93335-aabb-43e9-aabb-138ac880b715
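
The deduplication CTE composes with any aggregate, which makes this technique a good fit for the MAX, AVG, MIN, and UNIQUE COUNT metrics as well. For example, here is a sketch of an AVG-based billable metric computed over the deduplicated rows (the CTE is the same as above):

WITH DeduplicatedReadings AS (
    SELECT 
        data.Id,
        data.Usage,
        timestamp,
        customer_id,
        ROW_NUMBER() OVER (PARTITION BY data.Id ORDER BY timestamp DESC) AS rn
    FROM 
        your_table
)
SELECT 
    AVG(data.Usage) AS "quantity"
FROM 
    DeduplicatedReadings
WHERE 
    rn = 1;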

Edit historical data using data-source connectors

Edit historical data when the data source supports remote querying

Edit data in your database and let Zenskar take care of the rest of the workflow.
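
For example, if Zenskar queries your SQL database directly, correcting the c02 reading from the earlier example is an ordinary UPDATE. The table and column names below follow the examples above and are assumptions about your schema:

UPDATE your_table
SET usage = 475
WHERE id = 'c02'
  AND customer_id = 'acc93335-aabb-43e9-aabb-138ac880b715';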

Edit historical data when the data source does not support remote querying

Edit data in your database and let the periodic sync take care of syncing the updated data with Zenskar.