Back-fill 5 months data

TODO: collapasable, copyable, download, etc

I have this postgres sql query. I could I change it to generate a day column like 2023-06-13 and a row per day and back-fill 5 months of data?

SELECT COUNT(id) AS show_count, "isFree" AS is_free FROM show WHERE "startTime" < NOW() AND "startTime" >= NOW() - INTERVAL '1 day' GROUP BY "isFree";

Please be breifYou can use the generate_series function in PostgreSQL to create a series of dates, then LEFT JOIN your table on these dates. The dates will be compared with the date part of your "startTime" column. Here is the modified query:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT d.day, COUNT(s.id) AS show_count, s."isFree" AS is_free
FROM dates d
LEFT JOIN show s ON d.day = DATE(s."startTime")
WHERE s."startTime" < NOW() 
AND s."startTime" >= NOW() - INTERVAL '5 months'
GROUP BY d.day, s."isFree"
ORDER BY d.day DESC;

Remember to replace '5 months' with the desired period.

This query generates a row for each day for the past 5 months and counts the id for each group of "isFree" in your show table where the "startTime" falls within each day. Days without any corresponding shows will have a show_count of zero.Can you update this query as well?

SELECT ROUND(AVG(subscriber_count)::numeric, 1) AS average_subscribers FROM ( SELECT COUNT("fanId") AS subscriber_count FROM subscriber INNER JOIN show ON subscriber."showId" = show."id" WHERE show."startTime" < NOW() AND show."startTime" >= NOW() - INTERVAL '1 day' GROUP BY "showId" ) AS subquery;Yes, here is the updated query using the same generate_series method:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT
  d.day,
  ROUND(AVG(subquery.subscriber_count)::numeric, 1) AS average_subscribers
FROM dates d
LEFT JOIN (
  SELECT DATE(show."startTime") AS show_date, COUNT("fanId") AS subscriber_count
  FROM subscriber 
  INNER JOIN show ON subscriber."showId" =  show."id"
  WHERE show."startTime" < NOW() 
  AND show."startTime" >= NOW() - INTERVAL '5 months'
  GROUP BY DATE(show."startTime"), "showId"
) AS subquery ON d.day = subquery.show_date
GROUP BY d.day
ORDER BY d.day DESC;

This query will generate a row for each day for the past 5 months and average the subscriber_count for each day from the subscriber and show tables where the "startTime" falls within each day. Days without any corresponding shows will not have an average_subscribers value (NULL).And this one as well?

SELECT COUNT(id) AS number_of_accounts, "user"."isComedian" AS is_comedian FROM "user" WHERE "createdAt" < NOW() AND "createdAt" >= NOW() - INTERVAL '1 day' GROUP BY "user"."isComedian";Sure, here is the updated query using the generate_series function to get daily data for the past 5 months:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT 
  d.day, 
  COUNT(u.id) AS number_of_accounts, 
  u."isComedian" AS is_comedian
FROM dates d
LEFT JOIN "user" u ON d.day = DATE(u."createdAt")
WHERE u."createdAt" < NOW() 
AND u."createdAt" >= NOW() - INTERVAL '5 months'
GROUP BY d.day, u."isComedian"
ORDER BY d.day DESC;

This query will generate a row for each day for the past 5 months and count the id for each group of "isComedian" in your user table where the "createdAt" falls within each day. Days without any corresponding users will have a number_of_accounts of zero.And this one as well?

SELECT ROUND(AVG(show_duration_minutes)::numeric, 1) AS average_duration_minutes FROM ( SELECT ROUND(EXTRACT(EPOCH FROM ("endTime" - "startTime")) / 60) as show_duration_minutes FROM show WHERE show."startTime" < NOW() AND show."startTime" >= NOW() - INTERVAL '1 day' GROUP BY "id" ) AS subquery; Sure, here is the updated query to generate daily averages for the past 5 months:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT 
  d.day,
  ROUND(AVG(subquery.show_duration_minutes)::numeric, 1) AS average_duration_minutes
FROM dates d
LEFT JOIN (
  SELECT DATE(show."startTime") AS show_date, 
  ROUND(EXTRACT(EPOCH FROM (show."endTime" - show."startTime")) / 60) AS show_duration_minutes
  FROM show
  WHERE show."startTime" < NOW() 
  AND show."startTime" >= NOW() - INTERVAL '5 months'
  GROUP BY DATE(show."startTime"), "id"
) AS subquery ON d.day = subquery.show_date
GROUP BY d.day
ORDER BY d.day DESC;

This query generates a row for each day for the past 5 months and averages the show_duration_minutes for each day from the show table where the "startTime" falls within each day. Days without any corresponding shows will not have an average_duration_minutes value (NULL).The query results for that query are missing 2023-06-12, 13, and 14. Today is the 14th. Why is that?If some dates are missing from the result set, that usually means there are no records in your show table where the "startTime" falls within those dates.

One important detail is that the LEFT JOIN operation in the query joins each date from the dates common table expression (CTE) with the corresponding date from the show table, based on the condition d.day = DATE(show."startTime"). If there are no shows on a particular day (for example, 2023-06-12, 2023-06-13, or 2023-06-14), there won't be any rows from the show table to join with the corresponding date from the dates CTE, and the show_duration_minutes value for that day will be NULL.

If you want to include days without shows and display the average_duration_minutes value as zero for these days, you can use the COALESCE function to replace NULL with zero, as follows:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT 
  d.day,
  COALESCE(ROUND(AVG(subquery.show_duration_minutes)::numeric, 1), 0) AS average_duration_minutes
FROM dates d
LEFT JOIN (
  SELECT DATE(show."startTime") AS show_date, 
  ROUND(EXTRACT(EPOCH FROM (show."endTime" - show."startTime")) / 60) AS show_duration_minutes
  FROM show
  WHERE show."startTime" < NOW() 
  AND show."startTime" >= NOW() - INTERVAL '5 months'
  GROUP BY DATE(show."startTime"), "id"
) AS subquery ON d.day = subquery.show_date
GROUP BY d.day
ORDER BY d.day DESC;

With this modification, if there are no shows on a certain day, the average_duration_minutes value will be zero instead of NULL.Can you do the same to this query?

SELECT ROUND(AVG(reactions)::numeric, 1) AS avg_reactions, ROUND(AVG(comments)::numeric, 1) AS avg_comments, ROUND(AVG(viewers)::numeric, 1) AS avg_viewers FROM statistic INNER JOIN show ON statistic."showId" = show."id" WHERE show."startTime" < NOW() AND show."startTime" >= NOW() - INTERVAL '1 day'; Sure, here is the updated query to generate daily averages for the past 5 months:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '5 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
)
SELECT 
  d.day,
  COALESCE(ROUND(AVG(subquery.reactions)::numeric, 1), 0) AS avg_reactions,
  COALESCE(ROUND(AVG(subquery.comments)::numeric, 1), 0) AS avg_comments,
  COALESCE(ROUND(AVG(subquery.viewers)::numeric, 1), 0) AS avg_viewers
FROM dates d
LEFT JOIN (
  SELECT DATE(show."startTime") AS show_date,
  reactions, 
  comments, 
  viewers
  FROM statistic
  INNER JOIN show ON statistic."showId" =  show."id"
  WHERE show."startTime" < NOW() 
  AND show."startTime" >= NOW() - INTERVAL '5 months'
) AS subquery ON d.day = subquery.show_date
GROUP BY d.day
ORDER BY d.day DESC;

This query generates a row for each day for the past 5 months and averages the reactions, comments, and viewers for each day from the statistic and show tables where the "startTime" falls within each day. Days without any corresponding shows will have avg_reactions, avg_comments, and avg_viewers as zero.For this SQL, can you add COALESCE so there is row for every day?

WITH dates AS ( SELECT generate_series( CURRENT_DATE - INTERVAL '1 months', CURRENT_DATE, '1 day'::INTERVAL )::DATE AS day ) SELECT d.day, COUNT(u.id) AS number_of_accounts, u."isComedian" AS is_comedian FROM dates d LEFT JOIN "user" u ON d.day = DATE(u."createdAt") WHERE u."createdAt" < NOW() AND u."createdAt" >= NOW() - INTERVAL '1 months' GROUP BY d.day, u."isComedian" ORDER BY d.day ASC; In your query, you're grouping by both d.day and u."isComedian". If there's no user created on a particular day, or if there are no users of a particular "isComedian" value created on that day, you won't get a row for that day and "isComedian" combination.

You can't directly apply COALESCE to the COUNT(u.id) or u."isComedian" to fill in missing rows, since COALESCE is used to replace NULL values in a column, and here the issue is that the rows themselves are missing.

You could consider cross joining the dates CTE with the distinct "isComedian" values, to ensure you get a row for every combination of date and "isComedian". Here's how you could modify your query:

WITH dates AS (
  SELECT generate_series(
    CURRENT_DATE - INTERVAL '1 months', 
    CURRENT_DATE, 
    '1 day'::INTERVAL
  )::DATE AS day
),
comedians AS (
  SELECT DISTINCT "isComedian" FROM "user"
)
SELECT 
  d.day, 
  COALESCE(COUNT(u.id), 0) AS number_of_accounts, 
  c."isComedian"
FROM dates d
CROSS JOIN comedians c
LEFT JOIN "user" u ON d.day = DATE(u."createdAt") AND c."isComedian" = u."isComedian"
WHERE u."createdAt" < NOW() 
AND u."createdAt" >= NOW() - INTERVAL '1 months'
GROUP BY d.day, c."isComedian"
ORDER BY d.day ASC;

This query will generate a row for each day for the past month and for each distinct "isComedian" value, and count the id where the "createdAt" falls within each day and matches the "isComedian" value. If there are no users with a specific "isComedian" value created on a certain day, number_of_accounts will be zero for that day and "isComedian" combination.

Back-fill 5 months data created at Tue Jan 20 1970 12:32:30 GMT+0000 (Coordinated Universal Time)

Related Chats

Back-fill 5 months data 1.000

Get shows in range 0.521

Avg Subs Last 24h 0.497

Rust: Replace String Substitution 0.370

Future Date: April 30th 0.306

Born on March 8th 0.294

Parse Date with Format 0.260

Candy Doubling Day Calculation 0.249

Aggregate Highcharts data in 5-min. 0.242

Mock `console.error` test. 0.225