Rails tip – Grouping ActiveRecord objects by day or week using datetrunc

Here’s a quick Rails question for you. We have a bunch of Posts in a Postgres DB which we want to order by publication day - but Posts published on the same day should be ordered by a different column: upvotes. (This is basically how Product Hunt does it). We’re paginating, so we only care about the first 100 posts. How would we do it?

Maybe you’d start with something like this:

Post.all.group_by { |post|
  post.published_at.to_date
}.flat_map { |day, posts|
  posts.sort_by(&:score).reverse
}.first(100)

It’s not terrible (it works), it’s the kind of thing you’ll find if you Google “rails group by day”. The problem with the code is a) it’s a bit ugly and b) is it’s not very efficient. Using our Enumerable methods (group_by and flat_map), means we have to load all our Posts into memory, sort them with Ruby, then pull out the 100 we care about. This is slow & inefficient.

To get me started I ran this on 30K records, it took about 7 seconds (I didn’t profile the memory usage, but I’d expect it to be quite high). So we might get away with using Ruby to order a few hundred records, but this doesn’t scale to thousands or millions of records.

Postgres is way faster and doing these kind of sorts (especially with indexes, as we’ll see below) - but simply ordering by published_at and upvotes doesn’t give us what we want:

Post.order('published_at DESC, upvotes DESC').limit(5).pluck(:published_at, :upvotes)
#=> (14.4ms) SELECT "posts"."published_at", "posts"."upvotes" FROM "posts" ORDER BY published_at DESC, upvotes DESC LIMIT 5

[[Fri, 15 Jan 2016 15:25:46 UTC +00:00, 12],
 [Fri, 15 Jan 2016 13:22:11 UTC +00:00, 158],
 [Fri, 15 Jan 2016 12:50:37 UTC +00:00, 88],
 [Thu, 14 Jan 2016 21:48:44 UTC +00:00, 24],
 [Thu, 14 Jan 2016 14:46:51 UTC +00:00, 282]]

Unless two are added at the same second, the second sort column (upvotes) will never be used. We need to to tell Postgres to sort by the publication day, but to ignore the rest of the timestamp.

Enter datetrunc

Luckily we can use super handy function called date_trunc for just that (it’s documented here). Despite it’s handiness I couldn’t find any real mention of people using it in a Rails context - although it is used inside the groupdate gem which I’d recommend for more complex use-cases, or if you’re not using Postgres.



datetrunc is really simple, it truncates a date down to a certain precision. If the precision is 'day' for example, it’ll “reset” the date to midnight (a bit like Rails’ beginning_of_day method). Let’s modify our example above:

Post.order("date_trunc('day', published_at) DESC, upvotes DESC").limit(5).pluck("date_trunc('day', published_at)", :upvotes)
#=> SELECT date_trunc('day', published_at), "collaborations"."upvotes" FROM "collaborations" ORDER BY date_trunc('day', published_at) DESC, upvotes DESC LIMIT 5

[[2016-01-15 00:00:00 UTC, 158]],
 [2016-01-15 00:00:00 UTC, 88],
 [2016-01-15 00:00:00 UTC, 12]],
 [2016-01-14 00:00:00 UTC, 282]],
 [2016-01-14 00:00:00 UTC, 24]]

Pretty cool huh? Everything after the day in our publication timestamp is discarded, which means posts from the same day are correctly sorted by upvotes 🎊!

We can truncate down to any arbitrary precision, from 'week' to 'millennium' (useful for that database of ancient alien races you might be making).

Post.order("date_trunc('week', published_at)")
Post.order("date_trunc('millennium', published_at)")

Be careful, unlike most of the time in Ruby, single/double quotes aren't interchangeable here - in Postgres literal strings like 'day' have to be single-quoted.

Using indexes for crazy performance 🔥

Using date_trunc is faster than our plain-Ruby approach from the get-go - running our final query (below) on the same 30K rows takes just 155ms seconds!

Post.order("date_trunc('day', published_at) DESC, upvotes DESC").limit(100)
#=> Post Load (155.2ms)  SELECT  "posts".* FROM "posts" ORDER BY date_trunc('day', published_at) DESC, upvotes DESC  LIMIT 100

We can push performance even further by adding an index. We’re doing something a bit more complex than just indexing the value of column here, because we’re running it through a function first. We’ll need expression (or function-based) index. These aren’t natively supported in vanilla Rails yet (but there’s an open Pull Request to add them), so we’ll use the schema_plus_pg_indexes gem.

With the gem added to our Gemfile, we’ll generate a migration

bin/rails g migration add_creation_day_score_index_to_posts

Then we’ll add our expression index:

class AddCreationDayScoreIndexToPosts < ActiveRecord::Migration

  def up
    add_index :posts, expression: "date_trunc('day', published_at) DESC, score DESC", name: 'posts_creation_day_score_index'
  end

  def down
    remove_index :posts, name: 'posts_creation_day_score_index'
  end

end

With the index in place, I saw the query complete in 2.2ms - that’s a 70x speedup. Of course 150ms is already pretty fast - but this has real implications if we’re grouping by day/month on a really big table.

I tried using date_trunc on a collection of videos from Peg’s database, with 16 million rows - ordering the Videos’ published_at day and view_count. Without indexing the query took 42 seconds, with indexing it was only - 20ms not much slower than our 30K row table! I didn’t even try the pure Ruby version (spoiler alert: you’re in for a bad time).

Conclusion

date_trunc is a very handy little function, that lets you group by time period much much quicker than in plain old Ruby. Have you used date_trunc? Any tips for getting the most out of it? Let me know in the comments 😊!


A
August 2024 Calendar
0 points
11 months ago

To get me started I ran this on 30K records, it took about 7 seconds (I didn’t profile the memory usage, but I’d expect it to be quite high). So we might get away with using Ruby to order a few hundred records, but this doesn’t scale to thousands or millions of records.

Z
zamith
0 points
9 years ago

This is a nice tip, but the title says grouping and what you're doing is sorting. Do you know of a way get a hash with the month as key and an array of records as value?

J
John
0 points
10 years ago

Great read! I've used date_trunc to show a specific range of records.

For example, let's say I wanted to view the records where the "created_at" is this month...

scope :this_month, -&gt; { where("date_trunc('month',created_at) = date_trunc('month',now())") }

You can also use it for specified date ranges. For example, you want to pull the records between this month and the start of two months ago. You could so something like...

scope :last_two_months, -&gt; { where("date_trunc('month', created_at) BETWEEN date_trunc('month', now()) - INTERVAL '2 MONTH' AND date_trunc('month', now())") }

A
alexpeattie
0 points
10 years ago

Thanks John! And yes - that's a really nice tip 👍👍.

A
August 2024 Calendar
0 points
14 months ago

Informative Article. Thank You alexpeattie

R
Raecoo
0 points
9 years ago

Thanks for your sharing. Use gems groupdate and chartkick is enough to build a beautiful report for most cases.

M
Mario Gintili
0 points
10 years ago

great article :D

T
Tomás Alvarez
0 points
7 years ago

Watch out in grouping or date_trunc'ing database values directly if you have your timezone configuration different than utc in your rails app. If that happens, database values for example for a timezone of GTM-3, for the date yyyy-mm-dd 01:00:00.000 will be stored in the db as yyyy-mm-dd(-1) 22:00:00.000 => yes, the day before, due to offset of the timezone, and if you need day precision, well... you will end up with incorrect grouping. Values that should be on some day, will be on the day before, and values of the next day will end up in the day before

O
October 2023 Calendar
0 points
21 months ago

Be careful, unlike most of the time in Ruby, single/double quotes aren't interchangeable here - in Postgres literal strings like 'day' have to be single-quoted.

J
Jerome
0 points
9 years ago

I attempted the indexing you suggest, however that process generates a string that is much longer than 63 characters index_posts_on_expression_and_DATE_TRUNC('week', previsto) ASC_and_name_and_i_week_index So much so that I cannot devise how to shorten it...

O
October 2024 Calendar
0 points
9 months ago

Great post! I found your explanation of using datetrunc with ActiveRecord really helpful. Grouping by day or week can be tricky, but your examples made it clear. Thanks for sharing these insights!

A
April 2023 Calendar
0 points
2 years ago

Pretty cool huh? Everything after the day in our publication timestamp is discarded, which means posts from the same day are correctly sorted by upvotes.

N
November 2023 Calendar
0 points
21 months ago

Luckily we can use super handy function called date_trunc for just that (it’s documented here). Despite it’s handiness I couldn’t find any real mention of people using it in a Rails context - although it is used inside the groupdate gem which I’d recommend for more complex use-cases, or if you’re not using Postgres.

F
February 2023 Calendar
0 points
2 years ago

With the index in place, I saw the query complete in 2.2ms - that’s a 70x speedup. Of course 150ms is already pretty fast - but this has real implications if we’re grouping by day/month on a really big table.

M
May 2023 Calendar
0 points
2 years ago

We can push performance even further by adding an index. We’re doing something a bit more complex than just indexing the value of column here

B
Betina Jessen
0 points
2 years ago

"Rails tip – Grouping ActiveRecord objects by day or week using datetrunc" offers a valuable insight into leveraging the datetrunc method in Rails to group ActiveRecord objects by specific time intervals. This technique proves particularly useful when working with temporal data and needing to aggregate or analyze records based on daily or weekly intervals. By utilizing datetrunc, developers can easily extract the desired time component from a datetime column and group records accordingly. This allows for efficient data organization, reporting, and visualization, enabling meaningful insights to be derived from the collected data. Whether it's analyzing user activity, sales data, or any time-based metrics, understanding how to group ActiveRecord objects using datetrunc empowers Rails developers to unlock the full potential of their application's temporal data and make informed decisions. This Rails tip serves as a handy resource for enhancing data manipulation and analysis capabilities within Rails applications. Free Printable

G
Gilad Penn
0 points
9 years ago

thank you so much