Rails tip – Grouping ActiveRecord objects by day or week using datetrunc
Here’s a quick Rails question for you. We have a bunch of Posts in a Postgres DB which we want to order by publication day - but Posts published on the same day should be ordered by a different column: upvotes. (This is basically how Product Hunt does it). We’re paginating, so we only care about the first 100 posts. How would we do it?
Maybe you’d start with something like this:
Post.all.group_by { |post|
post.published_at.to_date
}.flat_map { |day, posts|
posts.sort_by(&:score).reverse
}.first(100)
It’s not terrible (it works), it’s the kind of thing you’ll find if you Google “rails group by day”. The problem with the code is a) it’s a bit ugly and b) is it’s not very efficient. Using our Enumerable
methods (group_by
and flat_map
), means we have to load all our Posts into memory, sort them with Ruby, then pull out the 100 we care about. This is slow & inefficient.
To get me started I ran this on 30K records, it took about 7 seconds (I didn’t profile the memory usage, but I’d expect it to be quite high). So we might get away with using Ruby to order a few hundred records, but this doesn’t scale to thousands or millions of records.
Postgres is way faster and doing these kind of sorts (especially with indexes, as we’ll see below) - but simply ordering by published_at
and upvotes
doesn’t give us what we want:
Post.order('published_at DESC, upvotes DESC').limit(5).pluck(:published_at, :upvotes)
#=> (14.4ms) SELECT "posts"."published_at", "posts"."upvotes" FROM "posts" ORDER BY published_at DESC, upvotes DESC LIMIT 5
[[Fri, 15 Jan 2016 15:25:46 UTC +00:00, 12],
[Fri, 15 Jan 2016 13:22:11 UTC +00:00, 158],
[Fri, 15 Jan 2016 12:50:37 UTC +00:00, 88],
[Thu, 14 Jan 2016 21:48:44 UTC +00:00, 24],
[Thu, 14 Jan 2016 14:46:51 UTC +00:00, 282]]
Unless two are added at the same second, the second sort column (upvotes) will never be used. We need to to tell Postgres to sort by the publication day, but to ignore the rest of the timestamp.
Enter datetrunc
Luckily we can use super handy function called date_trunc
for just that (it’s documented here). Despite it’s handiness I couldn’t find any real mention of people using it in a Rails context - although it is used inside the groupdate gem which I’d recommend for more complex use-cases, or if you’re not using Postgres.
datetrunc
is really simple, it truncates a date down to a certain precision. If the precision is 'day'
for example, it’ll “reset” the date to midnight (a bit like Rails’ beginning_of_day method). Let’s modify our example above:
Post.order("date_trunc('day', published_at) DESC, upvotes DESC").limit(5).pluck("date_trunc('day', published_at)", :upvotes)
#=> SELECT date_trunc('day', published_at), "collaborations"."upvotes" FROM "collaborations" ORDER BY date_trunc('day', published_at) DESC, upvotes DESC LIMIT 5
[[2016-01-15 00:00:00 UTC, 158]],
[2016-01-15 00:00:00 UTC, 88],
[2016-01-15 00:00:00 UTC, 12]],
[2016-01-14 00:00:00 UTC, 282]],
[2016-01-14 00:00:00 UTC, 24]]
Pretty cool huh? Everything after the day in our publication timestamp is discarded, which means posts from the same day are correctly sorted by upvotes 🎊!
We can truncate down to any arbitrary precision, from 'week'
to 'millennium'
(useful for that database of ancient alien races you might be making).
Post.order("date_trunc('week', published_at)")
Post.order("date_trunc('millennium', published_at)")
Be careful, unlike most of the time in Ruby, single/double quotes aren't interchangeable here - in Postgres literal strings like 'day'
have to be single-quoted.
Using indexes for crazy performance 🔥
Using date_trunc
is faster than our plain-Ruby approach from the get-go - running our final query (below) on the same 30K rows takes just 155ms seconds!
Post.order("date_trunc('day', published_at) DESC, upvotes DESC").limit(100)
#=> Post Load (155.2ms) SELECT "posts".* FROM "posts" ORDER BY date_trunc('day', published_at) DESC, upvotes DESC LIMIT 100
We can push performance even further by adding an index. We’re doing something a bit more complex than just indexing the value of column here, because we’re running it through a function first. We’ll need expression (or function-based) index. These aren’t natively supported in vanilla Rails yet (but there’s an open Pull Request to add them), so we’ll use the schema_plus_pg_indexes
gem.
With the gem added to our Gemfile, we’ll generate a migration
bin/rails g migration add_creation_day_score_index_to_posts
Then we’ll add our expression index:
class AddCreationDayScoreIndexToPosts < ActiveRecord::Migration
def up
add_index :posts, expression: "date_trunc('day', published_at) DESC, score DESC", name: 'posts_creation_day_score_index'
end
def down
remove_index :posts, name: 'posts_creation_day_score_index'
end
end
With the index in place, I saw the query complete in 2.2ms - that’s a 70x speedup. Of course 150ms is already pretty fast - but this has real implications if we’re grouping by day/month on a really big table.
I tried using date_trunc
on a collection of videos from Peg’s database, with 16 million rows - ordering the Videos’ published_at
day and view_count
. Without indexing the query took 42 seconds, with indexing it was only - 20ms not much slower than our 30K row table! I didn’t even try the pure Ruby version (spoiler alert: you’re in for a bad time).
Conclusion
date_trunc
is a very handy little function, that lets you group by time period much much quicker than in plain old Ruby. Have you used date_trunc
? Any tips for getting the most out of it? Let me know in the comments 😊!
To get me started I ran this on 30K records, it took about 7 seconds (I didn’t profile the memory usage, but I’d expect it to be quite high). So we might get away with using Ruby to order a few hundred records, but this doesn’t scale to thousands or millions of records.
This is a nice tip, but the title says grouping and what you're doing is sorting. Do you know of a way get a hash with the month as key and an array of records as value?
Great read! I've used date_trunc to show a specific range of records.
For example, let's say I wanted to view the records where the "created_at" is this month...
scope :this_month, -> { where("date_trunc('month',created_at) = date_trunc('month',now())") }
You can also use it for specified date ranges. For example, you want to pull the records between this month and the start of two months ago. You could so something like...
scope :last_two_months, -> { where("date_trunc('month', created_at) BETWEEN date_trunc('month', now()) - INTERVAL '2 MONTH' AND date_trunc('month', now())") }
Thanks John! And yes - that's a really nice tip 👍👍.
Informative Article. Thank You alexpeattie
Thanks for your sharing. Use gems groupdate and chartkick is enough to build a beautiful report for most cases.
great article :D
Watch out in grouping or date_trunc'ing database values directly if you have your timezone configuration different than utc in your rails app. If that happens, database values for example for a timezone of GTM-3, for the date yyyy-mm-dd 01:00:00.000 will be stored in the db as yyyy-mm-dd(-1) 22:00:00.000 => yes, the day before, due to offset of the timezone, and if you need day precision, well... you will end up with incorrect grouping. Values that should be on some day, will be on the day before, and values of the next day will end up in the day before
Be careful, unlike most of the time in Ruby, single/double quotes aren't interchangeable here - in Postgres literal strings like 'day' have to be single-quoted.
I attempted the indexing you suggest, however that process generates a string that is much longer than 63 characters
index_posts_on_expression_and_DATE_TRUNC('week', previsto) ASC_and_name_and_i_week_index
So much so that I cannot devise how to shorten it...Great post! I found your explanation of using datetrunc with ActiveRecord really helpful. Grouping by day or week can be tricky, but your examples made it clear. Thanks for sharing these insights!
Pretty cool huh? Everything after the day in our publication timestamp is discarded, which means posts from the same day are correctly sorted by upvotes.
Luckily we can use super handy function called date_trunc for just that (it’s documented here). Despite it’s handiness I couldn’t find any real mention of people using it in a Rails context - although it is used inside the groupdate gem which I’d recommend for more complex use-cases, or if you’re not using Postgres.
With the index in place, I saw the query complete in 2.2ms - that’s a 70x speedup. Of course 150ms is already pretty fast - but this has real implications if we’re grouping by day/month on a really big table.
We can push performance even further by adding an index. We’re doing something a bit more complex than just indexing the value of column here
"Rails tip – Grouping ActiveRecord objects by day or week using datetrunc" offers a valuable insight into leveraging the datetrunc method in Rails to group ActiveRecord objects by specific time intervals. This technique proves particularly useful when working with temporal data and needing to aggregate or analyze records based on daily or weekly intervals. By utilizing datetrunc, developers can easily extract the desired time component from a datetime column and group records accordingly. This allows for efficient data organization, reporting, and visualization, enabling meaningful insights to be derived from the collected data. Whether it's analyzing user activity, sales data, or any time-based metrics, understanding how to group ActiveRecord objects using datetrunc empowers Rails developers to unlock the full potential of their application's temporal data and make informed decisions. This Rails tip serves as a handy resource for enhancing data manipulation and analysis capabilities within Rails applications. Free Printable
thank you so much