在说 Rails 的 includes 和 joins 之前,先来回顾一下 SQL 中的 inner join 和 left join 分别是什么,稍后会用到。

inner join: 只返回两个表中联结字段相等的行
left join: 返回包括左表中的所有记录和右表中联结字段相等的记录

举一个小例子:

表A记录如下:
aID     aNum
1       a1
2       a2
3       a3
4       a4
5       a5

表B记录如下:
bID     bName
1       b1
2       b2
3       b3
4       b4
8       b8
--------------------------------------------
1. inner join
sql语句如下: 
select * from A inner join B on A.aID = B.bID

结果如下:
aID     aNum     bID     bName
1       a1        1       b1
2       a2        2       b2
3       a3        3       b3
4       a4        4       b4

结果说明:
很明显,这里只显示出了 A.aID = B.bID 的记录,这说明 inner join 只显示符合条件的记录。
--------------------------------------------
2. left join
sql语句如下: 
select * from A left join B on A.aID = B.bID

结果如下:
aID     aNum     bID     bName
1       a1        1       b1
2       a2        2       b2
3       a3        3       b3
4       a4        4       b4
5       a5        NULL     NULL

结果说明:
left join 是以 A 表的记录为基础的, A 可以看成左表, B 可以看成右表, left join 是以左表为准的.
换句话说,左表(A)的记录将会全部表示出来, 而右表(B)只会显示符合搜索条件的记录(例子中为: A.aID = B.bID).
B表记录不足的地方均为NULL.
--------------------------------------------

注:left join 是 left outer join 的缩写

下面从一个小例子来理解 includes 和 joins 的区别

以下是 Group 和 Post 的表内容( Group has_many posts ),为了方便阅读我调整了格式:

2.3.0 :100 > Group.all
  Group Load (4.9ms)  SELECT "groups".* FROM "groups"
 => #<ActiveRecord::Relation [
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">, 
 #<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">, 
 #<Group id: 3, title: "333333", description: "this is group 3", created_at: "2016-12-21 12:35:43", updated_at: "2016-12-21 12:35:43">
 ]> 
2.3.0 :101 > Post.all
  Post Load (2.7ms)  SELECT "posts".* FROM "posts"
 => #<ActiveRecord::Relation [
 #<Post id: 1, content: "group_1_111111", group_id: 1, created_at: "2016-12-21 12:36:45", updated_at: "2016-12-21 12:36:45">, 
 #<Post id: 2, content: "group_1_222222", group_id: 1, created_at: "2016-12-21 12:36:51", updated_at: "2016-12-21 12:36:51">, 
 #<Post id: 3, content: "group_1_333333", group_id: 1, created_at: "2016-12-21 12:36:55", updated_at: "2016-12-21 12:36:55">, 
 #<Post id: 4, content: "group_2_111111", group_id: 2, created_at: "2016-12-21 12:37:05", updated_at: "2016-12-21 12:37:05">, 
 #<Post id: 5, content: "group_2_222222", group_id: 2, created_at: "2016-12-21 12:37:09", updated_at: "2016-12-21 12:37:09">, 
 #<Post id: 6, content: "group_1_444444", group_id: 1, created_at: "2016-12-21 12:37:17", updated_at: "2016-12-21 12:37:17">
 ]> 

先来看看直接用 joins 和 includes 的结果有什么不同:

2.3.0 :111 >   Group.joins(:posts)
  Group Load (2.2ms)  SELECT "groups".* FROM "groups" INNER JOIN "posts" ON "posts"."group_id" = "groups"."id"
 => #<ActiveRecord::Relation [
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">, 
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">, 
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">, 
 #<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">, 
 #<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">, 
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">]> 
2.3.0 :112 > 
2.3.0 :113 >   Group.includes(:posts)
  Group Load (0.9ms)  SELECT "groups".* FROM "groups"
  Post Load (0.5ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" IN (1, 2, 3)
 => #<ActiveRecord::Relation [
 #<Group id: 1, title: "111111", description: "this is group 1", created_at: "2016-12-21 12:35:25", updated_at: "2016-12-21 12:35:25">, 
 #<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">, 
 #<Group id: 3, title: "333333", description: "this is group 3", created_at: "2016-12-21 12:35:43", updated_at: "2016-12-21 12:35:43">]> 
2.3.0 :114 > 

joins 只请求了一次数据库查询,只取回 groups 表的数据,它会有重复记录,因为是通过 inner join 的方式。 includes 请求了两次数据库查询,取回了 groups 的所有数据和 posts 中与 groups 关联的所有数据。其实这就是 left join 的方式,在后面结合 where 子句使用的情况下会更清晰的看到。

来看下面这段代码的输出:

ro = Group.all
ro.each do |r|
  r.posts.each do |p|
    puts p.content
  end
end
--------------------------------------------
  Group Load (0.8ms)  SELECT "groups".* FROM "groups"
  Post Load (0.7ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 1]]
group_1_111111
group_1_222222
group_1_333333
group_1_444444
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 2]]
group_2_111111
group_2_222222
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 3]]

这个就是很著名的 “N+1 query” 问题,当我用 ro 拿到所有的 group 数据后,每一个 r.posts 都会触发一次数据库查询请求,例子中有 3 条 group 记录就产生 3 条请求,如果数据库足够大,那么这个巨大的时间成本是不可承受的。

解决方法就是利用 includes,因为 rails 提供的 includes 方法会 eager loading 所有的数据:

ri = Group.includes(:posts).all
ri.each do |r|
  r.posts.each do |p|
    puts p.content
  end
end
--------------------------------------------
  Group Load (0.4ms)  SELECT "groups".* FROM "groups"
  Post Load (0.8ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" IN (1, 2, 3)
group_1_111111
group_1_222222
group_1_333333
group_1_444444
group_2_111111
group_2_222222

正如之前分析的,includes 的两次查询同时取回了 groups 和 posts 中的所有数据放入内存,所以在之后的 r.posts 中程序直接就从内存中获取了 post 的 content,而不用再针对每条记录做一次数据库的查询了,很强大吧。

那么 joins 在这个情况的表现如何呢?

rj = Group.joins(:posts).all
rj.each do |r|
  r.posts.each do |p|
    puts p.content
  end
end
--------------------------------------------
  Group Load (0.9ms)  SELECT "groups".* FROM "groups" INNER JOIN "posts" ON "posts"."group_id" = "groups"."id"
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 1]]
group_1_111111
group_1_222222
group_1_333333
group_1_444444
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 1]]
group_1_111111
group_1_222222
group_1_333333
group_1_444444
  Post Load (0.9ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 1]]
group_1_111111
group_1_222222
group_1_333333
group_1_444444
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 2]]
group_2_111111
group_2_222222
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 2]]
group_2_111111
group_2_222222
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."group_id" = $1  [["group_id", 1]]
group_1_111111
group_1_222222
group_1_333333
group_1_444444

因为只取回 groups 表的数据,并且通过 inner join 的方式会有重复记录,所以 joins 完全不符合这个场景的需求。

那什么时候会用到 joins 呢?这就到了 where 出场的时候了。当我要获取 group 的数据,而筛选条件是与其关联的 post 的时候,我们来看具体代码:

2.3.0 :120 >   Group.joins(:posts).where(posts: {content: 'group_2_111111'})
  Group Load (3.7ms)  SELECT "groups".* FROM "groups" INNER JOIN "posts" ON "posts"."group_id" = "groups"."id" WHERE "posts"."content" = $1  [["content", "group_2_111111"]]
 => #<ActiveRecord::Relation [#<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">]> 

显然,joins 的查询方式很好的满足了需求,并且只返回 group 的数据。那如果在 includes 上用 where 子句会怎么样呢?

2.3.0 :121 >   Group.includes(:posts).where(posts: {content: 'group_2_111111'})
  SQL (1.6ms)  SELECT "groups"."id" AS t0_r0, "groups"."title" AS t0_r1, "groups"."description" AS t0_r2, "groups"."created_at" AS t0_r3, "groups"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."content" AS t1_r1, "posts"."group_id" AS t1_r2, "posts"."created_at" AS t1_r3, "posts"."updated_at" AS t1_r4 FROM "groups" LEFT OUTER JOIN "posts" ON "posts"."group_id" = "groups"."id" WHERE "posts"."content" = $1  [["content", "group_2_111111"]]
 => #<ActiveRecord::Relation [#<Group id: 2, title: "222222", description: "this is group 2", created_at: "2016-12-21 12:35:34", updated_at: "2016-12-21 12:35:34">]> 

发现数据库的查询请求不再是原来的两条,而合并成了一条同时返回 group 和 post 的数据,并且是通过 left outer join 的方式连接表。结果虽然相同,但是 includes 消耗了多余的内存,并且查询逻辑也不完全匹配。关于 includes 结合 where 的使用,建议还是仔细阅读 rails guides 的官方文档。

最后做一个总结:
joins 的作用是连接两张表,一般是结合条件查询使用,当需要获取 A 表中的数据而查询条件是与 A 关联的 B 表时,使用 joins。includes 则是当需要使用 A、B 两张关联数据表的时候,用 includes 方法将数据一次 eager loading 进内存,以此减少之后大量的数据库读取操作。

  • 最后的最后,提一个小细节,因为关联关系是 group has_many posts, 所以当 includes 和 joins 的顺序反过来的时候 groups 表要用单数。
2.3.0 :126 >   Post.joins(:groups)
ActiveRecord::ConfigurationError: Can't join 'Post' to association named 'groups'; perhaps you misspelled it?
 .
 .
 .
2.3.0 :127 >   Post.joins(:group)
  Post Load (1.3ms)  SELECT "posts".* FROM "posts" INNER JOIN "groups" ON "groups"."id" = "posts"."group_id"
 => #<ActiveRecord::Relation [
 #<Post id: 1, content: "group_1_111111", group_id: 1, created_at: "2016-12-21 12:36:45", updated_at: "2016-12-21 12:36:45">, 
 #<Post id: 2, content: "group_1_222222", group_id: 1, created_at: "2016-12-21 12:36:51", updated_at: "2016-12-21 12:36:51">, 
 #<Post id: 3, content: "group_1_333333", group_id: 1, created_at: "2016-12-21 12:36:55", updated_at: "2016-12-21 12:36:55">, 
 #<Post id: 4, content: "group_2_111111", group_id: 2, created_at: "2016-12-21 12:37:05", updated_at: "2016-12-21 12:37:05">, 
 #<Post id: 5, content: "group_2_222222", group_id: 2, created_at: "2016-12-21 12:37:09", updated_at: "2016-12-21 12:37:09">, 
 #<Post id: 6, content: "group_1_444444", group_id: 1, created_at: "2016-12-21 12:37:17", updated_at: "2016-12-21 12:37:17">]> 
2.3.0 :128 > Post.includes(:group)
  Post Load (5.5ms)  SELECT "posts".* FROM "posts"
  Group Load (1.0ms)  SELECT "groups".* FROM "groups" WHERE "groups"."id" IN (1, 2)
 => #<ActiveRecord::Relation [
 #<Post id: 1, content: "group_1_111111", group_id: 1, created_at: "2016-12-21 12:36:45", updated_at: "2016-12-21 12:36:45">, 
 #<Post id: 2, content: "group_1_222222", group_id: 1, created_at: "2016-12-21 12:36:51", updated_at: "2016-12-21 12:36:51">, 
 #<Post id: 3, content: "group_1_333333", group_id: 1, created_at: "2016-12-21 12:36:55", updated_at: "2016-12-21 12:36:55">, 
 #<Post id: 4, content: "group_2_111111", group_id: 2, created_at: "2016-12-21 12:37:05", updated_at: "2016-12-21 12:37:05">, 
 #<Post id: 5, content: "group_2_222222", group_id: 2, created_at: "2016-12-21 12:37:09", updated_at: "2016-12-21 12:37:09">, 
 #<Post id: 6, content: "group_1_444444", group_id: 1, created_at: "2016-12-21 12:37:17", updated_at: "2016-12-21 12:37:17">]> 

参考资料: