Commit 5f9ed9d6 authored by Philippe Pepiot's avatar Philippe Pepiot
Browse files

fix(RQLExpression)!: performance issue on RQLExpressions using EXISTS()

BREAKING CHANGE: explicitly use EXISTS in RQLExpression for permissions

This backout the changeset dfcc3f74b3c8 which introduced wrapping all
{E,R}RQLExpression where clause with EXISTS().

It appear to have very bad performance on PostgreSQL on queries already using
EXISTS(), in this case rql was generating a query with double EXISTS(), leading
to a very bad query plan:

=# create table t as select * from generate_series(1, 1000000) as id;
SELECT 1000000
=# create unique index on t(id);
CREATE INDEX

For the RQLExpression "EXISTS(X identity X)" the generated sql was: Any X WHERE EXISTS(EXISTS(X identity X)), X eid %(eid)s, which is equivalent to:

=# explain analyze select id from t where exists(select 1 where exists(select 1 from t as x where t.id = x.id) and t.id = 42);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..8469425.00 rows=500000 width=4) (actual time=0.277..1426.342 rows=1 loops=1)
   Filter: (SubPlan 2)
   Rows Removed by Filter: 999999
   SubPlan 2
     ->  Result  (cost=8.45..8.46 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1000000)
           One-Time Filter: ($1 AND (t.id = 42))
           InitPlan 1 (returns $1)
             ->  Index Only Scan using t_id_idx on t x  (cost=0.42..8.44 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1000000)
                   Index Cond: (id = t.id)
                   Heap Fetches: 1000000
 Planning Time: 0.190 ms
 Execution Time: 1426.384 ms


The planner wasn't able to optimise this (bad written) query, it produce a full table read (Seq Scan).

With a single EXISTS, the query perform much better: Any X WHERE EXISTS(X identity X), X eid %(eid)s, which is equivalent to:

=# explain analyze select id from t where exists(select 1 from t as x where t.id = x.id) and t.id = 42;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.85..16.90 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
   ->  Index Only Scan using t_id_idx on t  (cost=0.42..8.44 rows=1 width=4) (actual time=0.081..0.082 rows=1 loops=1)
         Index Cond: (id = 42)
         Heap Fetches: 1
   ->  Index Only Scan using t_id_idx on t x  (cost=0.42..8.44 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (id = 42)
         Heap Fetches: 1
 Planning Time: 0.383 ms
 Execution Time: 0.136 ms


An alternative patch would be to insert the EXISTS only if the original query
doesn't already contains a EXISTS. But I think it's better to drop the magic
here and let user control what they really want in their expressions.

I added a note to the 3.27 changelog for this.

Also add some tests to RQLExpressionTC with EXISTS().
parent d0d3bf0a1efd
Pipeline #53334 passed with stage
in 4 minutes and 46 seconds
sync_schema_props_perms('EmailAddress')
......@@ -317,7 +317,6 @@ class RQLExpression(object):
keyarg = None
rqlst.recover()
return rql, found, keyarg
rqlst.where = nodes.Exists(rqlst.where)
return rqlst.as_string(), None, None
def _check(self, _cw, **kwargs):
......
......@@ -52,10 +52,10 @@ class EmailAddress(EntityType):
__permissions__ = {
# application that wishes public email, or use it for something else
# than users (eg Company, Person), should explicitly change permissions
'read': ('managers', ERQLExpression('U use_email X')),
'read': ('managers', ERQLExpression('EXISTS(U use_email X)')),
'add': ('managers', 'users',),
'delete': ('managers', 'owners', ERQLExpression('P use_email X, U has_update_permission P')),
'update': ('managers', 'owners', ERQLExpression('P use_email X, U has_update_permission P')),
'delete': ('managers', 'owners', ERQLExpression('EXISTS(P use_email X, U has_update_permission P)')),
'update': ('managers', 'owners', ERQLExpression('EXISTS(P use_email X, U has_update_permission P)')),
}
alias = String(fulltextindexed=True, maxsize=56)
......
......@@ -465,9 +465,9 @@ class RQLExpressionTC(TestCase):
self.assertEqual(keyarg, None)
def test_expression(self):
expr = ERQLExpression('U use_email X')
expr = ERQLExpression('EXISTS(U use_email X)')
rql, found, keyarg = expr.transform_has_permission()
self.assertEqual(rql, 'Any X WHERE EXISTS(U use_email X, X eid %(x)s, U eid %(u)s)')
self.assertEqual(rql, 'Any X WHERE EXISTS(U use_email X), X eid %(x)s, U eid %(u)s')
self.assertEqual(found, None)
self.assertEqual(keyarg, None)
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment