Skip to content
  • Philippe Pepiot's avatar
    fix(RQLExpression)!: performance issue on RQLExpressions using EXISTS() · 496aca08d600
    Philippe Pepiot authored
    BREAKING CHANGE: explicitly use EXISTS in RQLExpression for permissions
    
    This backout the changeset dfcc3f74b3c8 which introduced wrapping all
    {E,R}RQLExpression where clause with EXISTS().
    
    It appear to have very bad performance on PostgreSQL on queries already using
    EXISTS(), in this case rql was generating a query with double EXISTS(), leading
    to a very bad query plan:
    
    =# create table t as select * from generate_series(1, 1000000) as id;
    SELECT 1000000
    =# create unique index on t(id);
    CREATE INDEX
    
    For the RQLExpression "EXISTS(X identity X)" the generated sql was: Any X WHERE EXISTS(EXISTS(X identity X)), X eid %(eid)s, which is equivalent to:
    
    =# explain analyze select id from t where exists(select 1 where exists(select 1 from t as x where t.id = x.id) and t.id = 42);
                                                                   QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------
     Seq Scan on t  (cost=0.00..8469425.00 rows=500000 width=4) (actual time=0.277..1426.342 rows=1 loops=1)
       Filter: (SubPlan 2)
       Rows Removed by Filter: 999999
       SubPlan 2
         ->  Result  (cost=8.45..8.46 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1000000)
               One-Time Filter: ($1 AND (t.id = 42))
               InitPlan 1 (returns $1)
                 ->  Index Only Scan using t_id_idx on t x  (cost=0.42..8.44 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1000000)
                       Index Cond: (id = t.id)
                       Heap Fetches: 1000000
     Planning Time: 0.190 ms
     Execution Time: 1426.384 ms
    
    
    The planner wasn't able to optimise this (bad written) query, it produce a full table read (Seq Scan).
    
    With a single EXISTS, the query perform much better: Any X WHERE EXISTS(X identity X), X eid %(eid)s, which is equivalent to:
    
    =# explain analyze select id from t where exists(select 1 from t as x where t.id = x.id) and t.id = 42;
                                                           QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------------
     Nested Loop Semi Join  (cost=0.85..16.90 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
       ->  Index Only Scan using t_id_idx on t  (cost=0.42..8.44 rows=1 width=4) (actual time=0.081..0.082 rows=1 loops=1)
             Index Cond: (id = 42)
             Heap Fetches: 1
       ->  Index Only Scan using t_id_idx on t x  (cost=0.42..8.44 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
             Index Cond: (id = 42)
             Heap Fetches: 1
     Planning Time: 0.383 ms
     Execution Time: 0.136 ms
    
    
    An alternative patch would be to insert the EXISTS only if the original query
    doesn't already contains a EXISTS. But I think it's better to drop the magic
    here and let user control what they really want in their expressions.
    
    I added a note to the 3.27 changelog for this.
    
    Also add some tests to RQLExpressionTC with EXISTS().
    496aca08d600