Git Product home page Git Product logo

Comments (11)

zhaoyongjie avatar zhaoyongjie commented on June 9, 2024 1

@eschutho BTW, I'll open the SIP as soon as possible in the week.

from superset.

zhaoyongjie avatar zhaoyongjie commented on June 9, 2024

In the above example, I am trying to fetch the top 10 years from the years 1989 - 2004 where the name Alex was the most popular and compare that count to the period 10 years prior.

@eschutho I think you misunderstand the "TOK N" analytics meaning, and what is "order by" and "limit" clause in SQL meaning.

The first screenshot from your post in the explore page will ask sort by the count, and then only retrieve the top 10 rows. If you really want to fetch the top 10 years, you should set ds as value of SORT BY control.

image image

from superset.

eschutho avatar eschutho commented on June 9, 2024

Hi, thanks @zhaoyongjie. Apologies, but the last query that I showed as the correct example I should have sorted by count instead of by date. That may be where the confusion came from. I updated it in my example. But essentially, I'm looking to sort by the count, so in this case in 1995 there were 12 instances, and that row should be first. I want the top ten years where the name had the highest count. This is the corrected version:
Screenshot 2024-03-15 at 1 37 45 PM

from superset.

zhaoyongjie avatar zhaoyongjie commented on June 9, 2024

so in this case in 1995 there were 12 instances, and that row should be first. I want the top ten years where the name had the highest count. This is the corrected version:

it should be another topic that how to sort the result set after fetched the data from database, there is a sort operator in the post processing for it.

Your SQL have some issues:

  1. The order by and limit clause should combine together always, it means getting "top N" records from the table. The subquery from your post didn't provide limit clause in subquery.
  2. The SQL and the user interface(Explore page) are inconsistent, "top ten years" was selected in the UI, then the SQL clause should be order by ds limit 10 not only in subquery, but also in the outer query.
  3. Significant extra computing, 10 records dataset join <---> unpredictable records.

As the mention before, if you really need a secondary sort by, the sort operator might help.

from superset.

Antonio-RiveroMartnez avatar Antonio-RiveroMartnez commented on June 9, 2024
  1. ... The subquery from your post didn't provide limit clause in subquery.
  2. Significant extra computing, 10 records dataset join <---> unpredictable records.

Including that in the subquery would truncate the set used to JOIN correct? Shouldn't we leverage the DBMSs optimizations when executing the query as a whole instead? Otherwise we end up having incomplete result sets if for example the LIMIT is too low, as as stated before data inconsistency is something we should never aim for.

To tackle this bug I would say that in general we have two limitations depending on which database are we working on:

  1. Not all databases support CTE
  2. Not all databases support JOINS

For the first one we would have to write something that check whether or not CTE are supported, if it does, make use of it, if not, write the query manually.

For the second, if the database supports joins we should ensure data consistency at pre-query execution with the outer joins, if not supported we should fallback to DataFrame joins at post processing time, but again, just as a fallback because data inconsistency might arise.

from superset.

zhaoyongjie avatar zhaoyongjie commented on June 9, 2024

@Antonio-RiveroMartnez
Upon rereading this issue, I believe the confusion lies in understanding the 'limit' control and the 'sort by' control in the UI, as well as how to apply them in SQL.

  1. The value of the 'limit' control is 10, and the 'sort by' control is 'DS'. This indicates sorting by the 'DS' column and then returning the top 10 records.
  2. The value of the 'limit' control is 10, and the 'sort by' control is 'count'. This indicates sorting by the 'count' column (metric) and then returning the top 10 records.

The logic is included in all visualizations. Currently, there isn't a control to specify how to sort a column as a secondary sort by. In other words, Superset doesn't have a way to define sorting by DS and returning the top 10, then sorting by count and returning the top 10. I also posted the result in the comment. The result remains the same whether you use Pandas join or SQL join.

from superset.

zhaoyongjie avatar zhaoyongjie commented on June 9, 2024
  1. Not all databases support CTE
  2. Not all databases support JOINS

It's not only compatibility about CTE and JOIN, but also how to how to define time delta in different DB.

from superset.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.