Comments (11)
@eschutho BTW, I'll open the SIP as soon as possible in the week.
from superset.
In the above example, I am trying to fetch the top 10 years from the years 1989 - 2004 where the name Alex was the most popular and compare that count to the period 10 years prior.
@eschutho I think you misunderstand the "TOK N" analytics meaning, and what is "order by" and "limit" clause in SQL meaning.
The first screenshot from your post in the explore page will ask sort by the count, and then only retrieve the top 10 rows. If you really want to fetch the top 10 years, you should set ds
as value of SORT BY control.
from superset.
Hi, thanks @zhaoyongjie. Apologies, but the last query that I showed as the correct example I should have sorted by count instead of by date. That may be where the confusion came from. I updated it in my example. But essentially, I'm looking to sort by the count, so in this case in 1995 there were 12 instances, and that row should be first. I want the top ten years where the name had the highest count. This is the corrected version:
from superset.
so in this case in 1995 there were 12 instances, and that row should be first. I want the top ten years where the name had the highest count. This is the corrected version:
it should be another topic that how to sort the result set after fetched the data from database, there is a sort
operator in the post processing for it.
Your SQL have some issues:
- The
order by
andlimit
clause should combine together always, it means getting "top N" records from the table. The subquery from your post didn't provide limit clause in subquery. - The SQL and the user interface(Explore page) are inconsistent, "top ten years" was selected in the UI, then the SQL clause should be
order by ds limit 10
not only in subquery, but also in the outer query. - Significant extra computing, 10 records dataset join <---> unpredictable records.
As the mention before, if you really need a secondary sort by, the sort operator might help.
from superset.
- ... The subquery from your post didn't provide limit clause in subquery.
- Significant extra computing, 10 records dataset join <---> unpredictable records.
Including that in the subquery would truncate the set used to JOIN correct? Shouldn't we leverage the DBMSs optimizations when executing the query as a whole instead? Otherwise we end up having incomplete result sets if for example the LIMIT is too low, as as stated before data inconsistency is something we should never aim for.
To tackle this bug I would say that in general we have two limitations depending on which database are we working on:
- Not all databases support CTE
- Not all databases support JOINS
For the first one we would have to write something that check whether or not CTE are supported, if it does, make use of it, if not, write the query manually.
For the second, if the database supports joins we should ensure data consistency at pre-query execution with the outer joins, if not supported we should fallback to DataFrame joins at post processing time, but again, just as a fallback because data inconsistency might arise.
from superset.
@Antonio-RiveroMartnez
Upon rereading this issue, I believe the confusion lies in understanding the 'limit' control and the 'sort by' control in the UI, as well as how to apply them in SQL.
- The value of the 'limit' control is 10, and the 'sort by' control is 'DS'. This indicates sorting by the 'DS' column and then returning the top 10 records.
- The value of the 'limit' control is 10, and the 'sort by' control is 'count'. This indicates sorting by the 'count' column (metric) and then returning the top 10 records.
The logic is included in all visualizations. Currently, there isn't a control to specify how to sort a column as a secondary sort by. In other words, Superset doesn't have a way to define sorting by DS and returning the top 10, then sorting by count and returning the top 10. I also posted the result in the comment. The result remains the same whether you use Pandas join or SQL join.
from superset.
- Not all databases support CTE
- Not all databases support JOINS
It's not only compatibility about CTE and JOIN, but also how to how to define time delta in different DB.
from superset.
Related Issues (20)
- deck.gl Multiple Layers chart not rendering
- Failure to connecto to Trino HOT 4
- HTML attributes being eliminated - Customizing Colors in Markdown Widgets not working HOT 1
- The subtotal in the pivot table doesn't correct or the row item doesn't happen due to the pagination feature
- `generator-superset` plugin uses deprecated and out of sync packages with the rest of Superset
- Export to pivoted .csv: internal server error 500 (Apache Superset version 3.0.1)
- Sorting by last modified date doesn't always return the correct sorting order HOT 2
- Dashboards is not created and return to welcome page HOT 1
- Using Filter per value of Column of type BIGINT breaks UI when the value is long(16 digits)
- Apache Superset 3.0.2 Scheduled reports didn't run as expected as per the Report Schedule HOT 1
- Unable to create database connection, error: "GENERIC_DB_ENGINE_ERROR"
- API call (csstemplateasyncmodelview/api/read) fails with 401 in "Embedded Dashboard/IFrame" for token obtain by Admin User or any other user
- Disabled SSH_TUNNELING creating periodically many "idle in transaction" PGSQL connections
- Export to PDF leaves charts empty on first export
- local Superset with Helm HOT 2
- Export to Excel is not working for deck.gl charts
- [SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension HOT 7
- Time series bar chart doesn't show all time grain HOT 3
- parse error in select in cases of postgresSQL dialect HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from superset.