Comments (20)
Hi! Can you provide a screenshot of the desired table and what you get with `pivot { col1 then col2 }.groupBy { row1 and row2 }`? What I see on the link:
Not sure if this is the desired table
from dataframe.
Hi,
With `pivot { "Party" }.groupBy { "Province" and "Gender" }.sum("Age")`
I want to get:
Using dataframe I get `Province` and `Gender` pivoted independently:
Alberta, Female, ...
.....
Alberta, Male, ...
@koperagen Maybe we need to have the function `then` in `ColumnsSelectionDsl`, like in `PivotDsl`. This way, my expression becomes: `pivot { "Party" }.groupBy { "Province" then "Gender" }.sum("Age")`
Thank you for the use case, it's very interesting! Maybe `df.pivot { Gender then Party }.groupBy { Province }.sum { Age }` does the thing? It yields a dataframe with this schema:
If I have to group only two columns, `df.pivot { Gender then Party }.groupBy { Province }.sum { Age }` does the thing.
But actually I'm using dataframe to add the Pivot Table feature to the Galite framework. It should allow grouping data by rows and by columns. How could I make the table below with dataframe?
I have gender and smoker grouped horizontally / day and time are grouped vertically.
Now that you mention horizontal / vertical grouping, the documentation (https://kotlin.github.io/dataframe/pivot.html) states:
df.pivot { pivotColumns }.groupBy { indexColumns }
pivotColumns — columns with values for horizontal data grouping and generation of new columns
indexColumns — columns with values for vertical data grouping
According to "I have gender and smoker grouped horizontally / day and time are grouped vertically", this code should work:
df.pivot { smoker then gender }.groupBy { day and time }
Does it work as you would expect?
Also, can you share this dataset if it's public?
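To make the horizontal / vertical split concrete, here is a minimal plain-Kotlin sketch that does not use the DataFrame library at all. It assumes a hypothetical `Tip` record with the four columns named above and uses row counting as the aggregation, which is an assumption, since the expression in the thread does not name one. The outer map key plays the role of the index columns (vertical grouping) and the inner map key plays the role of the pivot columns (horizontal grouping).

```kotlin
// Hypothetical record type for illustration; only the column names come from the thread.
data class Tip(val day: String, val time: String, val smoker: String, val gender: String)

// Rough stdlib analogue of pivot { smoker then gender }.groupBy { day and time }:
// outer key = (day, time) row index, inner key = (smoker, gender) pivoted column,
// cell value = number of matching rows (counting is an assumed aggregation).
fun pivotCount(rows: List<Tip>): Map<Pair<String, String>, Map<Pair<String, String>, Int>> =
    rows.groupBy { it.day to it.time }
        .mapValues { (_, rowGroup) ->
            rowGroup.groupBy { it.smoker to it.gender }
                .mapValues { (_, cell) -> cell.size }
        }
```

Note that every distinct (day, time) pair becomes its own flat row here, which is exactly the behaviour discussed in the following comments: nothing ties rows with the same `day` together.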
df.pivot { smoker then gender }.groupBy { day and time }
Gives:
Update: `df.pivot { day then time }.groupBy { gender and smoker }`
Gives:
My current workaround is to sort the values on the rows and try to span them:
df.pivot { smoker then gender }.groupBy { day and time }.sortBy { day and time }
which gives:
Update: my current workaround is to sort the values on the rows and try to span them:
df.pivot { day then time }.groupBy { gender and smoker }.sortBy { gender and smoker }
which gives:
I see the problem, yes. Can you provide the dataset so I can give it a try myself? It would be easier for me to figure it out in the REPL :) Or we can pick another dataset.
Hi, @hfazai. As far as I understand, you need hierarchical grouping of rows, so that `df.groupBy { a then b }` will return a `DataFrame` with a single row for every distinct `a` value, and every row will contain a nested `DataFrame` grouped by `b`. Currently `DataFrame` supports nesting only for columns, but not for rows. It is a good feature request, thank you! We'll consider it.
Currently this is the only way to perform something similar to nested rows:
I see the problem, yes. Can you provide the dataset so I can give it a try myself? It would be easier for me to figure it out in the REPL :) Or we can pick another dataset.
Try: val df = DataFrame.read("https://pivottable.js.org/examples/mps.csv")
Hi, @hfazai. As far as I understand, you need hierarchical grouping of rows, so that `df.groupBy { a then b }` will return a `DataFrame` with a single row for every distinct `a` value, and every row will contain a nested `DataFrame` grouped by `b`. Currently `DataFrame` supports nesting only for columns, but not for rows. It is a good feature request, thank you! We'll consider it.
Hi @nikitinas, yes exactly :)
Currently this is the only way to perform something similar to nested rows:
I'll try it, thanks!
Actually, I did a workaround with `sortBy { }` and it works fine. It would still be great to add this feature by adding `then` to `ColumnsSelectionDsl`, so that we can call `df.groupBy { a then b }`.
My current workaround is to sort the values on the rows and try to span them:
df.pivot { smoker then gender }.groupBy { day and time }.sortBy { day and time }
which gives:
I can't match this image with your code. According to the attached screenshot, you do
df.groupBy { gender and smoker }.pivot { day then time }.sortBy { gender and smoker }
And you want `df.groupBy { gender then smoker }.pivot { day and time }` to have all rows with the same `gender` stored consecutively (by default, without sorting) and also to merge equal values in `gender` vertically. Is that correct?
Yes, sorry, I was wrong: I copied the last code from my tests. I mean `df.groupBy { gender and smoker }.pivot { day then time }.sortBy { gender and smoker }`
Let's discuss implementation.
An obvious solution is to add `GroupedDataRow`, which will store the value for the first grouping key and a nested `DataFrame` with the other grouping keys. It will be similar to `FrameColumn` from my previous comment:
but will have a better presentation: nested DataFrames will be injected into the root table without expanders and without the `group` column.
This approach raises several questions:
- How many rows should `df.rowsCount()` return: 2 or 4?
- Can the `gender` and `party` columns be addressed directly as `df.gender`? If yes, will they have size 4 while `province` has size 2? This will break the current restriction that all columns in a `DataFrame` should have equal size.
- Should `df.sortBy { gender }` work across the whole table (breaking the original row grouping) or sort only within row groups?
- ...
I think that a fair implementation of row grouping may overcomplicate `DataFrame` usage. I assume that in most cases grouped rows are needed only for better visual representation, so we can just add some info to the `DataColumn` API with span ranges indicating subsequent values that should be rendered in a single merged cell. So internally nothing will change, and such a "merged" column will behave just like any other column. What do you think about it?
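To illustrate what such span ranges could look like, here is a minimal plain-Kotlin sketch. `spanRanges` is a hypothetical helper written for this example, not an existing or planned DataFrame function; it only shows how runs of consecutive equal values can be turned into index ranges that a renderer might merge into one cell.

```kotlin
// Hypothetical helper, not part of the DataFrame API: returns the index ranges of
// consecutive equal values, i.e. the cells a renderer could merge vertically.
fun <T> spanRanges(values: List<T>): List<IntRange> {
    val ranges = mutableListOf<IntRange>()
    var start = 0
    for (i in 1..values.size) {
        // Close the current run at the end of the list or when the value changes.
        if (i == values.size || values[i] != values[start]) {
            ranges.add(start until i)
            start = i
        }
    }
    return ranges
}
```

Because only adjacent equal values are merged, an unsorted column such as Female, Male, Female produces three single-cell ranges. The column data itself is untouched, which matches the idea that nothing changes internally.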
And we should also add `groupBy { a then b }`.
we can just add some info to the `DataColumn` API with span ranges indicating subsequent values that should be rendered in a single merged cell. So internally nothing will change, and such a "merged" column will behave just like any other column. What do you think about it?
I agree with your concerns about changing the internal implementation. The solution of adding extra information about the row grouping looks good to me.
I think that, in order to add span ranges indicating subsequent values that should be rendered in a single merged cell, `groupBy { a then b }` should sort the rows first.
I think that sorting should be performed explicitly, because in some cases the original order of rows must be preserved. We can add `.asc()` and `.desc()` modifiers in the column selector, similar to `sortBy`:
df.groupBy { a.asc() and b.desc() }
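The `groupBy { a.asc() and b.desc() }` syntax above is a proposal from this thread. As a sketch of the ordering it asks for, the following plain-Kotlin snippet (stdlib only, with a hypothetical `Row` type) performs the sort as a separate, explicit step: ascending by `a`, then descending by `b`.

```kotlin
// Hypothetical row type for illustration only.
data class Row(val a: String, val b: Int)

// Explicit ordering step: ascending by a, then descending by b.
// sortedWith returns a new list, so the original row order stays available.
fun sortForGrouping(rows: List<Row>): List<Row> =
    rows.sortedWith(compareBy<Row> { it.a }.thenByDescending { it.b })
```

Keeping the sort separate means callers who need the original order simply skip this step, which is the argument made above for not sorting implicitly.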
Sounds good 👍