tixxit / framian
License: Apache License 2.0
This looks like an interesting project I was pointed to via Twitter. Would be nice to have a little introductory readme.
Thanks.
A common use case is to have secondary or even tertiary+ values/Cols that we're sorting by. It would be nice to allow sortBy to take multiple Cols. This is not equivalent to zipping the Cols first, since it would allow each Cols's values to be treated as Cells separately (e.g. an NA in the 3rd value wouldn't force the entire sort key to be NA).
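To make the difference from zipping concrete, here is a minimal standalone sketch. It does not use Framian's API: Cell, NA, Value, cellOrdering and the sample rows are all hypothetical stand-ins, and placing NA before every value is an arbitrary choice made for the example. Each component of the sort key is compared as its own cell, so a missing third key only affects tie-breaking.

```scala
object MultiKeySort {
  // Hypothetical stand-ins for cells; not Framian's actual Cell type.
  sealed trait Cell[+A]
  case object NA extends Cell[Nothing]
  final case class Value[A](get: A) extends Cell[A]

  // Orders cells so that NA sorts before any Value (an assumption of this sketch).
  def cellOrdering[A](implicit ord: Ordering[A]): Ordering[Cell[A]] =
    new Ordering[Cell[A]] {
      def compare(x: Cell[A], y: Cell[A]): Int = (x, y) match {
        case (NA, NA)             => 0
        case (NA, _)              => -1
        case (_, NA)              => 1
        case (Value(a), Value(b)) => ord.compare(a, b)
      }
    }

  // A three-part sort key in which every part may be missing independently.
  type Key = (Cell[String], Cell[Int], Cell[Int])

  // Lexicographic ordering: later keys are consulted only when earlier ones tie.
  val keyOrdering: Ordering[Key] =
    Ordering.Tuple3(cellOrdering[String], cellOrdering[Int], cellOrdering[Int])

  val rows: Vector[(String, Key)] = Vector(
    ("r1", (Value("b"), Value(1), NA)),
    ("r2", (Value("a"), Value(2), Value(9))),
    ("r3", (Value("b"), Value(1), Value(5)))
  )

  // r1 and r3 tie on the first two keys; the NA in r1's third key only breaks the tie.
  val sortedIds: Vector[String] = rows.sortBy(_._2)(keyOrdering).map(_._1)
}
```

With zipped keys, r1's whole key would collapse to NA and it would lose its position relative to r2; here it still sorts after r2 on the first key.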
@dwhjames made a first push by updating values to cells and denseValues to values. We should also update filterValues to filterCells, and add a new filterValues. I'm not sure if there are any others...
In a few places it would be nice to have a joinBy method:
trait Frame[Row, Col] {
  ...
  def joinBy[A: Order: ClassTag](by: Cols[Col, A])(that: Frame[A, Col]): Frame[Row, Col]
  ...
}
Or is there a recommended successor? This project was recommended as "the" Scala data frame library for local-machine use by a UK professor two years ago, so it would be of interest: https://darrenjw.wordpress.com/2015/08/21/data-frames-and-tables-in-scala/
Perhaps I missed it, but I couldn't see an easy way to do this. It would obviously be very useful.
It would be nice to add efficient from and to methods on Index and Series.
scala> val s: Series[String, Int] = Series("alice" -> 1, "branden" -> 2, "dan" -> 3, "tom" -> 4, "zed" -> 5)
scala> s.from("branden").to("tom")
Series("branden" -> 2, "dan" -> 3, "tom" -> 4)
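One way such range methods could be made efficient on a key-sorted structure is binary search for the bounds. The sketch below is standalone and only illustrates the idea: SortedSeries and partitionPoint are hypothetical names, not part of Framian, and the keys are assumed to already be in ascending order. Both bounds are inclusive, matching the example above.

```scala
object RangeSketch {
  // Hypothetical stand-in for a key-sorted series; `entries` must be sorted by key.
  final case class SortedSeries[K, V](entries: Vector[(K, V)])(implicit ord: Ordering[K]) {

    // Index of the first entry whose key fails `p`; assumes `p` holds on a prefix only.
    private def partitionPoint(p: K => Boolean): Int = {
      var lo = 0
      var hi = entries.length
      while (lo < hi) {
        val mid = (lo + hi) >>> 1
        if (p(entries(mid)._1)) lo = mid + 1 else hi = mid
      }
      lo
    }

    // Keeps entries with key >= k.
    def from(k: K): SortedSeries[K, V] =
      SortedSeries(entries.drop(partitionPoint(ord.lt(_, k))))

    // Keeps entries with key <= k.
    def to(k: K): SortedSeries[K, V] =
      SortedSeries(entries.take(partitionPoint(ord.lteq(_, k))))
  }

  val s: SortedSeries[String, Int] =
    SortedSeries(Vector("alice" -> 1, "branden" -> 2, "dan" -> 3, "tom" -> 4, "zed" -> 5))

  val example: Vector[(String, Int)] = s.from("branden").to("tom").entries
}
```

Each bound is located with a logarithmic number of key comparisons rather than a scan over every key.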
Series has the following methods:
def firstValue: Option[(K, V)] = {
  var i = 0
  while (i < index.size) {
    val row = index.indexAt(i)
    if (column.isValueAt(row))
      return Some(index.keyAt(i) -> column.valueAt(row))
    i += 1
  }
  None
}
def lastValue: Option[(K, V)] = {
  var i = index.size - 1
  while (i >= 0) {
    val row = index.indexAt(i)
    if (column.isValueAt(row))
      return Some(index.keyAt(i) -> column.valueAt(row))
    i -= 1
  }
  None
}
These are currently only used in one place:
def getFirstAndLastTopLevelMetricData(settings: ScaleSettings): EitherT[Future, MetricDataError, (List[Metric], Frame[Company, MetricColumn])] = {
  import MetricColumn.{ date, comparable }
  getTopLevelMetricData(settings) map { case (metadata, frame0) =>
    val metricKey = MetricColumn.Data(settings.metricsString(0))
    val frame = frame0.mapRowGroups { case (comp, group0) =>
      val group = group0.columns(date).groupAs[LocalDate]
      val values = group.column[Number](metricKey)
      val stripped = for {
        (date0, _) <- values.firstValue
        (date1, _) <- values.lastValue
      } yield {
        group.retainRows(date0, date1).columns(comparable).groupAs[Company]
      }
      stripped getOrElse Frame.empty[Company, MetricColumn]
    }
    (metadata, frame)
  }
}
My issue here is that these behave quite differently from the First and Last reducers.
NM
We could add a new method to Series:
def findAsc[B](f: (K, Column[V], Int) => Option[B]): Option[B] = {
  var i = 0
  while (i < index.size) {
    val row = index.indexAt(i)
    val res = f(index.keyAt(i), column, row)
    if (res.isDefined)
      return res
    i += 1
  }
  None
}
Then firstValue becomes equivalent to:
def findFirstValue: Option[(K, V)] =
  findAsc((key, col, row) =>
    if (column.isValueAt(row))
      Some(key -> column.valueAt(row))
    else
      None
  )
Similarly, we would have findDesc and findLastKeyVal.
I guess there is a penalty for abstracting these traversals to findAsc and findDesc.
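For discussion, the ascending and descending traversals might look like the following standalone sketch. It deliberately simplifies the proposal: SimpleSeries is a hypothetical type that models a missing value as None, so the callback receives the cell directly instead of a (Column[V], Int) pair as in the signature above.

```scala
object FindSketch {
  // Hypothetical simplified series; a missing value is modelled as None.
  final case class SimpleSeries[K, V](keys: Vector[K], cells: Vector[Option[V]]) {

    // Returns the first defined result of `f`, scanning from the first entry.
    def findAsc[B](f: (K, Option[V]) => Option[B]): Option[B] = {
      var i = 0
      while (i < keys.length) {
        val res = f(keys(i), cells(i))
        if (res.isDefined) return res
        i += 1
      }
      None
    }

    // Returns the first defined result of `f`, scanning from the last entry.
    def findDesc[B](f: (K, Option[V]) => Option[B]): Option[B] = {
      var i = keys.length - 1
      while (i >= 0) {
        val res = f(keys(i), cells(i))
        if (res.isDefined) return res
        i -= 1
      }
      None
    }

    // The two existing lookups expressed through the generic traversals.
    def firstValue: Option[(K, V)] = findAsc((k, c) => c.map(v => k -> v))
    def lastValue: Option[(K, V)] = findDesc((k, c) => c.map(v => k -> v))
  }

  val s: SimpleSeries[String, Int] =
    SimpleSeries(Vector("a", "b", "c", "d"), Vector(None, Some(1), Some(2), None))
}
```

The likely cost of the abstraction is a function call and an Option result for every visited entry, where the hand-written loops only allocate once a value is found.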
Thoughts?
It would be nice to have a foreach style method for iterating over all cells in a frame. Something like:
trait Frame[Row, Col] {
  ...
  def foreachCell[A: ColumnTyper, U](f: (Row, Col, Cell[A]) => U): Unit
  ...
}
When implementing this, we should ensure we traverse the rows/cols according to the Frame's orientation (e.g. if isColOriented is true, traverse column-by-column).
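As a rough illustration of orientation-aware traversal, here is a standalone sketch. SimpleFrame is a hypothetical dense frame invented for the example, cells are modelled as Option values, and only the isColOriented flag name is taken from the text above.

```scala
object TraversalSketch {
  // Hypothetical dense frame; `cell(r, c)` looks up a cell by row and column position.
  final class SimpleFrame[Row, Col, A](
      val rows: Vector[Row],
      val cols: Vector[Col],
      cell: (Int, Int) => Option[A],
      val isColOriented: Boolean) {

    // Visits every cell once, with the inner loop following the frame's orientation.
    def foreachCell[U](f: (Row, Col, Option[A]) => U): Unit =
      if (isColOriented)
        for (c <- cols.indices; r <- rows.indices) f(rows(r), cols(c), cell(r, c))
      else
        for (r <- rows.indices; c <- cols.indices) f(rows(r), cols(c), cell(r, c))
  }

  // Records the (row, col) visiting order for a 2x2 frame of either orientation.
  def visitOrder(colOriented: Boolean): Vector[(Int, String)] = {
    val frame = new SimpleFrame[Int, String, Int](
      Vector(0, 1), Vector("x", "y"), (r, c) => Some(r * 10 + c), colOriented)
    val buf = Vector.newBuilder[(Int, String)]
    frame.foreachCell((r, c, _) => buf += ((r, c)))
    buf.result()
  }
}
```

A column-oriented frame finishes all rows of one column before moving to the next, which keeps reads within a single column's storage.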
It would also be nice to then implement methods such as,
def toColOrientedMatrix[A: ColumnTyper: ClassTag](na: => A, nm: => A = ???): Array[Array[A]] = ???
def toRowOrientedMatrix[A: ColumnTyper: ClassTag](na: => A, nm: => A = ???): Array[Array[A]] = ???
but this can come later.
If I were to build a web front end over a Frame or Series, how would I go about giving both of them a common web format representation (e.g. JSON)?
Is there a converter to eventually convert a data frame to JSON?
When I do something like val frame = Frame.fromRows(company1, company2), I get a Frame[Int, Int]. Why does it not extract the column names from the case class field names?
Please update the shapeless version. I cannot use framian because the latest versions of shapeless make it crash.
I love the clean immutable design of this library, but when the row count of the input CSV rises, the implementation consumes large amounts of memory (and CPU). Using the Cars93.csv sample scaled up to 1M rows, loading the 152 MB CSV takes more than 7 GB of memory and more than 15 minutes of processing on a MacBook i7 with 16 GB of RAM and an SSD. For comparison, wc -l bigcars.csv takes less than 1 second.
On the plus side, running the example that filters and adds a column with calculations performs reasonably well: these operations take just a few seconds once the data is in memory.
Also, applying pull request #65 for issue #64 makes the run exceedingly long: it had not completed after more than 8 hours.
How can I set a frame's column labels from a given list of strings?