It would be nice to be able to save the RegressionResults to a file, to avoid rerunnin

Make RegressionResults serializable about gneiss HOT 7 OPEN

mortonjt commented on August 12, 2024

Make RegressionResults serializable

from gneiss.

Comments (7)

antgonza commented on August 12, 2024

I like pickle files, easy with Pandas.

wasade commented on August 12, 2024

Pickle is easy but Python-specific. TSV may be nice, as it allows for easy interoperability between R and Python.

mortonjt commented on August 12, 2024

Good point.

Note that there are advantages to saving the raw Python object. Functions like predict cannot be serialized, since they operate directly on the RegressionResults object. Right now, it is possible to serialize individual attributes such as residuals, so this functionality is somewhat supported.

mortonjt commented on August 12, 2024

@wasade @antgonza it looks like pickle is not going to work, particularly with large trees. I get the following error with a tree with 14000 tips:

RecursionError: maximum recursion depth exceeded while pickling an object

I guess we'll want to invent a file format to store all of this information ...

wasade commented on August 12, 2024

Store the tree as a Newick string?
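A rough sketch of why this helps: a Newick string is flat text, so writing it with an explicit stack never touches the interpreter's recursion limit, however deep the tree is. The `Node` class, `to_newick`, and `caterpillar` below are hypothetical stand-ins for illustration, not the tree class that gneiss actually uses, and branch lengths and name quoting are left out.

```python
class Node:
    """Hypothetical minimal tree node (illustration only)."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []


def to_newick(root):
    """Serialize a tree to a Newick string using an explicit stack."""
    parts = []
    # Each stack entry is (node, index of the next child to visit).
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        if i == 0 and node.children:
            parts.append("(")          # opening a new internal node
        if i < len(node.children):
            if i > 0:
                parts.append(",")      # separator between siblings
            stack.append((node, i + 1))
            stack.append((node.children[i], 0))
        else:
            if node.children:
                parts.append(")")      # all children written
            parts.append(node.name or "")
    return "".join(parts) + ";"


def caterpillar(n_tips):
    """Build a maximally unbalanced tree, i.e. the deepest shape possible."""
    node = Node("t0")
    for k in range(1, n_tips):
        node = Node(children=[node, Node(f"t{k}")])
    return node


small = Node("r", [Node("c", [Node("a"), Node("b")]), Node("d")])
print(to_newick(small))

# A tree with 14000 tips nested about 14000 levels deep still serializes.
deep_newick = to_newick(caterpillar(14000))
```

The resulting string can be stored as an ordinary text field, so the rest of the results object no longer needs to carry the nested tree structure.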

mortonjt commented on August 12, 2024

Question @wasade: when it comes to converting objects to JSON, are there any limitations?

Right now, part of the problem is that some Python objects, such as OLSResults, will need to be serialized. If so, it may be doable to have a JSON parser on top of a pickle parser.
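One possible reading of "JSON on top of pickle", sketched under assumptions: JSON only holds strings, numbers, booleans, null, lists, and string-keyed objects, so anything else is pickled, base64-encoded, and embedded as a string. The function names and the `pickle_b64` key are made up for this sketch, and a plain dict stands in for a fitted object such as OLSResults.

```python
import base64
import json
import pickle


def dumps_hybrid(metadata, opaque_obj):
    """Store JSON-friendly metadata next to a base64-encoded pickle payload."""
    payload = base64.b64encode(pickle.dumps(opaque_obj)).decode("ascii")
    return json.dumps({"metadata": metadata, "pickle_b64": payload})


def loads_hybrid(text):
    """Inverse of dumps_hybrid; returns (metadata, opaque_obj)."""
    doc = json.loads(text)
    # Unpickling can execute arbitrary code, so only load trusted files.
    obj = pickle.loads(base64.b64decode(doc["pickle_b64"]))
    return doc["metadata"], obj


# Stand-in for a fitted model object (assumption: the real one pickles cleanly).
fit = {"params": [0.5, -1.2], "rsquared": 0.87}
text = dumps_hybrid({"tree": "((a,b)y1,c)y0;", "n_samples": 30}, fit)
metadata, restored = loads_hybrid(text)
```

The metadata stays readable from R or any other language, while the pickled part remains Python-specific and still inherits pickle's limits, including the recursion error on deeply nested objects.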

mortonjt commented on August 12, 2024

This is going to be increasingly necessary as we hook more tree visualizations up to the CLI.

Right now, I'm thinking about having two main types of file formats: Model and Results. The Model format is only responsible for storing the bare-bones model parameters, for example the coefficients in a regression model and the tree used to generate the balances. These models can then be used for making predictions.

The Results format will be much larger, storing all of the model statistics and residuals. Something like PyTables + HDF5 may be more suitable for handling these large tables.
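To make the Model idea concrete, here is a minimal sketch that assumes the tree is kept as a Newick string and the coefficients as a nested mapping. The JSON layout, the field names, and the helper functions are all hypothetical, not an existing gneiss format.

```python
import json


def save_model(path, tree_newick, coefficients):
    """Write the bare-bones model: the tree and the regression coefficients."""
    doc = {
        "format": "model-sketch",      # hypothetical tag, not a gneiss format
        "tree": tree_newick,           # tree stored as a Newick string
        "coefficients": coefficients,  # {balance: {covariate: value}}
    }
    with open(path, "w") as fh:
        json.dump(doc, fh, indent=2)


def load_model(path):
    """Read the model back as plain Python dicts."""
    with open(path) as fh:
        return json.load(fh)


def predict(model, covariates):
    """Predict each balance as a linear combination of the covariates."""
    return {
        balance: sum(coef * covariates[name] for name, coef in coefs.items())
        for balance, coefs in model["coefficients"].items()
    }
```

Because the file holds only the tree and the coefficients, it stays small and readable from other languages; the larger statistics and residual tables would live in the separate Results file.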
