Comments (7)

antgonza commented on August 12, 2024

I like pickle files; they're easy with pandas.

from gneiss.

wasade commented on August 12, 2024

Pickle is easy but Python-specific. TSV may be nice, as it allows easy interoperability between R and Python.

mortonjt commented on August 12, 2024

Good point.

Note that there are advantages to saving the raw Python object. Functions like predict cannot be serialized, since they operate directly on the RegressionResults object. Right now, it is possible to serialize individual attributes such as residuals, so this functionality is somewhat supported.

mortonjt commented on August 12, 2024

@wasade @antgonza it looks like pickle is not going to work, particularly with large trees. I get the following error with a tree of 14000 tips:

RecursionError: maximum recursion depth exceeded while pickling an object

I guess we'll want to invent a file format to store all of this information ...

wasade commented on August 12, 2024

Store the tree as a Newick string?
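
A minimal sketch of that idea, not gneiss or scikit-bio code: the `Node` class, `to_newick`, and `caterpillar` below are hypothetical names made up for illustration. The tree is walked with an explicit stack rather than recursion, so a deep tree does not run into the interpreter's recursion limit the way pickling a deeply nested object can. Quoting of special characters in names and branch lengths are left out.

```python
class Node:
    """Hypothetical minimal tree node: a name plus a list of children."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []


def to_newick(root):
    """Serialize a tree to a Newick string using an explicit stack."""
    out = []
    # Each stack entry is (node, index of the next child to visit).
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        if i == 0 and node.children:
            out.append("(")
        if i < len(node.children):
            if i > 0:
                out.append(",")
            # Come back to this node after the child has been written.
            stack.append((node, i + 1))
            stack.append((node.children[i], 0))
        else:
            if node.children:
                out.append(")")
            if node.name is not None:
                out.append(node.name)
    out.append(";")
    return "".join(out)


def caterpillar(n_tips):
    """Build a maximally unbalanced (deep) tree with n_tips tips."""
    node = Node("t0")
    for i in range(1, n_tips):
        node = Node(f"y{i}", [node, Node(f"t{i}")])
    return node


small = Node("y0", [Node("a"), Node("b")])
small_newick = to_newick(small)

# A 14000-tip caterpillar tree is about as deep as a tree of that size gets.
deep_newick = to_newick(caterpillar(14000))
```

The resulting string is plain text, so it can be written next to a TSV table and read back by any Newick parser in Python or R.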

mortonjt commented on August 12, 2024

Question @wasade: when it comes to converting objects to JSON, are there any limitations?

Right now, part of the problem is that Python objects such as OLSResults will need to be serialized. If so, then it may be doable to have a JSON parser on top of a pickle parser.

mortonjt commented on August 12, 2024

This is going to be increasingly necessary as we hook up more tree visualizations to the CLI.

Right now, I'm thinking about having two main types of file formats, namely Model and Results formats. The Model file formats are only responsible for storing the barebones model parameters, for example the coefficients in a regression model and the tree used to generate the balances. These models can then be used for making predictions.

The Results file format will be much larger, storing all of the model statistics and residuals. Maybe something like PyTables + HDF5 will be more suitable for handling these large tables.
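
To make the Model idea concrete, here is a rough sketch under stated assumptions, not a proposed spec: the layout (a `#tree` line holding the Newick string, followed by a TSV table of coefficients) and the names `write_model` and `read_model` are invented for illustration. It uses only the standard library; the larger Results tables are not covered.

```python
import csv
import io


def write_model(handle, newick, coefficients):
    """Write a hypothetical Model file: one '#tree' line, then a TSV table.

    coefficients maps covariate name -> {balance name: coefficient}.
    """
    handle.write(f"#tree\t{newick}\n")
    balances = sorted({b for row in coefficients.values() for b in row})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["covariate"] + balances)
    for covariate, row in coefficients.items():
        # repr() of a float round-trips exactly through float().
        writer.writerow([covariate] + [repr(row[b]) for b in balances])


def read_model(handle):
    """Read back the tree string and coefficient table written above."""
    tag, newick = handle.readline().rstrip("\n").split("\t", 1)
    if tag != "#tree":
        raise ValueError("missing #tree header line")
    reader = csv.reader(handle, delimiter="\t")
    balances = next(reader)[1:]
    coefficients = {}
    for fields in reader:
        values = (float(v) for v in fields[1:])
        coefficients[fields[0]] = dict(zip(balances, values))
    return newick, coefficients


# Round trip through an in-memory file with made-up example values.
example = {"Intercept": {"y0": 0.5, "y1": 2.0}, "ph": {"y0": -1.25, "y1": 0.1}}
buf = io.StringIO()
write_model(buf, "((a,b)y1,c)y0;", example)
buf.seek(0)
tree_string, loaded = read_model(buf)
```

Because the tree line starts with `#`, a TSV reader that treats `#` as a comment character can load the coefficient table on its own, which keeps the file usable outside Python.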
