rotavirus's Introduction

audience

rotavirus's People

Contributors

deanla

rotavirus's Issues

low memory warning

When reading a DataFrame, there is a parameter called low_memory, which is set to True by default. Its function, as far as I can tell, is to decide the minimal data type required to fit the values of each column, which seems to be for memory optimization. To detect the correct data type, all values in a column need to be considered, which is not optimal for a big DataFrame for two reasons, I guess: memory and loading time. My assumption is that pandas optimizes for both, and that is why this parameter is True by default. I didn't dig into the implementation of the optimized version or how it detects data types (maybe it reads some chunk of the DataFrame and takes the minimal requirement).
The problem is that it sometimes gives unexpected results. I once spent a week running heavy calculations on chunks of data, hoping to assemble the results back together using an index that was definitely unique. But I didn't check one detail: the index had 8 digits at the beginning of the data and later grew to 16 digits (it was taken from a database whose primary key differed between versions). While reading the chunks, I was actually getting only the first 8 digits of the 16-digit index, because low_memory was True by default and not all index values were checked. In the end I had the results of the calculations with no way to reassemble them and merge them back into the original data.
I told such a long and dramatic story because the low_memory option is very strange: nobody takes it seriously, but it becomes critical in some cases.
So please consider this case and add a warning about it in dovpanda.
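
For reference, the pandas documentation for read_csv describes low_memory as processing the file internally in chunks, which lowers memory use while parsing but can lead to mixed type inference, and it suggests either setting low_memory=False or passing dtype. Below is a minimal sketch of the kind of guard that might have helped here, assuming the data comes from a CSV file read with pd.read_csv. The column names id and value, the toy data, and the helper mixed_type_columns are all made up for illustration; none of this is dovpanda's API, and the sketch does not try to reproduce the truncation described above.

```python
import io

import pandas as pd

# Toy CSV (hypothetical data): the key grows from 8 to 16 digits.
CSV_TEXT = "id,value\n12345678,1.5\n1234567890123456,2.5\n"

# Declaring the key column as a string up front means pandas does not
# have to infer its type, so every key is kept exactly as written.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"id": str}, low_memory=False)


def mixed_type_columns(frame):
    """Hypothetical helper: names of object columns holding several Python types."""
    found = []
    for name in frame.columns:
        column = frame[name]
        if column.dtype == object:
            # Collect the distinct Python types of the non-missing values.
            kinds = {type(value) for value in column.dropna()}
            if len(kinds) > 1:
                found.append(name)
    return found


# Cheap sanity checks before starting a long computation on the keys.
print(df["id"].is_unique)
print(df["id"].str.len().unique())
print(mixed_type_columns(df))
```

A check like mixed_type_columns could be the basis for the requested warning: it flags columns whose values ended up with more than one type after loading, which is one visible symptom of chunk-wise type inference.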
