Comments (5)

flamby commented on July 30, 2024

It seems one has to retrieve `npartitions` from the original Dask dataframe and pass it to `append`, so I fixed it this way:

```python
collection.append(symbol, df_diff, npartitions=item.data.npartitions)
```

Will it work every time?
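One way to read this workaround is as a fallback order for the partition count: use an explicit value if the caller gave one, otherwise reuse the stored item's count, otherwise default to 1. Below is a minimal sketch using a hypothetical helper, `choose_npartitions`, which is not part of PyStore or Dask. The `existing` argument stands in for `item.data.npartitions`.

```python
from typing import Optional


def choose_npartitions(requested: Optional[int], existing: Optional[int]) -> int:
    """Hypothetical helper: return a partition count that is never None.

    requested -- the npartitions value passed to append(), possibly None
    existing  -- npartitions of the stored Dask frame (e.g. item.data.npartitions),
                 or None if it is unknown
    """
    if requested is not None:
        return requested   # caller's explicit choice wins
    if existing is not None:
        return existing    # the workaround above: reuse the stored layout
    return 1               # last resort, as suggested later in the thread


print(choose_npartitions(None, 4))     # 4
print(choose_npartitions(2, 4))        # 2
print(choose_npartitions(None, None))  # 1
```

As for "every time": passing an explicit value avoids the error. However, PyStore's `append` only recomputes `npartitions` from memory usage when it is `None`. With this workaround, the partition count stays fixed as data accumulates.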

from pystore.

JugglingNumbers commented on July 30, 2024

The problem is that dd.from_pandas() checks:
```python
if (npartitions is None) == (chunksize is None):
    raise ValueError("Exactly one of npartitions and chunksize must be specified.")
```

So when the `append` function calls `dd.from_pandas(df, npartitions=None)`, it raises the error, but calling `dd.from_pandas(df, npartitions=None, chunksize=100000)` works. Presumably Dask intends `npartitions=1` as its default, even though the API docs say `npartitions` is optional and don't list a default.
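The quoted check can be reproduced without Dask to see which argument combinations fail. Below is a minimal stdlib sketch. `validate_partition_args` and `raises` are hypothetical names that mirror only the quoted condition, not Dask's full `from_pandas`.

```python
def validate_partition_args(npartitions=None, chunksize=None):
    # Mirrors the quoted check: exactly one of the two arguments must be given.
    if (npartitions is None) == (chunksize is None):
        raise ValueError("Exactly one of npartitions and chunksize must be specified.")


def raises(**kwargs) -> bool:
    """Return True if validate_partition_args rejects these arguments."""
    try:
        validate_partition_args(**kwargs)
    except ValueError:
        return True
    return False


print(raises(npartitions=None))                    # True: neither argument set
print(raises(npartitions=None, chunksize=100000))  # False: only chunksize set
print(raises(npartitions=1))                       # False: only npartitions set
print(raises(npartitions=1, chunksize=100000))     # True: both set
```

Supplying both arguments fails as well, so any fix must pass exactly one of them.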

The code below is what needs to be tweaked. The `new` variable could be created with `npartitions=1` (`new = dd.from_pandas(data, npartitions=1)`). This value is superseded by the passed (or computed) value once the dataframes are combined. I'm willing to bet Ran comes up with a more elegant solution, though.

```python
# excerpt from PyStore's collection append() method (not standalone)
# combine old dataframe with new
current = self.item(item)
new = dd.from_pandas(data, npartitions=npartitions)  # raises ValueError when npartitions is None
# combined = current.data.append(new)
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

if npartitions is None:
    memusage = combined.memory_usage(deep=True).sum()
    if isinstance(combined, dd.DataFrame):
        memusage = memusage.compute()
    npartitions = int(1 + memusage // DEFAULT_PARTITION_SIZE)
```
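When `npartitions` is `None`, the excerpt sizes the combined frame as one partition plus one for each full `DEFAULT_PARTITION_SIZE` of deep memory usage. Below is a small worked sketch of that formula. The 100 MiB partition size is an assumed value for illustration only; PyStore defines its own `DEFAULT_PARTITION_SIZE`.

```python
# Assumed value for illustration only; PyStore defines its own DEFAULT_PARTITION_SIZE.
MiB = 1024 ** 2
DEFAULT_PARTITION_SIZE = 100 * MiB  # bytes


def partitions_for(memusage: int, partition_size: int = DEFAULT_PARTITION_SIZE) -> int:
    """Same formula as the excerpt: 1 + number of whole partition_size chunks."""
    return int(1 + memusage // partition_size)


for size_mib in (0, 50, 100, 250):
    print(size_mib, "MiB ->", partitions_for(size_mib * MiB), "partition(s)")
```

Because of the floor division, a frame of exactly one partition size already gets two partitions. The result is always at least 1, which is why the computed value can safely stand in for `None`.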

viveksethu commented on July 30, 2024

Thank you @flamby!

This fix works. Thanks for sharing it and saving others time.

The PyStore notebook demo also only works with this fix; otherwise it throws an error:

ValueError: Exactly one of npartitions and chunksize must be specified.

Many thanks to @ranaroussi for this wonderful library.

XBKZ commented on July 30, 2024

Thank you to @ranaroussi for this nice library, and thank you to @flamby for fixing this nasty bug in the Windows 10 environment! I had exactly the same message ("Exactly one of npartitions and chunksize must be specified"), and appending was impossible. Now it works. Thank you again.

yohplala commented on July 30, 2024

Hello, same here (Win10 environment)!
Thanks for the fix @flamby !
