Comments (6)

yohplala avatar yohplala commented on September 6, 2024

Hello,
I have updated the code so that anyone can run it in a terminal and reproduce the error. (The previous code did not work on its own because it needed a data file; I have now embedded an extract of that data in the code.)
Thanks in advance for any help and advice.
Bests,
Pierrot

from pystore.

yohplala avatar yohplala commented on September 6, 2024

[ADDITION]
OK, I first tested pandas' concat() function on its own (without pystore), and I don't get the error message.
Does that mean the problem comes from dask's dataframe handling?

The following code (pandas only, no pystore/dask/parquet) works:

import os
import pandas as pd

ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']

op_list = [7134.0, 7134.34, 7135.03, 7131.74]

GC = pd.DataFrame(list(zip(ts_list, op_list)), columns =['date', 'open'])

# Getting timestamps back into GC, and resolving it to UTC time
GC['date'] = pd.to_datetime(GC['date'], utc=True)

# Rename columns
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
    
# Set timestamp column as index
GC.set_index('Timestamp', inplace = True, verify_integrity = True)

combined = pd.concat([GC[:-1], GC[-1:]]).drop_duplicates(keep="last")

So the problem with pystore is still not solved.
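
A side note on the drop_duplicates(keep="last") call used above: pandas compares column values only and ignores the index, so two rows with different timestamps but identical values count as duplicates. The sketch below illustrates this with made-up timestamps and prices (they are not taken from the data in this thread), and shows index-based deduplication as one possible alternative. Whether that alternative is what pystore intends is an assumption.

```python
import pandas as pd

# Illustrative data only: two different timestamps share the same 'open'.
idx = pd.to_datetime(
    ["2019-12-22 08:40", "2019-12-22 08:45", "2019-12-22 08:50"], utc=True
)
df = pd.DataFrame({"open": [7134.0, 7134.0, 7135.03]}, index=idx)
df.index.name = "Timestamp"

# drop_duplicates() looks at column values and ignores the index,
# so the 08:40 row is dropped because the 08:45 row has the same 'open'.
by_value = df.drop_duplicates(keep="last")

# Deduplicating on the index keeps every distinct timestamp.
by_index = df[~df.index.duplicated(keep="last")]

print(len(by_value), len(by_index))
```

With real price data, repeated values at different timestamps are plausible, so it may be worth checking which behaviour is wanted before relying on either form.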

yohplala avatar yohplala commented on September 6, 2024

Hmm, I can't manage to reproduce the error in a standalone script without rewriting collection.py in depth.
I am stopping the investigation here. (I thought it might be an error in my dataframe formatting, which I could then have reported on Stack Overflow, or on the pandas or dask GitHub if it turned out to be dask-related.)
But I have no clue where the bug is without digging further into dask.

As this is not my priority at the moment, I will only use pystore's write() function. When I need to append data, I will combine it with pandas' concat() function and then call write() with overwrite=True.

I hope this problem in the Windows 10 environment can be solved. (I suspect that this error and the need to pass 'npartitions=item.data.npartitions' to the append() function may actually be linked.)

Have a good day,
Bests,
Pierrot

yohplala avatar yohplala commented on September 6, 2024

For those in the same situation, here is an ugly workaround; its logic is described in my comment above.

import os
import pandas as pd
import pystore

ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']

op_list = [7134.0, 7134.34, 7135.03, 7131.74]

GC = pd.DataFrame(list(zip(ts_list, op_list)), columns =['date', 'open'])

# Getting timestamps back into GC, and resolving it to UTC time
GC['date'] = pd.to_datetime(GC['date'], utc=True)

# Rename columns
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
    
# Set timestamp column as index
GC.set_index('Timestamp', inplace = True, verify_integrity = True)

# Connect to datastore (create it if not exist)
store = pystore.store('OHLCV')
# Access a collection (create it if not exist)
collection = store.collection('AAPL')
item_ID = 'EOD'
collection.write(item_ID, GC[:-1], overwrite=True)

# WORKAROUND
# Re-create an append function

item = collection.item(item_ID)
current = item.to_pandas()
combined = pd.concat([current, GC[-1:]]).drop_duplicates(keep="last")
collection.write(item_ID, combined, overwrite=True)
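
The workaround steps above could be wrapped in a small helper, sketched below. The function name append_by_overwrite is hypothetical and not part of pystore; it only relies on the collection.item(...).to_pandas() and collection.write(..., overwrite=True) calls already used in this comment, and it keeps the same drop_duplicates(keep="last") step.

```python
import pandas as pd


def append_by_overwrite(collection, item_id, new_rows):
    """Hypothetical helper (not a pystore API): emulate append() by
    reading the stored item, concatenating in pandas, and overwriting."""
    # Load the stored item fully into memory as a pandas DataFrame.
    current = collection.item(item_id).to_pandas()
    # Same combination step as in the workaround above.
    combined = pd.concat([current, new_rows]).drop_duplicates(keep="last")
    # Replace the stored item with the combined data.
    collection.write(item_id, combined, overwrite=True)
    return combined
```

With the objects defined above, the call would be append_by_overwrite(collection, item_ID, GC[-1:]). Note that this rewrites the whole item on every call, so it is only reasonable while the item fits comfortably in memory.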

Bests,

sdementen avatar sdementen commented on September 6, 2024

I think that https://github.com/ranaroussi/pystore/blob/master/pystore/collection.py#L181 should be
combined = dd.concat([current.to_pandas(), new]).drop_duplicates(keep="last")
instead of the current
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")

@ranaroussi could you confirm?

sdementen avatar sdementen commented on September 6, 2024

This is probably related to issue dask/dask#6925.
