Comments (6)
Hello,
I have updated the code so that anyone can run it in a terminal and reproduce the error. (The previous code did not work on its own because it needed a data file; I have now embedded an extract of that data directly in the code.)
Thanks in advance for any help and advice.
Bests,
Pierrot
from pystore.
[ADDITION]
OK, I first tested pandas' concat() function on its own (without pystore), and I do not get the error message.
Would that mean the trouble comes from how the dask dataframe is handled?
The following code (pandas only, no pystore/dask/parquet) works:
import pandas as pd

ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']
op_list = [7134.0, 7134.34, 7135.03, 7131.74]
GC = pd.DataFrame(list(zip(ts_list, op_list)), columns=['date', 'open'])
# Parse the timestamps in GC and convert them to UTC
GC['date'] = pd.to_datetime(GC['date'], utc=True)
# Rename columns
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
# Set the timestamp column as index
GC.set_index('Timestamp', inplace=True, verify_integrity=True)
# Concatenate all rows but the last with the last row, then drop duplicate rows
combined = pd.concat([GC[:-1], GC[-1:]]).drop_duplicates(keep="last")
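A side note on the last line, in case it helps anyone reading: pandas' `drop_duplicates()` compares column values only and ignores the index. With a single `open` column, two bars with the same price at different timestamps would therefore be collapsed into one. The sketch below uses made-up prices and a `date_range` index (not the data above) to show this, next to an index-based alternative built on `Index.duplicated()`. Whether the index-based variant is the right behaviour for pystore is only an assumption here.

```python
import pandas as pd

# Hypothetical data: the 1st and 3rd bars share the same 'open' value.
idx = pd.date_range("2019-12-22 08:40", periods=4, freq="5min",
                    tz="UTC", name="Timestamp")
df = pd.DataFrame({"open": [7134.0, 7134.34, 7134.0, 7131.74]}, index=idx)

combined = pd.concat([df[:-1], df[-1:]])

# Value-based de-duplication: the index is ignored, so one bar is lost.
by_value = combined.drop_duplicates(keep="last")

# Index-based de-duplication: only repeated timestamps would be dropped.
by_index = combined[~combined.index.duplicated(keep="last")]

print(len(by_value), len(by_index))
```

With the data from the comment above all four prices differ, so both variants keep every row.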
Problem is not solved.
Hmm, it seems I cannot reproduce the error in a standalone script without rewriting collection.py in depth.
I am stopping the investigation here. (I thought it might be an error in my dataframe formatting, which I could then have reported on Stack Overflow, or on the pandas or dask GitHub if it turned out to be dask related.)
But I have no clue where the bug is without digging further into dask.
As this is not my priority at the moment, I will only use pystore's write() function. When I have to append data, I will do it with pandas' concat() function and then call write() with overwrite=True.
I hope this trouble in the Windows 10 environment can be solved. (I suspect that this error and the need to pass 'npartitions=item.data.npartitions' to the append() function may actually be linked.)
Have a good day,
Bests,
Pierrot
For those in the same situation, here is an ugly workaround; its logic is described in my comment above.
import pandas as pd
import pystore

ts_list = ['Sun Dec 22 2019 07:40:00 GMT-0100',
           'Sun Dec 22 2019 07:45:00 GMT-0100',
           'Sun Dec 22 2019 07:50:00 GMT-0100',
           'Sun Dec 22 2019 07:55:00 GMT-0100']
op_list = [7134.0, 7134.34, 7135.03, 7131.74]
GC = pd.DataFrame(list(zip(ts_list, op_list)), columns=['date', 'open'])
# Parse the timestamps in GC and convert them to UTC
GC['date'] = pd.to_datetime(GC['date'], utc=True)
# Rename columns
GC.rename(columns={'date': 'Timestamp'}, inplace=True)
# Set the timestamp column as index
GC.set_index('Timestamp', inplace=True, verify_integrity=True)
# Connect to the datastore (created if it does not exist)
store = pystore.store('OHLCV')
# Access a collection (created if it does not exist)
collection = store.collection('AAPL')
item_ID = 'EOD'
# Write all rows but the last one
collection.write(item_ID, GC[:-1], overwrite=True)
# WORKAROUND
# Re-create an append: read the item back, concatenate in pandas, overwrite
item = collection.item(item_ID)
current = item.to_pandas()
combined = pd.concat([current, GC[-1:]]).drop_duplicates(keep="last")
collection.write(item_ID, combined, overwrite=True)
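The same workaround can be wrapped in a small helper, shown here only as a sketch. `append_via_pandas` is a hypothetical name and is not part of pystore. It relies only on the calls already used above (`collection.item()`, `item.to_pandas()` and `collection.write(..., overwrite=True)`), and it rewrites the whole item on every call, so it is likely to be slow for large items.

```python
import pandas as pd


def append_via_pandas(collection, item_id, new_rows):
    """Hypothetical helper (not part of pystore): append by rewriting the item.

    Reads the stored item back into pandas, concatenates the new rows,
    drops duplicate rows, and overwrites the item with the result.
    """
    current = collection.item(item_id).to_pandas()
    combined = pd.concat([current, new_rows]).drop_duplicates(keep="last")
    collection.write(item_id, combined, overwrite=True)
    return combined
```

With the objects defined above, the call would be `append_via_pandas(collection, item_ID, GC[-1:])`.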
Bests,
I think that https://github.com/ranaroussi/pystore/blob/master/pystore/collection.py#L181 should read
combined = dd.concat([current.to_pandas(), new]).drop_duplicates(keep="last")
instead of the current
combined = dd.concat([current.data, new]).drop_duplicates(keep="last")
@ranaroussi, could you confirm?
This is probably related to issue dask/dask#6925.