I figured I would share some thoughts I have after trying PyBroker a bit.


Indicators and dataframes about pybroker HOT 5 CLOSED

edtechre commented on September 18, 2024

Indicators and dataframes


Comments (5)

rokups commented on September 18, 2024

Hmm, I did not think about using lambdas. Thank you for the example. I suppose this is solved then?


edtechre commented on September 18, 2024

First off, thank you for your thoughtful input @rokups, it is much appreciated. My thoughts are below:

> df['spread'] = df['high'] - df['low']
> df['spread_sma20'] = ta.SMA(df['spread'], 20)
> df['spread_sma40'] = ta.SMA(df['spread'], 40)
>
> This looks trivial on the surface, and of course it is nothing PyBroker cannot do, but it is actually very powerful.
>
> To achieve something like this in PyBroker, we would have to create custom indicator functions for spread_sma20 and spread_sma40. But then the spread column is calculated twice, which is wasted work.

PyBroker computes indicators in parallel using a process pool. To simplify this, the indicators are distributed across multiple processes for each ticker and indicator function pair. This means that there are no dependencies between indicators, making their computation easily parallelizable.
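
To make the "one task per ticker and indicator pair" idea concrete, here is a minimal sketch using only the standard library. It is not PyBroker's actual implementation: the names `INDICATORS`, `build_tasks`, `compute_task`, and `compute_all` are made up for illustration, and plain lists stand in for price arrays.

```python
import concurrent.futures as cf
from itertools import product


def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - period : i + 1]) / period)
    return out


def sma_2(values):
    return sma(values, 2)


def sma_3(values):
    return sma(values, 3)


# Hypothetical registry: indicator name -> function of one symbol's closes.
INDICATORS = {"sma_2": sma_2, "sma_3": sma_3}


def build_tasks(closes_by_symbol, indicators):
    """One task per (symbol, indicator) pair; no task depends on another."""
    return list(product(sorted(closes_by_symbol), sorted(indicators)))


def compute_task(symbol, name, closes):
    """Runs in a worker: sees one symbol's data and one indicator only."""
    return INDICATORS[name](closes)


def compute_all(closes_by_symbol, executor_cls=cf.ProcessPoolExecutor):
    """Fan the independent tasks out to a pool and collect results by pair."""
    results = {}
    with executor_cls() as pool:
        futures = {
            pool.submit(compute_task, symbol, name, closes_by_symbol[symbol]): (symbol, name)
            for symbol, name in build_tasks(closes_by_symbol, INDICATORS)
        }
        for future in cf.as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Because every task is self-contained, the pool can run them in any order. The flip side is the point raised in the quote above: an intermediate result such as spread cannot be handed from one task to another.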

If you need to share custom data between indicators, you can register a custom data column with PyBroker and then create your own DataSource class or pass your own DataFrame to PyBroker. The Creating a Custom DataSource notebook shows how to do this. In your example, you would calculate the spread column in your DataFrame and then register it using pybroker.register_columns. The custom column will then be made available on the BarData instance passed to your indicator function.
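
A rough sketch of that workflow, with the PyBroker-specific parts left out so it runs with just pandas and NumPy. The helpers `add_spread` and `spread_sma` are hypothetical names, and a `SimpleNamespace` stands in for the BarData instance an indicator function would receive.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def add_spread(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'spread' column, computed once."""
    out = df.copy()
    out["spread"] = out["high"] - out["low"]
    return out


def spread_sma(data, period: int) -> np.ndarray:
    """SMA of the shared 'spread' column; NaN until `period` bars exist.

    `data` is anything exposing a `spread` array, which is how the custom
    column is described as appearing on BarData once it is registered.
    """
    spread = np.asarray(data.spread, dtype=float)
    out = np.full(len(spread), np.nan)
    if len(spread) >= period:
        kernel = np.ones(period) / period
        out[period - 1 :] = np.convolve(spread, kernel, mode="valid")
    return out


df = add_spread(
    pd.DataFrame({"high": [11.0, 12.0, 14.0, 13.0], "low": [10.0, 10.0, 11.0, 12.0]})
)
# Stand-in for the BarData object passed to an indicator function.
bar_data = SimpleNamespace(spread=df["spread"].to_numpy())
sma2 = spread_sma(bar_data, 2)
```

With PyBroker in the picture, the column would be registered with pybroker.register_columns('spread') before the DataFrame is handed over, and each indicator would presumably be a thin wrapper along the lines of pybroker.indicator('spread_sma20', lambda data: spread_sma(data, 20)). The spread is then computed once and only read by both indicators.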

> It is also rather cumbersome to use indicator libraries like ta-lib or pandas_ta. These libraries already provide one-function-call indicators that we must now wrap in another function to make them work with PyBroker.

I am considering creating a wrapper around ta-lib. You should already be able to use pandas_ta by using a custom data source and registering custom columns, as explained previously. Perhaps I can add an example of pandas_ta to the custom DataSources notebook.

> This DataFrame needs to be split anyway. Besides, merging the DataFrames of different symbols puts a burden on the user, who must make sure that the DataFrames of all queried symbols are of equal length and must merge them properly in case there are missing candles. If everyone has to do it, it might as well be done in the library.
> Then, if the DataFrames were separate, we could also have a user-implemented indicators_fn(df), in the same spirit as exec_fn, which would allow massaging the DataFrame in any way we see fit and utilizing the full power of pandas.

Creating multiple DataFrames would introduce extra overhead and complexity. External APIs for historical data are designed to return a single DataFrame to maintain simplicity and performance. However, a bigger concern is that having multiple DataFrames may not parallelize efficiently across multiple processes due to memory limitations and would also severely slow down serialization given PyBroker's current implementation. On the other hand, NumPy arrays can be mem-mapped across processes with ease and can be accelerated using Numba.
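
To illustrate the memory-mapping point, here is a small sketch with numpy.memmap. It is only an approximation of the idea, not how PyBroker shares data internally; `write_shared` and `worker_mean` are hypothetical helpers, and the "worker" is called in the same process to keep the example short.

```python
import os
import tempfile

import numpy as np


def write_shared(values: np.ndarray, path: str) -> None:
    """Write the array to a file-backed memory map."""
    mm = np.memmap(path, dtype=values.dtype, mode="w+", shape=values.shape)
    mm[:] = values
    mm.flush()
    del mm  # release the mapping so the file can be reopened or removed


def worker_mean(path: str, dtype: str, shape: tuple) -> float:
    """What a worker process could do: map the file read-only and compute.

    Only the path, dtype, and shape need to be sent to the worker; the
    price data itself is not pickled or copied.
    """
    view = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    return float(view.mean())


closes = np.array([100.0, 101.0, 103.0, 102.0])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "closes.dat")
    write_shared(closes, path)
    mean_close = worker_mean(path, "float64", closes.shape)
```

A DataFrame has no equally cheap equivalent, since it generally has to be serialized as a whole before it can cross a process boundary.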

> There is one special case where my proposed approach is not good enough: pairs trading. We need the price data of two symbols in order to calculate the necessary metrics.

You can retrieve the indicator of another symbol using ExecContext#indicator(), as well as OHLCV + custom column data with ExecContext#foreign().
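
As a rough illustration of those two calls inside an execution function: the ticker "SPY", the indicator name "rsi_20", and the trading rule are placeholders, and `FakeContext` is a made-up stand-in so the sketch runs without PyBroker installed.

```python
def exec_fn(ctx):
    """Sketch of an execution function that reads another symbol's data."""
    spy_close = ctx.foreign("SPY", "close")   # bar data of another symbol
    spy_rsi = ctx.indicator("rsi_20", "SPY")  # indicator of another symbol
    own_rsi = ctx.indicator("rsi_20")         # indicator of the current symbol
    if len(spy_close) < 2:
        return  # not enough bars yet to compare closes
    # Toy rule: buy when this symbol is oversold while SPY is rising
    # and not oversold itself.
    if (
        not ctx.long_pos()
        and own_rsi[-1] < 30
        and spy_rsi[-1] >= 30
        and spy_close[-1] > spy_close[-2]
    ):
        ctx.buy_shares = 100


class FakeContext:
    """Minimal stand-in for ExecContext, for illustration only."""

    def __init__(self):
        self.buy_shares = None

    def foreign(self, symbol, col=None):
        return [400.0, 402.0]

    def indicator(self, name, symbol=None):
        return [55.0, 60.0] if symbol == "SPY" else [35.0, 25.0]

    def long_pos(self):
        return None


ctx = FakeContext()
exec_fn(ctx)
```

This covers reading a second symbol's values at execution time. It does not help when the indicator itself has to be computed from two symbols, which is the case discussed next.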

I agree that support for multi-symbol indicators would make sense. It is something that I considered during the design phase, but I limited the implementation to single-symbol indicators for the sake of simplicity in the initial release (V1). I need to give this more thought, but my plan would be to add support for multi-symbol indicators as a configuration option that groups data for all symbols per indicator. If you have any suggestions, please let me know. In the meantime, you can calculate the multi-symbol indicator outside of PyBroker, save it to a DataFrame column, and then register the custom column with PyBroker.
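
A minimal sketch of that interim workaround, assuming a long-format DataFrame with symbol, date, and close columns. The helper `add_pair_ratio` and the column name `pair_ratio` are made up for illustration; the ratio of two closes is just one example of a pairs metric.

```python
import pandas as pd


def add_pair_ratio(df, sym_a, sym_b, col="pair_ratio"):
    """Add close(sym_a) / close(sym_b), aligned on date, to both symbols' rows.

    Dates where either symbol has no candle get NaN instead of a
    misaligned value.
    """
    closes = df.pivot(index="date", columns="symbol", values="close")
    ratio = closes[sym_a] / closes[sym_b]
    out = df.copy()
    out[col] = out["date"].map(ratio)
    # Rows of unrelated symbols do not carry the pair metric.
    out.loc[~out["symbol"].isin([sym_a, sym_b]), col] = float("nan")
    return out


dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
df = pd.DataFrame(
    {
        "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB"],
        # BBB has no candle on 2024-01-02.
        "date": [dates[0], dates[1], dates[2], dates[0], dates[2]],
        "close": [10.0, 12.0, 15.0, 5.0, 3.0],
    }
)
out = add_pair_ratio(df, "AAA", "BBB")
```

The resulting column would then be registered with pybroker.register_columns('pair_ratio'), as described above, so that it is available on the bar data for both symbols.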


rokups commented on September 18, 2024

Hmm, what you say does make sense...

> I am considering creating a wrapper around ta-lib

Here is a little help on that: talibgen.py.txt

This is an updated and fixed script from TA-Lib/ta-lib-python#212, should simplify the process.


edtechre commented on September 18, 2024

Great, thank you!


edtechre commented on September 18, 2024

After reviewing TA-Lib again, I am unsure if creating a wrapper for it adds significant value. It's already fairly straightforward to integrate TA-Lib with PyBroker by using lambdas as shown in the following example:

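import pybroker
import talib

# Wrap a TA-Lib function in a lambda to register it as a PyBroker indicator.
rsi_20 = pybroker.indicator('rsi_20', lambda data: talib.RSI(data.close, timeperiod=20))
# df is assumed to be a DataFrame of bar data; calling the indicator computes its values.
rsi_20(df)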

I added this example to the Writing Indicators notebook.

