thesps / conifer
Fast inference of Boosted Decision Trees in FPGAs
License: Apache License 2.0
Currently, AdaBoost models from scikit-learn [1] are implicitly ignored by conifer's sklearn converter [2]. Neither of the existing BDT or RF converter methods works "out of the box" for AdaBoostClassifier, which names some attributes differently from the other ensemble classes, so a new converter method will be needed. Since the base estimators are DecisionTrees, it should not be a major addition.
[1] https://github.com/scikit-learn/scikit-learn/blob/0.21.3/sklearn/ensemble/weight_boosting.py#L292
[2] https://github.com/thesps/conifer/blob/master/conifer/converters/sklearn.py#L42-L46
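A minimal sketch of what such a converter could extract, assuming scikit-learn 0.24 or later (for `n_features_in_`). It uses only the public `estimators_`, `estimator_weights_` and `tree_` attributes. The function names and the output dict layout are hypothetical and are not conifer's internal format. Scaling leaf values by the estimator weight is only one possible choice. Depending on the scikit-learn version, AdaBoost combines trees with SAMME or SAMME.R, so check how the weights should enter against `decision_function`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier


def tree_to_dict(estimator, weight):
    """Flatten one fitted DecisionTreeClassifier into plain lists (hypothetical layout)."""
    t = estimator.tree_
    return {
        "feature": t.feature.tolist(),              # split feature per node (negative at leaves)
        "threshold": t.threshold.tolist(),          # split threshold per node
        "children_left": t.children_left.tolist(),  # -1 at leaves
        "children_right": t.children_right.tolist(),
        # t.value has shape (n_nodes, n_outputs, n_classes): keep output 0 and
        # fold the AdaBoost estimator weight into the per-class leaf scores.
        "value": (t.value[:, 0, :] * weight).tolist(),
    }


def convert_adaboost(clf):
    """Collect every boosted tree together with its estimator weight."""
    trees = [tree_to_dict(est, w) for est, w in zip(clf.estimators_, clf.estimator_weights_)]
    return {
        "n_trees": len(trees),
        "n_features": int(clf.n_features_in_),
        "n_classes": int(clf.n_classes_),
        "trees": trees,
    }


X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = AdaBoostClassifier(n_estimators=5, random_state=0).fit(X, y)
ensemble = convert_adaboost(clf)
print(ensemble["n_trees"], ensemble["n_features"], ensemble["n_classes"])
```

The `zip` stops at the number of fitted trees, so this also copes with boosting that terminates early.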
Currently, the VHDL backend generates ModelSim scripts to simulate the conifer model. Since many users may not have access to ModelSim, it would be beneficial to also be able to generate scripts for the Vivado simulator.
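A rough sketch of how a backend could emit an equivalent script for the Vivado simulator (xsim). It uses the usual three steps: `xvhdl` compiles, `xelab` elaborates and `xsim` runs. The function name, the `work` library, the snapshot name and the example file list are illustrative assumptions, not what conifer generates. Check the exact flags against the Vivado logic simulation guide for your version. Files must be listed in dependency order.

```python
from pathlib import Path


def write_xsim_script(vhdl_files, top, out_path, library="work", snapshot="tb_sim"):
    """Write a shell script that compiles, elaborates and runs `top` with xsim.

    vhdl_files must be in compile (dependency) order. All names are
    illustrative; adapt them to the files the VHDL backend actually writes.
    """
    lines = ["#!/bin/sh", "set -e"]
    for f in vhdl_files:
        lines.append(f"xvhdl --work {library} {f}")                     # analyse each file
    lines.append(f"xelab -debug typical {library}.{top} -s {snapshot}")  # elaborate the top
    lines.append(f"xsim {snapshot} -R")                                 # run all, then exit
    text = "\n".join(lines) + "\n"
    Path(out_path).write_text(text)
    return text
```

For example, `write_xsim_script(["Constants.vhd", "Types.vhd", "BDTTop.vhd"], "testbench", "sim_xsim.sh")` would write a script next to the existing ModelSim ones. Here `testbench` is a hypothetical top-level name.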
When trying to run the ONNX conversion on a model trained in XGBoost, I encounter the following error:
Traceback (most recent call last):
File "/home/cebrown/Documents/Tracker/NewKF/Firmware/work/src/l1tk-for-emp/tq/scripts/conifer_convert.py", line 40, in <module>
hdl_model = conifer.model(bdt_model, conifer.converters.onnx, conifer.backends.vhdl, cfg)
File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/model.py", line 11, in __init__
self._ensembleDict = converter.convert(bdt)
File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 29, in convert
return convert_bdt(onnx_clf)
File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 8, in convert_bdt
treelist,max_depth,base_values,no_features,no_classes=convert_graph(onnx_clf)
File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 32, in convert_graph
if(onnx_clf.graph.node[1].name=='ZipMap'):
IndexError: list index (1) out of range
Digging into the ONNX model itself shows that the graph structure of the XGBoost-converted model is different. The fundamental difference seems to be the producer. For an sklearn-converted model, as in the unit tests, the header is:
ir_version: 4 producer_name: "skl2onnx" producer_version: "1.9.2" domain: "ai.onnx" model_version: 0 doc_string: ""
And for an xgboost model:
ir_version: 7 producer_name: "OnnxMLTools" producer_version: "1.7.0" domain: "onnxconverter-common" model_version: 0 doc_string: ""
It seems that ONNX models are not all created equal. While the graph structure is broadly similar, there are some key differences that break the current conversion code.
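The traceback shows the converter indexing `onnx_clf.graph.node[1]` by position. One way to handle both layouts is to look nodes up by `op_type` instead. This is a minimal sketch that assumes the graph contains exactly one ONNX-ML `TreeEnsembleClassifier` or `TreeEnsembleRegressor` node. The helper names are hypothetical. `NodeProto.op_type` is part of the ONNX protobuf API, so the functions accept a loaded `onnx.ModelProto`.

```python
TREE_OPS = ("TreeEnsembleClassifier", "TreeEnsembleRegressor")


def find_tree_node(model):
    """Return the tree-ensemble node of an ONNX model, wherever it sits in the graph."""
    matches = [n for n in model.graph.node if n.op_type in TREE_OPS]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one tree-ensemble node, found {len(matches)}")
    return matches[0]


def has_zipmap(model):
    """True if the graph contains a ZipMap node, without assuming its position."""
    return any(n.op_type == "ZipMap" for n in model.graph.node)
```

With a real file this would be called as `find_tree_node(onnx.load(path))`.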
Hi,
Synthesis time seems to be very long, especially because the interval requirements for the pipelining are not met and Vitis has to guess the right parameters (which it does badly).
I have been trying to use this tool to compile an XGBoost GBDT with 700 trees, 40 features and 1,000,000 data points, using the following parameters:
{
'objective': 'binary:logistic',
'grow_policy': 'lossguide',
'eval_metric': 'rmse',
'eta': 1,
'gamma': 1,
'alpha': 1,
'max_leaves': 7,
'max_depth': 3,
'seed': 0,
'nthread': 16
}
Compiling the resulting HLS code without modifying it fails because of the optimizations. After adjusting the optimizations, synthesis takes hours. The same happened with a similar GBDT with only 50 trees, without modifying the HLS code.
Is there something I might be doing wrong?
Is this library recommended for such "big" models?
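To get a sense of scale, we can count what the two models above contain. XGBoost trees are full binary trees, so a tree with L leaves has L - 1 split nodes, and `max_leaves` caps L below the 2^max_depth leaves of a full tree of that depth. These numbers are only upper bounds on model size. They do not predict HLS resource usage or synthesis time.

```python
def ensemble_upper_bounds(n_trees, max_depth, max_leaves=None):
    """Upper bounds on leaves and split nodes for an ensemble of full binary trees."""
    leaves = 2 ** max_depth                  # leaves of a full tree of depth max_depth
    if max_leaves is not None:
        leaves = min(leaves, max_leaves)     # lossguide cap on leaves per tree
    splits = leaves - 1                      # full binary tree: split nodes = leaves - 1
    return {"leaves": n_trees * leaves, "splits": n_trees * splits}


print(ensemble_upper_bounds(700, max_depth=3, max_leaves=7))
# {'leaves': 4900, 'splits': 4200}
print(ensemble_upper_bounds(50, max_depth=3, max_leaves=7))
```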
The implementation of the summation over tree scores in the HLS backend prevents automatic expression balancing when saturation and rounding are used in the score precision type:
Trees:
for(int i = 0; i < n_trees; i++){
Classes:
for(int j = 0; j < fn_classes(n_classes); j++){
score[j] += scores[i][j];
}
}
This should be replaced with a balanced tree reduce implementation.
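A balanced reduction adds scores pairwise, so the adder depth grows as ceil(log2(n_trees)) instead of n_trees - 1 for the sequential loop above. The sketch below is a Python model of that summation order with a counter for the adder depth. It is not HLS code. An HLS version could express the same halving as a recursive template function.

```python
import math


def balanced_reduce(values, op=lambda a, b: a + b):
    """Reduce values by recursively splitting the list in half.

    Returns (result, depth), where depth is the number of adder stages on the
    longest path, i.e. the latency a pipelined adder tree would need.
    """
    n = len(values)
    if n == 0:
        raise ValueError("cannot reduce an empty list")
    if n == 1:
        return values[0], 0
    left, dl = balanced_reduce(values[: n // 2], op)
    right, dr = balanced_reduce(values[n // 2 :], op)
    return op(left, right), max(dl, dr) + 1


scores = [0.5, -0.25, 1.0, 0.75, -0.5]   # one class, five trees
total, depth = balanced_reduce(scores)
print(total, depth, math.ceil(math.log2(len(scores))))   # prints: 1.5 3 3
```

With saturating or rounding fixed-point types, each intermediate sum is quantized. The pairwise order can therefore give slightly different results from the sequential loop.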
I get the same error as another user when running the example code from the hls4ml tutorial or from the Conifer Usage section.
Running this line causes an error:
model = conifer.model(clf, conifer.converters.sklearn,
conifer.backends.xilinxhls, cfg)
I have looked into model.py myself but could not figure out which class should be called.
Is there a solution to this problem?
Thanks in advance!
The other user with the same problem:
fastmachinelearning/hls4ml-tutorial#37
The XGBoost converter is not working for XGBoost 2.0.0. This issue is for tracking the development of the fix.
If you click the links in the README for the json or Xilinx headers, you are directed to a non-existent GitHub page, https://github.com/nlohmann/json)%5Bhere%5D, which looks like an issue in the Markdown. The link still works if you remove the extra characters at the end.
Hi,
I've started working with this tool and tried to convert a simple XGBoost tree into VHDL. Conifer creates the files without a problem, but when I try to import them into Vivado it's a mess. After struggling a bit with the library names (it's my first time using VHDL; I usually work with Verilog), I decided to go another route and convert the VHDL code into a Verilog module using GHDL (and/or Yosys):
ghdl -a Constants.vhd
ghdl -a Types.vhd
ghdl -a AddReduce.vhd
ghdl -a Arrays0.vhd
ghdl -a Tree.vhd
ghdl -a BDT.vhd
ghdl -a BDTTop.vhd
ghdl --synth --out=verilog bdttop > BDTTop.v
It seems to be working. While my VHDL attempts crashed Vivado, the Verilog file plays very nicely.
However, I get the following warnings:
AddReduce.vhd:28:11:warning: declaration of "addreduce" hides entity "addreduce" [-Whide]
component AddReduce is
^
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));
^
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));
And indeed, when I run a simulation in Vivado, the output signal from the tree is 'XXXXXXXX' and some internal signals show ZZZs.
Is this an issue on your side, or did something go wrong during the translation?
(I'm using Vivado 2020.1.)
The HLS backend gives wrong results on an example regression task, for an unknown reason.
Running the sklearn_regression.py example with conifer.backends.vivadohls gives wrong results, while the VHDL backend gives correct results:
y_skl[:10]
array([15.37657419, 18.24400667, 22.7363659 , 25.21349792, 21.54568329,
23.23484561, 23.23484561, 17.9478476 , 21.42883943, 14.77063661])
y_hdl[:10]
array([15.37651062, 18.24401855, 22.73638916, 25.21348572, 21.54573059,
23.23486328, 23.23486328, 17.94779968, 21.42884827, 14.77056885])
y_hls[:10]
array([28.4943, 29.7487, 28.3512, 26.6453, 27.1684, 25.1494, 25.1494,
27.075 , 28.1972, 25.4532])
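The ten values quoted above can be compared directly. The VHDL output agrees with scikit-learn to within about 1e-4, which is plausibly just fixed-point precision. The HLS output is off by between roughly 1.4 and 13 units.

```python
import numpy as np

# First ten predictions, as quoted in the issue.
y_skl = np.array([15.37657419, 18.24400667, 22.7363659, 25.21349792, 21.54568329,
                  23.23484561, 23.23484561, 17.9478476, 21.42883943, 14.77063661])
y_hdl = np.array([15.37651062, 18.24401855, 22.73638916, 25.21348572, 21.54573059,
                  23.23486328, 23.23486328, 17.94779968, 21.42884827, 14.77056885])
y_hls = np.array([28.4943, 29.7487, 28.3512, 26.6453, 27.1684, 25.1494, 25.1494,
                  27.075, 28.1972, 25.4532])

for name, y in (("VHDL", y_hdl), ("HLS", y_hls)):
    err = np.abs(y - y_skl)
    print(f"{name}: max |diff| = {err.max():.3g}, mean |diff| = {err.mean():.3g}")
```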
PR #53 probably broke synthesis for Vivado HLS (while it works with more recent Vitis HLS).
In the HLS build log:
ERROR: [HLS 200-101] 'open_solution': Unknown option '-flow_target'.
This will need some care to find a solution that works across all versions, but could be handled either in the Python HLS backend writer or the TCL script itself.
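On the Python side, one option is to emit `-flow_target` only when the build will run under Vitis HLS. This is a minimal sketch that assumes the writer knows, or can detect, which tool is used. The helper names are hypothetical. Detecting the tool from the `vitis_hls`/`vivado_hls` executables on `PATH` is only a heuristic.

```python
import shutil


def detect_hls_tool():
    """Guess the installed HLS tool from the executables on PATH (heuristic)."""
    if shutil.which("vitis_hls"):
        return "vitis_hls"
    if shutil.which("vivado_hls"):
        return "vivado_hls"
    return None


def open_solution_tcl(tool, solution="solution1"):
    """Return an open_solution line that the given tool accepts."""
    if tool == "vitis_hls":
        # Vitis HLS: -flow_target vivado selects the Vivado IP flow
        return f'open_solution -flow_target vivado "{solution}"'
    # Vivado HLS rejects -flow_target ("Unknown option")
    return f'open_solution "{solution}"'
```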
Hey,
I just found your project, and it looks awesome :D I wanted to try it out quickly to see if it's what I need.
I'm currently working on Windows, and it seems that conifer doesn't find Vivado HLS or Vitis HLS, as I get the following errors:
Could not find ap_ headers (e.g., ap_fixed.h). None of XILINX_AP_INCLUDE, XILINX_HLS, XILINX_VIVADO are defined
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
Cell In[15], line 3
1 # Create and compile the model
2 model = conifer.converters.convert_from_xgboost(xgbt_model.get_booster(), cfg)
----> 3 model.compile()
5 # Synthesize the model
6 model.build()
File ~\anaconda3\envs\connifer-octave\Lib\site-packages\conifer\backends\xilinxhls\writer.py:454, in XilinxHLSModel.compile(self)
452 if ap_include is None:
453 os.chdir(curr_dir)
--> 454 raise Exception("Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.")
455 cmd = f"g++ -O3 -shared -std=c++14 -fPIC $(python3 -m pybind11 --includes) {ap_include} bridge.cpp firmware/BDT.cpp firmware/{cfg.project_name}.cpp -o conifer_bridge_{self._stamp}.so"
456 logger.debug(f'Compiling with command {cmd}')
Exception: Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.
After struggling a bit with it, I looked into the code and found this:
conifer/conifer/backends/xilinxhls/writer.py
Lines 19 to 28 in 086c82e
This looks like a Linux-only command. Can conifer only run on Linux?
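The compile command relies on the shell substitution `$(python3 -m pybind11 --includes)`. A more portable option is to resolve the include flags in Python. This minimal sketch assumes a lookup order of XILINX_AP_INCLUDE, then XILINX_HLS/include, then XILINX_VIVADO/include. The variable names come from the error message, but the `include` subdirectory is an assumption to check against your installation. `pybind11.get_include()` and `sysconfig.get_paths()["include"]` are real APIs. The function names are hypothetical. This alone does not make the rest of the g++ step portable.

```python
import os
import sysconfig


def find_ap_include(env=None):
    """Return an -I flag for the Xilinx ap_ headers, or None if no variable is set."""
    env = os.environ if env is None else env
    if env.get("XILINX_AP_INCLUDE"):
        return "-I" + env["XILINX_AP_INCLUDE"]
    for var in ("XILINX_HLS", "XILINX_VIVADO"):
        if env.get(var):
            return "-I" + os.path.join(env[var], "include")   # assumed header location
    return None


def python_include_flags():
    """Roughly what `python3 -m pybind11 --includes` prints, without a POSIX shell."""
    import pybind11  # imported lazily so find_ap_include works without pybind11

    return ["-I" + sysconfig.get_paths()["include"], "-I" + pybind11.get_include()]
```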
Hello,
I am trying to generate a bitstream directly for the ZCU104. However, when I checked the generated HLS project, I found that:
1- There is no AXI bridge.
2- There is no TCL script to generate the bitstream.
I thought this was because I am using the xilinxhls backend, but it seems to be the same for the other backend.
Should I do this manually, or did I miss something here?
Hello there,
I am currently experimenting with deploying sklearn RandomForests on a ZCU102 board. I first tried the classic HLS/Vivado/Vitis flow but struggled with the results. I then tried pynq with the HLS accelerator, and my results are still weird.
For the example, I am using the basic wine dataset from sklearn with an RF (100 trees, max depth 100).
With sklearn (using clf.predict_proba), I obtain these predictions, which are fine:
[0.97 0.03 0. ]
[0.93 0.05 0.02]
[0.06 0.12 0.82]
[0.91 0.08 0.01]
[0.07 0.85 0.08]
Then, with the model converted and compiled, I obtain this (using model.decision_function):
[ 8.59375000e-01 6.23525391e+01 2.60214844e+01]
[ 7.51953125e-01 -3.56474609e+01 2.61230469e+01]
[ 1.75781250e-01 8.43525391e+01 2.62246094e+01]
[ 7.03125000e-01 -8.66474609e+01 2.63261719e+01]
[ 2.83203125e-01 -9.96474609e+01 2.64277344e+01]
These results are strange and I don't understand them. What would explain them?
Finally, on the PL, here are the results provided by accelerator.decision_function(np.float32(X_test)):
[0.859375 0. 0. ]
[0.7519531 0. 0. ]
[0.17578125 0. 0. ]
[0.703125 0. 0. ]
[0.28320312 0. 0. ]
The first column matches the results given by the converted model. The other two columns are zero.
For the conversion I used the examples:
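import datetime
import conifer
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# wine dataset, as described above; the split parameters are illustrative
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, max_depth=100)
clf.fit(X_train, y_train)
cfg = conifer.backends.xilinxhls.auto_config()
accelerator_config = {'Board': 'zcu102',
                      'InterfaceType': 'float'}
cfg['AcceleratorConfig'] = accelerator_config
cfg['OutputDir'] = 'prj_{}'.format(int(datetime.datetime.now().timestamp()))
model = conifer.converters.convert_from_sklearn(clf, cfg)
model.compile()
y_hls = model.decision_function(X_test)
y_skl = clf.predict_proba(X_test)
model.build(bitfile=True, package=True)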
What am I doing wrong ?
Thank you in advance
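One thing worth checking is whether the summed scores fit the fixed-point score type. The HLS `ap_fixed<W,I>` defaults are truncation (AP_TRN) and wrap-around on overflow (AP_WRAP). With these defaults, a value outside the representable range wraps around silently instead of saturating. The Vivado HLS log below shows one such type, `ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0>`. The small pure-Python emulation here assumes that 18-bit type with 8 integer bits. It is not established that wrapping is what happened in this case.

```python
import math


def ap_fixed_trn_wrap(x, width=18, integer=8):
    """Quantize x like a signed ap_fixed<width, integer> with AP_TRN and AP_WRAP.

    Representable range is [-2**(integer-1), 2**(integer-1)) with a step of
    2**-(width-integer); anything outside wraps around.
    """
    scale = 1 << (width - integer)
    q = math.floor(x * scale)              # AP_TRN: drop LSBs, rounding towards -inf
    half = 1 << (width - 1)
    q = (q + half) % (1 << width) - half   # AP_WRAP: keep only the low `width` bits
    return q / scale


for x in (1.0, 0.3, 127.9, 130.0, 200.0):
    print(x, "->", ap_fixed_trn_wrap(x))
```

Summing many per-class leaf values across 100 deep trees can exceed the +/-128 range of such a type. Widening the integer bits of the score precision, or using a saturating overflow mode, would be the first things to try.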
Hi,
I tried to run the xgboost_to_hls.py example, but Vivado HLS fails with a segmentation fault.
Is it probably something to do with the array ranges?
INFO: [XFORM 203-11] Balancing expressions in function 'hls4ml_burr' (firmware/hls4ml_burr.cpp:4)...4 expression(s) balanced.
Stack dump:
0. Running pass 'Function Pass Manager' on module '~/ip_cores/hls4ml_burr/build/hls4ml_burr_prj/solution1/.autopilot/db/a.o.1.tmp.bc'.
1. Running pass 'Instruction simplification' on function '@"BDT::Tree<3, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> [10], ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> >::decision_function"'
Abnormal program termination (11)
Please check '~/hls4ml_burr/build/hs_err_pid27358.log' for details
segfault in /tools/Xilinx/Vivado/2020.1/bin/unwrapped/lnx64.o/vivado_hls -exec vivado_hls -f build_prj.tcl reset=1 csim=0 synth=1 cosim=0 validation=0 export=1 vsynth=0, exiting...
Makefile:12: recipe for target 'build_hls' failed
make: *** [build_hls] Error 139
See the project archive for more details
https://justbeamit.com/q9zra
Hi,
I found an issue with the xgboost example
https://github.com/thesps/conifer/blob/master/examples/xgboost_to_hls.py
y_hls and y_xgb aren't close:
y_hls = expit(model.decision_function(X_test))
y_xgb = bst.predict(dtest)
diff = y_xgb - y_hls
print(diff[abs(diff)>0.05])
[-0.13502171 0.06955624 -0.1099674 -0.2427507 -0.14311438 -0.0606428
0.08703702 -0.054607 -0.41907781 -0.12813512 0.28282228 -0.21637464
0.31876776 0.26711339 -0.14989728 -0.05887845 -0.06809392 0.12303647
-0.08492118 -0.07751923 -0.05739652 -0.11599926 -0.14425865 -0.08459726
-0.12540119 -0.06227853 -0.27874367 -0.29141373 0.12563779 -0.22311496
-0.13287621 -0.17924546 -0.10041202]
Since the output is normalized to [0, 1], an absolute error of up to about 0.42 seems too high for practical use.
Is this a known issue?
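For reference, with `objective='binary:logistic'` XGBoost's predicted probability is the logistic sigmoid of the raw margin. The margin is the sum of the selected leaf values of all trees plus the margin of `base_score`: p = sigmoid(logit(base_score) + sum_t leaf_t). Applying `expit` to `decision_function` therefore matches `bst.predict` only if the HLS score is that same raw margin. For the long-standing default `base_score=0.5`, the base margin is 0. This minimal sketch of the formula uses made-up leaf values.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def binary_logistic_prob(leaf_values, base_score=0.5):
    """Probability for one event: sigmoid(base margin + sum of leaf values)."""
    margin = logit(base_score) + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


leaves = [0.12, -0.05, 0.30]                          # illustrative, one leaf per tree
print(binary_logistic_prob(leaves))                   # base margin 0
print(binary_logistic_prob(leaves, base_score=0.8))   # a non-default base_score shifts p
```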
Hi,
Importing conifer after installing it from pip (using pip install conifer) raises an error.
Here are the details:
Python v3.10.6
Conifer v0.4
Content of the conifer_test.py file:
import conifer
On Windows
Traceback (most recent call last):
File "G:\projects\work\test_conifer\conifer_test.py", line 1, in <module>
import conifer
File "G:\projects\work\test_conifer\conifer.py", line 2, in <module>
from conifer import backends
ImportError: cannot import name 'backends' from partially initialized module 'conifer' (most likely due to a circular import) (G:\projects\work\test_conifer\conifer.py)
On Linux WSL2
achoum@abc:/mnt/g/projects/work/test_conifer$ python3 conifer_test.py
Traceback (most recent call last):
File "/mnt/g/projects/work/test_conifer/conifer_test.py", line 1, in <module>
import conifer
File "/mnt/g/projects/work/test_conifer/conifer.py", line 2, in <module>
from conifer import backends
ImportError: cannot import name 'backends' from partially initialized module 'conifer' (most likely due to a circular import) (/mnt/g/projects/work/test_conifer/conifer.py)
Cheers :)
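Both tracebacks point at a `conifer.py` file in the working directory (for example `G:\projects\work\test_conifer\conifer.py`). When a script is run directly, its own directory comes first on `sys.path`, so Python loads that file instead of the installed package. The standard `importlib.util.find_spec` can show which file `import conifer` resolves to without executing it. The helper name is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for `import name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(module_origin("conifer"))
```

If this prints a path inside the project directory rather than `site-packages`, renaming that local file should let the installed package be imported.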
I just downloaded the package and tried to run sklearn_to_hls.py in the examples directory, but ran into a few issues. To run it, I did "python sklearn_to_hls.py". Here are the issues and my current solutions, which you may want to consider including:
Error 1: import conifer ModuleNotFoundError: No module named 'conifer'
Solution 1: Add an __init__.py script to the top directory (where the conifer directory is) so that it can be seen as a package, then move the sklearn_to_hls.py script to the same top directory and run it. There may be a better solution here, but in general the scripts in the examples directory cannot see the conifer directory, and Python does not currently treat the conifer directory as a package.
Error 2: y_hls = model.decision_function(X)[:,0] IndexError: too many indices for array
Solution 2: y_hls is a 1D array, so the [:,0] was the issue here; removing it fixes the error. None of the other scripts in the examples directory have [:,0] after model.decision_function, so this is the only place that needs fixing.
I can submit a pull request for my solution to error 2. For error 1, perhaps you have a better solution?
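A shape-agnostic way to take the first score column is to reshape first. This works whether `decision_function` returns a 1D array or a 2D (n_samples, n_classes) array. Here is a minimal NumPy sketch with a hypothetical helper name.

```python
import numpy as np


def first_score(y):
    """Return the first score per sample for 1D or 2D score arrays."""
    y = np.asarray(y)
    return y.reshape(len(y), -1)[:, 0]   # 1D -> (n, 1); 2D stays (n, k)
```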
Hi all,
I am trying to do:
cnf = conifer.model(clf._Booster, conifer.converters.xgboost, conifer.backends.vivadohls, cfg)
cnf.compile()
But I get:
82 data = node.split('<')
---> 83 feature = int(data[0].split('[')[-1].replace('f',''))
84 threshold = float(data[1].split(']')[0])
85 child_left = int(node.split('yes=')[1].split(',')[0])
ValueError: invalid literal for int() with base 10: 'busy_cycles'
Printing node gives:
0:[busy_cycles<299923] yes=1,no=2,missing=1
busy_cycles is the name of a feature in my dataset. Is line 83 trying to convert it to an int?
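In the text dump, a split node is written like `0:[f3<0.5] yes=1,no=2,missing=1`. When the booster carries feature names, the name appears instead of `f<index>`, as in the line above. The following minimal parser handles both forms. It maps names to column indices through a list of feature names, and `xgboost.Booster` has a real `feature_names` attribute for that purpose. The function name and the returned dict are hypothetical.

```python
import re

SPLIT_RE = re.compile(r"^\s*(\d+):\[(.+)<([^\]]+)\]\s+yes=(\d+),no=(\d+),missing=(\d+)")


def parse_split(line, feature_names=None):
    """Parse one split line of an XGBoost text dump into a dict."""
    m = SPLIT_RE.match(line)
    if m is None:
        raise ValueError(f"not a split node: {line!r}")
    node, feat, thr, yes, no, missing = m.groups()
    if feature_names and feat in feature_names:
        index = feature_names.index(feat)        # named feature -> column index
    elif re.fullmatch(r"f\d+", feat):
        index = int(feat[1:])                    # default f<index> naming
    else:
        raise ValueError(f"unknown feature {feat!r}; pass feature_names")
    return {"node": int(node), "feature": index, "threshold": float(thr),
            "yes": int(yes), "no": int(no), "missing": int(missing)}
```

With a fitted model, the names could come from `clf.get_booster().feature_names`.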
I am running an Ubuntu virtual machine (Ubuntu Desktop 18.04 LTS) and have installed Vivado 2019.2. I have created a Python environment and installed conifer using pip install conifer.
However, when running the example scripts I run into the following error:
Exception: Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.
What is the fix here?
The RandomForestClassifier in scikit-learn is similar to the GradientBoostingClassifier in that it uses an ensemble of Decision Trees. For binary classification tasks, I think conifer already works for RandomForestClassifiers. However, the two differ for multi-class problems. GradientBoostingClassifier learns a different tree for each class at each 'estimator', whereas RandomForestClassifier trains a single tree with multiple output scores. So RandomForestClassifier with multi-class problems won't work at the moment, but it should not be too challenging to support.
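To make the difference concrete: each leaf of a `RandomForestClassifier` tree stores a vector with one entry per class. The forest's `predict_proba` is the average of the per-tree `predict_proba` outputs. The following minimal check uses the three-class wine dataset bundled with scikit-learn. The forest size and depth are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0).fit(X, y)

# Each tree's nodes hold one value per class: shape (n_nodes, n_outputs, n_classes).
print(clf.estimators_[0].tree_.value.shape)

# Forest probability = mean over trees of each tree's per-class probabilities.
per_tree = np.stack([t.predict_proba(X) for t in clf.estimators_])
print(np.allclose(per_tree.mean(axis=0), clf.predict_proba(X)))
```

A multi-class converter would presumably need to keep all n_classes entries per leaf and average the summed scores over the trees, rather than assigning one tree per class as for gradient boosting.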