thesps / conifer

40.0 6.0 22.0 60.45 MB

Fast inference of Boosted Decision Trees in FPGAs

License: Apache License 2.0

Python 75.09% VHDL 7.89% Shell 0.76% Tcl 2.46% C++ 13.71% C 0.10%

machine-learning boosted-decision-trees fpga low-latency

conifer's Issues

'XGBClassifier' object has no attribute 'save_config'

I trained an XGBClassifier model and followed the example xgboost_to_hls.py, but I keep getting the error shown in the title.

How can I solve this issue?
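A possible explanation, offered as an assumption rather than a confirmed diagnosis: `save_config()` is a method of xgboost's `Booster`, not of the scikit-learn style `XGBClassifier` wrapper, and other reports on this page pass `get_booster()` or `_Booster` to the converter. The sketch below uses stand-in classes so it runs without xgboost installed; `as_booster`, `FakeBooster` and `FakeClassifier` are hypothetical names, not part of conifer or xgboost.

```python
def as_booster(model):
    """Return the wrapped Booster if `model` exposes get_booster(), else `model` itself.

    Hypothetical helper for illustration only.
    """
    get_booster = getattr(model, "get_booster", None)
    return get_booster() if callable(get_booster) else model


# Stand-ins that mimic only the attributes relevant to the error.
class FakeBooster:
    def save_config(self):
        return "{}"


class FakeClassifier:
    def __init__(self):
        self._booster = FakeBooster()

    def get_booster(self):
        return self._booster


clf = FakeClassifier()
booster = as_booster(clf)
# The wrapper lacks save_config, the booster has it.
print(hasattr(clf, "save_config"), hasattr(booster, "save_config"))
```

With a real model, the equivalent step would be to hand `clf.get_booster()` to the converter instead of `clf`.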

scikit-learn AdaBoostClassifier

Currently, AdaBoost models from scikit-learn [1] are implicitly ignored by conifer's sklearn converter [2]. Neither of the existing BDT or RF converter methods work "out of the box" for AdaBoostClassifier, which names some attributes differently to the other methods. A new converter method will be needed. But since the base classifiers are DecisionTrees, it should not be a major addition.

[1] https://github.com/scikit-learn/scikit-learn/blob/0.21.3/sklearn/ensemble/weight_boosting.py#L292
[2] https://github.com/thesps/conifer/blob/master/conifer/converters/sklearn.py#L42-L46

Vivado simulation for VHDL backend

Currently, the VHDL backend generates some Modelsim scripts to simulate the conifer model. Since many users may not have access to Modelsim, it would be beneficial to alternatively generate scripts for the Vivado simulator.
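As a rough idea of what such a generated script could look like, here is a minimal sketch that writes a shell script around the Vivado simulator command line tools `xvhdl`, `xelab` and `xsim`. The function name `xsim_script`, the top-level name `testbench` and the snapshot name are assumptions for illustration; the file names are those listed in the GHDL report elsewhere on this page, and any library mapping the conifer sources need is not handled here.

```python
def xsim_script(sources, top, snapshot="conifer_sim"):
    """Build a shell script that compiles, elaborates and runs a VHDL design.

    `sources` must be in dependency order; `top` is the top-level unit to
    elaborate (a hypothetical testbench name in the example below).
    """
    lines = ["#!/bin/sh", "set -e"]
    lines += [f"xvhdl {src}" for src in sources]                # analyse each file
    lines.append(f"xelab -debug typical {top} -s {snapshot}")   # elaborate
    lines.append(f"xsim {snapshot} -runall")                    # simulate to completion
    return "\n".join(lines) + "\n"


sources = ["Constants.vhd", "Types.vhd", "AddReduce.vhd", "Arrays0.vhd",
           "Tree.vhd", "BDT.vhd", "BDTTop.vhd"]
print(xsim_script(sources, top="testbench"))
```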

ONNX Conversion Fails with ONNX model produced from xgboost

When trying to run the ONNX conversion on a model that was trained in xgboost, I encounter the following error:

Traceback (most recent call last):
  File "/home/cebrown/Documents/Tracker/NewKF/Firmware/work/src/l1tk-for-emp/tq/scripts/conifer_convert.py", line 40, in <module>
    hdl_model = conifer.model(bdt_model, conifer.converters.onnx, conifer.backends.vhdl, cfg)
  File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/model.py", line 11, in __init__
    self._ensembleDict = converter.convert(bdt)
  File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 29, in convert
    return convert_bdt(onnx_clf)
  File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 8, in convert_bdt
    treelist,max_depth,base_values,no_features,no_classes=convert_graph(onnx_clf)
  File "/home/cebrown/anaconda3/envs/tq/lib/python3.9/site-packages/conifer/converters/onnx.py", line 32, in convert_graph
    if(onnx_clf.graph.node[1].name=='ZipMap'):
IndexError: list index (1) out of range

Digging into the ONNX model itself, the structure of the graph is different in the xgboost-converted model. The fundamental difference seems to show up in the header. For an sklearn-converted model, as used in the unit tests, the header is:

ir_version: 4
producer_name: "skl2onnx"
producer_version: "1.9.2"
domain: "ai.onnx"
model_version: 0
doc_string: ""

And for an xgboost model:

ir_version: 7
producer_name: "OnnxMLTools"
producer_version: "1.7.0"
domain: "onnxconverter-common"
model_version: 0
doc_string: ""

It seems that not all ONNX models are created equal: the graph structure is similar on the whole, but there are some key differences that break the current conversion code.
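The `IndexError` in the traceback comes from indexing `graph.node[1]`, which fails when the graph has fewer than two nodes. One way to make the check independent of node position is to search the node list by operator type. This is only a sketch using stand-in objects instead of a real ONNX model: `nodes_of_type` and `has_zipmap` are hypothetical helpers, and matching on `op_type` rather than `name` is a design choice of this example, not what the converter currently does.

```python
from types import SimpleNamespace


def nodes_of_type(graph, op_type):
    """Return all nodes in `graph.node` whose operator type matches."""
    return [node for node in graph.node if node.op_type == op_type]


def has_zipmap(graph):
    """True if the graph contains a ZipMap node anywhere, not just at index 1."""
    return len(nodes_of_type(graph, "ZipMap")) > 0


# Stand-in graphs: two nodes (tree ensemble followed by ZipMap) versus one node.
skl_like = SimpleNamespace(node=[
    SimpleNamespace(op_type="TreeEnsembleClassifier"),
    SimpleNamespace(op_type="ZipMap"),
])
xgb_like = SimpleNamespace(node=[
    SimpleNamespace(op_type="TreeEnsembleClassifier"),
])

print(has_zipmap(skl_like), has_zipmap(xgb_like))
```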

Long synthesis time on Vitis

Hi,
It seems that synthesis time is really long, especially because the interval requirements for the pipelining are not met and Vitis needs to guess the right parameters (which it does badly).
I have been trying to use this tool to compile a GBDT based on XGBoost with 700 trees, 40 features and 1000000 data points, and the following parameters:

{
    'objective': 'binary:logistic',
    'grow_policy': 'lossguide',
    'eval_metric': 'rmse',
    'eta': 1,
    'gamma': 1,
    'alpha': 1,
    'max_leaves': 7,
    'max_depth': 3,
    'seed': 0,
    'nthread': 16
}

Trying to compile the resulting HLS code without modifying it fails because of the optimizations. After adjusting the optimizations, synthesis takes hours. The same happened with a similar GBDT with only 50 trees, without modifying the HLS code.

Is there something I might be doing wrong?
Is this library recommended for such "big" models?

Expression balancing

The implementation of the summation over tree scores in the HLS backend prevents automatic expression balancing when saturation and rounding are used in the score precision type.

    Trees:
    for(int i = 0; i < n_trees; i++){
      Classes:
      for(int j = 0; j < fn_classes(n_classes); j++){
        score[j] += scores[i][j];
      }
    }

This should be replaced with a balanced tree reduce implementation.
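To illustrate the idea in Python rather than HLS C++: a balanced reduce adds values pairwise, level by level, so n inputs need about log2(n) adder levels instead of a chain of n - 1. The sketch also shows one plausible reason the tool cannot rebalance on its own, namely that saturating addition is not associative, so reordering would change results. All names here (`tree_reduce`, `reduction_depth`, `sat_add`) and the toy saturation range are assumptions for illustration.

```python
def tree_reduce(values, add):
    """Combine `values` pairwise, level by level, like an adder tree."""
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    while len(level) > 1:
        # Add neighbours; an odd element out is carried to the next level.
        paired = [add(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0]


def reduction_depth(n):
    """Number of adder levels needed to reduce n inputs: ceil(log2(n))."""
    return (n - 1).bit_length()


def sat_add(a, b, lo=-8, hi=7):
    """Toy saturating add on a small integer range."""
    return max(lo, min(hi, a + b))


print(tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b))
print(reduction_depth(700))
# Same operands, different grouping, different result under saturation:
print(sat_add(sat_add(7, 5), -6), sat_add(7, sat_add(5, -6)))
```

In the HLS code, the same structure would be written explicitly so that the order of additions is fixed by the source rather than left to the optimizer.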

Type error 'module' object is not callable

I get the same error as another user when running the example code from the HLS4ML tutorial or the conifer Usage section.
Running this line causes an error:
model = conifer.model(clf, conifer.converters.sklearn, conifer.backends.xilinxhls, cfg)

I have looked into model.py myself but could not figure out which class should be called.

Is there a solution to this problem?
Thanks in advance!

The other user with the same problem:
fastmachinelearning/hls4ml-tutorial#37

XGBoost 2.0.0

The XGBoost converter is not working for XGBoost 2.0.0. This issue is for tracking the development of the fix.

README links broken

If you click the links in the README for the json or Xilinx headers, you are directed to a non-existent GitHub page: https://github.com/nlohmann/json)%5Bhere%5D
This looks like an issue in the markdown. The link still works if you remove the extra characters at the end.

VHDL Backend: Missing assignment in AddReduce.vhd?

Hi,

I've started working with this tool. I tried to convert a simple xgboost tree into VHDL. Conifer creates the files without a problem, but when I try to import them into Vivado it's a mess. After struggling a bit with the library names (it's my first time using VHDL, I usually work with Verilog), I decided to go another route and convert the VHDL code into a Verilog module using GHDL (and/or Yosys).

In case someone is curious, this is how I'm currently doing it:

ghdl -a Constants.vhd
ghdl -a Types.vhd
ghdl -a AddReduce.vhd
ghdl -a Arrays0.vhd
ghdl -a Tree.vhd
ghdl -a BDT.vhd
ghdl -a BDTTop.vhd

ghdl --synth --out=verilog bdttop > BDTTop.v

It seems to be working, and while my VHDL attempts crashed Vivado, the Verilog file plays very nicely.

However, I get the following warnings:

AddReduce.vhd:28:11:warning: declaration of "addreduce" hides entity "addreduce" [-Whide]
component AddReduce is
          ^
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
  signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));
         ^
AddReduce.vhd:59:10:warning: no assignment for offsets 0:17 of signal "dint"
  signal dInt : tyArray(0 to intLen - 1) := (others => (others => '0'));

And indeed, when I try to run a simulation with Vivado, I find that the output signal from the tree is 'XXXXXXXX', and I have some internal ZZZs.

Is this an issue on your side, or did something go wrong during the translation process?

(I'm using Vivado 2020.1)

Regression

The HLS backend gives wrong results on an example regression task, for unknown reason.

Running the sklearn_regression.py example with conifer.backends.vivadohls gives wrong results. VHDL backend gives correct results.

y_skl[:10]
array([15.37657419, 18.24400667, 22.7363659 , 25.21349792, 21.54568329,
       23.23484561, 23.23484561, 17.9478476 , 21.42883943, 14.77063661])

y_hdl[:10]
array([15.37651062, 18.24401855, 22.73638916, 25.21348572, 21.54573059,
       23.23486328, 23.23486328, 17.94779968, 21.42884827, 14.77056885])

y_hls[:10]
array([28.4943, 29.7487, 28.3512, 26.6453, 27.1684, 25.1494, 25.1494,
       27.075 , 28.1972, 25.4532])

Vivado HLS build broken

PR #53 probably broke synthesis for Vivado HLS (while it works with more recent Vitis HLS).

In the HLS build log:

ERROR: [HLS 200-101] 'open_solution': Unknown option '-flow_target'.

This will need some care to find a solution that works across all versions, but could be handled either in the Python HLS backend writer or the TCL script itself.
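For the first option, handling it in the Python writer, a minimal sketch could emit the `-flow_target` option only when the target tool is Vitis HLS. The function name `open_solution_command` and the way the tool is identified by its executable name are assumptions for illustration, not the actual conifer writer code.

```python
def open_solution_command(tool, solution="solution1"):
    """Return the Tcl open_solution line appropriate for the given HLS tool.

    `tool` is the executable name: 'vivado_hls' or 'vitis_hls'.
    """
    cmd = f'open_solution -reset "{solution}"'
    if tool == "vitis_hls":
        # Vivado HLS rejects this option (see the HLS 200-101 error above).
        cmd += " -flow_target vivado"
    return cmd


for tool in ("vivado_hls", "vitis_hls"):
    print(tool, "->", open_solution_command(tool))
```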

Not working on Windows ?

Hey,

I just found your project, and it looks awesome :D I wanted to try it out quickly to see if it's indeed what I need.

I'm currently working on Windows, and it seems that conifer doesn't find Vivado HLS or Vitis HLS, as I get the following errors:

Could not find ap_ headers (e.g., ap_fixed.h). None of XILINX_AP_INCLUDE, XILINX_HLS, XILINX_VIVADO are defined

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[15], line 3
      1 # Create and compile the model
      2 model = conifer.converters.convert_from_xgboost(xgbt_model.get_booster(), cfg)
----> 3 model.compile()
      5 # Synthesize the model
      6 model.build()

File ~\anaconda3\envs\connifer-octave\Lib\site-packages\conifer\backends\xilinxhls\writer.py:454, in XilinxHLSModel.compile(self)
    452 if ap_include is None:
    453     os.chdir(curr_dir)
--> 454     raise Exception("Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.")
    455 cmd = f"g++ -O3 -shared -std=c++14 -fPIC $(python3 -m pybind11 --includes) {ap_include} bridge.cpp firmware/BDT.cpp firmware/{cfg.project_name}.cpp -o conifer_bridge_{self._stamp}.so"
    456 logger.debug(f'Compiling with command {cmd}')

Exception: Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.

After struggling a bit with it, I've looked into the code and found this :

conifer/conifer/backends/xilinxhls/writer.py

Lines 19 to 28 in 086c82e

def get_tool_exe_in_path(tool):
    if tool not in _TOOLS.keys():
        return None

    tool_exe = _TOOLS[tool]

    if os.system('which {} > /dev/null 2>/dev/null'.format(tool_exe)) != 0:
        return None

    return tool_exe

This looks like a linux-only command. Can conifer only run on Linux ?
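The `which` call and the `/dev/null` redirection are indeed Unix shell features. A portable way to do the same lookup is the standard library's `shutil.which`, which works on Linux, macOS and Windows. The sketch below is a drop-in shaped like the quoted function; the contents of `_TOOLS` are an assumption here. Note that this alone may not be enough on Windows, since the compile command in the traceback above also relies on `g++` and shell command substitution.

```python
import shutil

# Assumed mapping from conifer tool names to executable names.
_TOOLS = {"vivadohls": "vivado_hls", "vitishls": "vitis_hls"}


def get_tool_exe_in_path(tool):
    """Return the executable name for `tool` if it is on PATH, else None."""
    tool_exe = _TOOLS.get(tool)
    if tool_exe is None:
        return None
    # shutil.which returns the full path or None, with no shell involved.
    return tool_exe if shutil.which(tool_exe) is not None else None


print(get_tool_exe_in_path("vitishls"))
```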

Bitstream generation feature

Hello,

I am trying to generate a bitstream directly for the ZCU104. However, when I checked the generated HLS project, I found that:
1- There is no AXI bridge.
2- There is no TCL script to generate the bitstream.
I thought this was because I am using the xilinxhls backend, but it seems to be the same for the other backend.

Should I do this manually, or have I missed something here?

RF results

Hello there,

I am currently experimenting with running RandomForests from sklearn on a ZCU102 board. I first tried the classic HLS/Vivado/Vitis flow but was struggling with the results. I then tried using pynq with the HLS accelerator, and my results are still weird.

For this example I am using the basic wine dataset from sklearn, with an RF (100 trees with a max depth of 100).
With sklearn I obtain these predictions (using clf.predict_proba), which are fine:
[0.97 0.03 0. ]
[0.93 0.05 0.02]
[0.06 0.12 0.82]
[0.91 0.08 0.01]
[0.07 0.85 0.08]

Then, with the model converted and compiled, I obtain this (using model.decision_function):
[ 8.59375000e-01 6.23525391e+01 2.60214844e+01]
[ 7.51953125e-01 -3.56474609e+01 2.61230469e+01]
[ 1.75781250e-01 8.43525391e+01 2.62246094e+01]
[ 7.03125000e-01 -8.66474609e+01 2.63261719e+01]
[ 2.83203125e-01 -9.96474609e+01 2.64277344e+01]
These results are strange and I don't understand them. What would explain them?

Finally, on the PL, here are the results provided by accelerator.decision_function(np.float32(X_test)):
[0.859375 0. 0. ]
[0.7519531 0. 0. ]
[0.17578125 0. 0. ]
[0.703125 0. 0. ]
[0.28320312 0. 0. ]
These match the first column of the previous results given by the converted model.

For the conversion I followed the examples:
import datetime
import conifer
from sklearn.ensemble import RandomForestClassifier

# scikit-learn's parameter is n_estimators (not n_estimator)
clf = RandomForestClassifier(n_estimators=100, max_depth=100)
# fit takes training features and training labels
clf.fit(X_train, y_train)

cfg = conifer.backends.xilinxhls.auto_config()
accelerator_config = {'Board' : 'zcu102',
                      'InterfaceType': 'float'}
cfg['AcceleratorConfig'] = accelerator_config
cfg['OutputDir'] = 'prj_{}'.format(int(datetime.datetime.now().timestamp()))

model = conifer.converters.convert_from_sklearn(clf, cfg)
model.compile()

y_hls = model.decision_function(X_test)
y_skl = clf.predict_proba(X_test)

model.build(bitfile=True, package=True)

What am I doing wrong ?
Thank you in advance

xgboost example VivadoHLS segfault

Hi,

I tried to run the xgboost_to_hls.py example, but Vivado HLS fails with a segmentation fault.
Is it probably something to do with the array ranges?

INFO: [XFORM 203-11] Balancing expressions in function 'hls4ml_burr' (firmware/hls4ml_burr.cpp:4)...4 expression(s) balanced.
Stack dump:
0.	Running pass 'Function Pass Manager' on module '~/ip_cores/hls4ml_burr/build/hls4ml_burr_prj/solution1/.autopilot/db/a.o.1.tmp.bc'.
1.	Running pass 'Instruction simplification' on function '@"BDT::Tree<3, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> [10], ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0>, ap_fixed<18, 8, (ap_q_mode)5, (ap_o_mode)3, 0> >::decision_function"'
Abnormal program termination (11)
Please check '~/hls4ml_burr/build/hs_err_pid27358.log' for details
segfault in /tools/Xilinx/Vivado/2020.1/bin/unwrapped/lnx64.o/vivado_hls -exec vivado_hls -f build_prj.tcl reset=1 csim=0 synth=1 cosim=0 validation=0 export=1 vsynth=0, exiting...
Makefile:12: recipe for target 'build_hls' failed
make: *** [build_hls] Error 139

See the project archive for more details
https://justbeamit.com/q9zra

xgboost precision

Hi,

I found an issue with the xgboost example:
https://github.com/thesps/conifer/blob/master/examples/xgboost_to_hls.py

y_hls and y_xgb aren't close:

y_hls = expit(model.decision_function(X_test))
y_xgb = bst.predict(dtest)
diff  =  y_xgb  - y_hls 
print(diff[abs(diff)>0.05])

[-0.13502171  0.06955624 -0.1099674  -0.2427507  -0.14311438 -0.0606428
  0.08703702 -0.054607   -0.41907781 -0.12813512  0.28282228 -0.21637464
  0.31876776  0.26711339 -0.14989728 -0.05887845 -0.06809392  0.12303647
 -0.08492118 -0.07751923 -0.05739652 -0.11599926 -0.14425865 -0.08459726
 -0.12540119 -0.06227853 -0.27874367 -0.29141373  0.12563779 -0.22311496
 -0.13287621 -0.17924546 -0.10041202]

Since the output is normalized to 1, an absolute error of up to about 0.42 (the largest difference listed above) seems too high for practical usage.
Is it a known issue?

`ImportError: cannot import name 'backends'` when importing conifer

Hi,

Importing conifer after having installed it from pip (using pip install conifer) raises an error.

Here are the details:

Python v3.10.6

Conifer v0.4

Content of conifer_test.py file:

import conifer

On Windows

Traceback (most recent call last):
  File "G:\projects\work\test_conifer\conifer_test.py", line 1, in <module>
    import conifer
  File "G:\projects\work\test_conifer\conifer.py", line 2, in <module>
    from conifer import backends
ImportError: cannot import name 'backends' from partially initialized module 'conifer' (most likely due to a circular import) (G:\projects\work\test_conifer\conifer.py)

On Linux WSL2

achoum@abc:/mnt/g/projects/work/test_conifer$ python3 conifer_test.py
Traceback (most recent call last):
  File "/mnt/g/projects/work/test_conifer/conifer_test.py", line 1, in <module>
    import conifer
  File "/mnt/g/projects/work/test_conifer/conifer.py", line 2, in <module>
    from conifer import backends
ImportError: cannot import name 'backends' from partially initialized module 'conifer' (most likely due to a circular import) (/mnt/g/projects/work/test_conifer/conifer.py)

Cheers :)
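Both tracebacks show `import conifer` resolving to a file named `conifer.py` inside the project directory (`test_conifer/conifer.py`) rather than to the installed package, so the local file shadows the package and then tries to import `backends` from itself. Renaming that file should avoid the clash. As a small sketch, the hypothetical helper `shadowing_candidates` below checks a directory for names that would shadow a package; the temporary directory just recreates the layout from the report.

```python
import tempfile
from pathlib import Path


def shadowing_candidates(package, directory):
    """List file or directory names in `directory` that would shadow `package`."""
    directory = Path(directory)
    candidates = [directory / f"{package}.py", directory / package]
    return [path.name for path in candidates if path.exists()]


# Recreate the reported layout: conifer.py next to the test script.
with tempfile.TemporaryDirectory() as project_dir:
    Path(project_dir, "conifer.py").write_text("from conifer import backends\n")
    Path(project_dir, "conifer_test.py").write_text("import conifer\n")
    found = shadowing_candidates("conifer", project_dir)

print(found)
```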

sklearn_to_hls.py example issues

I just downloaded the package and went to run the sklearn_to_hls.py in the example directory but ran into a few issues. To run, I did "python sklearn_to_hls.py". Here were the issues and my current solutions that you may want to consider including:

Error 1: import conifer ModuleNotFoundError: No module named 'conifer'
Solution 1: Include an __init__.py script in the top directory where the conifer directory is, so that it can be seen as a package, then move the sklearn_to_hls.py script to the same top directory and run it. There may be a better solution here, but in general the scripts in the examples directory cannot see the conifer directory, and the conifer directory is not currently viewed as a package by Python.

Error 2: y_hls = model.decision_function(X)[:,0] IndexError: too many indices for array
Solution 2: y_hls is a 1D array, so the [:,0] was the issue here. If you take it out, the issue is fixed. No other script in the examples directory has [:,0] after model.decision_function, so this is the only place where this needs to be fixed.

I can submit a pull request for my solution to error 2. For error 1, perhaps you have a better solution?

XGBoost feature error

Hi all,

I am trying to do:

cnf = conifer.model(clf._Booster, conifer.converters.xgboost, conifer.backends.vivadohls, cfg)
cnf.compile()

But I get:

     82       data = node.split('<')
---> 83       feature = int(data[0].split('[')[-1].replace('f',''))
     84       threshold = float(data[1].split(']')[0])
     85       child_left = int(node.split('yes=')[1].split(',')[0])

ValueError: invalid literal for int() with base 10: 'busy_cycles'

Printing 'node' I have:
0:[busy_cycles<299923] yes=1,no=2,missing=1

busy_cycles is the name of a feature in my dataset. Is line 83 trying to convert it to int?
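Judging from the quoted lines, yes: line 83 strips the letter 'f' and converts the rest to an integer, which works for xgboost's default feature names (f0, f1, ...) but not for custom names such as busy_cycles. A minimal sketch of a parser that handles both cases is shown below. `parse_split` is a hypothetical function, not the conifer converter, and the feature list in the usage example (including `idle_cycles`) is made up for illustration.

```python
def parse_split(node, feature_names=None):
    """Parse one split line of an xgboost text dump.

    Returns (feature_index, threshold, yes_child, no_child).
    """
    condition = node.split("[", 1)[1].split("]", 1)[0]   # e.g. 'busy_cycles<299923'
    name, threshold = condition.rsplit("<", 1)
    if feature_names is not None:
        feature = feature_names.index(name)              # custom names: look up index
    elif name.startswith("f") and name[1:].isdigit():
        feature = int(name[1:])                          # default names: f0, f1, ...
    else:
        raise ValueError(f"unknown feature name {name!r}; pass feature_names")
    fields = dict(kv.split("=") for kv in node.split("] ", 1)[1].split(","))
    return feature, float(threshold), int(fields["yes"]), int(fields["no"])


node = "0:[busy_cycles<299923] yes=1,no=2,missing=1"
print(parse_split(node, feature_names=["idle_cycles", "busy_cycles"]))
```

A workaround that needs no converter change would be to train or dump the model with the default feature names.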

Exception: Couldn't find Xilinx ap_ headers

I am running an Ubuntu virtual machine (Ubuntu Desktop 18.04 LTS) and have installed Vivado 2019.2. I have created a Python environment and installed conifer using pip install conifer.

However, when running the example scripts I run into the following error:

Exception: Couldn't find Xilinx ap_ headers. Source the Vivado/Vitis HLS toolchain, or set XILINX_AP_INCLUDE environment variable.

What is the fix here?
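The exception text itself points at the two options: source the Vivado/Vitis HLS toolchain in the shell before starting Python, or set `XILINX_AP_INCLUDE` to the directory containing `ap_fixed.h`. To check which of the environment variables named in conifer's error messages is visible to Python, something like the sketch below can help. `find_ap_include` is a hypothetical helper, and the assumption that the headers sit in an `include` subdirectory of the tool root should be verified against the local installation.

```python
import os


def find_ap_include(env=os.environ):
    """Guess the ap_ header directory from the Xilinx environment variables.

    Returns None if none of the variables is set.
    """
    explicit = env.get("XILINX_AP_INCLUDE")
    if explicit:
        return explicit
    for var in ("XILINX_HLS", "XILINX_VIVADO"):
        root = env.get(var)
        if root:
            # Assumption: headers are in <tool root>/include.
            return os.path.join(root, "include")
    return None


print(find_ap_include())
```

If this prints `None`, the toolchain settings were not sourced in the shell that launched Python.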

Random Forest

The RandomForestClassifier in scikit-learn is similar to the GradientBoostingClassifier in that an ensemble of Decision Trees is used. For binary classification tasks, I think conifer already works for RandomForestClassifiers. However, for multi-class problems, whereas GradientBoostingClassifier learns a different tree for each class at each 'estimator', the RandomForestClassifier trains a single tree with multiple output scores. So, RandomForestClassifier with multi-class problems won't work at the moment, but should not be too challenging to support.
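The difference between the two ensemble layouts can be sketched in plain Python. In the random forest case each tree's leaf holds one value per class, and the forest averages the per-tree class distributions; in the boosted case each tree contributes a single scalar to one class's score. The function names and the toy numbers are assumptions for illustration, and the inputs are the leaf values already reached by one sample.

```python
def forest_predict_proba(leaf_values):
    """Average per-tree class distributions (random forest style).

    `leaf_values` holds, for each tree, the class-count vector of the leaf
    reached by one sample.
    """
    n_trees = len(leaf_values)
    n_classes = len(leaf_values[0])
    proba = [0.0] * n_classes
    for counts in leaf_values:
        total = sum(counts)
        for c, value in enumerate(counts):
            proba[c] += value / total / n_trees
    return proba


def boosted_scores(tree_outputs, n_classes):
    """Sum scalar tree outputs per class (gradient boosting style).

    `tree_outputs` is ordered estimator by estimator, one tree per class.
    """
    scores = [0.0] * n_classes
    for i, value in enumerate(tree_outputs):
        scores[i % n_classes] += value
    return scores


print(forest_predict_proba([[3, 1, 0], [0, 2, 2]]))
print(boosted_scores([1.0, -0.5, 0.25, 0.5, 0.5, -0.25], n_classes=3))
```

Supporting the forest layout in a converter would therefore mean carrying a vector of leaf values per tree instead of a scalar.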
