vallettea / koala Goto Github PK
View Code? Open in Web Editor NEWTranspose your Excel calculations into python for better performances and scaling.
License: GNU General Public License v3.0
Transpose your Excel calculations into python for better performances and scaling.
License: GNU General Public License v3.0
An idea is to use this kind of strategy:
https://axialcorps.com/2013/09/27/dont-slurp-how-to-read-files-in-python/
The problem seems to be that we handle gzip
files. An alternative solution could be https://docs.python.org/3/library/zlib.html#zlib.decompressobj, but we need to know the 'window size' (wbits
), which i don't think we do in advance...
See #12.
When you have an input that affect the value of a volatile, your whole calculation will be broken if you use Spreadsheet.clean_volatiles()
.
Detecting in advance which files are concerned is then a nice feature.
Exception: Problem evalling: cells and values in a Range must have the same size
There is a cache
dictionary in the Spreadsheet.clean_volatiles()
function, whose purpose is to reduce the amount of expression calculated, when the formula is the same as one previously found.
The problem is that sometimes, the same formula, called from a different cell, will evaluate differently.
This might lead to bad evaluations, and might explain #44.
But performance might be impacted.
excellib.vdb()
doesn't output exactly the same result when using partial start_period
or end_period
(meaning, floats)
This is to avoid to eval()
at each node of the graph, which takes quite a long time.
When use the OFFSET height and width so that the output Cell is actually a Range, it is most probable that this output doesn't exist in the cellmap, leading to errors.
White spaces in formulas are a problem:
clean_volatiles()
function might end up not replacing parts of formula since revert_rpn()
(which outputs the part of the formula to replace) returns a formula without white spaces.The current set up is not replacing white spaces.
Formulas like:
=(totalDecom-SUM(INDEX(FA_RecCostsDecom;1;1):INDEX(FA_RecCostsDecom;1;CA_Periods-1)))*Deprec_UOPRates
when calculated on a cell referenced as FA_RecCostsDecom
trigger infinite loop.
This is because currently our koala algorithm reevaluates a range each time it sees it in a formula.
A good way to handle this would be to store Ranges (in a koala sense) in a Spreadsheet.range_dict
object so that when koala encounters a Range it already knows, it can directly use the values without reevaluating the Range (then avoiding the infinite loop).
2 problems though:
This line should output an ExcelError
.
Currently, 2 different tokenizers are used in Koala:
koala/ast/tokenizer.pyx
)koala/openpyxl/tokenizer.py
).We need to merge the 2 into one to avoid complexity.
It might be interesting to merge CellRange
and Range
into one unique class, for clarity purposes.
https://github.com/anthill/koala/blob/master/koala/ast/excelutils.py#L15
https://github.com/anthill/koala/blob/master/koala/ast/Range.py#L118
When you have inputs that can modify cells with formulas containing INDEX
or OFFSET
, you don't want to pre parse your formula to clean the volatiles.
So you need to able to calculate entirely your workbook (even if it takes a great amout of time).
Currently, this is not possible and leads to evaluation errors due to bad parsing of ":" characters.
A generic mode addressing this case needs to be available.
Most of the Excel Error Codes i've put so far are "#N/A" or "#VALUE!", we might need more precision.
We might need to find a way to be more explicit about the errors
Volatile functions in Excel are functions that always trigger evaluation (see: http://www.decisionmodels.com/calcsecretsi.htm)
What we have called "volatiles" in our code is actually functions that output a reference to a cell, which is not the same.
For the sake of clarity, we need to rename what we call volatiles in our code.
Formulas like A1:OFFSET(A1,0,1)
lead to errors when you don't clean_volatiles()
.
This issue overruns #25.
Our current strategy is not to fill Ranges
with empty cells.
But this might lead to apply_all
operations on Ranges
with different sizes, raising an Exception
.
We might need to consider filling the missing cells values with zeros on such occasions.
See #16.
we don't need to remove all index (only the one that give address) and not the one giving back a value. For the moment, we remove all.
When you prune your graph, the cellmap of the reduced graph has a smaller nb of cells than the original cellmap. But Ranges
have been created with the original cellmap, so they might have a valid reference to a cell that doesn't exist in the reduced cellmap.
This problem get solved by dumping/loading the graph, since Ranges
are recreated from the reduced cellmap.
But still, such inconsistency should be addressed
Related to #17.
We need to understand exactly where we gain perfs and where we simplify the graphs.
A detailed benchmark is then needed.
The main 3 options we've added are:
For each of these options, we want to know:
gen_graph
?set_value
?evaluate
?Cells with a formula that implies a volatile have indirect links to other cells.
On set_value()
, are these Cells correctly reset ?
Technically, ExcelCompiler
reads the Excel, and does the initialization of Spreadsheet
, not much more.
We could imagine all of this done in Spreadsheet
class.
Not urgent at all.
Just in the future we will need to automatically launch tests before committing.
But before that, we need to structure a little bit our testing procedure.
Currently, it is necessary to call ExcelCompiler.clean_volatiles()
, which will call the Spreadsheet
equivalent.
But calling directly Spreadsheet.clean_volatiles()
won't generate a new graph.
Opening this possibility requires to rethink how ast.__init__()
works.
When using a certain number of inputs/ouputs (say, 15 each), the loop looks like infinite.
Ranges with empty cells end up being sparse Ranges (meaning they lack some Cell references).
Then using Spreadsheet.set_value()
might result in setting incorrect values on Cells.
This part of the code was necessary at one point because the update of Range
s was not optimal.
This ended up in infinite loops for some files, but this might have been fixed by these lines.
We need to check that this is the case.
excelutils.py l424,* flatten method* for cells values:
if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
@iOiurson How do you feel with that ?
Make it clearer
As counta counts the number of cells whose value are not none, setvalue with reset could potentially alter the result.
After some experiments, fixing a cell in the middle of a calculation chain has proven to output fixed results. More investigation is needed.
We use the numpy
version of Excel IRR function.
But for file 230, Calculations!CU273
does not output expected value.
It seems final evaluation value is correct though ...
When a Range
is partial due to empty cells, the find_associated_values()
method won't work if the reference cell is associated to an empty cell, since this cell will be missing in the Range
.
This is when using get_arguments_from_ast()
and get_volatiles_arguments_from_ast()
.
Only first and last parents are considered.
For now we have removed these cases temporarily.
Evaluation does not always give the same result.
Example:
With InputData!G14
as 2018
in the .XLS,
print 'First evaluation', sp.evaluate('Cashflow!G187') # => outputs -2966.25862693
sp.set_value('InputData!G14', 0) # this is to avoid direct evaluation
sp.set_value('InputData!G14', 2025)
print 'Second evaluation', sp.evaluate('Cashflow!G187') # => outputs -3719.5504961
With InputData!G14
as 2025
in the .XLS,
print 'First evaluation', sp.evaluate('Cashflow!G187') # => outputs -2582.30664008
sp.set_value('InputData!G14', 0) # this is to avoid direct evaluation
sp.set_value('InputData!G14', 2025)
print 'Second evaluation', sp.evaluate('Cashflow!G187') # => outputs -2582.30663952
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.