
tangentcft's Introduction

TangentCFT

Tangent Combined FastText (Tangent-CFT) is an embedding model for mathematical formulas. When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is in its early stages. We introduce a new formula embedding model that we use with two hierarchical representations: (1) Symbol Layout Trees (SLTs) for appearance, and (2) Operator Trees (OPTs) for mathematical content. Following the approach of graph embeddings such as DeepWalk, we generate tuples representing paths between pairs of symbols depth-first, embed tuples using the fastText n-gram embedding model, and then represent an SLT or OPT by its average tuple embedding vector. We then combine SLT and OPT embeddings, leading to state-of-the-art results on the formula retrieval task of NTCIR-12.
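
As a rough sketch of the embedding step (assuming a trained gensim FastText model and formulas already converted to encoded tuple "words"; the function name here is illustrative, not the repository's API):

import numpy as np
from gensim.models import FastText

def formula_vector(model: FastText, encoded_tuples):
    # A formula (SLT or OPT) is represented by the average of its tuple
    # embeddings; fastText's character n-grams give vectors even for
    # tuples never seen during training.
    return np.mean([model.wv[t] for t in encoded_tuples], axis=0)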

Requirements

The codebase is implemented in Python 3.6. Package versions used for development are listed in the requirements.txt file.

Dataset

To evaluate our embedding model we used the NTCIR-12 dataset, focusing on the formula retrieval task. The collection contains over 590,000 mathematical formulas from Wikipedia, along with 20 formula queries and their relevant formulas. For comparison with previous approaches, we used the bpref score to evaluate the top-1000 relevant formulas. One can also easily use any dataset, such as [Math Stack Exchange](https://math.stackexchange.com/), in the form of a CSV file of LaTeX formulas and formula ids (separated by the $$ sign) to train a new model.
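
For example, such a CSV file might look like this (illustrative formulas and ids):

\frac{a}{b}$$101
\sqrt{x^2+1}$$102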

Running TangentCFT

Here are the steps to compute Tangent-CFT embeddings. The first step in running TangentCFT is to set the configuration file, which holds the parameters for fastText. One can also specify a directory where the output vector for each formula is saved for further analysis. The configuration file should be placed in the Configuration directory, under the config directory, with a file name in the format config_x, where x is the run id. Here is an example configuration file:

context_window_size,20
hs,0
id,1
iter,30
max,6
min,3
negative,20
ngram,1
result_vector_file_path,None
skip_gram,1
vector_size,300
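
These keys correspond roughly to gensim FastText hyperparameters as sketched below (gensim 4.x keyword names shown; older versions use size=/iter=/word_ngrams=, and the min/max mapping to the character n-gram range is an assumption):

from gensim.models import FastText

model = FastText(
    vector_size=300,  # vector_size
    window=20,        # context_window_size
    sg=1,             # skip_gram
    hs=0,             # hs
    negative=20,      # negative
    epochs=30,        # iter
    min_n=3,          # min (assumed: smallest character n-gram)
    max_n=6,          # max (assumed: largest character n-gram)
)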

The next step is to train a CFT model. Here is a command to train and do retrieval with the SLT representation:

python3 tangent_cft_front_end.py -ds "/NTCIR12_MathIR_WikiCorpus_v2.1.0/MathTagArticles" -cid 1  -em slt_encoder.tsv --mp slt_model --rf slt_ret.tsv --qd "/TestQueries" --ri 1

The command above uses the configuration file with id 1, uses the NTCIR-12 dataset to train the model on the SLT representation, saves the encoding map in the slt_encoder.tsv file and the CFT model in the file slt_model. The retrieval result with the SLT representation is saved in the file slt_ret.tsv. Next, use the following command to do the same for the OPT representation:

python3 tangent_cft_front_end.py -ds "/NTCIR12_MathIR_WikiCorpus_v2.1.0/MathTagArticles" --slt False -cid 2  -em opt_encoder.tsv --mp opt_model --tn False --rf opt_ret.tsv --qd "/TestQueries" --ri 2

Finally, to train the CFT model on the SLT Type representation and do the retrieval, use the following command:

python3 tangent_cft_front_end.py -ds "/NTCIR12_MathIR_WikiCorpus_v2.1.0/MathTagArticles" -cid 3  -em slt_type_encoder.tsv --mp slt_type_model --rf slt_type_ret.tsv --qd "/TestQueries" --et 2 --tn False --ri 3

The three commands above train a model on each representation and do retrieval. The Tangent-CFT model, however, combines the three vector representations. Therefore, after training, use the following command to combine the retrieval results:

python3 tangent_cft_combine_results.py

The retrieval result will be saved in the file cft_res.
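
Conceptually, the combination averages a formula's three vectors before scoring it against a query by cosine similarity (a simplified sketch; tangent_cft_combine_results.py is the authoritative implementation):

import numpy as np

def combine(slt_vec, opt_vec, slt_type_vec):
    # One vector per representation, merged into a single Tangent-CFT vector.
    return (slt_vec + opt_vec + slt_type_vec) / 3.0

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))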

Checking the retrieval results. After the model is trained and the retrieval is done, the results are saved in the "Retrieval_Results" directory. Each line of the result file contains the query id followed by a relevant formula id, its rank, the similarity score, and the run id. TangentCFT results on the NTCIR-12 dataset are included in the Retrieval_Results directory as a sample. To evaluate the results, the judge file of the NTCIR-12 task is located in the Evaluation directory, along with the Trec_eval tool. This file is different from the original NTCIR-12 judge file: some formula ids contain special characters whose names we changed (normalized) in our model, so we normalized their names in the judge file as well.
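
For instance, a result line might look like the following (illustrative ids and score, in the order described above):

NTCIR12-MathWiki-1 Algebra:0 1 0.9871 1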

Reproducibility error.

References

Please cite the Tangent-CFT paper: Mansouri, B., Rohatgi, S., Oard, D. W., Wu, J., Giles, C. L., & Zanibbi, R. (2019, September). Tangent-CFT: An Embedding Model for Mathematical Formulas. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (pp. 11-18). ACM.
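
For convenience, the same reference as a BibTeX entry:

@inproceedings{mansouri2019tangentcft,
  author    = {Mansouri, Behrooz and Rohatgi, Shaurya and Oard, Douglas W. and Wu, Jian and Giles, C. Lee and Zanibbi, Richard},
  title     = {Tangent-CFT: An Embedding Model for Mathematical Formulas},
  booktitle = {Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval},
  pages     = {11--18},
  year      = {2019},
  publisher = {ACM}
}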

tangentcft's People

Contributors

arqmath, behroozmansouri

tangentcft's Issues

is the res_tangent_cft file representing the reported result in the paper?

Hi ! I was trying to reproduce the bpref result in your paper by running trec_eval with the res_tangent_cft file in this repo. The results were a bit off from the reported Tangent-CFT bpref in the paper... Am I doing something wrong?

what I did:
trec_eval -l3 judge.dat res_tangent_cft | grep -v "docno" | grep -E "(^P|bpref)" for full bpref and

trec_eval -l1 judge.dat res_tangent_cft | grep -v "docno" | grep -E "(^P|bpref)" for partial bpref.

The results I got were 0.5685 for full bpref and 0.6824 for partial bpref. In the paper those numbers are 0.6 and 0.71.

Thanks in advance for your help and clarification !

RuntimeError: you must first build vocabulary before training the model

please help....

encoding train data...
training the fast text model...
Setting Configuration
Training the model
Traceback (most recent call last):
  File "/a/Tangent/TangentCFT-master/tangent_cft_front_end.py", line 92, in <module>
    main()
  File "/a/Tangent/TangentCFT-master/tangent_cft_front_end.py", line 61, in main
    dictionary_formula_tuples_collection = system.train_model(
  File "/a/Tangent/TangentCFT-master/tangent_cft_back_end.py", line 39, in train_model
    self.module.train_model(self.config, list(dictionary_formula_tuples_collection.values()))
  File "/a/Tangent/TangentCFT-master/tangent_cft_module.py", line 30, in train_model
    self.model.train(configuration, lst_lst_encoded_tuples)
  File "/a/Tangent/TangentCFT-master/tangent_cft_model.py", line 37, in train
    self.model = FastText(fast_text_train_data, vector_size=size, window=window, sg=sg, hs=hs,
  File "/home/sachin/miniconda3/envs/my_env/lib/python3.10/site-packages/gensim/models/fasttext.py", line 435, in __init__
    super(FastText, self).__init__(
  File "/home/sachin/miniconda3/envs/my_env/lib/python3.10/site-packages/gensim/models/word2vec.py", line 427, in __init__
    self.train(
  File "/home/sachin/miniconda3/envs/my_env/lib/python3.10/site-packages/gensim/models/word2vec.py", line 1042, in train
    self._check_training_sanity(epochs=epochs, total_examples=total_examples, total_words=total_words)
  File "/home/sachin/miniconda3/envs/my_env/lib/python3.10/site-packages/gensim/models/word2vec.py", line 1540, in _check_training_sanity
    raise RuntimeError("you must first build vocabulary before training the model")
RuntimeError: you must first build vocabulary before training the model

I think model.build_vocab(..., update=True)
is missing or something. Please help.
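
Something like the usual gensim sequence may be what's missing here (a sketch, assuming gensim 4.x and that fast_text_train_data is the list of encoded-tuple "sentences" from the traceback above):

from gensim.models import FastText

# Sketch: build the vocabulary explicitly before calling train().
model = FastText(vector_size=300, window=20, sg=1, hs=0, negative=20, min_n=3, max_n=6)
model.build_vocab(fast_text_train_data)            # required before train()
model.train(fast_text_train_data,
            total_examples=model.corpus_count,     # counted during build_vocab()
            epochs=30)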

tangent_cft_back_end.py==>ValueError: max() arg is an empty sequence

Hi, nice repo here.
However, when I tried to run
python3 tangent_cft_front_end.py -ds "./NTCIR-12_MathIR_Wikipedia_Corpus/MathTagArticles" -cid 1 -em slt_encoder.tsv --mp slt_model --rf slt_ret.tsv --qd "/TestQueries" --ri 1

I get the following error:

Traceback (most recent call last):
  File "tangent_cft_front_end.py", line 88, in <module>
    main()
  File "tangent_cft_front_end.py", line 62, in main
    tokenize_number=tokenize_number
  File "/home/TangentCFT/tangent_cft_back_end.py", line 32, in train_model
    self.__load_encoder_map(map_file_path)
  File "/home/TangentCFT/tangent_cft_back_end.py", line 175, in __load_encoder_map
    self.node_id = max(list(self.encoder_map_node.values())) + 1
ValueError: max() arg is an empty sequence

Any suggestions?

how to use my data to run this model

My data is in standard LaTeX format, saved as txt,
like: g(x,y) = \frac{1}{n^2} \sum_{i=[n/2]}^{n/2} \sum_{j=[n/2]}^{n/2} f(x+i,x+j)
How do I extract the formulas into tuples?

NotImplementedError: Gensim's FastText implementation does not yet support word_ngrams != 1.

The error occurs while training the CFT model on the SLT Type representation (third command).
Please help....

Traceback (most recent call last):
  File "tangent_cft_front_end.py", line 92, in <module>
    main()
  File "tangent_cft_front_end.py", line 61, in main
    dictionary_formula_tuples_collection = system.train_model(
  File "/Final/TangentCFT/TangentCFT-master/tangent_cft_back_end.py", line 41, in train_model
    self.module.train_model(self.config, list(dictionary_formula_tuples_collection.values()))
  File "/Final/TangentCFT/TangentCFT-master/tangent_cft_module.py", line 29, in train_model
    self.model.train(configuration, lst_lst_encoded_tuples)
  File "/Final/TangentCFT/TangentCFT-master/tangent_cft_model.py", line 28, in train
    self.model = FastText(fast_text_train_data, vector_size=size, window=window, sg=sg, hs=hs,
  File "/home/.local/lib/python3.8/site-packages/gensim/models/fasttext.py", line 423, in __init__
    raise NotImplementedError("Gensim's FastText implementation does not yet support word_ngrams != 1.")
NotImplementedError: Gensim's FastText implementation does not yet support word_ngrams != 1.

How to parse latex in dataset

Hi, I found that some formulas are written in LaTeX format instead of MathML in the dataset (e.g. wpmath0000012/Algebra.html).
As a result, they can't be parsed into training data or become part of the corpus used during retrieval.
However, the retrieval result res_tangent_cft has a record for the formula Algebra:0. How does this occur?
I tried to complete the TODO part in math_extractor.py for parsing LaTeX, but it still has bugs. Is there a complete version of that part?
Thanks.

where are the 20 formula queries in the NTCIR-12 dataset?

I only found many html files and some FormulaStats and filecounts files in the dataset, but no file called 'query', and I couldn't find anything about queries in CorpusOverview.md. Could anyone help me? Any help would be appreciated!

ValueError: cannot reshape array of size 150 into shape (1,300)

Please help...

Traceback (most recent call last):
  File "tangent_cft_front_end.py", line 92, in <module>
    main()
  File "tangent_cft_front_end.py", line 69, in main
    retrieval_result = system.retrieval(dictionary_formula_tuples_collection,
  File "/TangentCFT-master/tangent_cft_back_end.py", line 68, in retrieval
    tensor_values, index_formula_id = self.module.index_collection_to_tensors(dictionary_formula_tuples_collection)
  File "/TangentCFT-master/tangent_cft_module.py", line 46, in index_collection_to_tensors
    xx = xx.reshape(1, 300)
ValueError: cannot reshape array of size 150 into shape (1,300)

Thank you

RuntimeError: you must first build vocabulary before training the model

When I try to train the model on the NTCIR-12 data, following the instructions in the readme, the call to gensim's FastText (line 30 in tangent_cft_model.py) throws "RuntimeError: you must first build vocabulary before training the model".

I'm using gensim 3.4.0 as instructed in requirements.txt. Is the version information incorrect, maybe? Or is there an undocumented step that I'm supposed to perform first? Thanks in advance for any help.

TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

Hello @BehroozMansouri @ARQMath, this repo is very helpful, but when I ran tangent_cft_combine_results.py I got the error below:
Traceback (most recent call last):
  File "/workspace/formula1/TangentCFT/tangent_cft_module.py", line 115, in __get_vector_representation
    temp_vector = temp_vector + self.model.get_vector_representation(encoded_tuple)
  File "/workspace/formula1/TangentCFT/tangent_cft_model.py", line 44, in get_vector_representation
    return self.model.wv[encoded_math_tuple]
  File "/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py", line 169, in __getitem__
    return self.get_vector(entities)
  File "/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py", line 277, in get_vector
    return self.word_vec(word)
  File "/opt/conda/lib/python3.7/site-packages/gensim/models/keyedvectors.py", line 1622, in word_vec
    raise KeyError('all ngrams for word %s absent from model' % word)
KeyError: 'all ngrams for word \x01⍸\x0fᆞ\x04 absent from model'
ERROR:root:'all ngrams for word \x01⍸\x0fᆞ\x04 absent from model'
Traceback (most recent call last):
  [same call stack as above]
KeyError: 'all ngrams for word \x01⍸\x0fᆞ\x04 absent from model'
ERROR:root:'all ngrams for word ǣ҆\x01 absent from model'
Traceback (most recent call last):
  [same call stack as above]
KeyError: 'all ngrams for word ǣ҆\x01 absent from model'
ERROR:root:'all ngrams for word ΈΈ\x07 absent from model'
Traceback (most recent call last):
  [same call stack as above]
KeyError: 'all ngrams for word ΈΈ\x07 absent from model'
Traceback (most recent call last):
  File "tangent_cft_combine_results.py", line 100, in <module>
    main()
  File "tangent_cft_combine_results.py", line 80, in main
    tokenize_number=True)
  File "tangent_cft_combine_results.py", line 28, in get_vectors
    tokenize_all, tokenize_number)
  File "/workspace/formula1/TangentCFT/tangent_cft_back_end.py", line 81, in get_collection_query_vectors
    index_formula_id = self.module.index_collection_to_numpy(dictionary_formula_tuples_collection)
  File "/workspace/formula1/TangentCFT/tangent_cft_module.py", line 64, in index_collection_to_numpy
    vector = self.__get_vector_representation(dictionary_formula_lst_encoded_tuples[formula])
  File "/workspace/formula1/TangentCFT/tangent_cft_module.py", line 119, in __get_vector_representation
    return (temp_vector / counter)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

I have already run the slt, opt and slt_type commands as the README guides; looking forward to your help, thank you!

TypeError: encode_tuples() takes from 4 to 8 positional arguments but 9 were given

Getting this error while trying to run the command python3 tangent_cft_front_end.py -ds "/NTCIR12_MathIR_WikiCorpus_v2.1.0/MathTagArticles" -cid 1 -em slt_encoder.tsv --mp slt_model --rf slt_ret.tsv --qd "/TestQueries" --ri 1

Definition of encode_tuples:

def encode_tuples(self, node_map, node_id, math_tuples, embedding_type=TupleTokenizationMode.Both_Separated,
                      ignore_full_relative_path=True, tokenize_all=False, tokenize_number=True):
        """
        Takes the encoder map (which can be empty) and the last node id and enumerates the tuple tokens to converts the
        tuples to words (with n-gram as each tokenized tuple element) to make the formulas ready to be fed to fasttext
        :param node_map: dictionary of tokens and their id
        :param node_id: the last node id
        :param math_tuples: list of formula tuples (which are extracted by Tangent-S) to be encoded
        :param embedding_type: one of the four possible tokenization model
        :param ignore_full_relative_path: determines to ignore the full relative path or not (default True)
        :param tokenize_all: determines to tokenize all elements (such as numbers and text) (default False)
        :param tokenize_number: determines to tokenize the numbers or not (default True)
        :return:
        """
        self.node_id = node_id
        self.__encoder_map_node = node_map
        encoded_tuples = []
        for math_tuple in math_tuples:
            encoded_slt_tuple = ""
            slt_elements = self.__get_slt_elements(math_tuple, ignore_full_relative_path=ignore_full_relative_path)

            converted_value = self.__convert_node_elements(slt_elements[0], embedding_type,
                                                           tokenize_all=tokenize_all,
                                                           tokenize_number=tokenize_number)
            encoded_slt_tuple = encoded_slt_tuple + converted_value

            converted_value = self.__convert_node_elements(slt_elements[1], embedding_type,
                                                           tokenize_all=tokenize_all,
                                                           tokenize_number=tokenize_number)
            encoded_slt_tuple = encoded_slt_tuple + converted_value

            converted_value = self.__convert_path_elements(slt_elements[2])
            encoded_slt_tuple = encoded_slt_tuple + converted_value
            "Encode the full relative path"
            if not ignore_full_relative_path:
                converted_value = self.__convert_path_elements(slt_elements[3])
                encoded_slt_tuple = encoded_slt_tuple + converted_value
            encoded_tuples.append(encoded_slt_tuple)

        temp_update_list = self.update_list
        self.update_list = {}
        return encoded_tuples, temp_update_list, self.node_id

Where encode_tuples is being called:

encoded_tuples, update_map_node, update_map_edge, node_id, edge_id = \
            TupleEncoder.encode_tuples(self.encoder_map_node, self.encoder_map_edge, self.node_id, self.edge_id,
                                       list_of_tuples, embedding_type, ignore_full_relative_path, tokenize_all,
                                       tokenize_number)
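
If the definition above is current, a matching call would drop the edge map and edge id and unpack three return values (a hypothetical sketch, not a confirmed fix):

# Hypothetical call matching the 4-to-8-argument signature shown above:
# encoder map, last node id, the tuples, then the four keyword options.
encoded_tuples, update_list, node_id = \
    self.tuple_encoder.encode_tuples(self.encoder_map_node, self.node_id, list_of_tuples,
                                     embedding_type, ignore_full_relative_path,
                                     tokenize_all, tokenize_number)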

[LaTeX input] Raised .get_query() error from AbstractDataReader class

Hi, thanks for a very interesting concept and the codebase to try it out.

I try to run the code with the command:

python3 tangent_cft_front_end.py -ds "Dataset/latex_samples.csv" --wiki False -cid 1 -em slt_encoder.tsv --mp slt_model --rf slt_ret.tsv --ri 1

using a LaTeX input (latex_samples.csv) as follows:

\frac12$$01
\frac{3}{2}$$02
\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-\frac{x^2}{2}}dx$$03
\frac{5}{6}x-\frac{4}{3}y+\frac{5}{4}z={8}$$04
\sqrt{x^2+1}$$05

Unfortunately, I'm receiving this error:

reading train data...
Convert LaTeX to MathML:$\frac12$
Convert LaTeX to MathML:$\frac{3}{2}$
Convert LaTeX to MathML:$\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-\frac{x^2}{2}}dx$
Convert LaTeX to MathML:$\frac{5}{6}x-\frac{4}{3}y+\frac{5}{4}z={8}$
Convert LaTeX to MathML:$\sqrt{x^2+1}$
0
5
encoding train data...
training the fast text model...
Setting Configuration
Training the model
saving the fast text model...

Traceback (most recent call last):
  File "tangent_cft_front_end.py", line 93, in <module>
    main()
  File "tangent_cft_front_end.py", line 72, in main
    tokenize_number
  File "/TangentCFT/tangent_cft_back_end.py", line 70, in retrieval
    dictionary_query_tuples = self.data_reader.get_query()
  File "/TangentCFT/DataReader/abstract_data_reader.py", line 6, in get_query
    raise NotImplementedError
NotImplementedError

I also got rid of the missing argument issue in that class:

class AbstractDataReader:
(...)
    def get_query(self):
        raise NotImplementedError

because of the line dictionary_query_tuples = self.data_reader.get_query() in the /TangentCFT/tangent_cft_back_end.py.

Do I understand correctly that .csv input uses the MSEDataReader class, that the .get_query() method is not implemented in that class, and that this is why the error occurs? The error, of course, does not show when the --r False flag is set (no retrieval, so no get-query method needed), but I'm not sure if that's the case and maybe I'm missing the point of the whole csv data usage.

Could you please shed some light on using LaTeX formulas (in .csv) with your code? Thank you!
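
For illustration, a minimal get_query() override in a reader subclass might look like this (a hypothetical sketch; the query-file format mirrors the latex$$id lines above and is not the repository's confirmed API):

from DataReader.abstract_data_reader import AbstractDataReader

class MSEQueryDataReader(AbstractDataReader):
    def __init__(self, query_file_path):
        self.query_file_path = query_file_path

    def get_query(self):
        # Map each query id to its LaTeX string; tuple extraction would
        # still happen downstream, as for the training formulas.
        queries = {}
        with open(self.query_file_path) as f:
            for line in f:
                latex, query_id = line.rstrip("\n").rsplit("$$", 1)
                queries[query_id] = latex
        return queries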

ERROR:root:'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model' when doing retrieval.

I searched for this error; it is caused by a word not being present in the training vocabulary, so the FastText model cannot return a meaningful word vector for the input word.
But I already used all the training data in the "MathTagArticles" directory.
Is there anything I missed?

ERROR:root:'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model'
Traceback (most recent call last):
  File "/app/TangentCFT/tangent_cft_module.py", line 115, in __get_vector_representation
    temp_vector = temp_vector + self.model.get_vector_representation(encoded_tuple)
  File "/app/TangentCFT/tangent_cft_model.py", line 45, in get_vector_representation
    return self.model.wv[encoded_math_tuple]
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 169, in __getitem__
    return self.get_vector(entities)
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 277, in get_vector
    return self.word_vec(word)
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1622, in word_vec
    raise KeyError('all ngrams for word %s absent from model' % word)
KeyError: 'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model'

The error appears many times with different words.
But after that, it still produced the "slt_ret.tsv" file; will this cause any problem for the retrieval result?

can only concatenate str (not "NoneType") to str

When I input the command: python3 tangent_cft_front_end.py -cid 1 -ds '/online/haoyu.hao/formula_semantic/TangentCFT-0909/DataSet/NTCIR-12/MathTagArticles' --slt False -em 'encoder.csv' --mp 'opt_model' --t False --rf res_1

I got:

Traceback (most recent call last):
  File "tangent_cft_front_end.py", line 94, in <module>
    main()
  File "tangent_cft_front_end.py", line 77, in main
    dictionary_formula_tuples_collection = system.load_model(
  File "/online/haoyu.hao/formula_semantic/TangentCFT-0909/tangent_cft_back_end.py", line 81, in load_model
    dictionary_formula_tuples_collection = self.__encode_train_tuples(embedding_type, ignore_full_relative_path,
  File "/online/haoyu.hao/formula_semantic/TangentCFT-0909/tangent_cft_back_end.py", line 27, in __encode_train_tuples
    dictionary_lst_encoded_tuples[formula] = self.__encode_lst_tuples(dictionary_formula_slt_tuple[formula],
  File "/online/haoyu.hao/formula_semantic/TangentCFT-0909/tangent_cft_back_end.py", line 35, in __encode_lst_tuples
    encoded_tuples, temp_update_list, node_id = self.tuple_encoder.encode_tuples(self.encoder_map, self.node_id,
  File "/online/haoyu.hao/formula_semantic/TangentCFT-0909/Embedding_Preprocessing/encoder_tuple_level.py", line 84, in encode_tuples
    converted_value = self.__convert_path_elements(slt_elements[2])
  File "/online/haoyu.hao/formula_semantic/TangentCFT-0909/Embedding_Preprocessing/encoder_tuple_level.py", line 130, in __convert_path_elements
    converted_value += value
TypeError: can only concatenate str (not "NoneType") to str
