idramalab / lambretta
Source code and labeled dataset for Lambretta
On line 43, cc is undefined. What does it stand for?
for item in splitter:
    addedawk+='/'+item+'/'
    if cc<len(splitter): # for the last item we don't need &&
        addedawk+=' && '
addedawk+="' "+awk_source_path+" > cmdtmp";
cmd=awkroot+addedawk
print("Running for query : ",querystring," command ",cmd)
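A hedged guess at what line 43 intends: cc looks like a 1-based counter over splitter, used to skip the trailing &&. The helper below is a sketch under that assumption; the function name and signature are mine, not the repo's.

```python
# Sketch assuming `cc` is a 1-based counter over `splitter`;
# enumerate() makes the "skip && after the last item" logic explicit.
def build_awk_filter(splitter):
    addedawk = ''
    for cc, item in enumerate(splitter, start=1):
        addedawk += '/' + item + '/'
        if cc < len(splitter):  # for the last item we don't need &&
            addedawk += ' && '
    return addedawk
```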
On lines 76 and 80, results and querystring are undefined.
result=awk_query(query)
with open(awk_output_export_path,"a+") as of:
    json.dump({"keyword":query,"data":results},of)
    of.write("\n")
except Exception as err:
    ff=open("awkerrors_test.txt","a+")
    ff.write("quer"+querystring+" Error : "+str(err))
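A minimal fix sketch, assuming result and results are meant to be the same variable and that query should replace the undefined querystring. The wrapper run_query and its parameters are hypothetical names I introduce so the snippet is self-contained; awk_query stands in for the repo's query function.

```python
import json

# Hypothetical wrapper: `results` is used consistently, and the error log
# uses `query` instead of the undefined `querystring`.
def run_query(query, awk_query, awk_output_export_path):
    try:
        results = awk_query(query)
        with open(awk_output_export_path, "a+") as of:
            json.dump({"keyword": query, "data": results}, of)
            of.write("\n")
    except Exception as err:
        with open("awkerrors_test.txt", "a+") as ff:
            ff.write("query " + query + " Error : " + str(err))
```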
Reproduction:
On line 20, replace (because of #4)
xx=open("candidates_test.txt")
with
xx=open("candidate_queries.txt")
then run python3 ./fetch_results.py
Python does not report that results is undefined here, because the name is only looked up when this line is actually reached in the execution path:
json.dump({"keyword":query,"data":results},of)
The generated output looks like this:
File: awk_output_test.json
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
Taken as a whole, this file is not valid JSON; each line is a separate JSON object (JSON Lines).
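Since each line is an independent JSON object, the file can still be consumed as JSON Lines by parsing it line by line rather than with a single json.load() call. A sketch (the reader's name is mine):

```python
import json

# Parse a JSON Lines file: one JSON object per line, blank lines skipped.
def read_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```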
If the path specified here does not exist by default, attempting to open it from Python results in an error:
awk_output_export_path="/data/ppaudeldata/VoterFraud/awk_output_test.json"
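One hedged workaround, assuming the failure is simply that the parent directory of the hard-coded path does not exist on a fresh machine: create it before opening, or point the path at a local directory instead. The helper name is mine.

```python
import os

# Create the parent directory (if any) before opening the output file
# for append, so a hard-coded path does not fail on a fresh machine.
def safe_open_append(path):
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "a+")
```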
On line 20 of fetch_results.py, the file candidates_test.txt is opened. How do I generate this file before it is opened?
xx=open("candidates_test.txt")
search=[]
for x in xx:
    x=x.rstrip()
    search.append(x)
In candidate_query_generator.py, on lines 39-40:
raw_cleaned_claim.append(item.strip())
cleaned_tweet=' '.join(raw_cleaned_tweet).strip()
raw_cleaned_tweet is undefined. Perhaps raw_cleaned_claim is the intended variable. Replacing raw_cleaned_tweet with raw_cleaned_claim fixes it (correct me if I am wrong).
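A sketch of the proposed fix, with both lines operating on the same list, kept as raw_cleaned_claim per the report. The wrapper function and its name are mine, added only to make the snippet runnable.

```python
# Both lines use one list, named raw_cleaned_claim as the report suggests:
# strip each item, then join the stripped pieces into one cleaned string.
def clean_claim(items):
    raw_cleaned_claim = []
    for item in items:
        raw_cleaned_claim.append(item.strip())
    return ' '.join(raw_cleaned_claim).strip()
```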
Reproduction:
Run python3 ./candidate_query_generator.py
The variable total is undefined.
# Creating spanning subset
left_join = sorted_data[0:int(0.2*len(sorted_data))]
right_join = sorted_data[-int(0.2*len(sorted_data)):]
mid = int(len(sorted_data)*0.5)
# total <- undefined
mid_join = sorted_data[mid-(int(total*0.1)):mid+(int(total*0.1))]
# Sliced data is spanning subset discussed in the paper
sliced_data = left_join+mid+right_join
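A hedged fix sketch: total is presumably len(sorted_data), and the final concatenation most likely intends mid_join (a list) rather than mid (an int). Both are my guesses, not confirmed by the repo.

```python
# Sketch: build the spanning subset from the bottom 20%, middle 20%,
# and top 20% of a sorted list.
def spanning_subset(sorted_data):
    total = len(sorted_data)  # assumption: `total` is the dataset size
    left_join = sorted_data[0:int(0.2 * total)]
    right_join = sorted_data[-int(0.2 * total):]
    mid = int(total * 0.5)
    mid_join = sorted_data[mid - int(total * 0.1):mid + int(total * 0.1)]
    # assumption: concatenate mid_join, not the integer `mid`
    return left_join + mid_join + right_join
```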
texts = []
for x in xx:
    try:
        x = x.rstrip()
        jsx = json.loads(x)
        query = jsx["keyword"]
        query_split = query.split(" ")
        query = ' '.join(query_split)
        print("Working on ... ", query)
        data = jsx["data"]
        if len(data) == 0: # <- error on this line
Please note that the line number printed in the traceback output differs from the actual line number because I imported traceback
Error: object of type 'NoneType' has no len()
Working on ... change address flags fulton
An exception occurred:
Traceback (most recent call last):
  File "generate_semantic_features.py", line 88, in <module>
    if len(data) == 0:
TypeError: object of type 'NoneType' has no len()
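Given the traceback, "data" can be null in the JSON, and json.loads turns null into None, so len(data) fails. A guard sketch (the helper name is mine):

```python
import json

# Treat null/None data the same as empty data before calling len().
def keyword_has_data(line):
    jsx = json.loads(line)
    data = jsx.get("data")
    return data is not None and len(data) > 0
```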
In what order should the files be executed to get the output? The files in the repo are:
jsx is undefined. What does it refer to?
'''
Create TF-IDF vectorizer
'''
# First, generate the document by accumulating all the claims
docs=[]
for item in jsx:
    tr4w = TextRank4Keyword()
    claim=item["claim"]
    docs.append(jsx["claim"])
cv=CountVectorizer()
word_count_vector=cv.fit_transform(docs)
tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True)
tfidf_transformer.fit(word_count_vector)
def sort_coo(coo_matrix):
    tuples = zip(coo_matrix.col, coo_matrix.data)
    return sorted(tuples, key=lambda x: (x[1], x[0]), reverse=True)
feature_names=cv.get_feature_names()
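A hedged reading of the loop: jsx is presumably the parsed list of claim records (one dict per claim), so docs.append(jsx["claim"]) likely should be docs.append(claim). A sketch of just that fix (the helper name is mine); the resulting docs list is what would feed CountVectorizer:

```python
# Assumption: `jsx` is a list of dicts, each with a "claim" key.
def collect_claims(records):
    docs = []
    for item in records:
        claim = item["claim"]
        docs.append(claim)  # fix: append each record's own claim
    return docs
```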
Reproduction:
Run python3 ./export_all_features_ltr.py
To save time during data processing, I used a partial dataset, specifically the top 50 rows. However, I encountered an unexpected issue: the data field generated in awk_output_test.json is null. Is this expected behaviour?
This is the code I used to generate the partial dataset:
import pandas as pd
df = pd.read_csv('data_source/data_source.csv', nrows=50)
df.to_csv('data_source/MOD_data_source.csv', index=False)
File: awk_output_test.json
{"keyword": "pennsylvania democrat", "data": null}
{"keyword": "democrat pre-canvass", "data": null}
{"keyword": "pre-canvass vote", "data": null}
{"keyword": "vote liberal", "data": null}
{"keyword": "liberal areas", "data": null}
{"keyword": "areas let", "data": null}
...
All data fields are null, and the file has 7087 lines.