
lambretta's Issues

fetch_results.py -> cc, results, querystring are undefined

On line 43, cc is undefined. What does it stand for?

    for item in splitter:
        addedawk+='/'+item+'/'
        if cc<len(splitter):#for the last item we don't need && 
            addedawk+=' && '
    addedawk+="' "+awk_source_path+" > cmdtmp";
    cmd=awkroot+addedawk
    print("Running for query :  ",querystring," command ",cmd)
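Judging from the surrounding loop, cc looks like it was meant to be a running 1-based index over splitter, so that ' && ' is appended after every term except the last. Here is a minimal sketch of that reading. The default value of awkroot ("awk '") is an assumption, since its real value is not quoted. build_awk_command is a hypothetical name, not a function from the repo.

```python
def build_awk_command(querystring, awk_source_path, awkroot="awk '"):
    """Build an awk command matching lines that contain every term of the query.

    Hypothetical helper: the default for `awkroot` is an assumption, and `cc`
    is read as the 1-based position of `item` in `splitter`.
    """
    splitter = querystring.split(" ")
    addedawk = ""
    # enumerate supplies the counter that `cc` appears to stand for
    for cc, item in enumerate(splitter, start=1):
        addedawk += "/" + item + "/"
        if cc < len(splitter):  # for the last item we don't need &&
            addedawk += " && "
    addedawk += "' " + awk_source_path + " > cmdtmp"
    return awkroot + addedawk


print(build_awk_command("vote liberal", "data_source.csv"))
# awk '/vote/ && /liberal/' data_source.csv > cmdtmp
```

If the counter is not needed anywhere else, `" && ".join("/" + item + "/" for item in splitter)` builds the same pattern without it.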

On lines 76 and 80, results and querystring are undefined:

        result=awk_query(query)
        with open(awk_output_export_path,"a+") as of:
            json.dump({"keyword":query,"data":results},of)
            of.write("\n")
    except Exception as err:
        ff=open("awkerrors_test.txt","a+")
        ff.write("quer"+querystring+" Error : "+str(err))
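The try block assigns result but then dumps results, and the except branch uses querystring where the loop variable is query. The sketch below shows the same loop with consistent names. It assumes awk_query returns the matched rows for one query. export_query_results and its parameters are illustrative names, and awk_query is passed in so the sketch runs on its own.

```python
import json


def export_query_results(queries, awk_query, awk_output_export_path,
                         error_log_path="awkerrors_test.txt"):
    """Run awk_query for each query and append one JSON object per line.

    Hypothetical wrapper: `awk_query` is injected so the sketch is self-contained.
    """
    for query in queries:
        try:
            result = awk_query(query)
            with open(awk_output_export_path, "a+") as of:
                # dump `result`, the name actually assigned above (not `results`)
                json.dump({"keyword": query, "data": result}, of)
                of.write("\n")
        except Exception as err:
            # log `query`, which is defined here (not `querystring`)
            with open(error_log_path, "a+") as ff:
                ff.write("query " + query + " Error : " + str(err) + "\n")
```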

Reproduction:

On line 20 (because of #4), replace

xx=open("candidates_test.txt")

with

xx=open("candidate_queries.txt")

Run

python3 ./fetch_results.py

Python gives this error:


Python does not complain about results yet, because this line is never reached during execution:

 json.dump({"keyword":query,"data":results},of)

Generated awk_output_test.json is not valid JSON

The generated output has this form:

File: awk_output_test.json

{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}

This is not valid JSON: the file contains several top-level objects, one per line, rather than a single JSON document.
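Each line is an independent JSON object (the JSON Lines layout), so json.load on the whole file fails, but every individual line parses. The sketch below reads such a file line by line. If a single valid JSON document is needed, it can also rewrite the file as an array. Both function names are illustrative.

```python
import json


def read_json_lines(path):
    """Parse a file containing one JSON object per line."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def json_lines_to_array(src_path, dst_path):
    """Rewrite a JSON Lines file as one JSON array (a single valid JSON document)."""
    with open(dst_path, "w") as out:
        json.dump(read_json_lines(src_path), out)
```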

How to generate candidates_test.txt

In fetch_results.py, line 20 opens the file candidates_test.txt. How do I generate this file before it is opened?

xx=open("candidates_test.txt")
search=[]
for x in xx:
    x=x.rstrip()
    search.append(x)
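The loop above only needs a plain-text file with one search query per line. It strips trailing whitespace and parses nothing else. The sketch below assumes that is the whole contract, since the code that originally produced candidates_test.txt is not shown. It writes a list of queries in that layout and reads it back the same way. write_candidates and read_candidates are hypothetical helpers.

```python
def write_candidates(queries, path="candidates_test.txt"):
    """Write one query per line, the layout the reading loop expects."""
    with open(path, "w") as f:
        for q in queries:
            f.write(q.strip() + "\n")


def read_candidates(path="candidates_test.txt"):
    """Same logic as the original loop, but the file is closed afterwards."""
    search = []
    with open(path) as xx:
        for x in xx:
            search.append(x.rstrip())
    return search
```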

candidate_query_generator.py -> raw_cleaned_tweet is undefined

In candidate_query_generator.py, on lines 39-40:

 raw_cleaned_claim.append(item.strip())
 cleaned_tweet=' '.join(raw_cleaned_tweet).strip()

raw_cleaned_tweet is undefined

Perhaps raw_cleaned_claim is the intended variable.

Replacing raw_cleaned_tweet with raw_cleaned_claim fixes it (correct me if I am wrong).
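Here is a minimal sketch of the suggested fix. Only lines 39-40 are quoted, so it assumes the surrounding code loops over the tokens of a claim and collects the stripped items in raw_cleaned_claim. clean_claim and its tokens argument are illustrative names.

```python
def clean_claim(tokens):
    """Strip each token and join them into one cleaned string (hypothetical wrapper)."""
    raw_cleaned_claim = []
    for item in tokens:
        raw_cleaned_claim.append(item.strip())
    # join the list that was actually built; raw_cleaned_tweet is never defined
    cleaned_tweet = ' '.join(raw_cleaned_claim).strip()
    return cleaned_tweet
```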

Reproduction:

Run

python3 ./candidate_query_generator.py

Python reports

Undefined variable `total` in `generate_semantic_features.py`

The variable total is undefined:

        # Creating spanning subset
        left_join = sorted_data[0:int(0.2*len(sorted_data))]
        right_join = sorted_data[-int(0.2*len(sorted_data)):]
        mid = int(len(sorted_data)*0.5)
        # total 👇 👇👇
        mid_join = sorted_data[mid-(int(total*0.1)):mid+(int(total*0.1))]
        # Sliced data is spanning subset discussed in the paper
        sliced_data = left_join+mid+right_join
        texts = []
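Two names in this snippet look wrong. total is never assigned, and sliced_data adds the integer mid rather than the list mid_join. That second problem would raise a TypeError once total is fixed. The sketch below assumes total was meant to be len(sorted_data). The 20% edges and the band of 10% either side of the midpoint come from the snippet. spanning_subset is an illustrative name.

```python
def spanning_subset(sorted_data, edge_frac=0.2, mid_frac=0.1):
    """Return the first 20%, a band around the middle, and the last 20% of sorted_data.

    Assumption: `total` in the original meant len(sorted_data).
    """
    total = len(sorted_data)
    edge = int(edge_frac * total)
    left_join = sorted_data[:edge]
    # total - edge instead of -edge: when edge == 0, sorted_data[-0:] is the whole list
    right_join = sorted_data[total - edge:]
    mid = int(total * 0.5)
    half_band = int(total * mid_frac)
    mid_join = sorted_data[mid - half_band:mid + half_band]
    # concatenate the three lists (mid_join, not the integer mid)
    return left_join + mid_join + right_join
```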


In generate_semantic_features.py, if data is null it throws an exception

for x in xx:
    try:
        x = x.rstrip()
        jsx = json.loads(x)
        query = jsx["keyword"]
        query_split = query.split(" ")
        query = ' '.join(query_split)
        print("Working on ... ", query)
        data = jsx["data"]
        if len(data) == 0: # <- error on this line

Please note that the line number printed in the traceback below differs from the actual line number in the file, because I added an import of traceback.

Error

object of type 'NoneType' has no len()
Working on ...  change address flags fulton
An exception occurred on line number:
Traceback (most recent call last):
  File "generate_semantic_features.py", line 88, in <module>
    if len(data) == 0:
TypeError: object of type 'NoneType' has no len()
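Rows such as {"keyword": "value", "data": null} (see the awk_output_test.json issues) load with data set to None, and len(None) raises this TypeError. The len(data) == 0 check suggests that empty results are meant to be skipped. Assuming null rows should be skipped too, a truthiness test covers both None and an empty list. iter_records is an illustrative name.

```python
import json


def iter_records(lines):
    """Yield (query, data) for each JSON line whose data is non-empty."""
    for x in lines:
        x = x.rstrip()
        if not x:
            continue
        jsx = json.loads(x)
        query = jsx["keyword"]
        data = jsx.get("data")
        # `not data` is True for both None (JSON null) and [], so len() is never called on None
        if not data:
            print("Skipping (no data) ... ", query)
            continue
        yield query, data
```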

Order of the files to be executed

In what order should the files be executed to get the output?

The files in the repo are:

  1. README.md
  2. claimextractor.py
  3. candidate_query_generator.py
  4. fetch_results.py
  5. training_claims.csv
  6. export_all_features_ltr.py
  7. generate_semantic_features.py

export_all_features_ltr.py -> line 172: jsx is undefined

jsx is undefined. What does it refer to?

'''
Create TF-IDF vectorizer 
'''
#First, generate the document by accumulating all the claims 
docs=[]
for item in jsx:
    tr4w = TextRank4Keyword()
    claim=item["claim"]
    docs.append(jsx["claim"])
cv=CountVectorizer()
word_count_vector=cv.fit_transform(docs)
tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True)
tfidf_transformer.fit(word_count_vector)
def sort_coo(coo_matrix):
    tuples = zip(coo_matrix.col, coo_matrix.data)
    return sorted(tuples, key=lambda x: (x[1], x[0]), reverse=True)
feature_names=cv.get_feature_names()
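Inside the loop, item["claim"] is read into claim, but jsx["claim"] is what gets appended. If jsx is a list of records, that would fail with a TypeError even once jsx is defined. The sketch below assumes jsx was meant to be the list of training claims, for example the rows of training_claims.csv with a claim column (the column name is an assumption). It covers the accumulation step with only the standard library. load_claims and collect_docs are hypothetical helpers.

```python
import csv


def load_claims(path="training_claims.csv"):
    """Read claim records from a CSV file (assumed layout: a header with a `claim` column)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def collect_docs(jsx, column="claim"):
    """Accumulate one document per claim for the TF-IDF vectorizer."""
    docs = []
    for item in jsx:
        docs.append(item[column])  # the current record's claim, not jsx["claim"]
    return docs
```

The resulting docs list can then be passed to cv.fit_transform(docs) as in the snippet. Depending on the installed scikit-learn version, cv.get_feature_names() may also fail. It was deprecated in 1.0 and removed in 1.2 in favour of cv.get_feature_names_out().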

Reproduction:
Run

python3 ./export_all_features_ltr.py 

Python will report

[DOUBT] awk_output_test.json data is null when the top 50 rows of the dataset are used

In order to save time during data processing, I decided to use a partial dataset, specifically the top 50 rows. However, I encountered an unexpected issue: the data field generated in awk_output_test.json is null.

Is this expected behaviour?

This is the code I used to generate the partial dataset:

import pandas as pd

df = pd.read_csv('data_source/data_source.csv', nrows=50)

df.to_csv('data_source/MOD_data_source.csv', index=False)

File: awk_output_test.json

{"keyword": "pennsylvania democrat", "data": null}
{"keyword": "democrat pre-canvass", "data": null}
{"keyword": "pre-canvass vote", "data": null}
{"keyword": "vote liberal", "data": null}
{"keyword": "liberal areas", "data": null}
{"keyword": "areas let", "data": null}
.
.
.

All data fields are null, and the file has 7087 lines.
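One way to check whether null data is expected here is to count how many candidate keywords can match anything in the 50-row file at all. The sketch below assumes the awk command matches a row when every space-separated term of the keyword appears in it, mirroring the /term/ && /term/ pattern built in fetch_results.py. It treats terms as plain substrings rather than regular expressions. matching_keywords is an illustrative name.

```python
def matching_keywords(keywords, rows):
    """Return the keywords whose terms all occur together in at least one row."""
    rows = list(rows)
    matched = []
    for kw in keywords:
        terms = kw.split(" ")
        # a row matches only if it contains every term, like /a/ && /b/ in awk
        if any(all(term in row for term in terms) for row in rows):
            matched.append(kw)
    return matched
```

For example, it could be called with the search list and the lines of data_source/MOD_data_source.csv. If few or none of the 7087 keywords match, empty results are at least plausible with a 50-row sample. If many match, the nulls likely come from somewhere else in the pipeline.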
