idramalab / lambretta
Source code and labeled dataset for Lambretta
On line 43, cc is undefined. What does it stand for?
for item in splitter:
    addedawk+='/'+item+'/'
    if cc<len(splitter): # for the last item we don't need &&
        addedawk+=' && '
addedawk+="' "+awk_source_path+" > cmdtmp";
cmd=awkroot+addedawk
print("Running for query : ",querystring," command ",cmd)
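A hedged guess at what line 43 intends: cc looks like a 1-based counter over splitter, used to skip the trailing &&. The helper below is a sketch under that assumption; the function name and signature are mine, not the repo's.

```python
# Sketch assuming `cc` is a 1-based counter over `splitter`;
# enumerate() makes the "skip && after the last item" logic explicit.
def build_awk_filter(splitter):
    addedawk = ''
    for cc, item in enumerate(splitter, start=1):
        addedawk += '/' + item + '/'
        if cc < len(splitter):  # for the last item we don't need &&
            addedawk += ' && '
    return addedawk
```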
On lines 76 and 80, results and querystring are undefined.
result=awk_query(query)
with open(awk_output_export_path,"a+") as of:
    json.dump({"keyword":query,"data":results},of)
    of.write("\n")
except Exception as err:
    ff=open("awkerrors_test.txt","a+")
    ff.write("quer"+querystring+" Error : "+str(err))
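A minimal fix sketch, assuming result and results are meant to be the same variable and that query should replace the undefined querystring. The wrapper run_query and its parameters are hypothetical names I introduce so the snippet is self-contained; awk_query stands in for the repo's query function.

```python
import json

# Hypothetical wrapper: `results` is used consistently, and the error log
# uses `query` instead of the undefined `querystring`.
def run_query(query, awk_query, awk_output_export_path):
    try:
        results = awk_query(query)
        with open(awk_output_export_path, "a+") as of:
            json.dump({"keyword": query, "data": results}, of)
            of.write("\n")
    except Exception as err:
        with open("awkerrors_test.txt", "a+") as ff:
            ff.write("query " + query + " Error : " + str(err))
```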
Reproduction:
On line 20, replace (because of #4)
xx=open("candidates_test.txt")
with
xx=open("candidate_queries.txt")
then run python3 ./fetch_results.py
Python does not report that results is undefined here, because the name is only looked up when this line is actually reached in the execution path:
json.dump({"keyword":query,"data":results},of)
The generated output looks like this:
File: awk_output_test.json
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
{"keyword": "value", "data": null}
Taken as a whole, this file is not valid JSON; each line is a separate JSON object (JSON Lines).
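Since each line is an independent JSON object, the file can still be consumed as JSON Lines by parsing it line by line rather than with a single json.load() call. A sketch (the reader's name is mine):

```python
import json

# Parse a JSON Lines file: one JSON object per line, blank lines skipped.
def read_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```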
If the path specified here does not exist by default, attempting to open it from Python results in an error:
awk_output_export_path="/data/ppaudeldata/VoterFraud/awk_output_test.json"
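One hedged workaround, assuming the failure is simply that the parent directory of the hard-coded path does not exist on a fresh machine: create it before opening, or point the path at a local directory instead. The helper name is mine.

```python
import os

# Create the parent directory (if any) before opening the output file
# for append, so a hard-coded path does not fail on a fresh machine.
def safe_open_append(path):
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "a+")
```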
On line 20 of fetch_results.py, the file candidates_test.txt is opened. How do I generate this file before it is opened?
xx=open("candidates_test.txt")
search=[]
for x in xx:
    x=x.rstrip()
    search.append(x)
In candidate_query_generator.py, on lines 39-40:
raw_cleaned_claim.append(item.strip())
cleaned_tweet=' '.join(raw_cleaned_tweet).strip()
raw_cleaned_tweet is undefined. Perhaps raw_cleaned_claim is the intended variable. Replacing raw_cleaned_tweet with raw_cleaned_claim fixes it (correct me if I am wrong).
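A sketch of the proposed fix, with both lines operating on the same list, kept as raw_cleaned_claim per the report. The wrapper function and its name are mine, added only to make the snippet runnable.

```python
# Both lines use one list, named raw_cleaned_claim as the report suggests:
# strip each item, then join the stripped pieces into one cleaned string.
def clean_claim(items):
    raw_cleaned_claim = []
    for item in items:
        raw_cleaned_claim.append(item.strip())
    return ' '.join(raw_cleaned_claim).strip()
```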
Reproduction:
Run python3 ./candidate_query_generator.py
The variable total is undefined.
# Creating spanning subset
left_join = sorted_data[0:int(0.2*len(sorted_data))]
right_join = sorted_data[-int(0.2*len(sorted_data)):]
mid = int(len(sorted_data)*0.5)
# total <- undefined
mid_join = sorted_data[mid-(int(total*0.1)):mid+(int(total*0.1))]
# Sliced data is spanning subset discussed in the paper
sliced_data = left_join+mid+right_join
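A hedged fix sketch: total is presumably len(sorted_data), and the final concatenation most likely intends mid_join (a list) rather than mid (an int). Both are my guesses, not confirmed by the repo.

```python
# Sketch: build the spanning subset from the bottom 20%, middle 20%,
# and top 20% of a sorted list.
def spanning_subset(sorted_data):
    total = len(sorted_data)  # assumption: `total` is the dataset size
    left_join = sorted_data[0:int(0.2 * total)]
    right_join = sorted_data[-int(0.2 * total):]
    mid = int(total * 0.5)
    mid_join = sorted_data[mid - int(total * 0.1):mid + int(total * 0.1)]
    # assumption: concatenate mid_join, not the integer `mid`
    return left_join + mid_join + right_join
```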
texts = []
for x in xx:
    try:
        x = x.rstrip()
        jsx = json.loads(x)
        query = jsx["keyword"]
        query_split = query.split(" ")
        query = ' '.join(query_split)
        print("Working on ... ", query)
        data = jsx["data"]
        if len(data) == 0: # <- error on this line
Please note that the line number printed in the traceback output differs from the actual line number because I imported traceback
Error: object of type 'NoneType' has no len()
Working on ... change address flags fulton
An exception occurred:
Traceback (most recent call last):
  File "generate_semantic_features.py", line 88, in <module>
    if len(data) == 0:
TypeError: object of type 'NoneType' has no len()
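Given the traceback, "data" can be null in the JSON, and json.loads turns null into None, so len(data) fails. A guard sketch (the helper name is mine):

```python
import json

# Treat null/None data the same as empty data before calling len().
def keyword_has_data(line):
    jsx = json.loads(line)
    data = jsx.get("data")
    return data is not None and len(data) > 0
```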
In what order should the files be executed to get the output? The files in the repo are:
jsx is undefined. What does it refer to?
'''
Create TF-IDF vectorizer
'''
# First, generate the document by accumulating all the claims
docs=[]
for item in jsx:
    tr4w = TextRank4Keyword()
    claim=item["claim"]
    docs.append(jsx["claim"])
cv=CountVectorizer()
word_count_vector=cv.fit_transform(docs)
tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True)
tfidf_transformer.fit(word_count_vector)
def sort_coo(coo_matrix):
    tuples = zip(coo_matrix.col, coo_matrix.data)
    return sorted(tuples, key=lambda x: (x[1], x[0]), reverse=True)
feature_names=cv.get_feature_names()
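A hedged reading of the loop: jsx is presumably the parsed list of claim records (one dict per claim), so docs.append(jsx["claim"]) likely should be docs.append(claim). A sketch of just that fix (the helper name is mine); the resulting docs list is what would feed CountVectorizer:

```python
# Assumption: `jsx` is a list of dicts, each with a "claim" key.
def collect_claims(records):
    docs = []
    for item in records:
        claim = item["claim"]
        docs.append(claim)  # fix: append each record's own claim
    return docs
```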
Reproduction:
Run python3 ./export_all_features_ltr.py
To save time during data processing, I used a partial dataset, specifically the top 50 rows. However, I encountered an unexpected issue: the data field generated in awk_output_test.json is null. Is this expected behaviour?
This is the code I used to generate the partial dataset:
import pandas as pd
df = pd.read_csv('data_source/data_source.csv', nrows=50)
df.to_csv('data_source/MOD_data_source.csv', index=False)
File: awk_output_test.json
{"keyword": "pennsylvania democrat", "data": null}
{"keyword": "democrat pre-canvass", "data": null}
{"keyword": "pre-canvass vote", "data": null}
{"keyword": "vote liberal", "data": null}
{"keyword": "liberal areas", "data": null}
{"keyword": "areas let", "data": null}
...
All data fields are null, and the file has 7087 lines.