
Comments (7)

may- commented on June 15, 2024

Hi,

Here I call the RESTful MediaWiki API at https://en.wikipedia.org/w/api.php with the following options:

  • format = xml
  • action = query
  • list = random
  • rnnamespace = 0 (a built-in namespace number; 0 is the main article namespace)
  • rnlimit = 10 (set by the variable limit in my code)

So, the URL looks like:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=random&rnnamespace=0&rnlimit=10

This query returns 10 random Wikipedia article IDs in XML format (normal articles only, no category pages or anything else).

You can test the API options in the sandbox.
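
The request described above can be sketched in Python as follows. This is a minimal illustration, not the code from distant_supervision.py: the helper names build_random_query_url and extract_page_ids are hypothetical, and the sample XML is a hand-written stand-in for a real response, assuming each result comes back as a page element with an id attribute.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_random_query_url(limit=10):
    """Build the list=random query URL (hypothetical helper)."""
    params = [
        ("format", "xml"),
        ("action", "query"),
        ("list", "random"),
        ("rnnamespace", 0),  # 0 = main (article) namespace
        ("rnlimit", limit),
    ]
    return API_ENDPOINT + "?" + urlencode(params)


def extract_page_ids(xml_text):
    """Collect the id attribute of every <page> element in the response."""
    root = ET.fromstring(xml_text)
    return [page.get("id") for page in root.iter("page")]


# Hand-written sample, assumed to mirror the shape of a real response.
SAMPLE_RESPONSE = (
    '<api><query><random>'
    '<page id="101" ns="0" title="Example A" />'
    '<page id="202" ns="0" title="Example B" />'
    '</random></query></api>'
)

url = build_random_query_url(10)
page_ids = extract_page_ids(SAMPLE_RESPONSE)
```

Building the query string from a list of pairs keeps the parameter order identical to the option list above, which makes the generated URL easy to compare against the one pasted into the sandbox.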

Did I answer your question?

from cnn-re-tf.

may- commented on June 15, 2024

Ah, this is a known problem. I'll fix it in the future, but for now, could you please try this? Sorry for the inconvenience.

If you still get any error, please let me know :)


li10141110 commented on June 15, 2024

Yes, thank you very much! You answered my question perfectly. But I have another question: I cannot run the program successfully. It fails with the following error:

Traceback (most recent call last):
  File "/home/wuyujuan/Desktop/cnn-re-tf-master1/distant_supervision.py", line 693, in <module>
    main()
  File "/home/wuyujuan/Desktop/cnn-re-tf-master1/distant_supervision.py", line 682, in main
    extract_positive()
  File "/home/wuyujuan/Desktop/cnn-re-tf-master1/distant_supervision.py", line 570, in extract_positive
    entities = util.load_from_dump(os.path.join(data_dir, "entities.cPickle"))
  File "/home/wuyujuan/Desktop/cnn-re-tf-master1/util.py", line 285, in load_from_dump
    with open(filename, 'rb') as infile:
IOError: [Errno 2] No such file or directory: '/home/wuyujuan/Desktop/cnn-re-tf-master1/data/entities.cPickle'

Can you help me solve this problem? Thank you very much!
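
The traceback shows that load_from_dump tried to open data/entities.cPickle and the file was not there. Which step of the pipeline is supposed to write that file is not shown in this thread, so the sketch below only illustrates one way to make the failure easier to diagnose: check for the file first and raise a more descriptive error. The name load_from_dump_checked is hypothetical, and the use of pickle is an assumption based on the file extension.

```python
import os
import pickle


def load_from_dump_checked(filename):
    """Load a pickled object, with a descriptive error if the file is missing."""
    if not os.path.isfile(filename):
        raise FileNotFoundError(
            "Expected dump file not found: %s. "
            "Check that the step which writes it has completed." % filename
        )
    with open(filename, "rb") as infile:
        return pickle.load(infile)
```

The sketch is written for Python 3; the thread itself runs on Python 2.7, where the equivalent exception would be IOError.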


li10141110 commented on June 15, 2024

Hi,
I'm sorry, but I have tried this before and it doesn't work.
Here are my results:
/home/wuyujuan/Envs/cnn_re/bin/python /home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py
===== step 1 =====
[1/4] Downloading wiki articles ...
Traceback (most recent call last):
  File "/home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py", line 694, in <module>
    main()
  File "/home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py", line 682, in main
    positive_examples()
  File "/home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py", line 453, in positive_examples
    ret = loop(step, doc_id, limit, entities, relations, counter)
  File "/home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py", line 344, in loop
    docs = download_wiki_articles(doc_id, limit)
  File "/home/wuyujuan/Desktop/cnn-re-tf-another_try/distant_supervision.py", line 73, in download_wiki_articles
    pages = bs(r, "html.parser").findAll('page')
  File "/home/wuyujuan/Envs/cnn_re/lib/python2.7/site-packages/bs4/__init__.py", line 192, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

Process finished with exit code 1

Thanks for your time, may!
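
The TypeError above means that the value r passed to bs(r, "html.parser") was None, so the download step produced no markup to parse. Why the download returned None is not shown in this thread. As a minimal sketch of a defensive check, the hypothetical helper find_pages below returns an empty list when there is nothing to parse. It uses the standard-library ElementTree parser in place of BeautifulSoup so that the example has no third-party dependencies.

```python
import xml.etree.ElementTree as ET


def find_pages(markup):
    """Return <page> elements, or an empty list if the download produced nothing."""
    if not markup:
        # The request failed or returned an empty body; nothing to parse.
        return []
    return list(ET.fromstring(markup).iter("page"))
```

Returning an empty list lets the caller treat a failed download as "no articles this round" and decide separately whether to retry or stop.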


li10141110 commented on June 15, 2024

Hi,
I got the following after several days' work:
[1/4] Downloading wiki articles ...
['16648-.txt' '17514-.txt' '24591-.txt' '28182-.txt' '28356-.txt'
'28852-.txt' '36847-.txt' '36896-.txt' '7252-.txt']
[2/4] Performing named entity recognition ...
Invoked on Fri Jul 21 21:28:36 CST 2017 with arguments: -loadClassifier C:/Users/Jason Lee/Desktop/cnn-re-tf-master/stanford-ner-jason\classifiers\english.all.3class.distsim.crf.ser.gz -outputFormat tabbedEntities -textFile C:/Users/Jason Lee/Desktop/cnn-re-tf-master\data\orig\7252-.txt
loadClassifier=C:/Users/Jason Lee/Desktop/cnn-re-tf-master/stanford-ner-jason\classifiers\english.all.3class.distsim.crf.ser.gz
textFile=C:/Users/Jason Lee/Desktop/cnn-re-tf-master\data\orig\7252-.txt
outputFormat=tabbedEntities
Loading classifier from C:/Users/Jason Lee/Desktop/cnn-re-tf-master/stanford-ner-jason\classifiers\english.all.3class.distsim.crf.ser.gz ... done [3.9 sec].
CRFClassifier tagged 4376 words in 177 documents at 4635.59 words per second.
[3/4] Linking entities ...
[4/4] Linking predicates ...
Traceback (most recent call last):
  File "C:/Users/Jason Lee/Desktop/cnn-re-tf-master/distant_supervision.py", line 718, in <module>
    main()
  File "C:/Users/Jason Lee/Desktop/cnn-re-tf-master/distant_supervision.py", line 706, in main
    positive_examples()
  File "C:/Users/Jason Lee/Desktop/cnn-re-tf-master/distant_supervision.py", line 477, in positive_examples
    ret = loop(step, doc_id, limit, entities, relations, counter)
  File "C:/Users/Jason Lee/Desktop/cnn-re-tf-master/distant_supervision.py", line 406, in loop
    relations[(subj, obj)] = search_property(arg1, arg2)
  File "C:/Users/Jason Lee/Desktop/cnn-re-tf-master/distant_supervision.py", line 299, in search_property
    response = r.json()
AttributeError: 'NoneType' object has no attribute 'json'

It seems that I have a problem in the fourth step. It would be helpful if you could give me some suggestions.
Thank you for your time!
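
The AttributeError above means that r was None at the point where search_property called r.json(), so no response object was available. The body of search_property is not shown in this thread, so the following is only a sketch of one common pattern: retry the request a few times and check for None before decoding. The names call_with_retry, search_property_safe, and request_fn are hypothetical; request_fn stands for any callable that returns a response object with a json() method, or None on failure.

```python
import time


def call_with_retry(request_fn, retries=3, delay=1.0):
    """Call request_fn until it returns something other than None.

    Returns None if every attempt fails.
    """
    for attempt in range(retries):
        result = request_fn()
        if result is not None:
            return result
        if attempt < retries - 1:
            time.sleep(delay)  # wait before the next attempt
    return None


def search_property_safe(request_fn):
    """Return the decoded JSON body, or None when no response was obtained."""
    response = call_with_retry(request_fn, retries=3, delay=0.0)
    if response is None:
        return None
    return response.json()
```

With this shape, the caller can skip an entity pair when the lookup returns None instead of crashing the whole run in the last step.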


li10141110 commented on June 15, 2024

Hi,
Everything is OK now, thank you for your attention! I have got my own articles.
Best wishes!


may- commented on June 15, 2024

Sorry, I had no time to look into your problem, but it seems you found a solution.

