contentmine / pycproject
Provides basic functions to read a ContentMine CProject and CTrees into Python data structures.
License: MIT License
A function for looking at neighbouring terms would be great, for example to find the term "patient" within three words of the term "anxiety" ("patient adj3 anxiety"). Another function could check whether two terms occur in the same sentence or paragraph.
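A minimal sketch of what such a proximity check might look like, using only the standard library. The function names (`tokenize`, `adjacent`, `same_sentence`) are hypothetical and not part of pycproject. The distance convention is an assumption: two terms match when their word positions differ by at most `max_distance`. The sentence split on `.`, `!` and `?` is deliberately naive.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def adjacent(text, term_a, term_b, max_distance=3):
    """Return True if the two terms occur within max_distance words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    # Order does not matter: either term may come first.
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


def same_sentence(text, term_a, term_b):
    """Return True if both terms occur together in at least one sentence."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = set(tokenize(sentence))
        if term_a.lower() in tokens and term_b.lower() in tokens:
            return True
    return False
```

A paragraph-level check could follow the same pattern by splitting on blank lines instead of sentence punctuation.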
After the update (6a4a8a4), I cannot upgrade pycproject via pip.
This is what happened:
First I executed pip install pycproject --upgrade
This did not work, because:
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
*********************************************************************************
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-i4z7rgdq/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-0bdxv72b-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-i4z7rgdq/lxml/
So I tried to upgrade libxml2, first with pip (which did not work), then with apt-get (Ubuntu 16.04).
When I then tried to upgrade pycproject again, a different error was thrown:
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Isrc/lxml/includes -I/usr/include/python3.5m -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-3.5/src/lxml/lxml.etree.o -w
In file included from src/lxml/lxml.etree.c:320:0:
src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.
Compile failed: command 'x86_64-linux-gnu-gcc' failed with exit status 1
creating tmp
cc -I/usr/include/libxml2 -c /tmp/xmlXPathInit40vcw29a.c -o tmp/xmlXPathInit40vcw29a.o
/tmp/xmlXPathInit40vcw29a.c:1:26: fatal error: libxml/xpath.h: No such file or directory
compilation terminated.
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
*********************************************************************************
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Failed building wheel for lxml
Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml, pycproject
Found existing installation: lxml 3.5.0
Uninstalling lxml-3.5.0:
Exception:
Traceback (most recent call last):
File "/usr/lib/python3.5/shutil.py", line 538, in move
os.rename(src, real_dst)
PermissionError: [Errno 13] Permission denied: '/usr/lib/python3/dist-packages/lxml' -> '/tmp/pip-19mus2bz-uninstall/usr/lib/python3/dist-packages/lxml'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/req/req_set.py", line 778, in install
requirement.uninstall(auto_confirm=True)
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/req/req_install.py", line 754, in uninstall
paths_to_remove.remove(auto_confirm)
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/req/req_uninstall.py", line 115, in remove
renames(path, new_path)
File "/home/cheeseman/.local/lib/python3.5/site-packages/pip/utils/__init__.py", line 267, in renames
shutil.move(old, new)
File "/usr/lib/python3.5/shutil.py", line 550, in move
rmtree(src)
File "/usr/lib/python3.5/shutil.py", line 474, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib/python3.5/shutil.py", line 432, in _rmtree_safe_fd
onerror(os.unlink, fullname, sys.exc_info())
File "/usr/lib/python3.5/shutil.py", line 430, in _rmtree_safe_fd
os.unlink(name, dir_fd=topfd)
PermissionError: [Errno 13] Permission denied: 'cssselect.py'
Make pyCproject a Python module. I don't know what needs to be done for this, but I think it is worth aiming for.
ipython files from former workshops.
Just an idea as I'm starting to play with pycproject...
I have a mix of open-access and closed articles, all of which have metadata, but only some of which have scholarly.html. This could arise in other situations as well, for example if some issue prevented scholarly.html from being generated for some files, or if I haven't yet downloaded the articles.
In this situation, if I have code that uses get_title, get_abstract etc., I would expect it to get data from scholarly.html if available, and otherwise get what it can from the metadata.
This way I don't have to write two different code paths for open and closed articles, and code that only uses information available in the metadata works before downloading the articles.
Does this make sense? Or is the metadata structure so repository-specific that it makes no sense to try to get information reliably from it?
Cheers
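The fallback described above could be sketched roughly as follows. This is a standalone function, not pycproject's actual `get_title`. The file names `scholarly.html` and `eupmc_result.json` and the `title` metadata key are assumptions about the CTree layout, and other repositories may use different names and structures. The regular expression is a crude stand-in for a real HTML parser.

```python
import json
import os
import re


def get_title(ctree_dir):
    """Return the title from scholarly.html if present, else from the metadata, else None."""
    html_path = os.path.join(ctree_dir, "scholarly.html")  # assumed file name
    if os.path.exists(html_path):
        with open(html_path, encoding="utf-8") as handle:
            match = re.search(r"<title[^>]*>(.*?)</title>", handle.read(),
                              re.IGNORECASE | re.DOTALL)
        if match and match.group(1).strip():
            return match.group(1).strip()

    # Fall back to the metadata when the full text is missing or has no title.
    meta_path = os.path.join(ctree_dir, "eupmc_result.json")  # assumed file name
    if os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as handle:
            metadata = json.load(handle)
        title = metadata.get("title")  # assumed key; may be a list or a string
        if isinstance(title, list):
            title = title[0] if title else None
        return title
    return None
```

With this shape, code that only needs the title follows one path for open and closed articles, and returns `None` when neither source is available.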
Add a testing workflow to the code.
Update the Python code to follow style and documentation conventions.
https://www.python.org/dev/peps/pep-0257/
https://www.python.org/dev/peps/pep-0008/#documentation-strings
https://github.com/OKFNat/armsScraper/blob/master/code/eu-arms.py
where it runs on.
when it starts/stops.
to make the extraction reproducible.
Add a plot showing the frequency distribution of degree centrality in the network.
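As a rough sketch of the data such a plot would need, the following computes degree centrality (degree divided by n - 1) from an undirected edge list and counts how many nodes share each value. The function names and the edge-list input are assumptions for illustration. They are not taken from `pycproject.factnet`.

```python
from collections import Counter


def degree_centrality(edges):
    """Map each node to degree / (n - 1), treating every edge as undirected."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def centrality_frequency(edges, digits=2):
    """Count how many nodes share each (rounded) degree centrality value."""
    values = degree_centrality(edges).values()
    return Counter(round(value, digits) for value in values)
```

The resulting counter could then be drawn as a bar chart, for example with `plt.bar(list(freq.keys()), list(freq.values()))` from matplotlib.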
When using the Python wrapper, I have had this problem for a few weeks. It happens in Jupyter as well as in plain python3. Maybe we should pin the versions of the packages we depend on, so that changes in the pip modules we rely on cannot break the code.
`import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
from pycproject.readctree import CProject
from pycproject.factnet import *
import os
from collections import Counter
%matplotlib inline
AttributeError Traceback (most recent call last)
in ()
2 from pandas import Series, DataFrame
3 import matplotlib.pyplot as plt
----> 4 from pycproject.readctree import CProject
5 from pycproject.factnet import *
6 import os
/home/cheeseman/.local/lib/python3.5/site-packages/pycproject/__init__.py in ()
----> 1 from . import readctree
/home/cheeseman/.local/lib/python3.5/site-packages/pycproject/readctree.py in ()
17
18 # import data handling
---> 19 from bs4 import BeautifulSoup
20
21
/usr/lib/python3/dist-packages/bs4/__init__.py in ()
28 import warnings
29
---> 30 from .builder import builder_registry, ParserRejectedMarkup
31 from .dammit import UnicodeDammit
32 from .element import (
/usr/lib/python3/dist-packages/bs4/builder/__init__.py in ()
312 register_treebuilders_from(_htmlparser)
313 try:
--> 314 from . import _html5lib
315 register_treebuilders_from(_html5lib)
316 except ImportError:
/usr/lib/python3/dist-packages/bs4/builder/_html5lib.py in ()
68
69
---> 70 class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
71
72 def __init__(self, soup, namespaceHTMLElements):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'
`
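One way to notice this kind of drift early is to compare the installed versions against a set of pinned versions. The sketch below is only an illustration of that idea. `check_pins` is a hypothetical helper, and any version numbers passed to it are placeholders, not known-good versions for pycproject. It relies on `importlib.metadata`, which needs Python 3.8 or newer, so it would not run on the Python 3.5 setup shown in the traceback.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that deviates from its pin."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

An empty result means every pinned package is installed at the expected version. The pins themselves would normally also be declared in the package's install requirements so that pip enforces them.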