mannlabs / ckg

Clinical Knowledge Graph (CKG) is a platform with a twofold objective: 1) build a graph database with experimental data and data imported from diverse biomedical databases, and 2) automate knowledge discovery by making use of all the information contained in the graph.

License: MIT License

Dockerfile 0.01% R 0.01% Python 0.88% Shell 0.01% HTML 0.01% CSS 0.01% Jupyter Notebook 99.10%

ckg's People

metabdel hhefzi scarltee vemonet skin-science pwforks dotaartist reloadbrain enryh guaponi alenegro81 ikwattro xiaohuiz animesh sailor723 ammarcsj linshangyu yshin1209 sam-mix anuragraj yotofu nlp-kg sushantgautam juvesjostar pri-cph nkstampe keshava ksachikonye chunthebear flrntdfr wilfoderek igg-bioinfo xiong3134 pj0616 mihai-sysbio natnaelt sailfish009 chalabcom pmdhenriques feigeliudan01 njeanray caoool xyc1207 aclowes sherkinglee tta77 moqingxinai liuz-bio liyuan-bioinfo fuzzylife zerodesigner liu-maomao ssrisunt rajeshksoni11 kdpan wuxuehong214 m168168 nike-adidas pq7799 ramosvacca mars-wei alexgarciac parthosen williamgjn qingshanxiushui nilskre victorgoitea duongvtt96 drei-e3 drychkov multiomics-analytics-group leilaoutem nephantes tjohnson-somalogic gaybro8777 viascientific bmedi wenliangz huangtao36 harel-coffee nnalpas vincentwei2021 hushell zarathustraed deepbiolabs daichengxin baiyuqi wtbxsjy faiqamehboobawan wbing520 hanshuqing99 keshavaspanda fanlushuai

ckg's Issues

"Out of memory" crash and a little question

Hi, I ran into a few problems while exploring your knowledge graph, as listed below:

First, after finishing the build step (for the Docker container), I executed some Cypher queries to test the knowledge graph.
However, after running the query "match data=(n{name:"cancer"})-[*1..2]->(m) return data", Neo4j crashed with an "out of memory" error.
Even after I recreated the Docker container with "docker run -d --name ckgapp -p 7474:7474 -p 7687:7687 -p 8090:8090 -p 8050:8050 --env=NEO4J_dbms_memory_pagecache_size=20G --env=NEO4J_dbms_memory_heap_initial__size=10G --env=NEO4J_dbms_memory_heap_max__size=20G docker-ckg:latest", the problem remained.
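Variable-length patterns such as -[*1..2]->, starting from a node matched only by its name, can expand to a very large number of paths on a graph of CKG's size. A minimal sketch, assuming the official neo4j Python driver (4.x), that passes the name as a query parameter and caps the number of returned paths with LIMIT. The helper names and the default limit of 100 are hypothetical, and a cap shrinks the result but is not guaranteed on its own to prevent memory pressure.

```python
from typing import Any, List


def build_neighbourhood_query(max_hops: int = 2, limit: int = 100) -> str:
    """Return a Cypher query for paths of 1..max_hops hops from a named node.

    Hypothetical helper: the node is matched on its `name` property (as in
    the query from this issue) and at most `limit` paths are returned.
    """
    if max_hops < 1 or limit < 1:
        raise ValueError("max_hops and limit must be positive integers")
    # Variable-length bounds cannot be Cypher parameters, so they are
    # validated above and interpolated; the node name is passed as $name.
    return (
        f"MATCH path = (n {{name: $name}})-[*1..{max_hops}]->(m) "
        f"RETURN path LIMIT {limit}"
    )


def fetch_neighbourhood(driver, name: str, max_hops: int = 2,
                        limit: int = 100) -> List[Any]:
    """Run the bounded query and collect the paths while the session is open."""
    query = build_neighbourhood_query(max_hops, limit)
    with driver.session() as session:
        return [record["path"] for record in session.run(query, name=name)]


# Example (requires a running Neo4j instance; credentials are placeholders):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# paths = fetch_neighbourhood(driver, "cancer", max_hops=2, limit=50)
```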

Second, I am curious whether your graph contains any nodes or relationships with time information, since I got nothing from the property key "timepoint". Does such information exist in CKG?

I would be very grateful if you could help me with these questions, and I look forward to your reply.

Node and edge lists in TSV format

Currently, this graph is only available as a Neo4j dump, which makes it inaccessible to Neo4j newcomers like me.

Could it also be made available as two TSVs: one with the node list and the node attributes, and one with the edge list and the edge attributes?
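One possible layout for such an export, sketched under the assumption that nodes and relationships have already been read out of Neo4j (for example with the neo4j Python driver) into plain dictionaries. The write_tsv helper and the column names id, label, source, target and type are hypothetical, not an existing CKG export format.

```python
import csv
from typing import Iterable, List, Mapping


def write_tsv(path: str, rows: Iterable[Mapping[str, object]],
              fixed: List[str]) -> None:
    """Write dictionary rows to a tab-separated file with one shared header.

    The `fixed` columns come first; any other attribute keys found in the
    rows are appended in sorted order, and missing values are left empty.
    """
    rows = list(rows)
    extra = sorted({key for row in rows for key in row} - set(fixed))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fixed + extra,
                                delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)


# Illustrative usage with made-up records:
# nodes = [{"id": 1, "label": "Disease", "name": "cancer"}]
# edges = [{"source": 1, "target": 2, "type": "EXAMPLE_RELATIONSHIP"}]
# write_tsv("nodes.tsv", nodes, ["id", "label"])
# write_tsv("edges.tsv", edges, ["source", "target", "type"])
```

Collecting the union of keys keeps nodes with different attribute sets in a single rectangular table, at the cost of empty cells.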

I tried to convert the Neo4j dump myself, but due to my inexperience, I was unable even to load the dump into Neo4j.

Once the TSVs are available, I'd love to run several different analyses on this graph.

Thanks!

Docker Permission Issues

Describe the bug
Hi, I tried to explore the Docker installation a bit more. I noticed that clicking on the sample projects produces permission denied errors. Also, when trying to upload data to a self-created project, the options are grayed out.

To Reproduce
Steps to reproduce the behavior:

Go to Home -> Available Projects
Click on any of them (e.g. NON-ALCOHOLIC FATTY LIVER DISEASE)
The page reloads but nothing happens; the terminal shows permission denied (Screenshot 1)

Go to Home -> DATA UPLOAD
Type project identifier
All options are greyed out

Note that it correctly throws an error if the project name is not found.


Docker dependency missing

Docker build command fails unless the user has downloaded the jre-8u221-linux-x64.tar.gz file and placed it in the resources/ directory.

Add system requirements more prominently

Is your feature request related to a problem? Please describe.
I needed some time to find the information on how much disk space I actually need to get a running installation of the CKG.

Describe the solution you'd like
Could be added to the main page under Cloning and installing.

Installation requires >= 80 GB of disk space. See details here.
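A pre-flight check of the kind this note describes could be sketched with Python's standard shutil.disk_usage. The 80 GB threshold is the figure proposed above; the helper name is hypothetical, and it counts a GB as 1024**3 bytes.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte, using the binary convention


def has_enough_disk(path: str = ".", required_gb: float = 80) -> bool:
    """Return True if the filesystem holding `path` has `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / GB
    print(f"{free_gb:.1f} GB free; enough for CKG: {has_enough_disk()}")
```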

Additional context
I can create a pull-request, if you like.

Cannot import name 'get_current_traceback' from 'werkzeug.debug.tbtools'

Describe the bug

I tried to install CKG on Windows with Python. Everything installed without errors, and I could see that the data folder was created. However, whenever I run the CKG app, I get this error message: cannot import name 'get_current_traceback' from 'werkzeug.debug.tbtools'.
I tried installing Redis and different versions of Werkzeug, but neither solved the error.
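In Werkzeug 2.1 the debugger's traceback module was rewritten and get_current_traceback was removed from werkzeug.debug.tbtools, so any package that still imports it fails with this error under Werkzeug 2.1 or later. A minimal sketch for checking the installed version, assuming Python 3.8+ for importlib.metadata; the helper names are hypothetical, and whether pinning Werkzeug below 2.1 or upgrading the importing package suits CKG's requirements is an assumption to verify.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.0.3' into (2, 0, 3); suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_get_current_traceback(version: str) -> bool:
    """True if this Werkzeug version predates 2.1 and still ships the function."""
    return parse_version(version) < (2, 1)


if __name__ == "__main__":
    installed = metadata.version("Werkzeug")
    print(installed, has_get_current_traceback(installed))
```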



Desktop:

OS: Windows
Version: Windows 10 Education


Latest Neo4j dump referenced in Dockerfile unavailable

Love the concept, project, and tech. I'm trying to build this, but the ckg_190521_neo4j_4.2.3.dump file referenced in the Dockerfile is not available on the Mendeley site.

Using the latest available dump file, ckg_201020_neo4j_3.5.20.dump, forces a Neo4j downgrade to 3.5 and causes other issues when loading the Neo4j and Jupyter plugins, the commercial databases, and the CKG app.

Some guidance would be great, I'm excited about this project!

Docker image build errors

I had a build error earlier, and the solution was this:

A requirement to run the Docker is Java SE Runtime Environment. Please go to https://www.oracle.com/java/technologies/javase-jre8-downloads.html and download the jre-8u221-linux-x64.tar.gz file. Once downloaded, place it in CKG/resources/. (https://ckg.readthedocs.io/en/latest/intro/getting-started-with-docker.html)

However, the latest JRE is 8u251, so I updated the Dockerfile, and now I get this error:

Improve documentation of data dump

Is your feature request related to a problem? Please describe.

We're working on integrating CKG as a benchmarking data set in PyKEEN in this PR (pykeen/pykeen#213). We were able to automate download of version 2 of the data from the Mendeley/Elsevier page.

However, after decompression, we were not able to understand the contents of the downloads/ or reports/ folders. It appears the downloads/ folder contains zip archives with images. However, the structure and nomenclature within reports/ is not intuitive.

I guessed that the h5 files could be opened with Pandas, but I got some errors about the data formatting, so that might not be correct. Further, each folder has the same file structure, but there is no context as to what P0000001-P0000006 mean.

From the imports/ folder, we were able to extract 7,617,419 entity types, 11 relation types, and 26,691,525 triples. This is quite far from what's claimed in the CKG paper, so I assume some of the data is in the reports/ folder.

Describe the solution you'd like

Please add a README to each directory in the data dump that describes the structure of the data, the file types, and schemata for the data contained within. Code examples of how to load the data would be appreciated as well.

Thanks for providing this resource. We hope that inclusion in PyKEEN will enable more people to work with your dataset, and we would love to explain further if you're interested. Feel free to reach out with an issue on PyKEEN's repo, on Twitter (https://twitter.com/cthoyt), or by email.

Docker: Permission denied for mkdir

Building with Docker on Windows produced the following error:
ERROR [31/78] RUN service neo4j start && sleep 30 && service neo4j stop && cat /var/log/neo4j/neo 0.3s

[31/78] RUN service neo4j start && sleep 30 && service neo4j stop && cat /var/log/neo4j/neo4j.log:
#36 0.267 Active database: graph.db
#36 0.268 Directories in use:
#36 0.268 home: /var/lib/neo4j
#36 0.268 config: /etc/neo4j
#36 0.268 logs: /var/log/neo4j
#36 0.268 plugins: /var/lib/neo4j/plugins
#36 0.268 import: NOT SET
#36 0.268 data: /var/lib/neo4j/data
#36 0.268 certificates: /var/lib/neo4j/certificates
#36 0.268 run: /var/run/neo4j
#36 0.270 mkdir: cannot create directory ‘/var/run/neo4j\r’: Permission denied

executor failed running [/bin/sh -c service neo4j start && sleep 30 && service neo4j stop && cat /var/log/neo4j/neo4j.log]: exit code: 1
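The \r at the end of '/var/run/neo4j\r' suggests that a file copied into the image (a script or configuration file) has Windows CRLF line endings, which Git on Windows can introduce when core.autocrlf is enabled. That diagnosis is an assumption based on the log alone. A minimal sketch that rewrites a file with Unix LF line endings; the helper name is hypothetical, and which files need converting has to be checked in the repository.

```python
from pathlib import Path


def to_unix_line_endings(path: str) -> bool:
    """Replace CRLF with LF in place; return True if the file was changed."""
    file_path = Path(path)
    data = file_path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # already LF-only, nothing to write
    file_path.write_bytes(converted)
    return True
```

Cloning with git config --global core.autocrlf false, or adding a .gitattributes line such as "* text eol=lf", avoids the conversion in the first place.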

Docker build fails

Describe the bug
Docker build fails
Step 14/54 : ADD /resources/jre-8u221-linux-x64.tar.gz /usr/local/oracle-jre8-installer-local
ADD failed: stat /var/lib/docker/tmp/docker-builder253828483/resources/jre-8u221-linux-x64.tar.gz: no such file or directory

To Reproduce
run docker build -t mycontainer .

Issue with building CKG in Neo4j

Hello, when I try to build the Neo4j graph database on Windows, I keep getting this error:

(CKG_ENV) C:\Users\sitas\CKG\src\graphdb_builder\builder>python builder.py -b full -u neo4j

Traceback (most recent call last):
  File "builder.py", line 14, in <module>
    from graphdb_builder.builder import importer, loader
  File "C:\Users\sitas\CKG\src\graphdb_builder\builder\importer.py", line 20, in <module>
    from graphdb_builder.users import users_controller as uh
  File "C:\Users\sitas\CKG\src\graphdb_builder\users\users_controller.py", line 8, in <module>
    from passlib.hash import bcrypt
  File "C:\Users\sitas\CKG_ENV\lib\site-packages\passlib\hash.py", line 25, in <module>
    from passlib.registry import proxy
  File "C:\Users\sitas\CKG_ENV\lib\site-packages\passlib\registry.py", line 12, in <module>
    from passlib.ifc import PasswordHash
  File "C:\Users\sitas\CKG_ENV\lib\site-packages\passlib\ifc.py", line 10, in <module>
    from passlib.utils.decor import deprecated_method
  File "C:\Users\sitas\CKG_ENV\lib\site-packages\passlib\utils\__init__.py", line 845, in <module>
    from time import clock as timer
ImportError: cannot import name 'clock' from 'time' (unknown location)

Can anyone please help me? I am new to Neo4j and CKG.
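time.clock() was removed in Python 3.8, and the installed passlib release still imports it (as the traceback shows), so one way out is upgrading passlib to a release that supports Python 3.8; whether that fits CKG's pinned requirements is an assumption to check. Where your own code used time.clock() to time something, time.perf_counter() is the standard replacement. A minimal sketch with a hypothetical helper:

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    time.perf_counter() replaces the removed time.clock() for measuring
    elapsed intervals; only differences between two readings are meaningful.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```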

No permission to load database -- running as a different user?

Hi, I cannot start the database. The error asks whether Neo4j is running as a different user, but I don't know how to change the user.
Thank you.

Step 41/86 : RUN sudo -u neo4j neo4j-admin load --from=/var/lib/neo4j/data/backup/ckg_201020_neo4j_3.5.20.dump --database=graph.db --force
---> Running in 942101509cd9
command failed: you do not have permission to load a database -- is Neo4j running as a different user?
The command '/bin/sh -c sudo -u neo4j neo4j-admin load --from=/var/lib/neo4j/data/backup/ckg_201020_neo4j_3.5.20.dump --database=graph.db --force' returned a non-zero code: 1

CKG unauthorized login

Dear Developer,
thank you very much for your amazing work.
I built and ran CKG locally using Docker.

When I open the CKG app in my browser everything seems to work.

However, I cannot log in because I don't know which username and password to use.

I tried the Neo4j username and password, but the login was unauthorized.

Can you help me with this point?

Thank you very much

CKG unauthorized login

Hi,

First of all, congratulations on the release of the CKG paper. We have set up the Docker image of the CKG app in our lab, but we encounter an issue at the step where we create our personal account in the Admin tab of the CKG web app.

We filled in all the required fields for the new account, and a new user ID was created. However, when we try to log in, we use the new user ID as both ID and password, because I read that for the first login the ID and password are identical, but the app says that one or the other is invalid, which prevents us from logging in to our new account.

Can you help me with this point?

Thanks for your help.

Best regards

Cannot find the mapped_code column in the compound file from FooDB

Hi,

I couldn't find the column referenced in the code below (line 115), which reads mapped_code from column index 44 of "Compound.csv" from FooDB. I checked the file today, but it has only 16 columns. Am I missing something?

CKG/ckg/graphdb_builder/databases/parsers/foodbParser.py

Lines 114 to 117 in 6854c98

compound_id = row[0]
mapped_code = row[44]
if str(mapped_code) != 'nan':
    compounds[compound_id] = mapped_code
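Positional indexing such as row[44] breaks, with an IndexError or by silently reading the wrong field, as soon as the upstream file changes shape, which fits the 16-column Compound.csv described above. A minimal sketch using the standard csv module that selects columns by header name and fails with a readable message when one is missing. The column names id and mapped_code are assumptions taken from this issue, not a confirmed FooDB schema, and read_mapped_codes is a hypothetical helper.

```python
import csv
from typing import Dict


def read_mapped_codes(path: str, id_col: str = "id",
                      code_col: str = "mapped_code") -> Dict[str, str]:
    """Map compound id -> mapped code, selecting columns by header name."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in (id_col, code_col) if col not in header]
        if missing:
            raise KeyError(f"{path} lacks column(s) {missing}; found {header}")
        compounds = {}
        for row in reader:
            # Short rows yield None for absent fields, so normalise first.
            code = (row[code_col] or "").strip()
            # Skip empty values, mirroring the original `!= 'nan'` check.
            if code and code.lower() != "nan":
                compounds[row[id_col]] = code
    return compounds
```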

Project creation - Project Tissue/Project Disease - No results found

Describe the bug
When I try to create a new project and select a tissue and a disease, the drop-down lists contain no entries, so I cannot create projects.
To Reproduce
Steps to reproduce the behavior:

Go to Project creation/Project Tissue or Project Disease
Click on Drop down arrow
No results found

Expected behavior
I expect to see options to select from, since these two fields are mandatory. However, I only get the message "No results found".


Desktop (please complete the following information):

OS: Windows 10 Enterprise
Browser: Chrome
Version: 98.0.4758.102 (Official Build) (64-bit)

Additional context
I also see the same problem after installing the Neo4j database and the Python library on Ubuntu. Do I need to build the full CKG graph database with graphdb_builder before creating a new project?

Docker couldn't start ckgapp

I believe this is due to the NumPy version. Could you freeze the current working environment from a running instance?
Any idea/suggestion would be appreciated.

Failed to generate report after uploading data

Describe the bug
Using the example_files, I failed to generate a report after uploading data.

To Reproduce
Steps to reproduce the behavior:

Go to 'project creation'
Upload the example_files data
Click generate report
Nothing is shown
The ……/report directory in the container is empty

Expected behavior
A report


Desktop (please complete the following information):

OS: Ubuntu 20.04, Windows 10
Browser: Microsoft Edge

Graph builder killed without any error thrown

Hi, I ran into an issue when running:

python builder.py --build_type full --download True --user neo4j

The builder was killed without throwing any error:

100% [...................................................] 126835 / 126835Done Parsing database DrugBank
Downloading https://stringdb-static.org/download/protein.aliases.v11.0/9606.protein.aliases.v11.0.txt.gz /home/user/CKG/data/databases/STRING
100% [.............................................] 450734725 / 450734725Killed

I checked the graph builder log under CKG/logs, and it is empty.

Do you know what could cause this issue? Thanks!
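A bare "Killed" with an empty log is what the Linux out-of-memory killer typically leaves behind, and the STRING aliases file here is about 450 MB compressed. Whether memory was the cause is an assumption; the kernel log (e.g. dmesg) records OOM kills. A minimal sketch of reading such a gzip-compressed, tab-separated file one line at a time so memory use stays roughly constant; iter_tsv_gz is a hypothetical helper, not part of CKG's parsers.

```python
import gzip
from typing import Iterator, List


def iter_tsv_gz(path: str, skip_header: bool = True) -> Iterator[List[str]]:
    """Yield the tab-separated fields of a .gz text file, one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        if skip_header:
            next(handle, None)  # discard the header line if present
        for line in handle:
            yield line.rstrip("\n").split("\t")


# Illustrative usage: count rows without holding the file in memory.
# n_rows = sum(1 for _ in iter_tsv_gz("9606.protein.aliases.v11.0.txt.gz"))
```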

Neo4j and graph problems

I can see your node information at http://localhost:7474/browser/, but in Neo4j Desktop I can't see any node information; the database is completely empty.
At http://localhost:7474/browser/, after I start the node, the display becomes very messy and I can't use it normally. I would like to know the reason.
When I create a project, the number I use is not stored in the database correctly. However, I can find the number generated by the system at localhost:7474/browser/ and use it to search for the project.

Loading data using builder.py fails

Describe the bug
Loading data into the database fails when loading the latest version of DrugBank. It seems that the file names, and potentially the format of the input file, have changed.

2021-02-14 18:18:26,308 - database_controller - ERROR - Database DrugBank: (<class 'lxml.etree.XMLSyntaxError'>, XMLSyntaxError('Document is empty, line 1, column 1'), <traceback object at 0x19d61a3c0>), file: databases_controller.py,line: 205

To Reproduce
Steps to reproduce the behavior:
Run builder.py with the standard command for a minimal or full build.

Expected behavior
No errors in the log.
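lxml's "Document is empty, line 1, column 1" means the parser received no XML content at all, which points at the downloaded file rather than at a changed schema. A minimal guard, sketched with the standard library's xml.etree.ElementTree instead of lxml, that reports a missing or empty file and confirms the file at least starts as XML before a full parse. xml_root_tag is a hypothetical helper, and a truncated file can still pass this check.

```python
import os
import xml.etree.ElementTree as ET


def xml_root_tag(path: str) -> str:
    """Return the root element's tag, or raise ValueError with a clear reason."""
    if not os.path.exists(path):
        raise ValueError(f"{path} does not exist; was the download skipped?")
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; re-download it before parsing")
    try:
        with open(path, "rb") as handle:
            # iterparse streams the file; stop at the first start event.
            for _event, element in ET.iterparse(handle, events=("start",)):
                return element.tag
    except ET.ParseError as err:
        raise ValueError(f"{path} is not well-formed XML: {err}") from err
    raise ValueError(f"{path} contains no XML elements")
```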

Dockerfile won't build as implemented

RUN wget -P /CKG https://data.mendeley.com/datasets/mrcf7f4tc2/1/files/c0d058a2-adfa-4b96-97d9-c9ec7fc5adb9/data.tar.gz?dl=1
RUN tar -xzf data.tar.gz

fails because the file is saved as /CKG/data.tar.gz?dl=1.

Changing them to:
RUN wget -O /CKG/data.tar.gz https://data.mendeley.com/datasets/mrcf7f4tc2/1/files/c0d058a2-adfa-4b96-97d9-c9ec7fc5adb9/data.tar.gz?dl=1
RUN tar -xzf /CKG/data.tar.gz

lets the build complete without errors, but I'm not sure how this interacts with the following lines:
ADD . /CKG
ENV PYTHONPATH "${PYTHONPATH}:/CKG/src"
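By default wget names the output after the last part of the URL, query string included, which is why the archive lands at /CKG/data.tar.gz?dl=1; -O sets the name explicitly. The filename tar expects is the URL path without the query, as this small sketch with urllib.parse shows (filename_from_url is a hypothetical helper):

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    return PurePosixPath(urlsplit(url).path).name
```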

Neo4j database is not online or page cannot be opened

Hi,
First of all, congratulations on CKG's excellent capabilities. I have successfully run CKG in Docker, but I have encountered a new problem: the Neo4j page always fails to open or shows the error in the figure.
Is this an installation problem? Is there any solution?
Thank you very much.

R package install error

During regular (non-docker) installation of CKG, R packages preprocessCore and impute cannot be installed.
Please find the error message attached.

MacBook Pro M1 chip - macOS Monterey version 12.2
R version 4.1.3

What I’ve tried so far:
brew install gcc (which contains gfortran): most sources online point to a gfortran problem for these specific error messages, but this made no difference; same error message.

Then we saw that the output says unable to access URL: 'https://bioconductor.org/packages/3.14/books/bin/macosx/big-sur-arm64/contrib/4.1/PACKAGES'
This URL is empty, so I tried to install the packages manually from the Bioconductor page, with no luck.

I also tried answering no to the question about installing from source, as suggested here (https://stackoverflow.com/questions/69639782/installing-gfortran-on-macbook-with-apple-m1-chip-for-use-in-r).

To Reproduce
All installation steps (https://ckg.readthedocs.io/en/latest/intro/getting-started-with-requirements.html) completed successfully up to this block:

install.packages('BiocManager')
BiocManager::install()
BiocManager::install(c('AnnotationDbi', 'GO.db', 'preprocessCore', 'impute'))
install.packages(c('devtools', 'tidyverse', 'flashClust', 'WGCNA', 'samr'),
dependencies=TRUE, repos='http://cran.rstudio.com/')
install.packages('IRkernel')

At this point I receive the attached error message. There are too many warnings in it to decide which one ultimately causes the issue, but I guess the main error is:

ERROR: compilation failed for package ‘preprocessCore’ and 'impute'

R_package_error.docx

Import error: can't import name 'Markup' from 'jinja2'

Hi @albsantosdel

After running docker run -d --name ckgapp -p 7474:7474 -p 7687:7687 -p 8090:8090 -p 8050:8050 docker-ckg:latest, I get this error:

cannot import name 'Markup' from 'jinja2' (/usr/local/lib/python3.7/site-packages/jinja2/__init__.py)

the docker logs

Windows Installation Issues

Describe the bug
Hi,
I followed the Windows installation guide and tried an install on a fresh Windows machine.
OS: Microsoft Windows Server 2019

The first errors were several SSL: CERTIFICATE_VERIFY_FAILED errors when installing the requirements via pip.
I stumbled across this thread and suspect it might be related to Python 3.6. I was able to get all packages installed by manually installing each package that raised the error and then re-running the requirements install, repeating until it succeeded (i.e. pip3 install --ignore-installed -r requirements.txt > Error > pip3 install error_package > pip3 install --ignore-installed -r requirements.txt > Error > pip3 install error_package > pip3 install --ignore-installed -r requirements.txt).

After the requirements install went through, I moved on to the celery installation. This fails because no matching distribution can be found. Is celery meant to be installed with pip or pip3? The documentation says pip, while all the other installs use pip3.

Installing a newer celery version (celery-4.4.7) works, but I wonder whether this causes problems at a later stage, as the documentation indicates.
Attached is the environment that is not compatible with the celery install.

I imagine that solving the environment/package issues might take a while (maybe even switching to a newer Python version), so it may be worth investigating the celery functionality issue instead. Any suggestions on that? Then I might be able to investigate.

env_export.txt

The ckg_config.yml file doesn't exist.

Describe the bug
The ckg_config.yml file doesn't exist in the directory.

To Reproduce
Steps to reproduce the behavior:

Run builder.py.
Error: No such file or directory: '../CKG-master/ckg/config/ckg_config.yml'


Docker build fail due to pip install

Describe the bug
Docker fails when installing the pip packages tables and tensorflow.

To Reproduce
Steps to reproduce the behavior:

Download the Oracle JRE into /resources (jre-8u251-linux-x64.tar.gz; version 221 is no longer available)
Run docker build (as described in your docs)
The build hangs on multiprocess==0.70.7; using multiprocess==0.70.9 fixes this
The build then fails on tables==3.5.2
If I upgrade to tables==3.6.1, tensorflow==1.15.1 fails instead
If I remove tensorflow, the whole install fails

I also needed to fix all the COPY and ADD commands that used absolute paths, which seem to point to the root of the machine, so that they use the current directory instead:

ADD /resources/jre-8u251-linux-x64.tar.gz /usr/local/oracle-jre8-installer-local
# becomes:
ADD ./resources/jre-8u251-linux-x64.tar.gz /usr/local/oracle-jre8-installer-local

I also downloaded the dump (6 GB) and the data (1.6 GB) beforehand to make sure the downloads cause no problems:

wget -O resources/ckg_080520.dump https://data.mendeley.com/datasets/mrcf7f4tc2/1/files/bf08667b-588f-4f40-b5fd-930f4e05368f/ckg_080520.dump?dl=1
wget -O resources/data.tar.gz https://data.mendeley.com/datasets/mrcf7f4tc2/1/files/c0d058a2-adfa-4b96-97d9-c9ec7fc5adb9/data.tar.gz?dl=1

Apart from that, the Dockerfile is unchanged, so I am surprised that the build completed for other users.

The updated Dockerfile I am using can be found at https://github.com/vemonet/CKG

Expected behavior
The build should install the pip packages that are published on PyPI.

Screenshots

Error (stuck for more than 10min for 1.4MB):

Collecting more-itertools==7.0.0
  Downloading more_itertools-7.0.0-py3-none-any.whl (53 kB)
Collecting multi-key-dict==2.0.3
  Downloading multi_key_dict-2.0.3.tar.gz (8.4 kB)
Collecting multidict==4.5.2
  Downloading multidict-4.5.2.tar.gz (105 kB)
Collecting multiprocess==0.70.7
  Downloading multiprocess-0.70.7.tar.gz (1.4 MB)

Error message for tables==3.5.2:

Collecting statsmodels==0.10.0
  Downloading statsmodels-0.10.0.tar.gz (14.0 MB)
Collecting tables==3.5.2
  Downloading tables-3.5.2.tar.gz (7.8 MB)
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-p7zeh6o_/tables/setup.py'"'"'; __file__='"'"'/tmp/pip-install-p7zeh6o_/tables/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-ifcdl756
         cwd: /tmp/pip-install-p7zeh6o_/tables/
    Complete output (12 lines):
    /tmp/H5closeqa6mg_ii.c: In function ‘main’:
    /tmp/H5closeqa6mg_ii.c:2:5: warning: implicit declaration of function ‘H5close’ [-Wimplicit-function-declaration]
        2 |     H5close();
          |     ^~~~~~~
    /usr/bin/ld: cannot find -lhdf5
    collect2: error: ld returned 1 exit status
    * Using Python 3.8.2 (default, Apr 27 2020, 15:53:34)
    * USE_PKGCONFIG: True
    .. ERROR:: Could not find a local HDF5 installation.
       You may need to explicitly state where your local HDF5 headers and
       library can be found by setting the ``HDF5_DIR`` environment
       variable or by using the ``--hdf5`` command-line option.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
The command '/bin/sh -c pip3 install --ignore-installed -r requirements.txt' returned a non-zero code: 1

After updating tables, tensorflow fails instead. The issue persists with the other tensorflow versions available on PyPI; pip complains that it should be TensorFlow 2:

Collecting statsmodels==0.10.0
  Downloading statsmodels-0.10.0.tar.gz (14.0 MB)
Collecting tables==3.6.1
  Downloading tables-3.6.1-cp38-cp38-manylinux1_x86_64.whl (4.3 MB)
ERROR: Could not find a version that satisfies the requirement tensorflow==1.15.1 (from -r requirements.txt (line 169)) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0)
ERROR: No matching distribution found for tensorflow==1.15.1 (from -r requirements.txt (line 169))
The command '/bin/sh -c pip3 install --ignore-installed -r requirements.txt' returned a non-zero code: 1

Desktop (please complete the following information):

OS: Ubuntu 18.04
Version: docker version:

Client: Docker Engine - Community
 Version:           19.03.11
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        42e35e61f3
 Built:             Mon Jun  1 09:12:22 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.11
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       42e35e61f3
  Built:            Mon Jun  1 09:10:54 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Additional context

I am really surprised that so many services are deployed and installed in a single Docker image.
It would be much easier and more stable to split those services into separate containers, run them with docker-compose, and make that the recommended way for anyone to install CKG.

Dealing with multiple diseases

I had a question about designating different diseases for a particular patient in the ClinicalData.xlsx file.

In the project creation process, I can select multiple diseases; in our case, these are diseases that are often linked or similar (like inflammatory bowel disease and Crohn's disease).

However, I am unsure how to input the disease for these patients in the ClinicalData.xlsx file. Should I only choose one disease? Can I designate both of them using a separate delimiter?

TypeError: argument of type 'NoneType' is not iterable

I want to generate a new report using my own data. The report for project P0000001 already exists, so I first tried to regenerate it with 'force = True', but an error occurred:
This problem has bothered me for a long time. I hope someone can help me solve it. Thanks!

Neo4j Enterprise

Thank you for this project. It is really useful. I wanted to ask how I could use the Enterprise version (I already have the appropriate license, but I have been unable to set up the Dockerfile properly).

Data Upload to Project

I have managed to create a project using the CKG interface, but I have a problem with the data upload step for the project with ID P0000007. I encounter this error in the local CKG interface:
Error: No data was uploaded for project: P0000007. Review your experimental design and data files.

I am trying to upload the following files (the txt files are the outputs from MaxQuant):
ExperimentalDesign_P0000007.xlsx, ClinicalData_P0000007.xlsx, protein_groups.txt, peptides.txt, and OxidationSites.txt

I am unsure whether my formatting of the xlsx files is correct:

ExperimentalDesign

ClinicalData

My MaxQuant file has headers of the following form, where 1_M1 is the first technical replicate of sample M1. So this file contains 2 samples, each with 3 replicates, for a total of 6 measurements:
Intensity 1_M1
Intensity 2_M1
Intensity 3_M1
Intensity 1_M2
Intensity 2_M2
Intensity 3_M2
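Under the naming scheme described here (Intensity <replicate>_<sample>), each header splits into a technical-replicate number and a sample id. A minimal sketch with a regular expression; the pattern is an assumption based only on these six headers, not on CKG's MaxQuant parser, and replicates_by_sample is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed header layout: "Intensity <replicate>_<sample>", e.g. "Intensity 2_M1".
HEADER = re.compile(r"^Intensity (\d+)_(.+)$")


def replicates_by_sample(headers: List[str]) -> Dict[str, List[int]]:
    """Group technical-replicate numbers under their sample id."""
    groups = defaultdict(list)
    for header in headers:
        match = HEADER.match(header)
        if match:  # ignore columns such as a bare "Intensity"
            groups[match.group(2)].append(int(match.group(1)))
    return dict(groups)
```

Because only the leading digits before the first underscore are taken as the replicate, sample ids that themselves contain underscores (such as KO_plate1) stay intact.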

My question is: how should the ExperimentalDesign and ClinicalData spreadsheets be set up? Should each replicate get its own row?

To further illustrate this point: in this table, also provided in the documentation, I don't understand how the 2nd and 3rd rows are generated, since KO2 becomes KO (and likewise KO3 -> KO). And how should an experiment with these 5 MaxQuant measurement headers be represented in the ExperimentalDesign and ClinicalData spreadsheets?

Technical replicate	Analytical sample id	Timepoint	Result
1	KO_plate1		1_KO_plate1
1	KO2_plate1	0	1_KO_plate1_0
1	KO3_plate1	30	1_KO_plate1_30
1	KO4_plate2		1_KO4_plate2
2	KO4_plate2		2_KO4_plate2

Thank you in advance.

Switch to OpenJDK instead of proprietary Oracle JDK

Is your feature request related to a problem? Please describe.
Anyone wanting to start CKG needs to go to Oracle and download their proprietary Java installation, which is not properly versioned (e.g. I cannot install exactly the same version as you).

Describe the solution you'd like
Remove the Oracle JDK dependency and use OpenJDK.
If possible, I would like CKG to use OpenJDK.

This would allow you to properly build and push the Docker image, so anyone who wants to run CKG could do it in one click instead of fiddling with the Oracle JDK download and the Dockerfile.


Additional context
It would make it much easier for us to contribute to and redistribute this awesome project!

Failed to receive keys from the keyserver [16/78]

Describe the bug
Failed to receive keys from the keyserver [16/78]

=> ERROR [16/78] RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9 0.9s

[16/78] RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9:
#19 0.256 Warning: apt-key output should not be parsed (stdout is not a terminal)
#19 0.268 Executing: /tmp/apt-key-gpghome.yAvXbc7nHt/gpg.1.sh --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9
#19 0.864 gpg: CRC error; 6392DB - 17268F
#19 0.864 gpg: read_block: read error: Invalid keyring
#19 0.864 gpg: Total number processed: 0

Desktop (please complete the following information):

OS: Windows 10
WSL: Ubuntu 20.04

Database install question

Hello, I was installing CKG according to the documentation provided here: https://ckg.readthedocs.io/en/latest/intro/getting-started-with-build.html

One of the images shows the CKG directory structure containing many different databases.

However, after following the instructions and running python builder.py -b full -u neo4j 2> test.log, it seems that most of the databases were not installed. In my CKG directory I see only the following:

ls CKG/data/databases/

   Jensenlab/    
   PhosphoSitePlus/ 
   STITCH/    
   STRING/      
   UniProt/    
   disgenet/

Another thing I noticed is that, although I left it running overnight, it exited after about 20 minutes. Attached is the builder.py log file mentioned above:
test.log

Thank you; I appreciate your response. I also have another question: will the documentation for the Advanced Features be updated here?
https://ckg.readthedocs.io/en/latest/advanced_features/import-statistics.html

A 500 error occurs when clicking "import"

create_user_from_command_line not returning correct results

When creating a user from the command line:

python ckg/graphdb_builder/builder/create_user.py -u test -d test -n test -p 1 -a self
A user node is indeed created in the graph; however, the logging messages show this:

Creating user in the database
New user node created: test2. Result: <neo4j.work.result.Result object at 0x7f9fef3184d0>
Done

The function stack is as follows: create_user_from_command_line->create_user->create_user_node->create_user_from_dict->commitQuery

After adding a print statement in commitQuery, one can see that the result contains the output of the query:

def commitQuery(driver, query, parameters={}):
    # result = None
    try:
        with driver.session() as session:
            result = session.run(query, parameters)
            print('Peek Result', result.peek())
            return result

Creating user in the database
Peek Result <Record User_nodes=1>

However, the records in the result object never make it back to the original call:

    for q in query.split(';')[-2:-1]:
        try:
            result = connector.commitQuery(driver, q+';', parameters=data)
            print('Peek Result', result.peek())
            logger.info("New user node created: {}. Result: {}".format(data['username'], result))
            print("New user node created: {}. Result: {}".format(data['username'], result.peek()))

This results in the final log output like this:

Creating user in the database
Peek Result <Record User_nodes=1>
Peek Result None
New user node created: test. Result: None
Done

Expected behavior
If the result were returned correctly, the second 'Peek Result' log message would also be <Record User_nodes=1>, and the 'New user node created: ' message would not report None.
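A likely explanation, assuming the neo4j Python driver 4.x: commitQuery returns the Result from inside the `with driver.session()` block, and closing the session consumes any records that have not been read yet, so peek() returns None in the caller. A minimal sketch of a workaround is to materialise the records with Result.data() while the session is still open. The helper name commit_query_records is hypothetical and not part of CKG.

```python
def commit_query_records(driver, query, parameters=None):
    """Run a Cypher query and return its records as a list of dicts.

    Hypothetical helper, not CKG's connector.commitQuery. Records are
    materialised with Result.data() before the session closes, because
    closing the session consumes any records that were not yet read.
    """
    # Avoid a mutable default argument (the original uses parameters={}).
    parameters = parameters or {}
    with driver.session() as session:
        result = session.run(query, parameters)
        # data() reads every record now, while the session is still open.
        return result.data()
```

For the user-creation query above, the caller would then be expected to receive a plain list such as [{'User_nodes': 1}] rather than an exhausted Result object.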

Permission denied: '/CKG/data/reports/

Hi! I was testing the CKG app using a Docker container.
After uploading the project files and experimental design, there is an issue creating the project reports: the app keeps running for hours.
I also tried running it in the Jupyter notebook (recipes\Access Project Report.ipynb).
The following error occurs in step 3.

OS: Windows 10
Browser: Chrome

Thanks

[docker build] what to expect - neo4j browser?

I'm attempting to set up CKG through building the docker container. The image seems to have been built successfully, and I seem to have launched a running container.

"Access JupyterHub" (:8090) gets the normal Jupyter notebook interface
The CKG app (:8050) is able to draw graphs in the "Database Schema" section.

However, in the step "Access Neo4j browser" (:7474), I get a blank webpage, even after 30 minutes.

Should I see the Neo4j browser as in older versions (see screenshot)?

Perhaps it could be useful to add screen captures to the "getting started" page :)
https://ckg.readthedocs.io/en/latest/intro/getting-started-with-docker.html
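One way to narrow this down is to check, from the host, which of the published ports answer and with which HTTP status code. The sketch below uses only the Python standard library and is not part of CKG; the URLs assume the default port mappings from the docker run command in the documentation (7474 Neo4j browser, 8090 JupyterHub, 8050 CKG app). Port 7687 is left out because it serves the Bolt protocol, not HTTP.

```python
import urllib.error
import urllib.request

# Assumed default port mappings from the CKG docker run command.
# Port 7687 (Bolt) is omitted because it does not speak HTTP.
CKG_HTTP_SERVICES = {
    "Neo4j browser": "http://localhost:7474/",
    "JupyterHub": "http://localhost:8090/",
    "CKG app": "http://localhost:8050/",
}


def http_status(url, timeout=5.0):
    """Return the HTTP status code served at url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or similar: no HTTP answer at all.
        return None


if __name__ == "__main__":
    for name, url in CKG_HTTP_SERVICES.items():
        print(f"{name:<14} {url:<24} {http_status(url)}")
```

None means nothing answered on that port; a number is the status code the service actually returned (500 corresponds to an "Internal Server Error" page).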

Samples not being processed

I am working with a dataset derived from 2 groups totaling 52 samples (36 and 16, respectively); however, in the proteomics analysis results, only 27 samples are processed.

I imagine that the other samples don't meet the default 30% protein cutoff or some other cutoff value, but it is impossible for me to investigate because the samples have been renamed from the ClinicalData and ExperimentalDesign values to identifiers along the lines of AS148 AS97 ...

Is there a reason my samples are being renamed? Is there a dictionary or key that will let me identify which samples are being processed and which are not?
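A minimal sketch of the kind of check described above, under two loud assumptions: that the 30% cutoff means a sample needs at least 30% of proteins quantified (CKG's actual filter may well work per protein instead), and that a mapping from internal identifiers such as AS148 back to the original sample names is available as a plain dict. The function name and inputs are hypothetical, not CKG APIs.

```python
import math


def low_coverage_samples(intensities, id_to_name, min_fraction=0.3):
    """Flag samples whose fraction of quantified proteins is below min_fraction.

    intensities: dict of sample id -> list of protein intensities, where None
                 or NaN marks a missing value.
    id_to_name:  dict of sample id (e.g. "AS148") -> original sample name.
                 Hypothetical: how to obtain it from CKG is not shown here.
    Returns a sorted list of (name, fraction) tuples.
    """
    flagged = []
    for sample_id, values in intensities.items():
        quantified = sum(
            1 for v in values if v is not None and not math.isnan(v)
        )
        fraction = quantified / len(values) if values else 0.0
        if fraction < min_fraction:
            # Fall back to the internal id if no original name is known.
            flagged.append((id_to_name.get(sample_id, sample_id), fraction))
    return sorted(flagged)
```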

Thank you again.

[docker build] No Drug node information in the Neo4j database

Hello, I built CKG via the Docker container, but there are no nodes or relationships for Drug in the Neo4j database, even though I have put the DrugBank data file in the data folder.

Graph algorithms unavailable for neo4j 4.0

Hi,

I am trying to install CKG on a MacBook. It seems that currently one can only download Neo4j 4.0 or 3.5.18+, where graph algorithms seem to be unavailable.

When I try to build from the terminal with python builder.py -b full -u neo4j, it returns:
Traceback (most recent call last):
  File "builder.py", line 14, in <module>
    from graphdb_builder.builder import importer, loader
ModuleNotFoundError: No module named 'graphdb_builder'

Is there a way around it? Thanks!
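The traceback suggests that the directory containing the graphdb_builder package is not on Python's import path when builder.py is run directly. Setting PYTHONPATH to that directory is one option; the sketch below is another, which locates it by walking up from a starting directory and prepends it to sys.path. The helper names are hypothetical, and it is assumed (not confirmed here) that graphdb_builder sits somewhere above the builder script in the checkout.

```python
import sys
from pathlib import Path


def find_package_root(start, package="graphdb_builder"):
    """Return the first directory at or above `start` that contains `package`.

    Returns None if no such directory exists.
    """
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / package).is_dir():
            return directory
    return None


def add_package_root_to_path(start, package="graphdb_builder"):
    """Prepend the package root to sys.path so `import graphdb_builder` works."""
    root = find_package_root(start, package)
    if root is None:
        raise ModuleNotFoundError(
            f"could not find a '{package}' directory above {start}"
        )
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Called as add_package_root_to_path(Path(__file__).parent) before the failing import, this would make `from graphdb_builder.builder import ...` resolvable without changing the working directory.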

CKG Docker Container

Hi Alberto,

I was trying to install CKG Docker Container (https://ckg.readthedocs.io/en/latest/intro/getting-started-with-docker.html):

I finished the following steps:

However, I got stuck at "Access CKG app: http://localhost:8050/", which showed "Internal Server Error".

Could you please help me out with this issue? Thank you very much.

Best regards,
Xiang

Internal server error when logging in http://localhost:8050/

Describe the bug
Hi, Professor @albsantosdel,

Congratulations on the excellent work on CKG. We used Docker to install CKG and successfully accessed Neo4j and JupyterHub. However, we are not able to access the CKG app via "http://localhost:8050/": an "Internal server error" message is reported.

Could you please suggest some solutions for logging in to CKG?

Thanks for your help.

Best regards

An He

Missing drug-protein relations

Describe the bug
It seems some drug-protein relations are missing after parsing. One example is Baricitinib: when querying the database, no protein interactions are found for it.

To Reproduce
From the STITCH website, we can find protein interactions for Baricitinib, e.g. JAK1/2. However, such relations could not be found in CKG. When I examined the file stitch_drug_acts_on_protein.tsv, I could not find Baricitinib either. I am trying to understand how the parsing works, since the chemical/drug IDs from STITCH look like PubChem CIDs, while CKG uses DrugBank IDs. I couldn't find the code that unifies those IDs; if you can point me to it, that would be helpful too.
Thanks!

Expected behavior
We are expecting the relations existing in STITCH also exist in CKG.

Screenshots
N/A

Desktop (please complete the following information):

OS: Ubuntu
Browser: Chrome
Version: 20

Additional context
N/A
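A minimal sketch of the kind of ID unification asked about above, under stated assumptions: STITCH chemical identifiers have the form CIDm or CIDs followed by a zero-padded PubChem CID, and a CID-to-DrugBank mapping is available as a plain dict (where CKG actually builds such a mapping is not shown in this thread). Relations whose CID has no DrugBank entry are dropped, which is one possible way, not a confirmed one, for a drug such as Baricitinib to go missing. The function names are hypothetical.

```python
import re

# STITCH chemical ids: "CIDm" (stereoisomers merged) or "CIDs"
# (stereo-specific) followed by a zero-padded PubChem CID.
STITCH_ID = re.compile(r"^CID[ms](\d+)$")


def pubchem_cid(stitch_id):
    """Return the PubChem CID encoded in a STITCH chemical id, or None."""
    match = STITCH_ID.match(stitch_id)
    return int(match.group(1)) if match else None


def map_to_drugbank(stitch_ids, cid_to_drugbank):
    """Split STITCH ids into (mapped, unmapped).

    mapped:   dict of STITCH id -> DrugBank id
    unmapped: sorted list of STITCH ids with no DrugBank mapping; any
              relation using these ids would be lost after the join.
    """
    mapped, unmapped = {}, []
    for stitch_id in stitch_ids:
        drugbank_id = cid_to_drugbank.get(pubchem_cid(stitch_id))
        if drugbank_id is None:
            unmapped.append(stitch_id)
        else:
            mapped[stitch_id] = drugbank_id
    return mapped, sorted(unmapped)
```

Listing the unmapped ids for a drug of interest would show whether the relation is lost at the ID-mapping step or earlier.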

[docker build] Docker image size of 64 GB instead of 150 GB

Hi,

I was installing CKG on a Windows 10 platform, following the tutorial, with the following commands:

$ cd CKG/
$ docker build -t docker-ckg:latest .

I think everything went well (see the attached log file from the Windows prompt), but the final image in Docker Desktop is approximately 64 GB instead of the 150 GB mentioned in the tutorial, so I was wondering whether this is normal.

CKG_container_build_windprompt.txt

Note that I have since run the image from Docker Desktop, opened it in the Google Chrome web browser, and succeeded in connecting to JupyterHub, the Neo4j browser, and the CKG app. So I guess the connection is OK, but what about the size of the image?

Thank you for your feedback.

Best,

Make code pip-installable

A lot of the documentation is about how to configure the code to be run. Making the code installable with pip would greatly reduce complexity and let Python users use your code directly.

Would you be willing to accept a PR for this? It would involve moving the repository to the typical src/ layout, with a folder src/ckg/ containing all of the Python code. Then, setup.cfg and setup.py files could be added so that you could do the following:

git clone https://github.com/MannLabs/CKG.git
cd CKG
pip install .

This would also move the requirements.txt information into setup.cfg. Ultimately, it would also simplify https://ckg.readthedocs.io/en/latest/intro/getting-started-with-build.html, since some of these steps could be done with Python module execution.

Minimal update error

I am using the Docker container installation.
While running the minimal update, I get the following error message in the Docker Desktop log:

[2021-09-07 07:13:45,753: WARNING/ForkPoolWorker-1] Done Parsing database phosphositeplus
[2021-09-07 07:13:46,701: INFO/ForkPoolWorker-1] Parsing database drugbank
[D 2021-09-07 07:14:46.643 SingleUserNotebookApp mixins:518] Notifying Hub of activity 2021-09-07T07:09:27.427800Z
[I 2021-09-07 07:14:48.110 JupyterHub log:189] 200 POST /hub/api/users/ckguser/activity ([email protected]) 857.21ms
[2021-09-07 07:18:02,487: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:652 exited with 'signal 9 (SIGKILL)'
[2021-09-07 07:18:02,801: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 0.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
    human_status(exitcode), job._job),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 0.
07:18:40.547 [ConfigProxy] info: 200 GET /api/routes
[I 2021-09-07 07:18:40.822 JupyterHub proxy:347] Checking routes
[D 2021-09-07 07:19:20.875 SingleUserNotebookApp mixins:518] Notifying Hub of activity 2021-09-07T07:09:27.427800Z
[I 2021-09-07 07:19:21.119 JupyterHub log:189] 200 POST /hub/api/users/ckguser/activity ([email protected]) 96.73ms

Desktop

OS: macOS Catalina version 10.15.7
Browser: Chrome

I'm not sure what "worker exited prematurely" means, but I didn't quit anything during installation: Neo4j Desktop is still running, and I can access all the Jupyter notebooks and Neo4j online services.

I did create a user before the minimal update had finished; I hope this is not an issue.
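A worker killed with signal 9 (SIGKILL) while parsing a large database such as DrugBank is commonly the kernel's out-of-memory killer at work, for example when the memory assigned to Docker Desktop is lower than the job needs; this is a guess, not something confirmed in the thread. A minimal sketch for checking the memory limit the container actually sees, run from inside the container: the two paths are the standard cgroup v2 and cgroup v1 locations, and which one exists depends on the host.

```python
from pathlib import Path

# Standard locations of the memory limit inside a Linux container.
CGROUP_MEMORY_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
)


def container_memory_limit(candidates=CGROUP_MEMORY_LIMIT_FILES):
    """Return the cgroup memory limit in bytes.

    Returns None if no limit file is readable or cgroup v2 reports "max"
    (no limit). Note that cgroup v1 reports "no limit" as a huge number.
    """
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file absent on this cgroup version
        if raw == "max":
            return None
        return int(raw)
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No cgroup memory limit found")
    else:
        print(f"Container memory limit: {limit / 2**30:.1f} GiB")
```

If the reported limit is only a few GiB, raising the memory available to Docker Desktop before rerunning the update would be the first thing to try.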

Dataset of paper

Describe the bug
Which dataset is used in the section "Automated CKG analysis for liver disease biomarker discovery"? I am confused.
I'm trying to reproduce the process and results of the CKG paper.

"For the clinical data" in paragraph 1: what is the clinical data?
"The default data analysis uses principal component analysis to reduce the dimensionality of features for an overview of the data" in paragraph 3: what data is used for the process described in this paragraph?

compound_id = row[0]
mapped_code = row[44]
# Skip rows without a mapping (a missing value read as NaN stringifies to 'nan').
if str(mapped_code) != 'nan':
    compounds[compound_id] = mapped_code


Docker on Windows got the following error:
ERROR [31/78] RUN service neo4j start && sleep 30 && service neo4j stop && cat /var/log/neo4j/neo 0.3s

The Docker logs:

=> ERROR [16/78] RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9 0.9s
