Comments (6)

veleritas avatar veleritas commented on June 18, 2024

At the moment I'm bypassing the error by enclosing the add_edge() call in a try/except block, and it seems to work fine. Including the up/down regulation edges increased the number of metapaths to ~900, so the growth does seem to be exponential.

from integrate.

dhimmel avatar dhimmel commented on June 18, 2024

> Specifically, the following two types of relationships give duplicate edge errors

@veleritas, are you getting the AssertionError: edge already exists? I just reinstalled my integrate conda environment and tried out the two metaedges that were giving you trouble, and I didn't get any errors. One possibility is that you ran those notebook cells multiple times. Every repeat execution of a cell containing graph.add_edge will now cause an error.

> At the moment I'm bypassing the error by enclosing the add_edge() call in a try except block and it seems to work fine.

Hopefully we can diagnose your issue, so you can remove the error handling here.
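As a rough illustration of why diagnosing is preferable to silently swallowing the error, here is a minimal sketch of the workaround that records which edges were rejected. ToyGraph and add_edges_logging_duplicates are hypothetical stand-ins written for this example, not the hetio API; the only behavior assumed from this thread is that adding an existing edge raises an AssertionError.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph that rejects duplicate edges."""

    def __init__(self):
        self.edges = set()

    def add_edge(self, source, target, kind):
        key = (source, target, kind)
        # Mirrors the behavior described in the thread: duplicates raise.
        assert key not in self.edges, "edge already exists"
        self.edges.add(key)


def add_edges_logging_duplicates(graph, rows):
    """Add each row as an edge and return the rows rejected as duplicates."""
    duplicates = []
    for source, target, kind in rows:
        try:
            graph.add_edge(source, target, kind)
        except AssertionError:
            # Keep the offending row so it can be inspected later.
            duplicates.append((source, target, kind))
    return duplicates


# Made-up rows; the third repeats the first.
rows = [
    ("compound_1", "gene_1", "upregulates"),
    ("compound_1", "gene_2", "upregulates"),
    ("compound_1", "gene_1", "upregulates"),
]
graph = ToyGraph()
duplicates = add_edges_logging_duplicates(graph, rows)
print(len(graph.edges), duplicates)
```

Keeping the rejected rows around makes it possible to check whether the duplicates come from the input data or from re-running a cell.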

> Also, is the metaedge generation supposed to be exponential with the number of metapaths in the network? I noticed that if I don't include these types of metapaths in the network, but include everything else, then the number of metapaths drops from 1200 to only 130

It's a combinatorial explosion! Not sure if that counts as exponential. The reason the 5 edges you mention have such a huge effect on the total number of possible metapaths is that they connect genes, compounds, and diseases -- which also have lots of other metaedges. In the future, I could see some heuristic method that only computes DWPCs for metapaths that are likely to provide novel information.
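To make the combinatorial growth concrete, here is a toy sketch that counts walks between two node types in a small metagraph. The metaedge names, the count_metapaths helper, and the walk-counting rule are all assumptions made for illustration; this is not how hetio enumerates metapaths, and the numbers are unrelated to the real network.

```python
from collections import defaultdict


def count_metapaths(metaedges, source, target, max_length):
    """Count walks of length 1..max_length from source to target.

    Metaedges are treated as undirected, and parallel metaedges between
    the same pair of node types are counted separately.
    """
    adjacency = defaultdict(list)
    for a, b, _kind in metaedges:
        adjacency[a].append(b)
        if a != b:
            adjacency[b].append(a)

    def walk(node, remaining):
        if remaining == 0:
            return 0
        total = 0
        for neighbor in adjacency[node]:
            if neighbor == target:
                total += 1  # one complete walk ends here
            total += walk(neighbor, remaining - 1)
        return total

    return walk(source, max_length)


# Hypothetical metaedges, chosen only to show the effect.
base = [
    ("Compound", "Gene", "binds"),
    ("Gene", "Disease", "associates"),
    ("Gene", "Gene", "interacts"),
]
regulation = [
    ("Compound", "Gene", "upregulates"),
    ("Compound", "Gene", "downregulates"),
    ("Disease", "Gene", "upregulates"),
    ("Disease", "Gene", "downregulates"),
]

without_regulation = count_metapaths(base, "Compound", "Disease", 4)
with_regulation = count_metapaths(base + regulation, "Compound", "Disease", 4)
print(without_regulation, with_regulation)
```

In this toy setup, tripling the number of metaedges on the two busiest links multiplies the count far more than threefold, because every extra metaedge is a new choice at each step of every walk.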

veleritas avatar veleritas commented on June 18, 2024

So I went back to see if I could pin down the reason why we seem to be getting different results. On a fresh Ubuntu 16.04 instance I have confirmed that integrate.ipynb runs just fine without the edge exists AssertionError with the Anaconda environment specified by https://github.com/dhimmel/integrate/blob/master/environment.yml

(I am using Anaconda 4.3.1 for these tests).

However, if you update the packages in the integrate environment through a conda update --all command, then the integrate notebook breaks on the two edge types that I mentioned in the first comment. It seems weird to me that updating Python dependencies would break the integrate code at this point in time, but it seems like this should classify as a bug?

Here's the environment.yml file dump after the conda update command:

name: integrate
channels:
- defaults
dependencies:
- bleach=1.5.0=py35_0
- cycler=0.10.0=py35_0
- dbus=1.10.10=0
- decorator=4.0.11=py35_0
- entrypoints=0.2.2=py35_1
- et_xmlfile=1.0.1=py35_0
- expat=2.1.0=0
- fontconfig=2.12.1=3
- freetype=2.5.5=2
- glib=2.50.2=1
- gst-plugins-base=1.8.0=0
- gstreamer=1.8.0=0
- html5lib=0.999=py35_0
- icu=54.1=0
- ipykernel=4.5.2=py35_0
- ipython=5.3.0=py35_0
- ipython_genutils=0.1.0=py35_0
- ipywidgets=6.0.0=py35_0
- jdcal=1.3=py35_0
- jinja2=2.9.5=py35_0
- jpeg=9b=0
- jsonschema=2.5.1=py35_0
- jupyter=1.0.0=py35_1
- jupyter_client=5.0.0=py35_0
- jupyter_console=5.1.0=py35_0
- jupyter_core=4.3.0=py35_0
- libffi=3.2.1=1
- libgcc=5.2.0=0
- libgfortran=3.0.0=1
- libiconv=1.14=0
- libpng=1.6.27=0
- libsodium=1.0.10=0
- libxcb=1.12=1
- libxml2=2.9.4=0
- markupsafe=0.23=py35_2
- matplotlib=2.0.0=np112py35_0
- mistune=0.7.4=py35_0
- mkl=2017.0.1=0
- nbconvert=5.1.1=py35_0
- nbformat=4.3.0=py35_0
- notebook=4.4.1=py35_0
- numexpr=2.6.2=np112py35_0
- numpy=1.12.1=py35_0
- openssl=1.0.2k=1
- pandas=0.19.2=np112py35_1
- pandocfilters=1.4.1=py35_0
- path.py=10.1=py35_0
- pcre=8.39=1
- pexpect=4.2.1=py35_0
- pickleshare=0.7.4=py35_0
- pip=9.0.1=py35_1
- prompt_toolkit=1.0.13=py35_0
- ptyprocess=0.5.1=py35_0
- pygments=2.2.0=py35_0
- pyparsing=2.1.4=py35_0
- pyqt=5.6.0=py35_2
- python=3.5.3=1
- python-dateutil=2.6.0=py35_0
- pytz=2016.10=py35_0
- pyzmq=16.0.2=py35_0
- qt=5.6.2=3
- qtconsole=4.2.1=py35_1
- readline=6.2=2
- requests=2.13.0=py35_0
- scipy=0.19.0=np112py35_0
- seaborn=0.7.1=py35_0
- setuptools=27.2.0=py35_0
- simplegeneric=0.8.1=py35_1
- sip=4.18=py35_0
- six=1.10.0=py35_0
- sqlite=3.13.0=0
- terminado=0.6=py35_0
- testpath=0.3=py35_0
- tk=8.5.18=0
- tornado=4.4.2=py35_0
- traitlets=4.3.2=py35_0
- wcwidth=0.1.7=py35_0
- wheel=0.29.0=py35_0
- widgetsnbextension=2.0.0=py35_0
- xlsxwriter=0.9.6=py35_0
- xz=5.2.2=1
- zeromq=4.1.5=0
- zlib=1.2.8=3
- pip:
  - et-xmlfile==1.0.1
  - hetio==0.2.3
  - ipython-genutils==0.1.0
  - jupyter-client==5.0.0
  - jupyter-console==5.1.0
  - jupyter-core==4.3.0
  - prompt-toolkit==1.0.13
  - py2neo==2.0.8
  - tqdm==4.11.2
prefix: /home/ubuntu/anaconda3/envs/integrate

dhimmel avatar dhimmel commented on June 18, 2024

My guess is that some pandas behavior has changed.

Can you see which rows are duplicated using the following:

l1000_df[l1000_df.duplicated(['perturbagen', 'entrez_gene_id'], keep=False)]
stargeo_df[stargeo_df.duplicated(['slim_id', 'entrez_gene_id'], keep=False)]
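For reference, here is a small sketch of what that check returns on a made-up table. The column names perturbagen and entrez_gene_id come from the snippet above; the z_score column and all values are invented for illustration.

```python
import pandas as pd

# Toy data: rows 0 and 3 share the same (perturbagen, entrez_gene_id) pair.
toy_df = pd.DataFrame({
    "perturbagen": ["DB001", "DB001", "DB002", "DB001"],
    "entrez_gene_id": [10, 20, 10, 10],
    "z_score": [1.5, -2.0, 0.7, 1.9],
})
key = ["perturbagen", "entrez_gene_id"]

# keep=False marks every member of a duplicated group, not just the repeats.
repeated = toy_df[toy_df.duplicated(key, keep=False)]

# One possible remedy, assuming the first occurrence is the one to keep.
deduplicated = toy_df.drop_duplicates(key, keep="first")

print(repeated)
print(len(deduplicated))
```

With keep=False both copies of a repeated key are shown side by side, which makes it easier to see whether the other columns differ between them.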

> It seems weird to me that updating Python dependencies would break the integrate code at this point in time, but it seems like this should classify as a bug?

Version changes frequently break things! If you want to update a dependency for an existing codebase, I'd do it one package at a time and carefully. I wouldn't recommend conda update --all in these instances. Different codebases have different compatibility needs. For example, dhimmel/hetio targets Python 3.4+, but for a scripted analysis like dhimmel/integrate it usually makes sense to pick a single environment and stick to it.

That being said, I'm happy to implement a forward compatible syntax if we can figure out what the bug is.
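One way to narrow down which update is responsible is to diff the pinned environment against the updated dump. The sketch below assumes dependency lines in the name=version=build (conda) or name==version (pip) form shown in the dump above; parse_conda_pins and the pkg_a/pkg_b/pkg_c entries are hypothetical placeholders, not the actual pins of either environment.

```python
def parse_conda_pins(lines):
    """Map package name to version for '- name=version=build' style lines."""
    pins = {}
    for line in lines:
        entry = line.strip()
        # Skip keys such as 'dependencies:' and the nested '- pip:' header.
        if not entry.startswith("- ") or entry.endswith(":"):
            continue
        name, _, rest = entry[2:].partition("=")
        if not rest:
            continue  # unpinned entries such as channel names
        # lstrip handles pip's '==' form; the build string is dropped.
        pins[name] = rest.lstrip("=").split("=")[0]
    return pins


# Placeholder entries standing in for the two environment files.
before = parse_conda_pins([
    "- pkg_a=1.0.0=py35_0",
    "- pkg_b=2.1=py35_0",
    "- pip:",
    "  - pkg_c==0.2.3",
])
after = parse_conda_pins([
    "- pkg_a=1.1.0=py35_0",
    "- pkg_b=2.1=py35_1",
    "- pip:",
    "  - pkg_c==0.2.3",
])

# Report only packages whose version (not just build string) changed.
changed = {
    name: (before[name], after[name])
    for name in sorted(before.keys() & after.keys())
    if before[name] != after[name]
}
print(changed)
```

The packages that show up here are the candidates to update one at a time while re-running the notebook.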

veleritas avatar veleritas commented on June 18, 2024

I can try to figure out what changed to cause these duplicate edges, but that will probably take a few days as I work through other priorities.

dhimmel avatar dhimmel commented on June 18, 2024

> I can try to figure out what changed to cause these duplicate edges, but that will probably take a few days as I work through other priorities.

Up to you. The motivation to diagnose it rather than use error handling is the possibility that it's part of a bigger problem... but if you're getting the expected number of edges, it's probably not a huge issue.
