princetonuniversity / aspire
Algorithms for Single Particle Reconstruction
License: Other
Installation error due to wrong gcc version. Possible errors:
unrecognized command line option "-std=c++11"
issues with MEX files
Fixed by updating gcc.
Does not deal with more than one quaternion.
This can mess things up in subtle ways. For example, if we call noise_exp2d and noise_rexpr with the same parameters, the noise images generated will be completely correlated with one another.
In general, I think initstate should only be called by users or by scripts (such as tests), rather than by individual functions. Alternatively, if a function must reset the seed, it should take the seed as an input and/or restore the original RNG state before returning.
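If a function really must reseed, a pattern like the following keeps the caller's stream intact; a minimal sketch, assuming the modern rng API (the helper name is made up, and ASPIRE's initstate may wrap an older RNG interface):

```matlab
function noise = seeded_noise(sz, seed)
% Draw a reproducible noise realization from SEED without
% disturbing the caller's global RNG state.
% (Illustrative helper, not part of ASPIRE.)
    saved = rng;        % save the caller's generator state
    rng(seed);          % reseed locally for reproducibility
    noise = randn(sz);  % the seeded draw
    rng(saved);         % restore the caller's state on exit
end
```

With this pattern, two different functions called with the same seed still produce correlated noise by design, but the caller's own random sequence is never silently restarted.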
Currently, this function contains a lot of FFTs/IFFTs, which slows it down. The function is only called from align_main, where the result is then transformed back into the Fourier domain. Could we speed this up by staying in the Fourier domain throughout?
It would be nice to have a script in the examples folder that runs the whole pipeline. That would make it easier for others to adapt the code, since all the key functions would be in one place.
When comparing the two, the VDM approach does not seem to bring any improvement; in fact, it gives slightly worse performance. For example, running the test_classavg script with and without VDM classification (that is, by setting or unsetting the use_VDM flag), the value of mean(acosd(d_f)) goes from 2.18 to 2.03, while max(acosd(d_f)) goes from 10.4 to 6.79.
Shouldn't we at least observe some improvement from using the VDM classification scheme? Perhaps this is not the correct regime?
If one of the workflow functions crashes midway through, we have to re-run the entire function, which can take time. Also, if we change a subset of the parameters, the entire function has to be run again, even for parts that are unaffected by the change.
It would be nice if the workflow functions could detect what has been calculated already, with what parameters, and in that case skip those calculations, which can save a lot of time in these situations.
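One lightweight way to get this is to key each intermediate result on its input parameters and skip the computation when a matching checkpoint file exists. A rough sketch, where cached_step is a hypothetical helper and the caller supplies the cache key (e.g. a hash of the parameter struct):

```matlab
function result = cached_step(step_fn, params, key, cache_dir)
% Run STEP_FN(PARAMS) only if no checkpoint named KEY exists in
% CACHE_DIR; otherwise load and reuse the saved result.
% (Illustrative sketch, not part of the ASPIRE workflow code.)
    cache_file = fullfile(cache_dir, [key '.mat']);
    if exist(cache_file, 'file')
        s = load(cache_file);
        result = s.result;            % skip recomputation
    else
        result = step_fn(params);
        save(cache_file, 'result');   % checkpoint for later runs
    end
end
```

Each workflow stage would then survive a crash in a later stage, and changing one parameter would only invalidate the checkpoints whose keys depend on it.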
Again, there are bar plots that appear when calling VDM, which in turn calls VDM_LP, where these figures are generated. I suggest suppressing these for the moment and potentially returning these values as outputs that could be plotted by the user if desired.
parpool, GPU
There are some functions (like Initial_classification) that accept images of any size but only work correctly on odd-sized images; for even-sized images, they crash. It would probably be good to have a check (in the workflow functions, perhaps) for even-sized images that informs the user of this particular limitation.
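Until even sizes are supported, the workflow functions could fail early with an informative message; something along these lines (the variable name projs is assumed):

```matlab
% Guard against the known odd-size-only limitation (illustrative):
L = size(projs, 1);   % projs: L x L x n image stack (name assumed)
if mod(L, 2) == 0
    error(['Initial_classification currently supports only ' ...
           'odd-sized images; got %d x %d. Consider cropping or ' ...
           'padding to an odd size.'], L, L);
end
```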
Is there some reason this needs to pop up every time we run the function? If so, shouldn't we at least provide some labels?
Running the script
n = 128;
f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));
qs = qrand(n);
im_clean = cryo_project(vol, qs);
[sPCA_data, ~, ~, recon_spca] = data_sPCA(im_clean, 0);
disp(norm(im_clean(:)-recon_spca(:))^2/norm(im_clean(:))^2);
gives us an MSE of about 0.50 by default, which seems a bit high for just applying a basis projection, since we expect our images to be more or less in this basis.
If we edit data_sPCA
to include the line
R = floor(size(images, 1)/2);
before the call to precomp_fb
, the error drops down to around 0.20. Furthermore, if we also include the line
c = 0.5;
it drops further to around 0.10.
Is this expected behavior? It would seem that the sPCA step shouldn't necessarily distort our images to this extent. In other words, we should really only have noise in the orthogonal complement to the basis, so clean images should be preserved.
Specifically, it doesn't seem to give the same angles as the old Initial_classification
when dealing with flipped/reflected images. To see this, consider the test script
n = 1024;
sigma = 0.5;
n_nbor = 3;
f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));
qs = qrand(n);
im_clean = cryo_project(vol, qs);
im = im_clean+sigma*randn(size(im_clean));
[sPCA_data, ~, ~, recon_spca] = data_sPCA(im, sigma^2);
[class, class_refl, class_rot] = Initial_classification_FD(sPCA_data, n_nbor, false);
fname = tempname;
WriteMRC(im, 1, fname);
im_reader = imagestackReader(fname);
tmpdir = tempname;
mkdir(tmpdir);
[~, ~, im_ave_file, ~] = ...
align_main(im_reader, class_rot, class, class_refl, sPCA_data, n_nbor, 0, 1:n, recon_spca, tmpdir);
delete(fname);
rmdir(tmpdir);
im_ave = ReadMRC(im_ave_file);
delete(im_ave_file);
disp(norm(im(:)-im_clean(:))^2/norm(im_clean(:))^2);
disp(norm(im_ave(:)-im_clean(:))^2/norm(im_clean(:))^2);
Now adding the line
class_rot(class_refl==2) = mod(class_rot(class_refl==2)+180, 360);
just after the call to Initial_classification_FD
brings down the MSE from about 0.65 to 0.50. The difference is even greater in less noisy situations, where, without the angle correction, the MSE of the class averages is actually larger than that of the original images.
Presumably this has something to do with how the sPCA coefficients and/or bispectrum are computed in the new version. That being said, adding the above line to Initial_classification_FD should fix the problem.
Right now, this seems to top out at about one core. Could we speed this up without resorting to a GPU?
When running align_main (through cryo_workflow_classmeans) I am getting the following error:
Reference to non-existent field 'R'.
Error in align_main (line 33)
r_max = FBsPCA_data.R;
Error in cryo_workflow_classmeans_execute (line 53)
[ shifts, corr, unsortedaveragesfname, norm_variance ] = align_main(prewhitened_projs,...
Error in cryo_workflow_classmeans (line 40)
cryo_workflow_classmeans_execute(workflow_fname);
The FBsPCA_data
input that it receives from Initial_classification
is no longer of the right format.
I'm guessing this has to do with Initial_classification
being replaced by Initial_classification_FD
, which no longer generates this structure but gets it from data_sPCA
. Should we fix Initial_classification
to generate this structure itself?
If we don't have any external NUFFT libraries installed, we can at least use the MEX implementations that are already in the package.
Specifically, this function is called in reshift_test1
and reshift_test2
but is nowhere to be found. Was it perhaps present in some earlier version but removed?
A number of functions depend on the output of the VDM function, so the script crashes when VDM is not used. It can be fixed by assigning the outputs of the Initial_classification_FD function to the VDM output variables.
Many of the test_ scripts for the various functions take a long time to complete, making them unsuitable for unit tests. It would perhaps be better to separate "software" unit tests that verify the workings of the functions on small examples from "math" tests that check results at large sizes, low/high SNR, etc.
In the same way as the NUFFT packages, we can have a script that downloads and installs SDPLR instead of including it in the package.
Not really an issue, but are we going to keep it in the new release?
Specifically, cryo_workflow_abinitio_execute calls cryo_assign_orientations_to_raw_projections, which is in development/projections. There are a few options: one is to add development/projections to initpath (or create a different script, initpath_dev), although this will make us unlikely to catch these problems.

Since align_main no longer takes image arrays as input, the example scripts test_ClassAvg1 and test_ClassAvg2 no longer work. This was introduced in 2d9a5b5.
Most likely the way to solve this is to introduce an imagestackReader
subclass that takes an in-memory stack of images and serves those.
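A sketch of such a class, serving images from memory while mimicking the reader interface (the method names here are guesses and would need to match whatever align_main actually calls on imagestackReader):

```matlab
classdef imagestackMemReader
% Serve an in-memory L x L x N image stack through an
% imagestackReader-like interface. (Sketch only; the real
% imagestackReader method names and signatures may differ.)
    properties
        stack   % the in-memory image stack
    end
    methods
        function obj = imagestackMemReader(stack)
            obj.stack = stack;
        end
        function im = getImage(obj, idx)
            im = obj.stack(:, :, idx);    % serve requested slices
        end
        function N = num_images(obj)
            N = size(obj.stack, 3);
        end
    end
end
```

The old test scripts could then wrap their in-memory arrays in this class instead of writing temporary MRC files.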
For the new code. The new code is already batch-wise, so this is probably not high priority right now.
Not sure what is happening here, but data_sPCA calls this function with only U{2} and Coeff{2} non-empty, so it crashes.
Right now, a lot of it ends up in projections/class_average/simulation, which is not great if you're restricted in terms of space or are on a network drive. Could we have some folder where we put all the generated datasets? Then users can symlink this to wherever they want to store it (like the /scratch directory that we have at PACM).
Since this has some well-written NUFFT functions, we should perhaps include this in the wrapper functions.
There are many levels of subfolders, which gets a bit messy. Is it necessary to have more than one or two levels? I'm talking mainly about
projections/class_average/simulation
reconstruction/FIRM/FIRM
et cetera.
I'm guessing this comes from different contributors copying in their code (with its directory structure) into one folder of the toolbox. Maybe there's some room for reorganization/flattening?
This is due to gpuDeviceCount
not being defined in Octave since it's part of the parallel computing toolbox. Presumably, older versions of MATLAB and those without the parallel computing toolbox would also error on this.
The way it is right now, tempmrcdir
gives the same directory each time it is called. This becomes an issue if we run several processes simultaneously, since they all expect this directory to be empty and used by that process exclusively. For example, if we attempt to run align_main
twice giving the directory returned by tempmrcdir
as the temporary directory, it crashes.
Either this should generate a new directory each time it's called, like tempname
, or it should generate a name that somehow incorporates the process ID. In the latter case, however, this doesn't solve the use case of running something like align_main
twice for different data in the same process.
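The first option is essentially a one-line change: derive the directory from tempname, so every call creates a fresh, empty directory. A possible replacement body (sketch):

```matlab
function dname = tempmrcdir()
% Return a newly created, unique temporary directory. Because the
% name comes from tempname, concurrent processes and repeated calls
% within one process each get their own empty directory.
    dname = tempname;   % unique path on every call
    mkdir(dname);
end
```

Callers would then be responsible for removing the directory when they are done with it, as the align_main test script above already does with rmdir.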
Due to Octave bug #33523, which does not allow taking means over non-existent dimensions. This was fixed in Octave 4.2.0, but on previous versions it crashes.
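A version-independent workaround is to treat a mean over a dimension beyond ndims as the identity, since that dimension is a singleton; a small shim (the name mean_safe is made up):

```matlab
function m = mean_safe(x, dim)
% Like mean(x, dim), but works around Octave bug #33523 (pre-4.2.0),
% where averaging over a non-existent dimension errors out. Any
% dimension beyond ndims(x) is a singleton, so the mean is x itself.
    if dim > ndims(x)
        m = x;
    else
        m = mean(x, dim);
    end
end
```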
Remove duplicate implementations of utility functions like mask_fuzzy etc. Check aLibs.
Automatic installation of NFFT needed. Currently done manually.
Remove debugging comments; add input/output descriptions.
The way that bispec_Operator_1
is written, it will not yield any bispectrum components if only frequencies 1 and 2 are present. Why is this? We should be able to get the component k1=1, k2=1 and k1+k2=2 in this case, so why does the code not compute it?
When calling it through cryo_gen_projections, for example via
cryo_gen_projections(64,1,1);
it crashes due to large imaginary components. This worked a while ago, so some of the changes that we've introduced must have broken it.
Right now, if there is too much noise in the images (that is, big enough that no good PCA components are extracted), the user is presented with some puzzling errors when trying to run the class averaging. We should handle this more gracefully, either by issuing some error, or (better?) by falling back on the original (non-averaged) images. This becomes especially relevant once we want to show plots of the performance for a range of SNRs where we want a transition into the high-noise regime that makes sense.
If there's no phase flipping, no downsampling, no cropping, etc. the preprocessing workflow will delete the input MRCS file.
In many cases, we can gain a significant speedup by fixing an NUFFT plan and executing it several times. Right now, the wrappers do not allow for this, with only limited recycling of plans.
We should separate the planning and execution, which makes a difference for the Chemnitz NFFT package, while allowing the standard nufft* functions to work as before for convenience.
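The separated interface could look something like this (all names here are hypothetical, chosen only to illustrate the plan/execute split):

```matlab
% Hypothetical planned NUFFT interface (names are illustrative):
plan = nufft_plan(sz, fourier_pts);        % precompute once

for k = 1:K
    % Reuse the same plan for every volume; only the data changes.
    f(:, k) = nufft_exec(plan, vols(:, :, :, k));
end

nufft_finish(plan);                        % release plan resources
```

The existing one-shot nufft* wrappers could then be kept as conveniences that internally do plan + exec + finish, so current callers keep working unchanged.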
We want to prevent, for example, the frequencies being accidentally transposed in the input.
Currently we have ALL of these!
Right now it traverses the directory tree and runs all the makemex scripts it can find, but none are left. It would be better to have this just run the install scripts that we have.
This is due to an Octave bug in the union function when given row inputs with the 'rows' flag. The bug has been fixed in version 4.2.0, but on earlier versions it doesn't work correctly. I suggest implementing a workaround for now, since the newer version is not yet widely distributed.
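Assuming unique(..., 'rows') is unaffected by the bug, the workaround can be a small shim used in place of union(..., 'rows') (the name union_rows is made up):

```matlab
function u = union_rows(a, b)
% Workaround for the Octave union(..., 'rows') bug on row inputs
% (fixed in Octave 4.2.0): deduplicate the concatenated rows with
% unique, which returns them sorted, matching union's behavior.
    u = unique([a; b], 'rows');
end
```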
Agree on the style, come up with a template, and document everything else accordingly. (Preferably not exceeding 80 characters per line.)