
aspire's People

Contributors

adishasha, aizeny, amitsinger, eitanrose, garrettwrong, ido90, itaysason, janden, junchaoxia, pgabi, rbrook, yaelharpaz1, yoelsh, zhizhenz


aspire's Issues

Installation

Installation can fail due to a wrong gcc version. Possible errors:

unrecognized command line option "-std=c++11"
issue with mex files

Fixed by updating gcc.

`initstate` called in functions

This can mess things up in subtle ways. For example, if we call noise_exp2d and noise_rexpr with the same parameters, the noise images generated will be completely correlated with one another.

In general, I think we should limit initstate to being called by users or in scripts (such as tests), rather than having individual functions call it. Alternatively, if a function must reset the seed, it should take the seed as an input and/or restore the original state before returning.
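A minimal sketch of the suggested pattern, using MATLAB's built-in rng state (the function name and its body are hypothetical; ASPIRE's initstate wrapper may manage state differently):

```matlab
% Hypothetical example: a noise generator that takes its seed as an
% input and restores the caller's RNG state before returning, so that
% two different noise functions never produce correlated output.
function noise = noise_with_seed(n, seed)
    old_state = rng;    % save the caller's generator state
    rng(seed);          % seed deterministically for this function only
    noise = randn(n);   % the noise realization
    rng(old_state);     % restore, leaving the caller's stream untouched
end
```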

``fastrotate`` optimizations?

Currently, this function contains a lot of FFTs/IFFTs, which slows it down. The function is only called from align_main, where the result is then transformed back into the Fourier domain. Could we speed this up by staying in the Fourier domain throughout?

VDM class averaging vs. bispectrum class averaging

When comparing the two, the VDM approach does not seem to bring any improvement; in fact, it gives slightly worse performance. For example, running the test_classavg script with and without VDM classification (that is, by setting or unsetting the use_VDM flag), the value of mean(acosd(d_f)) goes from 2.18 to 2.03, while max(acosd(d_f)) goes from 10.4 to 6.79.

Shouldn't we at least observe some improvement from using the VDM classification scheme? Perhaps this is not the correct regime?

Workflow caching & continuation

If one of the workflow functions crashes midway through, we have to re-run the entire function, which can take time. Also, if we change a subset of the parameters, the entire function has to be run again, even for parts that are unaffected by the change.

It would be nice if the workflow functions could detect what has been calculated already, with what parameters, and in that case skip those calculations, which can save a lot of time in these situations.
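One way to sketch this, assuming results can be keyed by their parameters (the helper names here are hypothetical, and DataHash is a File Exchange utility, not part of ASPIRE):

```matlab
% Hypothetical caching wrapper: hash the parameter struct and skip a
% workflow stage whose result file already exists for those parameters.
function result = cached_stage(params, stage_fn, cache_dir)
    key = DataHash(params);                    % digest of the parameters
    cache_file = fullfile(cache_dir, [key '.mat']);
    if exist(cache_file, 'file')
        s = load(cache_file);                  % reuse the earlier result
        result = s.result;
    else
        result = stage_fn(params);             % compute and cache
        save(cache_file, 'result');
    end
end
```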

Spurious figures in ``VDM_LP``

Again, there are bar plots that appear when calling VDM, which in turn calls VDM_LP, where these figures are generated. I suggest suppressing these for the moment and potentially returning the plotted values as outputs that the user could plot if desired.

Image sizes

There are some functions (like Initial_classification) that will accept images of any size, but only work on odd-sized images. For even-sized images, they crash. It would probably be good to have a check (in the workflow functions, perhaps) for even-sized images and inform the user of this particular limitation.
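The check could be as simple as the following sketch (the error message wording is illustrative):

```matlab
% Guard against even-sized images before calling functions that only
% support odd sizes, rather than letting them crash later.
sz = size(images);
if mod(sz(1), 2) == 0
    error('Images must be odd-sized; got %dx%d. Crop or pad first.', ...
        sz(1), sz(2));
end
```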

Bar plot in Bispec_2Drot_1?

Is there some reason this needs to pop up every time we run the function? If so, shouldn't we at least provide some labels?

``choose_support_v6`` chooses small supports

Running the script

n = 128;

f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));

qs = qrand(n);

im_clean = cryo_project(vol, qs);

[sPCA_data, ~, ~, recon_spca] = data_sPCA(im_clean, 0);

disp(norm(im_clean(:)-recon_spca(:))^2/norm(im_clean(:))^2);

gives us an MSE of about 0.50 by default, which seems a bit high for just applying a basis projection, since we expect our images to be more or less in this basis.

If we edit data_sPCA to include the line

R = floor(size(images, 1)/2);

before the call to precomp_fb, the error drops down to around 0.20. Furthermore, if we also include the line

c = 0.5;

it drops further to around 0.10.

Is this expected behavior? It would seem that the sPCA step shouldn't necessarily distort our images to this extent. In other words, we should really only have noise in the orthogonal complement to the basis, so clean images should be preserved.

``Initial_classification_FD`` gives incorrect angles

Specifically, it doesn't seem to give the same angles as the old Initial_classification when dealing with flipped/reflected images. To see this, consider the test script

n = 1024;
sigma = 0.5;

n_nbor = 3;

f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));

qs = qrand(n);

im_clean = cryo_project(vol, qs);

im = im_clean+sigma*randn(size(im_clean));

[sPCA_data, ~, ~, recon_spca] = data_sPCA(im, sigma^2);

[class, class_refl, class_rot] = Initial_classification_FD(sPCA_data, n_nbor, false);

fname = tempname;
WriteMRC(im, 1, fname);
im_reader = imagestackReader(fname);

tmpdir = tempname;
mkdir(tmpdir);

[~, ~, im_ave_file, ~] = ...
    align_main(im_reader, class_rot, class, class_refl, sPCA_data, n_nbor, 0, 1:n, recon_spca, tmpdir);

delete(fname);
rmdir(tmpdir);

im_ave = ReadMRC(im_ave_file);

delete(im_ave_file);

disp(norm(im(:)-im_clean(:))^2/norm(im_clean(:))^2);
disp(norm(im_ave(:)-im_clean(:))^2/norm(im_clean(:))^2);

Now adding the line

class_rot(class_refl==2) = mod(class_rot(class_refl==2)+180, 360);

just after the call to Initial_classification_FD brings the MSE down from about 0.65 to 0.50. The difference is even greater in less noisy settings, where, without the angle correction, the MSE of the class averages actually exceeds that of the original images.

Presumably this has something to do with how the sPCA coefficients and/or the bispectrum are computed in the new version. That being said, adding the above line to Initial_classification_FD should solve the problem.

align_main crashes

When running align_main (through cryo_workflow_classmeans) I am getting the following error:

Reference to non-existent field 'R'.

Error in align_main (line 33)
r_max = FBsPCA_data.R;

Error in cryo_workflow_classmeans_execute (line 53)
[ shifts, corr, unsortedaveragesfname, norm_variance ] = align_main(prewhitened_projs,...

Error in cryo_workflow_classmeans (line 40)
cryo_workflow_classmeans_execute(workflow_fname);

``align_main`` broken again

The FBsPCA_data input that it receives from Initial_classification is no longer of the right format.

I'm guessing this has to do with Initial_classification being replaced by Initial_classification_FD, which no longer generates this structure but gets it from data_sPCA. Should we fix Initial_classification to generate this structure itself?

Missing `cryo_vol2rays`

Specifically, this function is called in reshift_test1 and reshift_test2 but is nowhere to be found. Was it perhaps present in some earlier version but removed?

``test_classavg`` crashes when ``use_VDM`` is zero

A number of functions depend on the output of the VDM function, so the script crashes when it is not used. This can be fixed by assigning the outputs of Initial_classification_FD to the VDM output variables.

"Unit" tests are too long

Many of the test_ scripts for the various functions take a long time to complete, making them unsuitable as unit tests. It would perhaps be better to separate "software" unit tests, which verify that the functions work on small examples, from "math" tests, which check results at large sizes, low/high SNR, etc.

Separate SDPLR installation script

In the same way as the NUFFT packages, we can have a script that downloads and installs SDPLR instead of including it in the package.

Refinement package

Not really an issue, but are we going to keep it in the new release?

Current workflow uses functions in development

Specifically, cryo_workflow_abinitio_execute calls cryo_assign_orientations_to_raw_projections, which is in development/projections. There are a few options:

  • We can move this out of development into the standard function space,
  • We can remove the call from the workflow, or
  • We can add the development folders to the initpath (or create a separate initpath_dev), although this makes it unlikely that we will catch these problems.

test_ClassAvg example scripts broken

Since align_main no longer takes image arrays as input, the example scripts test_ClassAvg1 and test_ClassAvg2 no longer work. This was introduced in 2d9a5b5.

Most likely the way to solve this is to introduce an imagestackReader subclass that takes an in-memory stack of images and serves those.
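A rough sketch of such a reader (the getImage method name and the exact interface of imagestackReader are assumptions; since its constructor presumably expects a file, this version duck-types the interface rather than inheriting):

```matlab
% Hypothetical in-memory reader exposing the same read interface as the
% file-backed imagestackReader, so consumers could accept either.
classdef imagestackMemReader < handle
    properties
        stack  % n x n x K array of images held in memory
    end
    methods
        function obj = imagestackMemReader(stack)
            obj.stack = stack;
        end
        function im = getImage(obj, idx)
            im = obj.stack(:, :, idx);  % serve image idx from memory
        end
    end
end
```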

Out of core workflow

For the new code: since it already processes images batch-wise, this is probably not high priority right now.

Example scripts

  • Document example scripts better.
  • Move all examples from subdirs to one main examples folder.
  • Ensure that each package has its own examples.
  • Make the tests faster (avoid large images and large numbers of images in the example scripts).

Put generated data in a single spot

Right now, a lot of it ends up in projections/class_average/simulation, which is not great if you're restricted in terms of space or are on a network drive. Could we have some folder where we put all the generated datasets? Then users can symlink this to wherever they want to store it (like the /scratch directory that we have at PACM).

Directory structure depth?

There are many levels of subfolders, which gets a bit messy. Is it necessary to have more than one or two levels? I'm talking mainly about

projections/class_average/simulation
reconstruction/FIRM/FIRM

et cetera.

I'm guessing this comes from different contributors copying in their code (with its directory structure) into one folder of the toolbox. Maybe there's some room for reorganization/flattening?

``cryo_clmatrix`` crashes in Octave

This is due to gpuDeviceCount not being defined in Octave since it's part of the parallel computing toolbox. Presumably, older versions of MATLAB and those without the parallel computing toolbox would also error on this.
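A portable guard could look like the sketch below (it assumes cryo_clmatrix already has a CPU code path to fall back on):

```matlab
% Only query the GPU when the parallel computing toolbox provides
% gpuDeviceCount; on Octave or toolbox-less MATLAB, exist returns 0
% and the short-circuit leaves use_gpu false.
use_gpu = exist('gpuDeviceCount', 'file') ~= 0 && gpuDeviceCount() > 0;
```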

``tempmrcdir`` collisions

The way it is right now, tempmrcdir gives the same directory each time it is called. This becomes an issue if we run several processes simultaneously, since they all expect this directory to be empty and used by that process exclusively. For example, if we attempt to run align_main twice giving the directory returned by tempmrcdir as the temporary directory, it crashes.

Either this should generate a new directory each time it's called, like tempname, or it should generate a name that somehow incorporates the process ID. In the latter case, however, this doesn't solve the use case of running something like align_main twice for different data in the same process.
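The tempname-style behavior could be a sketch like this (the function name is hypothetical, so as not to change existing callers of tempmrcdir):

```matlab
% Generate and create a fresh temporary directory on every call, so
% concurrent processes and repeated calls never share a directory.
function dname = tempmrcdir_unique()
    dname = tempname;   % unique path under the system temp folder
    mkdir(dname);
end
```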

Utility functions

Remove duplicate implementations of utility functions like mask_fuzzy etc. Check aLibs.

Number of frequencies required for non-empty bispectrum

The way that bispec_Operator_1 is written, it will not yield any bispectrum components if only frequencies 1 and 2 are present. Why is this? We should be able to get the component k1=1, k2=1 and k1+k2=2 in this case, so why does the code not compute it?

Imaginary components in `cryo_project`

When calling it through cryo_gen_projections, for example via

cryo_gen_projections(64,1,1);

it crashes due to large imaginary components. This worked a while ago, so some of the changes that we've introduced must have broken it.

Graceful handling of too much noise

Right now, if there is too much noise in the images (that is, big enough that no good PCA components are extracted), the user is presented with some puzzling errors when trying to run the class averaging. We should handle this more gracefully, either by issuing some error, or (better?) by falling back on the original (non-averaged) images. This becomes especially relevant once we want to show plots of the performance for a range of SNRs where we want a transition into the high-noise regime that makes sense.

Separate planning and execution of NUFFT functions

In many cases, we can gain a significant speedup by fixing an NUFFT plan and executing it several times. Right now, the wrappers do not allow for this, with only limited recycling of plans.

We should separate the planning and execution, which makes a difference for the Chemnitz NFFT package, while allowing the standard nufft* functions to work as before for convenience.
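As a design sketch only (all function names here are hypothetical, not the current wrappers), the split could look like:

```matlab
% Build the NUFFT plan once for a fixed set of Fourier points, then
% execute it for each volume; the expensive precomputation is shared.
plan = nufft_plan(fourier_pts, vol_size);      % plan once
for k = 1:num_vols
    f_hat(:, k) = nufft_execute(plan, vols(:, :, :, k));
end
nufft_delete(plan);                            % free plan resources
```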

Replace FB_SPCA

Needed by Initial_classification. The function was in projections/FBsPCA but was removed in b10c9bd. Presumably, its replacement is in the projections/Denoising/ffb folder that was added in 68987af, but it's not clear to me how the call to FB_SPCA should be replaced.

Redo install.m script

Right now, it traverses the directory tree and runs all the makemex scripts it can find, but none are left. It would be better to have it just run the install scripts that we have.

``sort_list_weights_wrefl`` crashes in Octave (<4.2.0)

This is due to an Octave bug in the union function when given row inputs with the rows flag. The bug was fixed in version 4.2.0, but earlier versions don't work correctly. I suggest implementing a workaround for now, since the newer version is not yet widely distributed.
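A minimal workaround, assuming the inputs are matrices with matching column counts:

```matlab
% Equivalent to union(A, B, 'rows'): unique sorts and deduplicates the
% stacked rows, avoiding the pre-4.2.0 Octave bug in union itself.
function U = union_rows(A, B)
    U = unique([A; B], 'rows');
end
```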

Standard Documentation Style

Agree on a style, come up with a template, and document everything else accordingly.
(Preferably, lines should not exceed 80 characters.)
