princetonuniversity / aspire
Algorithms for Single Particle Reconstruction
License: Other
Installation error due to wrong gcc version. Possible errors:
unrecognized command line option "-std=c++11"
issues with MEX files
Fixed by updating gcc.
Does not deal with more than one quaternion.
This can mess things up in subtle ways. For example, if we call noise_exp2d and noise_rexpr with the same parameters, the noise images generated will be completely correlated with one another.
In general, I think initstate should only be called by users or by scripts (such as tests), rather than by individual functions. Alternatively, if a function must reset the seed, it should take the seed as an input and/or restore the original RNG state before returning.
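If a function really must reseed, a pattern like the following keeps the caller's stream intact; a minimal sketch, assuming the modern rng API (the helper name is made up, and ASPIRE's initstate may wrap an older RNG interface):

```matlab
function noise = seeded_noise(sz, seed)
% Draw a reproducible noise realization from SEED without
% disturbing the caller's global RNG state.
% (Illustrative helper, not part of ASPIRE.)
    saved = rng;        % save the caller's generator state
    rng(seed);          % reseed locally for reproducibility
    noise = randn(sz);  % the seeded draw
    rng(saved);         % restore the caller's state on exit
end
```

With this pattern, two different functions called with the same seed still produce correlated noise by design, but the caller's own random sequence is never silently restarted.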
Currently, this function contains a lot of FFTs/IFFTs, which slows it down. The function is only called from align_main, where the result is then transformed back into the Fourier domain. Could we speed this up by staying in the Fourier domain throughout?
It would be nice to have a script in the examples folder that runs the whole pipeline. That would make it easier for others to adapt the code, since all the key functions would be in one place.
When comparing the two, the VDM approach does not seem to bring any improvement; in fact, it gives slightly worse performance. For example, running the test_classavg script with and without VDM classification (that is, by setting or unsetting the use_VDM flag), the value of mean(acosd(d_f)) goes from 2.18 to 2.03, while max(acosd(d_f)) goes from 10.4 to 6.79.
Shouldn't we at least observe some improvement from using the VDM classification scheme? Perhaps this is not the correct regime?
If one of the workflow functions crashes midway through, we have to re-run the entire function, which can take time. Also, if we change a subset of the parameters, the entire function has to be run again, even for parts that are unaffected by the change.
It would be nice if the workflow functions could detect what has been calculated already, with what parameters, and in that case skip those calculations, which can save a lot of time in these situations.
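One lightweight way to get this is to key each intermediate result on its input parameters and skip the computation when a matching checkpoint file exists. A rough sketch, where cached_step is a hypothetical helper and the caller supplies the cache key (e.g. a hash of the parameter struct):

```matlab
function result = cached_step(step_fn, params, key, cache_dir)
% Run STEP_FN(PARAMS) only if no checkpoint named KEY exists in
% CACHE_DIR; otherwise load and reuse the saved result.
% (Illustrative sketch, not part of the ASPIRE workflow code.)
    cache_file = fullfile(cache_dir, [key '.mat']);
    if exist(cache_file, 'file')
        s = load(cache_file);
        result = s.result;            % skip recomputation
    else
        result = step_fn(params);
        save(cache_file, 'result');   % checkpoint for later runs
    end
end
```

Each workflow stage would then survive a crash in a later stage, and changing one parameter would only invalidate the checkpoints whose keys depend on it.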
Again, there are bar plots that appear when calling VDM, which in turn calls VDM_LP, where these figures are generated. I suggest suppressing these for the moment and potentially returning these values as outputs that could be plotted by the user if desired.
parpool, GPU
There are some functions (like Initial_classification) that accept images of any size but only work correctly on odd-sized images; for even-sized images, they crash. It would probably be good to have a check (in the workflow functions, perhaps) for even-sized images that informs the user of this particular limitation.
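Until even sizes are supported, the workflow functions could fail early with an informative message; something along these lines (the variable name projs is assumed):

```matlab
% Guard against the known odd-size-only limitation (illustrative):
L = size(projs, 1);   % projs: L x L x n image stack (name assumed)
if mod(L, 2) == 0
    error(['Initial_classification currently supports only ' ...
           'odd-sized images; got %d x %d. Consider cropping or ' ...
           'padding to an odd size.'], L, L);
end
```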
Is there some reason this needs to pop up every time we run the function? If so, shouldn't we at least provide some labels?
Running the script
n = 128;
f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));
qs = qrand(n);
im_clean = cryo_project(vol, qs);
[sPCA_data, ~, ~, recon_spca] = data_sPCA(im_clean, 0);
disp(norm(im_clean(:)-recon_spca(:))^2/norm(im_clean(:))^2);
gives us an MSE of about 0.50 by default, which seems a bit high for just applying a basis projection, since we expect our images to be more or less in this basis.
If we edit data_sPCA
to include the line
R = floor(size(images, 1)/2);
before the call to precomp_fb
, the error drops down to around 0.20. Furthermore, if we also include the line
c = 0.5;
it drops further to around 0.10.
Is this expected behavior? It would seem that the sPCA step shouldn't necessarily distort our images to this extent. In other words, we should really only have noise in the orthogonal complement to the basis, so clean images should be preserved.
Specifically, it doesn't seem to give the same angles as the old Initial_classification
when dealing with flipped/reflected images. To see this, consider the test script
n = 1024;
sigma = 0.5;
n_nbor = 3;
f = load('cleanrib.mat');
vol = f.volref(1:4:end,1:4:end,1:4:end);
vol = vol/max(vol(:));
qs = qrand(n);
im_clean = cryo_project(vol, qs);
im = im_clean+sigma*randn(size(im_clean));
[sPCA_data, ~, ~, recon_spca] = data_sPCA(im, sigma^2);
[class, class_refl, class_rot] = Initial_classification_FD(sPCA_data, n_nbor, false);
fname = tempname;
WriteMRC(im, 1, fname);
im_reader = imagestackReader(fname);
tmpdir = tempname;
mkdir(tmpdir);
[~, ~, im_ave_file, ~] = ...
align_main(im_reader, class_rot, class, class_refl, sPCA_data, n_nbor, 0, 1:n, recon_spca, tmpdir);
delete(fname);
rmdir(tmpdir);
im_ave = ReadMRC(im_ave_file);
delete(im_ave_file);
disp(norm(im(:)-im_clean(:))^2/norm(im_clean(:))^2);
disp(norm(im_ave(:)-im_clean(:))^2/norm(im_clean(:))^2);
Now adding the line
class_rot(class_refl==2) = mod(class_rot(class_refl==2)+180, 360);
just after the call to Initial_classification_FD
brings down the MSE from about 0.65 to 0.50. The difference is even greater in less noisy situations, where, without the angle correction, the MSE of the class averages is actually larger than that of the original images.
Presumably this has something to do with how the sPCA coefficients and/or bispectrum are computed in the new version. That being said, adding the above line to Initial_classification_FD should fix the problem.
Right now, this seems to top out at about one core. Could we speed this up without resorting to a GPU?
When running align_main (through cryo_workflow_classmeans) I am getting the following error:
Reference to non-existent field 'R'.
Error in align_main (line 33)
r_max = FBsPCA_data.R;
Error in cryo_workflow_classmeans_execute (line 53)
[ shifts, corr, unsortedaveragesfname, norm_variance ] = align_main(prewhitened_projs,...
Error in cryo_workflow_classmeans (line 40)
cryo_workflow_classmeans_execute(workflow_fname);
The FBsPCA_data
input that it receives from Initial_classification
is no longer of the right format.
I'm guessing this has to do with Initial_classification
being replaced by Initial_classification_FD
, which no longer generates this structure but gets it from data_sPCA
. Should we fix Initial_classification
to generate this structure itself?
If we don't have any external NUFFT libraries installed, we can at least use the MEX implementations that are already in the package.
Specifically, this function is called in reshift_test1
and reshift_test2
but is nowhere to be found. Was it perhaps present in some earlier version but removed?
A number of functions depend on the output of the VDM function, so the script crashes when VDM is not used. It can be fixed by assigning the outputs of the Initial_classification_FD function to the VDM output variables.
Many of the test_ scripts for the various functions take a long time to complete, making them unsuitable for unit tests. It would perhaps be better to separate "software" unit tests that verify the workings of the functions on small examples from "math" tests that check results at large sizes, low/high SNR, etc.
In the same way as the NUFFT packages, we can have a script that downloads and installs SDPLR instead of including it in the package.
Not really an issue, but are we going to keep it in the new release?
Specifically, cryo_workflow_abinitio_execute calls cryo_assign_orientations_to_raw_projections, which is in development/projections. There are a few options: one is to add development/projections to initpath (or create a different script, initpath_dev), although this will make us unlikely to catch these problems.

Since align_main no longer takes image arrays as input, the example scripts test_ClassAvg1 and test_ClassAvg2 no longer work. This was introduced in 2d9a5b5.
Most likely the way to solve this is to introduce an imagestackReader
subclass that takes an in-memory stack of images and serves those.
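A sketch of such a class, serving images from memory while mimicking the reader interface (the method names here are guesses and would need to match whatever align_main actually calls on imagestackReader):

```matlab
classdef imagestackMemReader
% Serve an in-memory L x L x N image stack through an
% imagestackReader-like interface. (Sketch only; the real
% imagestackReader method names and signatures may differ.)
    properties
        stack   % the in-memory image stack
    end
    methods
        function obj = imagestackMemReader(stack)
            obj.stack = stack;
        end
        function im = getImage(obj, idx)
            im = obj.stack(:, :, idx);    % serve requested slices
        end
        function N = num_images(obj)
            N = size(obj.stack, 3);
        end
    end
end
```

The old test scripts could then wrap their in-memory arrays in this class instead of writing temporary MRC files.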
For the new code. The new code is already batch-wise, so this is probably not high priority right now.
Not sure what is happening here, but data_sPCA calls this function with only U{2} and Coeff{2} non-empty, so it crashes.
Right now, a lot of it ends up in projections/class_average/simulation, which is not great if you're restricted in terms of space or are on a network drive. Could we have some folder where we put all the generated datasets? Then users can symlink this to wherever they want to store it (like the /scratch directory that we have at PACM).
Since this has some well-written NUFFT functions, we should perhaps include this in the wrapper functions.
There are many levels of subfolders, which gets a bit messy. Is it necessary to have more than one or two levels? I'm talking mainly about
projections/class_average/simulation
reconstruction/FIRM/FIRM
et cetera.
I'm guessing this comes from different contributors copying in their code (with its directory structure) into one folder of the toolbox. Maybe there's some room for reorganization/flattening?
This is due to gpuDeviceCount
not being defined in Octave since it's part of the parallel computing toolbox. Presumably, older versions of MATLAB and those without the parallel computing toolbox would also error on this.
The way it is right now, tempmrcdir
gives the same directory each time it is called. This becomes an issue if we run several processes simultaneously, since they all expect this directory to be empty and used by that process exclusively. For example, if we attempt to run align_main
twice giving the directory returned by tempmrcdir
as the temporary directory, it crashes.
Either this should generate a new directory each time it's called, like tempname
, or it should generate a name that somehow incorporates the process ID. In the latter case, however, this doesn't solve the use case of running something like align_main
twice for different data in the same process.
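The first option is essentially a one-line change: derive the directory from tempname, so every call creates a fresh, empty directory. A possible replacement body (sketch):

```matlab
function dname = tempmrcdir()
% Return a newly created, unique temporary directory. Because the
% name comes from tempname, concurrent processes and repeated calls
% within one process each get their own empty directory.
    dname = tempname;   % unique path on every call
    mkdir(dname);
end
```

Callers would then be responsible for removing the directory when they are done with it, as the align_main test script above already does with rmdir.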
Due to Octave bug #33523, which does not allow taking means over non-existent dimensions. This was fixed in Octave 4.2.0, but on previous versions it crashes.
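A version-independent workaround is to treat a mean over a dimension beyond ndims as the identity, since that dimension is a singleton; a small shim (the name mean_safe is made up):

```matlab
function m = mean_safe(x, dim)
% Like mean(x, dim), but works around Octave bug #33523 (pre-4.2.0),
% where averaging over a non-existent dimension errors out. Any
% dimension beyond ndims(x) is a singleton, so the mean is x itself.
    if dim > ndims(x)
        m = x;
    else
        m = mean(x, dim);
    end
end
```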
Remove duplicate implementations of utility functions like mask_fuzzy etc. Check aLibs.
Automatic installation of NFFT needed. Currently done manually.
Remove debugging comments; add input/output descriptions.
The way that bispec_Operator_1
is written, it will not yield any bispectrum components if only frequencies 1 and 2 are present. Why is this? We should be able to get the component k1=1, k2=1 and k1+k2=2 in this case, so why does the code not compute it?
When calling it through cryo_gen_projections, for example via
cryo_gen_projections(64,1,1);
it crashes due to large imaginary components. This worked a while ago, so some of the changes that we've introduced must have broken it.
Right now, if there is too much noise in the images (that is, big enough that no good PCA components are extracted), the user is presented with some puzzling errors when trying to run the class averaging. We should handle this more gracefully, either by issuing some error, or (better?) by falling back on the original (non-averaged) images. This becomes especially relevant once we want to show plots of the performance for a range of SNRs where we want a transition into the high-noise regime that makes sense.
If there's no phase flipping, no downsampling, no cropping, etc. the preprocessing workflow will delete the input MRCS file.
In many cases, we can gain a significant speedup by fixing an NUFFT plan and executing it several times. Right now, the wrappers do not allow for this, with only limited recycling of plans.
We should separate the planning and execution, which makes a difference for the Chemnitz NFFT package, while allowing the standard nufft* functions to work as before for convenience.
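The separated interface could look something like this (all names here are hypothetical, chosen only to illustrate the plan/execute split):

```matlab
% Hypothetical planned NUFFT interface (names are illustrative):
plan = nufft_plan(sz, fourier_pts);        % precompute once

for k = 1:K
    % Reuse the same plan for every volume; only the data changes.
    f(:, k) = nufft_exec(plan, vols(:, :, :, k));
end

nufft_finish(plan);                        % release plan resources
```

The existing one-shot nufft* wrappers could then be kept as conveniences that internally do plan + exec + finish, so current callers keep working unchanged.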
We want to prevent, for example, the frequencies being accidentally transposed in the input.
Currently we have ALL of these!
Right now it traverses the directory tree and runs all the makemex scripts it can find, but none are left. It would be better to have this just run the install scripts that we have.
This is due to an Octave bug in the union function when given row inputs with the 'rows' flag. The bug has been fixed in version 4.2.0, but on earlier versions it doesn't work correctly. I suggest implementing a workaround for now, since the newer version is not yet widely distributed.
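Assuming unique(..., 'rows') is unaffected by the bug, the workaround can be a small shim used in place of union(..., 'rows') (the name union_rows is made up):

```matlab
function u = union_rows(a, b)
% Workaround for the Octave union(..., 'rows') bug on row inputs
% (fixed in Octave 4.2.0): deduplicate the concatenated rows with
% unique, which returns them sorted, matching union's behavior.
    u = unique([a; b], 'rows');
end
```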
Agree on the style, come up with a template, and document everything else accordingly. (Preferably not exceeding 80 characters per line.)