ska-sciencedataprocessor / fastimaging

Prototyping for fast imaging pipeline

License: Apache License 2.0

CMake 4.81% C++ 92.62% C 0.25% Shell 1.67% Ruby 0.16% Python 0.48%

fastimaging's Issues

Development plan

Current:

Issue #15: Pipeline / utility executables
PR #13: Review and merge setup scripts for Vagrant development VM.
Issue #12: C++/Python wrapping (PyBind11 or Boost::Python?)
Refactor of 'oversampling' parameter handling (blocked by SKA-ScienceDataProcessor/FastImaging-Python#1)
Scripts and accompanying documentation explaining how to run the 'reduce' script under profiling / benchmarking.
Documented results (preferably, scripts to generate and plot results) on scaling behaviour (to be discussed with Anna)

Old

Configuration options e.g. UV grid size (Use Boost program_options?)

We create a pixel-array corresponding to a grid in UV space, for sampling the visibilities.
Generally it's easiest to ask the user to specify the corresponding grid in the image plane, with parameters cellsize (an angle, in units of arc) and Ncells. This then corresponds to an image plane subtending an angle image_width = Ncells*cellsize. A sensible choice of uv-cellsize (in multiples of lambda) is then uv_cellsize = 1/image_width = 1/(Ncells*cellsize), with image_width expressed in radians.

We should provide parameters for specifying Ncells and cellsize. Boost program_options might be a good library for specifying the command line interface, since we can optionally use a config file rather than specifying everything on the command line.
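As a rough illustration of the relation above, here is a minimal sketch. The function name `uv_cellsize_lambda` is hypothetical, and it assumes the user supplies cellsize in arcseconds, which is converted to radians before inverting:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the existing codebase): derive the UV-grid
// cell size, in wavelengths, from the image-plane parameters.
// Assumption: cellsize is supplied in arcseconds.
double uv_cellsize_lambda(int n_cells, double cellsize_arcsec)
{
    assert(n_cells > 0 && cellsize_arcsec > 0.0);
    const double pi = std::acos(-1.0);
    const double arcsec_to_rad = pi / (180.0 * 3600.0);
    // Angle subtended by the whole image, in radians.
    const double image_width_rad = n_cells * cellsize_arcsec * arcsec_to_rad;
    return 1.0 / image_width_rad;
}
```

For example, `uv_cellsize_lambda(1024, 1.0)` gives roughly 201 wavelengths per UV cell. Whichever option-parsing library is chosen, it would only need to deliver the two inputs.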

As of 26th Sept this is handled in the Python reference code here

Simple imaging script (gridding + FFT)

I've just added a simple imaging script to the Python repository, here:
https://github.com/SKA-ScienceDataProcessor/FastImaging-Python/blob/master/src/fastimgproto/scripts/simple_imager.py

This gives a simple end-to-end example of loading in visibility data, gridding, applying FT to image space and then writing to file - we should try to mirror this with a C++ implementation to check we get consistent results.

(The FFT calls are here: https://github.com/SKA-ScienceDataProcessor/FastImaging-Python/blob/master/src/fastimgproto/imager.py)

Integrate cnpy into build, convert to/from Armadillo arrays, test.

For initial development we should probably keep data input/output format as simple (and off-the-shelf) as possible. I suggest we use Numpy binary-array format, since this has a basic level of functionality (N-dimensional arrays, ints / floats / doubles, etc) and should give trivial inter-operability with Python (while being significantly more efficient than savetxt ascii format). For now, any additional metadata such as pointing-centre, etc will be handled separately (as part of program arguments / configuration files, etc).

cnpy provides a ready-made library for loading Numpy binary arrays.

I think we probably want a pair of functions to load / save Armadillo matrices from / to .npy format, thus hiding the details of type-casting, etc from the calling code.

We should probably add the cnpy repository as a submodule, as we already do for the google-benchmark library.

Tests

Load an MxN numpy binary array of zero-valued doubles to Arma Matrix format. Alter the values in the four corners, and re-save to Numpy format. Check that the index-convention behaves as expected (same / different to Numpy?).
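One thing this test is likely to surface: NumPy stores arrays in row-major (C) order by default, whereas Armadillo matrices are column-major. The sketch below uses only the standard library to show the two index conventions and a reordering between them. The helper names are hypothetical, and it does not show how cnpy or Armadillo behave; that is what the test should establish.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (row, col) in a row-major buffer (NumPy's default).
std::size_t row_major_index(std::size_t row, std::size_t col, std::size_t n_cols)
{
    return row * n_cols + col;
}

// Flat offset of element (row, col) in a column-major buffer (Armadillo's layout).
std::size_t col_major_index(std::size_t row, std::size_t col, std::size_t n_rows)
{
    return col * n_rows + row;
}

// Copy a row-major buffer into column-major order, preserving (row, col) meaning.
std::vector<double> row_to_col_major(const std::vector<double>& src,
                                     std::size_t n_rows, std::size_t n_cols)
{
    assert(src.size() == n_rows * n_cols);
    std::vector<double> dst(src.size());
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t c = 0; c < n_cols; ++c) {
            dst[col_major_index(r, c, n_rows)] = src[row_major_index(r, c, n_cols)];
        }
    }
    return dst;
}
```

Marking the four corners with distinct values, as suggested in the test, makes any accidental transpose obvious after a round trip.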

Stylistic issue: care with ``using namespace``

This is low priority, but I wanted to bring up use of the using namespace directive, as here:

FastImaging/src/libstp/convolution/conv_func.h

Line 15 in 2bf7f0b

using namespace arma;

At the risk of very long function definitions, I would suggest we restrict use of using namespace directives to implementation (.cpp) files, and always use fully qualified names in the header files. The reasoning behind this is that otherwise you are effectively throwing the namespace away and injecting the full contents of the 'arma' and 'std' namespaces into the global namespace of any client code which uses the stp library, which potentially causes confusion and/or name collisions (e.g. what if I'm using another library with a mat class? Or, where do I go to look up the documentation on the accu function?).
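To make the suggestion concrete, here is a small sketch of the split between header and implementation. The file names and the `stp::mean` function are hypothetical, and `std` stands in for `arma` so that the example compiles without Armadillo:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// ---- mean.h (hypothetical header): fully qualified names only ----
namespace stp {
double mean(const std::vector<double>& values);
}

// ---- mean.cpp (hypothetical implementation file) ----
namespace stp {
double mean(const std::vector<double>& values)
{
    // A using-declaration here is local to this function body and never
    // reaches code that includes the header.
    using std::accumulate;
    assert(!values.empty());
    return accumulate(values.begin(), values.end(), 0.0) / values.size();
}
}

// ---- client code: free to define its own 'accu' without any clash ----
double accu(const std::vector<double>& values)
{
    return stp::mean(values) * values.size();
}
```

The cost is a few extra `arma::` prefixes in the headers. In return, client code that defines its own `mat` or `accu` is unaffected by including them.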

Minor syntax / style issues

Namespacing
Library classes and functions should probably be declared inside a namespace. How about 'stp', or is that too similar to the standard library ('std')?
Exception specifications are deprecated
A very minor issue (there are only two usages currently), but some deprecated syntax has crept in:

FastImaging/src/stp-runner/stp_runner.h

Line 142 in e0f8de8

void init_logger() throw(TCLAP::ArgException);

http://stackoverflow.com/questions/13841559/deprecated-throw-list-in-c11

(I only noticed because I saw it while reading through and thought 'that looks weird!').
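For reference, dynamic exception specifications were deprecated in C++11 and removed in C++17. A minimal sketch of the usual replacement follows. `ArgException` here is a stand-in for `TCLAP::ArgException` so that the example builds without TCLAP, and the `level` parameter is purely illustrative:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for TCLAP::ArgException (assumption, for illustration only).
struct ArgException : std::runtime_error {
    explicit ArgException(const std::string& msg) : std::runtime_error(msg) {}
};

// Deprecated form:
//     void init_logger() throw(ArgException);
// Replacement: drop the specification and document the exception instead.

/// Throws ArgException if the log level is not recognised.
void init_logger(const std::string& level)
{
    if (level != "debug" && level != "info") {
        throw ArgException("unknown log level: " + level);
    }
}

// Functions that genuinely never throw can state that with noexcept.
bool is_verbose(int level) noexcept
{
    return level > 0;
}
```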
Line wraps
Not sure about the best way to handle this one, but there are some very long lines in the codebase (e.g. 240 chars long) - do Critical generally just use (or recommend) soft-wrapping in the editor to make these readable?

Convolution functions

Implement basic radial / 1d functions r -> f(r), for:

Tophat (aka 'pillbox'): f(r) = 1 if r < r_thresh, otherwise f(r) = 0. (This is terrible in real imaging practice but makes a convenient test-function for verifying convolution procedures, etc)
Sinc function: f(r) = sin(r)/(w*r) where w is a width parameter (usually 1.0).
Gaussian sinc: f(r) = sin(pi/alpha_1*(r+eps))/(pi*(r+eps)) * exp(-(r/alpha_2)**2), where
alpha_1 = 1.55, alpha_2 = 2.52, and eps = 1e-10 to avoid div by zero.

(For reference see e.g. Chapter 7 of Taylor 1998, Synthesis imaging in radio astronomy II).

(Prolate spheroidal to be tackled later).

NB all functions are truncated by the finite pixel-width of the convolution area.

Note that we will want to use these functions interchangeably as 'drop-ins' for the convolution routine. For an initial implementation they can simply be functions, so they must all have the same function call signature. Any variable parameters (e.g. r_thresh for the tophat) can be hard-coded. In the long run we might consider switching to a more sophisticated implementation that allows for initialization with varying parameters (a callable class).
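A minimal sketch of what the shared call signature could look like, transcribing the three formulas exactly as written above. All names are hypothetical, and the hard-coded r_thresh = 3.0 is an arbitrary illustrative value. `std::fabs` is used so the functions also accept signed 1-d offsets, which goes slightly beyond the purely radial definition.

```cpp
#include <cmath>
#include <functional>

// Common signature r -> f(r), so the functions can be swapped freely.
using ConvFunc = std::function<double(double)>;

double tophat_kernel(double r)
{
    const double r_thresh = 3.0;  // hard-coded for now (illustrative value)
    return std::fabs(r) < r_thresh ? 1.0 : 0.0;
}

double sinc_kernel(double r)
{
    const double w = 1.0;              // width parameter
    if (r == 0.0) return 1.0 / w;      // limit of sin(r)/(w*r) as r -> 0
    return std::sin(r) / (w * r);
}

double gaussian_sinc_kernel(double r)
{
    const double pi = std::acos(-1.0);
    const double alpha_1 = 1.55, alpha_2 = 2.52, eps = 1e-10;
    return std::sin(pi / alpha_1 * (r + eps)) / (pi * (r + eps))
           * std::exp(-std::pow(r / alpha_2, 2));
}

// Longer-term option: a callable class that carries its own parameters.
class Tophat {
public:
    explicit Tophat(double r_thresh) : r_thresh_(r_thresh) {}
    double operator()(double r) const { return std::fabs(r) < r_thresh_ ? 1.0 : 0.0; }
private:
    double r_thresh_;
};

// Any consumer written against ConvFunc accepts both forms unchanged.
double sample(const ConvFunc& f, double r)
{
    return f(r);
}
```

Both `sample(tophat_kernel, 2.9)` and `sample(Tophat(1.0), 0.5)` go through the same interface, so moving from plain functions to callable classes later should not require changes to the convolution routine.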

Python implementation (work-in-progress) can be found here, with unit-tests here.

Debian "Testing" how to

Installation steps

Download the full DVD ISO (3.7 GB) from here
Download the non-free firmware (e.g. Intel WLAN drivers) from here
Unzip the firmware on to a USB stick, in a directory named "firmware". More about this here
Plug in both USB sticks (or any other media) and boot the machine
Follow the installation process

Optional

After first boot, replace /etc/apt/sources.list with the following: sources.list.zip

Notes

Using the netinst ISO is discouraged as it may lack needed drivers for the hardware.
It's possible to create a custom installer following these instructions.
To be able to use sudo, be sure not to provide a root password during the installation. That way the root account will be disabled.

Packages to install

build-essential
cmake
git
cppcheck
valgrind
libarmadillo-dev
libspdlog-dev
libtclap-dev

2-d kernel generation

We need to be able to generate a 2-d grid of regular samples of the convolution functions.

The 2-d convolution function is typically separable, i.e. func2d(x,y) = func1d(x)*func1d(y)

We should be able to generate kernels at differing sub-pixel offsets, and with optional oversampling.
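A sketch of one way to do this, assuming a separable kernel built as the outer product of two 1-d coefficient vectors. The function name, parameter names, array-size convention (2*support*oversampling + 1) and the sign convention for the offset are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical helper: sample func2d(x, y) = func1d(x) * func1d(y) on a
// regular square grid of (2*support*oversampling + 1) points per side.
//   support            : kernel half-width in (non-oversampled) pixels
//   oversampling       : sub-samples per pixel (1 means none)
//   x_offset, y_offset : sub-pixel shift of the kernel centre, in pixels
// Returns kernel[y][x].
std::vector<std::vector<double>> make_kernel_2d(
    const std::function<double(double)>& func1d,
    int support, int oversampling, double x_offset, double y_offset)
{
    assert(support >= 1 && oversampling >= 1);
    const int size = 2 * support * oversampling + 1;
    const int centre = support * oversampling;

    // 1-d coefficients along each axis, measured from the shifted centre.
    std::vector<double> x_coeffs(size), y_coeffs(size);
    for (int i = 0; i < size; ++i) {
        const double distance = static_cast<double>(i - centre) / oversampling;
        x_coeffs[i] = func1d(distance - x_offset);
        y_coeffs[i] = func1d(distance - y_offset);
    }

    // Outer product gives the separable 2-d kernel.
    std::vector<std::vector<double>> kernel(size, std::vector<double>(size));
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            kernel[y][x] = y_coeffs[y] * x_coeffs[x];
        }
    }
    return kernel;
}
```

With support = 1 and oversampling = 2 this yields a 5x5 array sampled at half-pixel steps. Whether the kernel should also be normalised is left open here.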

Exact convolutional gridder

For validation and testing purposes, it would be useful to implement an exact convolutional gridder.
The algorithm is as follows:

For each UV sample (NB, with co-ordinates at non-integer positions in UV space):
- Find all grid-sample positions within convolution_radius (typically an integer multiple of the uv pixel width).
- For each grid-sample:
  - Calculate the convolution co-efficient (c) for this radius, i.e.
    
    c = conv_kernel_function( distance(uv_position, grid_sample_position))
  - Increment the grid value according to the convolution, i.e.:
    
    grid_sample_value += c * uv_sample_value

The gridder should take the kernel function as a parameter (i.e. one of those functions implemented in issue #3).
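The steps above could be sketched roughly as follows. The types and names are hypothetical (this is not the stp data model), and the sketch assumes UV positions are already expressed in grid-pixel units, with grid point (iu, iv) sitting at UV position (iu, iv):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <functional>
#include <vector>

// Hypothetical types, for illustration only.
struct UvSample {
    double u;                  // position in grid-pixel units (may be non-integer)
    double v;
    std::complex<double> vis;  // visibility value
};

using Grid = std::vector<std::vector<std::complex<double>>>;  // grid[iv][iu]

void exact_grid(const std::vector<UvSample>& samples,
                const std::function<double(double)>& conv_kernel_function,
                double convolution_radius,
                Grid& grid)
{
    assert(convolution_radius > 0.0);
    if (grid.empty()) return;
    const int n_v = static_cast<int>(grid.size());
    const int n_u = static_cast<int>(grid[0].size());

    for (const UvSample& s : samples) {
        // Bounding box of candidate grid points, clipped to the grid edges.
        const int u_lo = std::max(0, static_cast<int>(std::ceil(s.u - convolution_radius)));
        const int u_hi = std::min(n_u - 1, static_cast<int>(std::floor(s.u + convolution_radius)));
        const int v_lo = std::max(0, static_cast<int>(std::ceil(s.v - convolution_radius)));
        const int v_hi = std::min(n_v - 1, static_cast<int>(std::floor(s.v + convolution_radius)));

        for (int iv = v_lo; iv <= v_hi; ++iv) {
            for (int iu = u_lo; iu <= u_hi; ++iu) {
                const double distance = std::hypot(iu - s.u, iv - s.v);
                if (distance > convolution_radius) continue;  // outside the circle
                const double c = conv_kernel_function(distance);
                grid[iv][iu] += c * s.vis;
            }
        }
    }
}
```

Because the kernel is evaluated at the true distance for every sample, this is slow, but it avoids any kernel-sampling approximation and so makes a useful reference for checking faster gridders.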

Refine Data Models: expected impact on the prototype

Hi Tim,

I've noticed the recent activity on the topic Refine Data Models for PIP.IMG in which you participate actively. After skimming through the page it becomes clear that any outcomes of the debate will impact the prototype.
In your expert opinion, how likely is the refined version to differ substantially from the current reference implementation? Will it be a complete overhaul, or just an adjustment of the data models that keeps the flow and interfaces?

Should the expected impact be large, I'll need to take action as soon as possible in order to prepare the project for a smooth adjustment later on when changes start to appear.

Error when compiling

FastImaging-master/src/stp/imager/imager.h:112:87: error: cannot convert ‘std::complex’ to ‘double’ for argument ‘3’ to ‘void cblas_zdscal(blasint, double, double*, blasint)’

and other similar errors in imager.h

Use a git-submodule for lib-armadillo source

As discussed briefly via Skype, I suggest we use a git submodule to supply the armadillo source. (I've already uploaded a copy of the 7.5 tarball as a new repository). I have a few reasons for this:

Reliability and stability: Armadillo is hosted via Sourceforge, which has had some issues with distributing malware in the past. While I don't think this affects source-tarball downloads, I'm disinclined to rely on Sourceforge as our sole point of download.
Convenience: We're using the latest build of Armadillo (version 7) which is more recent than the system apt-get package on e.g. Ubuntu 16.04. So, we can expect users to need to download and build the code as part of running this library. Might as well provide it as part of the repository.
Versioning: The Readme currently suggests using Armadillo 7.4, but the Armadillo website states that 7.5 is the latest version. So which should I use? A git submodule with a tested tag removes any ambiguity.
Reproducible build options: It seems quite likely that how the armadillo dependency has been built could be important for performance benchmarks - ideally we should integrate the armadillo build into our CMake build scripts, so that the precise build options are documented and can be reliably reproduced.

Purely for convenience, it may be worth pulling in libgtest as a submodule, too.

Pipeline / utility executables

We would like a C++ equivalent to the 'reduce' Python script, so we can easily run the C++ code on test-datasets and profile CPU / RAM usage (link to reduce.py).

Less important, but also useful would be equivalent scripts for making a basic image, and for independently running the sourcefinder, similar to image.py and sourcefind.py.

C++/Python bindings

Python bindings to some of the C++ routines are desirable.

We agreed that I would define the desired interfaces and Critical will implement them accordingly. The first definition is here:
https://github.com/SKA-ScienceDataProcessor/FastImaging-Python/blob/master/src/fastimgproto/bindings/imager.py#L98
(_cpp_image_visibilities). It is accompanied by an equivalent Python function (_python_image_visibilities) for reference.

I'll also define a suitable sourcefinder interface when that's implemented in the C++ codebase.