kikuchipy's Issues

Add logging

We should use logging to track what our code does. Some example usages from HyperSpy:

    def normalize_poissonian_noise(self, navigation_mask=None,
                                   signal_mask=None):
        [...]        
        _logger.info(
            "Scaling the data to normalize the (presumably)"
            " Poissonian noise")
def file_reader(filename, endianess='<', mmap_mode=None,
                lazy=False, **kwds):
    _logger.debug("Reading blockfile: %s" % filename)
    [...]
    try:
        header['Note'] = note.decode("latin1").strip("\x00")
    except BaseException:
        # Not sure about the encoding so, if it fails, we carry on
        _logger.warning(
            "Reading the Note metadata of this file failed. "
            "You can help improving "
            "HyperSpy by reporting the issue in "
            "https://github.com/hyperspy/hyperspy")
    _logger.debug("File header: " + str(header))

I think this tutorial is a good place to start to both understand logging and use it correctly: https://realpython.com/python-logging/.
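
A minimal sketch of how a module-level logger could be set up in our modules, following HyperSpy and the tutorial above (the function is just a placeholder):

import logging

_logger = logging.getLogger(__name__)


def some_processing_step(signal):
    # INFO for messages users should see, DEBUG for developer detail
    _logger.info("Starting processing of %s", signal)
    _logger.debug("Signal data shape: %s", signal.data.shape)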

Grain boundary map from virtual forward scattered detector (VFSD) images

After discussion with Jarle Hjelen, related to #12.

The pattern intensity on boundaries is usually higher than within grains because of pattern overlap. We could use this to get a binary grain boundary map by:

  1. Binning the signal axes and acquiring VFSD images from all bins
  2. Thresholding the images and binarizing them
  3. Summing the binarized images

Some thoughts/caveats:

  • Thresholding a multi-phase data set might not be straightforward?
  • Bin size should be carefully selected
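
A rough sketch of steps 1-3 with NumPy and scikit-image, assuming a 4D pattern array; the bin size and the Otsu thresholding are placeholder choices, not a worked-out method:

import numpy as np
from skimage.filters import threshold_otsu

# Hypothetical pattern array of shape (ny, nx, sy, sx)
patterns = np.random.randint(0, 255, (50, 50, 60, 60), dtype=np.uint8)
bin_size = 20  # detector pixels per VFSD bin; should be chosen carefully

ny, nx, sy, sx = patterns.shape
boundary_map = np.zeros((ny, nx), dtype=int)
for i in range(0, sy, bin_size):
    for j in range(0, sx, bin_size):
        # 1. VFSD image from one detector bin
        vfsd = patterns[:, :, i:i + bin_size, j:j + bin_size].sum(axis=(2, 3))
        # 2. Threshold and binarise (Otsu as an example; may not suit multi-phase data)
        binary = vfsd > threshold_otsu(vfsd)
        # 3. Sum the binarised images
        boundary_map += binary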

Loading .hdf5 files with `kikuchipy.load()` raises error because no default signal is available

Describe the bug
Calling s = kp.load('/path/to/Pattern.hdf5') raises:

Traceback (most recent call last):
[...]
    s = kp.load(data)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 159, in load
    for filename in filenames]
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 159, in <listcomp>
    for filename in filenames]
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 194, in load_single_file
    return load_with_reader(filename=filename, reader=reader, **kwargs)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 231, in load_with_reader
    objects.append(dict2signal(signal_dict, lazy=lazy, axis=axis))
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 294, in dict2signal
    lazy=lazy)(**signal_dict)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io.py", line 382, in assign_signal_subclass
    and s._signal_type == ''][0]
IndexError: list index out of range

This happens since none of our classes has a blank _signal_type attribute, so the list we try to get index 0 of is empty. Because the .hdf5 file has no metadata explaining whether it is a RadonTransform or an EBSD data set, and both of these classes have the same _signal_dimension and _signal_type, we cannot distinguish between them in assign_signal_subclass().

To Reproduce

import kikuchipy as kp
kp.load('/path/to/Pattern.hdf5')  # With no metadata

Suggested solution

  • Quickest: Pass signal_type to load() like so: kp.load('/path/to/Pattern.hdf5', signal_type='electron_backscatter_diffraction') (or signal_type='EBSD', as EBSD will be available in _alias_signal_types when I push the hotfix commit for this problem). Or signal_type='RadonTransform' if applicable.
  • Alternative: Set signal_type='EBSD' as the default when loading .hdf5 files without the necessary metadata. In HyperSpy, the default is Signal1D or Signal2D.

Additional context
This is not a problem when loading from a .hdf5 or .hspy file whose metadata contains the signal_type parameter.

Thin or bin data set along navigation axes (fewer patterns but same ROI)

Is your feature request related to a problem? Please describe.
A convenience method to thin the data along the navigation axes, i.e. removing for example every other pattern along x and y, would be nice to have when a data set is large.

Describe the solution you'd like
A method, called perhaps thin() or trim(), should accept parameters xstep and ystep stating which patterns to remove (e.g. xstep=2 removes every other pattern in the x-direction). The distance between the remaining patterns must be the same in each direction. Step sizes in axes_manager must of course be updated.

Describe alternatives you've considered
This should work

import numpy as np
original = np.arange(25).reshape(5, 5)
xstep = 2
ystep = 2
thinned = original[::ystep, ::xstep]

I am not sure what the most cost-effective approach is in terms of memory and speed.
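
Note that HyperSpy's navigation-axis slicing may already do most of this; a sketch, though I have not checked whether the step sizes in axes_manager are updated automatically or must be set manually afterwards:

import hyperspy.api as hs
import numpy as np

s = hs.signals.Signal2D(np.arange(2500).reshape(5, 5, 10, 10))
xstep, ystep = 2, 2
s_thin = s.inav[::xstep, ::ystep]  # keep every other pattern along x and y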

remove_background gives unexpected errors

I noticed some new errors when loading an hdf5 dataset with hs and then converting it to a kp.EBSD signal. I also removed dead pixels before proceeding to the background removal, and did not do it in place, so the error might lie there too.

s = hs.load('Pattern.hdf5', lazy = False)
s = kp.EBSD(s)
deadpixels = s.find_deadpixels(mask=mask, pattern_number=30, to_plot=True)
s_rd = s.remove_deadpixels(deadpixels=deadpixels, inplace=False)

For completeness, this is what the metadata looks like:

s_rd.metadata
Out[11]:
├── General
│   └── title =
└── Signal
    ├── binned = False
    └── signal_type = electron_backscatter_diffraction

s_rd.original_metadata
Out[12]:
└── Acquisition_instrument
    └── SEM
        └── Detector
            └── EBSD
                ├── deadpixels = array([[  6, 156],
                │       [ 28,   4],
                │       [ 28, 150],
                │       [ 38,  20],
                │       ... , 130],
                │       [180, 107],
                │       [193,  26],
                │       [222,  91]], dtype=int64)
                ├── deadpixels_corrected = True
                └── deadvalue = average

    1. Error due to the background image search, if the background file path is not supplied and the signal has not been loaded with kikuchipy, so that the metadata has not been set correctly:

s_rd.remove_background(static=True, dynamic=True)

line 196, in remove_background
bg = os.path.join(omd.General.original_filepath, bg_fname)
AttributeError: 'DictionaryTreeBrowser' object has no attribute 'General'

    2. Error due to the deadpixels check:

s_rd.remove_background(static=True, dynamic=True, bg='Background acquisition pattern.bmp')

line 207, in remove_background
if omd.deadpixels_corrected and omd.deadpixels and omd.deadvalue:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
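
For the second error, one possible fix (just a suggestion, untested) is to check the deadpixels array's size instead of its truth value:

# Suggested change to the check quoted above (line 207 in remove_background)
if omd.deadpixels_corrected and omd.deadpixels.size and omd.deadvalue:
    ...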

Static background correction with operation='divide' clips intensities

Description

Static background correction with operation='divide' clips intensities since patterns are cast to int16 instead of float16.

Way to reproduce

>>> import kikuchipy as kp
>>> import numpy as np
>>> s = kp.load('some/data/with/background_pattern')
>>> s.static_background_correction(operation='divide')
>>> s.plot()  # Rubbish, clipped intensities

Version information

Not relevant.

Expected behaviour

We should of course cast to float16 or float32 when dividing. All use of data types should be handled more generally, allowing the user to input patterns in whatever data type they want. The simplest solution would be to cast to float64; however, this might give memory problems unless Dask is used properly.
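
A minimal sketch of the intended behaviour with plain NumPy; the dtype and the rescaling back to uint8 are just example choices, not the actual kikuchipy implementation:

import numpy as np

pattern = np.random.randint(0, 255, (60, 60), dtype=np.uint8)
static_bg = np.random.randint(1, 255, (60, 60), dtype=np.uint8)

# Divide in floating point to avoid clipping/overflow...
corrected = pattern.astype(np.float32) / static_bg.astype(np.float32)

# ...then rescale back to the original data type range
corrected -= corrected.min()
corrected = (corrected / corrected.max() * 255).astype(np.uint8)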

Start using pyEMsoft

Is your feature request related to a problem? Please describe.
EMsoft's wealth of utility functions in Fortran-90 is available via Python wrappers. See https://github.com/EMsoft-org/EMsoft/tree/develop/Source/pyEMsoft and https://github.com/EMsoft-org/EMsoft/wiki/python-examples. We should start to take advantage of this.

This issue is opened to facilitate a discussion of (or just my thoughts on) pyEMsoft and how we can use it.

Describe the solution you'd like
We could already start using the routines available in crystal.f90 (https://github.com/EMsoft-org/EMsoft/blob/develop/Source/pyEMsoft/docs/pyEMsoft.rst) for creating phases and including them in the metadata.

Describe alternatives you've considered
It might be beneficial to rely on pyEMsoft instead of diffpy.structure (as pyXem does) or pymatgen for crystal structures and phases, since users then only need to be familiar with one way of doing things.

Additional context
Another benefit of starting to use pyEMsoft now is that it will hopefully be easier to use other parts of EMsoft's functionality when they become available via Python wrappers.

Write dask arrays to NORDIF binary file

dask.array does not have a tofile() function as NumPy does, so calling

for pattern in signal._iterate_signal():
    pattern.flatten().tofile(f)

as is done in file_writer() in nordif.py does not work. Calling compute() on each pattern before calling tofile() is not feasible.

A NORDIF .dat file is one long byte string, so HyperSpy's block_iterator() can only be used if the chunks comply with this requirement (each chunk must contain full horizontal rows of patterns; however, many rows can be in the same chunk [I think]).

It might be that dask.array.store can write to a binary file? Will look into this.
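
One possible route (an untested sketch) is to let dask.array.store write blocks into a NumPy memmap of the output file; whether this plays well with the NORDIF byte layout and the chunking requirement above needs checking:

import dask.array as da
import numpy as np

# Hypothetical lazy pattern data of shape (ny, nx, sy, sx), uint8 as in NORDIF .dat files
data = da.zeros((3, 3, 60, 60), dtype=np.uint8, chunks=(1, 3, 60, 60))

flat = data.reshape(-1)  # the .dat file is one long byte string
out = np.memmap('Pattern.dat', dtype=np.uint8, mode='w+', shape=flat.shape)
da.store(flat, out)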

Lazy functionality

Problem: When calling ElectronBackscatterDiffraction(s) on a LazySignal loaded with hs.load(file, lazy=True), the signal is cast to Signal2D and thus written to memory.

pyXem has its own load function, will look into loading signals as they do.

Methods to calculate pattern centre following approach by Hjelen et al. (moving screen)

Is your feature request related to a problem? Please describe.
Dictionary indexing with EMsoft requires knowledge of an average pattern centre (xpc, ypc, zpc) for the data set (see e.g. https://link.springer.com/article/10.1007/s40192-019-00137-4). If two patterns with the same Kikuchi bands are acquired with a known increase in the specimen-scintillator distance between the two, (xpc, ypc) can be determined following an approach by Hjelen et al. (http://dx.doi.org/10.1155/TSM.20.29 and https://doi.org/10.1016/0739-6260(91)90128-M).

Describe the solution you'd like

  1. Two patterns of high quality and many pixels (typically min. 480x480 px) with the same Kikuchi bands are loaded.
  2. Regions of e.g. (60x60) px with high contrast are automatically found in one of the patterns.
  3. The same regions are found in the other pattern.
  4. Two lines are drawn, one through each region's position in the two patterns (see the linked papers above), and the intersection of these two lines is (xpc, ypc).
  5. By calculating the ratio of the distances between the two regions in the two patterns, zpc can also be determined.

Ideally, a confidence/error in the choice of (xpc, ypc) should be provided.
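
As a sketch of step 4, the intersection of the two lines through corresponding region centres can be computed with basic linear algebra (pure illustration with made-up coordinates, not tied to any kikuchipy API):

import numpy as np

def intersect(p1, p2, q1, q2):
    # Intersection of the line through p1, p2 with the line through q1, q2,
    # solving p1 + t * (p2 - p1) = q1 + u * (q2 - q1) for t
    d1, d2 = p2 - p1, q2 - q1
    t = np.cross(q1 - p1, d2) / np.cross(d1, d2)
    return p1 + t * d1

# Centres of the same two regions in the near (n) and far (f) pattern (example values)
a_n, a_f = np.array([100.0, 120.0]), np.array([90.0, 130.0])
b_n, b_f = np.array([300.0, 150.0]), np.array([320.0, 170.0])
xpc, ypc = intersect(a_n, a_f, b_n, b_f)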

Plugin for patterns stored as images in a directory

We need a reader/writer for patterns stored as images in a directory, with names explaining where in the scan they are situated, like "x0y0.jpg", "x0y86.jpg" etc. if the step size is 86 nm. The only necessary user input should be a filename prefix/postfix, a step size and the image extension/format. All experimental parameters etc. must of course be added to the signal after reading.
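
A rough sketch of how such a directory could be read into a 4D NumPy array; the directory name, file naming scheme and step size are hypothetical:

import re
from glob import glob

import numpy as np
from skimage.io import imread

step = 86  # nm, known from acquisition
files = glob('scan_dir/x*y*.jpg')
# Map each file to its (column, row) index from the position encoded in the name
coords = {f: tuple(int(v) // step for v in re.search(r'x(\d+)y(\d+)', f).groups())
          for f in files}

nx = max(c[0] for c in coords.values()) + 1
ny = max(c[1] for c in coords.values()) + 1
first = imread(files[0])
patterns = np.zeros((ny, nx) + first.shape, dtype=first.dtype)
for f, (ix, iy) in coords.items():
    patterns[iy, ix] = imread(f)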

To do

  • Virtual imaging
  • Separate signal from noise in decomposition components (factors, loadings or both)
  • Improve HyperSpy's function get_decomposition_model() for large datasets (perhaps lazily?)
  • Nearest pattern averaging using different footprints (i.e. neighbours)
    • Set number of nearest neighbours
    • Improve speed of gaussian filter (called via map() atm)
  • Lazy functionality
    • Write dask arrays to NORDIF .dat files efficiently
  • Correct static background
    • Using phase dependent background pattern

Let's update this list as we go along.

I'm working on the task(s) in bold.

Choose "semantic branching model" as release branching model

Is your feature request related to a problem? Please describe.
We need a branching model which eases releasing new features and patches.

Describe the solution you'd like
After considering HyperSpy's three RELEASE_next_major/minor/patch branches, and the branching model of pyXem/scikit-image/etc. with a master branch and version branches, I like a version of the latter: https://dev-cafe.github.io/branching-model/. Everything is nicely explained in that link, with a link at the start to semver.org. In short, from the link above:

Branch semantics
1. master collects changes towards the next major release.
2. release/X.Y branches collect changes towards the next minor release.
3. release/X.Y.Z branches collect changes towards the next patch release.
4. New features are directed either towards master or towards release/X.Y branches.
5. Patch release branches release/X.Y.Z never receive new features, they only receive bugfixes.

Bugfixes
1. Bugfixes can be directed either towards release/X.Y.Z, or release/X.Y, or master, depending on the intent.
2. Bugfixes directed towards release/X.Y.Z require bumping the patch number, signalled by creating a new tag.
3. Important bugfixes in a given release/X.Y.Z can, if necessary, be ported to release/X.Y and further to master by merging.
4. Important bugfixes in master can, if necessary, be ported to release/X.Y and release/X.Y.Z by cherry picking.

The above link should be included in the contributing guide, along with explanations for contributors of which branch to branch off of for features/bugfixes.

EMsoft plugin

I've started using Marc DeGraef's EMsoft to simulate dynamical EBSPs with a plan to use EMsoft's dictionary indexing to index EBSPs. As far as I can tell, all output files from EMsoft are in an HDF5 file format, so they are easy to visualise in Python, see e.g. the nice master patterns from the EMsoft wiki. DeGraef mentions in the latest ReadMe.md:

Some of our developers told us they have been working on python wrappers for EMsoft !!! If they are willing to make these available to us, then we will make sure they become part of one of the next releases.

But our use can be complementary to theirs.

Initial ideas:

  • Plot simulated patterns before indexing to assess quality, or differences between simulated master patterns if the input parameters were changed slightly
  • Plot the 2-3 best matching patterns (5D dataset) alongside the original patterns to assess how successful the indexing was.

I think classes for the different data (master pattern, distribution of BSEs on the detector after Monte Carlo simulations, simulated dynamic EBSPs) are the best approach. Will look into this after a successful dictionary indexing.

Do not overwrite scan_size in NORDIF reader

Describe the bug
scan_size is overwritten in the NORDIF file_reader() method.

To Reproduce

import kikuchipy as kp
s = kp.load('/path/to/Pattern.dat', scan_size=(3, 3))
print(s.shape)  # Will not return (3, 3) patterns if other info is in Setting.txt file

Expected behavior
Patterns in NORDIF Pattern.dat file should be reshaped into a (3, 3) navigation dimension, even though the Setting.txt file says something else.

Fixed width of images in documentation

Describe the bug
Images in the documentation get stretched (ugly!) when the viewport width is reduced, e.g. when reducing the browser width or in portrait mode on phones. Pretty sure this is due to the use of Sphinx's :scale: option:

.. figure:: _static/image/background_correction/dynamic_correction.jpg
    :align: center
    :scale: 50%

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://kikuchipy.readthedocs.io/en/latest/background_correction.html
  2. Reduce browser width, and...
  3. Behold the ugliness

Expected behavior
Widths should be fixed, e.g. as in HyperSpy's documentation (http://hyperspy.org/hyperspy-doc/current/user_guide/signal2d.html).

Screenshots
These were acquired with the same browser width...

Ugly! (screenshot: Background correction — KikuchiPy 0.1.0 documentation)

Nice! (screenshot: Signal2D Tools — HyperSpy 1.5.2 documentation)

Reader/writer for Oxford Instruments' HDF5 file specification

Is your feature request related to a problem? Please describe.
Should have a reader/writer of patterns stored in Oxford Instruments' HDF5 file specification (https://github.com/oinanoanalysis/h5oina), H5OINA, based on the h5ebsd format.

Describe the solution you'd like
We should first create a reader following the above linked specification. If any changes are made to the specification in the future, these should be incorporated as appropriate. A writer might not be strictly necessary until other software apart from Oxford's implements a similar reader.

Describe alternatives you've considered
The reader/writer should be implemented as part of the existing h5ebsd reader, which already reads Bruker Nano's and EDAX TSL's HDF5 file formats. It would be best to get hold of an H5OINA file to ensure that the reader is correct.

Average patterns with adjacent patterns to reduce noise

Another way of reducing noise in patterns, other than creating model patterns from decomposition factors, is simply averaging a pattern with the adjacent patterns. This can be done e.g. by using scipy.ndimage.filters.generic_filter and passing numpy.mean as the function to apply to each pattern. There are (at least) two problems to address:

  1. We cannot naively use HyperSpy's map() since we have to also access the adjacent patterns
  2. If adjacent patterns are too different from the pattern to average we will increase the noise! This could be tackled by only averaging with patterns that have a low 'pattern difference', as is explained by Wright et al. (https://www.sciencedirect.com/science/article/pii/S0304399114001946). (Implementing this might be a lot of work? I don't know. Should speak to de Graef in May, since it's his work they refer to in the article.)

An s.average_patterns(num_neighbours=1) function would be nice!
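
A first naive sketch with SciPy, averaging over the navigation axes only (a uniform window instead of a general footprint, and with no handling of problem 2):

import numpy as np
from scipy.ndimage import uniform_filter

# Hypothetical pattern array of shape (ny, nx, sy, sx)
patterns = np.random.randint(0, 255, (50, 50, 60, 60)).astype(np.float32)

# Average each pattern with its nearest neighbours in the navigation (first two) dimensions
averaged = uniform_filter(patterns, size=(3, 3, 1, 1), mode='nearest')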

Register as hyperspy extension

Is your feature request related to a problem? Please describe.
We should register KikuchiPy as a HyperSpy extension following their user guide (http://hyperspy.org/hyperspy-doc/current/dev_guide/writing_extensions.html#registering-extensions) and e.g. how pyXem does it (https://github.com/pyxem/pyxem/blob/master/pyxem/hyperspy_extension.yaml).

Describe the solution you'd like
Add a hyperspy_extension.yaml file in kikuchipy/kikuchipy listing all new HyperSpy objects, and add entry_points={'hyperspy.extensions': 'your_package_name = your_package_name'} in setup.py.
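
A sketch of the setup.py part; the package and entry point names are assumed to mirror how pyXem does it and must be checked against HyperSpy's guide:

from setuptools import setup, find_packages

setup(
    name='kikuchipy',
    packages=find_packages(),
    # Make HyperSpy discover the extension on import
    entry_points={'hyperspy.extensions': 'kikuchipy = kikuchipy'},
    # Ship the YAML file with the package
    package_data={'kikuchipy': ['hyperspy_extension.yaml']},
)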

Additional context
Doing this might raise awareness of KikuchiPy, both so that it is used more and to avoid duplication of effort.

Adaptive histogram equalization of patterns after background correction

So s.remove_background() calls kikuchipy.utils.expt_utils.rescale_pattern_intensity which stretches the contrast in patterns 'locally' (each pattern to [0, 255]) or 'globally' (min. and max. intensity in pattern stack to [0, 255]). The latter maintains relative intensities.

I was initially hesitant to also apply so-called adaptive histogram equalization (http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_equalize.html), which equalizes intensities within tiles/windows in each pattern to [0, 255].

However, I see that De Graef and co-workers (https://doi.org/10.2138/am-2017-6062) use adaptive histogram equalization to great effect... Ergo, this is useful. I see three possibilities:

  1. Just call it via map() and we don't implement anything
  2. We write a dedicated function taking from scikit-image to do just what we need and nothing more
  3. We make this an alternative in s.remove_background(), so only equalize_adapthist is called by us (which again calls rescale_intensity)?
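
Option 1 could look something like this (a sketch; the data path, kernel_size and the conversion back to uint8 are arbitrary choices):

import kikuchipy as kp
import numpy as np
from skimage.exposure import equalize_adapthist

s = kp.load('/path/to/Pattern.dat')  # hypothetical data

def _adapthist(pattern, kernel_size=(40, 40)):
    # equalize_adapthist returns floats in [0, 1]; rescale back to [0, 255]
    return (equalize_adapthist(pattern, kernel_size=kernel_size) * 255).astype(np.uint8)

s.map(_adapthist)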

Radon transform

The benefit of having a robust Radon transform of patterns is clear. If not for detecting bands (I cannot so far see the benefit of developing or implementing an indexing algorithm in KikuchiPy), then perhaps for background removal (Schwarzer and Sukkau, 2012: https://www.sciencedirect.com/science/article/pii/S0167865511004260)? Virtual imaging in the Radon space (@tinabe's idea) might also prove useful.

The work should build upon the work already done in the radon-transform branch.

Decomposition changes the signal type. If changing to EBSD signal afterwards, learning results are not found.

After performing decomposition on an EBSD signal, the signal type changes to Signal2D. Then the methods for EBSD signals are not available. If changing the signal to EBSD after the decomposition, the learning results are not found, and methods requiring learning results cannot be used. This is an issue, since then it is not possible to utilize e.g. the classify_decomposition_components() method.

s
<EBSD, title: , dimensions: (10, 10|240, 240)>
s.decomposition(algorithm='svd', output_dimension=10)
s
<Signal2D, title: , dimensions: (10, 10|240, 240)>
s = kp.EBSD(s)
components = s.classify_decomposition_components()
<ValueError: No learning results were found.>
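
A possible workaround, if I remember HyperSpy's behaviour correctly, is to change the signal type in place with set_signal_type(), which should keep learning_results on the same signal object, instead of constructing a new EBSD signal:

s.decomposition(algorithm='svd', output_dimension=10)
s.set_signal_type('EBSD')  # changes the class in place, keeping learning_results
components = s.classify_decomposition_components()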

Calculate similarity between pattern and dictionary

EBSD method to calculate the similarity between each pattern in a scan and a dictionary (preferably with a mask). Might be faster to do this as EMsoft does it, i.e. constructing 2D matrices of e.g. 1024 experimental and simulated patterns and multiplying these to collect dot products, continuously sorting them for each experimental pattern. Should perhaps start with a naive implementation and go from there.

Should be careful not to duplicate work done in pyXem: https://github.com/pyxem/pyxem/blob/master/pyxem/utils/indexation_utils.py#L44

Similarity metrics we should support:

This will be slow; however, let's start with the correct, naive implementation first and improve from there.
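
A naive matrix-multiplication sketch of normalised dot products between experimental and dictionary patterns (NumPy only, no mask, hypothetical shapes):

import numpy as np

exp = np.random.random((100, 60 * 60))    # 100 experimental patterns, flattened
dic = np.random.random((1024, 60 * 60))   # 1024 dictionary patterns, flattened

# Normalise each flattened pattern to unit length
exp_n = exp / np.linalg.norm(exp, axis=1, keepdims=True)
dic_n = dic / np.linalg.norm(dic, axis=1, keepdims=True)

# (100, 1024) matrix of normalised dot products; best match per experimental pattern
ndp = exp_n @ dic_n.T
best = ndp.argmax(axis=1)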

Have to wait for #15.

To do:

  • Naive implementation
  • With mask
  • Test
  • Documentation

Split chip background correction

Is your feature request related to a problem? Please describe.

Performing background corrections using one full static/dynamic background pattern on patterns acquired with detectors with unbalanced exposure in the two CCD chip halves can have undesirable effects, as discussed by @wiai in the docs of aloe/xcdskd (https://xcdskd.readthedocs.io/en/latest/pattern_processing/kikuchi_pattern_processing.html#Examples-of-Detector-Effects).

Describe the solution you'd like

@tmcaul has a solution for the dynamic background pattern in https://github.com/tmcaul/ebspy here: https://github.com/tmcaul/ebspy/blob/master/ebspy/ebspy.py#L185. He has generously said we can implement his solution, which was first implemented in AstroEBSD (https://github.com/benjaminbritton/AstroEBSD/blob/b9a080648d7cda8977803a78c6749f56992b134d/bin/EBSP_BGCor.m#L145). Both ebspy and AstroEBSD should be cited in the docstring and user guide.

How we will do it for the static correction, I haven't thought about yet.

Should assign chunks to EDAX/Bruker h5ebsd pattern data sets before reading with Dask

Description

Patterns stored in EDAX and Bruker's h5ebsd files are not chunked (at least the ones I've come across), thus when trying to read lazily, this error message is produced:

Traceback (most recent call last):
  File "/home/hakon/miniconda3/envs/kp-test/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2712, in safe_execfile
    self.compile if shell_futures else None)
  File "/home/hakon/miniconda3/envs/kp-test/lib/python3.7/site-packages/IPython/utils/py3compat.py", line 168, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/home/hakon/.PyCharmCE2019.3/config/scratches/kp_test.py", line 18, in <module>
    s = kp.load(data, lazy=True)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io/_io.py", line 77, in load
    scan_dicts = reader.file_reader(filename, lazy=lazy, **kwargs)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io/plugins/h5ebsd.py", line 139, in file_reader
    scan_dict_list.append(h5ebsd2signaldict(scan, man, ver, lazy=lazy))
  File "/home/hakon/kode/kikuchipy/kikuchipy/io/plugins/h5ebsd.py", line 315, in h5ebsd2signaldict
    data = da.from_array(data_dset, chunks=data_dset.chunks)
  File "/home/hakon/miniconda3/envs/kp-test/lib/python3.7/site-packages/dask/array/core.py", line 2725, in from_array
    chunks, x.shape, dtype=x.dtype, previous_chunks=previous_chunks
  File "/home/hakon/miniconda3/envs/kp-test/lib/python3.7/site-packages/dask/array/core.py", line 2402, in normalize_chunks
    raise ValueError(CHUNKS_NONE_ERROR_MESSAGE)
ValueError: You must specify a chunks= keyword argument.
This specifies the chunksize of your array blocks.

The relevant lines are (as evident from the traceback) in h5ebsd.py:

    if lazy:
        data = da.from_array(data_dset, chunks=data_dset.chunks)

Way to reproduce

>>> import kikuchipy as kp
>>> s = kp.load('/some/edax/or/bruker/h5ebsd/file', lazy=True)
[...] # See above error message
ValueError: You must specify a chunks= keyword argument.
This specifies the chunksize of your array blocks.

Version information

Not relevant.

Expected behaviour

Should of course assign chunks to data sets if they don't have them already...

In addition to fixing this, tests reading lazily from EDAX and Bruker h5ebsd files must be added.

Calculate pattern similarity between two patterns

Useful e.g. when deciding with which adjacent patterns to average each pattern, to avoid averaging across a grain boundary, as per the discussion in #13. A pattern difference map in itself is rich in information, and is comparable to e.g. a kernel average misorientation map or grain boundary map.

The 'pattern difference' procedure as explained by Wright et al. (https://doi.org/10.1016/j.ultramic.2014.10.002):

  1. Convert each pattern to a column vector.
  2. Subtract the average intensity of each pattern from each vector component.
  3. Normalise the column vector.
  4. Calculate difference between two patterns by calculating the dot product between the two normalised column vectors.
  5. Subtract the resulting dot product from 1 so that 0 indicates no difference and 1 represents the maximum difference.
  6. Average the eight difference values between each pattern and its eight adjacent patterns to obtain the final difference value.

Edit

This is similar to obtaining the normalized cross-correlation (NCC) coefficient or the normalized dot product (NDP). We should have general implementations of both. They should be implemented together since they are mainly used for the same purposes. The implementations should be such that they can be used in multiple applications in KikuchiPy. See #116 and #117 for example uses.
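
A sketch of steps 1-5 for two patterns (pure NumPy illustration with random data):

import numpy as np

p1 = np.random.random((60, 60))
p2 = np.random.random((60, 60))

def pattern_difference(a, b):
    # 1.-3. Flatten, subtract the mean and normalise each pattern
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    # 4.-5. Dot product, subtracted from 1 so that 0 means identical patterns
    return 1 - a.dot(b)

d = pattern_difference(p1, p2)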

To do:

Reader for patterns in Bruker's composite file (.bcf) format

Is your feature request related to a problem? Please describe.
Would be nice to be able to read patterns from Bruker's composite file (.bcf) format.

Describe the solution you'd like
An IO plugin like those for the NORDIF and h5ebsd (Bruker and KikuchiPy) formats. It should read at least the patterns, but also the metadata.

Describe alternatives you've considered
HyperSpy already has a reader for Bruker's .bcf file format containing EDS data; we should look into how the file is read there. Relevant links:

Calculate similarity between pattern in a scan and its neighbours

EBSD method to calculate the similarity between each pattern in a scan (preferably with a mask, though most likely this feature will be implemented later) and its neighbours. Which neighbours to use should preferably be up to the user to decide; we should start with 1st, 2nd etc. nearest neighbours in either a cross or a square. This will be similar to EMsoft's average dot product (ADP) map (their implementation https://github.com/EMsoft-org/EMsoft/blob/7762e1961508fe3e71d4702620764ceb98a78b9e/Source/DictionaryIndexing/EMgetADP.f90#L360). The coefficient map can then be used e.g. just to look at (how nice!), or as input to a pattern averaging method so that the similarity can be used as a weight in the averaging.

Similarity metrics we should support:

This will be slow; however, let's start with the correct, naive implementation first and improve from there.

Have to wait for #15.

To do:

  • Naive implementation
  • With mask
  • Test
  • Documentation

Example Jupyter notebook

Make a simple Jupyter notebook showcasing the functionality available in KikuchiPy:

  • Reading data (both lazily and normally)
  • Setting experimental parameters
  • Static and dynamic background corrections
  • Adaptive histogram equalization
  • Normalised correlation coefficient (comparing experimental and simulated pattern)
  • Statistical decomposition (PCA and IPCA)
  • Writing data

h5ebsd plugin

We should create a reader/writer plugin for the standard .h5ebsd HDF5 format reported by Jackson et al. According to EMsoft's dictionary indexing manual this format is used by them and also EDAX/TSL. It would be good to keep this in mind when implementing readers for various EMsoft data (#25).

HyperSpy extension file not installed from source distribution on PyPI

Describe the bug
The HyperSpy extension YAML file hyperspy_extension.yaml is not included when installing from the source distribution on PyPI for v0.1.1. Thus, the following won't work:

>>> import hyperspy.api as hs
>>> import numpy as np
>>> s = hs.signals.Signal2D(np.zeros((10, 10, 10, 10)))
>>> s.set_signal_type('EBSD')

To Reproduce

>>> import kikuchipy as kp
Failed to load hyperspy extension from kikuchipy. Please report this issue to the
kikuchipy developers

Expected behavior
The extension file should be included in site-packages/kikuchipy/ when installing with pip.

Additional context
With this fixed, hopefully, tests launched from the PR to the conda-forge/staged-recipes repo will pass (conda-forge/staged-recipes#10523).

Use dask's map_blocks to get principal components and loadings with IncrementalPCA

Describe the bug
Dask throws this warning when iterating over the data matrix using IncrementalPCA.partial_fit(X[start:end])

FutureWarning: The `numpy.may_share_memory` function is not implemented by
Dask array. You may want to use the da.map_blocks function or something similar
to silence this warning.  Your code may stop working in a future release.

So, we should update the IncrementalPCA object by calling da.map_blocks on appropriate chunks. We lose the tqdm progressbar, but can use Dask's own instead. Should put some description on each progressbar, since two will pop up: one when getting the factors and one when getting the loadings.

To Reproduce
Perform decomposition on a lazy signal with s.decomposition(algorithm='IPCA', output_dimension=1).
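
Whether da.map_blocks is the right tool here I'm not sure, since partial_fit is called for its side effect; an alternative sketch that iterates over the Dask blocks directly, assuming the data matrix is chunked along the first axis only:

import dask.array as da
from sklearn.decomposition import IncrementalPCA

X = da.random.random((1000, 60 * 60), chunks=(100, 60 * 60))
ipca = IncrementalPCA(n_components=10)

# Fit one chunk at a time, computing only that chunk into memory
for i in range(X.numblocks[0]):
    ipca.partial_fit(X.blocks[i].compute())

components = ipca.components_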

Correct spatial distortions in EBSD scans relative to a reference following TrueEBSD

Is your feature request related to a problem? Please describe.
Maps acquired at high stage tilt and slow scan speed (most scans?) can suffer from tilt and drift distortions.

Describe the solution you'd like
Loading reference images with a feature recognised in a virtual image or orientation map derived from the distorted scan... And do as Tong and Britton explain in their paper.

Describe alternatives you've considered
They only correct orientation maps afterward... If a distortion field is obtained, could this be applied in some way to experimental patterns, e.g. using averaging to some extent?

Additional context
Source is the above referenced paper: https://arxiv.org/pdf/1909.00347.pdf

Support Python 3.8

Is your feature request related to a problem? Please describe.

Should support Python 3.8 when the conda-forge Python 3.8 migration is done.

Describe the solution you'd like

Test in builds on Travis CI.

Additional context

hyperspy/hyperspy#2271

Utility function to calculate detector pixel size needed for dictionary indexing with EMsoft

Is your feature request related to a problem? Please describe.
Detector pixel size is needed for dictionary indexing with EMsoft (see e.g. https://link.springer.com/article/10.1007/s40192-019-00137-4).

Describe the solution you'd like
Convenience function to measure the detector pixel size from features with a known distance in an "EBSD pattern" (image). This can e.g. be a mm grid.

Describe alternatives you've considered
It can be argued that this might fall outside of the problems KikuchiPy should solve; however, HyperSpy has some nice measurement tools etc. to do this... I myself have a notebook (https://nbviewer.jupyter.org/github/hwagit/detector-pixel-size-ebsd-camera/blob/master/detector_pixel_size.ipynb) to determine the pixel size "manually" from an image of a mm grid, but it would be nice to have this more automated and reproducible via KikuchiPy.
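
The calculation itself is just a ratio; a hypothetical example with made-up numbers:

# Hypothetical numbers: two grid lines 1 mm apart are 215 pixels apart in the image
known_distance_um = 1000.0
distance_px = 215.0
pixel_size_um = known_distance_um / distance_px  # ~4.65 um per (binned) detector pixel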

Set version requirement on dask to 2.9 to avoid having to pass chunks to from_array

Description

Before dask version 2.8.1 (see their changelog), chunks was a required parameter (now the default is chunks='auto'). However, chunks is not passed internally in static_background_correction(), leading to TypeError.

Solution: Set dask >= 2.8.1 in setup.py to fix this.

Way to reproduce

>>> import kikuchipy as kp
>>> s = kp.load('/some/data/with/background/Pattern.dat')
>>> s.static_background_correction()
static_background_correction(self, operation, relative, static_bg)
    404                 ebsd_node = kp.util.io.metadata_nodes(sem=False)
    405                 static_bg = da.from_array(
--> 406                     md.get_item(ebsd_node + ".static_background")
    407                 )
    408             except AttributeError:

TypeError: from_array() missing 1 required positional argument: 'chunks'
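
The proposed fix in setup.py would be something like the following (the exact requirement string is still to be decided, cf. the 2.8.1/2.9 versions mentioned above):

# setup.py
install_requires = [
    'dask[array]>=2.8.1',
]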

Add phases in metadata

Functionality:

  • Read phase information from h5ebsd files etc.
  • Change phase information of file
  • Write phase information to h5ebsd file

Lazy attribute not set to False when calling compute()

Describe the bug
The signal._lazy attribute is not set to False when calling signal.compute() and reading signal data into memory. Although the class is changed from LazyEBSD to EBSD, the EBSD __init__() function is not called, as is done by e.g. pyXem here... Since the metadata isn't any different between the two classes, I think just setting signal._lazy = False in compute() is OK.

To reproduce/expected behaviour

s = kp.load('/some/data.h5', lazy=True)
s.compute()
print(s._lazy)  # Should print False, will still print True

Saving of learning results only after lazy decomposition raises MemoryError

When performing lazy decomposition it should be possible to only save the learning results. However, when I performed a lazy decomposition on a 15 GB float32 dataset (I have 16 GB RAM) using IncrementalPCA and tried this, a MemoryError was raised. The learning results were only around 600 MB...

Have to look into HyperSpy's s.learning_results.save() to see where unnecessary stuff was written to memory.

Create necessary files when export to NORDIF binary format

Is your feature request related to a problem? Please describe.
So far, writing to the NORDIF binary format just creates a Pattern.dat binary file. However, all relevant files created by the NORDIF acquisition software should be created upon calling save().

Describe the solution you'd like
What also should be created upon save() are:

  • Setting file named Setting.txt with relevant information from metadata
  • Static background pattern named Background acquisition.bmp. Unless the user supplies this themselves, we should, in prioritised order: (1) use the one in the signal metadata (bin if necessary), (2) create it as the average of all patterns with s.mean().
  • Some "calibration" patterns. These could just be patterns from the scan. They would typically be passed as a list of tuples [(x1, y1), (x2, y2), ...] to save(). These should be bmp files.
  • An image of the region of interest named Area ROI.bmp. This would typically be a virtual backscatter electron image. It can be passed to save(), but we should have a default if not supplied.

Describe alternatives you've considered
A program that updates the "Number of samples" etc. is available here: https://github.com/hwagit/nordif2hdf5/blob/master/update_nordif_setting_file.py

Add testing

Oh boy, we cannot wait too long before writing tests for it all. Shouldn't be too much work if we just look at what pyXem and HyperSpy do.
