
wannier-berri's Introduction

Wannier Berri


A code for highly efficient Wannier interpolation.

Evaluation of k-space integrals of Berry curvature, orbital moment and derived quantities by means of MLWFs or tight-binding models. Compared to the postw90.x part of the Wannier90 code, it has extended functionality and improved performance.

Web page

http://wannier-berri.org

Mailing list:

To subscribe, please send an email to [email protected] with the subject "subscribe wannier-berri Firstname Lastname", or visit the list homepage https://physik.lists.uzh.ch/sympa/info/wannier-berri

This code is intended for highly efficient Wannier interpolation. Initially an analog of the postw90.x part of the Wannier90 code, it now offers extended functionality and improved performance.

Improved performance and accuracy:

Wannier-Berri evaluates Brillouin-zone integrals quickly and with high precision over an ultradense k-grid. This is achieved by:

  • using the fast Fourier transform
  • accounting for symmetries, to reduce the integration to the irreducible part of the Brillouin zone
  • a recursive adaptive refinement algorithm
  • an optimized Fermi level scan
  • an optimized minimal-distance replica method (use_ws_distance)
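To illustrate why the fast Fourier transform helps, here is a minimal one-dimensional sketch. It is not WannierBerri's implementation: the array names, grid size and the choice of lattice vectors R = 0..N-1 are assumptions made for illustration. It shows that the Fourier sum H(k) = sum_R exp(2*pi*i*k*R) H(R), evaluated on a regular k-grid, can be obtained from a single FFT call instead of an explicit sum for every k-point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, num_wann = 8, 3  # illustrative sizes, not taken from the code base

# Random matrix-valued real-space quantity H(R) for R = 0 .. n_grid-1
H_R = (rng.normal(size=(n_grid, num_wann, num_wann))
       + 1j * rng.normal(size=(n_grid, num_wann, num_wann)))

R = np.arange(n_grid)
k = np.arange(n_grid) / n_grid  # regular k-grid in reduced coordinates

# Direct Fourier sum: cost grows as (number of k-points) * (number of R-points)
phase = np.exp(2j * np.pi * np.outer(k, R))           # shape (k, R)
H_k_direct = np.einsum('kr,rnm->knm', phase, H_R)

# Same result from one FFT along the R axis. numpy's ifft uses the
# exp(+2*pi*i*j*R/N) sign convention with a 1/N prefactor, hence the rescaling.
H_k_fft = np.fft.ifft(H_R, axis=0) * n_grid

print(np.allclose(H_k_direct, H_k_fft))
```

For an N-point grid the direct sum costs O(N^2) operations per matrix element, while the FFT costs O(N log N).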

Implemented functionality:

  • Anomalous Hall conductivity
  • Orbital magnetization (modern theory)
  • Ohmic conductivity
  • Berry curvature dipole
  • Gyrotropic magnetoelectric effect
  • Hall effect
  • Low-Field Hall effect

Other features:

  • The object-oriented structure makes it potentially easier to implement further features.
  • Calculations may also be performed for any tight-binding model for which a "_tb.dat" file was generated, by whatever means.
  • WannierBerri can run in parallel by means of the multiprocessing module.

Installation

pip3 install wannierberri

Author

Stepan Tsirkin, University of Zurich

License

The code is distributed under the terms of the GNU GENERAL PUBLIC LICENSE Version 2, the same as Wannier90.

Acknowledgements

The code was inspired by the Wannier90 Fortran code: http://www.wannier.org/ , https://github.com/wannier-developers/wannier90 . Some parts of the code are an adapted translation of postw90 code.

I acknowledge Ivo Souza for a useful discussion.

wannier-berri's People

Contributors

jaemolihm, julen-ia, liu-xiaoxiong, ma-jimenez, manxkim, minkyu-p, patrick-lenggenhager, philipp-eck, stepan-tsirkin, stepats, tomusht


wannier-berri's Issues

Optimization of einsum in shc_B_H

See the discussion in wannier-berri#22 (comment).

Optimization implemented in branch shc_opt.
https://github.com/jaemolihm/wannier-berri/tree/shc_opt

I used num_wann=88, number of R points = 1163, SHCqiao=True, len(Efermi)=1, len(omega)=1, NKdiv=[1 1 1], and NKFFT=[10 10 10].
I used kernprof to profile the code.
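For readers unfamiliar with kernprof: it is the command-line driver of the line_profiler package and, when run with -l, makes a profile decorator available as a builtin. The following is a minimal sketch of that workflow. The function name contract and the array sizes are illustrative assumptions, not taken from WannierBerri.

```python
import numpy as np

try:
    profile  # injected into builtins when the script runs under `kernprof -l`
except NameError:
    # Fallback so the script also runs without kernprof
    def profile(func):
        return func


@profile
def contract(a, b):
    # The contraction discussed in this issue, on small test arrays
    return np.einsum('knlc,klma->knmac', a, b)


rng = np.random.default_rng(0)
a = rng.random((4, 5, 5, 3))
b = rng.random((4, 5, 5, 3))
c = contract(a, b)
print(c.shape)
```

Saved under a hypothetical name such as profile_demo.py, the script can be run with `kernprof -l -v profile_demo.py`, which prints a per-line timing table for every decorated function, in the same format as the tables below.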

With ~100 Efermi or omega values, kubo_sum_elements becomes the bottleneck.
Note that kubo_sum_elements scales as O(N_W^2) while shc_B_H scales as O(N_W^3), so the speed of shc_B_H matters more as N_W grows.

Summary of results

  • Optimizing the einsum reduces the time spent in shc_B_H from 243s to 82s, so it is very effective.
  • For the einsum lines alone, the time drops from roughly 180s to 20s.
  • After optimization, _R_to_k_H of SH_R, SR_R and SHR_R takes ~57% of the total opt_conductivity time (106s). The optimized einsum takes ~17%.
  • Inside _shc_B_H_einsum_opt, most (~90%) of the time is spent in the actual matrix multiplication.
  • The Pt-opt_SHCqiao_iter-0000.dat output before and after optimization was identical.

Optimized code (part of __Data_K.py)

    def _shc_B_H_einsum_opt(self, C, A, B):
        # Optimized version of C += np.einsum('knlc,klma->knmac', A, B)
        nw = self.num_wann
        for ik in range(C.shape[0]):
            # C[ik] += np.einsum('nlc,lma->nmac', A[ik], B[ik])
            tmp_a = np.swapaxes(A[ik], 1, 2) # nlc -> ncl
            tmp_a = np.reshape(tmp_a, (nw*3, nw)) # ncl -> (nc)l
            tmp_b = np.reshape(B[ik], (nw, nw*3)) # lma -> l(ma)
            tmp_c = tmp_a @ tmp_b # (nc)l, l(ma) -> (nc)(ma)
            tmp_c = np.reshape(tmp_c, (nw, 3, nw, 3)) # (nc)(ma) -> ncma
            C[ik] += np.transpose(tmp_c, (0, 2, 3, 1)) # ncma -> nmac

    @lazy_property.LazyProperty
    @profile
    def shc_B_H(self):
        SH_H = self._R_to_k_H(self.SH_R.copy(), hermitian=False)
        shc_K_H = -1j*self._R_to_k_H(self.SR_R.copy(), hermitian=False)
        # shc_K_H += np.einsum('knlc,klma->knmac', self.S_H, self.D_H)
        self._shc_B_H_einsum_opt(shc_K_H, self.S_H, self.D_H)
        shc_L_H = -1j*self._R_to_k_H(self.SHR_R.copy(), hermitian=False)
        # shc_L_H += np.einsum('knlc,klma->knmac', SH_H, self.D_H)
        self._shc_B_H_einsum_opt(shc_L_H, SH_H, self.D_H)
        return (self.delE_K[:,np.newaxis,:,:,np.newaxis]*self.S_H[:,:,:,np.newaxis,:] +
            self.E_K[:,np.newaxis,:,np.newaxis,np.newaxis]*shc_K_H[:,:,:,:,:] - shc_L_H)
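The reshaping scheme can be checked in isolation. The sketch below is a standalone copy of the routine (it takes nw from the array shape rather than from self.num_wann) and compares it with the original einsum on small random arrays. The function name einsum_opt and the array sizes are illustrative assumptions.

```python
import numpy as np


def einsum_opt(C, A, B):
    """In-place C += np.einsum('knlc,klma->knmac', A, B) via reshape + matmul."""
    nw = A.shape[1]
    for ik in range(C.shape[0]):
        tmp_a = np.swapaxes(A[ik], 1, 2).reshape(nw * 3, nw)  # nlc -> (nc)l
        tmp_b = B[ik].reshape(nw, nw * 3)                      # lma -> l(ma)
        tmp_c = (tmp_a @ tmp_b).reshape(nw, 3, nw, 3)          # (nc)(ma) -> ncma
        C[ik] += tmp_c.transpose(0, 2, 3, 1)                   # ncma -> nmac


rng = np.random.default_rng(1)
nk, nw = 4, 6  # small test sizes
shape_in = (nk, nw, nw, 3)
A = rng.normal(size=shape_in) + 1j * rng.normal(size=shape_in)
B = rng.normal(size=shape_in) + 1j * rng.normal(size=shape_in)
C0 = rng.normal(size=(nk, nw, nw, 3, 3)).astype(complex)

# Reference result with the original einsum
C_ref = C0 + np.einsum('knlc,klma->knmac', A, B)

# Result with the reshaped matrix multiplication
C_opt = C0.copy()
einsum_opt(C_opt, A, B)

print(np.allclose(C_ref, C_opt))
```

The gain comes from turning the contraction over l into a single (3*nw) x nw by nw x (3*nw) matrix product per k-point, which NumPy dispatches to its matrix-multiplication routine.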

After optimization (using _shc_B_H_einsum_opt):

Total time: 81.6478 s
File: /group2/jmlim/wberri/wannier-berri/wannierberri/__Data_K.py
Function: shc_B_H at line 729

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   729                                               @lazy_property.LazyProperty
   730                                               @profile
   731                                               def shc_B_H(self):
   732         1   10723583.0 10723583.0     13.1          SH_H = self._R_to_k_H(self.SH_R.copy(), hermitian=False)
   733         1   28848662.0 28848662.0     35.3          shc_K_H = -1j*self._R_to_k_H(self.SR_R.copy(), hermitian=False)
   734                                                   # shc_K_H += np.einsum('knlc,klma->knmac', self.S_H, self.D_H)
   735         1   11979879.0 11979879.0     14.7          self._shc_B_H_einsum_opt(shc_K_H, self.S_H, self.D_H)
   736         1   20550098.0 20550098.0     25.2          shc_L_H = -1j*self._R_to_k_H(self.SHR_R.copy(), hermitian=False)
   737                                                   # shc_L_H += np.einsum('knlc,klma->knmac', SH_H, self.D_H)
   738         1    6040757.0 6040757.0      7.4          self._shc_B_H_einsum_opt(shc_L_H, SH_H, self.D_H)
   739                                                   return (self.delE_K[:,np.newaxis,:,:,np.newaxis]*self.S_H[:,:,:,np.newaxis,:] +
   740         1    3504838.0 3504838.0      4.3              self.E_K[:,np.newaxis,:,np.newaxis,np.newaxis]*shc_K_H[:,:,:,:,:] - shc_L_H)

Total time: 106.156 s
File: /group2/jmlim/wberri/wannier-berri/wannierberri/__kubo.py
Function: opt_conductivity at line 97

Before optimization (original np.einsum calls):

Total time: 242.711 s
File: /group2/jmlim/wberri/wannier-berri/wannierberri/__Data_K.py
Function: shc_B_H at line 729

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   729                                               @lazy_property.LazyProperty
   730                                               @profile
   731                                               def shc_B_H(self):
   732         1    5439620.0 5439620.0      2.2          SH_H = self._R_to_k_H(self.SH_R.copy(), hermitian=False)
   733         1   20369183.0 20369183.0      8.4          shc_K_H = -1j*self._R_to_k_H(self.SR_R.copy(), hermitian=False)
   734         1  117323520.0 117323520.0     48.3          shc_K_H += np.einsum('knlc,klma->knmac', self.S_H, self.D_H)
   735                                                   # self._shc_B_H_einsum_opt(shc_K_H, self.S_H, self.D_H)
   736         1   21178733.0 21178733.0      8.7          shc_L_H = -1j*self._R_to_k_H(self.SHR_R.copy(), hermitian=False)
   737         1   75139695.0 75139695.0     31.0          shc_L_H += np.einsum('knlc,klma->knmac', SH_H, self.D_H)
   738                                                   # self._shc_B_H_einsum_opt(shc_L_H, SH_H, self.D_H)
   739                                                   return (self.delE_K[:,np.newaxis,:,:,np.newaxis]*self.S_H[:,:,:,np.newaxis,:] +
   740         1    3260184.0 3260184.0      1.3              self.E_K[:,np.newaxis,:,np.newaxis,np.newaxis]*shc_K_H[:,:,:,:,:] - shc_L_H)


Total time: 265.819 s
File: /group2/jmlim/wberri/wannier-berri/wannierberri/__kubo.py
Function: opt_conductivity at line 97

Other things that I tried

  • Using einsum_path
    To my understanding, einsum_path is useful when there are multiple paths which have different computational costs. However, in this case, only a single path was found.

        >>> a = np.random.rand(1000, 40, 40, 3)
        >>> b = np.random.rand(1000, 40, 40, 3)
        >>> o = np.einsum_path('knlc,klma->knmac', a, b)
        >>> print(o[0])
        ['einsum_path', (0, 1)]
        >>> print(o[1])
          Complete contraction:  knlc,klma->knmac
                 Naive scaling:  6
             Optimized scaling:  6
              Naive FLOP count:  1.152e+09
          Optimized FLOP count:  1.152e+09
           Theoretical speedup:  1.000
          Largest intermediate:  1.440e+07 elements
        --------------------------------------------------------------------------
        scaling                  current                                remaining
        --------------------------------------------------------------------------
           6            klma,knlc->knmac                             knmac->knmac

  • Factoring out the 'k' index from the einsum call
    I tried a for loop for 'k' index and an einsum call inside it, but there was no speedup.
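A related variant, not tried in this issue, is to keep the reshape + matmul scheme but let np.matmul handle the 'k' index as a batch dimension instead of looping over it in Python. The sketch below only demonstrates that the result agrees with the einsum. Whether it is faster is untested here, and the temporaries span all k-points at once, so memory use is higher. The function name einsum_batched and the array sizes are illustrative assumptions.

```python
import numpy as np


def einsum_batched(A, B):
    """Return np.einsum('knlc,klma->knmac', A, B) using one batched matmul."""
    nk, nw = A.shape[0], A.shape[1]
    a2 = np.swapaxes(A, 2, 3).reshape(nk, nw * 3, nw)  # knlc -> k(nc)l
    b2 = B.reshape(nk, nw, nw * 3)                      # klma -> kl(ma)
    c2 = np.matmul(a2, b2)                              # k(nc)(ma), batched over k
    c2 = c2.reshape(nk, nw, 3, nw, 3)                   # kncma
    return c2.transpose(0, 1, 3, 4, 2)                  # kncma -> knmac


rng = np.random.default_rng(2)
A = rng.random((5, 7, 7, 3))
B = rng.random((5, 7, 7, 3))

result = einsum_batched(A, B)
reference = np.einsum('knlc,klma->knmac', A, B)
print(np.allclose(result, reference))
```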
