Improve get_morb_R #3

Closed
ponweist opened this issue Jul 29, 2014 · 4 comments

Comments

@ponweist
Owner

In get_CC_R (get_oper.F90, lines 781ff.), the following matrix product is computed inefficiently, as an explicit four-level loop nest:

                ! Transform to projected subspace, Wannier gauge
                !
                H_qb1_q_qb2(:,:)=cmplx_0
                do m=1,num_wann
                   do n=1,num_wann
                      do i=1,num_states(qb1)
                         ii=winmin_qb1+i-1
                         do j=1,num_states(qb2)
                            jj=winmin_qb2+j-1
                            H_qb1_q_qb2(n,m)=H_qb1_q_qb2(n,m)&
                                 +conjg(v_matrix(i,n,qb1))&
                                 *Ho_qb1_q_qb2(ii,jj)&
                                 *v_matrix(j,m,qb2)
                         enddo
                      enddo
                   enddo
                enddo

This needs an improvement similar to the one made for get_AA_R (see #2).
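For illustration, the loop nest above computes H = V1^H * Ho * V2 restricted to the two energy windows. The nest costs about num_wann^2 * ns1 * ns2 multiply-adds, while two successive matrix products cost about ns1 * ns2 * num_wann + num_wann^2 * ns1. The following is a minimal stand-alone sketch, not the actual fix: all sizes, window offsets and the names v1, v2, ho, tmp are made up, and the intrinsic matmul is used only to keep the example self-contained (a BLAS call such as zgemm would be a natural alternative in production code).

```fortran
program projected_product_sketch
  ! Hypothetical stand-alone illustration; sizes, offsets and data are assumptions.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: num_wann = 4, ns1 = 6, ns2 = 5   ! assumed sizes
  integer, parameter :: num_bands = 10                   ! assumed band count
  integer, parameter :: winmin1 = 2, winmin2 = 3         ! assumed window offsets
  complex(dp) :: v1(ns1, num_wann), v2(ns2, num_wann)
  complex(dp) :: ho(num_bands, num_bands)
  complex(dp) :: h_loop(num_wann, num_wann), h_mm(num_wann, num_wann)
  complex(dp) :: tmp(ns1, num_wann)
  integer :: i, j, m, n

  call fill(v1)
  call fill(v2)
  call fill(ho)

  ! Reference: the four-level loop nest as quoted in the issue.
  h_loop = (0.0_dp, 0.0_dp)
  do m = 1, num_wann
    do n = 1, num_wann
      do i = 1, ns1
        do j = 1, ns2
          h_loop(n, m) = h_loop(n, m) + conjg(v1(i, n)) &
               *ho(winmin1 + i - 1, winmin2 + j - 1)*v2(j, m)
        end do
      end do
    end do
  end do

  ! Same result as two matrix products: tmp = Ho(window)*V2, then H = V1^H*tmp.
  tmp = matmul(ho(winmin1:winmin1 + ns1 - 1, winmin2:winmin2 + ns2 - 1), v2)
  h_mm = matmul(conjg(transpose(v1)), tmp)

  if (maxval(abs(h_loop - h_mm)) > 1.0e-10_dp) error stop "results differ"
  print *, "max difference:", maxval(abs(h_loop - h_mm))

contains

  ! Fill a complex matrix with random entries (test data only).
  subroutine fill(a)
    complex(dp), intent(out) :: a(:, :)
    real(dp) :: re(size(a, 1), size(a, 2)), im(size(a, 1), size(a, 2))
    call random_number(re)
    call random_number(im)
    a = cmplx(re, im, kind=dp)
  end subroutine fill

end program projected_product_sketch
```

The intermediate tmp holds only ns1 * num_wann entries, so the extra memory is small compared to Ho itself.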

@ponweist
Owner Author

Note that the critical code section has been duplicated in get_morb_R (get_oper.F90, lines 1006ff.).

This is the current trace (16sm case, 32 processes, all berry tasks enabled, kpath and kslice disabled):
[Image: trace-iss3]

@ponweist changed the title from "Improve get_CC_R" to "Improve get_morb_R" on Aug 5, 2014
@ponweist
Owner Author

ponweist commented Aug 5, 2014

Performance analysis for 16sm case running on 32 processes with the following parameters:

kpath = F
kslice = F

berry = T
berry_task = ahc,morb,kubo
berry_kmesh = 32 32 32

New trace:
[Image: trace-iss3-fix]

Performance improvement (in CPU cycles) relative to the previous code version:

| Routine | Previous | Current | Speedup factor |
| --- | --- | --- | --- |
| berry_main | 1.1e13 | 7.8e12 | ~ 1.4 |
| get_morb_R | 3.8e12 | 5.6e11 | ~ 6.8 |

@ponweist
Owner Author

ponweist commented Aug 5, 2014

The next bottleneck in get_morb_R appeared in lines 854ff:

          ! Wannier-gauge overlap matrix S in the projected subspace
          !
          call get_win_min(ik,winmin_q)
          call get_win_min(nnlist(ik,nn),winmin_qb)
          S=cmplx_0
          H_q_qb(:,:)=cmplx_0
          do m=1,num_wann
             do n=1,num_wann
                do i=1,num_states(ik)
                   ii=winmin_q+i-1
                   do j=1,num_states(nnlist(ik,nn))
                      jj=winmin_qb+j-1
                      x = conjg(v_matrix(i,n,ik))*S_o(ii,jj)&
                           *v_matrix(j,m,nnlist(ik,nn))
                      S(n,m)=S(n,m) + x
                      H_q_qb(n,m)=H_q_qb(n,m) + x*eigval(ii,ik)
                   enddo
                enddo
             enddo
          enddo

To do: check whether an extended version of get_gauge_overlap_matrix, with an optional output parameter for H_q_qb, can be used here.
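As a rough sketch of that idea: S = V_q^H * S_o * V_qb and H_q_qb = V_q^H * diag(eigval) * S_o * V_qb share the intermediate product S_o * V_qb, so one routine can return S and, on request, H_q_qb as well. The routine below is a hypothetical stand-in, not the real get_gauge_overlap_matrix: its name, argument list and the assumption that S_o and eigval are already restricted to the energy windows are all made up for this example.

```fortran
module overlap_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for an extended overlap routine; the real
  ! signature in get_oper.F90 may differ.
  subroutine gauge_overlap_sketch(v_q, v_qb, s_o, eig_q, s, h)
    complex(dp), intent(in) :: v_q(:, :)    ! (ns_q, num_wann)
    complex(dp), intent(in) :: v_qb(:, :)   ! (ns_qb, num_wann)
    complex(dp), intent(in) :: s_o(:, :)    ! (ns_q, ns_qb), window-restricted
    real(dp), intent(in) :: eig_q(:)        ! (ns_q), window-restricted
    complex(dp), intent(out) :: s(:, :)     ! (num_wann, num_wann)
    complex(dp), intent(out), optional :: h(:, :)
    complex(dp) :: tmp(size(v_q, 1), size(v_qb, 2))
    integer :: i

    tmp = matmul(s_o, v_qb)                   ! shared intermediate
    s = matmul(conjg(transpose(v_q)), tmp)
    if (present(h)) then
      do i = 1, size(tmp, 1)
        tmp(i, :) = eig_q(i)*tmp(i, :)        ! scale row i by its eigenvalue
      end do
      h = matmul(conjg(transpose(v_q)), tmp)
    end if
  end subroutine gauge_overlap_sketch

  ! Fill a complex matrix with random entries (test data only).
  subroutine fill(a)
    complex(dp), intent(out) :: a(:, :)
    real(dp) :: re(size(a, 1), size(a, 2)), im(size(a, 1), size(a, 2))
    call random_number(re)
    call random_number(im)
    a = cmplx(re, im, kind=dp)
  end subroutine fill

end module overlap_sketch

program overlap_demo
  use overlap_sketch
  implicit none
  integer, parameter :: num_wann = 3, ns_q = 5, ns_qb = 6   ! assumed sizes
  real(dp), parameter :: tol = 1.0e-10_dp
  complex(dp) :: v_q(ns_q, num_wann), v_qb(ns_qb, num_wann), s_o(ns_q, ns_qb)
  complex(dp) :: s(num_wann, num_wann), h(num_wann, num_wann)
  complex(dp) :: s_ref(num_wann, num_wann), h_ref(num_wann, num_wann), x
  real(dp) :: eig_q(ns_q)
  integer :: i, j, m, n

  call fill(v_q)
  call fill(v_qb)
  call fill(s_o)
  call random_number(eig_q)

  ! Reference: the loop nest from the issue, with window offsets dropped.
  s_ref = (0.0_dp, 0.0_dp)
  h_ref = (0.0_dp, 0.0_dp)
  do m = 1, num_wann
    do n = 1, num_wann
      do i = 1, ns_q
        do j = 1, ns_qb
          x = conjg(v_q(i, n))*s_o(i, j)*v_qb(j, m)
          s_ref(n, m) = s_ref(n, m) + x
          h_ref(n, m) = h_ref(n, m) + x*eig_q(i)
        end do
      end do
    end do
  end do

  call gauge_overlap_sketch(v_q, v_qb, s_o, eig_q, s)       ! S only
  if (maxval(abs(s - s_ref)) > tol) error stop "S mismatch"
  call gauge_overlap_sketch(v_q, v_qb, s_o, eig_q, s, h)    ! S and H_q_qb
  if (maxval(abs(h - h_ref)) > tol) error stop "H mismatch"
  print *, "S and H_q_qb agree with the loop nest"
end program overlap_demo
```

Callers that only need S simply omit the last argument, so existing call sites would not have to change.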

@ponweist
Owner Author

ponweist commented Aug 6, 2014

Now using the extended routine get_gauge_overlap_matrix, which has an optional output parameter for H_q_qb.

New trace:
[Image: trace-iss3-fix2]

New performance analysis:

| Routine | Previous | Current | Speedup factor |
| --- | --- | --- | --- |
| berry_main | 1.1e13 | 7.4e12 | ~ 1.5 |
| get_morb_R | 3.8e12 | 1.8e11 | ~ 21 |

Time for initialization is down from ~53s to ~3s(!).

@ponweist closed this as completed on Aug 6, 2014
@ponweist ponweist mentioned this issue Aug 20, 2014
2 tasks
ponweist pushed a commit to ponweist/wannier90 that referenced this issue Mar 2, 2017
ponweist pushed a commit to ponweist/wannier90 that referenced this issue Mar 2, 2017