reference-lapack / lapack Goto Github PK
View Code? Open in Web Editor NEWLAPACK development repository
License: Other
LAPACK development repository
License: Other
This label exit_level_0:
in *syconv
is not used, is there anything missing?
There is the following difference in /LAPACK/[s,d]gebal.f between 3.6.0 and 3.5.0:
In LAPACK-3.5.0, C and R are calculated as follows:
DO 150 J = K, L
IF( J.EQ.I )
$ GO TO 150
C = C + ABS( A( J, I ) )
R = R + ABS( A( I, J ) )
150 CONTINUE
. . . . . .
200 CONTINUE
I.e. | Aii | is not added to C and R.
In LAPACK-3.6.0, C and R are calculated as follows:
C = SNRM2( L-K+1, A( K, I ), 1 )
R = SNRM2( L-K+1, A( I, K ), LDA )
. . . .
200 CONTINUE
See /BLAS1/snrm2.f
_ SNRM2 := sqrt( x'_x )
Naturally, there will be another result, because SNRM2 is the euclidean norm and Aii elements are included in calculations.
The same situation is in the case of LAPACK/cgebal.f
In LAPACK-3.5.0, to calculate C and R (estimation of norm) they use a special function
* .. Statement Function definitions ..
CABS1( CDUM ) = ABS( REAL( CDUM ) ) + ABS( AIMAG( CDUM ) )
In LAPACK-3.5.0, C and R are calculated as follows:
C = SCNRM2( L-K+1, A( K, I ), 1 )
R = SCNRM2( L-K+1, A( I , K ), LDA )
. . . . .
200 CONTINUE
_ SCNRM2 := sqrt( conjg( x' )_x )
SCNRM2 should work correctly, because it is successfully used by a lot ot other routines.
Any case, Aii are used in calculations of C and R.
Why have you changed the working algorithm?
I think, balancing works correctly, because those changes were connected only with estimations of
test error.
The difference in results, for example, in cbal.out
In LAPACK-3.5.0:
.. test output of CGEBAL ..
value of largest test error = 0.000E+00
example number where info is not zero = 0
example number where ILO or IHI wrong = 0
example number having largest error = 0
number of examples where info is not 0 = 0
total number of examples tested = 13
In LAPACK-3.6.0:
.. test output of CGEBAL ..
value of largest test error = 0.875E+00
example number where info is not zero = 0
example number where ILO or IHI wrong = 0
example number having largest error = 6
number of examples where info is not 0 = 0
total number of examples tested = 13
This is definitely because Aii are not calculated in one case (in LAPACK-3.5.0) and are calculated in another (in LAPACK-3.6.0).
I did the following experiment:
I used cgebal.f from LAPACK-3.5.0, but commented the IF condition:
DO 200 I = K, L
C = ZERO
R = ZERO
*
DO 150 J = K, L
* IF( J.EQ.I )
* $ GO TO 150
C = C + CABS1( A( J, I ) )
R = R + CABS1( A( I, J ) )
150 CONTINUE
. . . . .
200 CONTINUE
And i've received the same result, as in the case of LAPACK-3.5.0.
.......
.. test output of CGEBAL ..
value of largest test error = 0.875E+00
example number where info is not zero = 0
example number where ILO or IHI wrong = 0
example number having largest error = 6
number of examples where info is not 0 = 0
total number of examples tested = 13
If that is a bug and get fixed in 3.6, should the test also be changed? Or is the "value of largest test error = 0.875E+00" acceptable as passed?
The double real constant ONE is passed into the routine ZGEMM at line 596 of ZTREVC3.
ZGEMM expects a double complex argument. This should be changed to CONE.
I'm testing new QR routines and it seems that operations with WORK1 array won't be obvious for users. I see two possible issues here:
There is a convention in LAPACK that optimal size of workspace is returned in the first element of workspace but in the routines ?GEQR and ?GELQ it is returned in the second element for WORK1 and it can be misleading for users. It would be better to return optimal size of WORK1 in the first element, minimal size in the second element, the type of algorithm in the third etc (but only the optimal size may be returned - see the second issue).
Usually user passes one-element work array for workspace query. But in ?GEQR and ?GELQ its size should be at least 5 elements and it also may cause issues. So I think the better option here is to fill only the first element in workspace query - optimum size of WORK1 and fill other elements (row block size, column block size, algorithm type, ...) only inside computation call.
In row-major case, LAPACKE_?trexc_work
requires ldq >= n
even when compq
is not 'v'
, that is, when the matrix Q is not referenced. The ldq
parameter is always used conditionally only if compq
is 'v'
, so the ldq >= n
check should be also done only in that case.
In ?TREXC the corresponding check is:
IF( LDQ.LT.1 .OR. ( WANTQ .AND. LDQ.LT.MAX( 1, N ) ) ) THEN
INFO = -6
ZGELSD returns INFO>0 for a matrix of dimension 23068-by-23066, resulting from DLASD4 (root finder) exceeding its maximum number of iterations. The problem is not in DLASD4 per se, but in the merging/sorting/deflation of singular values of (two) submatrices. The subroutine that does that, DLASD7, is failing to do the sorting that is later required by DLASD4. The issue in this case is that the 23068-by-23066 matrix has a very large number (1000’s) of singular values extremely close to each other and the merging/sorting/deflation step is not working as intended. The problem is under investigation; it might be related to the issue #34 created by Jack Poulson on Aug 7, 2016.
Obs. ZGELSD computes the minimum-norm solution to a real linear least squares problem
minimize 2-norm(| b - A*x |)
using the singular value decomposition (SVD), which is computed by a divide-and-conquer algorithm.
LAPACK current shift strategy is known to fail on various pencils. Here are six known pencils, (A1,B1), (A2,B2), (A3,B3) where LAPACK fails
A1 = [
0 0 1 ;
0 1 0 ;
1 0 0
];
B1 = [
2 0 0 ;
0 0 1 ;
0 1 0
];
Pencil (A1,B1) has been sent to us by Zbigniew Leyk, Senior Lecturer, Department of Computer Science, Texas A & M University on Jan 30 2007. See: http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=317
A2 = [
0 0 0 1 ;
1 0 0 0 ;
0 1 0 0 ;
0 0 1 0
];
B2 = [
1 0 0 0 ;
0 1 0 0 ;
0 0 1 0 ;
0 0 0 1
];
Pencil (A2,B2) has been sent to us by Daniel Kressner in 2005 who quote Vasile Sima.
A3 = [
1 1 0;
0 0 -1;
1 0 0
];
B3 = [
-1 0 -1;
0 -1 0;
-1 0 -1
];
Pencil (A3,B3) has been sent to us by Patrick Alken from University of Colorado at Boulder on Apr 24 2007. See: http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=382
A4 = [
1 1e-9 0;
1e-9 1 0;
0 0 1
];
B4 = [
1 0 0;
0 1 0;
0 0 1
];
Pencil (A4,B4) has been sent to us by Michael Baudin from the Scilab project on Tu Oct 28th 2008. This one pencil make ZGGEV fail. DGGEV handles it correctly. http://bugzilla.scilab.org/show_bug.cgi?id=3652
See as well:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4357
And probably also:
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4855
The fixes
a-- Mathworks' patch
Bobby Cheng sent the lapackers the Mathworks patch to fix LAPACK QZ. This patch works fine on the pencils (A1,B1) and (A2,B2). But fails on pencil (A3,B3). As observed by Jim Demmel, matlab qz works fine with pencil (A3,B3).
Date: Thu, 15 Sep 2005 20:35:00 +0200 (CEST)
From: Daniel Kressner
Subject: Re: [Lapackers] Mathworks modif on LAPACKHi,
below are some comments on Mathworks' modifications to DGEBAL and DHGEQZ.
1)Change SCLFAC from 8 to 2 to be the same as LINPACK.
File:
cgebal.f
zgebal.f
sgebal.f
dgebal.fI support the point of view taken by Mathworks. Changing SCLFAC from 8 to 2 may result in a few extra iterations of the balancing algorithm, but it may also give a balanced matrix of slightly smaller norm and consequently one or two more accurate digits in the computed eigenvalues. I am not aware that the execution time needed by balancing could be a major concern when solving nonsymmetric eigenvalues.
3)Change to implicit shift and exceptional shift calculations.
Some problem were not converging.Files:
zhgeqz.f (lines 602-672)
chgeqz.f (lines 602-672)
shgeqz.f (lines 657-667, 819-821)
dhgeqz.f (lines 657-667, 819-821)As far as I can see, Mathworks applied four changes to *HGEQZ (not all of them are marked in the code):
(1) Exceptional shift strategy
As also pointed out by Vasile Sima, the exceptional shift strategy for the QZ algorithm currently implemented in DHGEQZ seems to be less effective than the one for the QR algorithm. For example, it fails for the matrix pair
[ 0 0 0 1 ]
[ 1 0 0 0 ]
A = [ 0 1 0 0 ], B = identity.
[ 0 0 1 0 ]
LAPACK's exceptional shift is zero (this is the same as the standard shift -> no convergence). Mathworks' exceptional shift is one (much better).
(2) Choice of single real shift
DHGEQZ applies a single shift QZ algorithm if the computed shifts are real. This seems to be a relict of Ward's combination shift QZ algorithm, which has no meaningful purpose anymore (applying two single shift QZ iterations is more expensive than applying one double shift QZ iteration, even in terms of flops).
If two real shifts are computed, Mathworks' change has the effect that always the shift closer to the last diagonal element of A\B is used. If I remember correctly, this is in the spirit of what Wilkinson proposed for symmetric matrices. I support this change and expect that it can lead to slightly faster convergence.
(3) Mathworks included B22 = -B22 on line 820 (in DHGEQZ)
This is a bug fix and I highly recommend to include it. It seems that DHGEQZ computes wrong complex conjugate eigenvalue pairs if DLASV2 computes negative diagonal elements (I am not sure whether this is an unlikely event in the context of the QZ algorithm).
(4) Mathworks uses the workspace to store the number of iterations needed for each eigenvalue.
This is probably used for debugging and should not be included in LAPACK.
With best regards,
Daniel
b-- David Day's patch
I found in the lapackers archive a patch from David Day.
From: David Day (Sandia)
Date: Tue Sep 7 09:49:40 2004
Subject: [Lapackers] Re: [Fwd: Convergence failure of LAPACK's zggevx]Dear Jim Demmel:
The reported problem does involve the Exceptional shifts in QZ. Changing the QZ exceptional shift to something similar to what I have advixed for real QR and real QZ does fix this particular bug report. In my version of zhgeqz.f lines 603 read
ELSE
*
* Exceptional shift. Chosen for no particularly good reason.
*
ESHIFT = ESHIFT + DCONJG( ( ASCALE*A( ILAST-1, ILAST ) ) /
$ ( BSCALE*B( ILAST-1, ILAST-1 ) ) )
After these lines, I added
TEMP =
$ ABS(ASCALE*A(ILAST-1,ILAST-2)/(BSCALE*B(ILAST-1,ILAST-1)))+
$ ABS(ASCALE*A(ILAST ,ILAST-1)/(BSCALE*B(ILAST-2,ILAST-2)))
TEMP2 =
$ ABS(ASCALE*A(ILAST-1,ILAST-2)/(BSCALE*B(ILAST-2,ILAST-2)))+
$ ABS(ASCALE*A(ILAST ,ILAST-1)/(BSCALE*B(ILAST-1,ILAST-1)))
ESHIFT = DCMPLX(MIN(TEMP, TEMP2),0)
This changes the value of the shift, now set on line 616
SHIFT = ESHIFT
>Thank you for including me in this discussion.
--David M. Day
c-- Patrick Alken's patch
Date: Wed, 2 May 2007 16:14:55 -0600
From: Patrick Alken
Subject: Re: [Lapack] dggev fails on non-singular system
Hello,
I have done some testing and I have found that the following
patch fixes all the 3x3 and 4x4 matrix systems that I found failures
on:Replace (in dhgeqz.f):
629 IF( ( DBLE( MAXIT )*SAFMIN )*ABS( H( ILAST-1, ILAST) ).LT.
630 $ ABS( T( ILAST-1, ILAST-1 ) ) ) THEN
631 ESHIFT = ESHIFT + H( ILAST-1, ILAST ) /
632 $ T( ILAST-1, ILAST-1 )
633 ELSE
634 ESHIFT = ESHIFT + ONE / ( SAFMIN*DBLE( MAXIT ) )
635 END IF
with:
629 IF( ( DBLE( MAXIT )*SAFMIN )*ABS( H( ILAST, ILAST-1) ).LT.
630 $ ABS( T( ILAST-1, ILAST-1 ) ) ) THEN
631 ESHIFT = ESHIFT + 0.736 * H( ILAST, ILAST-1 ) /
632 $ T( ILAST-1, ILAST-1 )
633 ELSE
634 ESHIFT = ESHIFT + ONE / ( SAFMIN*DBLE( MAXIT ) )
635 END IF
The idea here is the following: in the previous code, its certainly
possible for H(ILAST-1,ILAST) to equal 0, in which case the test
will pass but ESHIFT will not be modified at all. I have just
replaced it with H(ILAST,ILAST-1) which is guaranteed not to be
zero, since its a subdiagonal element and the previous tests will
search for zeros on the subdiagonal. The 0.736 coefficient is
arbitrary - without it, I still found a small number of convergence
failures on 4x4 integer systems.Patrick Alken
In /SRC/ctrevc3.f
instead of
21 * SUBROUTINE CTREVC3( SIDE, HOWMNY, SELECT, N, T, LDT, VL, LDVL,
22 * VR, LDVR, MM, M, WORK, LWORK, RWORK, INFO )
should be
21 * SUBROUTINE CTREVC3( SIDE, HOWMNY, SELECT, N, T, LDT, VL, LDVL,
22 * VR, LDVR, MM, M, WORK, LWORK, RWORK, LRWORK, INFO )
The same in the case of /SRC/ztrevc3.f
The online web pages located at:
http://www.netlib.org/lapack/#_browse_download_lapack_routines_with_on_line_documentation_browser
http://www.netlib.org/blas/#_svn_access
Are linking to the SVN repository.
I did a google search with the following query
site:http://www.netlib.org/ "icl.cs.utk.edu/svn/lapack-dev/lapack/trunk"
Hans
The LAPACKE_WITH_TMG option is not tested on the TRAVIS dashboard.
To minimize surprises after code changes, this should be added to patch tests.
This work can not be completed until pull request #16 is completed.
A large part of [DS]TREXC code deals with the case when a 2x2 block splits into two 1x1 blocks but it is not clear to me when that can happen.
On entry, T is assumed to be block upper triangular with 2x2 diagonal blocks in Schur canonical form. That is, any 2x2 block corresponds to a pair of complex conjugate eigenvalues. When can such a block split into two 1x1 blocks, that is, into a 2x2 block that has two real eigenvalues? One possible explanation is that the code is prepared to handle the case when a 2x2 block is not in Schur canonical form and can have two real eigenvalues which is revealed after the block passes through Dlaexc. Is that so or am I missing something?
In /SRC/cbbcsd.f
Instead of
276 *> RWORK is REAL array, dimension (MAX(1,LWORK))
277 *> On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
Should be:
276 *> RWORK is REAL array, dimension (MAX(1,LRWORK))
277 *> On exit, if INFO = 0, RWORK(1) returns the optimal LRWORK.
The same for /SRC/zbbcsd.f
There is an issue with the routine zheevd when compiled with Intel fortran using the optimization -O2
or -O3
.
The problem occurs at line 285 of dlaed4:
DELTA( J ) = ( D( J )-D( I ) ) - TAU
which evaluates to zero because of the compiler not respecting the parenthesis. This results in a division-by-zero later on. Including -assume protect_parens
fixes the problem.
Here is a simple program which produces the error:
program test
implicit none
integer, parameter :: n=27
integer i,j
integer liwork,lrwork,lwork,info
integer, allocatable :: iwork(:)
real(8), allocatable :: w(:),rwork(:)
complex(8), allocatable :: a(:,:),work(:)
liwork=5*n+3
lrwork=2*n**2+5*n+1
lwork=n**2+2*n
allocate(w(n),a(n,n))
allocate(iwork(liwork),rwork(lrwork),work(lwork))
a(:,:)=0.d0
do i=1,n
a(i,i)=1.d0
do j=i+1,n
a(i,j)=cos(dble(i))
end do
end do
call zheevd('V','U',n,a,n,w,work,lwork,rwork,lrwork,iwork,liwork,info)
print *,'info',info
end program
I'm not sure whether this is a compiler, LAPACK or user issue. I've added -assume protect_parens
to our code (elk.sourceforge.net) and made zheev as the default eigenvalue solver for the moment.
The documentation and the code indicate that N=0 is allowed. However, in that case the error checking for IFST (and ILST) leads to an unavoidable failure:
ELSE IF( IFST.LT.1 .OR. IFST.GT.N ) THEN
INFO = -7
ELSE IF( ILST.LT.1 .OR. ILST.GT.N ) THEN
INFO = -8
END IF
Hi,
ctrevc3.f:
ctrevc3:
Warning on line 596 of ctrevc3.f: inconsistent calling sequences for cgemm,
arg 6 is ALPHA, which is COMPLEX, so I think this is a valid warning, and the actual parameter ONE should be CONE.
Is my analysis correct?
Similar for ztrevc3.f.
[*] Yeah, I know that f2c cannot translate all of the latest LAPACK, but for the platform I'm targeting that's the best I have.
Kees van Reeuwijk
DSTEMR may return INFO>0. There was a bug in LAPACK 3.5.0 that was troublesome: a matrix of dimension 100 (a sort of perturbation of the identity matrix, provided by the developers of the R package), caused a disastrous infinite loop in one of the subroutines in the calling tree of DSTEMR (symmetric tridiagonal eigensolver based on the MRRR algorighm). That was because of NaNs that the safeguard mechanisms implemented therein failed to deal with. As a fix, the value of a logical flag that was related to the infinite loop was changed, starting with LAPACK 3.6.0. With that change the risk of an infinite loop has disappeared. However, we found out that that change to the logical variable could lead to unexpected consequences: DSTEMR may return INFO>0 (indicating an internal error in the auxiliary subroutine DLARRV) for some matrices that are not particularly “pathologic.” We are in the process of redesigning a few of the underpinnings of DSTEMR (including DLARRV) to make it more robust.
What do the version numbers in each file correspond to? Do they have a meaning or are they meant to be updated at each release?
Just a suggestion of changing at 'make.inc.example' file, to replace the line:
by
With 'MAKEDEPRECATED = Yes' the deprecated routines are not build.
This is related to the subroutines xLABAD.
SLABAD takes as input the values computed by SLAMCH for underflow and
overflow, and returns the square root of each of these values if the
log of LARGE is sufficiently large. This subroutine is intended to
identify machines with a large exponent range, such as the Crays, and
redefine the underflow and overflow limits to be the square roots of
the values computed by SLAMCH. This subroutine is needed because
SLAMCH does not compensate for poor arithmetic in the upper half of
the exponent range, as is found on a Cray.
Is that still relevant on current Cray machines?
I would assume all machines are using IEEE and such a routine is obsolete by now.
Testing should use git repository.
http://my.cdash.org/index.php?project=LAPACK
From E-mail correspondence
We are transitioning to GIT. I did a grep
of SVN in the LAPACK directory, and it seems to me that CMAKE is still using the old
SVN repository. I think it would be more appropriate if CMAKE would use GIT. By any chance, is it something with which you could help us?
It is for the CTEST Dashboard. http://my.cdash.org/index.php?project=LAPACK&date=2016-07-25
We used it from time to time, especially around release time.
The documentation of 'work' in sgesvj states that it should be of size max(4,M+N), whereas de documentation of 'lwork' in sgesvj states that it should be 'length of WORK, WORK >= MAX(6,M+N)'.
First of all, this is bad phrasing: WORK >= MAX(6,M+N) is an incorrect expression; clearly len(WORK) >= MAX(6, M+N) is intended. Second, this contradicts the documentation of 'work'. max(6, M+N) seems to be correct, because the documentation of 'work' mentions that the first six values of 'work' have meaning on exit.
There looks like a bug in SGESVDX in Lapack3.6.1 (also in 3.6.0):
At the beginning, SGESVDX check the NORM of A and then calls SLASCL to modify the values of the elements if they are extreme (line 506-514). At the end calls SLASCL to adjust back the values of elements of S if A has been modified (line 817-814). The problem is that, in many cases, S is not changed even A is modified. That means that the modification in A does not get transfered into S. There should be no adjustment need for S. And this unnecessary adjustment causes overflow or underflow in many cases. An exampleis:
JOBU='V', JOBVT='V', RANGE='V', N=1, and A and S have some big numbers. A gets modifiled but S does not change (look at the call to SBDSVDX). Calling to SLASCL at the end causes overflow.
Workaround: check if S is changed before calling to SLASCL at the end. This may not be real fix.
Thanks,
Shandong Lao
Hi, I built lapack with xlf on a ppc64le (POWER8) machine. I used the options in INSTALL\make.inc.XLF
, although for my BLASLIB
, I switched to ../../librefblas.a
.
During make
, I encountered the following error:
** zunt01 === End of Compilation 1 ===
"zunt01.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program. Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510 Compilation successful for file zunt01.f.
xlf -O3 -qfixed -qnosave -c zunt03.f -o zunt03.o
** zunt03 === End of Compilation 1 ===
"zunt03.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program. Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510 Compilation successful for file zunt03.f.
\
xlf -qnosave -o xeigtstz \
zchkee.o zbdt01.o zbdt02.o zbdt03.o zbdt05.o zchkbb.o zchkbd.o zchkbk.o zchkbl.o zchkec.o zchkgg.o zchkgk.o zchkgl.o zchkhb.o zchkhs.o zchkst.o zckcsd.o zckglm.o zckgqr.o zckgsv.o zcklse.o zcsdts.o zdrges.o zdrgev.o zdrges3.o zdrgev3.o zdrgsx.o zdrgvx.o zdrvbd.o zdrves.o zdrvev.o zdrvsg.o zdrvst.o zdrvsx.o zdrvvx.o zerrbd.o zerrec.o zerred.o zerrgg.o zerrhs.o zerrst.o zget02.o zget10.o zget22.o zget23.o zget24.o zget35.o zget36.o zget37.o zget38.o zget51.o zget52.o zget54.o zglmts.o zgqrts.o zgrqts.o zgsvts3.o zhbt21.o zhet21.o zhet22.o zhpt21.o zhst01.o zlarfy.o zlarhs.o zlatm4.o zlctes.o zlctsx.o zlsets.o zsbmv.o zsgt01.o zslect.o zstt21.o zstt22.o zunt01.o zunt03.o dlafts.o dlahd2.o dlasum.o dlatb9.o dstech.o dstect.o dsvdch.o dsvdct.o dsxt1.o alahdg.o alasum.o alasvm.o alareq.o ilaenv.o xerbla.o xlaenv.o chkxer.o ../../libtmglib.a \
../../liblapack.a ../../librefblas.a && mv xeigtstz ../xeigtstz
make[2]: Leaving directory `/home/u0017592/projects/lapack/TESTING/EIG'
NEP: Testing Nonsymmetric Eigenvalue Problem routines
./xeigtstz < nep.in > znep.out 2>&1
/bin/sh: line 1: 29872 Segmentation fault ./xeigtstz < nep.in > znep.out 2>&1
make[1]: *** [znep.out] Error 139
make[1]: Leaving directory `/home/u0017592/projects/lapack/TESTING'
make: *** [lapack_testing] Error 2
Any idea what the problem might be? Is reference lapack tested on ppc64le architecture?
Dear Development Team ,
is it possible to have a manpage file that includes
only the manual and not also all the Doxygen
extra stuff as on previous release ?
$ ls -sh manpages-3.6.0.tgz
1.4M manpages-3.6.0.tgz
$ ls -sh manpages-3.6.1.tgz
9.6M manpages-3.6.1.tgz
All the additional Doxygen stuff does not belong to a normal man page:
...
Definition at line 152 of file zupmtr&.f&.
.PP
.nf
152 *
153 * -- LAPACK computational routine (version 3&.4&.0) --
154 * -- LAPACK is a software package provided by Univ&. of Tennessee, --
155 * -- Univ&. of California Berkeley, Univ&. of Colorado Denver and NAG Ltd&.&.--
156 * November 2011
[D,S]GEBD2 routines are being checked in /TESTING/EIG/[d,s]errbd.f.
For example:
$ less /TESTING/EIG/derrbd.f
. . . . .
91 * .. External Subroutines ..
92 EXTERNAL CHKXER, DBDSDC, DBDSQR, DBDSVDX, DGEBD2,
93 $ DGEBRD, DORGBR, DORMBR
. . . . .
143 *
144 * DGEBD2
145 *
146 SRNAMT = 'DGEBD2'
147 INFOT = 1
148 CALL DGEBD2( -1, 0, A, 1, D, E, TQ, TP, W, INFO )
149 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
150 INFOT = 2
151 CALL DGEBD2( 0, -1, A, 1, D, E, TQ, TP, W, INFO )
152 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
153 INFOT = 4
154 CALL DGEBD2( 2, 1, A, 1, D, E, TQ, TP, W, INFO )
155 CALL CHKXER( 'DGEBD2', INFOT, NOUT, LERR, OK )
156 NT = NT + 3
Also
*> SERRBD tests the error exits for SGEBD2, SGEBRD, SORGBR, SORMBR,
*> SBDSQR, SBDSDC and SBDSVDX
It seems, the same should be provided for CGEBD2 and ZGEBD2.
DORCSD2BY1:
The help text for DORCSD2BY1 (and similarly for SORCSD2BY1, CUNCSD2BY1, ZUNCSD2BY1) does not tell me exactly how to build the output matrices from LAPACK's output. It gives the following formula:
[ I 0 0 ]
[ 0 C 0 ]
[ X11 ] [ U1 | ] [ 0 0 0 ]
X = [-----] = [---------] [----------] V1**T .
[ X21 ] [ | U2 ] [ 0 0 0 ]
[ 0 S 0 ]
[ 0 0 I ]
and defines that X is M-by-Q, X11 is P-by-Q, and thus X12 is (M-P)-by-Q, and U1, U2, and V1 are P-by-P, (M-P)-by-(M-P), and Q-by-Q, respectively. The help text also defines that the matrices C and S are R-by-R, with R = MIN(P,M-P,Q,M-Q). But the sizes of the other blocks in the formula are not defined.
After quite some experimentation, I think that the top left matrix I should have size T-by-T, with T = max(Q+P-M, 0), and the bottom right matrix I should have size K-by-K, with K = max(Q-P, 0). The sizes of the other blocks can be derived from these two blocks, but finding these two formulas took me quite a long time on my own, and should really be documented directly.
The newly added rook3 codes which is based on BLAS-level 3 are named *_rk
while the older rook routines are named *_rook
.
As I suggested in #82, I think they should be named more similarly (here is a suggestion which may not at all reflect their intend and actual code):
The VU
argument in *LARRV is currently not being used to bound the eigenvalue calculation.
Is this omitted on purpose?
Furthermore, there is no check whether VL < VU
as prescribed in the documentation.
$less /TESTING/EIG/serrgg.f
. . . . .
80 INTEGER DUMMYK, DUMMYL, I, IFST, ILO, IHI, ILST, INFO,
81 $ J, M, NCYCLE, NT, SDIM, LWORK, JDUM
Should be:
80 INTEGER DUMMYK, DUMMYL, I, IFST, ILO, IHI, ILST, INFO,
81 $ J, M, NCYCLE, NT, SDIM, LWORK
I think it would be nice to have the lapack repo track the releases of lapack, consistently for future easy backtracks.
I have tracked the 3.6.0 and 3.6.1 versions down to:
v3.6.0: 5944165
v3.6.1: 5efe359
To confirm I have downloaded lapack-3.6.0.tgz and lapack-3.6.1.tgz and performed:
diff -r git-lapack/ lapack-3.6.X
and there where no differences.
I had difficulties from 3.5.0 due to inconsistent history. For instance I get the minimal diff:
using:
5eb7207
but there is one patch missing (introduced in: 1039d78)
But there are commits in between, so an exact hash is not available.
I haven't tried any versions prior to 3.5.0.
PS. I have added (annotated) tags on my fork:
https://github.com/zerothi/lapack/releases
but one cannot create PR for equivalent branches (assigning tags does not change history).
I hope you will consider adding the versions.
In DLASWP, INCX
is the increment for IPIV
and can be positive or negative. If negative, the swaps are applied from K2
down to K1
, otherwise from K1
to K2
. From the documentation it is understood that the same elements of IPIV
are accessed in both cases.
However, the implementation behaves differently. When INCX>0
, the first accessed element is K1
, then K1+INCX
, ... until K1+(K2-K1)*INCX
. When INCX<0
, the first accessed element is
1+(1-K2)*INCX
when in fact it should be
K1+(K1-K2)*INCX
It seems that the 1
is missing a K
in front. All the calls to DLASWP with INCX=-1
luckily use K1=1
in LAPACK.
The documentation for ?LAQR2 says:
*> accumulated into Z(ILOZ:IHIZ,ILO:IHI) from the right.
It should say (as in ?DLAQR3):
*> accumulated into Z(ILOZ:IHIZ,ILOZ:IHIZ) from the right.
Hi,
Since I see that you're interested in unused symbols in the sources, here is a list of exactly 101 of them. I think some of them have been covered by other issues, but I don't think all of them have.
Kees van Reeuwijk
In clarfb.f the name of the parameter for the size of WORK is LDWORK.
LDWORK has been changed to LWORK everywhere else. It seems, to follow the internal convention,
the size of WORK should be named LWORK too.
Checking the why the Travis build #99.2 of a patch failed only on MacOS, I noticed that the binary jq
is missing and consequently, the environment variable BRANCH is always empty:
$ export BRANCH=$(if [ "$TRAVIS_PULL_REQUEST" == "false" ]; then echo $TRAVIS_BRANCH; else echo `curl -s $PR | jq -r .head.ref`; fi)
/Users/travis/build.sh: line 45: jq: command not found
(23) Failed writing body
The documentation for LDQ says that
LDQ >= max(1,N).
irrespective of the value of COMPQ. The code actually takes COMPQ into account:
IF( LDQ.LT.1 .OR. ( WANTQ .AND. LDQ.LT.MAX( 1, N ) ) )
It seems that the documentation should be updated to:
LDQ >= 1, and if COMPQ = 'V', LDQ >= max(1,N).
LAPACKE_?larfb_work
passes work
and ldwork
to the Fortran routine unmodified even if the matrix layout is row major. This is extremely surprising bordering a bug because it means that the work matrix has to be column major while all the other matrices are treated as row major. Interestingly, LAPACKE_?larfb
follows the game and the work matrix it allocates (due to ldwork
it uses) is column major.
I might be wrong but the only reasonable solution I see is not to allocate work in LAPACKE_?larfb
and allocate it in LAPACKE_?larfb_work
instead while ignoring work
and ldwork
parameters passed to it.
It seems that LAPACKE_?tprfb_work
and LAPACKE_?trsna_work
suffer from the same problem.
hello,
while creating the shared version of latest lapack (3.7.0), I get:
liblapack.a(ztplqt.o): In function ztplqt_': ztplqt.f:(.text+0x0): multiple definition of
ztplqt_'
liblapack.a(ztplqt.o):ztplqt.f:(.text+0x0): first defined here
liblapack.a(ztplqt2.o): In function ztplqt2_': ztplqt2.f:(.text+0x0): multiple definition of
ztplqt2_'
liblapack.a(ztplqt2.o):ztplqt2.f:(.text+0x0): first defined here
liblapack.a(ztpmlqt.o): In function ztpmlqt_': ztpmlqt.f:(.text+0x0): multiple definition of
ztpmlqt_'
liblapack.a(ztpmlqt.o):ztpmlqt.f:(.text+0x0): first defined here
Actually, I can find a duplicate line in both SRC/Makefile and
SRC/CMakeLists.txt:
SRC/Makefile:
...
ztplqt.o ztplqt2.o ztpmlqt.o \ <===== DUP
zgelqt.o zgelqt3.o zgemlqt.o
zgetsls.o zgeqr.o zlatsqr.o zlamtsqr.o zgemqr.o
zgelq.o zlaswlq.o zlamswlq.o zgemlq.o
ztplqt.o ztplqt2.o ztpmlqt.o \ <===== DUP
...
SRC/CMakeLists.txt:
...
ztplqt.f ztplqt2.f ztpmlqt.f <===== DUP
zgelqt.f zgelqt3.f zgemlqt.f
zgetsls.f zgeqr.f zlatsqr.f zlamtsqr.f zgemqr.f
zgelq.f zlaswlq.f zlamswlq.f zgemlq.f
ztplqt.f ztplqt2.f ztpmlqt.f <===== DUP
...
Removing the line duplication fixes everything for me.
ciao
gabriele
These tests are failing on GCC compilers 4.9.1
version 4.9.1 20140922 (Red Hat 4.9.1-10) (GCC)
https://travis-ci.org/Reference-LAPACK/lapack/jobs/147765341
13 - CBLAS-xscblat1 (Failed)
14 - CBLAS-xscblat2 (Failed)
15 - CBLAS-xscblat3 (Failed)
16 - CBLAS-xdcblat1 (Failed)
17 - CBLAS-xdcblat2 (Failed)
18 - CBLAS-xdcblat3 (Failed)
19 - CBLAS-xccblat1 (Failed)
20 - CBLAS-xccblat2 (Failed)
21 - CBLAS-xccblat3 (Failed)
22 - CBLAS-xzcblat1 (Failed)
23 - CBLAS-xzcblat2 (Failed)
24 - CBLAS-xzcblat3 (Failed)
I verified these failures on a RHEL6 machine using These tests are failing on GCC compilers 4.9.1
version 4.9.1 20140922 (Red Hat 4.9.1-10) (GCC).
$ cat t.f
PROGRAM BUG_STGSNA
IMPLICIT NONE
INTEGER INFO, LWORK, M, MM, IWORK(20), i, j
REAL A(2,2), B(2,2), RWORK( 40 )
REAL VL(2,4), VR(2,4), S(2), DIF(2)
COMPLEX C( 2, 2 ), U( 2, 2 ), VT( 2, 2 )
COMPLEX WORK( 20 )
LOGICAL SELECT(2)
C(1,1) = (1.0,1.0)
C(2,1) = 1.0
C(1,2) = (-1.0,1.0)
C(2,2) = - 1.0
LWORK = 40
CALL CGESVD('A','A',2,2,C,2,S,U,2,VT,2,WORK,LWORK,RWORK,INFO)
SELECT(1) = .TRUE.
SELECT(2) = .TRUE.
A(1,1) = 1.0
A(1,2) = 1.0
A(2,1) = -A(1,2)
A(2,2) = A(1,1)
B(1,1) = 1.0
B(1,2) = 0.0
B(2,1) = 0.0
B(2,2) = 1.0
PRINT 10
PRINT 11,((A(i,j),j=1,2),i=1,2)
PRINT 12
PRINT 11,((B(i,j),j=1,2),i=1,2)
PRINT 13, S(2)
MM = 2
CALL STGEVC('B','S',SELECT,2,A,2,B,2,VL,2,VR,2,
& MM,M,RWORK,IWORK,INFO)
CALL STGSNA('B','S',SELECT,2,A,2,B,2,VL,2,
&VR,2,S,DIF,MM,M,RWORK,LWORK,IWORK,INFO)
PRINT 14, DIF(2)
10 FORMAT(1X,'INPUT matrix A')
11 FORMAT(1X,2F5.0)
12 FORMAT(1X,'INPUT matrix B')
13 FORMAT(1X,'Correct DIF(after CGESVD) ',F5.2)
14 FORMAT(1X,'DIF after STGSNA ',F5.2)
END
The result is:
INPUT matrix A
1. 1.
-1. 1.
INPUT matrix B
1. 0.
0. 1.
Correct DIF(after CGESVD) 0.87
DIF after STGSNA 0.62
Instead of the correct result:
INPUT matrix A
1. 1.
-1. 1.
INPUT matrix B
1. 0.
0. 1.
Correct DIF(after CGESVD) 0.87
DIF after STGSNA 0.87
Solution: correct in stgsna.f and dtgsna.f: (lines 637-639).
Instead of
637 ROOT1 = C1 + SQRT( C1*C1-4.0*C2 )
638 ROOT2 = C2 / ROOT1
639 ROOT1 = ROOT1 / TWO
Should be:
637 ROOT1 = C1 + SQRT( C1*C1-FOUR*C2 )
638 ROOT1 = ROOT1 / TWO
639 ROOT2 = C2 / ROOT1
The difference between xyyEQU and xyyEQUB is usually that the latter avoids round-off errors by returning scaling matrices who entries are powers of the floating point radix and the xGEEQUB documentation explains the difference very well. The xPOEQU and xPOEQUB obey the same naming rule but the documentation does not highlight the difference so you have to look at the source code to understand the difference. Seeing that xSYEQUB, xHEEQUB do not return radix powers, the routines are misnomers.
I am currently preparing a patch for the xPOEQUB documentation that highlights the difference to xPOEQU.
The Sca/LAPACK Program Style guide forbids breaking backward compatibility (see here) but clearly, it is a misnomer. In face of the broken documentation of xSYEQUB, xHEEQUB, I wonder if anyone is actually using this routine. In addition, these routines are not implemented in Intel MKL so I propose to ignore the backwards compatibility issue and just start rounding the entries of the scaling matrix to radix powers.
$ less sgejsv.
.
SUBROUTINE SGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
M, N, A, LDA, SVA, U, LDU, V, LDV,
WORK, LWORK, IWORK, INFO )
LDV is the 15th argument.
If the input value of LDV is incorrect, there should be a message that the
15th argument is incorrect, but there is a message about the 14th parameter,
because
564 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
565 INFO = - 14
Should be:
564 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
565 INFO = - 15
The same for sgejsv.f and dgejsv.f - incorrect, but correct for cgejsv.f and zgejsv.f.
In a very simplified way, I created the following function in C that uses the Generalized SVD function dggsvd3.
[U,V,Q,C,S] = my_gsvd_in_C(A,B), where the inputs are:
A = [[ 1., 1., 2.],
[ 0., 1., 3.],
[ 0., 4., 3.]]
B = [[ 0.2, 1.4, 2. ],
[ 0.4, 1. , 3.2],
[ 0.3, 4.5, 3.3]]
The outputs that I get are the following:
U = [[-0.45904757, 0.22321234, -0.8599137 ],
[-0.88154988, -0.23451243, 0.40972397],
[-0.11020501, 0.94613961, 0.30442518]]
V = [[-0.38233328, 0.1538968 , 0.91111856],
[-0.92047714, -0.149751 , -0.36096602],
[-0.0808894 , 0.97667313, -0.19891328]]
C = [ 0.69954115, 0.64582169, 0.99967773]
S = [ 0.71459232, 0.76348828, 0.02538585]
Q = [[-0.09327252, -0.55839886, 0.82431241],
[ 0.70757717, 0.54528323, 0.44944494],
[ 0.70045328, -0.6251855 , -0.34425035]]
My problem is with the output of Q.
This output is completely different from the one from MATLAB.
Calling the function gsvd from MATLAB, with the same inputs, I have the following value for Q
Q =
0.3456 -0.6562 0.8602
5.8426 -2.5466 -0.7678
3.9969 -5.5656 -0.4228
I don't know what is going on. I can send my C code if needed.
Sorry for any duplicate message, I posted the same issue in the LAPACK forum.
Thanks in advance for any answer
The documentation for the deflation process in {s,d}lasd2 claims that the deflated singular values are sorted in increasing order, but they are in fact more-or-less sorted in decreasing order, with this constraint broken in certain cases when there was a small update vector entry deflation before a deflation of index JPREV due to a small diagonal difference deflation. As can be seen from {s,d}lasd1's call to {s,d}lamrg, the deflated portion of D is claimed to be in decreasing order, which turns out not to be necessarily true for the mentioned reason. I would therefore recommend calling DLASRT rather than DLAMRG to avoid breaking the assumption that the two subsets are already sorted.
Also, there is no d(j)-d(0) <= tol check within the initial deflation loop starting at line 425 of dlasd2. One could easily fix this by checking if d(j) is less than the tolerance rather than after checking if |z(j)| is, but this requires a Givens rotation to mix the first and j'th columns of U (and V), which destroys the fact that the first column of U2 would only have a single nonzero entry (in general, said column would become dense), so the packing logic would need to be changed to incorporate a rank-one update rather than copying a single row of the secular left singular vectors into the updated left singular vectors.
Further, the writes to DSIGMA during the deflation loop in {s,d}lasd2 are superfluous given the subsequent filling of DSIGMA at line 555 of dlasd2.
Further, no absolute value should be required on line 457 of dlasd2 (and the analogue in slasd2) since the diagonal entries are already sorted.
Further, the check on line 568 of dlasd2 of whether abs(DSIGMA(2)) <= tol/2 is strange for two reasons:
Also, the documentation for U and V in {s,d}lasd3 incorrectly suggests that only the last N-K vectors are relevant, when, on exit, they provide the singular vectors for the merged bidiagonal problem.
For row-major matrices the function uses lda_t = max(1,lda)
which is wrong, it should use the number of rows. The number of rows is however not readily available, only the lower bound can be extracted from ipiv
. The lower bound should be enough for the purpose of the function (to do the matrix copy), and it seems to me that it is actually the only way to implement this function at all.
As a side note, an issue with the number of rows that is not directly available in lapacke_?laswp
has arisen before. In 834c77d it was handled by omitting the NaN check.
Over at OpenBLAS we received a report OpenMathLib/OpenBLAS#1056 stating that mingw fails to build LAPACKE as the list of object files in LAPACKE/src/Makefile is too long for the ar on that platform to handle. Would you consider a PR to split the object list as it is currently done in https://github.com/xianyi/OpenBLAS/blob/develop/lapack-netlib/LAPACKE/src/Makefile ?
Compiler warnings are generated for:
~/lsrc/lapack-bld> gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
[ 8%] /user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_dblat1.f:214.48:
CALL STEST1(DNRM2TEST(N,SX,INCX),STEMP,STEMP,SFAC)
1
Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_dblat1.f:218.48:
CALL STEST1(DASUMTEST(N,SX,INCX),STEMP,STEMP,SFAC)
1
Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
[ 8%] Building C object CBLAS/testing/CMakeFiles/xscblat2.dir/c_sblas2.c.o
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_sblat1.f:214.48:
CALL STEST1(SNRM2TEST(N,SX,INCX),STEMP,STEMP,SFAC)
1
Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
/user/iibi/johnsonhj/lsrc/lapack/CBLAS/testing/c_sblat1.f:218.48:
CALL STEST1(SASUMTEST(N,SX,INCX),STEMP,STEMP,SFAC)
1
Warning: Rank mismatch in argument 'strue1' at (1) (scalar and rank-1)
Apart from #37, there is another bug in LAPACKE_?larfb_work
. When transposing the input matrices, in the last, fourth, case when storev
is 'r'
, the check for direct
should compare it to 'b'
, not to 'f'
.
All of my recent pull requests are failing to build on Mac with this error
No CMAKE_Fortran_COMPILER could be found.
Did the environment change? I see this in build 99:
$ export CXX=g++
$ export CC=gcc
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
and this in build 100:
$ export CXX=g++
$ export CC=gcc
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
There looks like a bug in SGESVDX in Lapack3.6.1 (also in 3.6.0):
At the beginning, SGESVDX check the NORM of A and then calls SLASCL to modify the values of the elements if they are extreme (line 506-514). At the end calls SLASCL to adjust back the values of elements of S if A has been modified (line 817-814). The problem is that, in many cases, S is not changed even A is modified. That means that the modification in A does not get transfered into S. There should be no adjustment need for S. And this unnecessary adjustment causes overflow or underflow in many cases. An exampleis:
JOBU='V', JOBVT='V', RANGE='V', N=1, and A and S have some big numbers. A gets modifiled but S does not change (look at the call to SBDSVDX). Calling to SLASCL at the end causes overflow.
Workaround: check if S is changed before calling to SLASCL at the end. This may not be real fix.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.