gpufort's Introduction

GPUFORT

This project develops a source-to-source translation tool that is able to convert:

  1. Fortran+OpenACC and CUDA Fortran -> Fortran + OpenMP 4.5+
  2. Fortran+OpenACC and CUDA Fortran -> Fortran + [GCC/AOMP OpenACC/MP runtime calls] + HIP C++

The result of the first translation process can be compiled with AOMP, which has a Fortran frontend. The result of the second translation process can be compiled with hipfort or a combination of hipcc and gfortran. Note that an OpenACC runtime is only necessary for translating OpenACC code.

An overview of the different translation paths that we work on is shown below:

[Image: overview of the translation paths]

NOTE: GPUFORT is a research project. We made it publicly available because we believe that it might be helpful for some. We want to stress that the code translation and code generation outputs produced by GPUFORT will in most cases require manual reviewing and fixing.

Installation and usage

Please take a look at the (slightly outdated) user guide.

Implementation details

This presentation gives an overview of GPUFORT's building blocks.

Limitations

  • GPUFORT is not a compiler (yet)

GPUFORT is not intended to be a compiler. Its main purpose is to be a translator that allows an experienced user to fix and tune the outcomes of the translation process. However, we believe GPUFORT can develop into an early-outlining compiler if enough effort is put into the project. Given that all code, and especially the grammar, is written in python3, GPUFORT can be developed at a quick pace.

  • GPUFORT assumes syntactically and functionally correct input

GPUFORT performs only a small number of syntax checks because we assume that developers apply GPUFORT to code that runs correctly on CUDA devices. (We plan to add an option for specifying a user-defined syntax checker tool.)

  • GPUFORT does a poor job of analyzing which code parts can be offloaded and which cannot
  • GPUFORT does a poor job of reorganizing loops and assignments to maximize the available parallelism

While both would be possible as the translator works with a tree structure, we simply have not started to implement much in this direction yet.

  • GPUFORT does not implement the full OpenACC standard (yet)

GPUFORT was developed to translate a number of HPC apps to code formats that are well supported by AMD's ROCm ecosystem. The development of GPUFORT is steered by the requirements of these applications.

Fortran-C Interoperability Limitations

To interface generated HIP C++ kernels with the original Fortran code, GPUFORT relies on the iso_c_binding interoperability mechanisms that were added to the Fortran language with the Fortran 2003 standard. Please be aware that the interoperability of C structs and Fortran derived types is still quite limited to this date:

  • "Derived types with the C binding attribute shall not have the sequence attribute, type parameters, the extends attribute, nor type-bound procedures."
  • "Every component must be of interoperable type and kind and may not have the pointer or allocatable attribute. The names of the components are irrelevant for interoperability."

(Source: https://gcc.gnu.org/onlinedocs/gfortran/Derived-Types-and-struct.html)

We are currently investigating what workarounds could be automatically applied. Until then, you have to modify your code manually to circumvent the above limitations.

Currently supported features:

  • ACC:
    • ACC2OMP & ACC2HIP
    • Translation of data directives: !$acc enter data, !$acc exit data, !$acc data
    • Synchronization directives: !$acc wait, !$acc update self/host/device
    • Kernel and loop constructs: !$acc kernels plus !$acc loop in subsequent line, !$acc kernels loop, !$acc parallel plus !$acc loop in subsequent line, !$acc parallel loop, !$acc loop
    • Support for !$acc routine seq functions with scalar arguments
  • CUF:
    • CUF2HIP
      • Majority of CUDA library functionality via HIPFORT
      • Kernel and loop constructs: !$cuf kernel do
      • Overloaded intrinsics: allocate, allocated, deallocate, deallocated, =
      • Support for CUDA Fortran attributes(global) (array and scalar arguments), and attributes(host,device), attributes(device) procedures (only scalar arguments supported for the latter)

(List is not complete ...)

Planned features (or: "more limitations")

  • Current work focuses on:
    • ACC:
      • Initial support for !$acc declare (detected but not considered in codegen yet)
      • Improve support for !$acc parallel (loop)
      • Add support for !$acc parallel without !$acc loop in next line
        • Results in gang parallelism
      • Add support for !$acc kernels without !$acc loop in next line
        • Auto detection of offloadable code parts
      • Rewrite of GPUFORT Fortran runtime in (HIP) C++
    • ACC/CUF:
      • Support of derived types with allocatable, pointer members
  • Planned:
    • Add option for prescribing syntax checker (e.g. use other compiler for syntax checks.)

gpufort's People

Contributors

domcharrier, reger-men

gpufort's Issues

Allow to disable output prettification and debug code generation

  • Add options that prevent generation of debug code into the HIP C++ kernels
  • Add options to disable prettification of generated HIP C++ code and formatted
    Fortran source

Background:

  • Some applications might want to use GPUFORT as a preprocessing step in an automated
    compilation process. Output formatting is not needed in this case.

Parsing error with comment after line continuation character

The following code will generate a parsing error:

program test1
  integer, parameter :: ims = 12,ime=16,jms=1,jme=10
  REAL, DIMENSION(ims:ime, jms:jme), INTENT(OUT) :: & ! Does not like a comment here
    HEAT2D ! What about here
end program test1
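One way to make the continuation character visible to a parser is to strip the trailing comment first. The following is a minimal Python sketch of that idea, not GPUFORT's actual code; the function names `strip_trailing_comment` and `ends_with_continuation` are hypothetical. It assumes free-form source and that directive lines such as `!$acc ...` are filtered out beforehand, since they also start with `!`.

```python
def strip_trailing_comment(line: str) -> str:
    """Return `line` without a trailing `!` comment.

    A `!` inside a single- or double-quoted string literal is kept.
    Assumption: the line is not a directive line (e.g. `!$acc ...`).
    """
    quote = None  # quote character of the string literal we are inside, if any
    for pos, char in enumerate(line):
        if quote is not None:
            if char == quote:
                quote = None  # string literal closed (a doubled quote reopens it)
        elif char in "'\"":
            quote = char
        elif char == "!":
            return line[:pos].rstrip()
    return line.rstrip()


def ends_with_continuation(line: str) -> bool:
    """True if the statement part of `line` ends with the continuation character `&`."""
    return strip_trailing_comment(line).endswith("&")
```

With this preprocessing, the line ending in `:: & ! Does not like a comment here` is reduced to a line ending in `&`, so it can be detected as continued.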

[CUF] Reenable global/device subroutines

  • Translation of CUDA Fortran global subroutines is currently disabled

  • Planned code generation:

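| Procedure type      | Qualifier            | Fortran launcher call | Fortran launcher interface | HIP C++ launcher function | HIP C++ device function | HIP C++ global function |
|---------------------|----------------------|-----------------------|----------------------------|---------------------------|-------------------------|-------------------------|
| Function/Subroutine | host / None          |                       |                            |                           |                         |                         |
| Function/Subroutine | device / host,device |                       |                            |                           | Y                       |                         |
| Subroutine          | global               | Y                     | Y                          | Y                         |                         | Y                       |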

Update(05/26/2021):

An initial implementation was pushed to the main branch and an example vector-add kernel
was added as a demonstrator. A current limitation is that kernels can only call device and host-device functions
with scalar parameters.

How to handle device functions?

  • Associate generated HIP C++ files with modules / programs to allow inclusion of device routines or ...
  • Inline everything straightforwardly?

The latter option seems to be the preferred one as it would also allow
to handle gang/worker/vector accelerator routines. Would require indexer to record whole function definitions.

Allow to apply macros

  • as a first step, we will allow the user to predefine a macro
  • as second step, we will discover macros directly in the source code

Reenable CUDA Fortran

  • GPUFORT has seen dramatic changes in the past to enable OpenACC
  • We need to reassess the CUDA Fortran porting capabilities of the code
  • We further need to adopt the new HIPFORT Fortran interfaces

Add "convert only compute constructs" option (and the alternative)

  • Allows to convert only compute constructs and manually tune them while keeping all the other
    directives as directives in the code.
  • The other directives can then be converted automatically by a preprocessor when
    compiling (add a separate option that excludes compute constructs).
  • This would increase the maintainability of GPUFORT-translated applications.

No support for derived type extensions

Issue:

Fortran derived types can be based on other types.
GPUFORT does not support this yet.

Background/Example:

type mytype1
   real :: a
end type
type, extends(mytype1) :: mytype2
   real :: b
end type

mytype2 inherits the member field a from mytype1.

Assessment:

Issue has low priority as we did not see such code yet in any HPC
application that we tried to port.

Be careful about datatype of number constants

  • Double/custom precision numbers have a suffix that indicates the desired number format.
  • Numbers without a suffix need to be translated to the default real (=float if not redefined via compiler option).
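The two rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not GPUFORT's implementation: the function name `real_literal_to_cpp`, the `kinds` mapping from kind suffix to `"float"`/`"double"`, and the `default_real_is_double` switch (standing in for a compiler option that redefines the default real) are all hypothetical.

```python
import re

# A Fortran real literal: mantissa, optional exponent (e/d), optional kind suffix.
REAL_LITERAL = re.compile(
    r"(?P<mantissa>\d+\.\d*|\.\d+|\d+(?=[edED]))"
    r"(?:(?P<expchar>[edED])(?P<exp>[+-]?\d+))?"
    r"(?:_(?P<kind>\w+))?"
)


def real_literal_to_cpp(literal, kinds=None, default_real_is_double=False):
    """Translate one Fortran real literal into a C++ floating-point literal."""
    kinds = kinds or {}
    match = REAL_LITERAL.fullmatch(literal.strip())
    if match is None:
        raise ValueError(f"not a Fortran real literal: {literal!r}")
    exp_char = (match["expchar"] or "").lower()
    if exp_char == "d":
        is_double = True  # `d` exponent always means double precision
    elif match["kind"] is not None:
        # Kind suffix: look it up in the user-supplied (hypothetical) mapping.
        is_double = kinds[match["kind"].lower()] == "double"
    else:
        is_double = default_real_is_double  # no suffix: default real
    text = match["mantissa"]
    if match["exp"] is not None:
        text += "e" + match["exp"]
    # C++ literals without suffix are double; `f` marks float.
    return text if is_double else text + "f"
```

For example, `1.0d0` becomes `1.0e0`, while a plain `1.0` becomes `1.0f` unless the default real is declared to be double.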

[ACC] Handle !$acc declare

!$acc declare is not handled yet.

  • "A declare directive is used in the declaration section of a Fortran subroutine,
    function, or module."

  • "In a Fortran module declaration section, only create, copyin, device_resident, and
    link clauses are allowed."

For more details on the directive, see "2.13. Declare Directive" in the specification

Implementation approach:

  • Host functions & subroutines:

    • Track function/subroutine begin and declarations in scanner tree & indexer
    • Consider !$acc declare in indexer
    • Track return statements in scanner tree as child node of function/procedure (begin)
    • Track function/subroutine end in scanner tree as child node of function/procedure (begin)
    • Emit unstructured enter data routine call or equivalent shortcut after declaration section of function/subroutine
    • Emit unstructured exit data routine call or equivalent shortcut before function/subroutine end or return statement
  • Modules / Program

    • Track module and declarations in scanner tree & indexer
    • Consider !$acc declare in indexer
    • Assumed-size arrays
      * [x] Track allocate statements in scanner tree
      * [ ] Emit unstructured enter data routine call or equivalent shortcut after allocate
    • Explicit-size arrays
      • ??? - not totally sure where to put the allocation
        • if no save keyword:
          • body of function/subroutine/program that uses the module
        • else if save keyword and not already present in global scope (scope 0):
          • body of function/subroutine/program that uses the module
      • Emit unstructured enter data routine call or equivalent shortcut where appropriate

Accelerator routines use "acc-declared" variables

Some thoughts:

  • Device routines can use variables that are mapped to the device via acc declare
    statements.
  • When we generate HIP code, we need to pass the respective device pointers as kernel parameters.
  • Hence, we would need to add a use statement with the respective module to the caller's (assuming loopnest) parent so that it is available in this scope. In case of gang, worker, vector accelerator routines, this must be done recursively.

Logging folder/multi-user issues

  • Cannot specify new log directory via config when default location is inaccessible
  • Conflicts when multiple users specify the same log file

Line continuation character not always recognized

When a Fortran line continuation character (&) is included in either an expression or an !$acc directive, the parser fails to recognize the source code. See the example code snippet below.

```
!$acc data copyin(a,b) copyout(c_gpu)
!$acc parallel loop collapse(2) &
!$acc reduction(+:tmp)
do j=1,colsB
  do i=1,rowsA
    tmp = 0.0
    !$acc loop vector reduction(+:tmp)
    do k=1,rowsB
      tmp = tmp &
            + a(i,k) * b(k,j)
    enddo
    c_gpu(i,j) = tmp
  enddo
enddo
!$acc end parallel
!$acc end data
```
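A common way to sidestep this class of problem is to merge continued lines into one logical line before parsing. The Python sketch below illustrates that approach; it is not GPUFORT's code, and the name `join_continuations` is hypothetical. It assumes free-form source, that trailing comments were already removed, and that no blank or comment lines sit between a line and its continuation.

```python
import re

# Directive sentinel that is repeated on continued directive lines.
SENTINEL = re.compile(r"^\s*(!\$acc|!\$cuf)\s*", re.IGNORECASE)


def join_continuations(lines):
    """Merge free-form Fortran continuation lines into single logical lines."""
    logical_lines = []
    pending = None  # text of the statement continued so far, if any
    for raw in lines:
        text = raw.strip()
        if pending is not None:
            # Drop a repeated directive sentinel and an optional leading `&`.
            text = SENTINEL.sub("", text, count=1)
            if text.startswith("&"):
                text = text[1:].lstrip()
            text = pending + " " + text
        if text.endswith("&"):
            pending = text[:-1].rstrip()  # statement continues on the next line
        else:
            logical_lines.append(text)
            pending = None
    if pending is not None:
        logical_lines.append(pending)  # dangling `&` at the end of the input
    return logical_lines
```

Applied to the snippet above, the two directive lines collapse into `!$acc parallel loop collapse(2) reduction(+:tmp)` and the assignment into `tmp = tmp + a(i,k) * b(k,j)`.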

Single-line if statements might need to be converted to if-then construct

Background information

Single-line if statements need to be unrolled when the translated body
part will contain multiple statements that are subject to the condition part.

Steps to reproduce / example

We have the following statement that contains two CUDA Fortran device variables x_d, y_d.

if ( <condition> ) deallocate(x_d,y_d)

Expected outcome:

This should be converted to HIP as follows:

if ( <condition> ) then
  call hipCheck(hipFree(x_d))
  call hipCheck(hipFree(y_d))
end if

Actual outcome

if ( <condition> )   call hipCheck(hipFree(x_d))          ! NOTE: end of single-line if statement
call hipCheck(hipFree(y_d))                                        ! NOTE: next statement

The second statement is not subject to the condition anymore, which is a bug.
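The intended behaviour can be sketched in Python as follows. This is a minimal illustration, not GPUFORT's translator: `expand_single_line_if` and the callback `translate_deallocate` are hypothetical names, and the sketch assumes that the condition contains no parentheses inside string literals.

```python
import re


def _matching_paren(text, start):
    """Return the index of the `)` that closes the `(` at `text[start]`."""
    depth = 0
    for pos in range(start, len(text)):
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced parentheses")


def expand_single_line_if(line, translate):
    """Translate the body of a single-line if; emit an if-then block if needed.

    `translate` maps one statement to a list of translated statements.
    """
    match = re.match(r"\s*if\s*\(", line, re.IGNORECASE)
    if match is None:
        return translate(line.strip())
    close = _matching_paren(line, match.end() - 1)
    condition = line[match.end():close]
    body = line[close + 1:].strip()
    if not body or body.lower() == "then":
        return [line.strip()]  # already a block if
    statements = translate(body)
    if len(statements) == 1:
        return [f"if ({condition}) {statements[0]}"]
    # Several statements: all of them must stay subject to the condition.
    return [f"if ({condition}) then", *("  " + s for s in statements), "end if"]


def translate_deallocate(statement):
    """Hypothetical translation: one hipFree call per deallocated variable."""
    match = re.fullmatch(r"deallocate\s*\((.*)\)", statement, re.IGNORECASE)
    if match is None:
        return [statement]
    names = [name.strip() for name in match.group(1).split(",")]
    return [f"call hipCheck(hipFree({name}))" for name in names]
```

Calling `expand_single_line_if("if ( <condition> ) deallocate(x_d,y_d)", translate_deallocate)` yields the if-then block from the expected outcome, while a body that translates to a single statement stays a single-line if.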

[CUF] Support fixed size arrays completely

GPUFORT can translate the following declaration because it does
not imply that an additional allocation and deallocation
must be generated at the beginning and end of the scope.

real, device, allocatable :: x_d(:), y_d(:)

Fixed-size array declarations do imply such an allocation and deallocation,
which is why their translation is not completely supported.

Hence,

 real, device :: x_d(N), y_d(N)

will simply be translated to

real, device, dimension(:) :: x_d
real, device, dimension(:) :: y_d

Currently, no corresponding hipMalloc and hipFree calls are generated into the code.

Please enable two factor authentication in your github account

@gmarkomanolis;@jungwonkim;@sael9740;@sael9740;@gjost-git

We are going to enforce two-factor authentication in the (https://github.com/ROCmSoftwarePlatform/) organization on 29th April, 2022.
Since we identified you as an outside collaborator of the ROCmSoftwarePlatform organization, you need to enable two-factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement.
Please skip if already done.

To set up two factor authentication, please go through the steps in below link:

https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/configuring-two-factor-authentication

Please email "[email protected]" for queries

Improve translator robustness

The current parser is too general and does not take into account that Fortran
statements such as end module mymodule or end do mylabel must be
terminated with either ; or a linebreak.

Examples (wanted behaviour):

Program Hello ! valid
Program Hello; ! valid
Program; Hello ! invalid
Program Hello; Print *, "Hello World" ! valid
Program Hello Print *, "Hello World" ! invalid
Program Hello
   Print *, "Hello World" ! valid
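The wanted behaviour for the program statement can be sketched in Python as below. This is only an illustration of the rule, not the GPUFORT grammar; the names `split_statements` and `starts_with_valid_program_stmt` are hypothetical. The sketch splits a line at `;` outside of string literals, drops a trailing comment, and then requires that the first statement consists of nothing but `program <name>`.

```python
import re

PROGRAM_STMT = re.compile(r"program\s+[a-z]\w*", re.IGNORECASE)


def split_statements(line):
    """Split one source line at `;` and drop a trailing `!` comment.

    `;` and `!` inside quoted string literals are left untouched.
    """
    statements, current, quote = [], [], None
    for char in line:
        if quote is not None:
            current.append(char)
            if char == quote:
                quote = None
        elif char in "'\"":
            quote = char
            current.append(char)
        elif char == "!":
            break  # rest of the line is a comment
        elif char == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    statements.append("".join(current).strip())
    return [s for s in statements if s]


def starts_with_valid_program_stmt(line):
    """True if the first statement on `line` is exactly `program <name>`."""
    statements = split_statements(line)
    return bool(statements) and PROGRAM_STMT.fullmatch(statements[0]) is not None
```

Because `fullmatch` is used, anything that follows the program name without a `;` or linebreak in between makes the statement invalid, which matches the examples above.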

CUF/ACC: Translate `size(<array>[,<dim>])` and `lbound(...)` and `ubound(...)`

  • In extracted kernels / device subroutines, we sometimes need to translate size(<array>[,<dim>]), lbound(<array>,<dim>)
    and ubound(<array>,<dim>) intrinsic calls.

Implementation status for different types of <array> and <dim>:

  • <array> is identifier and <dim> is integer literal:
    • size(<array>,<dim>) -> <array>_n<dim>
    • lbound(<array>,<dim>) -> <array>_lb<dim>
    • ubound(<array>,<dim>) -> <array>_lb<dim> + <array>_n<dim> - 1

NOTE: Above, the <array>_lb<dim> and <array>_n<dim> are already arguments of
the extracted routines.

  • <array> is identifier and <dim> is identifier | arithmetic expression

    • In this case, we need to generate a query function that is parameterized by array rank.
  • Other cases where <array> is not an identifier and / or <dim> is identifier are not supported
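The supported case (identifier plus integer literal) can be sketched as a textual rewrite in Python. This is an illustration only, not GPUFORT's tree-based implementation; the name `translate_inquiries` is hypothetical. Note that for a Fortran array the upper bound equals lower bound plus extent minus one, which is what the sketch emits for `ubound`.

```python
import re

# size/lbound/ubound call with an identifier and an integer literal dimension.
INQUIRY = re.compile(
    r"\b(?P<func>size|lbound|ubound)\s*\(\s*(?P<array>[a-z]\w*)\s*,\s*(?P<dim>\d+)\s*\)",
    re.IGNORECASE,
)


def _replace(match):
    array, dim = match["array"], match["dim"]
    func = match["func"].lower()
    if func == "size":
        return f"{array}_n{dim}"
    if func == "lbound":
        return f"{array}_lb{dim}"
    # ubound = lower bound + extent - 1
    return f"({array}_lb{dim} + {array}_n{dim} - 1)"


def translate_inquiries(expr):
    """Rewrite supported size/lbound/ubound calls; leave all other text as-is."""
    return INQUIRY.sub(_replace, expr)
```

Calls whose `<dim>` is an identifier or expression, such as `size(a,k)`, do not match the pattern and are left unchanged, mirroring the unsupported cases listed above.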

Indexing error in split_fortran_line

After the most recent update that addressed issue #34, a large number of index errors are now occurring in the split_fortran_line python function. I have attached a Fortran code that reproduces this error.

Thanks,
John

writeout.txt
