proleap-cobol-parser's Introduction

ProLeap ANTLR4-based parser for COBOL

This is a COBOL parser based on an ANTLR4 grammar, which generates an Abstract Syntax Tree (AST) and Abstract Semantic Graph (ASG) for COBOL code. The AST represents plain COBOL source code in a syntax tree structure. The ASG is generated from the AST by semantic analysis and provides data and control flow information (e.g. variable access). EXEC SQL, EXEC SQLIMS and EXEC CICS statements are extracted as text.

The parser is developed test-driven, passes the NIST test suite and has successfully been applied to numerous COBOL files from banking and insurance. It is used by the ProLeap analyzer, interpreter & transformer for COBOL.

💫 Star if you like our work.

Example

Input: COBOL code

 Identification Division.
 Program-ID.
  HELLOWORLD.
 Procedure Division.
  Display "Hello world".
  STOP RUN.

Output: Abstract Syntax Tree (AST)

(startRule
  (compilationUnit
    (programUnit
      (identificationDivision Identification Division .
        (programIdParagraph Program-ID .
          (programName
            (cobolWord HELLOWORLD)) .))
      (procedureDivision Procedure Division .
        (procedureDivisionBody
          (paragraphs
            (sentence
              (statement
                (displayStatement Display
                  (displayOperand
                    (literal "Hello world")))) .)
            (sentence
              (statement
                (stopStatement STOP RUN))) .)))))) <EOF>)

Getting started

To include the parser in your Maven project, build it and add the dependency:

<dependency>
  <groupId>io.github.uwol</groupId>
  <artifactId>proleap-cobol-parser</artifactId>
  <version>4.0.0</version>
</dependency>

Use the following code as a starting point for developing your own code.

Simple: Generate an Abstract Semantic Graph (ASG) from COBOL code

// generate ASG from plain COBOL code
java.io.File inputFile = new java.io.File("src/test/resources/io/proleap/cobol/asg/HelloWorld.cbl");
io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum format = io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum.TANDEM;
io.proleap.cobol.asg.metamodel.Program program = new io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl().analyzeFile(inputFile, format);

// navigate on ASG
io.proleap.cobol.asg.metamodel.CompilationUnit compilationUnit = program.getCompilationUnit("HelloWorld");
io.proleap.cobol.asg.metamodel.ProgramUnit programUnit = compilationUnit.getProgramUnit();
io.proleap.cobol.asg.metamodel.data.DataDivision dataDivision = programUnit.getDataDivision();
io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry dataDescriptionEntry = dataDivision.getWorkingStorageSection().getDataDescriptionEntry("ITEMS");
Integer levelNumber = dataDescriptionEntry.getLevelNumber();

Complex: Generate an Abstract Semantic Graph (ASG) and traverse the Abstract Syntax Tree (AST)

// generate ASG from plain COBOL code
java.io.File inputFile = new java.io.File("src/test/resources/io/proleap/cobol/asg/HelloWorld.cbl");
io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum format = io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum.TANDEM;
io.proleap.cobol.asg.metamodel.Program program = new io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl().analyzeFile(inputFile, format);

// traverse the AST
io.proleap.cobol.CobolBaseVisitor<Boolean> visitor = new io.proleap.cobol.CobolBaseVisitor<Boolean>() {
  @Override
  public Boolean visitDataDescriptionEntryFormat1(final io.proleap.cobol.CobolParser.DataDescriptionEntryFormat1Context ctx) {
    io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry entry = (io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry) program.getASGElementRegistry().getASGElement(ctx);
    String name = entry.getName();

    return visitChildren(ctx);
  }
};

for (final io.proleap.cobol.asg.metamodel.CompilationUnit compilationUnit : program.getCompilationUnits()) {
  visitor.visit(compilationUnit.getCtx());
}

How to cite

Please cite ProLeap COBOL parser in your publications, if it helps your research. Here is an example BibTeX entry:

@misc{wolffgang2018cobol,
  title={ProLeap COBOL parser},
  author={Wolffgang, Ulrich and others},
  year={2018},
  howpublished={\url{https://github.com/uwol/proleap-cobol-parser}},
}

Features

  • EXEC SQL statements, EXEC SQLIMS statements and EXEC CICS statements are extracted by the preprocessor and provided as texts in the ASG.
  • Passes the NIST test suite.
  • Rigorous test-driven development.
  • To be used in conjunction with the provided preprocessor, which executes COPY, REPLACE, CBL and PROCESS statements.

Build process

The build process is based on Maven (version 3 or higher). Building requires JDK 17 and generates a Maven JAR, which can be used in other Maven projects as a dependency.

  • Clone or download the repository.
  • In Eclipse, import the directory as an existing Maven project.
  • To build, run:
$ mvn clean package
  • The test suite executes AST and ASG tests against COBOL test code and NIST test files. The NIST test files come from the Koopa repository. Unit tests and parse tree files were generated by the class io.proleap.cobol.TestGenerator from COBOL test files. The generator derives the COBOL line format from the name of the containing folder.
  • You should see output like this:
[INFO] Scanning for projects...
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running io.proleap.cobol.ast.fixed.FixedTest
Preprocessing file Fixed.cbl.
Parsing file Fixed.cbl.
Comparing parse tree with file Fixed.cbl.tree.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.202 sec
Running io.proleap.cobol.ast.fixed.QuotesInCommentEntryTest
...
Results :

Tests run: 680, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
  • To install the JAR in your local Maven repository:
$ mvn clean install
  • To only run the tests:
$ mvn clean test

License

Licensed under the MIT License. See LICENSE for details.

proleap-cobol-parser's Issues

Unstringing indexed items not recognized correctly

Unstringing an indexed item, like T-WERT (X-1), is not parsed correctly.
Using a simple item, like Z-WERT in our example, works.

005270*        UNSTRING Z-WERT                                          Y2612893
005270         UNSTRING T-WERT  (X-1)                                   Y2612893
005280                 DELIMITED BY ALL '='                             Y2612893
005290                           OR ALL '  '                            Y2612893
005300                 INTO         Z-WERT1                             Y2612893
005310                              Z-WERT2                             Y2612893
005320         END-UNSTRING                                             Y2612893

The parsing errors follow. The first error occurs at the left parenthesis of the index.
UnstringSample.cbl.txt

line 89:32 mismatched input '(' expecting INTO
line 89:36 mismatched input ')' expecting SECTION
line 93:36 mismatched input 'Z-WERT2' expecting SECTION
line 94:15 mismatched input 'END-UNSTRING' expecting SECTION
line 96:29 mismatched input '=' expecting SECTION
line 97:43 mismatched input 'TO' expecting SECTION
line 98:15 mismatched input 'WHEN' expecting SECTION
line 98:29 mismatched input '=' expecting SECTION
line 99:43 mismatched input 'TO' expecting SECTION
line 100:15 mismatched input 'WHEN' expecting SECTION
line 100:29 mismatched input '=' expecting SECTION
line 101:43 mismatched input 'TO' expecting SECTION
line 102:15 mismatched input 'WHEN' expecting SECTION
line 102:29 mismatched input '=' expecting SECTION
line 103:43 mismatched input 'TO' expecting SECTION
line 104:15 mismatched input 'WHEN' expecting SECTION
line 104:29 mismatched input '=' expecting SECTION
line 105:43 mismatched input 'TO' expecting SECTION
line 106:15 mismatched input 'WHEN' expecting SECTION
line 106:29 mismatched input '=' expecting SECTION
line 107:43 mismatched input 'TO' expecting SECTION
line 108:15 mismatched input 'WHEN' expecting SECTION
line 108:29 mismatched input '=' expecting SECTION
line 109:43 mismatched input 'TO' expecting SECTION
line 110:15 mismatched input 'WHEN' expecting SECTION
line 110:29 mismatched input '=' expecting SECTION
line 111:43 mismatched input 'TO' expecting SECTION
line 112:15 mismatched input 'WHEN' expecting SECTION

Inline comments not recognized ( *> comments)

Inline comments have been available in COBOL since COBOL 2002 (extending the ANSI-85 standard).
*> starts a comment that runs to the end of the line, like // in Java.

Parsing leads to the error:
line 21:30 mismatched input '*' expecting
where line 21 is the one with the *> ANSI end-of-line comment.

Note: I am aware of the Parser being named 85 ;)

 01  Student.
   03  nachname    pic x(20).
   03  vorname     pic x(20).
   03  geschlecht  pic x(1).  *> ANSI End-of-Line Comment 
     88  mann      values 'M' 'm'. 
	 88  frau      value 'F', 'f', 'W', 'w'.
	 88  egal      values 'a' thru 'z',
	      'A' thru 'Z'.

Call to unknown data element(s), maybe due to REDEFINES?

Not sure what the error is about. Maybe the FILLER REDEFINES is not recognized?
Please refer to the attached sample, in which the declaration of AUSGABE in particular has been shortened. The original contains approximately 15 REDEFINES clauses because of "periodic groups in Adabas".
ISSUE13.CBL.txt

Preprocessing file ISSUE13.CBL.
Parsing file ISSUE13.CBL.
Collecting units in file ISSUE13.CBL.
Analyzing program units of compilation unit ISSUE13.
Analyzing identification divisions of compilation unit ISSUE13.
Analyzing environment divisions of compilation unit ISSUE13.
Analyzing data divisions of compilation unit ISSUE13.
Analyzing procedure divisions of compilation unit ISSUE13.
Analyzing statements of compilation unit ISSUE13.
call to unknown data element ausgabe
call to unknown data element T11-DSTINV0DAT12A
call to unknown data element T11-NAMEDSTINV0DAT12A
call to unknown data element T11-ZAKAINV0DAT12A
call to unknown data element T11-BHWNRINV0DAT12A
call to unknown data element T11-SCHLAOKINV0DAT12A
call to unknown data element T11-BDSTINV0DAT12A
call to unknown data element T11-DEBEKOINV0DAT12A
call to unknown data element T11-ASORTSTINV0DAT12A(2)
call to unknown data element T11-ZLISTENINV0DAT12A(2)
call to unknown data element T11-ORTINV0DAT12A
call to unknown data element T11-STRASSEINV0DAT12A
call to unknown data element T11-ZUFEMINV0DAT12A
call to unknown data element T11-WEDSTINV0DAT12A(1)
call to unknown data element T11-WEDSTINV0DAT12A(2)
call to unknown data element T11-WEDSTINV0DAT12A(3)
call to unknown data element T11-WEDSTINV0DAT12A(4)
call to unknown data element T11-WEDSTINV0DAT12A(5)
call to unknown data element T11-BENKZINV0DAT12A
call to unknown data element T11-ZAKANRINV0DAT12A
call to unknown data element T11-DFUINV0DAT12A
call to unknown data element T11-ANGEWENDETINV0DAT12A(1)
call to unknown data element T11-ANGEWENDETINV0DAT12A(2)
call to unknown data element T11-ANGEWENDETINV0DAT12A(3)
call to unknown data element T11-ANGEWENDETINV0DAT12A(4)
call to unknown data element T11-ANGEWENDETINV0DAT12A(5)
call to unknown data element T11-ANGEWENDETINV0DAT12A(6)
call to unknown data element T11-ANGEWENDETINV0DAT12A(7)
call to unknown data element T11-ABRARTINV0DAT12A
call to unknown data element T11-LIWERTEINV0DAT12A(1)
call to unknown data element T11-LINAMEINV0DAT12A(1)
call to unknown data element T11-LINAMEINV0DAT12A(2)

Parser errors should throw an exception

Parser errors that compromise subsequent parsing should throw an exception.

For example, when the parser logs the error
line 8:17 no viable alternative at input 'x <>'
for a condition with unrecognized operator x <> 5, the visitor finds non-existent sections at 8:15 and 8:20.

Too rigid interpretation of FIXED line format

While parsing an original (host) copybook, I ran into the following error:

Exception in thread "main" java.lang.RuntimeException: Is FIXED the correct line format? Could not parse line 80: 007200
	at io.proleap.cobol.preprocessor.sub.line.reader.impl.CobolLineReaderImpl.parseLine(CobolLineReaderImpl.java:64)

A close look at line 80 of the copybook shows that it is a more or less empty line: it has entries in columns 1-6 only, followed by CR/LF, so it is not padded with blanks up to column 80.

007100        10 P321-FILLER       PIC X(36).                           26.10.94
007200
007300     05 FILLER REDEFINES P321-LISTINFO.                           26.10.94

Changing the copybook and padding the line to 80 characters (or, of course, deleting the empty line) gets us past the error.
The parser should accept such "empty lines" without throwing an error. A warning would be fine, but it should not stop the parsing process.
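
Until the line reader tolerates short lines, one possible workaround is to pad every line to 80 columns before handing the source to the preprocessor. The following is a minimal sketch using only the JDK; the class `FixedLinePadder` and its method `padTo80` are hypothetical names for illustration and are not part of ProLeap.

```java
import java.util.stream.Collectors;

public class FixedLinePadder {

    /** Width of a fixed-format COBOL line (columns 1-80). */
    private static final int LINE_WIDTH = 80;

    /** Pads every line shorter than 80 characters with trailing blanks. */
    public static String padTo80(final String source) {
        // String.lines() splits on LF, CR and CRLF and drops the terminators
        return source.lines()
                .map(FixedLinePadder::padLine)
                .collect(Collectors.joining("\n"));
    }

    private static String padLine(final String line) {
        if (line.length() >= LINE_WIDTH) {
            return line; // already full width, leave untouched
        }
        return line + " ".repeat(LINE_WIDTH - line.length());
    }

    public static void main(final String[] args) {
        final String padded = padTo80("007200\r\n007300     05 FILLER REDEFINES P321-LISTINFO.");
        // print the length of each padded line
        padded.lines().forEach(line -> System.out.println(line.length()));
    }
}
```

The padded string can then be written to a temporary file and passed to the preprocessor as usual. Note that the sketch normalizes all line terminators to LF.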

Specifying different codepages for Cobolfiles in CobolPreprocessorImpl

The current implementation of the CobolPreprocessor seems to allow only the default (UTF-8) character encoding for COBOL files. Typically, COBOL source files use single-byte character sets such as EBCDIC (ibm-1441), ISO-8859-1(5) or Windows-1252, and would thus need a code page conversion before running through the parser.

I would appreciate being able to parameterize the code page for COBOL sources, i.e. to pass an additional Charset parameter to the InputStreamReader:

public InputStreamReader(InputStream in,
                 Charset cs)

referring to your method:

	public String process(final File cobolFile, final List<File> copyFiles, final CobolSourceFormatEnum format,
			final CobolDialect dialect) throws IOException {
		LOG.info("Preprocessing file {}.", cobolFile.getName());

		final InputStream inputStream = new FileInputStream(cobolFile);
		final InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
		final BufferedReader bufferedInputStreamReader = new BufferedReader(inputStreamReader);
		final StringBuffer outputBuffer = new StringBuffer();

Changing only the code page of the COBOL sources, say from ISO-8859 to UTF-8, will lead to processing errors sooner or later. Just think of
01 Name Pic x(30) value "Günter Mörgän" and statements like if Name(2:1) = "ü" etc. These will only work in SBCS and not in DBCS.
On the other hand, not converting the code page, and thus letting the parser interpret the characters as UTF-8, will lead to the well-known garbled and misinterpreted characters.
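
To illustrate the request, here is a minimal sketch of reading source text with an explicit charset via the JDK constructor `InputStreamReader(InputStream, Charset)`. The class `CharsetAwareReader` and its `read` method are hypothetical; this is not how the ProLeap preprocessor is implemented, only what a charset parameter could look like.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareReader {

    /** Reads all lines from the stream, decoding the bytes with the given charset. */
    public static String read(final InputStream inputStream, final Charset charset) {
        final StringBuilder outputBuffer = new StringBuilder();
        // the second constructor argument selects the decoder instead of the platform default
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                outputBuffer.append(line).append('\n');
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return outputBuffer.toString();
    }

    public static void main(final String[] args) {
        // sample line encoded as ISO-8859-1: the u-umlaut is the single byte 0xFC
        final byte[] bytes = "01 NAME PIC X(30) VALUE \"G\u00fcnter\"."
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(read(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1));
    }
}
```

For EBCDIC sources, a charset can be looked up by name with `Charset.forName(...)`, provided the running JDK ships that code page.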

Counterintuitive names for DIVIDE sub-statement

Sub-statements of divide (and corresponding parser methods) have counterintuitive names: divideIntoStatement contains a GIVING keyword, while divideIntoGivingStatement does not contain it.

Errors in parsing CALL statement

All errors refer to the last line

            CALL "C$TOUPPER"
               USING TEXT-VALUE-2
               BY VALUE
               LENGTH 1.

Error: line 128:22 missing OF at '1'

           CALL "C$JUSTIFY"
               USING TEXT-VALUE-2
               "C".

Error: line 131:15 no viable alternative at input 'TEXT-VALUE-2"C"'

           CALL "C$TOUPPER"
               USING TO-UPPER-CASE
               BY VALUE
               LENGTH TO-UPPER-CASE.

Error: line 144:22 missing OF at 'TO-UPPER-CASE'

Parser runs into no viable alternative exception - maybe nested performs with indexed variables ?

Parsing the attached file leads to the following error

Cobolfile: a2600215.CBL threw exception: {}
java.lang.RuntimeException: syntax error in line 339:71 no viable alternative at input 'DISPLAY '------------------------------------------'-'
	at io.proleap.cobol.asg.runner.ThrowingErrorListener.syntaxError(ThrowingErrorListener.java:20)
	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:282)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:121)
	at io.proleap.cobol.Cobol85Parser.ifThen(Cobol85Parser.java:31532)
	at io.proleap.cobol.Cobol85Parser.ifStatement(Cobol85Parser.java:31335)
	at io.proleap.cobol.Cobol85Parser.statement(Cobol85Parser.java:24801)
	at io.proleap.cobol.Cobol85Parser.performInlineStatement(Cobol85Parser.java:35565)
	at io.proleap.cobol.Cobol85Parser.performStatement(Cobol85Parser.java:35488)
	at io.proleap.cobol.Cobol85Parser.statement(Cobol85Parser.java:24857)
	at io.proleap.cobol.Cobol85Parser.sentence(Cobol85Parser.java:24440)
	at io.proleap.cobol.Cobol85Parser.paragraph(Cobol85Parser.java:24376)
	at io.proleap.cobol.Cobol85Parser.paragraphs(Cobol85Parser.java:24293)
	at io.proleap.cobol.Cobol85Parser.procedureDivisionBody(Cobol85Parser.java:24151)
	at io.proleap.cobol.Cobol85Parser.procedureDivision(Cobol85Parser.java:23223)
	at io.proleap.cobol.Cobol85Parser.programUnit(Cobol85Parser.java:880)
	at io.proleap.cobol.Cobol85Parser.compilationUnit(Cobol85Parser.java:783)
	at io.proleap.cobol.Cobol85Parser.startRule(Cobol85Parser.java:727)
	at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.parseFile(CobolParserRunnerImpl.java:190)
	at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeFile(CobolParserRunnerImpl.java:94)
	at de.dvzmv.fabea.infrastruktur.csi.parser.SingleCobolParser.parseFile(SingleCobolParser.java:56)
	at de.dvzmv.fabea.infrastruktur.csi.parser.SingleCobolParser.main(SingleCobolParser.java:140)
[a2600215.CBL.txt](https://github.com/uwol/cobol85parser/files/1575474/a2600215.CBL.txt)

Parser returns free-format only regardless of the fixed-format input

We experience a nasty side effect when parsing a file with the following statement:
replacedString = new CobolPreprocessorImpl().process(inputFile, copyBookFiles, COBOL_FIXED_FORMAT);
The replacedString is an "optimized" COBOL string that is only usable as free-format COBOL, or at least with a COBOL compiler that accepts lines with more than 80 (or rather 72) columns.
When using the Cobol85Parser for COBOL optimizations (for example GO TO elimination) and then sending the optimized COBOL files back to the IBM host, this is a problem.

Just have a look at lines 001800 and 001810, where the continuation indicator in column 7 is used because the line had been too long.

001770 01  ZEILE3B.                                                     abc11S05
001780     05  Z3B-VS              PIC X       VALUE ' '.               abc11S05
001790     05  FILLER              PIC X(11)   VALUE SPACE.             abc11S05
001800     05  FILLER              PIC X(50)   VALUE 'für ausgeschiedeneabc11S05
001810-                                        ' Besoldungsfälle'.      abc11S05
001820     05  FILLER              PIC X(71)   VALUE SPACE.             abc11S05
001830     05  Z3B-FIB             PIC X       VALUE '2'.               abc11S05

We are fully aware of the underlying parsing reasons and agree that readability is much better.
The only "but" is that the resulting replacement string exceeds the 80-column limit and will be rejected on the host.
A second side effect, which is a separate matter for discussion:
the (in)famous columns 1-6 and 73-80 are very often used for tagging or even rudimentary version control (who applied which change, and when).
This information also gets lost.

       01  ZEILE3B.
           05  Z3B-VS              PIC X       VALUE ' '.
           05  FILLER              PIC X(11)   VALUE SPACE.
           05  FILLER              PIC X(50)   VALUE 'für ausgeschiedene Besoldungsfälle'.
           05  FILLER              PIC X(71)   VALUE SPACE.
           05  Z3B-FIB             PIC X       VALUE '2'.

Is there any workaround to suggest other than reformatting the returned string back into the fixed-format corset?
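
Re-wrapping long lines correctly requires COBOL continuation rules and is not attempted here. As a first step, the following sketch only detects which lines of the preprocessed string extend beyond column 72, so they can be flagged before the file is sent to the host. The class `FixedFormatWidthCheck` and its method are hypothetical helpers, not part of ProLeap.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedFormatWidthCheck {

    /** Last column of the program text area in fixed format. */
    private static final int LAST_CONTENT_COLUMN = 72;

    /** Returns the 1-based numbers of all lines whose text extends beyond column 72. */
    public static List<Integer> findOverlongLines(final String source) {
        final List<Integer> result = new ArrayList<>();
        final String[] lines = source.split("\r?\n", -1);

        for (int i = 0; i < lines.length; i++) {
            // trailing blanks are harmless padding and are ignored
            if (lines[i].stripTrailing().length() > LAST_CONTENT_COLUMN) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        final String source = "       01  ZEILE3B.\n" + " ".repeat(11) + "X".repeat(70);
        System.out.println(findOverlongLines(source));
    }
}
```

The check is meant for the preprocessor output, where the tag columns 73-80 have already been dropped; applied to original fixed-format sources it would flag every tagged line.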

Embedding an external preprocessor - like EXEC ADABAS ...

No one expects you to implement a preprocessor for Adabas, for example. However, there are a lot of sources around in which calls to be resolved by other preprocessors are written into the COBOL sources,
following a pattern such as:
EXEC ADABAS <adabas-statement> END-EXEC.

(Of course, this is similar to EXEC SQL <sql-statements> END-EXEC.)

  • For the time being, it would be nice to have a parser feature that skips/ignores input between a certain start pattern and a certain end pattern. In our case this would be "EXEC ADABAS" as the start and END-EXEC (with a dot) as the end pattern. Sooner or later, however, this approach will lead to missing declarations in the source files. Just think of "exec adabas copy file=myfile ... end-exec", which acts like a COPY myfile, with the copybook not being resolved.
  • Alternatively, but much more complicated for the parser itself, there could be a callback to the preprocessor from Software AG (the manufacturer of Adabas), so that the EXEC ADABAS ... END-EXEC statements are resolved into a couple of COBOL statements, be they declarations or invocations of code. (The Adabas preprocessor behaves more or less like EXEC SQL preprocessors; the biggest difference is the lack of available documentation.)
  • A third approach, the easiest for you ;), would be the implementation of a separate preprocessor, which resolves all the EXEC ADABAS ... END-EXEC statements by calling the SAG preprocessor (an application running in "PREDICT"), so that only the resolved COBOL code is parsed by the cobol85parser.

Any suggestions/recommendations on how to proceed with such sources?
The source snippets look like the following:

001810 EXEC ADABAS COPY          FILE=x4711-DAT12A                      17.01.94
001820                           MEM=vd4711A                            17.01.94
001830                           END-EXEC.                              Y2600120

001890*EXEC ADABAS GENERATE      FILE=x4711-DAT12C                      15.08.96
001900*                          RECORD-BUFFER-NAME=vd4711C             15.08.96
001910*                          PREFIX=T11-                            15.08.96
001920*                          END-EXEC.                              15.08.96

005030    EXEC ADABAS FORMAT-BUFFER                                     10.05.10
005040                FILE=x4711-TAB12A                                 10.05.10
005050                FORMAT-BUFFER-NAME=FB-V0TAB12A                    10.05.10
005060                OFFSET=V                                          10.05.10
005070    END-EXEC.                                                     10.05.10
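
The first approach from the list above (skipping input between a start and an end pattern) can be approximated outside the parser. The sketch below turns every line of an EXEC ADABAS ... END-EXEC block into a fixed-format comment line by placing '*' in column 7, which keeps line numbers and the tag columns intact. It assumes fixed format and that the patterns do not occur inside literals; `ExecAdabasCommenter` is a hypothetical helper, not a ProLeap class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExecAdabasCommenter {

    /** Comments out all lines belonging to EXEC ADABAS ... END-EXEC blocks. */
    public static List<String> commentOut(final List<String> lines) {
        final List<String> result = new ArrayList<>();
        boolean insideBlock = false;

        for (final String line : lines) {
            // column 7 (index 6) is the indicator area; '*' marks a comment line
            final boolean isComment = line.length() > 6 && line.charAt(6) == '*';
            final String upper = line.toUpperCase(Locale.ROOT);

            if (!insideBlock && !isComment && upper.contains("EXEC ADABAS")) {
                insideBlock = true;
            }

            if (insideBlock && !isComment && line.length() > 6) {
                result.add(line.substring(0, 6) + '*' + line.substring(7));
            } else {
                result.add(line);
            }

            // the block ends on the line that carries END-EXEC
            if (insideBlock && !isComment && upper.contains("END-EXEC")) {
                insideBlock = false;
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        commentOut(Arrays.asList(
                "005030    EXEC ADABAS FORMAT-BUFFER",
                "005040                FILE=x4711-TAB12A",
                "005070    END-EXEC.",
                "005080     MOVE A TO B.")).forEach(System.out::println);
    }
}
```

As noted in the first bullet, this only hides the statements from the parser; declarations that the Adabas preprocessor would generate are still missing.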

input COBOL file not parsing

for the input program

PROGRAM-ID. HELLO.

DATA DIVISION.
WORKING-STORAGE SECTION.
EXEC SQL
INCLUDE SQLCA
END-EXEC.

EXEC SQL
INCLUDE STUDENT
END-EXEC.

EXEC SQL BEGIN DECLARE SECTION
END-EXEC.
01 WS-STUDENT-REC.
05 WS-STUDENT-ID PIC 9(4).
05 WS-STUDENT-NAME PIC X(25).
05 WS-STUDENT-ADDRESS X(50).
EXEC SQL END DECLARE SECTION
END-EXEC.

PROCEDURE DIVISION.
EXEC SQL
SELECT STUDENT-ID, STUDENT-NAME, STUDENT-ADDRESS
INTO :WS-STUDENT-ID, :WS-STUDENT-NAME, WS-STUDENT-ADDRESS FROM STUDENT
WHERE STUDENT-ID=1004
END-EXEC.

IF SQLCODE=0
DISPLAY WS-STUDENT-RECORD
ELSE DISPLAY 'Error'
END-IF.
STOP RUN.

I am not getting the procedure division under the program unit.
I am getting only the identification division, environment division and data division. Please help to resolve this. I have taken the latest cobol85.g4.

Format of preprocessed Cobolfiles

Writing the preprocessed Cobolfiles to disk

			CobolPreprocessor.CobolSourceFormatEnum format = CobolPreprocessor.CobolSourceFormatEnum.FIXED;
			String preProcessedInput = new CobolPreprocessorImpl().process(inputFile, copyBookFiles,format);
			File preprocessedFile = new File(cobolFileName + "_preprocessed.cbl");
			FileUtils.writeStringToFile(preprocessedFile, preProcessedInput);

and checking the output, it is a bit unusual that the wrong character sequence is used for inline comments:
>* instead of the ANSI-2005 *>
I would recommend using the ANSI inline comment format and additionally positioning the asterisk ( * ) in column 7. Currently the '>' sign is in column 7 and the '*' in column 8, so compiling the preprocessed input file fails.

       DATE-WRITTEN. >*CE   06.1993.
      >*CE =================================================================
      >*CE     Dieses Programm liest die allgemeinen Vorlaufanweisungen,   *
      >*CE     die ein Programm benötigt.                                  *
...etc...

See the Micro Focus documentation on inline comments:
http://documentation.microfocus.com/help/index.jsp?topic=%2Fcom.microfocus.eclipse.infocenter.studee60ux%2FHRLHLHCPRO01U990.html
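
As a stopgap, the preprocessed string can be post-processed line by line before it is written to disk. The sketch below swaps a `>*` that starts in column 7 to `*>` and also rewrites the `>*CE` tag when it follows code on the same line. Only the `CE` tag seen in the output above is handled; the class `InlineCommentMarkerFixer` is a hypothetical helper and not part of ProLeap.

```java
public class InlineCommentMarkerFixer {

    /** Rewrites the non-standard marker ">*" to "*>" in a single preprocessed line. */
    public static String fixLine(final String line) {
        String fixed = line;

        // comment line: '>' in column 7 and '*' in column 8 (indices 6 and 7)
        if (fixed.startsWith(">*", 6)) {
            fixed = fixed.substring(0, 6) + "*>" + fixed.substring(8);
        }

        // comment entry tag that follows code on the same line
        return fixed.replace(">*CE", "*>CE");
    }

    public static void main(final String[] args) {
        System.out.println(fixLine("       DATE-WRITTEN. >*CE   06.1993."));
        System.out.println(fixLine("      >*CE ====="));
    }
}
```

A literal that happens to contain the text `>*CE` would be rewritten as well, so the result should be reviewed before compiling.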

DATA RECORD in FD not recognized correctly

Not really sure what the exact reason for the error is, but the DATA RECORD clause in the FD seems to be ignored or misinterpreted.
I also found no reason why "AUSGABE" is not recognized.
The file control, file section and FD are original IBM host COBOL declarations.


[ISSUE14.CBL.txt](https://github.com/uwol/cobol85parser/files/1181299/ISSUE14.CBL.txt)

Parsing file ISSUE14.CBL.
Collecting units in file ISSUE14.CBL.
Analyzing program units of compilation unit ISSUE14.
call to unknown data element D111E-DATEI
Analyzing identification divisions of compilation unit ISSUE14.
Analyzing environment divisions of compilation unit ISSUE14.
Analyzing data divisions of compilation unit ISSUE14.
Analyzing procedure divisions of compilation unit ISSUE14.
Analyzing statements of compilation unit ISSUE14.
call to unknown data element aus-vor
call to unknown data element aus-text
call to unknown data element aus-text
call to unknown data element ausgabe

(too) rigid check of program-id. xxxx (NO DOT)

Once again, this is in the same "error" category as the previous issue.
We have several programs in which the name in the PROGRAM-ID paragraph is not terminated with a dot.
(Sure, the grammar insists on a dot, but most compilers do not check for or insist on a dot terminating the name.)

Thus the parser runs into the next error:

Parsing file AAA02.cbl.
line 5:7 mismatched input 'ENVIRONMENT' expecting {COMMON, DEFINITION, INITIAL, IS, LIBRARY, DOT_FS}
Collecting units in file AAA02.cbl.

COBOL lines with the incorrect declaration:

000010 IDENTIFICATION DIVISION.                          
000050 PROGRAM-ID.       AAA02 
000060******************************************************************07.05.15
      * not DOT after the name of the program-id !!!

Suggestion: although the parser is acting correctly, reality shows that most of the declarations
such as AUTHOR, DATE-WRITTEN or REMARKS (up to the INPUT-OUTPUT SECTION) should be parsed with a rather error-forgiving algorithm.

tagging the expansion of copy-books (begin-copy-book/end-copy-book)

This is more of an architectural issue and not a real error.
I am fully aware that the contents of any copybook must be included in the parsing to avoid undeclared variables etc.
Currently a call like
replacedString = new CobolPreprocessorImpl().process(inputFile, copyBookFiles, COBOL_FIXED_FORMAT);
returns a string in which all COPY statements are expanded as if the copybooks had been written directly into the COBOL code.
As far as I have seen, there is no hint or tag to find out where the original copybook started or ended.

for example:

004780 01  LA.                                                          Y26RZKGS
004790     05 LAS           PIC 9       VALUE 0.                        Y26RZKGS
004800     05 LADST         PIC 999.                                    Y26RZKGS
004810     05 LASVNR        PIC 9(8).                                   19.08.08
004820     05 LABDST        PIC 9(8).                                   Y26RZKGS
004830     05 LAPRJ         PIC 9(2).                                   Y26RZKGS
004840     05 LAD           PIC 9(2)    VALUE 00.                       Y26RZKGS
004850                                                                  Y26RZKGS
004860     COPY VOKAKGS.                                                08.10.96
004870                                                                  Y26RZKGS
004880*=================================================================14.09.07
004890 PROCEDURE DIVISION.                                              Y26RZKGS

leads to

       01  LA.
           05 LAS           PIC 9       VALUE 0.
           05 LADST         PIC 999.
           05 LASVNR        PIC 9(8).
           05 LABDST        PIC 9(8).
           05 LAPRJ         PIC 9(2).
           05 LAD           PIC 9(2)    VALUE 00.
       
      *> *****            V O R L A U F K A R T E                    *****
      *> *****            xxxxxGELDSTATISTIK                        *****
      *> *****************************************************************
       01  VOKAKGS.
      *> ***   STELLE 1    = VERARBEITUNGSART , N - Nxxxxxxxxx
      *> ***                                    S - SONDERxxxxxxxx
           05  VK-VER               PIC X.
      *> ***   STELLE 2-3  = MELDEMONAT
           05  VK-MELDM             PIC 99.
      *> ***   STELLE 4-5  = MELDEJAHR
           05  VK-MELDJ.
               10  VK-MELDJJ12          PIC 99.
               10  VK-MELDJJ34          PIC 99.
      *> ***   STELLE 8-9  = LIEFERUNGSNR.
           05  VK-LNR               PIC 99.
      *> ***             VOKA-REST
           05  FILLER               PIC X(71).       
      *> =================================================================
       PROCEDURE DIVISION.

We would highly appreciate a mechanism by which the start and end of the original copybook could be easily recognized.
This could be a naming convention using a predefined comment like:

123456* START-of-copybook: <COPYBOOK-statement>
    <here is the content of the copybook>
123456* END-of-copybook: <COPYBOOK-statement>

Why/when would such a feature be needed?
If you use the parser for code optimization (for example GO TO elimination or dead code removal), the current approach results in the copybooks being completely eliminated from the resulting new COBOL sources, which of course is out of the question.
Using start/end tags for expanded copybooks, similar to my suggestion, would be a rather straightforward way to find the correct place to re-insert the original COPY statement.
The tagged/commented COPY statement should be the original statement, covering REPLACING ... BY clauses and multi-line statements as well, so in effect it will be more than a single line.
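
The proposed convention could look like the following sketch, which wraps already expanded copybook content in START/END comment lines that carry the original COPY statement. It only illustrates the tagging format suggested above; `CopyBookTagger` is a hypothetical helper, and nothing here reflects how the ProLeap preprocessor expands copybooks.

```java
public class CopyBookTagger {

    /** Wraps expanded copybook content in START/END comment lines naming the COPY statement. */
    public static String wrap(final String copyStatement, final String expandedContent) {
        // collapse a multi-line COPY ... REPLACING statement into a single line for the tag
        final String flat = copyStatement.trim().replaceAll("\\s+", " ");
        final String body = expandedContent.endsWith("\n") ? expandedContent : expandedContent + "\n";

        return commentLine("START-of-copybook: " + flat) + "\n"
                + body
                + commentLine("END-of-copybook: " + flat) + "\n";
    }

    /** Builds a fixed-format comment line: blank sequence area, '*' in column 7. */
    private static String commentLine(final String text) {
        return "      * " + text;
    }

    public static void main(final String[] args) {
        System.out.print(wrap("COPY VOKAKGS.",
                "       01  VOKAKGS.\n           05  VK-VER               PIC X."));
    }
}
```

A long COPY ... REPLACING statement flattened this way can exceed column 72, so a real implementation would have to spread the tag over several comment lines.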

RECORD IS VARYING IN SIZE FROM 1 TO 9992 CHARACTERS ...

Refer to the attached source.
The RECORD IS VARYING clause is not yet fully implemented ...
ISSUE27.CBL.txt

Preprocessing file ISSUE25.CBL.
Preprocessing file ISSUE25.CBL.
Parsing file ISSUE25.CBL.
Collecting units in file ISSUE25.CBL.
Analyzing program units of compilation unit ISSUE25.
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
call to unknown data element ein-STAT
Exception in thread "main" java.lang.NullPointerException
	at io.proleap.cobol.asg.metamodel.impl.ProgramUnitElementImpl.createIntegerLiteral(ProgramUnitElementImpl.java:816)
	at io.proleap.cobol.asg.metamodel.data.file.impl.FileDescriptionEntryImpl.addRecordContainsClause(FileDescriptionEntryImpl.java:347)
	at io.proleap.cobol.asg.metamodel.data.file.impl.FileSectionImpl.addFileDescriptionEntry(FileSectionImpl.java:82)
	at io.proleap.cobol.asg.metamodel.data.impl.DataDivisionImpl.addFileSection(DataDivisionImpl.java:144)
	at io.proleap.cobol.asg.metamodel.impl.ProgramUnitImpl.addDataDivision(ProgramUnitImpl.java:74)
	at io.proleap.cobol.asg.metamodel.impl.CompilationUnitImpl.addProgramUnit(CompilationUnitImpl.java:65)
	at io.proleap.cobol.asg.visitor.impl.CobolProgramUnitVisitorImpl.visitProgramUnit(CobolProgramUnitVisitorImpl.java:66)
	at io.proleap.cobol.asg.visitor.impl.CobolProgramUnitVisitorImpl.visitProgramUnit(CobolProgramUnitVisitorImpl.java:1)
	at io.proleap.cobol.Cobol85Parser$ProgramUnitContext.accept(Cobol85Parser.java:842)
	at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
	at io.proleap.cobol.Cobol85BaseVisitor.visitCompilationUnit(Cobol85BaseVisitor.java:27)
	at io.proleap.cobol.Cobol85Parser$CompilationUnitContext.accept(Cobol85Parser.java:764)
	at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
	at io.proleap.cobol.asg.visitor.impl.AbstractCobolParserVisitorImpl.visit(AbstractCobolParserVisitorImpl.java:1)
	at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeProgramUnits(CobolParserRunnerImpl.java:151)
	at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyze(CobolParserRunnerImpl.java:50)
	at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeFile(CobolParserRunnerImpl.java:92)
	at com.csi.parser.CobolParseDVZ.parseFile(CobolParseDVZ.java:60)
	at com.csi.parser.CobolParseDVZ.main(CobolParseDVZ.java:95)


Suppress Option in COPY-Statement not supported.

Referring to http://www.cs.vu.nl/grammarware/vs-cobol-ii/#gdef:copy-statement or
https://www.ibm.com/support/knowledgecenter/en/SSXJAJ_13.1.0/com.ibm.faultanalyzer.doc_13.1/cblsupp.html

Legacy code tends to have COPY statements with the SUPPRESS option (to avoid "long listings"):

001900     COPY V0P172                                                  supr2.99
001910                   REPLACING ==V0P172.== BY                       24.02.99
001920                             ==V0P172 EXTERNAL.==.                24.02.99
001930     COPY V0P172   SUPPRESS                                       24.02.99
001940                   REPLACING ==V0P172.== BY                       24.02.99
001950                             ==V0P172-DST.==.                     24.02.99

The SUPPRESS option is not recognized.
If "SUPPRESS" is replaced by spaces within the source code, the error disappears.

Currently the Parser runs into the following error:
Exception in thread "main" java.lang.RuntimeException: syntax error in line 193:33 missing '.' at '\n'
at io.proleap.cobol.preprocessor.sub.document.impl.ThrowingErrorListener.syntaxError(ThrowingErrorListener.java:20)
at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
at org.antlr.v4.runtime.DefaultErrorStrategy.reportMissingToken(DefaultErrorStrategy.java:381)
at org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenInsertion(DefaultErrorStrategy.java:484)
at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:448)
at org.antlr.v4.runtime.Parser.match(Parser.java:206)
at io.proleap.cobol.Cobol85PreprocessorParser.copyStatement(Cobol85PreprocessorParser.java:577)
at io.proleap.cobol.Cobol85PreprocessorParser.startRule(Cobol85PreprocessorParser.java:186)
at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processWithParser(CobolDocumentParserImpl.java:85)
at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processLines(CobolDocumentParserImpl.java:61)
at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.parseDocument(CobolPreprocessorImpl.java:58)
at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:101)
at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:86)
at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.parseFile(CobolParserRunnerImpl.java:174)
at io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl.analyzeFile(CobolParserRunnerImpl.java:91)
at cobol.log.CobolParseDVZ.main(CobolParseDVZ.java:45)
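
The workaround mentioned above (replacing SUPPRESS by spaces) could be automated before the file is handed to the parser. The following is a rough sketch under the assumption of FIXED format input; the class `CopySuppressWorkaround` is hypothetical and not part of the parser. It only touches columns 8-72, so dates in columns 73-80 are left alone, and it treats a COPY statement as finished when the program text of a line ends with a period.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-filter: blanks out SUPPRESS inside COPY statements, keeping columns intact.
final class CopySuppressWorkaround {
    private static final Pattern COPY = Pattern.compile("(?i)(^|\\s)COPY(\\s|$)");
    private static final Pattern SUPPRESS =
        Pattern.compile("(?i)(?<![A-Z0-9-])SUPPRESS(?![A-Z0-9-])");

    static List<String> blankSuppress(List<String> fixedFormatLines) {
        List<String> out = new ArrayList<>();
        boolean inCopy = false;
        for (String line : fixedFormatLines) {
            int end = Math.min(line.length(), 72);
            String head = line.length() > 7 ? line.substring(0, 7) : line;    // columns 1-7
            String text = line.length() > 7 ? line.substring(7, end) : "";    // columns 8-72
            String tail = line.substring(end);                                // columns 73+
            boolean comment = line.length() > 6
                && (line.charAt(6) == '*' || line.charAt(6) == '/');
            if (!comment) {
                if (!inCopy && COPY.matcher(text).find()) {
                    inCopy = true;
                }
                if (inCopy) {
                    // eight spaces, the same width as the keyword
                    text = SUPPRESS.matcher(text).replaceAll("        ");
                    if (text.trim().endsWith(".")) {
                        inCopy = false; // statement terminated
                    }
                }
            }
            out.add(head + text + tail);
        }
        return out;
    }
}
```

Because the keyword is replaced by the same number of blanks, line and column numbers in later error messages still match the original source.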

Extra spaces in ID division lead to parsing errors

Extra spaces before some paragraphs in identification division lead to errors, as in the following example, with 8 spaces instead of 7 before AUTHOR and DATE-WRITTEN:

   IDENTIFICATION DIVISION.
    PROGRAM-ID.       test1.
    AUTHOR.           Alberto.
    DATE-WRITTEN.     6/12/17.

java.lang.RuntimeException: syntax error in line 3:26 mismatched input 'Alberto' expecting <EOF> ...

Extra spaces don't seem to matter before PROGRAM-ID

Latest API changes

The latest snapshot of cobol85parser broke my code, since the method
io.proleap.cobol.asg.metamodel.procedure.Statement#getChildren()
is missing.
In particular, I need to find linear sequences of statements in COBOL code (or, equivalently, the statement following a given statement), and I used the following recursive code:

	// the outer sequence is the list of statements
	for (Statement s : statements) {
		for (ASGElement c : s.getChildren()) {
			if (c instanceof Scope) {
				Scope scope = (Scope) c;
				// recurse on scope.getStatements()
			}
		}
	}

Is there a way now to do this?

By the way, is there a more stable JAR online, or should I freeze a local copy of the repository and include the JAR in my project?

Thank you

function when-compiled not yet recognized

Apparently the intrinsic function WHEN-COMPILED is not yet implemented.

Statements like the following

000300             MOVE 'Compile-Datum: '      TO P190-TEXT (30:15)     15.04.98
000200             MOVE FUNCTION WHEN-COMPILED (7:2)                    22.09.97
000450                                         TO P190-TEXT (45:02)     15.04.98
000440             MOVE '.'                    TO P190-TEXT (47:01)     15.04.98
000200             MOVE FUNCTION WHEN-COMPILED (5:2)                    22.09.97
000300                                         TO P190-TEXT (48:02)     15.04.98
000440             MOVE '.'                    TO P190-TEXT (50:01)     15.04.98
000200             MOVE FUNCTION WHEN-COMPILED (1:4)                    22.09.97
000300                                         TO P190-TEXT (51:04)     15.04.98

lead to the following errors:

call to unknown data element FUNCTIONWHEN-COMPILED(7:2)
call to unknown data element FUNCTIONWHEN-COMPILED(5:2)
call to unknown data element FUNCTIONWHEN-COMPILED(1:4)
call to unknown data element FUNCTIONWHEN-COMPILED(9:2)
...

FUNCTION CURRENT-DATE as part of string-statement not allowed ...

We run into the following error when using the STRING statement shown below.
(Line 1634 is exactly the FUNCTION CURRENT-DATE within the STRING statement.)

line 1634:23 extraneous input 'FUNCTION' expecting {ABORT, ALL, APOST, ARITH, AS, ASCII, ASSOCIATED_DATA, ASSOCIATED_DATA_LENGTH, ATTRIBUTE, AUTO, AUTO_SKIP, BACKGROUND_COLOR, BACKGROUND_COLOUR, BEEP, BELL, BINARY, BIT, BLINK, BOUNDS, CAPABLE, CCSVERSION, CHANGED, CHANNEL, CLOSE_DISPOSITION, COBOL, CODEPAGE, COMMITMENT, CONTROL_POINT, CONVENTION, CRUNCH, CURSOR, DEFAULT, DEFAULT_DISPLAY, DEFINITION, DELIMITED, DFHRESP, DFHVALUE, DISK, DONTCARE, DOUBLE, EBCDIC, EMPTY_CHECK, ENTER, ENTRY_PROCEDURE, ERASE, EOL, EOS, ESCAPE, EVENT, EXCLUSIVE, EXPORT, EXTENDED, FALSE, FOR, FOREGROUND_COLOR, FOREGROUND_COLOUR, FULL, FUNCTIONNAME, FUNCTION_POINTER, GRID, HIGHLIGHT, HIGH_VALUE, HIGH_VALUES, IMPLICIT, IMPORT, INTEGER, KEPT, KEYBOARD, LANGUAGE, LB, LD, LEFTLINE, LENGTH_CHECK, LIB, LIBACCESS, LIBPARAMETER, LIBRARY, LINAGE_COUNTER, LIST, LOCAL, LONG_DATE, LONG_TIME, LOWER, LOWLIGHT, LOW_VALUE, LOW_VALUES, MMDDYYYY, NAMED, NATIONAL, NETWORK, NO_ECHO, NOSEQ, NULL, NULLS, NUMERIC_DATE, NUMERIC_TIME, ODT, OPTIMIZE, ORDERLY, OVERLINE, OWN, PASSWORD, PORT, PRINTER, PRIVATE, PROCESS, PROGRAM, PROMPT, QUOTE, QUOTES, READER, REMOTE, REAL, RECEIVED, REF, REMOVE, REQUIRED, REVERSE_VIDEO, SAVE, SECURE, SHARED, SHAREDBYALL, SHAREDBYRUNUNIT, SHARING, SHORT_DATE, SP, SPACE, SPACES, SYMBOL, TASK, THREAD, THREAD_LOCAL, TIMER, TODAYS_DATE, TODAYS_NAME, TRUE, TRUNCATED, TYPEDEF, UNDERLINE, VIRTUAL, WAIT, XOPTS, YEAR, YYYYMMDD, YYYYDDD, ZERO, ZERO_FILL, ZEROS, ZEROES, NONNUMERICLITERAL, '66', '77', '88', INTEGERLITERAL, NUMERICLITERAL, IDENTIFIER}

The statement is:

           STRING      '2016'
                       '-'
                       'LST'
                       '-'
                       ZW-ETINODERIDNR   DELIMITED BY SPACE
                       '-'
                       ZW-EL-ANFANG-A
                       ZW-EL-ENDE-A
                       '-'
                       ZW-EL-AG-STNR-A
                       '-'
                       EL0000 IN C1ELALT
                       '-'
                       ZW-EL-ABR-DAT
                       '-'
                       '1'
                       FUNCTION CURRENT-DATE(1:16)
                                       DELIMITED BY SIZE
               INTO ZW-EL-KMID
           END-STRING

unknown value stmt at [n n n n .... n]

When parsing huge programs (more than 10,000 LOC), the parser tends to issue a lot of "call to unknown data element []" messages.
Could you give a hint:

  • when is such a message raised?
  • how should the numbers inside the [] be interpreted: are they line numbers of the parsed program, internal ids, or something else?
  • is it a warning or an error? (The message is printed in black, not red.)

If you need COBOL samples for this issue, let me know. As mentioned above, I have only observed this error pattern in big COBOL sources, so I would have to do some time-consuming anonymization of the customer code before uploading.

call to unknown data element FUNCTIONDATE-OF-INTEGER(FUNCTIONINTEGER-OF-DATE(P0120(X-S))-1)
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4308 4296 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
unknown value stmt at [5339 5330 5312 3662 4184 4159 3632 4319 4297 3641 4308 4296 3641 3610 3602 3593 3576 3538 1200 1185 1182]
call to unknown data element x-1

DATA RECORD not processed

ISSUE14.CBL.txt

The DATA RECORD clause inside an FD seems to be ignored or not parsed correctly.
I also have no idea why the AUSGABE structure is not parsed.
Please refer to the attached simple sample.

Preprocessing file ISSUE14.CBL.
Preprocessing file ISSUE14.CBL.
Parsing file ISSUE14.CBL.
Collecting units in file ISSUE14.CBL.
Analyzing program units of compilation unit ISSUE14.
call to unknown data element D111E-DATEI
Analyzing identification divisions of compilation unit ISSUE14.
Analyzing environment divisions of compilation unit ISSUE14.
Analyzing data divisions of compilation unit ISSUE14.
Analyzing procedure divisions of compilation unit ISSUE14.
Analyzing statements of compilation unit ISSUE14.
call to unknown data element aus-vor
call to unknown data element aus-text
call to unknown data element aus-text
call to unknown data element ausgabe

CBL statement not recognized

CBL statement specifying compiler options at start of program is not recognized, as in

line 1:7 mismatched input 'CBL' expecting {ID, IDENTIFICATION, PROCESS}

   CBL ARITH(EXTEND)
   IDENTIFICATION DIVISION.
   ...

Copy-book resolution error

Suppose you include a copybook by specifying an extension in the copybook name, but without quotes. The COPY statement then looks like:

COPY abc.cpy

=> the copybook is resolved and found correctly, but the ".cpy" extension is split off into a new line, looking somewhat like:

      *>  ****************************************************************
CPY.
(Next statement after the copy abc.cpy statement ...)

A very close look shows that the CPY. starts in column 1 and thus should not be interpreted at all, given that the FIXED format option has been specified ...

And this leads to a parsing error as follows:
Parsing file ab2600145.CBL.
line 171:0 mismatched input 'CPY' expecting
Collecting units in file ab2600145.CBL.

Well, I know it is debatable whether this really is a parsing error;
of course a
COPY abc
or a
COPY 'abc.cpy'
works correctly.
The compilers we are using on site also resolve a COPY abc.cpy statement correctly, so personally I tend to say: yes, it is an error.

call to unknown data element fort-struktur

Not sure what really causes the parser to have problems with the data structure inside the attached cobol program.
The original data structure of the customer is much more complicated, including redefines and occurs as well ...

       01  FORT-Struktur.
           05 FORT-ANZ             PIC S9(7).
           05 FORT-TAB .
             10  C1FONEU.
               15 P8000                     PIC  X(14).
               15 PG8100.
                 20 PG8028-gruppe .
                   25 P8028               PIC  9(3).
                   25 P8028-N             PIC  9(2).
                 20 P8028-S               PIC  X(3).

Parser issues the following message:
call to unknown data element fort-struktur
a2612ABS.CBL.txt

Ignore compiler output formatting directives like EJECT, SKIP1

An almost trivial issue, but currently such occurrences in source code lead to parsing-errors.

I think it's an IBM extension only and no real standard - well you know, COBOL ... -
but output formatting commands like the following should be recognized (and ignored) by the parser:

  • EJECT
  • SKIP1
  • SKIP2
  • (maybe even SKIPn n=3,4...?)

Typically these listing-output formatting commands are written on a line of their own, anywhere in the code, and have no semantic meaning at all.
SUPPRESS, as mentioned in #15, falls into the same category, though this output-suppressing command is typically written following a COPY xxx statement.
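
Until the grammar recognizes these directives, one option is to neutralize them before parsing. The sketch below is only an illustration under the assumption of FIXED format input; `ListingDirectiveFilter` is a hypothetical helper, not part of the parser. It turns a line whose program text consists solely of EJECT, SKIP1, SKIP2 or SKIP3 (optionally followed by a period) into a comment line by placing `*` in column 7.

```java
import java.util.regex.Pattern;

// Hypothetical pre-filter for listing-control directives in fixed-format COBOL.
final class ListingDirectiveFilter {
    private static final Pattern DIRECTIVE =
        Pattern.compile("(?i)\\s*(EJECT|SKIP[1-3])\\s*\\.?\\s*");

    static String neutralize(String line) {
        // need an indicator column, and it must currently be blank
        if (line.length() <= 7 || line.charAt(6) != ' ') {
            return line;
        }
        // program text lives in columns 8-72
        String text = line.substring(7, Math.min(line.length(), 72));
        if (!DIRECTIVE.matcher(text).matches()) {
            return line;
        }
        // put the comment indicator into column 7, keep everything else
        return line.substring(0, 6) + '*' + line.substring(7);
    }

    public static void main(String[] args) {
        System.out.println(neutralize("           EJECT"));
        System.out.println(neutralize("           MOVE SKIP1 TO X."));
    }
}
```

Turning the line into a comment instead of deleting it keeps line numbers stable, so later parser messages still point at the right source line.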

Expansion/Replacement of Tabs in CobolSources

This is more a question of understanding and a suggestion for better flexibility, not really an error.
We have numerous COBOL sources that make heavy use of tabs.
For indentation there are up to 11 tabs before various COBOL statements, for example, and all of that is still squeezed into the rigid 1-7, 8-72, 73-80 column corset ... gosh !!!

As long as the parsed and returned COBOL string is processed by a compiler capable of a wide or free source format, we face no problem at all; but if the modified COBOL files are returned to the host, we have to conform to that infamous rigid area format.
The current algorithm seems to replace each tab by 4 blanks, but we would need more flexibility in that replacement algorithm, such as being able to define for example:

FIRST_TAB = (seven spaces) 
OTHER_TABS = (two spaces)
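
To illustrate the requested behaviour, here is a small sketch of such a configurable expansion. The class `TabExpander` and its parameters are hypothetical and do not exist in the parser; the two widths correspond to the FIRST_TAB and OTHER_TABS settings suggested above.

```java
// Hypothetical helper: expands the first tab of a line differently from the remaining tabs.
final class TabExpander {
    static String expand(String line, int firstTabWidth, int otherTabWidth) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                // the first tab jumps over the sequence/indicator area, later tabs only indent
                sb.append(" ".repeat(first ? firstTabWidth : otherTabWidth));
                first = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // FIRST_TAB = 7 spaces, OTHER_TABS = 2 spaces
        System.out.println("[" + expand("\t\tMOVE A TO B.", 7, 2) + "]");
    }
}
```

With seven spaces for the first tab, the statement text starts in column 8 at the earliest, and each further tab adds only two columns, which leaves much more room before column 72.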

This issue relates to #37 and #24 as well => an option to retain the input format when getting a parsed string from the Cobol85Parser.

For further clarification: the parser has NO problems with such lines and parses them correctly. It is "just" that the returned string is too long and breaks the strict column format. So if the parser is used for COBOL enhancement tasks (like reformatting, GO TO elimination, dead code removal etc.) and the modified sources are brought back to the host, then we face the problem (or have to handle it in a postprocessor, as we are doing right now)!

CobolPreprocessor.CobolSourceFormatEnum format = CobolPreprocessor.CobolSourceFormatEnum.FIXED;
CobolParserParamsImpl params = new CobolParserParamsImpl();
params.setCopyBookFiles(copyBookFiles);
String preProcessedInput = new CobolPreprocessorImpl().process(inputFile, format, params); // <<<<<======= THIS String is not Fixed format any more and not AREA-conform
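
As a rough illustration of the kind of postprocessing check mentioned above, the following sketch reports the lines of a returned string that run past a given column. `FixedFormatCheck` is a hypothetical helper; it does not call any parser API and only inspects the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: finds lines whose content extends beyond the allowed last column.
final class FixedFormatCheck {
    static List<Integer> overflowingLines(String text, int lastColumn) {
        List<Integer> result = new ArrayList<>();
        String[] lines = text.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            // trailing blanks do not count as overflow
            if (lines[i].stripTrailing().length() > lastColumn) {
                result.add(i + 1); // 1-based line number
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "       MOVE A TO B.\n" + " ".repeat(70) + "MOVE\n";
        System.out.println(overflowingLines(text, 72));
    }
}
```

Calling `overflowingLines(preProcessedInput, 72)` on the string from the snippet above would list the lines that need to be re-wrapped before the source can go back to the host.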

Building cobol85parser 2.3.0-SNAPSHOT fails - due to compile errors

I tried a maven build / test / install etc on the recent snapshot offered via git:

[INFO] Scanning for projects...
[INFO] Downloading: https://devtools/artifactory/repos/org/apache/maven/wagon/wagon-ftp/maven-metadata.xml
[INFO] Downloaded: https://devtools/artifactory/repos/org/apache/maven/wagon/wagon-ftp/maven-metadata.xml (1.2 kB at 109 B/s)
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Building cobol85parser 2.3.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- antlr4-maven-plugin:4.7:antlr4 (run antlr) @ cobol85parser ---
[INFO] No grammars to process
[INFO] ANTLR 4: Processing source directory C:\DATEN\WS-Cobol2Java\Cobol85Parser\src\main\antlr4
[INFO] 
[INFO] --- maven-resources-plugin:3.0.2:resources (default-resources) @ cobol85parser ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 0 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ cobol85parser ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 1136 source files to C:\DATEN\WS-Cobol2Java\Cobol85Parser\target\classes
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[11,38] cannot find symbol
  symbol:   class SkipStatementContext
  location: class io.proleap.cobol.Cobol85Parser
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[21,25] cannot find symbol
  symbol:   class SkipStatementContext
  location: class io.proleap.cobol.asg.metamodel.procedure.skip.impl.SkipStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[27,90] cannot find symbol
  symbol:   class SkipStatementContext
  location: class io.proleap.cobol.asg.metamodel.procedure.skip.impl.SkipStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[11,38] cannot find symbol
  symbol:   class EjectStatementContext
  location: class io.proleap.cobol.Cobol85Parser
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[21,25] cannot find symbol
  symbol:   class EjectStatementContext
  location: class io.proleap.cobol.asg.metamodel.procedure.eject.impl.EjectStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[25,91] cannot find symbol
  symbol:   class EjectStatementContext
  location: class io.proleap.cobol.asg.metamodel.procedure.eject.impl.EjectStatementImpl
[INFO] 6 errors 
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32.379 s
[INFO] Finished at: 2017-12-08T10:24:11+01:00
[INFO] Final Memory: 23M/387M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project cobol85parser: Compilation failure: Compilation failure: 
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[11,38] cannot find symbol
[ERROR]   symbol:   class SkipStatementContext
[ERROR]   location: class io.proleap.cobol.Cobol85Parser
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[21,25] cannot find symbol
[ERROR]   symbol:   class SkipStatementContext
[ERROR]   location: class io.proleap.cobol.asg.metamodel.procedure.skip.impl.SkipStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/skip/impl/SkipStatementImpl.java:[27,90] cannot find symbol
[ERROR]   symbol:   class SkipStatementContext
[ERROR]   location: class io.proleap.cobol.asg.metamodel.procedure.skip.impl.SkipStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[11,38] cannot find symbol
[ERROR]   symbol:   class EjectStatementContext
[ERROR]   location: class io.proleap.cobol.Cobol85Parser
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[21,25] cannot find symbol
[ERROR]   symbol:   class EjectStatementContext
[ERROR]   location: class io.proleap.cobol.asg.metamodel.procedure.eject.impl.EjectStatementImpl
[ERROR] /C:/DATEN/WS-Cobol2Java/Cobol85Parser/src/main/java/io/proleap/cobol/asg/metamodel/procedure/eject/impl/EjectStatementImpl.java:[25,91] cannot find symbol
[ERROR]   symbol:   class EjectStatementContext
[ERROR]   location: class io.proleap.cobol.asg.metamodel.procedure.eject.impl.EjectStatementImpl
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

END-<statement> considered error

Sometimes putting END-<statement> stops parsing, for example with
PERFORM UNTIL x NOT = SPACE END-PERFORM.

The same happens with END-ACCEPT and END-IF.

space required after comma

The grammar seems to be very strict in requiring spaces after commas.
'1', '2', '3' is valid, but '1', '2','3' leads to a parsing error (there is no space between '2', and '3').

Error Message:
Parsing file CobolT1.cbl.
line 28:12 mismatched input ',' expecting DOT_FS
Collecting units in file CobolT1.cbl.

01  Student.
   03  nachname    pic x(20).
   03  vorname     pic x(20).
   03  geschlecht  pic x(1).
     88  mann      values 'M' 'm'. 
	 88  frau      value 'F', 'f', 'W', 'w'.
	 88  egal      values 'a' thru 'z',
	      'A' thru 'Z'
 * SPACE after COMMA required?		  
 * OK	  '1', '2', '3'.
		  '1', '2','3'.     <<<<==== this is my line 28

I am aware that some IBM COBOL compilers are rather keen on having spaces after a comma, though these are WARNINGs only.

"replace" issue

Hi,
I have an issue with the replace method of the Cobol85PreprocessorImpl class.
The problem can occur when currentReplaceableReplacements contains two identifiers where one name is a substring of the other. Example:

REPLACE
==TEST1== by =='VAL1'==
==TEST1-SECOND-PART== by =='VAL2'==

Depending on the order of substitution, you can get results such as
'VAL1'-SECOND-PART instead of 'VAL2' when processing TEST1-SECOND-PART.
I'll try to order the substitutions (sort by token length and process the longest first).
Thanks
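
The proposed ordering can be sketched as follows. This is plain string replacement for illustration only, not the actual replace method of the preprocessor; the class `LongestFirstReplace` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: apply replacements with the longest search token first.
final class LongestFirstReplace {
    static String replaceAll(String text, Map<String, String> replacements) {
        List<Map.Entry<String, String>> ordered = new ArrayList<>(replacements.entrySet());
        // longest key first, so TEST1-SECOND-PART is handled before its prefix TEST1
        ordered.sort(Comparator
            .comparingInt((Map.Entry<String, String> e) -> e.getKey().length())
            .reversed());
        String result = text;
        for (Map.Entry<String, String> e : ordered) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("TEST1", "'VAL1'");
        replacements.put("TEST1-SECOND-PART", "'VAL2'");
        System.out.println(replaceAll("MOVE TEST1-SECOND-PART TO TEST1.", replacements));
    }
}
```

Sorting by length removes the dependency on declaration order. It does not protect against a replacement text that itself contains another search token; matching on whole COBOL words would be needed for that.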

EXEC SQL sections are removed in the parsed program

I tried printing the text of a compilation unit retrieved using getCtx().getText(), and the EXEC SQL sections are completely ignored. Is this a missing feature of the parser? Or is there another way to retrieve the embedded SQL statements?

Thank you

Copybooks not found

I tried to parse a directory of Cobol files, using the method
new CobolParserRunnerImpl().analyzeFiles(inputFiles, copyFiles, format, null)
If I understood correctly, inputFiles are Cobol sources, and copyFiles are copybooks.

I got the following error
Copy file "copybooks/abc.cpy" not found in copy files [src/test/resources/copybooks/abc.cpy].
The same happens if I put "abc.cpy" in the same directory as the COBOL source.
Am I missing something?

Thank you

Data description entry not parsed correctly

In parsing the following code, taken from cobol-unit-test, the DataDescriptionEntry corresponding to BINARY has getName() that returns null.

   01  BINARY.
       05  THE-REMAINDER      PIC S9(4).
       05  THE-QUOTIENT       PIC S9(4).
       05  THE-DIVISOR        PIC S9(4).
       05  OFFSET             PIC S9(4).

EXEC SQL Statement before a paragraph.

Hi,

When an execSqlStatement is placed just before a paragraph, I get a parse error.

Here's the error

line 21:7 extraneous input 'END-OF-JOBS' expecting {ACCEPT, ADD, ALTER, CALL, CANCEL, CLOSE, COMPUTE, CONTINUE, DELETE, DEPART, DISABLE, DISPLAY, DIVIDE, ENABLE, ENTRY, EVALUATE, EXHIBIT, EXIT, FETCH, FIND, FREE, GENERATE, GOBACK, GO, IF, IMPART, INITIALIZE, INITIATE, INSPECT, KEEP, MERGE, MODIFY, MOVE, MULTIPLY, OPEN, PERFORM, PURGE, READ, RECEIVE, RELEASE, RETURN, REWRITE, SEARCH, SEND, SET, SORT, START, STOP, STORE, STRING, SUBTRACT, TERMINATE, UNSTRING, WRITE, DOT_FS, EXECSQLLINE}
       END-OF-JOBS.
       ^^^^^^^^^^^

The code is :

           MOVE SPACE TO W-A.

           EXEC SQL
                WHENEVER SQLERROR CONTINUE
           END-EXEC.
      * 
       END-OF-JOBS.

Is this because of the DOT_FS required by the statement?
Thanks

$SET Compiler directives in cobol-sources should be ignored/processed

Some programs have compiler directives in the very first line of a COBOL source,
starting with $SET, the $ being positioned in the 7th column (like a comment indicator).
The parser should either ignore such statements and treat them like a comment (the first and easy step), or consider the semantics of the various options (some options are very "doubtful", like NOCHECKNUM, SPZERO etc., and even influence the runtime behaviour).

2017-08-03 12:33:22.010 [main] ERROR com.csi.parser.CobolParseXXX.parseFile (79) - Cobolfile: XXXXASX.CBL threw exception: {}
java.lang.RuntimeException: Is FIXED the correct line format? Could not parse line 1:       $SET DIALECT(MF) NOTRUNC NOBOUND                                          
	at io.proleap.cobol.preprocessor.sub.line.reader.impl.CobolLineReaderImpl.parseLine(CobolLineReaderImpl.java:64)
	at io.proleap.cobol.preprocessor.sub.line.reader.impl.CobolLineReaderImpl.processLines(CobolLineReaderImpl.java:100)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.readLines(CobolPreprocessorImpl.java:110)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:99)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:86)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:65)
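
The "easy step" (treating the directive like a comment) could be approximated outside the parser with a pre-filter such as the one below. It is only a sketch under the assumption of FIXED format input; `DollarDirectiveFilter` is a hypothetical helper and not part of the parser.

```java
// Hypothetical pre-filter: turns a '$' directive in column 7 into a fixed-format comment line.
final class DollarDirectiveFilter {
    static String commentOut(String line) {
        // column 7 is index 6; only the indicator column is inspected
        if (line.length() > 6 && line.charAt(6) == '$') {
            return line.substring(0, 6) + '*' + line.substring(7);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(commentOut("      $SET DIALECT(MF) NOTRUNC NOBOUND"));
    }
}
```

This discards the semantics of the options entirely, so directives that influence runtime behaviour would still have to be evaluated separately.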

retain Inputformat whilst preprocessing

Preprocessing the COBOL sources with the code snippet below produced correct COBOL, however only in FREE format. This is no problem with GnuCOBOL: just use the -free option for further compilation of the generated COBOL output. However, if you would like to rerun the preprocessed output on the host, you can only use FIXED format there. (One reason for rerunning that output on the host would be COBOL source instrumentation or enhancements to be applied ...)
The best implementation would be for the returned preprocessed string to be formatted according to the given input format.

CobolPreprocessor.CobolSourceFormatEnum format = CobolPreprocessor.CobolSourceFormatEnum.FIXED;
String preProcessedInput = new CobolPreprocessorImpl().process(inputFile, copyBookFiles, format);

MOVE CORR tab(ind1,ind2) - Statement not parsed

The following two MOVE CORR statements (using an index) lead to parsing errors.

error * move corr with indexed expression leads to parsing error
error       MOVE CORR PG1122-BETRAEGE   (X-H, X-1) TO TAB-ALL-1122    
error       MOVE CORR PG1122-ERGEBNISSE (X-H, X-1) TO TAB-ALL-1122   
error * move corr with indexed expression leads to parsing error

It looks like an indexed expression is not allowed in the grammar for the MOVE CORR statement:

Parsing file a2612051.CBL.
line 3585:40 mismatched input '(' expecting TO
line 3585:46 mismatched input 'X-1' expecting SECTION
line 3585:49 mismatched input ')' expecting SECTION
line 3586:12 mismatched input 'MOVE' expecting SECTION
line 3586:40 mismatched input '(' expecting SECTION
line 3586:46 mismatched input 'X-1' expecting SECTION
line 3586:49 mismatched input ')' expecting SECTION
line 3589:13 mismatched input 'move' expecting SECTION
line 3589:36 mismatched input '(' expecting SECTION
line 3589:42 mismatched input 'X-1' expecting SECTION
line 3589:45 mismatched input ')' expecting SECTION
line 3591:13 mismatched input 'COMPUTE' expecting SECTION
line 3591:27 mismatched input 'IN' expecting SECTION
line 3591:43 mismatched input '=' expecting SECTION
line 3591:51 mismatched input 'IN' expecting SECTION
line 3591:67 mismatched input '/' expecting SECTION
line 3592:13 mismatched input 'COMPUTE' expecting SECTION
line 3592:27 mismatched input 'IN' expecting SECTION
line 3592:43 mismatched input '=' expecting SECTION
line 3592:51 mismatched input 'IN' expecting SECTION
line 3592:67 mismatched input '/' expecting SECTION
line 3593:13 mismatched input 'MOVE' expecting SECTION
line 3593:26 mismatched input '(' expecting SECTION
line 3593:32 mismatched input 'X-1' expecting SECTION
line 3593:35 mismatched input ')' expecting SECTION
line 3594:13 mismatched input 'MOVE' expecting SECTION
line 3594:48 mismatched input 'TO' expecting SECTION
line 3595:13 mismatched input 'MOVE' expecting SECTION
line 3595:48 mismatched input 'TO' expecting SECTION
line 3596:11 mismatched input 'END-PERFORM' expecting SECTION
Collecting units in file a2612051.CBL.

Record Key and File Status not processed correctly.

I am not exactly sure what happens, but I suppose:
VideoCode of VideoRecord => it seems the RECORD KEY may only contain simple COBOL names (but not names qualified with "OF" or "IN").
File Status is VideoFileStatus => it seems that currently only the default, implicitly declared FILE STATUS is recognized by the parser.

Datei007.cbl.txt

... Output when parsing Datei007.cbl

Preprocessing file Datei007.cbl.
Parsing file Datei007.cbl.
Collecting units in file Datei007.cbl.
Analyzing program units of compilation unit Datei007.
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoTitleofVideoRecord
call to unknown data element VideoFileStatus
call to unknown data element SeqVideoStatus
Analyzing identification divisions of compilation unit Datei007.
Analyzing environment divisions of compilation unit Datei007.
Analyzing data divisions of compilation unit Datei007.
Analyzing procedure divisions of compilation unit Datei007.
Analyzing statements of compilation unit Datei007.
call to unknown data element VideoCodeofVideoRecord
call to unknown data element SupplierCodeofVideoRecord
unknown value stmt at [5334 5325 5307 3657 4702 4695 3644 3605 3582 3580 3572 3533 1200 1185 1182]
call to unknown data element VideoTitleofVideoRecord
call to unknown data element VideoTitleofVideoRecord
call to unknown data element VideoTitleofVideoRecord
call to unknown data element VideoTitleinVideoRecord
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoTitleofVideoRecord
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoCodeofVideoRecord
call to unknown data element VideoCodeofVideoRecord
Anzahl Sections: 8
Section-Name:Anfang
Statements:4

<> operator not recognized

The <> operator is not recognized as NOT_EQUAL, as in:

line 8:17 no viable alternative at input 'x <>'

   ...
   IF x <> 5
       ADD 1 TO x.
   ...
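
As a possible stopgap on the caller's side, `<>` could be rewritten to the equivalent `NOT =` before parsing. The sketch below is only an illustration under stated assumptions: `NotEqualRewriter` is a hypothetical helper, not part of the ProLeap API; it works line by line, skips text inside `"..."` and `'...'` literals, and does not handle literals continued across lines, comment lines, or lines that grow beyond column 72 after the rewrite.

```java
// Hypothetical helper, not part of proleap-cobol-parser.
public class NotEqualRewriter {

    /** Replaces "<>" with "NOT =" everywhere outside quoted literals. */
    public static String rewrite(String line) {
        StringBuilder out = new StringBuilder();
        char quote = 0; // 0 means we are outside a literal
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                // Inside a literal: copy verbatim and watch for the closing delimiter.
                if (c == quote) {
                    quote = 0;
                }
                out.append(c);
            } else if (c == '"' || c == '\'') {
                // Opening delimiter of a literal.
                quote = c;
                out.append(c);
            } else if (c == '<' && i + 1 < line.length() && line.charAt(i + 1) == '>') {
                out.append("NOT =");
                i++; // skip the '>' as well
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this helper, the line `IF x <> 5` from the example becomes `IF x NOT = 5`, while a literal such as `"a <> b"` is left untouched.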

Terminating DOT for a COPY-Statement must be in same line (at least not in a single line)

The terminating DOT of a COPY statement must not stand on a line of its own ;) otherwise a parsing error is thrown.

You may write

Copy mycopy . (DOT)

but not

copy mycopy
. (dot)

AA1.cbl.txt
mycopy.CPY.txt

The same erroneous behaviour occurs whether the copybook is used in the working-storage section or in the procedure division ...

java.lang.RuntimeException: syntax error in line 12:7 no viable alternative at input '\n       .'
	at io.proleap.cobol.preprocessor.sub.document.impl.ThrowingErrorListener.syntaxError(ThrowingErrorListener.java:20)
	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:282)
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:121)
	at io.proleap.cobol.Cobol85PreprocessorParser.copyStatement(Cobol85PreprocessorParser.java:582)
	at io.proleap.cobol.Cobol85PreprocessorParser.startRule(Cobol85PreprocessorParser.java:186)
	at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processWithParser(CobolDocumentParserImpl.java:85)
	at io.proleap.cobol.preprocessor.sub.document.impl.CobolDocumentParserImpl.processLines(CobolDocumentParserImpl.java:61)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.parseDocument(CobolPreprocessorImpl.java:58)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:101)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:86)
	at io.proleap.cobol.preprocessor.impl.CobolPreprocessorImpl.process(CobolPreprocessorImpl.java:65)
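
A possible workaround until the preprocessor grammar accepts this layout is to pull a lone dot up onto the preceding COPY line before preprocessing. This is only a rough sketch: `CopyDotJoiner` is a hypothetical helper and not part of the ProLeap API, it assumes FIXED format (content starting in column 8), only handles a COPY statement written on a single line, and it removes the dot line, so line numbers in later error messages shift by one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of proleap-cobol-parser.
public class CopyDotJoiner {

    /** Text of a FIXED format line from column 8 onwards, trimmed. */
    static String content(String line) {
        return line.length() > 7 ? line.substring(7).trim() : "";
    }

    /** Appends a lone "." line to the previous line if that line is a COPY statement. */
    public static List<String> joinLoneDots(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            boolean loneDot = content(line).equals(".");
            if (loneDot && !result.isEmpty()) {
                int last = result.size() - 1;
                String previous = result.get(last);
                if (content(previous).toUpperCase().startsWith("COPY ")) {
                    // Strip trailing whitespace, then terminate the COPY statement.
                    result.set(last, previous.replaceAll("\\s+$", "") + ".");
                    continue; // the lone dot line is dropped
                }
            }
            result.add(line);
        }
        return result;
    }
}
```

Dots on their own line after other statements are deliberately left alone, since only the COPY case is reported as failing here.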

DISPLAY keyword in FILE-CONTROL

The parser does not expect the DISPLAY keyword when it is used for something other than the DISPLAY statement, as in:

file-control.
    select standard-input assign to keyboard.
    select standard-output assign to display.

ASSIGN TO DYNAMIC and EXTERNAL not parsed

The following statements do not parse:

 IDENTIFICATION DIVISION.
 PROGRAM-ID. ASSIGNTODYN.
 ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
       FILE-CONTROL.
          SELECT TEACHER ASSIGN TO DYNAMIC SOME-DAT.

 IDENTIFICATION DIVISION.
 PROGRAM-ID. ASSIGNTODYN.
 ENVIRONMENT DIVISION.
    INPUT-OUTPUT SECTION.
       FILE-CONTROL.
          SELECT TEACHER ASSIGN TO EXTERNAL SOME-DAT.

Correct, though too rigid, check of declaration order

First of all - no doubt, your parser is right in "insisting" on the defined order of

environment division.
object-computer. xxx.
source-computer xxx.
special-names.

We all know that the object-computer and source-computer clauses are more or less treated like comments (apart from the debugging mode option attached to the source-computer).

Our customer has a couple of programs with the WRONG order of these options (refer to the attached COBOL program). No doubt, this should at least produce a warning under IBM COBOL as well, but it does not ...

For ease of use, I think the parser should not be too rigid about the order of these options, which are more or less like an "intelligent" comment in COBOL ...
Your strict parser rules lead to the following error messages:

> Parsing file AAA01.CBL.
> line 15:7 mismatched input 'OBJECtT-COMPUTER' expecting <EOF>
> Collecting units in file AAA01.CBL.
> 
[AAA01.cbl.txt](https://github.com/uwol/cobol85parser/files/1205597/AAA01.cbl.txt)
