How to install pygments_ibm_cobol_lexer
- Download and install ActivePython
- Open Command Prompt
- Type
pypm install pygments-ibm-cobol-lexer
Latest release
This package contains a Pygments lexer for COBOL (with embedded DB2, CICS and DL/I).
The lexer parses the Enterprise COBOL for z/OS (V3R4) dialect, including embedded DB2/SQL, CICS and DL/I statements.
Mainframe COBOL coding form
Many early programming languages, including PL/I, Fortran, COBOL and the various IBM assembler languages, used only columns 7-72 of an 80-column card for program text.
Columns 1-6: tags, remarks or sequence numbers identifying pages or lines of a program
Column 7: indicator area
- * (asterisk) designates the entire line as a comment
- / (slash) forces a page break when printing the source listing
- - (dash) indicates continuation of a nonnumeric literal
Columns 8-72: COBOL program statements, divided into two areas:
- Area A: columns 8 to 11
- Area B: columns 12 to 72
Columns 73-80: tags, remarks or sequence numbers (often garbage...)
Division, section and paragraph-names must all begin in Area A and end with a period.
CBL/PROCESS directive statements can start in columns 1 through 70.
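The column layout above can be illustrated with a short Python sketch. It is not part of the lexer: the names CobolLine, split_fixed_line and INDICATORS are hypothetical, and the code only mirrors the column ranges listed in this section.

```python
from typing import NamedTuple


class CobolLine(NamedTuple):
    sequence: str        # columns 1-6
    indicator: str       # column 7
    area_a: str          # columns 8-11
    area_b: str          # columns 12-72
    identification: str  # columns 73-80


# Meaning of the column 7 characters listed above (hypothetical helper table).
INDICATORS = {
    "*": "comment",
    "/": "page break",
    "-": "continuation",
}


def split_fixed_line(line: str) -> CobolLine:
    """Split one fixed-format source line into its column areas."""
    # Drop the line terminator, then pad or cut to exactly 80 columns.
    padded = line.rstrip("\r\n").ljust(80)[:80]
    return CobolLine(
        sequence=padded[0:6],
        indicator=padded[6],
        area_a=padded[7:11],
        area_b=padded[11:72],
        identification=padded[72:80],
    )


if __name__ == "__main__":
    parts = split_fixed_line("000100 IDENTIFICATION DIVISION.")
    print(repr(parts.sequence), repr(parts.area_a), parts.area_b.rstrip())
    comment = split_fixed_line("000200* THIS IS A COMMENT")
    print(INDICATORS.get(comment.indicator, "normal line"))
```

Because Python slices are zero-based and half-open, column 7 is index 6 and Area A (columns 8 to 11) is the slice 7:11.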
Installation
The lexer is available as a Pip package:
$ sudo pip install pygments_ibm_cobol_lexer
Or using easy_install:
$ sudo easy_install pygments_ibm_cobol_lexer
Usage
After installation, the ibmcobol lexer and the ibmcobol style register themselves automatically for files with the ".cbl" extension.
- Therefore, command-line usage is easy:
ASCII input:
pygmentize -O full,style=ibmcobol,encoding=latin1 -o HORREUR.html HORREUR.ascii.cbl
EBCDIC input (in this case the outencoding value must be specified):
pygmentize -O full,style=ibmcobol,encoding=cp1147,outencoding=latin1 -o COB001.html COB001.cp1147.cbl
- As a library:
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments_ibm_cobol_lexer import IBMCOBOLLexer, IBMCOBOLStyle

    my_code = open("cobol_ebcdic.cbl", 'rb').read()
    highlight(my_code, IBMCOBOLLexer(encoding='cp1140'),
              HtmlFormatter(style=IBMCOBOLStyle, full=True),
              open('test.html', 'w'))
Also see the pygments_ibm_cobol_lexer-1.1/pygments_tests/ directory
About cp1147
My files are encoded in IBM1147 (French EBCDIC plus the euro sign), so I had to write my own cp1147 codec. It is very close to cp500 (Canada, Belgium) and differs only on the characters "@°{}§ùµ£à[€`¨#]~éè¦ç":
    # Python 2 syntax (print statement, str.decode)
    from pygments_ibm_cobol_lexer import cp1147
    print "euro sign ?", chr(159).decode('cp1147')
    print ''.join([chr(i).decode('cp1147') for i in range(0, 256)
                   if chr(i).decode('cp1147') != chr(i).decode('cp500')])
I have added this import to the IBMCOBOLLexer init method:
    if self.encoding == 'cp1147':
        import cp1147
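The same byte-by-byte comparison can be sketched in Python 3 using only codecs from the standard library. This is an illustration, not the package's code: differing_bytes is a hypothetical helper, and cp1140 stands in for cp1147, which ships with this package rather than with Python.

```python
def differing_bytes(codec_a, codec_b):
    """Return (byte value, char in codec_a, char in codec_b) for every
    single byte that the two codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" keeps the loop going if a codec leaves a byte unmapped.
        char_a = raw.decode(codec_a, errors="replace")
        char_b = raw.decode(codec_b, errors="replace")
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs


if __name__ == "__main__":
    # 0x9F (decimal 159) is the position of the euro sign in cp1140.
    print("euro sign ?", bytes([0x9F]).decode("cp1140"))
    for value, char_a, char_b in differing_bytes("cp500", "cp1140"):
        print(hex(value), repr(char_a), repr(char_b))
```

Once the cp1147 module has been imported, passing 'cp1147' in place of 'cp1140' should presumably give the comparison described above, assuming the codec is registered under that name.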
Changelog
1.1 - (2012-11-19) Minor fixes + EBCDIC enhancements:
- Fix: float regex detection now runs before integer detection
- Add inline comment *> (not the IBM default)
- Change CICS/DLI keyword colors...
- Extend CICS_KEYWORDS, remove EJECT/SKIP from COBOL_KEYWORDS (treated as comments)
- Each ASCII input line is padded to 80 columns
- Add EBCDIC features:
  - add my own French codec cp1147
  - if an EBCDIC encoding is passed (cp500, cp1140, ...) or detected, convert the raw binary input text into fixed 80-column lines
  - encoding=chardet (slow) does not detect EBCDIC charsets; it is overridden with encoding=guess
  - "guess EBCDIC" defaults to self.encoding='cp500'
1.0 - (2012-11-12) Initial release.
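The 1.1 entry mentions padding ASCII lines to 80 columns and cutting raw EBCDIC input into fixed 80-column lines. A minimal Python 3 sketch of both ideas follows. It is not the lexer's actual implementation: the helper names pad_line and ebcdic_records are hypothetical, and the sketch assumes a single-byte EBCDIC codec with no line terminators in the input.

```python
def pad_line(line, width=80):
    """Pad an ASCII source line with spaces up to `width` columns.
    Lines already longer than `width` are returned unchanged."""
    return line.rstrip("\r\n").ljust(width)


def ebcdic_records(raw, encoding="cp500", width=80):
    """Cut a raw EBCDIC byte string into fixed-width records and decode each.
    Assumes one byte per character and no newline bytes between records."""
    return [
        raw[start:start + width].decode(encoding)
        for start in range(0, len(raw), width)
    ]


if __name__ == "__main__":
    print(len(pad_line("       MOVE A TO B.\n")))
    # Build two 80-byte records by encoding text to cp500 first.
    raw = ("A" * 80 + "B" * 80).encode("cp500")
    for record in ebcdic_records(raw):
        print(record[:10], len(record))
```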
COBOL gallery
This lexer has been developed as part of a larger project for cleaning COBOL sources (Scrubol).
Visit my COBOL gallery.