
Thread: ignoring CRLF

  1. #1
    Join Date
    Jan 2018
    Posts
    28

    Default ignoring CRLF

    I am comparing two data files from a mainframe. A record can contain a CRLF inside its data; the byte shows in the record as hex value 15. When the compare results are displayed, the CRLF is interpreted as a line break, so the record splits into two lines in the compare output. For example, if 10 lines each contain a CRLF, 20 lines show in the compare output.
    Can the CRLF be ignored and treated as part of the data? This doubling of lines is skewing the results.
    Many unexpectedly split lines show. The encoding is ANSI.


    Thanks.

  2. #2
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,830

    Default

    If you use the Table Compare, using a format with a defined string can include the CRLF in the data of a cell.

    The Table Compare is column/row defined cells, with a default Key as Column 1. You can define different Key columns by right clicking any column header and setting each as Standard, Unimportant, or Key (and multiple Key columns act together as the Key).
    Aaron P Scooter Software

  3. #3
    Join Date
    Jan 2018
    Posts
    28

    Unhappy update

    The file does not have delimiters (semicolon, comma, etc.).
    The CRLF is needed for source code compares, because there a CRLF always means a line break.
    If I have 1 million records, hundreds of them can contain a CRLF when the records hold over 1000 bytes of data.
    I tried the Table Compare, but it doesn't work with a non-delimited file.

    If I replace each CRLF with a space before doing the compare, there will be no unneeded line breaks.
    This is not ideal, since I have to copy the file and convert it to a version without CRLF.
    It seems an option to skip the line break when BC sees a CRLF in data files would be a good feature to have.
    It would include the CRLF in the compare but not do the line break in the compare result. Line breaking is fine for source code or report-type files, though. I am only referring to non-delimited raw data files.
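
    The copy-and-convert workaround described above could be scripted roughly as follows. This is only a sketch: the function name, the default target byte (hex 15, as reported in the first post), and the choice of an ASCII space (hex 20) as the replacement are all assumptions. It also swaps every matching byte, including any that belong to packed or binary fields.

```python
def copy_replacing_byte(src_path, dst_path, target=0x15, replacement=0x20,
                        chunk_size=1 << 20):
    """Copy src_path to dst_path, swapping every `target` byte for `replacement`.

    Both byte values are illustrative defaults, not anything Beyond Compare requires.
    """
    # A one-entry translation table: only `target` is remapped.
    table = bytes.maketrans(bytes([target]), bytes([replacement]))
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Single-byte replacement is safe across chunk boundaries.
            dst.write(chunk.translate(table))
```

    Running it on both files before the compare keeps byte positions unchanged, so differences still line up, but a difference between the target byte and a space would be hidden.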



    Last edited by mikedgre; 15-Feb-2018 at 03:34 PM.

  4. #4
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,830

    Default

    Hello,

    Are your data files Fixed Width (every column is a specific number of characters)? Because the Table Compare format can be defined for either Delimited or Fixed style in the Format button (or Tools menu -> Formats dialog, new Table format).

    Is the hex value 15 the line break in the interior of your lines, or the one that should be respected? What differentiates the line breaks you need ignored from those you need to break on? If you can define any string to swallow the former, then the Table Compare can encapsulate the entire line as a single cell (if the Fixed style doesn't work).

    Would it be possible to send us example files (and note an example line number/character position of the line break you want to ignore vs. one you want preserved) to support@scootersoftware.com?
    Aaron P Scooter Software

  5. #5
    Join Date
    Jan 2018
    Posts
    28

    Default no progress...

    Yes. The hex value is 15 for this CRLF. The field widths are fixed, but there are rare times when this may not be the case.
    I do not want to map the fields out every time. Some files have over 200 fields, which can be packed, binary, etc.
    I was hoping the single-cell approach would work, but it only shows a maximum number of bytes in the compare result. I only see part of the record, and changed bytes do not show if they are later in a long record.
    After looking further, this line splitting is not that much of a negative. Because I am only focused on differences, the chance of hitting a CRLF split is minimal. When I share this with others, I will just mention that this can happen. However, it would be nice to have a checkbox to turn off CRLF line breaks when the user is only doing a raw data file compare.

    The CRLF happens at random places and can be at any position in the record. It could be byte 17 in one record, byte 85 in another, or there could be 2, 3, 4, 5, 6 or more instances of CRLF in a record. Having BC not 'process' a CRLF would be very nice in this case. I would say the best solution is to treat the CRLF as any other data in the compare.


    Last edited by mikedgre; 16-Feb-2018 at 01:26 PM.

  6. #6
    Join Date
    Oct 2007
    Location
    Madison, WI
    Posts
    11,830

    Default

    Hello,

    Ok, thanks. To clarify, what character does signify the end of your lines? It looks like you have mostly fixed width, except when it's not? Is there a different line ending character that you are using?
    Aaron P Scooter Software

  7. #7
    Join Date
    Jan 2018
    Posts
    28

    Default

    I'm not sure how the end of line gets done. All the records have the same length. A CRLF must be attached to the end of each record. I set the maximum characters per line to 3000. Each record is 352 bytes long. On the mainframe, the record can contain a CRLF, or hex value 15, and this value can be buried at a random position inside the record. Does your software detect the mainframe file's record length and insert a CRLF at the end of each record? I think telling BC to process each 352-byte record 'as is', with a CRLF only at the end, would work. I'm just brainstorming.
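
    The "process 352 bytes 'as is'" idea can be approximated outside the compare tool. The sketch below assumes the file really is a sequence of fixed-length records; the function name is hypothetical, and `terminator_length` is a guess that must be set to match the actual file (0 if nothing follows each record, 2 if a CRLF does). Each record is rendered as one line of hex digits, so an embedded hex 15 stays visible as data and cannot split the line.

```python
def records_as_hex_lines(data, record_length=352, terminator_length=0):
    """Slice `data` into fixed-length records and render each one as a hex string.

    record_length=352 comes from the post above; terminator_length is an
    assumption about how many bytes (if any) follow each record on disk.
    """
    stride = record_length + terminator_length
    lines = []
    for offset in range(0, len(data), stride):
        # Take only the record bytes; skip whatever terminator follows them.
        record = data[offset:offset + record_length]
        lines.append(record.hex())
    return lines


if __name__ == "__main__":
    # Tiny demonstration with 4-byte records, each followed by CRLF.
    sample = b"AB\x15C\r\nDEFG\r\n"
    for line in records_as_hex_lines(sample, record_length=4, terminator_length=2):
        print(line)
```

    Writing the resulting lines to a text file for each side gives one line per record for an ordinary text compare, at the cost of doubling the line width and losing the readable characters.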

  8. #8
    Join Date
    Dec 2007
    Location
    U.S. East coast
    Posts
    303

    Default

    a CRLF or hex value 15
    I'm curious about what kind of environment this is, if it's not proprietary information.

    The conventional line terminator characters are CR (carriage return), which is hexadecimal D (decimal 13); and LF (line feed), which is hexadecimal A (decimal 10). These two characters occur by themselves or in the combination CR followed by LF. The terms "carriage return" and "line feed", of course, are taken from the old mechanical typewriters or printers, where they correspond to physical movements of the carriage.

    In this context, I've only seen "15" as the octal representation of hexadecimal D (decimal 13).
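
    A quick way to check how the digits "15" read in each base, shown here as a small sketch (the variable names are only illustrative):

```python
# The same two digits "15" name different byte values depending on the base.
as_octal = int("15", 8)   # read as octal
as_hex = int("15", 16)    # read as hexadecimal

print(as_octal, hex(as_octal))   # 13 0xd  -> carriage return (CR)
print(as_hex, oct(as_hex))       # 21 0o25 -> a different control character
print(int("D", 16), int("A", 16))  # 13 10 -> CR and LF
```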

  9. #9
    Join Date
    Jan 2018
    Posts
    28

    Default correction

    I have a correction. Pardon me. The hex value that I see in the file is 15. Sorry about that. I went into the following link to obtain the value and chart for hex 15 https://www.ibm.com/support/knowledg...ef/asciit.html

    It is 'new line' from what I see. New line must be causing the 'line break' and not CRLF.

    Decimal Value = 21
    Hex Value = 15
    Control Character = Ctrl-U
    ASCII Symbol = NAK
    Meaning = negative acknowledge
    EBCDIC Symbol = NL
    Meaning = new-line
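
    The two meanings in the chart can be reproduced with a short sketch. It uses Python's `cp037` codec as a stand-in for the mainframe's EBCDIC code page, which is an assumption; the real code page is not stated in this thread. It illustrates why a tool reading the byte as EBCDIC new-line might break the line there, and says nothing about how Beyond Compare works internally.

```python
raw = bytes([0x15])

# Read as ASCII/Latin-1, the byte is the NAK control character (decimal 21)
# and is not a line boundary for str.splitlines().
as_ascii = raw.decode("latin-1")
ascii_lines = ("AB" + as_ascii + "CD").splitlines()

# Read with an EBCDIC code page (cp037 assumed here), the byte decodes to
# U+0085 NEXT LINE, which str.splitlines() does treat as a line boundary.
as_ebcdic = raw.decode("cp037")
ebcdic_lines = ("AB" + as_ebcdic + "CD").splitlines()

print(raw[0], len(ascii_lines))                 # decimal value, line count as ASCII
print(hex(ord(as_ebcdic)), len(ebcdic_lines))   # code point, line count as EBCDIC
```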





  10. #10
    Join Date
    Dec 2007
    Location
    U.S. East coast
    Posts
    303

    Default

    mikedgre, thanks for the information.
