Parsing multiline CSV fields?


barries
18-Jan-2009, 06:32 PM
Many thanks for the great tool!

It looks like multiline CSV fields (where newlines are contained in quoted text) confuse BC3. This example gets parsed into several records by BC3, for instance, but not by OpenOffice or Excel:

this file has one row,"this
is
a
multiline
field"


Is this likely to be fixed? We deal with a number of CSV files with this property (for string localization tables, for instance), and would like to use BC3 to check for changes.

Erik
19-Jan-2009, 08:25 AM
Our latest release (3.0.13) supports multiline CSV fields that do not contain the file's line ending character. By default, CSV formats recognize only Windows line endings (CR/LF), so a CR (carriage return) or LF (line feed) alone can be contained within a field. This setting is "Recognized line ending styles", found on the "Type" page for Data Formats.

barries
14-Feb-2009, 07:44 AM
Erik,

Thanks for the reply. Yes, BC correctly parses LF and/or CR as normal characters in files using other sequences for line ends. However, that's not multi-line support because those characters are, by definition, not line ends in such files.

The reason I asked was that we use a lot of OOo and Excel spreadsheets with multiline fields, and BC is far superior to those tools' "Compare Documents" features. So a BC that correctly parsed such files would be useful; one that displayed the fields as multiline strings (in taller rows) would be especially nice. But no complaints, and thanks for the great tool.

- Barrie

Aaron
10-Mar-2009, 05:09 PM
Hello Barries,

Is the issue that you have CR/LF characters inside strings that you want recognized as part of the string, rather than as the final CR/LF at the end of the line? We do not currently support this. The best workaround would be an external conversion process that changes all End of Line characters inside "strings," as you define them, into another character (CR, LF, or CR/LF, whichever isn't used for the actual EoL/End Of Record character). That way, you can configure BC3 not to count that character as the EoL character in the specific data compare File Format.
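
For anyone who wants to try the conversion step described above, here is a minimal Python sketch. It is not part of BC3; the function name `normalize_embedded_newlines` is made up for illustration, and it assumes fields are quoted with double quotes, that a literal quote inside a field is written as two double quotes, and that records end in CR/LF. It rewrites each CR/LF found inside a quoted field as a lone LF and leaves the record-ending CR/LFs alone.

```python
def normalize_embedded_newlines(text: str, replacement: str = "\n") -> str:
    """Replace CR/LF pairs that sit inside double-quoted fields.

    Hypothetical helper for illustration only. CR/LF pairs outside
    quotes (the real record terminators) are left untouched.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # An escaped quote ("") toggles the flag twice, so the
            # in-quotes state is unchanged after both characters.
            in_quotes = not in_quotes
            out.append(ch)
            i += 1
        elif in_quotes and text.startswith("\r\n", i):
            # Embedded line break: swap it for the replacement character.
            out.append(replacement)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = 'id,text\r\n1,"this\r\nis\r\nmultiline"\r\n2,plain\r\n'
converted = normalize_embedded_newlines(sample)
```

When reading and writing real files with this, open them with `newline=""` so that Python does not translate the line endings itself. The converted file can then be compared with "Recognized line ending styles" left at CR/LF only.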

barries
22-Mar-2009, 09:30 AM
That's correct, Aaron. Thanks for the reply. Is this likely to be scheduled for addition sometime?

- Barrie

Erik
23-Mar-2009, 08:02 AM
Hi Barrie,

Any idea which of the applications you use put CR/LFs within fields? Excel 2002 only uses LFs within fields, although it does support CR/LFs. We're just trying to gauge how commonly this case might come up, because it would require significant redesign to support.

Thanks,

barries
19-Nov-2009, 07:11 AM
Hi, Erik,

Sorry for the delay in responding--been busy on other things.

Applications which we and our customers use to edit .csv files (other than Excel and OpenOffice) are Gvim, emacs, visual studio, wordpad, notepad, etc. On Windows, these will (often) put cr/lfs in fields.

Also, many SCMs (perforce, svn, etc.) will convert line endings on text files when checking in and out again so that they are "platform local". So if you check in a .csv file on a non-Windows-compatible platform, then check it out on Windows, the line endings can be changed to be all CR/LFs, and vice versa.

Background: a lot of our CSV files are UTF-8 that the customers (and their overseas subcontractors) need to review and modify. We'd like to use BC3's data view when importing tweaks and translations...

I can see how it would be a significant redesign to move from a line-oriented CSV parser to a record-oriented CSV parser.

- Barrie
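
To illustrate the line-oriented versus record-oriented distinction, here is a small Python sketch using the one-row example from the first post, written with CR/LF line endings. It says nothing about how BC3 is implemented; it only contrasts naive line splitting with the record-oriented reader in Python's standard `csv` module. The two function names are made up for illustration.

```python
import csv
import io

# The one-row example from the first post, with CR/LF line endings.
data = 'this file has one row,"this\r\nis\r\na\r\nmultiline\r\nfield"\r\n'


def split_line_oriented(text: str) -> list[str]:
    """Treat every line ending as a record boundary, ignoring quotes."""
    return text.splitlines()


def parse_record_oriented(text: str) -> list[list[str]]:
    """Let the csv module track quoting, so quoted line breaks stay in the field."""
    # newline="" stops StringIO from translating the embedded CR/LFs.
    return list(csv.reader(io.StringIO(text, newline="")))


lines = split_line_oriented(data)
records = parse_record_oriented(data)

print(len(lines))    # number of "records" a line-oriented split produces
print(len(records))  # number of records the csv reader produces
```

The line-oriented split breaks the single row into five pieces, while the record-oriented reader returns one record with two fields.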

Erik
19-Nov-2009, 08:19 AM
Hi Barrie,

Support for cell data containing all kinds of line endings will be added in version 3.2.

Regards,