Large CSV with many columns comparison very slow.


  • Aaron
    replied
    I should add: since you are switching computers, it would be good to test both versions on the same hardware to see how each performs. You can do this by running the appropriate setup.exe and selecting the Portable Install option. You can create as many Portable Installs on your Desktop as you like; each is a single-directory install, and they do not interact with each other.

    Otherwise, if working with just one BC3 and one BC4 install, you can install them normally, as they install to Beyond Compare 3\ and Beyond Compare 4\ directories respectively.



  • fphillips
    replied
    Ok. I'll check it out at home and generate some dummy data to see what happens. Shouldn't be too big of a deal to do.

    Thanks for the info.

    Frank



  • Aaron
    replied
    Hello,

    You would get access to more memory if you were able to update to BC 4.1(.5); 4.0.7 was our last 32-bit-only release. I'm not certain whether this would improve performance in your case without more specific files or information, but if you have an environment where you can try it, that would be a good first test.



  • fphillips
    replied
    Would love to upgrade, but we're a controlled corporation, so only approved applications can be used, and currently that is BC3. I don't even have access on my laptop to install anything outside the approved list. I'm working on that, but it won't happen soon enough to help.

    The key field defaults to the first column, which works for these files since the key is the first column. That isn't always the case, though, and a lot of the time there are multiple keys. We use the unimportant and key field options heavily: much of our work is maintenance, bug fixes, and enhancements, so we are constantly marking new fields, and fields whose calculations have changed, as unimportant. Let's just say we use the little referee button a lot.
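
    Just to illustrate the multiple-key case (the column positions below are made up, and this is only the underlying idea, not how BC is configured):

        # Hypothetical sketch: when a record is identified by more than
        # one field, the alignment key is a tuple of columns rather than
        # just the first one. Indexes are invented for illustration.
        KEY_COLS = (0, 2)  # e.g. account number + transaction date

        def row_key(row):
            return tuple(row[i] for i in KEY_COLS)

        print(row_key(["1001", "ACME", "2016-03-01", "42.50"]))
        # prints ('1001', '2016-03-01')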

    Mostly I was hoping there was a setting I couldn't find that would allow more memory usage. Using only 80 MB of memory to compare two 200+ MB files suggests a lot of disk activity. Since we have about 59 GB of free memory, I was hoping to raise the program's usage toward the roughly 3 GB maximum for a 32-bit process.
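
    For what it's worth, that 3 GB figure is just the 32-bit address-space ceiling. A generic check (plain Python, nothing BC-specific) of what a given process can address:

        import struct

        # A 32-bit process can address at most 2**32 bytes (4 GiB) in
        # total; on Windows, user space is normally capped at 2 GiB, or
        # roughly 3-4 GiB for large-address-aware executables.
        bits = struct.calcsize("P") * 8  # pointer size in bits
        print(f"{bits}-bit process, {2**bits / 2**30:.0f} GiB of address space")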

    Anyway, it looks like we're going to have to live with it for now and hope we don't encounter too many lines of business that put out these massive files.

    Thanks for the quick reply,

    Frank



  • Aaron
    replied
    Hello,

    Thanks for the feedback. The first thing I would recommend is trying the trial of BC4, which you can install without altering or removing BC3 (on Windows). BC 4.1.5 has an improved Table Compare (renamed Data Compare) and 64-bit support, which may help with large files.

    For BC3's Data Compare, several factors can affect performance. The number of columns could be an issue, but so could the data in those columns. Would your additional columns contain the same amount of text, making the files much larger, or would they be much smaller? Are you using the default key (column 1)?

    The Text Compare could also be configured to ignore defined grammars. Could the new data be defined by a regular expression or by set character positions? We have a guide on defining unimportance here: http://www.scootersoftware.com/suppo..._unimportantv3
    Once text is marked Unimportant, you can enable Ignore Unimportant Differences and the rest of each line will be compared.
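
    For example, if the new data were an appended ISO date (purely hypothetical; substitute whatever your added columns actually contain), a grammar element based on a regular expression like the one below would match it. A quick Python check of what such a pattern covers:

        import re

        # Hypothetical pattern: an appended ISO date column. A Text
        # Compare grammar element defined with this expression would
        # mark the dates as unimportant text.
        pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

        line_old = "1001,ACME,42.50"
        line_new = "1001,ACME,42.50,2016-03-01"
        # With the matches removed, the lines differ only by a trailing
        # comma, i.e. the remaining (important) text is the same.
        print(pattern.sub("", line_old))  # 1001,ACME,42.50
        print(pattern.sub("", line_new))  # 1001,ACME,42.50,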



  • fphillips
    started a topic Large CSV with many columns comparison very slow.


    Hi all.

    We're currently using BC v3 to compare our current program's output against the previous version's output to make sure that only expected changes are present.

    However, we have a few CSV files that are very large (300+ MB) and have 60+ columns of data. Since these are CSVs and we need to ignore expected differences, we have to use the Data Compare rather than the Text Compare.

    The problem I'm having is how long it takes to process this data. I'm currently near the end of the first hour of a comparison (in this case the files are 200+ MB but only 3 or 4 columns wide) and am wondering if there is a way to speed this up. My thought is that if it's taking this long for 4 columns, I'm toast when I get to the files with 60+ columns.

    The Text Compare works much faster; however, we've had to add a few columns in the current program version, and since the Text Compare matches line by line, it isn't really a benefit.
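
    Conceptually, what we need is a key-based, column-aware comparison along these lines (file names, key column, and the unimportant column are made up for illustration; this is just the idea, not how BC works internally):

        import csv

        KEY_COL = 0        # rows are matched on the first column
        UNIMPORTANT = {3}  # column indexes with expected differences

        def load(path):
            # Index every row by its key so rows align regardless of order.
            with open(path, newline="") as f:
                return {row[KEY_COL]: row for row in csv.reader(f)}

        def diff(old_path, new_path):
            old, new = load(old_path), load(new_path)
            for key in sorted(old.keys() & new.keys()):
                for i, (x, y) in enumerate(zip(old[key], new[key])):
                    if i not in UNIMPORTANT and x != y:
                        print(f"key {key}: column {i}: {x!r} -> {y!r}")

        diff("previous.csv", "current.csv")

    Note that even this naive version holds both files in memory at once, which is exactly where a 32-bit process's limits start to bite.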

    Any thoughts or suggestions would be appreciated.


    Thanks,
    Frank