10554 Alignment failure on international char and FTP

  • chrisjj
    Carpal Tunnel
    • Apr 2008
    • 2537

    After copying a file having an international-char name from HD to FTP (with Encoding UTF-8), the original and copy fail to align:

    http://img187.imageshack.us/img187/9053/74036567.gif

    BC bug? Workaround known? Fix planned?
  • Zoë
    Team Scooter
    • Oct 2007
    • 2666

    #2
    Not a bug. The workaround is to rename the local file to match the remote one. I have an idea of how to fix it, but the current behavior is correct, so it's a feature request and I can't say when it will get scheduled.

    On Windows, if your filename contains the character é it's stored on disk as 0x00E9 (small letter e with acute). On OS X it's stored as e (0x0065) followed by ' (0x0301, combining acute accent). Both forms are valid, and Windows can actually store both in the same directory, so if you copy from FTP to HD, you'll have two files that look like they have identical filenames. Since both files are valid on Windows we take care not to align them; the fix would involve detecting when there are orphans like this and fudging things so they align.
    Zoë P Scooter Software
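
To make the two storage forms described above concrete, here is a minimal Python sketch using the standard `unicodedata` module. It only illustrates the Unicode behaviour; the helper name `same_name` is made up for this example and says nothing about how BC compares filenames internally.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point (the form described for Windows)
decomposed = "e\u0301"   # e followed by a combining acute accent (the form described for OS X)

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A raw code-point comparison treats the two names as different,
# while a comparison after normalization treats them as the same.
print(composed == decomposed)
print(same_name(composed, decomposed))
```

A raw string comparison sees two different names, which is why the files show up as unaligned orphans; only a comparison that normalizes both sides first would pair them.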

    • chrisjj
      Carpal Tunnel
      • Apr 2008
      • 2537

      #3
      > On Windows, if your filename contains the character é it's stored on disk as
      > 0x00E9 (small letter e with acute). On OS X it's stored as e (0x0065)
      > followed by ' (0x0301, combining acute accent).

      Well fine, but since it was BC that made the adjustment, BC should accommodate the adjustment.

      How else is BC ever going to be properly usable here?? You can't seriously expect the user to e.g. do a manual fixup of thousands of filenames in order to get sync to work properly.

      • Zoë
        Team Scooter
        • Oct 2007
        • 2666

        #4
        BC did no such thing. OS X does it, either in the FTP server or in its low-level APIs when it writes it to the disk, and we have no control over it whatsoever.

        As for making it usable, I already said what the likely eventual fix will be. I'm sorry it's an inconvenience, but the current behavior is correct and there are other enhancements that have much higher priority. If you don't like the proposed workaround, stop using extended characters in your filenames.
        Zoë P Scooter Software

        • chrisjj
          Carpal Tunnel
          • Apr 2008
          • 2537

          #5
          > BC did no such thing. OS X does it

          Really?? I had to check BC's Encoding: UTF-8 to get even this far. Surely that's what is causing BC Copy to convert from ANSI to Unicode.

          And it should also cause BC Compare to adjust.

          > If you don't like the proposed workaround,

          Your proposed workaround of hand-renaming files is ridiculous.

          > stop using extended characters in your filenames.

          Well, I have been waiting two years for the promised fix to BC international character messups, so I guess I can wait a few more.
          Last edited by chrisjj; 27-Jul-2009, 08:25 PM.

          • Zoë
            Team Scooter
            • Oct 2007
            • 2666

            #6
            > Surely that's what is causing BC Copy to convert from ANSI to Unicode.

            The problem is not ANSI vs Unicode, it's Unicode Normalization Form C vs Unicode Normalization Form D. Both are valid ways to store the same character, and both are representable in UTF-8. I'd prefer you just take my word for it, but there's a fairly thorough thread on the 7-zip mailing list that you're welcome to read, since it covers the same problems.
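
As a rough illustration of the point that both normalization forms are valid UTF-8, this Python sketch encodes the same visible character in each form. The variable names are invented for the example.

```python
import unicodedata

name = "\u00e9"  # é

nfc = unicodedata.normalize("NFC", name)  # Form C: one precomposed code point
nfd = unicodedata.normalize("NFD", name)  # Form D: base letter plus combining mark

# Both forms encode cleanly as UTF-8, but to different byte sequences.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes.hex(" "))
print(nfd_bytes.hex(" "))
```

The encoding setting is therefore the same on both sides; what differs is which sequence of code points the name is made of before it is encoded.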

            > Your proposed workaround of hand-renaming files is ridiculous.

            I didn't say it was a good workaround, just that that's what's available. The other alternative is to sync everything to the FTP site and then use a mirror sync to copy it back to your HD, which would do the same thing with copies/deletes.

            > Well, I have been waiting two years for the promised fix to BC international character messups, so I guess I can wait a few more.

            And we've made plenty of other fixes within a day or two of you reporting them. You're not our only customer; you have to let a few of the others get their pet problems fixed too.
            Zoë P Scooter Software

            • chrisjj
              Carpal Tunnel
              • Apr 2008
              • 2537

              #7
              > The problem is not ANSI vs Unicode, it's Unicode Normalization Form C
              > vs Unicode Normalization Form D.

              Analogous to DOS/Unix linefeed representation, then. After BC copies a line between sides, adjusting linefeed representation, BC compare accommodates its adjustment - showing the original and copy as matching.

              After BC copies a file between sides, adjusting character encoding, BC compare fails to accommodate its adjustment, and so falsely shows the original and copy as mismatching. Defacing the compare and pranging sync.

              It is very disappointing that you shipped knowing about this failure. And not declaring it to users.

              > The other alternative is to sync everything to the FTP site and then
              > use a mirror sync to copy it back to your HD

              Doubling the execution time, risking source corruption, messing up the timestamps and breaking filename match further up the workflow. Thanks for the suggestion, but that is not acceptable.

              • Zoë
                Team Scooter
                • Oct 2007
                • 2666

                #8
                We didn't ship knowing about this, I discovered it yesterday after I set up an OS X server to test the failure you reported. I'm sorry to disappoint you, but we don't have test systems with every OS/FTP server combination known to man.

                I've already confirmed it as an issue, said what a likely fix will look like, and told you that it's not going to be scheduled at this time. Whether you like it or not, the current behavior is correct, your request is an enhancement, it will be scheduled/prioritized along with every other enhancement, and it's not getting bumped just because you won't shut up about it.
                Zoë P Scooter Software

                • chrisjj
                  Carpal Tunnel
                  • Apr 2008
                  • 2537

                  #9
                  > We didn't ship knowing about this, I discovered it yesterday

                  You shipped not knowing that conversion could occur??

                  > said what a likely fix will look like

                  Rather than "fudging things so they align" I hope you'll consider a proper fix - character equivalence as per Text Compare. Ideally likewise with Ignore optional.
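
One way to picture alignment by character equivalence is the Python sketch below, which pairs names from two listings when they are equal after NFC normalization. It is an assumption-laden illustration, not BC's implementation: `align_names` and the sample filenames are hypothetical.

```python
import unicodedata

def nfc(name: str) -> str:
    """Return the NFC-normalized form of a filename."""
    return unicodedata.normalize("NFC", name)

def align_names(left, right):
    """Pair names from two listings that are canonically equivalent.

    Returns (pairs, left_orphans, right_orphans).
    """
    right_by_key = {nfc(n): n for n in right}
    pairs, left_orphans = [], []
    for name in left:
        match = right_by_key.pop(nfc(name), None)
        if match is None:
            left_orphans.append(name)
        else:
            pairs.append((name, match))
    return pairs, left_orphans, list(right_by_key.values())

local = ["caf\u00e9.txt", "a.txt"]     # precomposed é
remote = ["cafe\u0301.txt", "b.txt"]   # e + combining acute
pairs, local_only, remote_only = align_names(local, remote)
print(pairs, local_only, remote_only)
```

This sketch assumes each side holds at most one name per normalized key. As noted earlier in the thread, a Windows directory can hold both forms at once, so a real fix would need a rule for that case, for example preferring an exact match before falling back to an equivalent one.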
