set default format to utf8

  • set default format to utf8

    At the moment, a lot of files are being loaded with the "detected" default of ANSI.
    They are not ANSI, they are UTF-8, and when I save them I lose some of the special chars. Where can I set the default to UTF-8?
    - I know I can change it once the files are open, but I often forget.

    thanks

  • #2
    You can set the default in the File Formats dialog per format, or in the Profiles dialog in the Server tab for an FTP profile. What kind of location are you connecting to? If you copy the file to your Desktop, does it still detect as ANSI? If so, can we get a copy of the file emailed to support@scootersoftware.com along with a link back to this forum thread?
    Aaron P Scooter Software

    • #3
      file sent

      • #4
        I'm testing Beyond Compare 4.
        I noted the same thing as you.
        Utf8 (no BOM) files are correctly detected in vim and notepad++ but detected as Ansi in BC.
        If you forget to change the encoding to utf8, you'll lose some of the special chars.
        Changing the default to Utf8 will cause trouble when saving Ansi files.

        BTW: It's great software.

        • #5
          Rando,

          If you have an example file that isn't detected correctly, please email it to support@scootersoftware.com and we'll work on improving the character encoding detection.

          Also, please include a link to this forum thread in your email.
          Chris K Scooter Software

          • #6
            I've sent you a file by email as requested.

            • #7
              Rando,

              Here's a copy of the reply I sent by email for anyone else monitoring this thread:

              Thank you for sending the example file.

              It appears the file you sent doesn't contain any utf8 characters. As it doesn't have utf8 characters and it has no Byte Order Mark, it is impossible to detect if it is utf8 or ANSI. In this case, where encoding can't be detected, Beyond Compare defaults to ANSI.
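
A minimal Python sketch of why such a file is undecidable. This is an illustration only, not Beyond Compare's detection code; the function name `classify` and its labels are made up for this example.

```python
import codecs

def classify(data: bytes) -> str:
    """Return a rough label for how decidable the encoding of `data` is."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    if all(b < 0x80 for b in data):
        # Pure ASCII is byte-identical in UTF-8 and in ANSI code pages,
        # so nothing in the content can settle the question.
        return "ambiguous (ASCII only)"
    try:
        data.decode("utf-8")  # strict decoding: any malformed sequence raises
    except UnicodeDecodeError:
        return "not utf-8"
    # Valid multi-byte sequences are strong evidence, though not proof.
    return "probably utf-8"
```

A file that lands in the "ambiguous" bucket can only be opened with whatever default the tool is configured to use.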

              I tried opening the same file with Notepad++, and it does open it as utf8. However, when I looked at Notepad++'s settings, I found that it opens the file as utf8 because that is the default character encoding, not because it can actually detect that the file is utf8.

              To specify Notepad++'s character encoding behavior, open "Settings > Preferences". Go to the "New Document" section, then check or uncheck "Apply to opened ANSI files". This setting is on by default.

              If you want your files to open as utf8, then you'll need to edit Beyond Compare's session settings or file formats to force files to default to utf8.

              To force a file encoding for all files matching a specific mask in Beyond Compare 4, select "Tools > File Formats". Select the format that matches your files. Go to the "Conversion" tab. Change encoding from "Detect" to "UTF-8" and save the changes.

              To force a file encoding for a specific pair of files in the Text Compare, select an encoding in the file info panel below each path edit. Then "Session > Save Session". The next time you load the saved session, it will use the specified encoding.
              Chris K Scooter Software

              • #8
                Thank you Chris for your reply.

                I don't think that forcing a file encoding is a good thing.
                What Notepad++ does is, IMO, a good thing:
                Setting a default file encoding; if a file can be detected as Ansi AND utf-8, choose the default encoding.

                Notepad++ still detects Ansi or Latin1 files if they are Ansi or Latin1.
                BC tries to convert every file to the default encoding if you change the encoding in Sessions, doesn't it?
                That makes you lose some of the special chars if you save the file.
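
The "default only when ambiguous" policy described in this post could be sketched in Python roughly as follows. This reflects the behavior as the poster describes it, not Notepad++'s or Beyond Compare's actual code; `pick_encoding` is a hypothetical name, and `cp1252` is assumed as the ANSI code page.

```python
def pick_encoding(data: bytes, default: str = "utf-8") -> str:
    """Use `default` only when the bytes cannot settle the question."""
    if all(b < 0x80 for b in data):
        # Valid as both ANSI and UTF-8: fall back to the configured default.
        return default
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Has high bytes that are not well-formed UTF-8.
        return "cp1252"  # assumption: the ANSI code page is Windows-1252
```

By contrast, forcing UTF-8 on an ANSI file is lossy: `b"caf\xe9".decode("utf-8", errors="replace")` replaces the `é` with U+FFFD, and saving that text back out cannot recover the original byte.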

                • #9
                  BC will try to detect if you add non-ANSI characters to an ANSI text file.

                  As an example, open an ANSI text file. Paste in a few Japanese characters. Save.

                  BC displays an Encoding Error prompt with the message "This file contains Unicode characters that will be lost if it is saved using the current encoding". The dialog includes a dropdown so you can select an appropriate character encoding for your file.
                  Chris K Scooter Software
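
A check of the kind described in #9 can be approximated in Python by asking which characters the target encoding cannot represent. This is only a sketch of the idea, not how BC implements its Encoding Error prompt; `unencodable_chars` is a hypothetical helper, and `cp1252` stands in for "ANSI".

```python
def unencodable_chars(text: str, encoding: str = "cp1252") -> list:
    """Return the distinct characters in `text` that `encoding` cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in lost:
                lost.append(ch)
    return lost
```

An empty result means the text can be saved in that encoding without loss; a non-empty one is the point at which a tool would want to prompt for a different encoding.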

                  • #10
                    Chris:

                    I'm coming to the party late, but for us, a very useful feature would be a command-line qualifier that allowed one to coerce the encoding that BC will then try and use.

                    I'd also like to press you a little harder about what gets defaulted to ANSI and what gets defaulted to utf-8.

                    Consider a file that contains the following character stream:

                    <20><B1><20> (" ± ")

                    Is this good Latin1 (ANSI) or is this malformed utf-8? In the absence of other valid utf-8 sequences, it might be ambiguous.

                    Proper utf-8 would be:

                    <20><C2><B1><20>

                    The answer might be context-dependent, especially for a text-comparison program. Maybe we botched a file that we intended to be utf-8?

                    If we could tell BC to always consider it utf-8 and yell at us (or at least show diffs) if it sees formatting errors, that would be useful to us.

                    Atlant
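
The two byte streams from #10 can be checked with strict UTF-8 decoding in Python. The sketch below reports where a stream stops being well-formed UTF-8; `utf8_errors` is a hypothetical helper written for this illustration and says nothing about how BC would implement such a feature.

```python
def utf8_errors(data: bytes) -> list:
    """Return (offset, byte) pairs for each byte that breaks strict UTF-8."""
    errors = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict by default
            break                       # the rest of the stream is well-formed
        except UnicodeDecodeError as exc:
            bad = pos + exc.start       # absolute offset of the offending byte
            errors.append((bad, data[bad]))
            pos = bad + 1               # resume just past it
    return errors

latin1_stream = bytes([0x20, 0xB1, 0x20])        # " ± " in Latin1
utf8_stream = bytes([0x20, 0xC2, 0xB1, 0x20])    # " ± " in UTF-8
```

With these inputs, `utf8_errors(latin1_stream)` flags the lone `0xB1` at offset 1, while `utf8_errors(utf8_stream)` returns an empty list. Both streams decode to the same text when each is read with its own encoding.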

                    • #11
                      Atlant,

                      Thank you for the feedback. I've added a command-line qualifier to coerce encoding to the feature request list.

                      To detect encoding, BC checks for a UTF-8 byte order mark. It also checks for an XML or HTML encoding tag at the beginning of a file. If neither is found, it uses a third-party library to detect the encoding from the beginning of the file. I don't know the specifics of the algorithm in the third-party library.
                      Chris K Scooter Software
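
The detection order described in #11 (BOM, then a declared encoding, then a heuristic) might look roughly like this in Python. Everything here is an assumption made for illustration: the regular expression, the 1024-byte window, and the `fallback` callable that stands in for the unnamed third-party library. It is not BC's implementation.

```python
import codecs
import re

# Matches encoding="..." in an XML declaration or charset=... in an HTML meta tag.
_DECLARED = re.compile(rb"""(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9._-]+)""")

def detect_encoding(data: bytes, fallback=lambda head: "cp1252") -> str:
    """Try BOM, then a declared encoding, then a pluggable heuristic."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    head = data[:1024]  # assumption: only the start of the file is inspected
    match = _DECLARED.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # Placeholder for a statistical detector; defaults to ANSI (cp1252 assumed).
    return fallback(head)
```

Passing a different `fallback`, for example `lambda head: "utf-8"`, models changing the default for files where nothing can be detected.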
