Cruachan

Character Encoding (XML Files)


Hi Oliver,

Moving over to Prepar3D v5 HF2.

Native + ORBX installed into various libraries. Preparing to install Creative Design Studio's Black Marble.

Have backed up all the usual files (addons.cfg, scenery.cfg, terrain.cfg) prior to installing BM, as I wish to do this via the xml add-ons method. However, the BM developer has voiced his concerns on his forums that this could lead to layering and other unspecified problems. He still prefers the old method, but the BM V5 installers do, in fact, allow for the use of both methods.

Otherwise, no manual messing about with files, apart from Prepar3D.cfg, thus far! To date everything has been installed using native V5 installers.

As you know, I have been an enthusiastic user of your Addon Organizer in P3D v4 so thought it would be wise to have it to hand in case I encountered difficulties in P3Dv5.

Installed Addon Organizer v1.55 b10

Thought I would Check config file encoding and was surprised by this report:

GDaaOIG.jpg

Surely this is incorrect? I have examined all the reported xml files in that image and in each case the encoding is reported correctly as UTF-8.

LM have stated that the encoding for xml files should be UTF-8, as appears to be the case here, and cfg files should be UCS-2 LE BOM.

I think you have stated in another thread that such reported errors would go away after Saving. However, I'm a tad concerned as to what might happen behind the scenes were I to do so.

Interestingly enough, I note in passing that Prepar3D.cfg has been encoded as UTF-8 and this despite having used Notepad++ to add/modify some entries.

I wonder whether you would be kind enough to clarify all this for me before I proceed any further.

Best regards,

Mike

 

 

 

 


Add-on xml files are Windows CRLF UTF-8

dll.xml exe.xml and Displays.xml files are Windows CRLF UTF-8 with BOM

cfg files like display.cfg in C:\ProgramData\Lockheed Martin\Prepar3D v5 are Windows CRLF UTF-16LE

Using Notepad++ is a cause of encoding errors in these files. Using native Windows Notepad.exe is preferred.
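
For anyone who wants to check these formats without opening an editor, here is a minimal Python sketch that inspects the first bytes of a file for a BOM. The function name `describe_encoding` and the labels it returns are made up for this illustration; this is not how Addon Organizer or Prepar3D performs the check. UTF-32 is deliberately not handled.

```python
import codecs

# Known BOM signatures (UTF-32 is deliberately left out of this sketch).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16LE with BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16BE with BOM"),
]


def describe_encoding(data: bytes) -> str:
    """Guess an encoding label for raw file bytes, using the BOM if present."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    # No BOM: NUL bytes are a strong hint of UTF-16 text saved without a BOM.
    if b"\x00" in data:
        return "no BOM, contains NUL bytes (possibly UTF-16 without BOM)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown (no BOM, not valid UTF-8)"
    return "UTF-8 without BOM (or plain ASCII)"
```

To try it on a real file, pass the raw bytes, for example `describe_encoding(open(path, "rb").read())`, where `path` is whichever file you want to inspect.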

 

 


Steve Waite: Engineer at codelegend.com

43 minutes ago, SteveW said:

Using Notepad++ is a cause of encoding errors in these files.

Hi Steve,

Wow, really! That’s certainly come as an unwelcome revelation! It seems for years now we have been reading advice to use this App instead of Windows Notepad while modifying xml and cfg files as the latter was more likely to give rise to encoding issues. I have been complying with this advice and have not been aware of any such issues....so far.

Are you suggesting that my V5 Prepar3D.cfg has had its encoding changed by Notepad++ from UCS-2 LE BOM to UTF-8 when the file was saved after modifications? I note my V4 Prepar3D.cfg is still encoded UCS-2 LE BOM and this does not appear to have given rise to any problems.

This topic is evolving into a minefield of confusion ☹️

Regards,

Mike


I've never recommended Notepad++. Windows native Notepad.exe is not going to alter the encoding it reads in. Those files can only be saved incorrectly by specialist tools set to other formats, or by add-ons that mistake the way the encoding works. However, once the encoding has been changed, Notepad.exe will continue to read and save those files as-is.


Steve Waite: Engineer at codelegend.com


Incidentally, aircraft.cfg files can be either Windows CRLF UTF-8 or CRLF UTF-16LE, but not with BOM.


Steve Waite: Engineer at codelegend.com


The Byte Order Mark (BOM) lets xml readers work out the format, but it is not a requirement for Windows CRLF .ini files, which is basically what aircraft.cfg files are. The Windows built-in code cannot determine the first character of an .ini file that has a BOM. In .ini files (aircraft.cfg), each section starts with [sectionname.0] or similar, so if the first character, '[', is ignored, the section is ignored. We have seen since FSX that some of these files contain an empty first line, which ensures the first section is still read if the format is changed to one with a BOM.
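
The effect described above can be modelled with a small Python sketch. It assumes a deliberately naive reader that does not strip the BOM; it is not the actual Windows .ini code. The section and key names are invented for the example.

```python
import codecs


def first_section_naive(data: bytes):
    """Return the first [section] name seen by a reader that keeps the BOM."""
    # Plain "utf-8" (not "utf-8-sig") leaves a leading BOM in the text as U+FEFF.
    text = data.decode("utf-8")
    for line in text.splitlines():
        if line.startswith("[") and line.endswith("]"):
            return line[1:-1]
    return None


# Hypothetical aircraft.cfg-style content with Windows CRLF line endings.
body = "[fltsim.0]\r\ntitle=Example\r\n[general]\r\natc_type=Demo\r\n"

plain = body.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain
with_bom_blank_first = codecs.BOM_UTF8 + ("\r\n" + body).encode("utf-8")

print(first_section_naive(plain))                 # fltsim.0
print(first_section_naive(with_bom))              # general (first section missed)
print(first_section_naive(with_bom_blank_first))  # fltsim.0
```

With the BOM glued to the front of the first line, that line no longer starts with '[', so the naive reader skips the first section. The empty first line absorbs the BOM and the first section is found again.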


Steve Waite: Engineer at codelegend.com

18 hours ago, Cruachan said:

Surely this is incorrect?

I would have to take a look at one of those files to be sure (= I need one sent to me by email). Probably the Encoding just can't be determined. Looking at the ORBX add-on.xmls on my computer, I can't see anything wrong with them, and I also don't get this list of Encoding warnings.

Best regards


LORBY-SI

3 hours ago, Cruachan said:

that my V5 Prepar3D.cfg has had its encoding changed by Notepad++ from UCS-2 LE BOM to UTF-8 when the file was saved after modifications

There are some installers and other management tools around that are doing this, with the add-ons.cfg files too. This is a very common occurrence.

Best regards


LORBY-SI

5 hours ago, SteveW said:

I've never recommended Notepad++. Windows native Notepad.exe is not going to alter the encoding it reads in. Those files can only be saved incorrectly by specialist tools set to other formats, or by add-ons that mistake the way the encoding works. However, once the encoding has been changed, Notepad.exe will continue to read and save those files as-is.

Problem is that when a file like terrain.cfg shows evidence of encoding changes, Joe Average needs Notepad++ to fix matters, because Windows Notepad will not do so.

Again referencing Black Marble, back in May the ‘culprit’ was thought to be ORBX Central and the advice then was to check terrain.cfg following a visit to ORBX Central and restore the correct encoding UCS-2 LE BOM before installing any of the BM products as BM adds a multitude of entries to terrain.cfg as well. The App recommended to accomplish this was Notepad++. This may no longer apply as I have checked the encoding in terrain.cfg after multiple visits to ORBX Central to bring my P3D V5 ORBX installation up to date and the encoding remains unchanged: UCS-2 LE BOM which is correct according to LM.

I could try generating a fresh Prepar3D.cfg and determine the encoding with Notepad++ and without making any changes. Assuming it turns out to be UCS-2 LE BOM, I will reinstate my mods using Windows Notepad.

It seems reasonable to assume that the majority of simmers won’t know about any of this and are not experiencing problems. Perhaps this is a case of making a mountain out of a molehill, or is there still room for concern?

Regards,

Mike

1 hour ago, Cruachan said:

is there still room for concern

That depends on what is in those files. Character encoding is not just a concept. The encoding describes the mechanism to store the individual characters. They vary not only concerning what character has what code, but also how many bytes are used to store a character. If you force an encoding on a file that was written in a different one, that can destroy the entire content of the file.

https://unicode.org/faq/utf_bom.html

The initial characters in these sets are identical, derived from good old ASCII. So as long as you don't use exotic characters for labels, names or file paths, it doesn't really matter. But with the current operating systems, ASCII or UTF-8 may not be enough to store a file or folder path when you are using an extended character set beyond the classic one. In that case, if a file requires a specific encoding to store its contents, messing with the character encoding will destroy the file. Those are the posts where people show an add-ons.cfg file that suddenly only contains Chinese symbols. (This usually happens when a "word not allowed" program adds text to, or removes text from, a file that had an advanced encoding.)

All this is extremely important for all "IT" that has to process text-based files. Take the Internet, for example: every browser and every web server has to rely on correct character encoding, otherwise the whole thing would fail.
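
The "Chinese symbols" effect is easy to reproduce. The Python sketch below takes text stored as UTF-8 (one byte per ASCII character) and reads it back as if it were UTF-16LE (two bytes per character). The helper names and the sample string are invented for the illustration; the bytes on disk never change, only the interpretation does.

```python
def misread_as_utf16le(text: str) -> str:
    """Store text as UTF-8 bytes, then wrongly decode those bytes as UTF-16LE."""
    return text.encode("utf-8").decode("utf-16-le", errors="replace")


def is_cjk(ch: str) -> bool:
    """True if ch lies in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


sample = "path"  # hypothetical piece of a config line

# The same four characters need a different number of bytes per encoding.
print(len(sample.encode("utf-8")))      # 4
print(len(sample.encode("utf-16-le")))  # 8

# Each pair of ASCII bytes is fused into a single 16-bit character.
garbled = misread_as_utf16le(sample)
print(len(garbled))                      # 2
print(all(is_cjk(ch) for ch in garbled)) # True
```

Two lowercase ASCII letters read as one UTF-16LE code unit land in the CJK range, which is why a mis-declared file appears to be full of Chinese characters.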

Best regards


LORBY-SI

30 minutes ago, Lorby_SI said:

if a file requires a specific encoding to store its contents, messing with the character encoding will destroy the file. Those are the posts where people show an add-ons.cfg file that suddenly only contains Chinese symbols. (This usually happens when a "word not allowed" program adds text to, or removes text from, a file that had an advanced encoding.)

Hi Oliver,

Seems to me that the above is very significant. I too have seen such posts. So, if a file has the incorrect encoding reported by, say, Notepad++ is it safe to change that encoding without risking corruption/further corruption of the file content? If this is the case then why the heck are we constantly being advised to use Notepad++ to accomplish the task? As ever, sometimes I believe we are our own worst enemies.

Check your mail for zip archive of each add-on.xml as displayed in the image in my original post. I thought it might be more helpful to you rather than sending only one file. Perhaps you can identify a common factor. Each file has remained untouched since original installation.

Regards,

Mike

2 hours ago, Cruachan said:

 App recommended to accomplish this was Notepad++.

As you mentioned, I would also say the only reason to use a more complicated tool is to put right things that were already put wrong by something else: mistakes from another app, from an editor, or from an add-on configuring itself, none of which should happen.


Steve Waite: Engineer at codelegend.com


Have now generated a fresh Prepar3D.cfg and confirmed the file content encoding is UCS-2 LE BOM, which is as it should be. The mystery remains as to how it was changed to UTF-8. Obviously the only suspect in this case has to be Notepad++ and it must have occurred during one of several saves as I added/modified content in the file.

This is all a bit concerning...to say the least! Can we continue to trust this aspect of Notepad++ functionality?

Mike


Notepad.exe doesn't change the format from what it loads, whereas Notepad++ is a sledgehammer to crack a walnut. Prepar3D.cfg is UTF-16LE because it is not an xml file; it is an .ini file.


Steve Waite: Engineer at codelegend.com


...let's say you open an aircraft.cfg or Prepar3D.cfg in native Windows Notepad.exe. Then you paste in some text that contains characters inconsistent with the present format. Go to save and a message pops up suggesting the file format will have to be changed to accept the included text. Other editors may well simply save what you have, leaving what is basically a corrupt file.
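
A rough analogy in Python, offered as a sketch rather than a description of what Notepad does internally: an encoder can either refuse text that does not fit the target encoding, or silently substitute characters. The helper `can_store` and the sample strings are made up, and `cp1252` is only assumed here as a typical Western "ANSI" code page.

```python
def can_store(text: str, encoding: str) -> bool:
    """Return True if every character in text fits the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


western = "title=Café"   # fits cp1252, but not plain ASCII
japanese = "title=東京"   # needs a Unicode encoding

print(can_store(western, "ascii"))       # False
print(can_store(western, "cp1252"))      # True
print(can_store(japanese, "cp1252"))     # False
print(can_store(japanese, "utf-16-le"))  # True

# A tool that forces the save anyway loses information without warning:
print("Café".encode("ascii", errors="replace"))  # b'Caf?'
```

The strict path corresponds to the warning dialog; the `errors="replace"` path corresponds to an editor that saves regardless and leaves a damaged file behind.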


Steve Waite: Engineer at codelegend.com

