The AVSIM Community

BP=0 and the PCIe bus. (Or, why you should have a fast enough GPU to actually make use of your CPU.)

In http://forums1.avsim...p0-conclusions/ ******* Altuve describes some very good tweaks, including BP=0, which gives a really good boost in performance. I can only take my hat off to him and everyone else for all the work they put in to figure all this out.
I have my own input on why BP=0 actually gives the improved performance, which I’d like to share with the community. It is said that FSX uses massive CPU resources to manage the BufferPools, hence the improvement when not using them. My input is that it is actually mostly the PCIe bus that is holding us back and that the BP=0 reduces the need for PCIe bandwidth. Let me show you how I came to this conclusion.
I’ve been using a 630-second flight in FSXmark07. It started with me studying this FPS graph showing flights with and without BP=0. There is a marked improvement in the first 114 seconds and from 242 seconds onwards, but little improvement between 114s and 242s. An improvement in available CPU cycles should really show there as well. It doesn’t make sense.
I decided to study the core load in the task manager. Let me give you a quick introduction to the MainThread, Fibres and Texture&Terrain loaders in pictures.
Let’s start with a simple Dual core CPU and AffinityMask=1. Here you can see the MainThread on Core#0 and Fibres on Core#1.
If we move on to a Quad core CPU but still keep AffinityMask=1, you can see that the Fibres are spread out on Core#1, Core#2 and Core#3. There is no way to control where they end up, and Fibres can end up on any core, including Core#0. It will change during and between flights as well. The important thing to understand is that MainThread + Fibres = FPS. Restrict the MainThread or the Fibres and you end up with lower FPS.
Let’s introduce the Texture&Terrain loaders as well. I use AffinityMask=13 to put the MainThread on Core#0, and a Texture&Terrain loader on each of Core#2 and Core#3.
Here you can see that most Fibres now end up on Core#1, but some share cores with the Texture&Terrain loaders.
Now back to the original problem: BP=0 not behaving as expected. Let’s look at the task manager using BP=0. Look at the full utilization of Core#0 with the MainThread. This is what the MainThread should look like unless we are bottlenecked by something. I smell a PCIe bandwidth limitation. But let’s look closer at the task manager first.
Here I pasted together 2 different flights so we can see the MainThread during the first 5 minutes of the flight. Look at the times I marked and compare them to the first graph I showed. Without BP=0 we already have fairly high utilization between 114s and 242s, so we don’t get much improvement there. Before 114s and after 242s, with BP=0, we use the CPU cycles that are already available on our CPU. That’s why we get the improvement in FPS.
Now let’s look at an FPS graph showing different PCIe bandwidths and the impact of BP=0. From 114s onwards you can see that BP=0 has roughly the same effect as doubling the PCIe bus bandwidth. Before 114s it is equivalent to more than 4 times the PCIe bandwidth. Why does it differ? It took me a lot of head scratching to come up with an explanation, but this is my answer. Before 114s we have downtown Seattle in view, with loads of custom buildings. After 114s we have passed downtown. The heavy downtown scenery needs a lot more PCIe bandwidth. It’s the same thing in FSmark11. BP=0 is simply letting us be purely CPU limited again.
Let’s have a look at the MainThread in task manager at the end of the flight using BP=0. It’s not at 100%. We are still bottlenecked by the PCIe bus at the end of the flight. PCIe 3.0 should help even with BP=0.
I won’t show the different graphs, but when we are limited by the PCIe bus, a faster GPU or CPU can still help, though with diminishing returns. So if we use AA, a faster GPU is noticeable.
And remember, the GPU always has to be faster than the CPU when using BP=0.
Let me finally show you this graph showing the difference between using a PCIe x8 bus without BP=0 and a PCIe x16 bus with BP=0 on a flight using 8xSQAA. Both are fully realistic scenarios on today’s mainstream platforms with only 16 lanes from the CPU. Please make sure you use all 16 PCIe lanes to your GPU and use BP=0 if you can. In that case you can actually get use of your CPU.
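To put the x8 versus x16 comparison in numbers, here is a minimal Python sketch of the theoretical one-way bandwidth of a PCIe link. The per-lane figures are nominal spec values (250 MB/s for PCIe 1.0, 500 MB/s for 2.0, roughly 985 MB/s for 3.0). The table and function names are made up for illustration, and the real throughput FSX achieves will be below these ceilings.

```python
# Nominal per-lane throughput in MB/s, one direction.
# These are spec ceilings, not measurements from the flights in this thread.
PER_LANE_MB_S = {1: 250, 2: 500, 3: 985}


def link_bandwidth_mb_s(generation: int, lanes: int) -> int:
    """Theoretical one-way bandwidth of a PCIe link in MB/s (hypothetical helper)."""
    return PER_LANE_MB_S[generation] * lanes


if __name__ == "__main__":
    for gen, lanes in [(2, 8), (2, 16), (3, 16)]:
        print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_mb_s(gen, lanes)} MB/s")
```

Going from x8 to x16 on the same generation doubles the ceiling, which is the same size of step the post attributes to BP=0 from 114s onwards.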

Hi Lars,
Good to see you back. Extremely interesting to say the least! Many of us have probably noticed the symptoms of BP=0 without knowing the causes. I am curious what changes happen when hyperthreading is a factor, if any. Thank you for your work. It challenges me to rethink some of my rethinks!
Kind regards,

Mods, please make this a sticky. Very useful information in here.

My input is that it is actually mostly the PCIe bus that is holding us back and that the BP=0 reduces the need for PCIe bandwidth.
I wonder if the PCIe3 bus will cause about a 10 FPS jump. As it seems, the performance jump is about 10 FPS per doubling of available PCIe bandwidth. Excited to see some Kepler+Ivy systems run through FSmark11!

Edited by benorg

I wonder if the PCIe3 bus will cause about a 10 FPS jump. As it seems, the performance jump is about 10 FPS per doubling of available PCIe bandwidth. Excited to see some Kepler+Ivy systems run through FSmark11!
I fly without autogen, so PCIe3 may not do anything for me...
Simon
Both are fully realistic scenarios on today’s mainstream platforms with only 16 lanes from the CPU. Please make sure you use all 16 PCIe lanes to your GPU and use BP=0 if you can. In that case you can actually get use of your CPU.
How do we confirm/change the PCIe lanes usage?

Joe Brown


  • Author

I doubt Kepler+Ivy Bridge will show much change in FSXmark11 on a clock-for-clock basis, as I believe high-end systems already get BP=0 put in the .cfg by the tweaking tool(?), so we are already CPU limited on high-end systems.
The effects of HT do not really change with BP=0. You can more easily see the positive effect HT has on FPS on a Dual core with affinitymask=9 compared to a regular Dual core with affinitymask=3.
HT still has its specific upsides and downsides. To me, HT is only "worth it" with photo scenery or if you are stuck with a Dual core with HT.
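The AffinityMask values mentioned in this thread can be read as plain bitmasks, where bit n being set means logical Core#n is selected. Below is a minimal Python sketch of that decoding. The helper name is hypothetical, and which FSX thread ends up on which selected core is as described in the posts, not something this code models.

```python
def cores_in_mask(mask: int) -> list[int]:
    """Return the logical core numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Values taken from the posts above: 1, 13, 9 and 3.
    for mask in (1, 3, 9, 13):
        print(f"AffinityMask={mask:<2} ({mask:04b}) -> cores {cores_in_mask(mask)}")
```

Read this way, 13 (binary 1101) selects Core#0, Core#2 and Core#3, matching the layout described in the first post, while 9 selects logical cores 0 and 3 and 3 selects cores 0 and 1.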

How do we confirm/change the PCIe lanes usage?
I see you have a Bloomfield system, so you do not have to worry. I think your motherboard always gives the main GPU slot 16 lanes. Otherwise you can read what you have at the moment in GPU-Z or CPU-Z.

Edited by SAAB340

it is actually mostly the PCIe bus that is holding us back
That's what Phil Taylor has been saying all along. (Back-of-the-Envelope calculation here.)
Cheers,
- jahman.
That's what Phil Taylor has been saying all along. (Back-of-the-Envelope calculation here.)
Cheers,
- jahman.
Wow! How did he know that! haha
Anyway, it seems apparent to me that we should see about a 10-13 FPS increase in heavy autogen moving from PCIe v2 to PCIe v3. I would call that quite a jump for most people, and in combination with an IB @ 5GHz and 8GB of DDR3 @ 2133 JK[3-3-3-14]JK I personally think that we will truly be able to run FSX even higher than ever. It seems that all of these upgrades, PCIe3 (with a fast card) and IB, should create about a 15% jump in performance in most scenarios.
Now, how about Haswell?
I think that I just had an epiphany! I understand why my FPS sucks around ORBX+Max Autogen! Thanks so much for the link Jahman!

Edited by benorg

Thanks so much for the link Jahman!
You're welcome!
Cheers,
- jahman.
Mods, please make this a sticky. Very useful information in here.
These are only speculations. Very good ones, but nonetheless only guesses. PCIe 3.0 is not here yet, and the comparisons are not final. For now I'm going to keep it as a discussion.

I too thought that this was well presented, even if a little light on technical facts.
For instance, there's no real mention of hardware specifications, so we can't tell if the results are universal or hardware specific.
I know that ******* does not recommend BP=0 with a card with less than 2GB VRAM, so is this important wrt the results presented? Will this work with a 1GB or 3GB card? Now we know that BP affects the CPU, i.e. not just FSX, so were any artefacts outside of FSX experienced?
There is no mention of PCIE2, so what was used? PCIE1 (unlikely)?
Someone gives a link to Phil Taylor's blog re PCIE, but his calculations appear to be based on PCIE1.
PCIE2 from Google: "The PCIe 2.0 standard doubles the transfer rate compared with PCIe 1.0 to 5 GT/s and the per-lane throughput rises from 250 MB/s to 500 MB/s. This means a 32-lane PCI connector (×32) can support throughput up to 16 GB/s aggregate." So are we saying that we are still bottlenecked at this level of transfer rate?
To me the biggest performance issues in FSX are the fragmentation of the VAS (which increases with time and can slow data transfer/instructions to the working set) and the poor coding of FSX, which utilises unaligned memory access ('in spades' according to the aforesaid P Taylor). If only we could reprogram FSX to utilise aligned memory access, that would be a major step in increasing performance. It can be done, but I believe it is byte by byte, so we need a young volunteer!
I think that this is a good post, but we do need a few more technical facts so we can judge the reported improvements a little better.
Well done OP!
Regards
PeterH
PS I don't know why this typeface has descended into minute Lilliputian - did I press the wrong key? Ah! That's better, I can suddenly re-size.

PS I don't know why this typeface has descended into minute Lilliputian - did I press the wrong key?
No, the BB text editor is buggy and self-clicks certain editor buttons of its own volition. In your case it was either the subscript or superscript button, so you're in time to edit your post and revert back to standard type.
Cheers,
- jahman.

Lars, you could run this tool during your test runs:
http://forum.avsim.net/topic/360449-dx-explorer-excellent-tool-for-in-depth-analysis/
It would give you even more information on what's going on, DX calls etc.
Regards,
Markus

I see you have a Bloomfield system, so you do not have to worry. I think your motherboard always gives the main GPU slot 16 lanes. Otherwise you can read what you have at the moment in GPU-Z or CPU-Z.
Yes, you are right. After further research I see my mobo dedicates 16 lanes as long as the proper slots are used in the right sequence.

Joe Brown
