SAAB340

BP=0 and the PCIe bus. (Or, why you should have a fast enough GPU to actually make use of your CPU.)

In http://forums1.avsim...p0-conclusions/ ******* Altuve describes some very good tweaks, including BP=0, which gives a really good boost in performance. I can only take my hat off to him and everyone else for all the work they put in to figure all this out. I'd like to share with the community my own input on why BP=0 actually gives the improved performance.

It is said that FSX uses massive CPU resources to manage the BufferPools, hence the improvement when not using them. My input is that it is actually mostly the PCIe bus that is holding us back, and that BP=0 reduces the need for PCIe bandwidth. Let me show you how I came to this conclusion.

I've been using a 630-second flight in FSXmark07. It started with me studying this FPS graph showing flights with and without BP=0. There is a marked improvement during the first 114 seconds and from 242 seconds onwards, but little improvement between 114 s and 242 s. An increase in available CPU cycles should really show there as well. It doesn't make sense.

I decided to study the core load in Task Manager. Let me give you a quick introduction to the MainThread, Fibres and Texture&Terrain loaders in pictures.

Let's start with a simple dual-core CPU and AffinityMask=1. Here you can see the MainThread on Core#0 and the Fibres on Core#1. If we move on to a quad-core CPU but keep AffinityMask=1, you can see that the Fibres are spread out over Core#1, Core#2 and Core#3. There is no way to control where they end up, and Fibres can end up on any core, including Core#0. This also changes during and between flights. The important thing to understand is that MainThread + Fibres = FPS. Restrict either the MainThread or the Fibres and you end up with lower FPS.

Let's introduce the Texture&Terrain loaders as well. I use AffinityMask=13 to put the MainThread on Core#0 and Texture&Terrain loaders on Core#2 and Core#3.
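AffinityMask is read as a bitmask: bit n set means logical processor n is enabled. As a minimal sketch (the helper name `cores_from_mask` is hypothetical, for illustration only), decoding the masks mentioned in this thread shows why AffinityMask=13 (binary 1101) covers Core#0, Core#2 and Core#3 and leaves Core#1 out of the mask. The code only decodes bits; which FSX thread lands on which enabled core is what the screenshots above show, not something this models.

```python
def cores_from_mask(mask: int) -> list[int]:
    """Return the logical processor indices whose bits are set in an affinity mask.

    Hypothetical helper for illustration: bit n set means processor n is enabled.
    """
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    cores = []
    index = 0
    while mask:
        if mask & 1:
            cores.append(index)
        mask >>= 1  # move on to the next bit / processor
        index += 1
    return cores


for mask in (1, 3, 9, 13):
    print(f"AffinityMask={mask:<3} binary {mask:04b} -> cores {cores_from_mask(mask)}")
```

For AffinityMask=13 this prints cores [0, 2, 3], so Core#1 is the core that is not in the mask.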
With AffinityMask=13, most Fibres now end up on Core#1, but some share cores with the T&T loaders.

Now back to the original problem: BP=0 not behaving as expected. Let's look at Task Manager using BP=0. Look at the full utilization of Core#0 by the MainThread. This is how the MainThread should look unless we are bottlenecked by something. I smell a PCIe bandwidth limitation, but let's look closer at Task Manager first.

Here I pasted together two different flights so we can see the MainThread during the first 5 minutes of the flight. Look at the times I marked and compare them to the first graph I showed. Without BP=0 we already have fairly high utilization between 114 s and 242 s, so we don't get much improvement there. Before 114 s and after 242 s, BP=0 lets us use CPU cycles that were already available on our CPU. That's why we get the improvement in FPS.

Now let's look at an FPS graph showing different PCIe bandwidths and the impact of BP=0. From 114 s onwards you can see that BP=0 has roughly the same effect as doubling the PCIe bus bandwidth. Before 114 s its effect is larger than quadrupling the PCIe bandwidth. Why does it differ? It took me a lot of head-scratching to come up with an explanation, but this is my answer: before 114 s we have downtown Seattle in view, with loads of custom buildings; after 114 s we have passed downtown. The heavy downtown scenery needs a lot more PCIe bandwidth. It's the same thing in FSmark11. BP=0 is simply letting us be purely CPU-limited again.

Let's have a look at the MainThread in Task Manager at the end of the flight using BP=0. It's not at 100%. We are still bottlenecked by the PCIe bus at the end of the flight, so PCIe 3.0 should help even with BP=0.

I won't show the different graphs, but when we are limited by the PCIe bus, a faster GPU or CPU can still help, though with diminishing returns. So if we use AA, a faster GPU is noticeable.
And remember, the GPU always has to be faster than the CPU when using BP=0.

Finally, let me show you this graph comparing a PCIe x8 bus without BP=0 and a PCIe x16 bus with BP=0 on a flight using 8xSQAA. Both are fully realistic scenarios on today's mainstream platforms, which have only 16 PCIe lanes from the CPU. Please make sure you give all 16 PCIe lanes to your GPU and use BP=0 if you can. Then you can actually make use of your CPU.
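The argument above rests on comparing average FPS over three parts of the 630-second FSXmark07 flight: 0-114 s, 114-242 s and 242 s onwards. A minimal sketch of that comparison, assuming you have logged FPS as (time in seconds, FPS) samples for one run with BP=0 and one without; the log format and function names are hypothetical, and none of the numbers from the graphs above are reproduced here.

```python
from statistics import mean

# Window boundaries in seconds, taken from the flight described above.
# The last window is open-ended ("from 242 seconds onwards").
WINDOWS = [(0, 114), (114, 242), (242, float("inf"))]


def window_means(samples: list[tuple[float, float]]) -> list[float]:
    """Mean FPS per window; samples are (time_s, fps) pairs (hypothetical log format)."""
    result = []
    for start, end in WINDOWS:
        in_window = [fps for t, fps in samples if start <= t < end]
        result.append(mean(in_window) if in_window else float("nan"))
    return result


def relative_gain(baseline: list[tuple[float, float]],
                  tweaked: list[tuple[float, float]]) -> list[float]:
    """Per-window FPS ratio tweaked / baseline, e.g. a BP=0 run vs a default run."""
    return [t / b if b else float("nan")
            for b, t in zip(window_means(baseline), window_means(tweaked))]
```

A ratio close to 1.0 in the middle window and clearly above 1.0 in the outer windows would correspond to the pattern described in the opening post.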

Hi Lars, good to see you back. Extremely interesting, to say the least! Many of us have probably noticed the symptoms of BP=0 without knowing the causes. I am curious what changes, if any, happen when hyperthreading is a factor. Thank you for your work. It challenges me to rethink some of my rethinks! Kind regards,

Mods, please make this a sticky. Very useful information in here.

My input is that it is actually mostly the PCIe bus that is holding us back and that the BP=0 reduces the need for PCIe bandwidth.
I wonder if the PCIe 3.0 bus will give about a 10 FPS jump. It seems the performance gain is about 10 FPS per doubling of available PCIe bandwidth. Excited to see some Kepler + Ivy Bridge systems run through FSmark11!
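The "about 10 FPS per doubling" rule of thumb can be turned into a rough projection: gain ≈ FPS-per-doubling × log2(new bandwidth / old bandwidth). A minimal sketch, assuming the rule really is log-linear (that is an assumption, not something the graphs in this thread establish) and using the per-lane ratio between PCIe 3.0 (8 GT/s, 128b/130b encoding) and PCIe 2.0 (5 GT/s, 8b/10b encoding), which is a little under 2x.

```python
from math import log2


def projected_fps_gain(fps_per_doubling: float, old_bw: float, new_bw: float) -> float:
    """Extrapolate an FPS gain assuming a fixed gain per doubling of PCIe bandwidth.

    The log-linear model is an assumption; it can only hold while the PCIe bus
    remains the bottleneck.
    """
    if old_bw <= 0 or new_bw <= 0:
        raise ValueError("bandwidths must be positive")
    return fps_per_doubling * log2(new_bw / old_bw)


# Per-lane PCIe 3.0 vs PCIe 2.0 ratio: (8 GT/s * 128/130) / (5 GT/s * 8/10).
PCIE3_OVER_PCIE2 = (8 * 128 / 130) / (5 * 8 / 10)

print(f"PCIe 2.0 -> 3.0 at the same lane count: {projected_fps_gain(10, 1.0, PCIE3_OVER_PCIE2):.1f} FPS")
print(f"x8 -> x16 at the same generation: {projected_fps_gain(10, 8, 16):.1f} FPS")
```

Because PCIe 3.0 is slightly less than double PCIe 2.0 per lane, the first projection lands just under 10 FPS under this model.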

I wonder if the PCIe3 bus will cause about a 10FPS jump. As it seems, the performance jump is about 10 FPS per double in PCIe available bandwidth. Excited to see some Kepler+Ivy systems run through FSmark11!
I fly without autogen, so PCIe3 may not do anything for me...

Simon

Both are fully realistic scenarios on today’s mainstream platforms with only 16 lanes from the CPU. Please make sure you use all 16 PCIe lanes to your GPU and use BP=0 if you can. In that case you can actually get use of your CPU.
How do we confirm/change the PCIe lanes usage?

I doubt Kepler + Ivy Bridge will show much change in FSXmark11 on a clock-for-clock basis, as I believe high-end systems already get BP=0 put in the .cfg by the tweaking tool(?), so high-end systems are already CPU-limited.

The effects of HT don't really change with BP=0. You can more easily see the positive effect HT has on FPS on a dual core with HT and AffinityMask=9, compared to a regular dual core with AffinityMask=3. HT still has its specific upsides and downsides, and to me HT is only "worth it" with photo scenery or if you are stuck with a dual core with HT.
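On a dual core with HT, Windows exposes four logical processors. A small sketch, assuming the common Windows/Intel enumeration where the two HT siblings of a physical core get adjacent numbers (logical 0 and 1 on core 0, logical 2 and 3 on core 1; verify this on your own machine), shows that AffinityMask=9 on an HT dual core enables one hardware thread on each physical core, just as AffinityMask=3 does on a dual core without HT. The function name is hypothetical.

```python
def logical_to_physical(mask: int, threads_per_core: int) -> list[tuple[int, int]]:
    """Map each logical processor enabled in an affinity mask to (physical_core, hw_thread).

    Assumes HT siblings are numbered adjacently (with 2 threads per core, logical
    2k and 2k+1 share core k). This is an assumption, not a guarantee.
    """
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    placement = []
    logical = 0
    while mask:
        if mask & 1:
            placement.append(divmod(logical, threads_per_core))
        mask >>= 1
        logical += 1
    return placement


print("HT dual core, AffinityMask=9:", logical_to_physical(9, threads_per_core=2))
print("Plain dual core, AffinityMask=3:", logical_to_physical(3, threads_per_core=1))
```

Under the same assumption, AffinityMask=3 on an HT dual core would map to the two siblings of physical core 0, i.e. (0, 0) and (0, 1).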

How do we confirm/change the PCIe lanes usage?
I see you have a Bloomfield system, so you don't have to worry. I think your motherboard always gives the main GPU slot 16 lanes. Otherwise, you can check what you have at the moment in GPU-Z or CPU-Z.

Guest jahman
it is actually mostly the PCIe bus that is holding us back
That's what Phil Taylor has been saying all along. (Back-of-the-envelope calculation here.) Cheers, - jahman.

That's what Phil Taylor has been saying all along. (Back-of-the-Envelope calculation here.)Cheers,- jahman.
Wow! How did he know that? Haha. Anyway, it seems apparent to me that we should see about a 10-13 FPS increase in heavy autogen moving from PCIe 2.0 to PCIe 3.0. I would call that quite a jump for most people, and in combination with an IB @ 5 GHz and 8 GB of DDR3-2133 JK[3-3-3-14]JK, I personally think we will truly be able to run FSX higher than ever. It seems all of these upgrades, PCIe 3.0 (with a fast card) and IB, should give about a 15% jump in performance in most scenarios. Now, how about Haswell?

I think I just had an epiphany! I understand why my FPS sucks around ORBX + max autogen! Thanks so much for the link, Jahman!

Guest jahman
Thanks so much for the link Jahman!
You're welcome! Cheers, - jahman.

Mods, please make this a sticky. Very useful information in here.
These are only speculations. Very good ones, but nonetheless only guesses. PCIe 3.0 is not here yet, so comparisons are not final. For now, I'm going to keep it as a discussion.

I too thought that this was well presented, even if a little light on technical facts. For instance, there's no real mention of hardware specifications, so we can't tell whether the results are universal or hardware-specific. I know that ******* does not recommend BP=0 with a card with less than 2 GB of VRAM, so is this important with respect to the results presented? Will this work with a 1 GB or 3 GB card? Now, we know that BP affects the CPU, i.e. not just FSX, so were any artefacts outside of FSX experienced?

There is no mention of PCIe 2.0, so what was used? PCIe 1.0 (unlikely)? Someone gives a link to Phil Taylor's blog re PCIe, but his calculations appear to be based on PCIe 1.0. On PCIe 2.0, from Google: "The PCIe 2.0 standard doubles the transfer rate compared with PCIe 1.0 to 5 GT/s and the per-lane throughput rises from 250 MB/s to 500 MB/s. This means a 32-lane PCI connector (×32) can support throughput up to 16 GB/s aggregate." So are we saying that we are still bottlenecked at this level of transfer rate?

To me, the biggest performance issues in FSX are the fragmentation of the VAS (which increases with time and can slow data transfer/instructions to the working set) and the poor coding of FSX, which uses unaligned memory access ("in spades", according to the aforesaid P. Taylor). If only we could reprogram FSX to use aligned memory access, that would be a major step in increasing performance. It can be done, but I believe it is byte by byte, so we need a young volunteer!

I think this is a good post, but we do need a few more technical facts so we can judge the reported improvements a little better. Well done, OP!

Regards,
PeterH

PS I don't know why this typeface has descended into minute Lilliputian - did I press the wrong key? Ah! That's better, I can suddenly re-size.
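The per-lane figures quoted above follow from the raw transfer rate and the line encoding: PCIe 1.0 and 2.0 use 8b/10b encoding at 2.5 and 5 GT/s, while PCIe 3.0 uses 128b/130b at 8 GT/s. A minimal sketch that computes effective bandwidth per lane and per link in one direction; it ignores packet and protocol overhead, so real-world throughput is somewhat lower.

```python
# Raw transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def lane_bandwidth_mb_s(generation: int) -> float:
    """Effective one-direction bandwidth of a single lane in MB/s, before protocol overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer moves one bit per lane; divide by 8 to convert bits to bytes.
    return gt_per_s * 1000 * efficiency / 8


def link_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Effective one-direction bandwidth of an xN link in GB/s."""
    return lane_bandwidth_mb_s(generation) * lanes / 1000


for gen in (1, 2, 3):
    for lanes in (8, 16):
        print(f"PCIe {gen}.0 x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s per direction")
```

PCIe 2.0 works out to 500 MB/s per lane, so an x16 slot gives 8 GB/s per direction and an x8 slot 4 GB/s, which is the factor of two behind the x8-versus-x16 comparison in the opening post.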

Guest jahman
PS I don't know why this type face has descended into minute lillipution - did I press the wrong key?
No, the BB text editor is buggy and clicks certain editor buttons of its own volition. In your case it was either the subscript or the superscript button, so you're still in time to edit your post and revert to standard type. Cheers, - jahman.

Lars, you could run this tool during your test runs: http://forum.avsim.net/topic/360449-dx-explorer-excellent-tool-for-in-depth-analysis/ It would give you even more information on what's going on, DX calls, etc. Regards, Markus

I see you have a bloomfield system so you do not have to worry. I think your motherboard always give the main GPU slot 16 lanes. Othervise you can read it in GPU-Z or CPU-Z what you have at the moment.
Yes, you are right. After further research I see my mobo dedicates 16 lanes as long as the proper slots are used in the right sequence.
