Noel

Unbelievably smooth performance--question re cloud/terrain shadow impact


I recently made some minor changes, known ones that I hadn't tried before because performance was already quite decent, and wow, it's unreal now. I'm telling you, there isn't the slightest hint of a stutter, micro-stutter, or what have you. That is with the PMDG 777 in NorCal & SoCal FTX scenery, at major terminals. I'm truly shocked, and this is a 5 y/o system, albeit strong for its day.

 

So at this point the only thing that dampens perfection in video smoothness is when there are lots of clouds AND moderate levels of terrain/cloud shadowing. It's not much of a practical problem because I normally run clouds out to 100nm at Maximum density, and in that case there is no need for shadowing, so I turn it off. With low amounts of clouds w/ cloud/terrain shadowing all's still liquid smooth. My only question is: are there specific cloud textures that are better on this issue? I'm using P3D's MSAA at 4x, FXAA is OFF. The three major changes that took me from very good to excellent total performance are: 1, UNLIMITED frames (have done this off and on for years, but conclude that w/ the two other changes this is clearly preferred); 2, setting monitor refresh to 30 Hz w/ VSYNC enabled in sim; 3, setting the affinity mask to 4052 per SteveW. With these changes seemingly only the shadowing comes into play overall. In the most complex of settings for me, KSFO or KLAX in FTX regional scenery, fps has gone briefly down to 20 and we lose perfect smoothness, but it's quite brief overall.

 

So is there anything to be done to improve how the sim copes w/ shadowing? I'm still using P3D 3.0 and prefer to keep it that way for a while.

 

Thanks!


Thanks for the info Noel, that's really encouraging.

 

Funnily enough I've just bought a new bigger SSD so will download and do a full install of the latest P3D version. I usually run with unlimited FPS and 4 x SGSS in NVI, but I'll give your tweaks a try before I add on addons. It's been a while since I bothered with an Affinity Mask. I've never fully understood it and there seems to be so much conjecture about whether it works or not. However, isn't the value dependent on what hardware you have?

 

cheers

 

Ian


Yes, the AM value of 4052 is for 6-core hyper-threaded processors only.

 

I'm really shocked at this improvement. It's always been decent, but the lack of even micro-stutters to this degree is new for me. Look for SteveW here; maybe he has a suggestion for your specific processor. I had really assumed it was set up as well as possible, but it seems now I'm seeing a really significant increase in total performance including frame rate in complex areas, terrain texture update rate, and freedom from micro-stutter. There will on occasion be some hitches when turning on the taxiway, but I think these may be caused by something related to the GPU, like shadowing etc. But overall, in the most demanding scenarios, it's very close to perfection. It's hard to say exactly what role each part is playing, but VSYNC w/ a 30 Hz refresh seems to be huge for the smoothness piece. Running at UNLIMITED feeds the most frames possible under the circumstances, and that is important if you are running VSYNC w/ a 30 Hz screen refresh. Most often I'm seeing the built-in frame rate counter display exactly 30.0 almost always, until it's super demanding, then down to 24-25, and the worst so far was on takeoff in the PMDG 777 from RWY 27L at KSFO-HD in FTX NCA regional, w/ scenery sliders close to maxed. The new affinity mask seems to have worked w/ everything else to maximize total performance, and it's palpable, all of these working together.

 

One odd caveat I need to post about in case someone knows.  Despite the best ever performance, I notice now road traffic is more jerky at times than I recall, or perhaps it's just in contrast to everything else.  While airport ground traffic is absolutely smooth now, road traffic gets choppy.

 

Pretty cool to be 5y out from the last build and get this out of it--the last week has been a complete surprise.  And I agree, it's inspiring to know it's possible, even if elusive!

 

Good Luck to you Ian!


Maybe Steve can explain why 4052..

 

4052 = 11 11 11 01 01 00

 

2 logical cores for addons. You probably do not have many addons.

 

I am still on 340 which was the best for my setup with all kinds of addons.


Maybe Steve can explain why 4052..

 

4052 = 11 11 11 01 01 00

 

2 logical cores for addons. You probably do not have many addons.

 

I am still on 340 which was the best for my setup with all kinds of addons.

 

I was at 4092 previously, so SteveW mentions this:

 

4092=11,11,11,11,11,00. On the right, two zeros show the sim won't be running on those LPs, but it is running on every other LP. Next, coming in from the right, you have two ones, showing that the sim will start its two primary jobs on those two LPs, which happen to be on the same core, and so mathematically speaking this setup has no chance against one whole core each as in 4052=11,11,11,01,01,00.

In 4052 you can see the first two ones each have a core to themselves; they are responsible for rendering. The others can exist on cores together since they are data gathering, not rendering; these jobs take seconds to complete. You can't muck the renderer about in the same way. The data gatherers can exist on the same core but they share the bandwidth of the core. Your AM puts two primary jobs on a shared core.

 

I was not aware sharing a core over two LPs made a difference, and quite clearly it did negatively impact rendering performance.  I manually assign everything else outside of P3D to every LP except 3,4,5,6 which are for exclusive use by the renderer.   I don't think there is a compelling reason to restrict all other processes to the 1st two LPs, but maybe there is.  I have assumed the workload for everything, including terrain texture loading, except the main thread could be shared w/o significant impact, and looking at CPU utilization seems to support that, i.e. I never see anywhere near 100% on all the other LPs.
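
For anyone who wants to check their own mask against SteveW's point, here is a rough Python sketch. It assumes, as the thread does, that LPs are counted from the right of the binary and that each adjacent pair of bits is one hyperthreaded core; the function names are made up for illustration and have nothing to do with P3D itself.

```python
def lp_indices(mask: int) -> list[int]:
    """Return the 0-based LP positions set in the mask, rightmost bit first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def first_two_jobs_share_core(mask: int) -> bool:
    """True if the two lowest LPs in the mask are the two halves of one core.

    Assumption: LPs 0 and 1 form core 0, LPs 2 and 3 form core 1, and so on.
    """
    first, second = lp_indices(mask)[:2]
    return first // 2 == second // 2


if __name__ == "__main__":
    for am in (4092, 4052):
        print(am, format(am, "012b"), first_two_jobs_share_core(am))
```

With this reading, 4092 puts the first two jobs on the same core while 4052 gives each of them a core of its own.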


Thanks Noël.

 

To which cores have you assigned your addons?


How are you limiting vsync to 30 refresh rate?


Thanks Noël.

 

To which cores have you assigned your addons?

 

Every assignable process, which includes all add-ons, I have set to 12,11,10,9,8,7,_,_,_,_,2,1

 

I think the big gain came from giving both LPs per cores 1 & 2 instead of one LP per core for exclusive use by P3D.  At least that is how I understand it.
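
A quick way to sanity-check an assignment like this is to turn the LP list into a mask and compare it with the sim's AM. The Python below is only a sketch: it takes the "every LP except 3,4,5,6" description from earlier in the thread, assumes LP 1 is the rightmost bit, and uses helper names invented for this example.

```python
def mask_from_lps(lps) -> int:
    """Build a mask from 1-based LP numbers (assumption: LP 1 = rightmost bit)."""
    mask = 0
    for lp in lps:
        mask |= 1 << (lp - 1)
    return mask


def lps_from_mask(mask: int) -> list[int]:
    """List the 1-based LP numbers that are set in a mask."""
    return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]


SIM_MASK = 4052  # 11,11,11,01,01,00
OTHERS_MASK = mask_from_lps(lp for lp in range(1, 13) if lp not in {3, 4, 5, 6})

# LPs the sim has entirely to itself: set in the sim mask, clear in the other mask.
EXCLUSIVE_TO_SIM = lps_from_mask(SIM_MASK & ~OTHERS_MASK)

if __name__ == "__main__":
    print(OTHERS_MASK, EXCLUSIVE_TO_SIM)
```

Under those assumptions the other processes get mask 4035, and the only LPs left exclusively to the sim are 3 and 5, i.e. the two rendering jobs, with their sister LPs 4 and 6 left idle.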


Did you assign one addon to 2 cores and then divide your addons over all cores, or did you assign every addon to all those cores?

 

Thanks


Maybe Steve can explain why 4052..

 

4052 = 11 11 11 01 01 00

 

2 logical cores for addons. You probably do not have many addons.

 

I am still on 340 which was the best for my setup with all kinds of addons.

I hope this helps:

 

Looking at the binary first, we have at least four LPs assigned to the sim so the rendering stage is at its leanest, which means we have the least code running per core. We can allocate more LPs but they only gain a speed advantage in loading the scenario; they cannot increase the rendering speed. More than four LPs, however, gives the first two jobs more to do more often, which in effect reduces the rendering performance.

 

Look on the right in the binary and we have one LP per core on the first two LPs. The first two LPs are the rendering stages and so get maximum performance. The divided-up jobs three and four are allocated across the remaining cores, giving maximum scenario loading speed. We need not allocate as many LPs as we can find, since after a point there will be no gain. First try 6 LPs, giving two to the renderer and four to the loading stages. Check with a stopwatch and repeat the scenario load; you should see that six gains an increase in scenario loading speed over four in most cases. Adding more may not yield any benefit, so check.

 

Gerard uses 340 = 00,01,01,01,01,00, which gives four straight cores to the sim and provides maximum rendering performance. So long as addon exe apps are kept away from those cores the sim should perform great. 340 came up in my testing as the maximum on 6 cores, giving the first and last core to other apps, which suits the job scheduler well. Adding LPs to 340 with, say, 1364 = 01,01,01,01,01,00 gives five straight cores to the sim; the rendering still only gets the first two, but now with the extra core undertaking loading tasks the scenario loads in a few seconds less time. However, the rendering is affected slightly and there are more sim cores spread out on the CPU to coincide with other processes and lose performance.

 

Any processes external to the sim that find themselves running on LPs allocated to the sim will cause that LP to incur switching-time losses that would not be so great if the external process was on the sister LP or, even better, on a core unused by the sim. This is one of the reasons we see such differences across systems.

 

6 core +HT suggestions try:

00,01,01,01,01,00=340

or

00,11,11,01,01,00=980

01,00,00,00,00,01=addons - give exe apps combination of two LPs min

or

01,01,01,01,01,00=1364

10,10,10,00,00,01=addons - give exe apps combination of two LPs min

 

leaves an LP of core zero free for unexpected system activity

 

mixed processes don't harm the sim background tasks so bad as they do the first two jobs/LPs.
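
To double-check the arithmetic in suggestions like these, a small Python sketch can convert between the comma-grouped binary and the decimal AM value, and confirm that a sim mask and an addon mask do not land on the same LPs. The helper names are invented for this example, and it assumes two LPs per core as on a hyperthreaded CPU.

```python
def parse_pairs(text: str) -> int:
    """Convert grouped binary such as '00,01,01,01,01,00' to its decimal value."""
    return int(text.replace(",", "").replace(" ", ""), 2)


def format_pairs(mask: int, cores: int) -> str:
    """Format a decimal mask as one two-bit group per core (assumes HT on)."""
    bits = format(mask, f"0{2 * cores}b")
    return ",".join(bits[i:i + 2] for i in range(0, len(bits), 2))


def masks_overlap(sim_mask: int, addon_mask: int) -> bool:
    """True if any LP appears in both masks."""
    return (sim_mask & addon_mask) != 0


if __name__ == "__main__":
    sim = parse_pairs("01,01,01,01,01,00")
    addons = parse_pairs("10,10,10,00,00,01")
    print(sim, format_pairs(sim, 6), masks_overlap(sim, addons))
```

Run as shown, it reports the sim mask as 1364 and no overlap with the addon mask listed under it.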

 

 

 

With P3D, VSync=On in Display Settings coupled with Unlimited on the fps Slider control effects an fps "Limit" on the frame rate output at the refresh rate of the monitor. Setting the Slider to an fps value other than Unlimited introduces look-ahead frame buffering. When this is active, a slight stutter will use up one to three look-ahead frames from the buffer (the default max is three). Even if only one frame is lost from the buffer, a great deal of time can be taken to fill the buffer back up, since it was used because of a lack of performance. If you can see 40+ fps at all times with Unlimited and VSync=Off you can try Locking at 20fps on the slider.


 

 


With P3D, VSync=On in Display Settings coupled with Unlimited on the fps Slider control effects an fps "Limit" on the frame rate output at the refresh rate of the monitor. Setting the Slider to an fps value other than Unlimited introduces look-ahead frame buffering. When this is active, a slight stutter will use up one to three look-ahead frames from the buffer (the default max is three). Even if only one frame is lost from the buffer, a great deal of time can be taken to fill the buffer back up, since it was used because of a lack of performance. If you can see 40+ fps at all times with Unlimited and VSync=Off you can try Locking at 20fps on the slider.

 

LoL Steve, could you please explain this part again for dummies... I can't follow... Sorry

 

Thank you

 

McDan out


The GPU can use various ways of holding a frame (the picture) in memory. There's a bit for while the next frame is being painted by the GPU, there's a bit for the display output circuit to be reading in so it can send onto the display, and there's a bit in the middle that can be used to store frames ahead of time.

 

There can be a combination of techniques involved to produce an outcome. We can use the bit in the middle in the technique called Triple Buffering - a page is drawn on one of two buffers while the output buffer is reading in one of them.

 

In the Fixed fps method the buffer is used to store up to three frames, for which the physics and positions of moving objects in the sim are computed based on a fixed time between frames. It is important to realise that with, say, 20fps fixed, each frame will be computed such that the moving objects are 1/20s (50ms) along the timeline irrespective of how long the frame takes to draw. Having three in reserve at 20fps = 3 x 50 = 150ms of delay. A 150ms stutter depletes the buffer and the next frame will be made with the objects shown in the wrong position with respect to their timeline.
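
The arithmetic in that paragraph can be tried for other lock values with a few lines of Python. This is only a back-of-envelope sketch: the function names are made up, and the "frames used up by a stall" estimate is a simplification for illustration, not how P3D actually manages its buffer.

```python
import math


def frame_interval_ms(fps: float) -> float:
    """Time step between frames at a fixed fps, in milliseconds."""
    return 1000.0 / fps


def buffer_cover_ms(fps: float, frames: int = 3) -> float:
    """How much stall the look-ahead buffer can hide when it is full."""
    return frames * frame_interval_ms(fps)


def frames_used_by_stall(stall_ms: float, fps: float, frames: int = 3) -> int:
    """Rough estimate of buffered frames consumed by a stall (capped at the buffer size)."""
    return min(frames, math.ceil(stall_ms / frame_interval_ms(fps)))


if __name__ == "__main__":
    print(frame_interval_ms(20))          # 50.0
    print(buffer_cover_ms(20))            # 150.0
    print(frames_used_by_stall(150, 20))  # 3
```

At a 30fps lock the same three frames cover only about 100ms, which is one reason a lower lock can ride out longer hitches.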

 

The Unlimited, VSync and Triple Buffer settings go together. However, each next frame of the scene is computed with the objects moved down the timeline by the average time between frames; they are never computed for the time at which they appear and so never appear at the right place in time, unless the frame rate is held constant by the limit of the monitor refresh or by the limit of the sim performance. Sometimes we can get less stutter by reducing the performance of the sim so that it is held against an end-stop in performance and runs with a more consistent fps that way.


Sorry, what is an LP? I have an i7 4820K which is a quad core processor. If anyone could recommend an affinity mask for this, I'd be happy to give it a try.

 

cheers

 

Ian


mixed processes don't harm the sim background tasks so bad as they do the first two jobs/LPs.

 

 

Right now all other processes, including Windows & the few add-ons I have running along w/ P3D, are assigned manually and are shared w/ P3D on 11,11,11,__,__,00 as well as on the unassigned 1st two LPs.

 

That was my next question: is it best to give P3D exclusive use of a couple of cores for texture loading, beyond the main thread's 2 full cores, and prevent any other exes from sharing, or is that mainly just theoretical? With two add-ons and all the other Windows processes sharing w/ P3D on all except LPs 3,4,5,6, I notice CPU utilization on those LPs is often quite low.


Sorry, what is an LP? I have an i7 4820K which is a quad core processor. If anyone could recommend an affinity mask for this, I'd be happy to give it a try.

 

cheers

 

Ian

 

Logical Processor. If you have Hyperthreading turned on, you have 8 LPs -- if off, you have 4.


Back to the original question:  besides turning shadows off or down, what else improves shadow performance?  My GPU is a 5 y/o GTX Titan so perhaps that's feeling its age.


"mixed processes don't harm the sim background tasks so bad as they do the first two jobs/LPs."

 

Right now all other processes, including Windows & the few add-ons I have running along w/ P3D, are assigned manually and are shared w/ P3D on 11,11,11,__,__,00 as well as on the unassigned 1st two LPs.

 

That was my next question: is it best to give P3D exclusive use of a couple of cores for texture loading, beyond the main thread's 2 full cores, and prevent any other exes from sharing, or is that mainly just theoretical? With two add-ons and all the other Windows processes sharing w/ P3D on all except LPs 3,4,5,6, I notice CPU utilization on those LPs is often quite low.

The background tasks take seconds to complete so although they are obviously affected by other processes sharing the core they are not affecting the timely requirements of the rendering stage which needs to be kept flowing.

 

Taking note of graphs and CPU percentage loads in Task Manager is a big mistake when ascertaining the true impact of other processes on the CPU.


Well, I was looking at Asus's monitor for each LP, and it seems to be about what I would imagine it would be.


I found a new Achilles heel in the 30 Hz monitor refresh: if frame rate drops below that level, it gets much choppier than leaving the monitor at 60 Hz. I noticed this when flying out of KORD in the NGX. It got quite jerky again when frames got down to around 22, the 1st time I'd seen this. And the cloud/terrain slider seems to consistently bog down the GPU if set too high. Would a GTX 1080 solve this, do you think? I have a GTX Titan now and it appears to score less than 50% of the 1080's 3DMark score.


Well, I was looking at Asus's monitor for each LP, and it seems to be about what I would imagine it would be.

That's fine but what I'm saying is there's things to consider, like two LPs of the same core showing 100% each are actually at 50%. Also when looking at an averaging graph consider an example that two cars reach a destination at an average of 50mph, one does the trip at a consistent 50mph while the other repeatedly accelerates up to 200mph and stops, then off to 200mph and stop again.
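
The two-cars comparison is easy to reproduce with made-up numbers. The Python below is purely illustrative: the sample values are invented to mirror the 50mph and 200mph example, and they are not measurements from any real CPU monitor.

```python
from statistics import mean


def summarize(samples: list[int]) -> tuple[float, int]:
    """Return (average, peak) for a series of readings."""
    return mean(samples), max(samples)


# Invented readings: one series is steady, the other bursts to 200 and then stops.
STEADY = [50, 50, 50, 50, 50, 50, 50, 50]
BURSTY = [200, 0, 0, 0, 200, 0, 0, 0]

if __name__ == "__main__":
    print("steady:", summarize(STEADY))
    print("bursty:", summarize(BURSTY))
```

Both series average 50, so an averaging graph draws them the same, yet the bursty one spends a quarter of its time at four times that level.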


Hi Noel,

 

I just started using nVidia Inspector again after reading that a user had set the frame limiter to ~30.5 fps in NI while leaving frames set to unlimited in the sim, and I was impressed with the seemingly smooth flight experience even with heavy-hitting addons like PMDG's 747 and AS16. I do recall seeing a post in the past where a user had reduced his refresh to 30 Hz so I may give that a try.

 

Steve,

 

Good to see you again! I have an i7-4790K (4 core) with HT on, giving me 8 logical processors. I have set the affinity to 116 [ Binary = 01110100 ]. Reading Noel's posts I see that the first 2 LPs are left untouched, as mine are, but the cores on the "backend" are being more aggressively used. Is there a change that I should make to my AM? Should I be looking at an AM of 244 [ Binary = 11110100 ] or is there something better? I have very few addon airports but do have quite a bit of Orbx scenery including Global Base and Vector installed.

 

Using Process Lasso I have assigned TrackIR and a couple of addons to use cores 3,5,6,7 and I am wondering if there is a better way to handle what processes are being handled by which cores.

 

Thanks, 

 

Robert


Hi Robert, Problem is four cores only go so far so there's going to be contention somewhere. 116 uses three cores for the sim on four LPs, leaves a whole core free. The only other way is to use four cores 85 or 245:

 

01,01,01,01 = 85 dec - sim - best rendering

10,10,00,00 = 2 LPs for apps

 

or

 

11,11,01,01 = 245 dec - sim - best loading

10,10,00,00 = 2 LPs for apps
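
To see at a glance how many cores and LPs each of these quad-core masks uses, here is a short Python sketch. It assumes HT is on with two LPs per core, numbered from the right of the binary; the function name is made up for the example.

```python
def core_usage(mask: int, cores: int) -> list[int]:
    """Return, per core (core 0 first), how many of its two LPs the mask uses."""
    return [bin((mask >> (2 * core)) & 0b11).count("1") for core in range(cores)]


if __name__ == "__main__":
    for am in (116, 85, 245):
        usage = core_usage(am, 4)
        used = sum(1 for n in usage if n)
        free = [core for core, n in enumerate(usage) if n == 0]
        print(am, usage, f"{used} cores, {sum(usage)} LPs, free cores: {free}")
```

On this reading 116 comes out as three cores on four LPs with core 0 free, 85 as four cores on four LPs, and 245 as four cores on six LPs.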


Hi Steve, interesting, I will have a try with those settings and see what happens.  Thanks for your feedback, looks like my next build will at least include a 6 core processor.

 

Noel, thanks for starting this thread, it's always insightful when reading about this technology.

 

Robert

