Ray Proudfoot

The importance of MAX_TEXTURE_REQUEST_DISTANCE

Recommended Posts

And this is why manually setting AffinityMasks is a fool's errand. Especially on your machine, you have 24 cores and a kernel scheduler that knows how to allocate them. Let it do its job.

As an example, my new 7800X3D is rocking at 70fps, and 2-4 cores are sleeping since they don't have enough to do. If you have more than 8 cores, you won't be starved for CPU time.

Cheers


Luke Kolin

I make simFDR, the most advanced flight data recorder for FSX, Prepar3D and X-Plane.


@Luke, I followed the advice given by @SteveW in this post and it matches the advice he has given over a long time. I still assign the three major elements of P3D as recommended by LM.

I have no need for 70fps. I’ve found that running a CPU flat out does indeed give impressive fps but at the expense of micro-stutters as the system runs at 100%.

I limit fps to the refresh rate of my monitor - 30Hz. That allows me to have higher settings and also more AI at busy airports. I did manage to max out performance at 30fps at Heathrow with 220 AI in the 80nm bubble. Reducing it to 160 keeps fps at a constant 30 at all major airports.

I use RTSS to monitor all cores and yes, most are working most of the time. I rarely see core0 hit 100%. I’m delighted with performance.


Ray (Cheshire, England).
System: P3D v5.3HF2, Intel i9-13900K, MSI 4090 GAMING X TRIO 24G, Crucial T700 4TB M.2 SSD, Asus ROG Maximus Z790 Hero, 32GB Corsair Vengeance DDR5 6000MHz RAM, Win 11 Pro 64-bit, BenQ PD3200U 32” UHD monitor, Fulcrum One yoke.
Cheadle Hulme Weather

17 hours ago, Ray Proudfoot said:

I have no need for 70fps. I’ve found that running a CPU flat out does indeed give impressive fps but at the expense of micro-stutters as the system runs at 100%.

I limit fps to the refresh rate of my monitor - 30Hz. That allows me to have higher settings and also more AI at busy airports. I did manage to max out performance at 30fps at Heathrow with 220 AI in the 80nm bubble. Reducing it to 160 keeps fps at a constant 30 at all major airports.

I use RTSS to monitor all cores and yes, most are working most of the time. I rarely see core0 hit 100%. I’m delighted with performance.

Sorry I wasn't clear. My point was not to suggest that you should run at 70fps. I was just doing that to test out the new rig (7800X3D + 3080) and see what it can do - on subsequent flights I'm now reducing the frames to reduce power utilization and heat.

Rather, I'm trying to demonstrate that a modern, high-end CPU has a surplus of computing power even for P3D at high settings, and that optimizing core usage (especially for add-ons) is more likely to lead to decreased performance when one makes a mistake and inadvertently starves an add-on of needed CPU time, as you did. You're far more likely to reduce performance, not enhance it.

On a side note, while I've allocated the three worker threads to different cores, I'm also skeptical of any effect. I have the Main Worker Thread assigned to what Ryzen Master calls my "best" core (C02) and, while it's heavily loaded, it's still averaging 1.2GHz below max boost - it just doesn't have enough work to do. Since I have a very good frame rate recorder built into my VA's ACARS software, I'll see if I can turn those settings off and see if it makes a difference.
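If anyone else wants to check the same thing without an ACARS recorder, a few lines of Python with psutil will sample per-core load while the sim runs. This is only a rough sketch, not anything from P3D or my own software, and the core index is an assumption you'd adjust to your own assignment:

import psutil  # assumption: pip install psutil

MAIN_CORE = 2     # assumption: the logical processor the main thread is assigned to
SAMPLES = 60      # one reading per second for a minute

peaks = [0.0] * psutil.cpu_count()
for _ in range(SAMPLES):
    loads = psutil.cpu_percent(interval=1.0, percpu=True)  # percent busy per logical CPU
    peaks = [max(p, l) for p, l in zip(peaks, loads)]
    print(f"main-thread core: {loads[MAIN_CORE]:5.1f}%")
print("peak load per logical CPU:", [round(p, 1) for p in peaks])

If the main-thread core's peak never gets near 100% while the others idle, there isn't much left for affinity tinkering to buy you.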

Cheers


Luke Kolin

19 hours ago, Luke said:

And this is why manually setting AffinityMasks is a fool's errand. Especially on your machine, you have 24 cores and a kernel scheduler that knows how to allocate them. Let it do its job.

As an example, my new 7800X3D is rocking at 70fps, and 2-4 cores are sleeping since they don't have enough to do. If you have more than 8 cores, you won't be starved for CPU time.

Cheers

I disagree.  With P3D on my 13900KS, I use the AM to force the main thread to run on my fastest core, which is independently clocked higher than the others, at 6 GHz.  I have HT turned off (and four e-cores disabled to reduce heat) because 20 physical cores is more than enough to handle the workload, and that way the cores aren't sharing cache memory with an unloaded or lightly-loaded virtual CPU.  On older CPUs like my 10900K that can't set HT independently per-core, I use AM to block the OS from trying to put additional threads on the virtual CPU paired with the core that the main thread runs on.  If I let the OS "do its job" without an AM, it loads threads onto the vCPU that shares the core running the main thread, and in the process reduces performance, more evident by the appearance of microstuttering than by a meaningful reduction in frame rate.

P3D can still max out the main thread on a modern high-end CPU, e.g. with a lot of AI traffic.  I'm not saying it averages 100%, but rather that utilization is maxed out often enough to produce stuttering.  Keeping some headroom available on the main thread's core so that workload peaks don't drive the core all the way up to the wall has long been a key to smooth, stutter-free performance in P3D.  If you were to graph demand alongside performance, you'd see demand peak up above the 100% line, and on the performance side a flat plateau where the CPU has maxed out...in analog amplification waveforms it's commonly known as "clipping", i.e. the output waveform has its peaks "clipped" off because the amplifier can't produce enough power to keep up.

Optimizing the performance on one core and assigning it to the main thread via an AM setting does make a difference for me.  And keeping demand under control to leave headroom on the main thread also makes a difference.  There still isn't a CPU made that can run P3D all-out with utter impunity...some configuration control is still needed for smooth performance.
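For anyone wondering how the mask numbers themselves are built: each bit in the mask corresponds to one logical processor, so carving out the HT sibling is just bit arithmetic. A quick sketch in Python, where the 2n/2n+1 sibling pairing and the choice of core are assumptions to adjust for your own topology:

CORES = 10                    # physical cores, e.g. a 10900K
LOGICAL = CORES * 2           # with HT enabled
ALL = (1 << LOGICAL) - 1      # every logical processor allowed

main_core = 0                 # assumption: the physical core hosting the main thread
sibling = main_core * 2 + 1   # its HT partner (LPs 2n and 2n+1 share physical core n)

mask = ALL & ~(1 << sibling)  # clear just that one bit
print(mask, format(mask, f"0{LOGICAL}b"))  # decimal value plus the bit pattern

Bit i set means logical processor i is allowed; the decimal form of that integer is what goes into the cfg.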


Bob Scott | President and CEO, AVSIM Inc
ATP Gulfstream II-III-IV-V

System1 (P3Dv5/v4): i9-13900KS @ 6.0GHz, water 2x360mm, ASUS Z790 Hero, 32GB GSkill 7800MHz CAS36, ASUS RTX4090
Samsung 55" JS8500 4K TV@30Hz,
3x 2TB WD SN850X 1x 4TB Crucial P3 M.2 NVME SSD, EVGA 1600T2 PSU, 1.2Gbps internet
Fiber link to Yamaha RX-V467 Home Theater Receiver, Polk/Klipsch 6" bookshelf speakers, Polk 12" subwoofer, 12.9" iPad Pro
PFC yoke/throttle quad/pedals with custom Hall sensor retrofit, Thermaltake View 71 case, Stream Deck XL button box

Sys2 (MSFS/XPlane): i9-10900K @ 5.1GHz, 32GB 3600/15, nVidia RTX4090FE, Alienware AW3821DW 38" 21:9 GSync, EVGA 1000P2
Thrustmaster TCA Boeing Yoke, TCA Airbus Sidestick, 2x TCA Airbus Throttle quads, PFC Cirrus Pedals, Coolermaster HAF932 case

Portable Sys3 (P3Dv4/FSX/DCS): i9-9900K @ 5.0 Ghz, Noctua NH-D15, 32GB 3200/16, EVGA RTX3090, Dell S2417DG 24" GSync
Corsair RM850x PSU, TM TCA Officer Pack, Saitek combat pedals, TM Warthog HOTAS, Coolermaster HAF XB case

1 hour ago, Luke said:

Rather, I'm trying to demonstrate that a modern, high-end CPU has a surplus of computing power even for P3D at high settings, and that optimizing core usage (especially for add-ons) is more likely to lead to decreased performance when one makes a mistake and inadvertently starves an add-on of needed CPU time, as you did. You're far more likely to reduce performance, not enhance it.

I agree a 13900K can handle multiple processes with ease with sensible P3D settings. But I can’t agree that optimising settings in accordance with recommendations from LM is detrimental to performance. I rarely see core0 hit 100%.

Allocating the correct number of VPs to executables like Active Sky, Pollypot GIT and LNM needs a bit of tweaking, but these aren’t high-demand programs. I’m getting there.

A night-time approach and landing into T2G Paris De Gaulle was a delight, with a solid 30fps all the way in despite low cloud and 90+ AI aircraft.


Ray (Cheshire, England).

3 minutes ago, Ray Proudfoot said:

I agree a 13900K can handle multiple processes with ease with sensible P3D settings. But I can’t agree that optimising settings in accordance with recommendations from LM is detrimental to performance.

I may have missed it. Where does LM recommend adjusting process affinity for processes other than P3D?

Cheers


Luke Kolin

8 minutes ago, Luke said:

I may have missed it. Where does LM recommend adjusting process affinity for processes other than P3D?

Cheers

How have you determined that? I made no such claim.


Ray (Cheshire, England).

15 minutes ago, Ray Proudfoot said:

How have you determined that? I made no such claim.

Then you have misunderstood what I am saying: there is no point setting custom affinity masks for external processes with a modern processor. I don't think that goes against any LM recommendation.

Cheers

 


Luke Kolin

43 minutes ago, Luke said:

Then you have misunderstood what I am saying: there is no point setting custom affinity masks for external processes with a modern processor. I don't think that goes against any LM recommendation.

Cheers

 

What would you recommend for them? Use all cores?


Ray (Cheshire, England).

5 minutes ago, Ray Proudfoot said:

What would you recommend for them? Use all cores?

Whatever core is available, yes. The people who write the Windows and Linux schedulers know what they are doing.

Cheers


Luke Kolin

1 hour ago, Luke said:

Whatever core is available, yes. The people who write the Windows and Linux schedulers know what they are doing.

The people that coded Windows have crafted a broad one-size-fits-all scheduling solution that isn't necessarily always optimum.  Sometimes, additional tailoring of the scheduling priorities can be useful.

I don't think Windows makes any special accommodation in its scheduling for real-time processes...there are use cases that warrant keeping other processes separated from them.

That said, the load on P3D's less-critical texture-loading and other ancillary threads rarely rises to the point where sharing with a small number of low-load external (to the sim) processes would affect performance.  Most texture processing is done ahead of time by lookahead code, anyway.  Restricting the external stuff to the slower e-cores gives P3D more headroom on the faster p-cores on the 13900K.  Though there may not be any appreciable gains, I don't see any downside in doing it that way.
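If you'd rather script it than click through a utility, psutil can apply the same per-process affinities. A sketch only, with the e-core indices and the process names as placeholder assumptions for one particular 13900K layout:

import psutil  # assumption: pip install psutil

E_CORES = list(range(16, 24))                    # assumption: LPs 16-23 are the e-cores here
HELPERS = {"ActiveSky.exe", "littlenavmap.exe"}  # placeholder names for the external add-ons

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] in HELPERS:
        try:
            proc.cpu_affinity(E_CORES)           # keep the helper off the p-cores
            print(f"pinned {proc.info['name']} to {E_CORES}")
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            print(f"could not pin {proc.info['name']}")

Whether that buys anything measurable is another question, but it keeps the p-cores clear for the sim.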

 


Bob Scott | President and CEO, AVSIM Inc


I'll disagree with Luke also, for a couple of reasons:

1.  LM updated their thread management with these (decoded in the sketch just after this list):

[JobScheduler]
AffinityMask=65535
P3DCoreAffinityMask=65535
MainThreadScheduler=6
RenderThreadScheduler=2
FrameWorkerThreadScheduler=4

2.  I've used Process Lasso and/or other tools to manage external process assignment, and the resulting benefit is fewer long frames (more consistent frame times).  There is no miraculous (noticeable) increase in FPS, but it will produce far fewer pauses/stutters.
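For reference, those mask values are plain bit patterns: 65535 is 0xFFFF, i.e. the first sixteen logical processors, and (as I read LM's scheduler entries) the MainThread/RenderThread/FrameWorker values are logical-processor indices. A short Python sketch to decode any mask value:

def decode_mask(mask: int) -> list[int]:
    # logical processors whose bits are set in an AffinityMask value
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

print(decode_mask(65535))              # [0, 1, ..., 15]: the first 16 logical processors
print(decode_mask(65535 & ~(1 << 6)))  # the same mask with LP 6 (the main thread's) carved out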

From my testing results, Windows thread scheduling is still pretty primitive.  Quoting this from another source:

Quote

“WrAlertByThreadID” is a system thread that is used to wake up threads that are waiting for an alertable state. An alertable state is a state in which a thread can be awakened by an asynchronous procedure call (APC) or an I/O completion routine. When a thread is waiting for an alertable state, it is waiting for an event that will cause it to wake up and perform some action.

When a thread enters an alertable state, it is placed on a wait queue. The “WrAlertByThreadID” thread periodically checks the wait queue to see if any threads are waiting for an alertable state. If it finds any threads on the wait queue, it sends them an APC to wake them up. Lastly, the TID is the Unique Thread Identifier that we briefly covered before.

There are so many more elements to cover; however, I’m going to leave it there today. Examining thread activity is especially important if you are trying to determine why a process that is hosting multiple services is running (such as Dllhost.exe, Svchost.exe, or Lsass.exe) or why a process has stopped responding. The best tools to review threads on a system are WinDbg, Performance Monitor, and the one I just demonstrated, Process Explorer (Sysinternals).

The two other main topics I wish to review today are thread scheduling and thread pools; we’re going to cover these briefly, which will not do them justice.

Thread scheduling in windows is the process of assigning processor time slices to threads. Threads are scheduled for execution based on their priority. All threads are assigned processor time slices by the operating system. Windows implements a priority driven, preemptive scheduling system. At least one of the highest-priority ready threads always run, with the caveat that certain high-priority threads ready to run might be limited by the processors on which they might be allowed or preferred to run — phenomenon called processor affinity.

After a thread is selected to run, it runs for an amount of time called a quantum. A quantum is the length of time a thread is allowed to run before another thread at the same priority level is given a turn to run. Key to understanding thread scheduling algorithms is understanding priority levels:

[Figure from the quoted source: Windows thread priority levels]

Thread priority levels are assigned from two different perspectives: those of the Windows API and those of the Windows kernel. The Windows API first organizes processes by the priority class to which they are assigned at creation.

A thread pool is a collection of worker threads created and managed by the system that efficiently execute asynchronous callbacks on behalf of the application. Additionally, there are waiter threads that wait on multiple wait handles (callback to us looking at WaitReason), a work queue, a default thread pool for each process, and a worker factory that manages the worker threads.

Worker factories refer to the internal mechanism used to implement user-mode thread pools. By default, each thread pool has a maximum of 500 worker threads. The thread pool attempts to create more worker threads when the number of worker threads in the ready/running state is less than the number of processors.

Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor the system can simultaneously execute as many threads as there are processors on the computer.

 

 

19 hours ago, Bob Scott said:

The people that coded Windows have crafted a broad one-size-fits-all scheduling solution that isn't necessarily always optimum.  Sometimes, additional tailoring of the scheduling priorities can be useful.

Before we go on, I want to point out one thing - I've been working with high-volume, low-latency software in both Windows and Linux since 2005. That's not an appeal to authority (well, maybe a little bit); instead it's a prelude to this - I've never worked with software outside of FSX and P3D (essentially the ESP engine) where you were able to set thread affinity, never mind have it so actively advocated as it is here. (And I've never, ever seen folks trying to actively set affinity for other software like Ray and many others do.)

Windows is run on millions of machines across the world. Despite Linux's popularity in data centers, there are also millions of servers running Windows in applications that require extremely high bandwidth or extremely low latency. No one seems to have a significant problem with the Windows scheduler. Except people here.

The only time the Windows or Linux schedulers seem to have issues is when changes to the core assumptions around CPUs come out, like SMT or big/little cores. Even then, it gets patched and people move on. Windows has known and understood SMT for almost a quarter century now.

19 hours ago, Bob Scott said:

I don't think Windows makes any special accommodation in its scheduling for real-time processes...there are use cases that warrant keeping other processes separated from them.

Schedulers are by and large the same - they balance a number of criteria when deciding when a thread can execute, and where. How long has it been starved for CPU? Are there higher-priority threads waiting? What CPU was it last run on? Is that CPU busy? Is that CPU powered up, or sleeping? Sometimes it might be faster overall to wait a few more cycles for a CPU that's currently busy if it means the scheduler doesn't need to power up another one, or the code/data is likely to be in cache.

That's my long-winded way of saying that yes, Windows does take priority into account. It won't schedule other tasks onto a fully loaded core when others are sitting idle. The scheduler knows an amazing amount about each thread and where it can and should be run and dynamically decides millions of times a second - far smarter than a bunch of old farts like us who use an exceptionally crude tool like affinity masks set statically.

11 hours ago, CO2Neutral said:

I'll disagree with Luke also, for a couple of reasons:

1.  LM updated their thread management with these:

I am unsure how that is a disagreement with me. How many other throughput- and latency-sensitive applications (of which there are thousands if not millions) have similar options? Even amongst flight simulators, how many have this? Does X-Plane? Does MSFS? If the kernel scheduler were as bad as many here claim, you would think that setting thread affinity would be almost universal.

I'm also curious how many of you have done blind testing. There's a reason it exists; otherwise we're at the same level as audiophiles claiming that precious-metal connectors on their HDMI cables increase the richness of sound. I'm at least doing quantitative testing.
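Quantitative doesn't have to mean anything fancy: log frame times for the same scenario under each configuration and compare the distributions rather than just the averages. A minimal sketch, where the one-value-per-line log format is an assumption to adapt to whatever your recorder writes:

import statistics

def summarize(path: str, long_frame_ms: float = 50.0) -> dict:
    times = sorted(float(line) for line in open(path) if line.strip())  # frame times in ms
    return {
        "avg_ms": round(statistics.fmean(times), 2),
        "p99_ms": round(times[int(len(times) * 0.99) - 1], 2),  # rough 99th percentile
        "long_frames": sum(t > long_frame_ms for t in times),
    }

for label, path in [("affinity tweaks", "with_am.txt"), ("stock scheduler", "no_am.txt")]:
    print(label, summarize(path))

Run it blind, i.e. have someone else toggle the settings between sessions, and the audiophile problem takes care of itself.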

Cheers

 

 


Luke Kolin

On 11/22/2023 at 2:14 PM, Luke said:

And this is why manually setting AffinityMasks is a fool's errand. Especially on your machine, you have 24 cores and a kernel scheduler that knows how to allocate them. Let it do its job.

As an example, my new 7800X3D is rocking at 70fps, and 2-4 cores are sleeping since they don't have enough to do. If you have more than 8 cores, you won't be starved for CPU time.

Cheers

If this is true, there are a lot of fools here 😆

That said, if I don’t set an affinity mask manually on my 12900K, HT on, P3D v5.4 setup, I will get microstutters every time. I have tried running the stock P3D cfg, which works pretty well, but a manually set mask is much better. SteveW has done an enormous amount of research on this and has helped an equal number of people achieve as close to a fluid experience as the current hardware will allow.

Have a great post-Thanksgiving day!

On 11/23/2023 at 5:57 PM, Bob Scott said:

I disagree.  With P3D on my 13900KS, I use the AM to force the main thread to run on my fastest core, which is independently clocked higher than the others, at 6 GHz.  I have HT turned off (and four e-cores disabled to reduce heat) because 20 physical cores is more than enough to handle the workload, and that way the cores aren't sharing cache memory with an unloaded or lightly-loaded virtual CPU.  On older CPUs like my 10900K that can't set HT independently per-core, I use AM to block the OS from trying to put additional threads on the virtual CPU paired with the core that the main thread runs on.  If I let the OS "do its job" without an AM, it loads threads onto the vCPU that shares the core running the main thread, and in the process reduces performance, more evident by the appearance of microstuttering than by a meaningful reduction in frame rate.

P3D can still max out the main thread on a modern high-end CPU, e.g. with a lot of AI traffic.  I'm not saying it averages 100%, but rather that utilization is maxed out often enough to produce stuttering.  Keeping some headroom available on the main thread's core so that workload peaks don't drive the core all the way up to the wall has long been a key to smooth, stutter-free performance in P3D.  If you were to graph demand alongside performance, you'd see demand peak up above the 100% line, and on the performance side a flat plateau where the CPU has maxed out...in analog amplification waveforms it's commonly known as "clipping", i.e. the output waveform has its peaks "clipped" off because the amplifier can't produce enough power to keep up.

Optimizing the performance on one core and assigning it to the main thread via an AM setting does make a difference for me.  And keeping demand under control to leave headroom on the main thread also makes a difference.  There still isn't a CPU made that can run P3D all-out with utter impunity...some configuration control is still needed for smooth performance.

Hi Bob, I have the same CPU as you. Would you be able to share your AM settings, please? I'll need to disable my HT, but it would be extremely handy to know what you set. My CPU runs extremely hot on P3D 5.4. Thanks!

