Noel

Bizarre: dying CPU or Power Supply or other?


I have a 4+ y/o SB-E system with 4 drives.  3 of the 4 are completely separate, isolated bootable Win 7 drives.  I get to them by pressing F8 during boot to bring up the ASUS BIOS boot drive popup.  Prepar3D is on an SSD, FWIW, games and web browsing are on their own drive, and the third is a complete clone of the P3D SSD.  Here's what has been happening as of this morning:

 

1.  Gets thru POST and allows me to hit the F8 key to bring up the boot drive popup.

2.  No matter which of the 3 boot drives I select, I always get a display of 'Windows has encountered an error during shutdown.....Start Windows normally?  Safe Mode, etc.'  EACH of these 3 completely isolated drives does the exact same thing.

3.  When I choose Start Windows normally, the initial swirling colored balls start to display, but instead of completing and moving on to the desktop, the mouse freezes in the center of a black screen and never recovers.  I have to power off by holding the power switch or by toggling the PSU off and back on.

 

FWIW, yesterday I had a couple of oddities happen out of the blue:  the next time I booted up after shutting down for several hours I got a chkdsk run before Windows loaded, which should never have happened.  Then I couldn't run Witcher 3, which is installed on the 2nd drive reserved for non flight sim use.  Then an icon on the task bar lost its connection and the generic non-icon displayed.   The machine has never manifested any problems before, but I have run it at 1.32v to 1.355v for its entire life, so maybe the CPU is so degraded it can't boot into Windows.  In any case it's creepy that all 3 separate Win 7 drives fail at the exact same place in the boot sequence on the way to the Windows desktop.

 

Also of significance:  I've done all of this at normal automated clocking as well as at my typical 4.4GHz overclock--same behavior in either instance.

 

Thoughts?  Have no idea where to turn since it's clearly not OS related.   I have to assume it's either a degraded CPU or the power supply, and even that seems odd because each boot drive dies in the exact same place on the way to the Win desktop, and that is AS Windows 7 is initializing.  It never makes it to the Windows desktop background--just a frozen mouse in the center of a black screen.  Yes, I disconnected the mouse and got the same result.  Very bizarre!



Not sure exactly what's going on but I would not rule out your video card as the culprit - here's why: my last Titan X went south on me - couldn't figure out why my computer was not starting sometimes - changed my video card to an old one lying around - bam, all the problems were gone -  if I were you and you have an old video card around, try it - that's what I would put my money on - needless to say that sucker got RMA'ed

 

Have you ever overclocked your GPU - if so that's a no-no - I never do and they still go south

 

Had to RMA my first Titan and my Titan X - no fun - that's why you buy EVGA: ROCK SOLID return policy


Hello Noel.

 

Before going too much further, I would clear your CMOS (there are CMOS problems which can occur after POST).  Google for your motherboard to find the suggested method for clearing your CMOS, though most methods are much the same. There are also deeper resets, see the article HERE for more information.

 

If clearing CMOS doesn't help, you can also try swapping memory cards.  If you have more than two memory cards, I would swap only one at a time and move them back before swapping the next one.

 

If you do have a hardware problem, it may be the Motherboard instead of the CPU.  You can probably find some freeware diagnostics via Google or contact the manufacturer of your motherboard for assistance.

 

Let's not forget the LED codes on the motherboard as well as Beep codes.

 

Another thing you can try is using some freeware software to monitor your CPU during the boot phase.

 

I realize you're in a dark place, but hopefully something above will help you.

 

Best of luck!


Hello Noel,

I think that I would open up the box - and disconnect all the Drives - Then Re Connect them one at a time to perform a boot on each drive to see if it Boots Normally - Then try Booting 2 drives -  Then three - etc. - If You don't have a bad Drive - You may be losing a Power Supply - Adding the Drives one at a time - will show You the Limit - I seem to be going through the same thing - when I disconnected My Win xp Drive - My Win 7 Drive Boots and Runs Normally - When I disconnect My Win 7 Drive - My Xp Drive Boots and Runs Normally - Some Days You Just have More Fun than Others - Good Luck - Johnman B)
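
For anyone who wants the add-one-drive-at-a-time idea spelled out, here is a minimal Python sketch of the bookkeeping. It is only an illustration: `boots_ok` is a hypothetical stand-in for physically connecting the listed drives and watching whether the machine reaches the desktop, the drive names are made up, and it assumes a faulty drive blocks booting whenever it is connected.

```python
def find_suspect_drives(drives, boots_ok):
    """Add drives one at a time; return those whose presence breaks booting.

    boots_ok(connected) is a hypothetical callback: it stands for a manual
    boot attempt with exactly the drives in `connected` plugged in.
    """
    connected = []  # drives already shown to coexist with a good boot
    suspects = []   # drives that made the boot fail when they were added
    for drive in drives:
        trial = connected + [drive]
        if boots_ok(trial):
            connected = trial       # leave it plugged in and move on
        else:
            suspects.append(drive)  # unplug it again and note it
    return suspects


if __name__ == "__main__":
    # Simulated machine (assumption): every boot fails while "ssd_a" is attached.
    faulty = {"ssd_a"}
    simulate = lambda connected: not (faulty & set(connected))
    print(find_suspect_drives(["hdd_1", "ssd_a", "hdd_2", "ssd_b"], simulate))
```

In the simulated run the only drive reported is the one marked faulty. On real hardware each call to `boots_ok` is a power-down, a cable change, and a boot attempt, so the value of the sketch is mainly in keeping the order of trials straight.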


Thanks all--sounds like it could be anything.  I just know it happened out of the blue, sort of.   About a week ago I increased vCore to 1.355 in order to avoid a freeze/crash at 4.553GHz, and it was within a week or so of that change that this happened, which seems to point to CPU.  I only did a few parts of flights at 4.553, the rest at my default 4.423 or so, which it has always run stably at.  Temps were always fabulous because I use pre-cooled air intake to a Noctua NH-D14, which even w/ the 4.553 OC rarely made it to 60C core temps.

 

I will start with the low hanging fruit--clear the BIOS.  Next, try swapping memory modules.  If no go, try a different video card--this is common to all drives.   Disconnecting all but one drive seems like a bit of a long shot since they are such low power users, but it's a thought.

 

Motherboard is a definite possibility as well.  If so, I may just jump up to Haswell-E.  If the low-hanging tricks don't tease out a quick cure I'm loath to try to troubleshoot a dying motherboard, or CPU for that matter.  I don't think I have the tools for this.  I was really hoping to hold on until something meaningful arrives, as all of the i7 increases have been pretty marginal it seems to me.  I was also planning on waiting til 64-bit P3D arrives, as I'm still on 2.3, which is fine for me.

 

Also, my system is actually almost 3y/o.  I found my emailed invoice for parts.


 

 



I had a HDD failure on a multiple drive system, one HDD of 3 or 4, I forget how many I had.  The drive that failed was not even the OS drive and it caused all kinds of trouble, including the chkdsk thing on boot.


Here's what I did so far:

  1. Deep CMOS reset:  no joy
  2. Bootable memtest86:  all 32GB tested perfect.  I concluded swapping module locations made no sense when an intensive memory test checks out.  Capacitors all look sterling.
  3. Swapped in another video card:  same issue, all drives stop booting as the desktop is preparing to display.  This video card is below the power requirement of my Titan GTX (maybe), so I concluded the power supply is probably not at fault, since the same boot failure occurs on any drive with the other video card.
  4. Disconnected all but my P3D boot drive (contains a leaned-out Win 7, all P3D stuff only, and a few testing utilities).
  5. Booted Fedora Linux via UEFI to load the Intel Processor Diagnostic Tool w/ the BIOS set at Optimized Defaults (no overclock, all auto).  CPU fails, no matter which tests are selected.  I ran the utility on another desktop running Win 7 and all CPU functions tested out fine.  I have to conclude the CPU is about to completely give up the ghost.

I guess I'll look for a remaining 3930K, or a remaining new 4930K (my ASUS P9X79 WS supports IB-E w/ a BIOS flash).   This is coming in at $500 for a 4930K off of eBay advertised as factory sealed.  It goes up from there to $600 from B&H Photo, and up from there.   There is a company in Fremont Calif called 'starmicro' that is advertising low prices on a few of these--3930K for $295 for example.  No mention of new/used, but I have to assume they are used and have emailed their sales contact to find out.   This turns the issue into plug and play, without the need to reinstall two full bootable disks: one is the usual prolonged restore of Prepar3D and all of its add-ons and utilities, the other has several games I use regularly.  Plus I can keep using my 4 DDR3 2400 modules, which always played well on that mobo, and my same Noctua NH-D14 cooler, which works well.

 

Wish me luck!  Thanks for the troubleshooting tips.  Intel warranty involved maybe?   June 2013 was the purchase month of this 3930K.



Thanks for the report - sounds like you are doing a good process of elimination - good luck with it - what a pain


Hi Noel,

 

You can verify if your CPU is still under warranty.  Intel will provide you with a new one if it falls within the 3 year coverage period:

 

http://www.intel.com/content/www/us/en/support/warranty-center.html

 

http://www.intel.com/content/www/us/en/support/processors/000005609.html

 

You will have to get the FPO and ATPO numbers directly off the CPU.

 

Good luck!

 

Robert


Did You try to Boot with JUST - One - of Your "Other" Boot Drives - a Non P3d Drive ??? - Johnman - B)


I tried just my P3D drive.  My 3 bootable drives are 100% isolated from each other as far as having any software connection to each other.   So if it's a case of reducing power demand, I think swapping in a much older video card, plus disconnecting 3 other drives and the DVD-R, would have teased that out.  But truly, the major reason for stopping the process of elimination approach was the unequivocal CPU FAIL when the Intel Processor Diagnostic Tool is run straight from the UEFI boot.  It never even gets to a drive.  It took some sleuthing, but I found the method to use for a decent CPU test in a Linux environment when you can't boot to Windows.

Thanks Robert, I had done that as of a few hours ago when I remembered it was indeed a boxed version, and the 3y warranty does not end til Aug 16, so hopefully we will get up and running w/ a replacement or however they handle this.   I hope so--I've bought a bunch of Intel processors in my day and never warrantied a one  :Hmmmph:


I'm not sure what capability they have to test them (maybe none), but I'm pretty sure that any overclock voids the warranty (though I do hope you're able to get a new one under warranty).


Let us know if that was the Problem - A lot of Good Trouble shooting Options Discussed - Johnman B)


Very rare for CPUs to fail these days.

 

Don't be surprised if it turns out not to be the CPU.

 

I'd leave nothing to chance. Grab a PSU tester from Amazon and see if there are any issues there. What about the SMART data for your hard drives? You can read it with CrystalDiskInfo.
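
CrystalDiskInfo is a GUI tool. If a command-line check is handier, smartmontools' `smartctl -H` prints an overall health verdict, and a small Python wrapper can pull it out. This is a rough sketch, not the method anyone in this thread used: it assumes smartmontools is installed, that the drive is ATA/SATA (other drive types report differently), and the device path passed in is a placeholder you would supply. A PASSED verdict is only the drive's own self-assessment, so it does not prove the drive is healthy.

```python
import subprocess

# Line that smartctl -H prints for ATA/SATA drives, followed by the verdict.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def parse_smart_health(report):
    """Return the verdict text from smartctl -H output, or None if absent."""
    for line in report.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def read_smart_health(device):
    """Run `smartctl -H <device>` and parse the verdict.

    Usually needs administrator/root rights. `device` is a placeholder
    such as "/dev/sda"; the right name depends on your system.
    """
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)
```

Splitting the parsing from the subprocess call means the parser can be checked against saved output without touching a real drive.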


Hi Martin--all 3 drives won't boot, and they fail at exactly the same place in loading Win 7, so SMART evaluation wouldn't mean anything in this scenario.   That being said, I did look at SMART using AIDA64 on both my P3D drive and one other on the day this slow failure began and generated the boot-time chkdsk run, and they both checked out.  Not too much later I could not boot to Windows on any of the 3 drives.

 

PSU is a possibility I agree.   I looked at some PSU testers on Amazon and they are quite cheap and get great reviews, however one person mentioned these mean very little since they are not measuring dynamic PSU behavior--i.e., what happens under load.  I have a Corsair HX850 that is warrantied for 7y and is now 2.7y in use.   I do overclock my CPU, though nowhere near what some folks subject theirs to in terms of voltage and heat.  Still, in the very low demand environment of POST and pre-Windows initialization (compared to running P3D or other, which puts massive stress on components), with the IPDT indicating CPU failure very rapidly, Occam's Razor seems to be making the case for CPU troubles.  My PSU is amply rated for my system and again, it doesn't seem to be the target of stress compared to my CPU.

 

I just have no clue whether a static test of my PSU will mean much.  As someone who was answering questions re a particular PSU tester on Amazon said, these testers won't tell you how your PSU behaves under load, only what it is outputting per rail, and the only way to truly test a PSU that has not completely failed (i.e., can't light up the motherboard and everything else) is to swap in a different proven PSU.  Everything lights up w/ mine, right on thru to the video card output from the motherboard to display, BIOS and so forth, so in my mind a static tester might show very little in this scenario.   Indeed, if I picked up one of these testers and it showed one rail was slightly out of spec, does that mean I replace a 2.7y/o pretty high end PSU?  What I really need is another PSU to swap in.  Perhaps I could buy a cheap one just to rule out my PSU.  That might be a more useful test than the static voltage tester.  I see there are quite a few around the $50 price point.  I'll check at work w/ our IT dept to see if they happen to have an ATX 2.3 they could loan me for a quick test.
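
For what it's worth, reading one of those static testers comes down to comparing each rail against the ATX tolerance band (±5% on the positive rails, ±10% on -12V). A small Python sketch of that comparison follows. The sample readings are invented for illustration, and as said above, passing this check says nothing about how the PSU behaves under load.

```python
# ATX DC output tolerances: nominal voltage and allowed fraction either side.
ATX_TOLERANCE = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rails(readings):
    """Map each measured rail to True (within tolerance) or False."""
    results = {}
    for rail, measured in readings.items():
        nominal, tolerance = ATX_TOLERANCE[rail]
        allowed = abs(nominal) * tolerance
        results[rail] = abs(measured - nominal) <= allowed
    return results


if __name__ == "__main__":
    # Made-up static readings, as a tester might display them.
    sample = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.3, "-12V": -11.4}
    for rail, ok in check_rails(sample).items():
        print(rail, "in spec" if ok else "OUT OF SPEC")
```

A rail that sits just inside the band at idle can still sag out of it under load, which is why swapping in a known-good PSU remains the more telling test.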

 

I'm open to other thoughts but Occam's still points pretty strongly to CPU at this point.


Fair enough Noel, you make valid points. It's just that CPU failure [if not abused] is so rare that I find myself somewhat dubious.

 

A power surge from a dying PSU, or mains power spike can damage a CPU of course.

 

Have you ruled out motherboard or not? CPU failing and motherboard issues are often similar. Sorry if I missed something re that.

 

Boot failure can be a RAM issue of course, or a RAM slot issue. I even had an issue where bad RAM prevented the installation of Windows. Great in the BIOS, no issues, and then it failed at the same point every time when installing Windows.

 

Try one stick at a time, and different slots, just in case, you never know.

 

Any blue screens or anything in the event viewer?

 

Good luck, it's a bloody nuisance. Hope you get it sorted out.


As mentioned above all memory tested perfect with the very detailed UEFI memtest86 routine.

 

There are a million potential problems, and any testing beyond swapping in whole components, like a new motherboard, new PSU etc, becomes tricky to interpret it would seem.  So really it's impossible to conclude much of anything without swapping the CPU into another board--this is the real acid test.  I just don't have one to try this with unfortunately.  This is how people end up building entire new machines, especially when you factor in trial and error troubleshooting w/ multiple components.   I'm waiting to see if Intel will come thru.   I had a QX9650 fail on me.  It was overclocked but kept cool.  I bought the non-X version and up she started, and it continues to function 7y later in a 2nd desktop that has never been overclocked.  Hard to say what constitutes CPU 'abuse'.  Intel basically says that if you run within the absolute MAX/MIN voltage ratings but outside the CPU's functional operating limits, you can expect decreased service life, which fits w/ my QX9650 story.


Given that the CPU has to be operating for Memtest86+ and BIOS screens, I am not inclined to think this a CPU issue.  Usually CPU failures will result in blue screen issues, or complete failure to start anyway.  My guess, from what I've read here, is something like a problem with the ICH controller on the mobo chipset.  Excessive overclocking voltages can cause this kind of problem just the same as the CPU.  Have you tried running with stock voltages and speeds on the whole mobo chipset as well as on the CPU?

 

If the mobo has one or more secondary SATA controller chips (many do), you could try putting just the boot drive on a port served by that secondary controller.  That might narrow things down to the ICH bridge.

 

Regardless, on a 4 year old machine, I would look at CPU and mobo as an integrated combo...if I need to replace one of them, I replace both with something newer.

 

Cheers



Thanks for that.   The Marvell PCIe 9128 Serial Controller appears to be a secondary SATA controller on my ASUS P9X79 WS, so I can give that a try.  There is however a sticker on the socket for the Marvell that suggests this is used for the SSD cache function.   Can it also be used to boot thru, I wonder?

 

This utility might be helpful:  Hot CPU Tester Pro is a system health and stability tester. It tests CPU, chipset and virtually all parts of motherboard for errors/bugs, defective parts and components.


What's still not explained is why the Intel Processor Diagnostic Tool indicates a failed CPU fairly quickly in its routine.  I started w/ what I thought would be a low impact routine, the innocuous-appearing Temperature Test, and it failed there as well as on all others tested.  I have no idea what parts of the motherboard are needed for this to run, or whether a failed motherboard component would generate a false failed-CPU report.  Here's the interface for it running in a UEFI Fedora desktop.

 

[Screenshot, 2016-02-22: image not preserved]


Just trying to cover all possibilities, and sorry if you've already covered this, but... as it's an Asus board with diagnostic LEDs, are any of them continuously lit?  There should be one each next to the CPU, RAM, hard drive and graphics card.

 

You may have the Q-Code display too, so any code displayed?

 

57, 58, 59, 5A, are the CPU fault codes I believe.

 

I'm sure you've checked this, but just in case.

 

Even though your RAM checks out with Memtest, I would still try one stick at a time and in different slots, it's so easy to do, you may as well.


I had a similar situation with a customer a few years ago, turned out to be the heatsink had come loose from the CPU.



Thank you Martin--I believe they all turned off, and no, I did not know about the Q-Code display, which I think was partly obscured behind the graphics card--I believe I saw a number there, so again, thank you, I will check it out and see.   The Q-codes must be mentioned in the manual so I will check that out as well.  I tried to learn what the limitations of the IPDT are, but couldn't find anything worthwhile on what conditions other than a failed CPU make the failed CPU report display.  Once the report came back failed I removed the CPU, and I will reinstall it tonight and see what is there.

 

I'm actually hoping for a failed CPU as that would be the quickest way to restore functionality, but at this point it's too early to tell.  If it's the main board I think I'm hosed unless I can find a used one.   Besides the failed test report, the other clue that's hard to ignore is that it was only a few days prior to the failed Windows initialization that I had moved vCore up from 1.32 to 1.355 in order to get to a stable 4.553GHz, so this again hints a little more towards a failing CPU it seems to me.  Memory was never overvolted and runs w/in spec for frequency, so that really seems unlikely, especially with a 30 min routine that came back w/ zero errors.  The box has always run cool.

 

Thanks!

 

Addendum:  wow, there's a whole diagnostics feature in those codes!  Many thanks again I hope it points to the culprit  :smile:

I had a similar situation with a customer a few years ago, turned out to be the heatsink had come loose from the CPU.

No problems there but thanks for the thought. 


Yes the Q Codes are a great feature. I went for the Z170-A for my Skylake build, which doesn't have the Q Code feature. It was a feature I would have liked to be honest. But couldn't justify a more feature packed board as I wouldn't make use of those features.


If there are any lessons learned so far it's that one can't draw too many conclusions (which I am always prone to do) from limited information.

 

1st Place:  goes to Johnman & Gary--the Samsung 840 SSD appears to be the culprit, as it is the one drive of the three bootable drives that, when connected, for some reason prevents any drive from booting to Win 7, at exactly the same place for any drive.  Kudos Johnman & Gary!

 

2nd Place:  goes to Martin-W for his persistence and education, which had me removing my CPU then reinstalling it in order to check out the Q-codes.  Bonus points for noting up front that it was likely not my CPU.  This was hard to have faith in because A, I had just overvolted my CPU to 1.355v, a new high for me, but also considerably lower than what others claim to get away with, and B, it kept failing the IPDT evaluation of the CPU.

 

3rd Place:  to Bob & others who surmised it wasn't the CPU and offered various tests to try to narrow down troubles.

 

Honorable Mention:  goes to Mike Collins--removing and reinstalling the CPU, for reasons I still can't guess at, allowed the IPDT to run in Fedora UEFI, and it shockingly indicated a PASS, which never happened before R&Ring the CPU.  Those familiar w/ the Noctua would know it essentially can't come loose from the CPU, and moreover, I've always had good thermal management, so who knows what happened there.

 

Hopefully I can now reconnect the bum SSD, boot to an HDD, and run something beyond SMART on it, because SMART is one of the first things I looked at early on when I got the weird boot time chkdsk run not requested by me.  I have to think maybe it's a hardware fault in the SSD now.  I do have another desktop in the house that has a 3Gb/s SATA connector, not 6Gb/s, but maybe I can run some drive tests on it.

 

The final conclusion ( :fool:  ) will be posted when we get there!

