Monday, June 24, 2013

My 4.5GHz overclock on 2600K has become unstable!

It's been a while since I tested the stability of my CPU's 4.5GHz overclock. I ran the blend test of Prime95 for more than 6hrs back then and I felt that was sufficient. However, the PC has been locking up randomly over the past couple of months, and I didn't want to accept that my overclock wasn't stable enough. For some reason, it was so hard to find the settings that made my PC stable, and I didn’t want to go through all that again. It could be because I wanted to both overclock and save power at the same time. That's not something that's easy to achieve, obviously. Or it could be because the motherboard is not the best when it comes to overclocking. Or it could even be that this CPU is a bad clocker. Or I could have been doing something wrong in the first place.

Anyways, there could be many reasons for the recent instability. First, the last time I checked for stability, I didn't use a version of Prime95 that supported AVX instructions. Second, I have done a couple of UEFI updates since then, and maybe I need to change “some values” to make the CPU stable under the firmware that I'm using currently. Or it could well be a problem with the power supply or motherboard or memory.
So I decided to stress test again, this time with a build of Prime95 that supports AVX and a target duration of 12hrs. I started at around 6PM yesterday. When I came back to the PC after about 30 minutes, it had rebooted on its own. No BSOD. Yikes! Somehow the “Automatically restart” checkbox under system failure was still checked, so Windows rebooted before I could see any blue screen. Also, I wasn't using a page file, so I couldn't check what caused the reboot from a mini dump file. Mini dump files don't get created when the page file is disabled.
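If you would rather check these two settings from a script than click through the System Properties dialog, here is a minimal sketch. It assumes the settings live in the registry under `SYSTEM\CurrentControlSet\Control\CrashControl` as the `AutoReboot` and `CrashDumpEnabled` values, which is my understanding of where Windows keeps them. The helper names are made up for this post.

```python
import sys

# Assumed location of the crash settings in the Windows registry.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Assumed meaning of the CrashDumpEnabled value.
DUMP_TYPES = {
    0: "none",
    1: "complete",
    2: "kernel",
    3: "small (mini dump)",
    7: "automatic",
}


def describe_crash_settings(auto_reboot, dump_type):
    """Turn the two raw registry values into something readable."""
    return {
        "restarts_automatically": auto_reboot == 1,
        "dump_type": DUMP_TYPES.get(dump_type, "unknown"),
        "writes_dump": dump_type != 0,
    }


def read_crash_settings():
    """Read the raw values from the registry (Windows only)."""
    import winreg  # imported here so the file still loads on other systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
        dump_type, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return auto_reboot, dump_type


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_crash_settings(*read_crash_settings()))
```

Note that this only reports what is configured. Even with mini dumps selected, nothing gets written if there is no page file.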
Thinking that I would catch the thief this time round, I restarted the stress test. Again under my very nose, the PC suddenly rebooted out of nowhere. No BSOD this time either. Weird, I thought. Then it struck me, it could be something other than the CPU itself. Memory perhaps? You know, I had been running the RAM, which is rated for 1.5V, at 1.35V all because Memtest ran for 6hrs without issues. Maybe it lied; happens all the time. So I restored the RAM voltage back to 1.5V and restarted the stress test. It was late so I went to sleep letting it do its thing overnight.
When I woke up and checked the PC, the display had turned blank. The PC was still powered up but moving the mouse around didn't bring the display back to life. I knew something must have gone wrong. Nothing too alarming though. It's so hard to break PCs these days. They are not as fragile as we think they are. Machines are machines!
So I hit the power button on the case and the PC turned off, and after a second it restarted. I guess the motherboard had gotten confused after last night's adventures. Anyways, no big deal. The PC is working. When I logged into Windows, I was greeted with an error message, which I sadly forgot to take a screenshot of.
And there was a mini dump file as well. Phew! I downloaded WhoCrashed from the Internet, and it automatically analyzed the mini dump file and showed me the details. It was a STOP error with code 0x124 (WHEA_UNCORRECTABLE_ERROR), which on an overclocked CPU usually means the Vcore was too low and the CPU crashed. That's good and bad. Good, because I know the reason why it crashed. Bad, because I will have to up the Vcore to make the overclock stable. More heat, more power - dammit.
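For reference, here is a small lookup sketch for a handful of stop codes. The symbolic names are the standard Windows bug check names; the selection is just my own pick of codes, not an exhaustive list, and the function name is made up for this post. It only names the error. What the error means for your overclock is still up to you to figure out.

```python
# A few Windows bug check (STOP) codes and their symbolic names.
# The selection is illustrative, not complete.
BUGCHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x0000003B: "SYSTEM_SERVICE_EXCEPTION",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x00000101: "CLOCK_WATCHDOG_TIMEOUT",
    0x00000124: "WHEA_UNCORRECTABLE_ERROR",
}


def bugcheck_name(code):
    """Return the symbolic name for a stop code given as an int or hex string."""
    if isinstance(code, str):
        code = int(code, 16)  # accepts "0x124" as well as "124"
    return BUGCHECK_NAMES.get(code, "unknown (0x%X)" % code)


if __name__ == "__main__":
    print(bugcheck_name("0x124"))
```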
I was using Offset mode to set the Vcore all this time because people say that it lowers power consumption. I'm not so sure about it, because when you have the C-states enabled, the power consumption is very low anyway. I was using an offset of +0.065V, and at full load the Vcore fluctuated between 1.320V and 1.328V, with 1.320V being more frequent. I raised the offset to +0.070V and it still hovered between 1.320V and 1.328V, but now mostly stayed at 1.328V. So I let the CPU do its thing and came to work. I monitored the status through Team Viewer. Sure, I couldn’t do anything if it crashed, but hey, I would at least know that it crashed. Valuable information! And it did crash in a couple of hours. Sadness! At least my wife was at home and I asked her to literally pull the power plug. I wasn’t mad or anything; I just didn’t want any surprise reboots that my wife would have to deal with.
So I came home with the intention of fixing this one way or another. First of all, I needed to find the correct Vcore that makes the CPU stable. Offset mode perhaps was not the right way to do it. So I went back to the old school way of doing it: fixed voltage. But then there was another issue. I know the CPU needs at least 1.328V to be completely stable, but I cannot simply set that voltage in UEFI. There is a massive VDroop at load with this motherboard. To get 1.328V with full VDroop, I think I would have to set something like 1.5V. But that is an outrageous amount of voltage. That is what Load Line Calibration is for, and it is built into almost every motherboard that can overclock. There are 5 levels of LLC on this board, level 1 to level 5, with level 1 being the strongest (meaning the Vcore set in the UEFI and the actual voltage are almost identical). It may seem that level 1 is the best, but that is not actually the case. At LLC level 1, there are noticeably high voltage spikes. Fluctuations are bad for overclocking. After a bit of trial and error, I felt that LLC level 3 gives the best of both worlds. The actual load voltage is close to what is set in the UEFI, while the fluctuations at full load are not large. So, when I set 1.4V in UEFI, at full load the CPU Vcore hovered between 1.328V and 1.336V. Yes, it is still a huge VDroop. But that is the best this motherboard can do. Midrange boards are awesome, NOT!
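To put a number on that VDroop, here is the arithmetic as a tiny sketch. It uses only the figures from this post (1.4V set in UEFI, 1.328V to 1.336V measured at full load with LLC level 3); the function name is made up for illustration.

```python
def vdroop(set_vcore, load_vcore):
    """Voltage lost between the UEFI setting and the reading under load."""
    return round(set_vcore - load_vcore, 3)


SET_VCORE = 1.400               # what I set in UEFI
LOAD_VCORE = (1.328, 1.336)     # what I saw at full load with LLC level 3

if __name__ == "__main__":
    for load in LOAD_VCORE:
        droop = vdroop(SET_VCORE, load)
        percent = 100 * droop / SET_VCORE
        print("load %.3fV -> droop %.3fV (%.1f%% of the set value)"
              % (load, droop, percent))
```

So the board drops somewhere between 0.064V and 0.072V under load, roughly 5% of what I asked for.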
As you might have guessed, I’m stress testing while typing this post. It’s been going without incident for 3hrs and 45min. I hope to let it run till the morning. Unfortunately, it won’t be 12hrs, but about 11hrs. I want to play some Crysis 3 (see where I stand by clicking here) in the morning before I go to work. This is the only time I get to play games without upsetting my wife.
With a room temperature of 28.4C, the Silver Arrow cooler is working hard to keep the temps down. The hottest core hit 79C a few times during this run, but it usually stays below 75C. Not bad, since the fans are running at only 1100RPM instead of the ear-deafening 2500RPM they are rated for.


I’m not sure if I would go back to Offset mode. I know it runs the CPU at a lower voltage when idle, but I’m not sure if it really saves any power, because with C-states enabled, the CPU is almost switched off when it is not doing anything. Besides, there is this problem with Offset mode: you have to disable C-States or it can give BSODs when you are not even stressing the CPU. Disabling C-States actually increases the power consumption by a few watts even at idle, according to CoreTemp.

Wish me luck!
