Expand your knowledge of hardware, software and supercomputing

Server doesn’t POST – Determining if a DIMM, CPU, or Motherboard is faulty

In this example we will troubleshoot a server that fully powers on but does not POST. The three most common reasons a server will not POST are a bad DIMM, a bad CPU, or a bad motherboard. The objective is to start with the minimum number of components in the server, find a configuration that works, and then add components back until it no longer does. The order of this process is not important as long as you keep track of which parts worked and in which slots.
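
One way to keep track of which parts worked in which slots is a simple trial log. The sketch below is only an illustration: the class names and the part labels (`cpu-a`, `dimm-1`) are hypothetical, and it assumes, as this article does, that a part or slot which took part in a successful POST can be treated as working.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One power-on attempt: what sat where, and whether the server POSTed."""
    placement: dict  # slot label -> part label, e.g. {"CPU1": "cpu-a"}
    posted: bool


@dataclass
class TrialLog:
    trials: list = field(default_factory=list)

    def record(self, placement, posted):
        # Copy the mapping so later changes to the caller's dict do not alter the log.
        self.trials.append(Trial(dict(placement), posted))

    def known_good(self):
        """Return (parts, slots) seen in at least one successful POST."""
        parts, slots = set(), set()
        for trial in self.trials:
            if trial.posted:
                slots.update(trial.placement.keys())
                parts.update(trial.placement.values())
        return parts, slots


log = TrialLog()
# Hypothetical labels: write whatever you marked on the physical parts.
log.record({"CPU1": "cpu-a", "A1": "dimm-1"}, posted=True)
log.record({"CPU1": "cpu-a", "A1": "dimm-1", "CPU2": "cpu-b"}, posted=False)
good_parts, good_slots = log.known_good()
```

After these two trials, `good_parts` holds `cpu-a` and `dimm-1`, and `good_slots` holds `CPU1` and `A1`; `cpu-b` and the CPU2 socket remain unproven.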

As an example, we will find the bad component by removing most of the components and leaving only one DIMM and one CPU on the motherboard.

Below is an image of the Z9PH-D16 motherboard that we will use in this example. Note that both the CPU1 socket and the DIMM slots belonging to CPU1 are on the left. CPU1 must be populated in order to POST, and DIMM slot A1 must be populated as well. You will need to refer to your motherboard's manual to determine these locations.

The Z9PH-D16 manual shows the CPU locations (left image) and labels each of the DIMM slots (right image). This shows us that the leftmost CPU socket is CPU1 and the bottom-left DIMM slot is A1.

Once we have just one CPU and one DIMM installed on the motherboard, we can attempt to power on the machine again. If it does POST, this is a good time to add more DIMMs. The manual includes a DIMM population chart for a single-CPU configuration.

The chart shows us that with one CPU we can install up to eight DIMMs, in the slots it lists. In this example we have eight DIMMs, so we will fill all of the available slots for CPU1. If the server still POSTs after all the DIMMs are installed, it is time to put the second CPU back in and move half of the DIMMs over to CPU2's DIMM slots. Refer to your manual's DIMM population chart for which slots to use.
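
The build-up described so far can be sketched as a small loop that adds one step at a time and reports the first step after which the server stops POSTing. This is a rough illustration, not a tool: `first_failing_step` and `simulated_posts` are hypothetical names, the step descriptions are placeholders, and in practice the POST check is you powering on the server and watching it.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_step(
    steps: Sequence[str],
    posts: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add one step at a time; return the first step after which POST fails."""
    installed: List[str] = []
    for step in steps:
        installed.append(step)
        if not posts(list(installed)):
            return step
    return None  # every step POSTed


# Placeholder build order; take real slot labels from your manual's chart.
steps = [
    "CPU1 + DIMM in A1",
    "remaining CPU1 DIMMs",
    "CPU2 + half of the DIMMs",
]


def simulated_posts(installed: List[str]) -> bool:
    # Stand-in for a manual power-on. It mimics this article's example,
    # where POST fails once CPU2 and its DIMMs are populated.
    return "CPU2 + half of the DIMMs" not in installed


print(first_failing_step(steps, simulated_posts))
# prints "CPU2 + half of the DIMMs"
```

Splitting "remaining CPU1 DIMMs" into one step per DIMM takes more power cycles but narrows a failure down to a single DIMM or slot.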

In this example, let's imagine that the system no longer POSTs after we put CPU2 back in and populate its slots with half of the RAM. This tells us that the CPU we put in the CPU2 socket is bad, the CPU2 socket itself is bad, or one of the CPU2 DIMM slots we used is bad. The best way to tell is to take the CPU and all the DIMMs out of CPU1's side and populate CPU1's side with the CPU and DIMMs from CPU2. We will imagine that the server POSTs again once everything has been moved to CPU1. This tells us that all of the DIMMs and both CPUs work, but the CPU2 socket or one of its DIMM slots does not, so the motherboard is faulty and needs to be replaced.
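
The reasoning behind the swap test can be written as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, it assumes the earlier steps already proved CPU1's socket and DIMM slots good, and the branch for a failed swap is an inference that this article's example does not walk through.

```python
def interpret_swap_test(posts_on_cpu1_side: bool) -> str:
    """Interpret moving CPU2's CPU and DIMMs onto CPU1's side.

    Assumes CPU1's socket and DIMM slots are known good, and that the
    server stopped POSTing once CPU2's side was populated.
    """
    if posts_on_cpu1_side:
        # The moved CPU and DIMMs work in known-good slots, so the fault
        # is in the CPU2 socket or one of its DIMM slots.
        return "motherboard"
    # Assumption beyond the article: the moved parts fail even in
    # known-good slots, so retest the moved CPU and DIMMs one at a time.
    return "moved CPU or DIMM"


print(interpret_swap_test(True))
# prints "motherboard"
```
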
Use our Breakin stress test and diagnostics tool to pinpoint hardware issues and component failures.
Check out our product catalog and use our Configurator to plan your next system and get a price estimate.
