Guide to Troubleshooting Industrial Control and Automation Equipment

Effective troubleshooting is the technician’s core competency. Some people claim that this is an innate skill used to identify a problem and then effect repairs. To others, it’s a superpower; some people are born with the knack to repair complex systems such as the Programmable Logic Controller (PLC) based system shown in Figure 1.

Fair enough.

Yet, it’s been my experience that troubleshooting electrical and electronic systems is a skill that can be taught. In fact, it’s been taught to generations of military technicians, some of whom are likely working in your facility. The military is similar to your industry, as there is a great need for skilled technicians who can quickly restore equipment to operational status.

Figure 1: Picture of a partially completed PLC trainer featuring a Schneider Modicon PLC and HMI.

Safety

It’s essential that we start with safety, as seemingly benign systems can be hazardous to life and limb. A technician could be electrocuted, asphyxiated, drowned, crushed, degloved, entangled, blinded, or infected with cholera, and could suffer collapsed lungs, burns from arc and blast, and a hundred other unfortunate things.

A complete exploration of safety is beyond the scope of this article. Instead, we will mention a few overarching thoughts:

  • Follow Lock Out Tag Out (LOTO) procedures to make equipment electrically, mechanically, pneumatically, and hydraulically safe.

  • Understand and follow all federal and state regulations, including those of the Occupational Safety and Health Administration (OSHA).

  • Follow your employer’s safety regulations. It’s also a good idea to be actively involved in the training program. If your employer does not have written safety regulations for technicians, take the time to research and write your own.

  • Use Personal Protective Equipment (PPE) as appropriate for the task.

  • Work with a partner.

  • Remove all watches, rings, and other conductive metals. Guard against entanglement from long hair and things like neckties or employee badges hanging around your neck.

  • Be especially mindful of complacency. Even a culture of safety can become dangerous when you foolishly delegate your personal safety to another person. There is no one more interested in your personal safety than you. Also, chances are very high that you are one of the few people in the plant who can identify and understand the related safety risks.

You, the technician, are responsible for your own safety and the safety of others.

Tech Tip: Be very careful when working in service panels, especially if the electrical safety interlock has been defeated. Once the interlock is defeated, you are exposed to lethal voltages. Disaster can strike in a single moment of complacency. Clearly written safety policies and strict enforcement are essential for the safety of technicians and operators.

Navy 6-Step troubleshooting

As seen in the “about this author” section below, I served in the U.S. Coast Guard for a total of 27 years. My formative years were spent as an enlisted technician while the latter half was spent as an officer. This background provides a unique perspective, as I have served in a range of roles including technician, supervisor, and educator.

The Navy 6-step Troubleshooting Procedure is an effective troubleshooting technique. This procedure is outlined in the Navy Electricity and Electronics Training Series (NEETS), Module 19 - The Technician’s Handbook, NAVEDTRA 14191. This approach is every bit as applicable to our modern Programmable Logic Controller (PLC) based industrial equipment as it was to the vacuum tube electromechanically tuned radio transmitters and receivers that were in use when the material was originally written.

The document states:

You may have the job of maintaining or helping to maintain some electrical or electronic unit, subsystem, or system. Some of these jobs may be complex, but even a complex job can be broken down into simple steps. Basically, any repair of electric or electronic equipment should be done in the following order:

  1. Symptom recognition. This is the action of recognizing some disorder or malfunction in electronic equipment.

  2. Symptom elaboration. Obtaining a more detailed description of the trouble symptom is the purpose of this step.

  3. Listing probable faulty functions. This step is applicable to equipment that contains more than one functional area or unit. From the information you have gathered, where could the trouble logically be located?

  4. Localizing the faulty function. In this step you determine which of the functional units of the multiunit equipment is actually at fault.

  5. Localizing trouble to the circuit. You will do extensive testing in this step to isolate the trouble to a specific circuit.

  6. Failure analysis. This step is multipart. Here you determine which part is faulty, repair/replace the part, determine what caused the failure, return the equipment to its proper operating status, and record the necessary information in a recordkeeping book for other maintenance personnel.

Step 1: Symptom recognition

To recognize a malfunction, you need to know your equipment. This includes developing an intuitive feel for the equipment’s operation including the cycle timing and sequence. Be proactive and quickly learn the equipment. After all, how can you effectively repair the equipment if you do not know how to operate it?

Safety is also an important consideration, as you must know how to keep yourself and others out of dangerous situations. There is no substitute for the time and effort it takes to familiarize yourself with the operation of the equipment.

Fail in this critical step and you will damage your professional reputation in the eyes of the operators and production manager. You will also feel the pain, as equipment downtime can cost your company hundreds to thousands of dollars per minute.

Step 2: Symptom elaboration

When you respond to a service call, it is important to pause and take a moment to properly identify all of the things that are wrong with the equipment. All too often, we get tunnel vision and focus on the first symptom. While this may resolve A problem, it may not resolve THE problem. Worse is when the technician departs after repairing A problem with the equipment still in a state of disrepair. This leads to another repair call, potentially damaged equipment, lost revenue, and certainly loss of reputation.

For many systems, it is sufficient to run, or attempt to run, the equipment through a complete cycle. Obviously, there is no point operating past the point of failure.

We must recognize that symptom elaboration is enhanced when the Standard Operating Procedure (SOP) is used. This guards against the all-too-common operator and technician error. Recall that the SOP is a checklist containing the site and equipment-specific preliminary setup, operating instructions, normal shutdown, and emergency instructions. A rigorous and closely related example is the aircraft-specific preflight checklist used by aviators and ground crew to verify that the aircraft is ready.

Tech Tip: SOP familiarization is an important aspect of both symptom recognition and symptom elaboration. If you do not have written SOPs, make it your top priority to create them. It’s well worth your time, as the SOP is the foundation for training new personnel and reducing equipment downtime.

On a related note, don’t rely on your memory when conducting the symptom elaboration. Write things down. As a best practice, maintain a service log for each piece of equipment. This could be a physical logbook kept with the machinery or an electronic copy maintained in the company’s systems.

Use the log to record the symptoms along with the troubleshooting steps and a description of the repair. These actions will save considerable time. This is especially true if the equipment is serviced by several different personnel. Also, be sure to record the initial reason for the service call and the operator’s observed symptoms. This can assist with intermittent faults or temperature dependent faults which are notoriously difficult to troubleshoot.

Where to look to identify symptoms

The components used in industrial machinery are built with troubleshooting in mind. Rudimentary yet essential examples include:

  • front panel indicators

  • indicator LEDs on the face of a PLC, such as the Siemens S7-1200 PLC shown in Figure 2. This also applies to the plug-in and expansion modules.

  • indicator LEDs on the body of field devices such as sensors and actuators. A prime example is a pneumatic or hydraulic system with LEDs on the Directional Control Valve (DCV) along with indicator LEDs on the cylinder positioning sensors.

  • indicator LEDs associated with a control relay.

  • the physical position of a relay’s armature.

  • the operation of interposing relays.

  • the physical action of field devices such as a motor, air cylinder, or Motor Operated Valve (MOV).

Once again, it’s important to know the operation and cycles associated with a machine. Without this critical information you will not know where or when to look.

Safety: DO NOT look for air and hydraulic leaks using your hands. If you do not already know the meaning, research the term hydraulic puncture wound. Be forewarned, the pictures are not for the faint of heart, as this is a devastating injury that requires extensive surgery to save the affected limb.

Advanced machines may include additional information such as:

  • a time stamped error log stored in a cloud or local database

  • a Human Machine Interface (HMI)

The usefulness of these logs is directly related to the skill and imagination of the system programmer(s). An automobile provides a good example. Here, the engineers have designed a system to detect, record, and report a wide variety of errors based on the feedback of your car’s sensors. While many faults may be directly determined from the data, there is still the need for a skilled technician to properly interpret the data. A blind reliance on the built-in diagnostics may result in costly repairs, as it is very difficult to predict all the combinations and permutations of things that can go wrong with a piece of equipment. There is simply no substitute for a technician’s familiarity with the machine as it operates in your facility.

Figure 2: Image of a Siemens S7-1200 PLC. The PLC features indicator LEDs to show the status of all digital inputs and outputs.

Tech Tip: Interpret the input and output LEDs with caution. While they work very well for slow signals like switches or drive signals to solenoids, they do not work for fast signals or pulses. Misinterpreting the LEDs can add considerable confusion and delay to the troubleshooting process.
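To see why a short pulse can go unnoticed on a status LED, it helps to estimate the duty cycle, which is the fraction of time the signal is actually on. The following is a minimal sketch, assuming the LED simply follows the signal with no pulse stretching (an assumption, as some modules may behave differently). The pulse timings are made up for illustration.

```python
def duty_cycle(pulse_width_s: float, period_s: float) -> float:
    """Fraction of each period that the signal (and its LED) is on."""
    if period_s <= 0 or not 0 <= pulse_width_s <= period_s:
        raise ValueError("need period_s > 0 and 0 <= pulse_width_s <= period_s")
    return pulse_width_s / period_s


# Hypothetical pulse train: a 50 microsecond pulse once every 10 milliseconds.
fraction_on = duty_cycle(50e-6, 10e-3)
print(f"LED is on {fraction_on:.1%} of the time")  # LED is on 0.5% of the time
```

An LED that is lit for a fraction of a percent of the time will look dark or barely glowing, even though the signal is present. When in doubt, use an oscilloscope or the PLC’s own diagnostics rather than the LED.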

Step 3: Listing probable faulty functions

Up until this point, all data has been gathered using sight, touch, and sound. You will notice that no test equipment has been used. Instead, we have carefully examined the equipment and written down the operator’s description and all of the observed symptoms.

At this point, the technician needs to step back and consider all of the data that has been collected. The objective is to look for a root cause of the malfunction(s) and select the section(s) that could logically cause the problem. As a representative example, consider the laser controller shown in Figure 3. This device includes the laser (not shown), the logic card, timers, and safety relays. As technicians, we are expected to troubleshoot down to the defective device(s), and sometimes to the defective component(s).

Logically identifying the faulty functional units is an especially challenging step, as we must resist the temptation to focus on the first symptom. This can be challenging, as:

  • there may be contradictory symptoms

  • there may also be multiple problems

  • the problem(s) may be intermittent

  • the problem(s) may be temperature dependent

  • the problem may not be within the machine itself

  • the problem(s) may be mechanical in nature

  • the problem(s) may be self-induced by operator or technician error

  • there may not even be a problem as the equipment may be operated incorrectly

Tech Tip: When I was very young, I was mentored by a small-town TV repairman. I know this is a cliche, but it was a small company town. My mentor worked in the mill and then repaired consumer electronics in his off time. He was as gruff as they come. Perhaps this wasn’t the best mentor-mentee relationship.

I’ll never forget the day he bawled me out in front of my father for being an idiot who had no business working on electronics. He wasn’t wrong, as I had failed to take step 3 (listing probable faulty functions) to heart. In fairness, I was easter egging in the audio output section when I should have been focused on the power supply. Don’t get tunnel vision!

Figure 3: Interior of a laser controller featuring a mixed system with industrial components such as DIN rail and switches as well as a custom PCB. Careful symptom recognition is required to logically identify the faulty function.

Step 4: Localizing trouble to the circuit

At this point, we have narrowed the problem down to a few blocks. In this troubleshooting step, we will conduct tests to isolate the problem to a specific part of the circuit. If, at this point, we have already located the problem, we should jump to step 5 and conduct a full failure analysis to ensure the root cause has been determined.

From the previous steps, we already have a working hypothesis that constrains the failure(s) to a few blocks. Our primary objective is to systematically perform tests to isolate the problem to a single block and then to the failed component(s).

Notice that this is the first time we use wiring diagrams and test equipment.

Example 1

Suppose you open the control panel and find that the PLC’s power indicator LED is out. From troubleshooting step 3, we have already isolated the failure to a few blocks including the PLC, the mains AC to 24 VDC power supply, circuit breaker(s), an isolation transformer, or the mains supply itself. It is also possible that a short circuit exists on the 24 VDC supply line.

Some of the blocks can be eliminated based on other symptoms. For example, look at the field devices that share the 24 VDC supply with the PLC. Many blocks can be eliminated if a sensor responds (lights up) to a stimulus. For example, a functioning proximity sensor would suggest that the mains supply and isolation transformer are functioning. A valid reading of the 24 VDC supply’s output would definitively eliminate all upstream blocks. The remaining components include a circuit breaker, loose or broken wires, and the PLC itself.

Continuing with the multimeter, we can take a series of voltage measurements to “half-step” (divide and conquer) and isolate the remaining blocks. For example, we could check the power at the PLC terminals. If 24 VDC is present, the PLC is likely defective. If not, we move to a point midway between the PLC and the 24 VDC source and measure again.
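For readers who think in code, half-stepping is a binary search over an ordered chain of test points. The following is a minimal sketch, assuming a single fault in a simple series power path. The test point names and the `is_good` callback are hypothetical stand-ins for your wiring diagram and your meter readings.

```python
from typing import Callable, Sequence


def half_step(points: Sequence[str], is_good: Callable[[str], bool]) -> tuple[str, str]:
    """Return the adjacent (last good, first bad) test points.

    Assumes the points are ordered from source to load, the first point
    reads good, the last point reads bad, and there is a single fault.
    """
    lo, hi = 0, len(points) - 1      # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2         # measure halfway between them
        if is_good(points[mid]):
            lo = mid                 # voltage present: fault is downstream
        else:
            hi = mid                 # voltage missing: fault is at or upstream
    return points[lo], points[hi]


# Hypothetical 24 VDC path, ordered from the supply to the PLC.
CHAIN = [
    "24 VDC supply output",
    "circuit breaker input",
    "circuit breaker output",
    "terminal block",
    "PLC power terminals",
]

# Simulated readings: voltage is present up to the breaker input only.
good_points = {"24 VDC supply output", "circuit breaker input"}
print(half_step(CHAIN, lambda p: p in good_points))
# ('circuit breaker input', 'circuit breaker output')
```

In this simulated case the fault lies between the breaker input and output, which points to a tripped or failed breaker. Each measurement halves the remaining search area, which is why the technique saves time on long signal paths.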

Example 2

A system featuring a large blower monitored by a sail switch is shutting down with a “loss of air” fault as indicated by the HMI. This is the third response to the failure this week. The equipment appears to be operating normally. From step 3, we suspect a failure in the PLC, the blower, or the sail switch. However, a review of the logbooks indicates that a failed sail switch with a missing sail was replaced last week by a 2nd shift technician. This would suggest that the sail switch was improperly installed, the sail is too small, or the switch has the incorrect tension.

We conduct a failure analysis as described in the next step. Also, a data logger is installed on the sail switch contacts to monitor for fluttering. We let the data logger run for at least a week or until a pattern of failure is discovered.
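Once the data logger has captured the contact state, a pattern of fluttering can be found by counting state changes in each block of samples. The following is a minimal sketch, assuming the logger exports a list of evenly spaced samples where True means the contact is closed. The window size and threshold are hypothetical and should be tuned to your logger and process.

```python
def count_transitions(samples):
    """Count state changes in a sequence of logged contact states."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def flutter_windows(samples, window, max_transitions):
    """Return the start index of each window with too many state changes.

    Windows do not overlap, so a change that falls exactly on a window
    boundary is not counted. This keeps the sketch simple.
    """
    flagged = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        if count_transitions(chunk) > max_transitions:
            flagged.append(start)
    return flagged


# Simulated log: steady airflow, a burst of flutter, then steady again.
steady = [True] * 10
burst = [True, False, True, False, True, False, True, True, True, True]
log = steady + burst + steady

print(flutter_windows(log, window=10, max_transitions=2))  # [10]
```

The flagged index can be converted to a time of day using the logger’s sample rate and then compared against production events such as damper changes or nearby doors opening.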

Example 3

Suppose you arrive to find a system where the thermal trips for two independent motor starters have tripped. Pressing the overload block reset buttons immediately activates the pumps which appear to operate normally.

At this point some technicians would claim success, button up the panel, and move on to the next job. Often, this is a mistake as they did not consider the last step of the troubleshooting procedure.

Tech Tip: Activating a motor is a stressful event that causes the motor to draw considerably more current than it does when running at rated speed and torque. Repeated starting and stopping (jogging) can cause the motor to overheat. It can also cause the motor starter’s thermal trip to activate.
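The effect of jogging can be illustrated with a toy thermal model in which heating is roughly proportional to the square of the current. This is a minimal sketch and not a model of any real overload relay. The time constant, starting current, and trip level are all assumed values chosen only to show the trend.

```python
def peak_temperature(profile, tau_s=600.0, dt_s=0.1):
    """Peak normalized temperature rise for a current profile.

    profile: list of (current in per-unit of rated current, duration in seconds).
    Toy model: dT/dt = (I**2 - T) / tau_s, so rated current settles at T = 1.0.
    """
    temp = 0.0
    peak = 0.0
    for current_pu, duration_s in profile:
        steps = round(duration_s / dt_s)
        for _ in range(steps):
            temp += (current_pu ** 2 - temp) * dt_s / tau_s  # forward Euler step
            peak = max(peak, temp)
    return peak


TRIP_LEVEL = 1.3            # assumed normalized trip threshold, not from a datasheet
START = (6.0, 1.0)          # assumed: 6x rated current for 1 s while accelerating
RUN_10_MIN = (1.0, 600.0)   # rated current for 10 minutes
REST = (0.0, 4.0)           # motor off for 4 s between jogs

single_start = [START, RUN_10_MIN]
jogging = [START, REST] * 60          # 60 start/stop cycles in 5 minutes

print(peak_temperature(single_start) < TRIP_LEVEL)  # True: stays below the trip level
print(peak_temperature(jogging) > TRIP_LEVEL)       # True: repeated starts exceed it
```

Even in this simplified model, one start followed by normal running stays well below the trip level, while repeated starts accumulate heat faster than the motor can shed it.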

Step 5: Failure analysis

The troubleshooting procedure described in this article would seem to be a linear process where each of the six steps is completed sequentially. In an ideal situation this would be true. The reality is often a messy iterative process as shown in Figure 4, where moving to the next step causes you to look back and consider something that you may have missed.

Figure 4: Troubleshooting is often an iterative process.

As an example, consider the simple relay. From your study of electronics, you know that the relay’s coil acts like an inductor; a device that stores magnetic energy. In DC systems, a diode such as the 1N4004 or other type of snubber is often placed in parallel with the relay’s coil. This diode is placed so that it conducts when the PLC turns the relay off. This action provides a path for the “kickback” energy stored in the relay’s coil. Without this diode a high-voltage spike is present when the relay is turned off; a high voltage spike that can damage the PLC I/O.

Now, suppose a PLC with a semiconductor output has a damaged output pin. It is highly likely that this pin was damaged by an open snubber. It is also possible that a shorted snubber caused the damage. Either way, it’s safe to say that your professional reputation is tied to how you handle the failure analysis. It’s not enough to restore the equipment to full operation; you must ask why it was damaged and then take appropriate action to prevent a second failure. In the case of the relay snubber, replacing the PLC alone is an expensive and short-sighted repair, as the real (root cause) problem is a failed diode.
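The size of the kickback described above can be estimated from two basic inductor relationships: the stored energy E = ½·L·I² and the voltage V = L·(ΔI/Δt). The following is a minimal sketch using assumed coil values (a 24 VDC coil with 600 Ω resistance and 2 H inductance) and an assumed 100 µs turn-off time. Check the relay datasheet for real numbers.

```python
def coil_current_a(supply_v: float, coil_resistance_ohm: float) -> float:
    """Steady-state coil current from Ohm's law."""
    return supply_v / coil_resistance_ohm


def stored_energy_j(inductance_h: float, current_a: float) -> float:
    """Energy stored in the coil's magnetic field: E = 1/2 * L * I^2."""
    return 0.5 * inductance_h * current_a ** 2


def ideal_spike_v(inductance_h: float, current_a: float, turn_off_s: float) -> float:
    """Idealized spike with no snubber: V = L * dI/dt for a linear current ramp."""
    return inductance_h * current_a / turn_off_s


# Assumed values for illustration only.
current = coil_current_a(24.0, 600.0)
print(f"coil current:  {current * 1e3:.0f} mA")                              # about 40 mA
print(f"stored energy: {stored_energy_j(2.0, current) * 1e3:.1f} mJ")        # about 1.6 mJ
print(f"ideal spike:   {ideal_spike_v(2.0, current, 100e-6):.0f} V")         # about 800 V
```

The idealized spike is an upper-bound style estimate. In a real circuit, stray capacitance and the breakdown of the output device limit the voltage, which is exactly how the PLC output gets damaged when the snubber diode is open.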

Step 6: Return to full operation status

This is the final step of troubleshooting. You have identified the failed component(s) and repaired or replaced the parts as necessary. Before restoring the equipment to full operation, several important actions must be taken:

Perform a full functional test to verify that the equipment was indeed repaired

As mentioned before, an SOP is useful to guide you through this process. In effect, you are going back to step #1 and performing the functional test all over again. This is an important aspect of the troubleshooting process, as there may have been multiple problems. There are also technician-induced errors, such as leaving the machine in a service mode as opposed to fully operational. Be sure to modify the SOP if you encounter an undocumented condition.

Have a conversation with the machine operator(s) and shift manager

Determine if there is a flaw in the process, weakness in the machine, or if the failure was exacerbated by improper operation. This is also a good time to enlist the operator to be on the lookout for intermittent problems should you lack 100% confidence that the problem is repaired.

Document your work

This may be done in a logbook that stays with the machine or an official maintenance database for the factory. At a minimum you should make a record of:

  • failure symptoms

  • parts expended

  • adjustments performed on the machinery

  • time expended for the repair

  • tips for future technicians
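If the log is kept electronically, the minimum record above maps naturally onto a small data structure. The following is a minimal sketch, assuming a JSON-based log. The field names, the equipment identifier, and the sample entry are hypothetical and should be adapted to your maintenance system.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServiceLogEntry:
    equipment_id: str                    # hypothetical asset tag
    date: str                            # ISO date of the service call
    failure_symptoms: list[str]
    parts_expended: list[str] = field(default_factory=list)
    adjustments: list[str] = field(default_factory=list)
    repair_minutes: int = 0              # time expended for the repair
    tips: str = ""                       # notes for future technicians

    def to_json(self) -> str:
        """Serialize the entry as one JSON line for an append-only log file."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up entry loosely based on the sail switch example.
entry = ServiceLogEntry(
    equipment_id="BLOWER-01",
    date="2024-01-15",
    failure_symptoms=["HMI reports loss of air", "third call this week"],
    parts_expended=["sail switch"],
    adjustments=["set sail switch tension"],
    repair_minutes=45,
    tips="Check the sail size against the duct before installing.",
)
print(entry.to_json())
```

Writing one entry per line keeps the file easy to search and easy to append from any shift.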

Tech Tip: There is a vast difference between commissioned equipment and equipment that is under development. Consequently, the field technician and the engineer have two very different ways of looking at a system. The field technician has confidence that the system once worked and can therefore be restored to full functionality. The engineer, factory technician, and system integrator have no such confidence, as the system may have never been operational.

Parting thoughts

Troubleshooting industrial control and automation systems is a systematic process. The Navy 6-step process remains as relevant as when it was first introduced. In fact, we could argue it becomes increasingly important as the complexity of modern PLC and cloud-based industrial systems increases. As my favorite pragmatic Trekking engineer once said, “the more they overthink the plumbing, the easier it is to stop up the drain.”

It takes skill, practice, and a focused technique to troubleshoot effectively.

We would love to hear from you. Is there anything missing from this guide? Also, do you have any examples others could learn from? If so, please leave your comments in the space below. Finally, be sure to test your knowledge by completing the questions and critical thinking questions at the end of this note.

Best wishes,

APDahlen


About this author

Aaron Dahlen, LCDR USCG (Ret.), serves as an application engineer at DigiKey. He has a unique electronics and automation foundation built over a 27-year military career as a technician and engineer which was further enhanced by 12 years of teaching (partially interwoven with military experience). With an MSEE degree from Minnesota State University, Mankato, Dahlen has taught in an ABET-accredited EE program, served as the program coordinator for an EET program, and taught component-level repair to military electronics technicians. Dahlen has returned to his Northern Minnesota home and thoroughly enjoys researching and writing educational articles about electronics and automation.

Highlighted experience

Dahlen is an active contributor to the DigiKey TechForum. At the time of this writing, he has created over 150 unique posts and provided answers for an additional 500 customer posts. Dahlen shares his insights on a wide variety of topics including microcontrollers, FPGA programming in Verilog, and a large body of work on industrial controls. A collection of Dahlen’s industrial control and automation articles may be found at this link.

Connect with Aaron Dahlen on LinkedIn.

Questions:

The following questions will help reinforce the content of the article.

  1. Why is LOTO the most important aspect of troubleshooting?

  2. What are the two general classes of field devices?

  3. Identify the individual steps in the Navy 6-step troubleshooting process and then provide a brief explanation for each step.

  4. Why is familiarization with the equipment SOP such an important aspect of the troubleshooting process?

  5. True / False: Installing the faulty parts is the last step to the troubleshooting process.

  6. True / False: Failing to notice a lit front panel power indicator will bring you back to step 1.

  7. True / False: LOTO is an implied step associated with the Navy 6-step troubleshooting process.

  8. Step _____ in the 6-step troubleshooting process is like an airline pre-flight check.

  9. Step _____ in the 6-step troubleshooting process involves half stepping.

  10. State the hazards associated with manually resetting the overload block on a motor starter.

  11. Within the context of troubleshooting, what is meant by tunnel vision?

  12. What is meant by half step? For full credit, provide a block diagram showing the interconnections for a typical AC main to a PLC’s 24 VDC input.

  13. With regards to the previous question, what could go wrong with voltage measurements from the AC mains to the PLC’s 24 VDC power? Hint: Does the meter have a central knob to select the operating mode? Are there any floating voltages? Finally, are both meter and probes at least CAT IV rated?

  14. How reliable are the PLC’s input indicator LEDs for observing:
    A) a steady on / off signal from a limit switch?
    B) an intermittent “flutter” due to turbulent airflow across a sail switch?
    C) the presence of a pulsed signal?

  15. Research the Siemens S7-1200:
    A) Which indicator LEDs should always be lit under normal operation?
    B) Which status indicator LEDs should be extinguished during normal operation?

  16. Research and reflect on the controversial “safety third” statement. What does it mean to you personally and how do you square the concept with directives from entities such as OSHA?

Critical thinking questions

These critical thinking questions expand on the article’s content, allowing you to develop a big picture understanding of the material and its relationship to adjacent topics. They are often open ended, require research, and are best answered in essay form.

  1. How long should the equipment logbook be kept, and why? Hint: Consider both practical aspects for future technicians as well as any legal / traceability questions.

  2. The 6-step process is not a one-and-done process. Select a piece of equipment in your shop and propose a timeline for a new technician. Assume the new technician has some troubleshooting experience but has never seen the equipment in your plant. Hint: Did you include time for safety and plant familiarization?

  3. Which voltages may be safely measured with a CAT II voltmeter? Should a CAT II meter be part of your industrial service kit?

  4. For what reason(s) may an interposing relay be utilized in a PLC based control panel?

  5. Given a technician who has been written up several times for LOTO violations, who is responsible if an equipment operator is harmed by that technician’s LOTO violation? What is the proper response to a LOTO violation? Hint: Obviously this is a legal question. However, it’s a good question to consider from the perspective of operator, technician, foreman, and supervisor.