
ProjectRisks

I'm currently working on a project involving a custom built, hardened computer, an embedded OS, and a proprietary application. I'm working with the team getting the necessary documents approved, parts ordered, fabrication scheduled and embedded software written. The engineering VP that brought me in swagged a date to the CEO, and it looks like we'll make that date (and have some time to do thorough testing).

I'm now working on the project risks and mitigation. What should I be worrying about? DonGray 2005.04.05


Don, from my experience with custom hardware and software, I would develop validation test cases and verification test cases as early as possible so you can ensure the user interaction, system integration, and runtime behavior meet what your users need. CharlesAdams 2005.04.05
What happened on previous projects? Have you had a look at historical documents - outputs from project retrospectives, for instance? "Yesterday's problem is today's risk." -- LaurentBossavit 2005.04.06
Charles - I would develop validation test cases ...

I forgot to mention it's a "headless" device, so like the microprocessor that controls anti-lock brakes, there's no obvious user interface. The QA person generated a 10 page test document. I'll re-read it looking for these types of tests.

Laurent - What happened on previous projects?

I'll ask. The hardware people are the same, so I should be able to get the story. I have at least two different sources to ask. It will be interesting to see how the stories align. DonGray 2005.04.06

One Story: The previous hardware was field tested for over 6 months after it was "done", and is still delivered on a semi-custom basis (3 years later) because the "standard" version isn't, well, standard. "Field tested" here meaning: "ship the semi-functional stuff to customers and fix the problems they encounter individually, reactively, at the customer site." It was about 6 months before *some* units shipped without needing repair at the customer site.
There was a "stand down" on the second project that attempted to address the disconnect between "done-as-in-works-and-can-ship" and "done-whatever-that-meant-last-time." Assertions of project done-ness from as far back as last November were challenged, and adjusted. Once the illusion was recognized as such, *no* ship date was discussed until "Ship what?" and "How will we know when we are done?" were established - much to the frustration of some of the hardware team, and the consternation of some of the dependent organizations. One view. - Anon 2005.04.06

Don Gray writes there's no obvious user interface.

Don, If the interface isn't obvious, then that's the very thing that needs the most design and testing. Link the design and testing back to the requirements.

Review the design. Challenge choices that don't link to the requirements. Too often design choices are technology in search of a problem. Help them choose again.

SteveSmith 2005.04.06


Don's biggest risk is the VP of Engineering who swagged a date then brought him in to see about meeting it. Manage that loose cannon carefully.

- JimBullock 2005.04.06


As you've noticed, I actively manage my manager. You're coming along nicely! Enjoy your morning! DonGray 2005.04.06

re: "Enjoy your morning!" 3.5 hours in the dentist chair this morning - relatively speaking makes today an easy day. - jb

What could go wrong, Don? You said you're "working with the team getting the necessary documents approved, parts ordered, fabrication scheduled and embedded software written". Could anything happen with any of those activities that could make the sky fall--or at least make you late?

Are there any external dependencies for document approval? Could your parts suppliers have problems meeting your delivery schedule? Who is responsible for fabrication, and is there a Plan B? Is your development schedule dependent on one or two brilliant people? -- FionaCharles 6-Apr-2005


Fiona - Could anything happen with any of those activities that could make the sky fall--or at least make you late?

Goodness yes! That's why I'm working on identifying the risks and prevention / mitigation. The hardware part has been somewhat straightforward. The software risk is currently "The software won't work on the hardware when the hardware arrives." Now I'm working on what "won't work" means.<grin>

Major parts are on order. We meet with the fabrication facility this AM. DonGray 2005.04.07


When I've worked on hardware/software combination projects, I found that a lot of the risk came from having separate schedules for the hardware and the software. I needed to create an integrated schedule that dealt with deliverables, not ends of phases, for the schedule to make sense and reduce risk. -- JohannaRothman 2005.04.07
Apply "Future Perfect Planning". Imagine for a moment that you just shipped on time, then list out the (imagined) problems you had to overcome. I find that this exercise usually turns up risks that I don't consider when planning forwards. --DaveSmith 2005.04.07
Johanna - I needed to create an integrated schedule that dealt with deliverables, not ends of phases for the schedule to make sense and reduce risk.

We sort of have that in the greatest of scheduling tools, Excel. I have three worksheets: hardware team deliverables, software team deliverables, and vendor deliverables. I can hyperlink off each deliverable to a risk discussion about that deliverable. What would be really great is to find a "convert worksheet to timeline" graphing function.
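There's no built-in "convert worksheet to timeline" function in Excel that I know of, but the conversion is easy to script. As a rough sketch in Python, with hypothetical deliverables standing in for the real worksheet rows, each deliverable becomes one row of an ASCII timeline:

```python
# Crude "worksheet to timeline" sketch: render deliverables as an ASCII
# Gantt chart. The deliverables below are made-up stand-ins for the rows
# in the hardware, software, and vendor worksheets.
from datetime import date

deliverables = [
    ("Hardware", "Boards fabricated", date(2005, 4, 18), date(2005, 5, 27)),
    ("Software", "OS image built",    date(2005, 4, 25), date(2005, 5, 13)),
    ("Vendor",   "Cables delivered",  date(2005, 5, 2),  date(2005, 5, 20)),
]

def timeline(rows, cell_days=7):
    """One line per deliverable; each '#' marks one week inside its window."""
    start = min(r[2] for r in rows)          # earliest begin date sets column 0
    lines = []
    for team, name, begin, end in rows:
        lead  = (begin - start).days // cell_days   # blank cells before the bar
        width = max(1, (end - begin).days // cell_days)
        lines.append(f"{team:8} {name:18} {' ' * lead}{'#' * width}")
    return "\n".join(lines)

print(timeline(deliverables))
```

Real worksheets could be exported to CSV and read with Python's csv module; the rendering logic stays the same, and each row could carry a link or note pointing to its risk discussion.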

Dave - Imagine for a moment that you just shipped on time, then list out the (imagined) problems you had to overcome.

The product was ready to ship last August according to some. Apparently some have great imaginations.

Are there risk laundry lists on the web? I know I'll have to select what fits the project, but it might help reveal some blind spots I'm not considering. DonGray 2005.04.08


Take a look at http://www.sei.cmu.edu/pub/documents/93.reports/pdf/tr06.93.pdf and http://www.thomsettinternational.com/main/articles/risk_0404/risk0404_toc.htm --JohannaRothman 2005.04.08

Thank you. DonGray


Don,

On every embedded OS/hardware system that I have worked on over the last 15 years, we had at least one major unplanned integration issue that almost made the project fail. So when I do risk planning I always allow for one major integration issue that is completely unplanned. There is no way to identify it easily upfront, but I always include a line item (risk item) in the schedule for an integration issue. You could follow Barry Boehm's risk method and allocate insurance monies to offset the risk item based on its probability of occurring.
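Boehm's risk-exposure arithmetic is simple enough to sketch. In this Python illustration, the risk names, probabilities, and loss figures are all hypothetical, not numbers from any real project:

```python
# Boehm-style risk exposure: RE = P(loss) * size(loss).
# All figures below are illustrative assumptions, not project data.
risks = [
    ("Unplanned integration issue",   0.8, 150_000),  # near-certain, per experience
    ("Supplier slips board delivery", 0.3,  60_000),
    ("Spec churn forces rework",      0.5,  40_000),
]

def risk_exposure(prob, loss):
    """Expected cost of a risk: probability times size of loss."""
    return prob * loss

reserve = sum(risk_exposure(p, loss) for _, p, loss in risks)
for name, p, loss in risks:
    print(f"{name}: exposure ${risk_exposure(p, loss):,.0f}")
print(f"Contingency reserve: ${reserve:,.0f}")
```

Summing the exposures gives a first cut at the "insurance money" to set aside; the always-one-integration-fiasco rule amounts to giving that first line item a probability close to 1.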

For instance, on one embedded OS/hardware system, during integration we discovered that we were periodically losing data packets when we uploaded critical data from the multimillion-dollar hardware system to the organization's ERP system over Ethernet. Of course this was unacceptable to the customer and basically made the system unusable. We lost one month chasing down the problem. Everybody thought it was a hardware problem. It turned out to be a bad TCP/IP and Ethernet driver from the manufacturer. We ended up ripping out cabling and connectors and replacing Ethernet boards, but to no avail.

On another system using an embedded OS we had periodic lockups of the OS kernel which prevented the system from working. We only discovered this after we started using the system printer, which was an optional feature for the customer. This cost us about a week of debugging and the cost of several consultants from the OS manufacturer. It turned out that bad contacts on some I/O boards were causing the lockups.

I could go on with other stories. But the moral of the story is to plan for at least one unplanned integration fiasco. As a metric, plan for one major problem for every 12 months of calendar schedule.

JohnSuzuki 2005.04.09





That's a good one, John. Two mitigation strategies come to mind, in addition to "plan on having one of these."

  • Integrate early, integrate often. Ideally push stuff end to end in some limited fashion early in the game.
  • Define success and test for correct behavior at layers / interfaces. Do this, and at least when an integration issue happens, you'll know what you trust.

-- JimBullock 2005.04.09


John As a metric, plan for one major problem for every 12 months of calendar schedule.

Then I'd expect about one major problem. The hardware team previously created the older brother to the new system, so they've done the dance before. The new system adds memory, power, and a couple of new features.

Jim Integrate early, integrate often

I pick this one. QA should have working prototypes by this Friday. This gives us 11 weeks to find out what we don't know.

--DonGray 2005.04.11


The first attempt at delivering the product has been illuminating and educational. The mechanical mods were never done. The testing cables were missing. A plethora of assumptions have been discovered. And the prototype system doesn't match the prototype documentation. I'm glad we got a chance to do this before we have to do it again. Round 2 should be smoother. --DonGray 2005.04.15
I've been away and have probably come into this too late to help you, Don, but here goes anyway.

In this type of project, I've seen one major risk consistently--the hardware problems get pushed off into software. Hardware is "what you take is what you get," and the software has to work around it somehow (only once in a very great while does that prove impossible, but it always proves costly in time).

If you had a reliable simulator (which also helps give you an interface you can work with--and which, if you don't have one, maybe you should build), you can avoid some of these problems, but these simulators are never perfectly reliable. (Sometimes they're less than worthless, and in any case they never help you with timing and race problems.) But if you work on the simulator before the h/w is available, you get up to speed faster on your learning curve.

Another thing I always like to do is a one-hour (maybe two) walkthrough of the whole integration process, using paper to represent each part of the system. That way, you wouldn't have forgotten your cables, for example. You may have just done that on the real stuff you got, but you could have done this walking around with paper and avoided a lot of that trouble. Not all, though. - JerryWeinberg 2005.04.17


Jerry Another thing I always like to do is a one-hour (maybe two) walkthrough of the whole integration process, using paper to represent each part of the system.

What a "just in time" suggestion! We need to make sure we have all the parts on order for the next set of units. I'm setting up a "build simulation" for tomorrow so we can verify EVERYTHING is in house or on order. - DonGray 2005.04.18


Glad to be of help for a change. The payment I demand is that you let us know on this thread what happens. - JerryWeinberg 2005.04.18
Don,

Jerry's suggestion for a paper integration process is an excellent technique to add to the tool box. I wish I had thought of doing this. I need to add this step to my own process before accepting finished hardware from the electrical and mechanical folks. I have been on the receiving end (actually thrown over the R&D fence to the integration and system test teams) of a purportedly finished or fully functional hardware system only to find tons of problems (incorrect I/Os, subsystems or modules not working correctly or at all, logic states reversed, missing cables, incorrect optics and optical filters, wrong versions of critical EEPROMs, missing peripherals, grounding and power lines not connected, missing brackets and screws, etc.). I recall on one project the integration test team came up with a list of over one hundred defective things before we even started formal integration software testing. Make sure the configuration of the hardware is baselined and documented before accepting it. If you can believe this, I've worked on systems where upgrades and changes to hardware were occurring during system testing!

I also have to agree with Jerry's observation about software picking up the slack when hardware is missing or doesn't work as specified. I have seen this behavior so often on projects that I suggest you add it as a schedule item and as a risk item. Not only does it add development costs and schedule time, but it adds additional unplanned software testing. It creates a lot of stress for the testing team near release time. Often these changes are made at the last minute with little or no documentation. Updates to the baseline software requirements are often neglected as well. This creates additional negative effects for the maintenance folks or for the folks working on the next release.

It sounds like the first pass at integration was instructive and helpful. It also sounds like you are fortunate that you have a second round to flush out additional problems. Good luck.

-JohnSuzuki 2005.04.18


We held the simulated build today. Overall things went well. We had all the major parts. We were missing screws, nuts, washers, and the like. Had we not caught that, we'd have been running around locating parts and paying more for FedEx than the parts cost. The Engineering VP and CFO should be happy with the results. We'll need to monitor delivery schedules to make sure they don't slip. (An ongoing risk.)
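A simulated build like this is essentially a bill-of-materials check, which also lends itself to a script between dry runs. A minimal Python sketch, with hypothetical parts and quantities (nothing here comes from the actual project):

```python
# Bill-of-materials check: compare required quantities against what's
# in house or on order. Parts and counts are made-up illustrations.
required = {"M3 screw": 48, "M3 nut": 48, "lock washer": 48, "CPU board": 2}
in_house = {"CPU board": 2, "M3 screw": 40}
on_order = {"M3 nut": 48}

# A part is short when need exceeds everything we can account for.
shortages = {
    part: need - in_house.get(part, 0) - on_order.get(part, 0)
    for part, need in required.items()
    if need > in_house.get(part, 0) + on_order.get(part, 0)
}
print(shortages)  # the parts we'd otherwise be overnighting at the last minute
```

Rerunning the check whenever the parts lists change keeps the "everything is in house or on order" claim verifiable instead of assumed.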

Now what to do? What's the next risk awareness activity? -- DonGray 2005.04.20


I'm not sure exactly what's next--risks are like that. But one risk I see is the risk of losing small parts. Figure out what an inventory will cost, vs. losing some little part and shutting down the development effort or system. Then make a decision.

Another risk is forgetting lessons. What are you doing to ensure that they remember--and that people who were not there for the lesson get the message and retain it? - JerryWeinberg 2005.04.20


Don,

When do you start formal software testing (integration, system and or user acceptance testing)? You might have plenty of problems to deal with during this phase. This might lead to the next set of risks that you will have to deal with.

As a suggestion to help the team and organization remember, perform what we call an interim or work-chunk retrospective for the current or past phase. This might be a short one- or two-hour gathering to figure out what went well and what we need to change the next time or next cycle. As part of the learning process for an interim retrospective, you can help the team change the integration process to include paper walkthroughs and incorporate the top 3 recommendations that the team comes up with during this session. The information that is recorded can go in the project history for others to use and remember in the future. If the recommended changes are small and local (as opposed to organizational changes that take longer to implement) they can be incorporated by individuals or the team before the next cycle or phase. I have found that this gives you a better chance of making lasting change for the team instead of waiting till the end of the project and performing a traditional end-of-project retrospective.

I held a session with a colleague from Siemens at the last Retrospective Facilitators Gathering on how to make lasting change from retrospectives and how to ensure that changes get implemented by closing the loop on the retrospective process. From that session we both agreed that you need to try to make the changes local and as soon as practical. Smaller changes can be made by individuals and the team immediately without waiting for organizational or senior management approval. -JohnSuzuki 2005.04.20


Don, it looks like you're off to a great start. Make sure you manage the risk of not being as appreciated for your risk management efforts as you deserve to be. :)
Appreciation is, of course, dependent on by whom. For one group of people at least, Don's ability at risk management is already appreciated. - JimBullock 2005.04.21

How about Steve's earlier question about tracing design choices back to the requirements? You've been making sure that the product gets built properly - is there a risk that you build it perfectly but it turns out not to be the product the customers wanted, or needed? There's one story above about "problems customers encounter" - were all of these problems related to "building the product right", or were some of them also of the "building the right product" variety?

I can't wait to hear what Don has to say about this one. - JimBullock, 2005.04.21

LaurentBossavit 2005.04.21


Laurent - You've been making sure that the product gets built properly - is there a risk that you build it perfectly but it turns out not to be the product the customers wanted, or needed?

It is possible (but not likely) that the unit will be verifiable but not validated (I think I'm using the terms correctly). The company has domain knowledge and experience. This unit represents a third generation device. Faster, stronger, more flexible, so on and so forth.

We have a set of specifications for the hardware, BIOS, embedded OS image and additional tools/utilities. The hardware spec is solid (which is good since we're on rev 3). Other specs seem to be more fluid (as in we're still figuring out and agreeing what should be done).

Today I've received an email that indicates not everyone in a meeting leaves with the same understanding (this isn't new news). I also saw another that said "Thanks. That's a change, so you'll need to work through the proper channels so we can consider it in the future." Some people are getting it. DonGray 2005.04.26


I am confused about what a "paper integration process" is. What would this look like? Is this just creating a hardware test plan? I have done something similar (I think): I have written unit tests for my vendor-supplied tools. I found some really deep bugs in the Perl core language that way.

KenEstes 2005.04.27


Ken,

Jerry mentioned "paper integration" in his 2005.04.17 addition to this page. I interpreted the suggestion as "if you don't actually have the parts, use paper as a substitute, and try to build the product." For a prototype it could be as detailed as writing the description on the paper and seeing how it matches the other system parts. For example, on a card write "2 inch, 68 conductor cable with 2x34 female connectors on each end. Connect to J1 and J12." During the assembly process you'd locate J1, and it would hopefully say "J1, 2x34 male connector for PCI bus extension". If it says "J1, 2x20 IDE connector" you'd have a problem.
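Those connector cards can also be treated as data, so the same matching check runs without any paper. A minimal Python sketch, assuming made-up part names and a deliberately simplistic mating rule (same size, opposite gender):

```python
# Paper-integration check as data: each "card" records a cable's ends and
# what they are supposed to mate with; mismatches surface before any
# hardware exists. Part names and connector types are hypothetical.
cables = {
    "CBL-1": {"ends": ("2x34 female", "2x34 female"), "targets": ("J1", "J12")},
}
board_connectors = {
    "J1":  "2x34 male",  # PCI bus extension
    "J12": "2x34 male",
}

def mates(plug, jack):
    """A '2x34 female' end mates with a '2x34 male' connector, and so on."""
    size_p, gender_p = plug.rsplit(" ", 1)
    size_j, gender_j = jack.rsplit(" ", 1)
    return size_p == size_j and {gender_p, gender_j} == {"female", "male"}

problems = []
for cable, spec in cables.items():
    for end, target in zip(spec["ends"], spec["targets"]):
        jack = board_connectors.get(target)
        if jack is None:
            problems.append(f"{cable}: {target} missing from board list")
        elif not mates(end, jack):
            problems.append(f"{cable}: {end} does not mate with {target} ({jack})")

print(problems or "all connectors mate")
```

Swapping J1's entry for "2x20 IDE" would flag exactly the mismatch described above, which is the point of the exercise: the disagreement shows up on paper (or in data) rather than on the bench.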

We're further along in the process, so I adapted the exercise to make sure all the parts were accounted for in the proper amounts. (They weren't, but now are.)

Is this just creating a hardware test plan?

Not in my opinion, but this could be a viewpoint difference. We have several levels of hardware testing, and the QA department is creating an overall test plan that includes both the hardware and software.

I have done something similar (I think) I have written unit tests for my vendor supplied tools.

Both exercises (the paper integration and unit tests) seem similar in that they try to find problems before the problems would naturally appear. DonGray 2005.04.28


Don has the right interpretation of what I meant by "paper integration," for his situation. For other situations, what's on the paper will be different, but each is a deliverable (which is also a reviewable). Passing the papers around can test for interface matching, missing (or extra) components, and disconnects in the project process plan. If you add decision points, you can also test much more. For example, at each delivery, assign a certain probability (perhaps use dice or a coin flip) to simulate passing or not passing acceptance tests of the content (which isn't really there to test). Then you are paper testing the processes of how errors are handled, and you learn all kinds of things. - JerryWeinberg 2005.05.01

Jerry: "If you add decision points, you can also test . . . the processes of how errors are handled . . ."

Oh, my. That one is fun. You don't even need to do this on paper. You can start by declining to accept an incomplete deliverable for real - although the stakes are often higher and the reaction a bit louder than finding this out with paper ahead of time.

- JimBullock 2005.05.02


Oh, my. That one is fun

How's this for fun? The unit has a recessed "reset" button. I have two different stories about what "reset" means. I've asked the product manager where "reset" is defined in the specs. The good news is the switch works. The rest is just software. --DonGray 2005.05.02


Beware the reset button! One of the greatest project risks of all.

OTOH, sometimes a PROJECT RESET button would be the best risk eliminator of all. - JerryWeinberg 2005.05.02


And in some instances turning the project knob to the "OFF" position makes sense. This can be very hard to do because of the emotional investment people have in the project. -- CharlesAdams 2005.05.03
Yet when I've seen this happen, there's shock the first day, followed by immense relief the next. - JerryWeinberg 2005.05.03
Just an update. We're currently tracking to plan. We've discovered another couple of items missing (connecting cables), but we should be able to correct this in the month remaining. It's been a hectic couple of weeks, but it looks like I'll have time today to worry about things I'm trying to keep from happening. DonGray 2005.05.11

The last boards are due in tomorrow (5/27). The first Beta unit is currently going through check-out and burn-in. There have been a couple of wide spots, but no gaping holes ... yet. DonGray 2005.05.26 (looking for something to worry about)


I've given more thought to Jerry's comments, above, and I think every project should have the following four buttons, like a VCR:

  • STOP - as above, causes shock followed by relief...
  • REWIND - retrospect to make sense of what's gone on before
  • FAST-FORWARD - look to the future to anticipate risks
  • PLAY - don't forget to have fun as you work

(And now some wag is bound to ask if projects should also have LEDs perpetually flashing twelve o'clock.)

LaurentBossavit 2005.05.26


What are your thoughts on RiskVersusUncertainty?
And the answer is ... the hardware team has delivered all the units to QA. QA started receiving units a week ago. The testing is revealing little (software) things that need to be corrected, but nothing that should stop the beta units from being sent to clients. DonGray 2005.06.14

Don, There's been time for the dust to settle now. How'd things work out? --DaveSmith 2007.02.06

I think the best analogy is: "The operation was a success. The patient died." The team did get the units delivered to QA, and the temporary Engineering VP seemed pleased with the work. However, there were machinations happening at a much higher organizational level. I don't know if the Engineering VP knew about them. And there was some legal unpleasantness we (all) knew about.

Long story short: The legal problems were resolved by one organization buying both companies and merging them. From what I can tell, the merged company doesn't sell the type of equipment I worked with. Any references I find on the net are generating 404s when I go to the page. DonGray 2007.02.07


I'd say: "The operation was a success. The patient mutated. The limb was lost in the process." Answering what I can, and dancing where I have to, first about machinations at a much higher organizational level . . .

  • The collision of the two companies was inevitable. Some people saw it coming. Some didn't.
  • When that's going to happen, the better your operational results the better you come through the crack-up. So, ship product, build partnerships, make sales. (BTW, win the lawsuit, which also happened here.)
  • Purely subjectively, I SWAG that the changes in operational performance while Don's story was unfolding were worth a 10 to 15% swing in ownership mix of the combined companies. Also worth, probably, the jobs that did get kept when the companies combined vs. a complete gutting.
  • There were at least two additional sets of machinations going on up where mergers happen.

Putting on executive-guy hat for a minute, I'm pretty sure the company shouldn't have been in the ruggedized hardware business, and absolutely sure they shouldn't have been in that business that way. Having said so, up front, to the right people, several ways, blah, blah, blah, this one isn't on the messenger. It's not on the project implementation team, either. Along the way, CEO-guy had an off-the-record side-bar saying: " . . . I get it now about not being in this business." So, ship the thing in the pipeline, then do the next one in a very different way. (Today, I read an announcement that the merged companies' newest product version also ships . . . on third-party generic ruggedized laptop PCs. Much better plan.)

Putting on project-sponsor-guy's hat, several things worked.

  • The project team was totally trainable that any answer they needed & couldn't get elsewhere, come to the sponsor and they'd get one. This de-coupled them from the endless "What's it supposed to be?" flailing.
  • Absent an externally generated baselined product definition, the target was declared to be: "Whatever you are building to. Baseline that before you do anything else, and then it doesn't change unless I say so. If anybody doesn't like that, send them to me."
  • Any push-back from outside the project team got handled this way: "Great. Write that down, and we'll baseline it and build to it. Would you like me to help you write that down?" No takers.
  • There was a running list of "stuff I'm making up" shared up the food chain.

From a project internals POV, that lets progress happen in some direction absent any - er - useful direction. From an "Is it the right product?" POV, there was enough domain knowledge in the implementation team that their instincts would be by and large correct. This turned out to be true in the event.

  • I doubt Don knows, but it turns out that one of the partner companies wanted to license the HW design & related. There was some really good work in there. I don't know what, if anything, came of that.
-- JimBullock

BTW, making that gut-check guess that the project team's domain knowledge would be sufficient is part of what an executive is paid for. Not so much knowing when certainty is possible but guessing well when there's no objective way to tell, and creating clarity when there isn't any otherwise. Then take your lumps if that's what happens.

As for building something when I'm sure "we" shouldn't be in the business that way, well, sometimes I'm wrong. In the event it looks like my confidence in my own opinion was justified. At the time, well, I could have been wrong. A whole bunch of people whose job it was to be sure about the products we needed were sure . . . that the thing was more than vital, was in fact the future of the company. Somebody was crazy here. Could have been me.

I suppose from a project risk POV, some people are paid to make the bets, and the project team is paid to deliver so we can find out whether we bet right. The project team did that. Risks internal to the project were managed fine. The ones that should have been booted up stairs were - I know because I got them.

-- JimBullock


Updated: Thursday, February 8, 2007