Repercussions of Being Leading Edge

Leading Edge

Only a couple of days before go time on an ISIM upgrade, I ran into an issue that was introduced in IBM’s latest fixpack at the time.  Unfortunately for my team, we were forced to upgrade to the latest fixpack because it resolved some other issues we were hitting.  However, I knew there was a big risk in being on the leading edge.

For those curious about the technical details, using IdentityPolicy.userIDExists within an Identity Policy is now broken in ISIM 6.0.0.3 (FP3).  Rumor has it a code change that was intended to improve performance broke things.  The userIDExists function now only searches for user IDs on accounts owned by that identity, when it is supposed to search for user IDs on any account in order to prevent duplicate user IDs from being issued.  This led to duplicate user IDs being created within the system, which would have been a security nightmare had it not been discovered before the system was placed into production.

Because the fixpack was so new, not many of IBM’s customers had upgraded.  This meant I was left to discover this new bug.  After two days of troubleshooting, including combing through literally millions of lines of debug output, I was finally able to present IBM with exactly what had broken.  They immediately confirmed the issue and have since opened a ticket on their side to have the code corrected.  Unfortunately, this caused the emergency brake to be pulled on our scheduled upgrade, which ultimately makes me (the integrator) and my team look bad.

Going back many years, I encountered a similar issue on a Sonicwall firewall with IDS/IPS services enabled.  I had the IPS service configured to download the latest updates as soon as they were available.  In the middle of a business day, Internet browsing suddenly stopped working for everyone.  The IPS began sending me thousands of alerts, one for each time a user attempted to access a webpage.  Sonicwall had released a bad rule which matched all normal HTTP traffic.  In only a moment I was able to disable the rule.  However, I was thankful I was sitting at my desk and not on the beach at the time.

There is a big risk in being first when it comes to upgrades (patches and major versions alike).  Think about that every time your Windows workstation applies the latest patches just released from Microsoft.  If something is wrong with those patches, you’ll be one of the first to discover the issue, and that could cause you downtime.  This may not be much of a problem on a home computer, but in a business it could cause significant downtime, especially if hundreds of workstations and/or servers are involved.

When it comes to patching, I recommend using some sort of patch management tool such as WSUS, SMS, RHEL Satellite Server, or one of the countless other tools designed specifically for your environment.  These tools allow you to schedule updates at your business’s pace.  It is best to create three primary groups, alpha, beta, and production, and schedule updates at a different time for each group.  If you can’t afford one of these solutions, there are countless examples of how to set up free “home grown” solutions to perform the same functions.

Place a few of your most tech-savvy administrators in the alpha group.  Always release updates to this group first, and wait at least half a day before releasing updates to the beta group.  The beta group should consist of a couple of tech-savvy users from each major department within your organization.  Never release updates to production until they have been tested by the beta users for at least 24 hours, preferably 48.  Always communicate with each group when you are releasing updates, and make it clear to your alpha and beta users that you need them to test and provide feedback by a certain time.
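If you end up going the home-grown route on Linux, the staging can be as simple as a cron job that patches each group on a different day.  Here is a minimal sketch, assuming RHEL-style servers using yum and a hypothetical /etc/patch-group file that you populate with alpha, beta, or production:

#!/bin/bash
# staged-update.sh - home-grown staged patching sketch (/etc/patch-group is hypothetical)
GROUP=$(cat /etc/patch-group)   # one of: alpha, beta, production
DAY=$(date +%u)                 # day of the week, 1 = Monday

case "$GROUP" in
  alpha)      PATCH_DAY=1 ;;    # alpha patches Monday
  beta)       PATCH_DAY=3 ;;    # beta follows two days later
  production) PATCH_DAY=5 ;;    # production only after alpha and beta have soaked
  *) echo "Unknown patch group: $GROUP" >&2; exit 1 ;;
esac

if [ "$DAY" -eq "$PATCH_DAY" ]; then
  yum -y update && logger "staged-update: $GROUP group patched"
fi

Run it daily from cron, and the contents of /etc/patch-group alone control when a given server patches.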

This process must be tweaked slightly when applying patches to enterprise systems that multiple users access.  Within these environments, you should have the system duplicated to represent the alpha, beta, and production groups (aka development, testing, and production).  The users in your alpha and beta groups should have access to the alpha and beta copies of the enterprise system so that they can perform testing for you.  Unfortunately, because production data will seldom exist on development and testing systems, you will have to assume some risk when you go to production with any patch.

In the end, by allowing your alpha users to be on the leading edge, you will reduce the impact of bad patches across your organization.  While bad patches don’t happen often, they do happen, and they do cause major headaches, which ultimately points back to you for releasing them.

IAM Upgrades

Major version upgrades of IBM security software have been occupying my life for the past few months.  Many of my friends have been wondering where I am, and I should refer them to SIM and SAM, who could better answer that question.  Of course, just a few months ago it was TIM and TAM, but the bean counters over at IBM figured it made more sense to brand their IAM (Identity and Access Management) products as “Security” instead of using the “Tivoli” name.

The trickiest part of the upgrades has been the dependency list for required software.  For example, the TFIM, TAM, and ITIM products all utilize WAS (WebSphere Application Server).  However, TFIM may be compatible up to WAS 8, whereas ITIM is only compatible up to version 7 (this is only an example, of course).  While one can run multiple versions of WAS within an enterprise, that presents a nightmare when it comes to patching, maintenance, and training new admins.  Therefore, the major products (TFIM, TAM, ITIM) and their sub-components (WAS, TDS, TDI, DB2) must be upgraded in a specific order to maintain compatibility across versions and ensure the final upgraded system is secure and maintainable.

I found the TAM->SAM upgrade to be the easiest since the SAM products have traditionally used RPMs on RHEL systems.  The only gotchas I encountered doing these upgrades were a few minor settings in the WebSEAL config file that did not get properly set for the new version to start.  Also, SMS (Session Management Server) is incredibly difficult to upgrade in place without suffering a lengthy downtime.  The easiest thing was to stand up new instances of SMS on new servers, cut over to them, and remove the old servers.  The other SAM components could all be done in place without an issue.
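For what it’s worth, the way I caught those WebSEAL config gotchas was by diffing the running config against the new version’s template before starting the instance.  A rough sketch, assuming a default WebSEAL instance; the paths below are from memory and the RPM file names are placeholders, so adjust for your install:

# Upgrade the SAM packages shipped as RPMs (actual package names vary by component and version)
rpm -Uvh PD*.rpm

# Compare the running WebSEAL config with the template shipped by the new version
diff /opt/pdweb/etc/webseald-default.conf /opt/pdweb/etc/webseald.conf.template

# Restart the instance and watch the message log for complaints about missing stanzas
pd_start restart
tail -f /var/pdweb/log/msg__webseald-default.log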

TFIM was a bit tricky due to the underlying WAS upgrade.  IBM stated that TFIM WAS instances could not be upgraded in place.  Note to vendors: don’t ever tell me it can’t be done.  I found that by copying the TFIM configuration files to a temporary location, then uninstalling TFIM, upgrading WAS, and reinstalling TFIM, I could copy the configuration files back into their same spot on the DMgr (deployment manager), restart the DMgr, and voilà! it would upgrade the configuration.  This was done with minimal downtime.
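In rough outline, the procedure looked like the sketch below.  The TFIM configuration directory and the Dmgr01 profile name are assumptions based on default install paths; yours may differ.

# 1. Preserve the TFIM configuration from the deployment manager
#    (the /opt/IBM/FIM/etc location is an assumption; use wherever your config files live)
cp -a /opt/IBM/FIM/etc /tmp/tfim-config-backup

# 2. Uninstall TFIM, upgrade WAS, reinstall TFIM (done with the product installers, not shown here)

# 3. Copy the saved configuration back into the same spot on the DMgr
cp -a /tmp/tfim-config-backup/. /opt/IBM/FIM/etc/

# 4. Restart the deployment manager; the configuration is migrated on startup
/opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/stopManager.sh
/opt/IBM/WebSphere/AppServer/profiles/Dmgr01/bin/startManager.sh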

The TIM->SIM upgrade has been more of a challenge.  Due to the vast number of systems that tie into an enterprise identity manager, there have been a number of integration points that had to be tested and tweaked before the final production cutover.  Also, the ISIM API had significant changes to its authentication mechanisms, which forced a rewrite of code for any custom applications that utilize ISIM.  IBM also stated that the ITIM WAS instances could not be upgraded in place.  However, I found that by upgrading the WAS instances first and then upgrading ISIM, I was able to get the configuration working with minimal issues.  I should also mention we were on the latest WAS fixpacks when this occurred, as earlier fixpacks would not perform the application migrations correctly.

I don’t typically care for in-place upgrades of major product versions; however, I was given a resource and time constraint for all components involved and estimated that the upgrade path would be accelerated by doing in-place upgrades.  Because we had several development strings on which to perform the upgrades prior to production, I felt the upgrade instructions would be accurate enough to reduce the risk of performing in-place upgrades.  Thus far this decision has paid off in reducing the time and resources required for the upgrades.

Hopefully in the upcoming weeks I will have some more tips once the ISIM system is fully upgraded.  Then I need to take a much needed break to work on some personal weather projects so that I can get better data off my PWS (personal weather station) through these summer storms.

DB2 Migration and Paths

Today I assisted a coworker in our upgrade of DB2 from 9.5 to 9.7.  He had run into an issue where the instance migration utility kept failing.  Initially, the output of the instance migration utility was a simple “variable not found” error.  However, using the debug (-d) option on db2iupgrade, I discovered the true issue.  This brings up a good point: we can’t always trust the first error message we see on the screen.  A lot of times issues run much deeper than they appear.
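For reference, the debug run was simply the failing command (shown further below) with -d added:

/opt/ibm/db2/V9.7/instance/db2iupgrade -d -u db2fenc1 db2inst1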

After reviewing the debug information, I had to sift through IBM’s various instance scripts (written in bash) to discover why the procedure was having an issue.  I modified the scripts to output the exact command being run at the point where the issue occurred.  Unfortunately, I could run that command as the root user with no issues, which led me to believe there was an environment issue.  I began comparing the environments of both the root user and the DB2 instance owner between the server we were having issues on and servers that did not have the issue.  I quickly discovered someone had set the PATH variable to include a db2 bin directory.  Because the PATH variable included the previous version’s bin directory, the instance scripts were being run for the wrong version, hence causing things to error out during the instance migration.
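A quick way to spot this kind of discrepancy is to compare the instance owner’s PATH on a working server against the broken one.  A small sketch using our db2inst1 instance owner; the V9.5 path and home directory shown are assumptions based on a default install:

# Dump the instance owner's PATH on each server and compare
su - db2inst1 -c 'echo $PATH'

# On the broken server, the old version's bin directory showed up first, e.g.:
#   /opt/ibm/db2/V9.5/bin:/usr/local/bin:...

# Find where it is being set
grep -n db2 /home/db2inst1/.bash_profile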

After resolving this simple PATH issue (it ended up being set in the instance owner’s .bash_profile), I recalled some good advice from a senior coworker years ago.  He told me that the more complex a problem seemed, the greater the chance the underlying cause was very simple.  I have found this to be true in at least 90% of the issues I work on.  Now, onto the next challenge: TFIM plugin class loader problems and Identity Manager upgrades!

Also, for reference, here was the error we saw:

/opt/ibm/db2/V9.7/instance/db2iupgrade -u db2fenc1 db2inst1
...
/opt/ibm/db2/V9.7/instance/db2iutil: line 2498: DB2INSTVER: parameter null or not set
DBI1124E  Instance db2inst1 cannot be upgraded.
...
/opt/ibm/db2/V9.7/instance/db2iutil: line 2502: [: too many arguments

Today’s Weather

[Barn, Feb 22 2014] Today’s weather was a much needed break from the monotony of winter.  Blue sky, the grass is visible again, and the barn is open and airing out.  Beautiful!

[Sky, Feb 22 2014] Here is a view from my weather cam from today.  It has been breezy all day, which is really helping dry things out.

[Weather, Feb-22-2014]

The tower sensor in the barn was pretty close to the outside 5-in-1 sensor earlier. However, now that the sun is directly on the barn, it is a few degrees warmer in there. The Acu-Rite bridge allows for up to 3 sensors, including the 5-in-1. Technically I could add another tower sensor. From what I have read on forums, you can actually connect as many sensors as you want; they just won’t show up on the My Backyard Weather page. However, if you’re crafty and pull the data from the bridge, you can get the data for those additional sensors.
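I have not tried this myself, but from what I gather it boils down to capturing the plain HTTP updates the bridge posts upstream and picking the extra sensor IDs out of them. A rough sketch with tcpdump; the bridge’s IP address here is a placeholder:

# Print the ASCII payload of everything the bridge sends out over HTTP
tcpdump -i eth0 -A -s0 'host 192.168.1.50 and tcp port 80'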

Moving WordPress to Root

I found a couple of great resources for those who installed WordPress in its own directory but want to move it to root.  The first, Moving WordPress, talks about how to physically move all of the files.

However, I found another article, Giving WordPress Its Own Directory, which I followed parts of to quickly get WordPress working at my root directory while leaving it installed in its own directory.  It was rather simple:

  1. Log in to the admin console, go to General Settings, change the Site Address (URL) to your root domain (e.g., http://www.yourdomain.com), and save the changes.
  2. Copy index.php and .htaccess from your wordpress directory into your root directory (I used cp -a to ensure permissions stayed the same).
  3. Edit index.php in your root directory and, on the require dirname(__FILE__) line, add the name of the directory where WordPress resides (see the sketch below).
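For illustration, here is roughly what steps 2 and 3 look like from the shell.  This is a minimal sketch; the /var/www/html document root and the wordpress subdirectory name are assumptions based on a typical setup:

cd /var/www/html
cp -a wordpress/index.php .
cp -a wordpress/.htaccess .

# Point the copied index.php at the wordpress subdirectory:
#   before: require( dirname( __FILE__ ) . '/wp-blog-header.php' );
#   after:  require( dirname( __FILE__ ) . '/wordpress/wp-blog-header.php' );
sed -i "s#'/wp-blog-header.php'#'/wordpress/wp-blog-header.php'#" index.php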

Awesome!  This allowed me to install WordPress, get it working at www.mljenkins.com/wordpress, and then make it available at my root directory at www.mljenkins.com without having to go through the major headache of moving tons of files.  It also allows me to switch from WordPress to another solution fairly easily in the future.

Speaking of which, make sure you keep flexibility in mind when installing a new solution.  It’s best not to back yourself into a corner where you must use the same solution from now until eternity or suffer major downtime.  Designing your solution with flexibility in mind will allow you to part with it in the future if it gets too expensive (in money or time).