WebSphere® Application Server for distributed
WebSphere Application Server for z/OS®
Versions 3.5 through 6.0
Common malpractices
Eleven ways to wreck a deployment
Version date: April 14, 2006
Level: Introductory
IBM® Corporation
WebSphere Serviceability Team – SWAT
Katie Barrett (katie@us.ibm.com), Manager, WebSphere Serviceability, IBM Austin Lab
Yasuko Bryan (ybryan@us.ibm.com), WebSphere Serviceability, IBM Austin Lab
Alan Booth (aebooth@us.ibm.com), Manager, WebSphere Customer Support, IBM Raleigh Lab
David Cai (davecai@us.ibm.com), WebSphere Serviceability, IBM Pittsburgh Lab
Melinda Carter (mlcarter@us.ibm.com), WebSphere OS/390® – Level 2 Support, IBM Poughkeepsie Lab
Mark T. Schleusner (schleus@us.ibm.com), WebSphere Serviceability Development, IBM Rochester Lab
Rex Simmons (simmonre@us.ibm.com), WebSphere Knowledge Engineering, IBM Austin Lab
William Wentworth (wkwentw@us.ibm.com), WebSphere Information Development, IBM Austin Lab
Many thanks to the following people for their contributions to this white paper:
David Burdine, John Bukiewicz, and Paul Griffiths
WebSphere Application Server malpractices
Copyright ®2006, IBM Corporation
Table of contents
ABSTRACT
INTRODUCTION
MALPRACTICES IN THE PROJECT LIFE CYCLE
DEFINITION OF MALPRACTICES
In-planning stage (requirements - functionality and constraints)
1. No capacity or scalability plan
2. No education
3. No current architecture plan
Development stage (design, implementation)
4. Blind to an application error
Validation stage
5. No production traffic profile
6. No load or stress testing
7. No test environment = production environment
8. Changes put directly into production
Production or post-production stage
9. No migration plan
10. No record of changes
For all stages
11. Communication breakdown
PERCENTAGE OF OCCURRENCES OBSERVED IN 2005 WEBSPHERE APPLICATION SERVER EXTENDED SERVICEABILITY TEAM ENGAGEMENTS
Observation
SAMPLE STORY AND POSSIBLE SOLUTION FOR EACH MALPRACTICE
Planning requirements - functionality & constraints
1. No capacity or scalability plan
2. No education
3. No current architecture plan
Development-design and implementation
4. Application error
Validation to ensure that the software meets the needs
5. No production traffic profile
6. No load or stress testing
7. No test environment = production environment
8. Changes put directly into production
Production and post production
9. No migration plan
10. No record of changes
In all stages
11. Communication breakdown
IMPACT OF PRODUCTION OUTAGES TO YOUR BUSINESS
CONCLUSION
APPENDICES
Abstract
The WebSphere Serviceability Team sometimes visits customer sites to resolve complex situations.
Although on-site visits are not a regular practice, we extend our service when on-site support is absolutely
necessary. After each visit, we document what we did and observed on site. We often revisit our
database of these reports to determine how and why our customers got into trouble.
This white paper explains the practices and patterns that we have observed while assisting customers. The
information is intended to raise your awareness. Examples are provided to show how and why the trouble
occurred. We also explain possible solutions for each practice.
Introduction
Did your mother tell you "Do not run with scissors!" or "Brush your teeth"? Running with scissors is a bad
idea. Brushing your teeth is a good idea. If this white paper were about child raising, running with scissors
would be called a "malpractice." A malpractice is something that you should not do: a practice that can
eventually hurt you if you do it repeatedly. Brushing your teeth represents a best practice because you
benefit from doing it repeatedly.
This white paper talks about malpractices in the context of information technology (IT) organizations. The
information is based on our observations from working with customers in emergency situations over the last
five years. Companies increasingly depend upon their Web sites as part of their business operation. Any
downtime costs money, which makes time and tempers short. Getting the Web site up quickly is essential.
However, getting the Web site up while committing malpractices will backfire on you one day. We have
observed that two thirds of our customers with extended serviceability engagements had committed at least
one malpractice. We did not observe any malpractice in the remaining one third of our extended
serviceability engagements.
We do not think that you will wake up one morning and realize that what you are doing is actually a
malpractice. We do not think that you intentionally plan to commit one, two or even all eleven malpractices.
Maybe you have never done any of these malpractices. However, watch out when someone tries a short cut
as it might be a malpractice.
Katie Barrett
Manager, IBM WebSphere Serviceability Team
Malpractices in the project life cycle
We observed eleven malpractices in our engagements and have grouped them by stage of the WebSphere
Application Server life cycle. We divide the life cycle into the following four stages:
Planning
Development
Validation
Production and post production
The most common malpractices that we see in each stage are listed in the following diagram. The
observations spanned all of the WebSphere Application Server platforms, including distributed and z/OS.
Stages           Malpractice
Planning         No capacity or transaction plan
                 No education
                 No current architecture plan
Development      Application error
Validation       No production traffic diagram
                 No load or stress testing
                 No test environment = production environment
                 Changes were put directly into production
Evolution        No migration plan
                 No record of changes
For all stages   Communication breakdown
Definition of malpractices
This section defines the different types of malpractices.
In-planning stage (requirements - functionality and constraints)
This fundamental stage determines what the final application does and how the resources are allocated. Key
plans for the life cycle of the application are made or neglected in this stage.
1. No capacity or scalability plan
Part of the planning stage determines the amount of data that will flow into, through, and out of
the application. The type of data and the volume of that data will grow over time. The following
problems are typical of a customer who does not have a capacity or scalability plan:
No prediction of the needed production capacity or response time
No prediction of which transactions will occur, in what combinations, and how often
No consideration for the length of each transaction
No prediction of how many concurrent users will be in production
No periodic update to the plan as the market changes, for example, for the holiday season
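As a rough illustration of the predictions listed above, arrival rate, transaction length, and concurrency
can be related with Little's law (average concurrent requests = arrival rate x average response time). The
transaction names, rates, and response times below are hypothetical and are not taken from any engagement.
The sketch also assumes that response times stay constant as load grows, which a real plan would need to
verify.

```python
def concurrent_requests(arrivals_per_sec, avg_response_sec):
    """Little's law: average number of requests in the system at once."""
    return arrivals_per_sec * avg_response_sec


# Hypothetical transaction mix: name -> (arrivals per second, average response time in seconds)
normal_mix = {
    "browse": (40.0, 0.25),
    "search": (10.0, 0.5),
    "checkout": (2.0, 2.0),
}


def total_concurrency(mix, growth_factor=1.0):
    """Sum the expected concurrency over all transaction types.

    growth_factor scales every arrival rate, for example to model a seasonal peak.
    """
    return sum(
        concurrent_requests(rate * growth_factor, response)
        for rate, response in mix.values()
    )


print("normal load  :", total_concurrency(normal_mix))
print("holiday peak :", total_concurrency(normal_mix, growth_factor=3.0))
```

Rerunning such an estimate whenever the expected transaction mix changes is one simple way to keep the plan
current.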
2. No education
Time and resource constraints that keep staff from attending classes or studying educational
materials result in limited knowledge of the following topics:
Problem determination
Performance tuning
Available features
3. No current architecture plan
One or more of the following architecture conditions exist:
A diagram of the application flow between the various software products does not exist.
A diagram of where these products reside in the topology does not exist. Where is IBM DB2®
located? Where is WebSphere Application Server located? What is in the cluster?
A current architectural diagram does not exist. The only diagram that exists is based upon
the architecture from four years ago.
Development stage (design, implementation)
The development stage is the creation stage, which builds a foundation for the production stage. This stage
is more than just the original development of the application; it also includes all of the enhancements or
modifications to the application.
4. Blind to an application error
The following list explains some common issues:
Allocation of large objects causes the heap to grow too large.
Redundant computation calls cause increased CPU usage.
Infinite loops cause increased CPU usage.
Validation stage
This testing stage verifies that the application works as you planned in a complex environment. The
environment is similar to your production environment in terms of the network, hardware configuration,
back-end database, and load.
5. No production traffic profile
A diagram of the network, routers, switches, and hubs does not exist.
Data on the capacity of the networks or network segments does not exist.
6. No load or stress testing
Load or stress testing is not done.
Load or stress testing is done, but the testing is not based upon the production load.
The simulated transaction pattern does not accurately match the production transaction pattern.
The load is simulated accurately, but the back-end database size is significantly smaller.
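One way to keep a simulated transaction pattern close to the production pattern is to draw test requests
from the measured production mix. The following minimal sketch assumes a hypothetical mix (the transaction
names and shares are invented for illustration) and only generates the request sequence; driving a real
load tool with it is outside the scope of the sketch.

```python
import random
from collections import Counter

# Hypothetical production profile: share of each transaction type (assumed values)
PRODUCTION_MIX = {"browse": 0.70, "search": 0.20, "checkout": 0.10}


def build_load_script(mix, total_requests, seed=0):
    """Return a list of transaction names drawn in proportion to the production mix."""
    rng = random.Random(seed)  # fixed seed so that a test run can be repeated
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total_requests)


script = build_load_script(PRODUCTION_MIX, 10_000)
counts = Counter(script)
for name, share in PRODUCTION_MIX.items():
    simulated = counts[name] / len(script)
    print(f"{name:10s} target {share:.2f}  simulated {simulated:.3f}")
```

Comparing the simulated shares with the production shares before each test run gives a quick check that
the test still reflects what production actually does.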
7. No test environment = production environment
The test environment is too small or not available when it is needed.
The test hardware, network, and software are different from the production environment.
The z/OS LPAR on the same machine or network is not isolated from the production system.
The configuration settings for the test systems are different from settings for the production systems.
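Differences between test and production settings are easier to catch when the two sets of settings are
compared mechanically. The following minimal sketch assumes that the settings have already been exported
into simple key-value form; the setting names and values are hypothetical and do not correspond to actual
WebSphere Application Server property names.

```python
def diff_settings(test, production):
    """Return {key: (test_value, production_value)} for every setting that differs.

    A setting that is missing on one side is reported with None for that side.
    """
    keys = sorted(set(test) | set(production))
    return {
        key: (test.get(key), production.get(key))
        for key in keys
        if test.get(key) != production.get(key)
    }


# Hypothetical setting names and values, for illustration only
test_cfg = {"jvm.max_heap_mb": 512, "web.thread_pool.max": 50, "db.pool.max": 10}
prod_cfg = {
    "jvm.max_heap_mb": 1024,
    "web.thread_pool.max": 50,
    "db.pool.max": 40,
    "session.timeout_min": 30,
}

for key, (test_value, prod_value) in diff_settings(test_cfg, prod_cfg).items():
    print(f"{key}: test={test_value} production={prod_value}")
```

An empty result means that the two environments agree on every exported setting; it says nothing about
hardware, network, or data volume, which need their own comparison.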
8. Changes put directly into production
Configuration changes and fixes are put directly into the production system, and only then is it
discovered that the changes do not work.
No test environment exists that is equivalent to the production environment and that
can be used to simulate the problem or the load. The customer is forced to test the changes
live in the production environment.
Production or post-production stage
This stage is key to the life cycle of the application and lasts far longer than the original application
development. However, this stage is easy to overlook. The application is now in production, so problems
are very public and costly.
The software that the application is built upon evolves. Thus, co-requisite packages need upgrades and
integration testing. The hardware and operating system might be upgraded, and the application itself might
expand or interface with additional applications. All of these changes require adequate planning,
documentation updates, and plan completion. The following related issues have been observed:
9. No migration plan
Adequate time is not allocated for migration.
Not enough time is taken to investigate how new Java™ 2 Platform, Enterprise
Edition (J2EE) specifications affect the system.
Developers are unaware of the co-requisite software requirements.
10. No record of changes
No record of changes is kept. Changes are made under stress.
Multiple groups make changes to the environment without records and without
communicating with each other.
For all stages
11. Communication breakdown
Communication is critical in today’s business environment where things change rapidly. The key is
clear, clean communication between the right people. Miscommunication frustrates other people
and can lead to a misdirected effort. In some cases, we did not observe effective communication
among customer teams, between IBM and customers, or between customers and their various vendors.
Percentage of occurrences observed in 2005 WebSphere Application Server Extended
Serviceability Team engagements
The WebSphere Application Server Extended Serviceability Team works with a small percentage of WebSphere
Application Server customers. However, malpractices are very common among those customers and are often
directly related to the situation in which the customers find themselves. The following chart shows the
frequency of the malpractices for the WebSphere Application Server Extended Serviceability customers in
2005.
Note: Although the sample was taken from WebSphere Application Server customers, the malpractices are
not specific to WebSphere Application Server. For example, if you do not complete stress testing on
WebSphere Application Server, then you might not complete it on the other products or applications in that
same environment.
Frequency of Malpractice
No.  Malpractice                                    Frequency
1    No capacity or scalability plan                 6.43%
2    No education                                   13.57%
3    No current architecture plan                    2.86%
4    Application error                              18.57%
5    No production traffic profile                   6.43%
6    No load or stress testing                       7.14%
7    No test environment = production environment   19.29%
8    Changes put directly into production            7.14%
9    No migration plan                               3.57%
10   No record of changes                            5.71%
11   Communication breakdown                         9.29%
Observation:
“No test environment that is equivalent to production environment” occurs most often
among the customers who got into highly critical situations.
“Blind to application error” is the second most frequent malpractice.
“No education” is the third most frequent malpractice.
“No test environment that is equivalent to production environment” often triggers “Changes
put directly into production” and “No load or stress testing.”
Sample story and possible solution for each malpractice
Planning requirements - functionality & constraints
1. No capacity or scalability plan
Although a capacity plan is not absolutely necessary, it is a plan that you need if the e-business
ever needs to grow. A capacity plan is more than buying a bigger piece of hardware. That simple
approach will work for a while, but it probably is not the most cost-effective way to increase an
application’s capacity.
A good capacity plan includes a good High Availability (HA) plan. The need for an HA plan is not
unusual. Every business wants to get the most work out of the least amount of hardware for the entire
business day, and a business cannot accomplish those goals without HA and capacity plans in place. If
the entire application is put on a single piece of hardware, that hardware might be able to handle the
capacity for a while. However, if that piece of hardware goes down, the entire application stops
handling traffic and the business stops receiving income. If that single piece of hardware is replaced
by two smaller pieces of hardware that together handle the same capacity, the hardware cost is nearly
the same, but the chance of losing the entire application is much lower. Carrying this line of thought
further and in more detail is a good beginning for a pair of capacity and HA plans.
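The reasoning about one large machine versus two smaller machines can be made concrete with a short
calculation. This sketch assumes a hypothetical per-machine availability of 99% and assumes that the two
machines fail independently; neither figure comes from the engagements described in this paper. Note that
while one of the two machines is down, the application is still up but runs at roughly half capacity.

```python
def availability_pair(per_machine_availability):
    """Probability that at least one of two independent machines is up."""
    return 1 - (1 - per_machine_availability) ** 2


HOURS_PER_YEAR = 24 * 365
a = 0.99  # hypothetical availability of a single machine

single_outage_hours = (1 - a) * HOURS_PER_YEAR
pair_outage_hours = (1 - availability_pair(a)) * HOURS_PER_YEAR

print(f"one machine : total outage about {single_outage_hours:.1f} hours per year")
print(f"two machines: total outage about {pair_outage_hours:.2f} hours per year")
```

If failures are correlated, for example because both machines share a power supply or a network switch,
the benefit is smaller than this calculation suggests.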
Case study:
The following case study shows the importance of a good, well-researched capacity plan. The
customer thought that the client workload was not being properly distributed, which resulted in a
capacity issue. The customer had planned for enough hardware and had incorporated a good HA plan into
the capacity plan. However, the application was not able to handle more capacity as demand increased.
CPU, memory, and network utilization were all suspected as the problem.
When the code review of the application was completed, the problem was found in the application. The
client was making cell-scoped Java Naming and Directory Interface (JNDI) calls, so every client looked
up the enterprise beans through the deployment manager. Even though the application was spread out
across six nodes, the deployment manager was the bottleneck.
The point here is that a capacity plan is more than a hardware plan. Capacity planning needs to
take the application, the application deployment, and the application client use into account. When
the naming calls were restructured to go to cluster members, the problem went away and the
application was able to handle more traffic. The WebSphere Application Server Development Team
learned from this problem and now caches naming more strategically in response to client calls.
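The effect of funneling every lookup through one shared point can be illustrated with a small model. The
fix in this case study was to restructure the naming calls to go to cluster members; the sketch below
shows a different but related technique, caching lookup results on the client, simply to show how
removing repeated central lookups takes load off the shared component. The class names and the binding
name are hypothetical and are not WebSphere Application Server or JNDI APIs.

```python
class CentralNamingService:
    """Stand-in for a single shared lookup point (hypothetical, not a real API)."""

    def __init__(self, bindings):
        self._bindings = dict(bindings)
        self.calls = 0  # number of lookups that reached the central service

    def lookup(self, name):
        self.calls += 1
        return self._bindings[name]


class CachingClient:
    """Client that remembers lookup results instead of asking the central service each time."""

    def __init__(self, naming):
        self._naming = naming
        self._cache = {}

    def lookup(self, name):
        if name not in self._cache:
            self._cache[name] = self._naming.lookup(name)
        return self._cache[name]


central = CentralNamingService({"ejb/OrderHome": "order-home-ref"})
client = CachingClient(central)

for _ in range(1000):
    client.lookup("ejb/OrderHome")

print("lookups that reached the central service:", central.calls)
```

A cache like this trades freshness for load: if a binding changes, the client keeps the old reference
until the cache is cleared, so a real design needs an invalidation or expiry policy.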
Possible solution and preventive care:
When you put a capacity plan together, remember that the plan covers what both the hardware and the
application can handle. The biggest mistake in capacity plans is overlooking the application. With a
well-tuned application design, you can save hardware costs. Do not overlook what the application
uses.
Use the following checklist to prevent this malpractice:
What does the capacity plan for the back-end database say it can handle for an additional
load?
What can the network routers handle for increased network traffic?
What can the Lightweight Directory Access Protocol (LDAP) server handle for user
authentication?
If the application is spread out across a Wide Area Network (WAN), are remote calls kept to
a minimum?
Is part of the hardware also being used for disaster recovery or HA?
What is the maximum amount of hardware utilization before you need to provide additional
hardware?
Are object types being properly used for increased capacity?
o Entity enterprise beans with a caching policy for the bulk of client database reads
o Stateful enterprise beans have been passivated to a fast storage device
2. No education
WebSphere Application Server is built to comply with an open, but evolving, standard. Each new release
of WebSphere Application Server conforms to a succeeding standard of J2EE for application servers.
As the standards change, so do the succeeding releases of WebSphere Application Server.
Customers generally take the time to learn their first application server, but unfortunately a few
customers do not take the same time to learn the changes that have occurred in subsequent releases.
For example:
o What is new?
o What is deprecated?
o What is not there any more?
The typical reasons for not investing in education are “We do not have time to send someone away to
the class,” “Do not tell me about education now!” or “We cannot afford to have key people out of the
office.” The lack of education leads to the following production crises:
1. Migration problems: Each Application Server release evolves from the preceding release.
Customers who do not review their application for any changes it might need to run
efficiently under the new release might have the following problems:
Performance problems: What was efficient in release n might not be optimal in release n + 1 or n + 2.
Functional problems:
The function that is used in release n might be deprecated in release n + 1 and removed in
release n + 2.
New function is not used. This is a missed opportunity because each enhancement reflects
increasing capability that customers requested and that the new J2EE standards reflect.
The original application was not J2EE compliant, but the release to which the
customer is moving only handles J2EE compliant applications.
2. Integration problems: Each release has established explicit levels of pre-requisite or co-requisite
software. If you do not check and migrate to the pre-requisite and co-requisite software, you might
encounter problems that affect the performance of your application.
3. Frustration, which usually shows when WebSphere Application Server does not work as the
customer expects.
Case study:
A toy company started their online site using WebSphere Application Server Version 3.5. The
company successfully moved to version 4.0 using the version 3.5 Web server plug-in without making
any application changes. Now, version 4.0 is out of support. Their online business has also grown, so
they need the added performance of version 5.1 or version 6. The company decides to move to
version 6 because it would provide them with more time on a single level of WebSphere Application
Server.
The company buys WebSphere Application Server Version 6, installs it, and then deploys their
application. However, the application does not deploy and nothing works. The company opens several
Problem Management Records (PMRs), tells IBM Support that their application used to work
correctly, and believes that WebSphere Application Server must be the culprit because they did not
change their application. They are frustrated.
Were they aware that WebSphere Application Server Version 5.1 and Version 6.0 conformed to the
J2EE standard? Yes, but because the application ran on Version 4.0, which also was J2EE compliant,
they believe that the problem has to be with WebSphere Application Server.
Did they investigate what is in WebSphere Application Server Version 6, which is their target version
level? Did they check all of the changes that are required to move from the version 4 plug-in to
version 6? - No
Did they allow any time to learn the WebSphere Application Server enhancements and changes? - No
Did they plan to learn about these enhancements and changes? - No
Possible solution and preventive care:
Allow time to periodically review the application for the following changes:
o Changes in business needs
o Sections that are affected by J2EE capability changes
This review requires time to learn about the new functions and the deprecated
functions in the new WebSphere Application Server releases. Education no longer implies that
you must go to the classroom or decipher a manual by yourself. IBM now has the following free
educational opportunities:
a. IBM Education Assistant:
The IBM Education Assistant (IEA) enables you to learn from your desktop without going to a
classroom. There are many educational plug-ins that are available for Rational®, WebSphere, and
Lotus® products, including WebSphere Application Server and WebSphere Portal Server. For more
information, see the following Web site:
http://www.ibm.com/software/info/education/assistant/
b. Technical Exchange:
The WebSphere Support Technical Exchange is another free learning opportunity. These learning
opportunities involve live presentations that are open to all customers on topics of interest. You
can suggest topics for these sessions. All sessions include a question and answer time and
presentations are done several times per month. Previous talks are available for viewing after the
live presentation. For more information, see the following Web site:
http://www.ibm.com/software/websphere/support/supp_tech.html
c. Useful technotes