Archive for October, 2007

CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY 1. Configure (Web host 4 life)

Friday, October 12th, 2007

1. Configure a large heap, and spend time tuning the heap to maximize its performance to support such large sessions.
2. Perform an architecture review on their application, and set forth a plan to reduce the session size.

When application problems are identified, the problem should be triaged to the correct application and the correct component. Again, the goal is to reduce the amount of time wasted by the involvement of unnecessary parties: if the problem is in the data persistence layer, then you do not need to take time away from the visualization team. But be aware that problems are usually not absolutely definitive, so you need experience, skill, and a good diagnostic tool to be able to identify offending components. And yes, performance problems can easily span multiple components, but as the Java EE administrator, you need to isolate the problem as much as you can.

Note: A common situation I see in the field is that companies are beginning to build new common application infrastructures, consolidating numerous separate departmental or other types of application deployments into a single, high-powered infrastructure. This facilitates common tooling, deployment, and management practices, but it also presents major challenges around identifying which application and component in a large pool of applications causes a particular issue. This situation is where detailed application- and transaction-level isolation is paramount. And SOA further complicates this management problem.

Non-Java EE troubleshooting at this phase follows a similar approach: the administrator must determine whether she can solve the problem or whether the problem lies in details outside of her control. For example, a DBA might determine that the root cause of a performance problem lies inside a stored procedure that he does not own. After analyzing the explain plan, he determines that he can create additional indices to mitigate the impact of the problem. He makes the changes, and then forwards the problem to the database developer responsible for the stored procedure for the true fix. Similarly, in some packaged application environments where the application source code cannot be modified, DBAs still have the ability to configure the database to interpret bad SQL code in a more efficient way, so the ability to see that bad SQL code and automatically evaluate all possible alternatives is essential.

The important thing in level 2 support is to clearly define the person in each technology tier who is responsible for handling these problems. When the NOC staff finds a problem and triages it to a specific technology, they must have a specific individual to forward that problem to.

Level 3 Support

From a Java EE perspective, level 3 support consists of application support engineers: programmers responsible for maintaining application code and troubleshooting bugs. Development organizations typically maintain groups of developers building the next release(s) of their applications and groups that support the existing release(s). Maintaining these two distinct groups is important, because all applications have bugs, and usage patterns can never be completely anticipated, so application releases have to be supported. Each time a support issue arises, you do not want it to affect the schedule of the next release.
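The session-size plan at the start of this excerpt presupposes some way to see which session attributes are large. The following is a minimal sketch of one possible approach, not the author's method: it serializes each attribute and counts the bytes. The class name `SessionSizeSketch` and the use of a plain `Map` in place of a real HTTP session are assumptions made for illustration, and serialized size is only a rough proxy for heap footprint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: estimates session attribute sizes by serializing them. */
final class SessionSizeSketch {

    /** Returns the number of bytes produced by Java serialization of one value. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream unless the value fails to serialize.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /**
     * Maps each attribute name to its serialized size, preserving iteration order.
     * The map parameter stands in for the attributes of a real session object.
     */
    static Map<String, Integer> sizeByAttribute(Map<String, ? extends Serializable> attributes) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, ? extends Serializable> entry : attributes.entrySet()) {
            sizes.put(entry.getKey(), serializedSize(entry.getValue()));
        }
        return sizes;
    }
}
```

Running such a measurement against representative sessions in a test environment would give the architecture review a ranked list of attributes to trim first.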

Web hosting bandwidth - 308 CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY

Thursday, October 11th, 2007

Logging: The support ticket is logged for other support representatives to see when they receive calls.
Tracking: Tracking (that is, following a support ticket from inception to resolution) is important to ensure that it does eventually attain closure; only through tracking can support representatives determine whether an issue has already been captured.
Analysis: The number of support tickets and the specific areas experiencing problems are gauges of the health of the performance of an application; an analysis of support tickets may help justify replacing an application server vendor, network devices, and so on. Additionally, this analysis will quantify improvements in mean time to resolution and other core indicators related to the organizational process and tools used for triaging production issues.

Remember that in order to learn from mistakes and optimize your production support workflow, tracking and analyzing all support issues that arise is of the utmost importance and is an ongoing process that must continually improve.

Level 2 Support

Level 2 support is composed of technology administrators, each responsible for his or her own technical stack. On the Java EE side, the Java EE administrator is the level 2 support. To review, the Java EE administrator is responsible for all application server instances, deployment topology, and configuration options; therefore, this person determines whether a particular alert is the result of an application issue or a container issue. More specifically, the Java EE administrator does the following three things:

Determines if the long-term fix for the reported alert is application or configuration related
Triages the alert to the appropriate application and component owner, if the problem is application related
Determines if initiating a short-term configuration change can mitigate the impact of the alert

Determining the root of a problem is important, but determining whether a short-term solution can mitigate the impact of the problem until a proper solution can be implemented is equally so. If only a long-term solution is considered, it may be underarchitected and poorly implemented in the interest of providing a solution to meet SLAs. For example, I have been at several customer sites where HTTP sessions were unnecessarily large, which led to poorly performing garbage collections and even out-of-memory errors. The proper solution to this session-size problem is to refactor the session implementation and associated code to reduce the amount of data stored in the session, but this refactoring is a major undertaking that could require months to implement properly. Simply identifying the core problem was not enough to both satisfy user requirements today and accurately build and test a long-term solution. Therefore, I helped them implement the following two-phase plan:

CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY In the (Web design software)

Wednesday, October 10th, 2007

In the best-case scenario, your alerts encompass the majority of performance problems, so that your early defense system can identify them before SLAs are violated and users are affected. If a performance problem slips past your early defense system, the next catchall detection tool is your SLA monitoring. SLAs are configured to meet user requirements, so you should not receive user complaints while you are satisfying SLAs. Once SLAs are violated, users may start complaining, but if your SLA monitoring solution alerts you to the problem early enough, then you can either resolve the problem completely or at least inform users who do complain that your organization is aware of the problem and actively seeking a remedy.

Note: Being able to tell upset users that you are already aware of the problem and seeking a solution always reflects better on your organization's technical abilities than being caught off guard. Although this does not necessarily meet their immediate needs well, consider the alternative: acting surprised that the problem occurred and asking the user to describe the symptoms. While the user may feel some degree of personalization because you are looking at his or her specific problem, the typical reaction after such a user hangs up the phone is to complain about your organization's incompetence and reflect on how you should hire him or her to solve your problems. The user response when you acknowledge the problem is either neutral or maybe mild annoyance, while the response when you do not know about the problem is dramatically negative. All you can do is choose the lesser of two evils!

When users trigger your alerts, the impact is more severe, and the mechanism that they use to alert you can be an indicator of how severe the problem is: if they click a support link or send an e-mail, then there is a legitimate problem, but by the time they call your support number, they are typically irate. You still have to quickly address e-mail complaints, but the urgency is not as great as when calls are coming in!

Obviously, identifying performance problems before your users are affected is preferable, but if users do find problems for you, doing a postmortem analysis on the alert is important. Specifically, you want to identify the symptoms of the problem just prior to user complaints by looking at the historical data that your monitoring solution captured, and correlate the behavior of the system to the root cause of the problem that you discover. With any luck, you will be able to develop a new intelligent alert, so that you will be better equipped to detect this problem in the future.

In general, production issues can be categorized as either intermittent or persistent. Level 1 support must be prepared to detect and triage both. Intermittent issues, while typically lower in priority than similar types of persistent issues, tend to be more challenging to capture and require monitoring technology that can be set up to trigger alerts and take actions based on advanced rules using combinations of measurements from across multiple tiers of an application environment. Additionally, the monitoring tool must store diagnostic and other details for historical diagnosis as well as trending and capacity planning.

The final activity that level 1 support must perform when responding to an alert is to open a support ticket and initiate alert tracking. This is important for the following three reasons:

306 CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY Intelligent (Java web server)

Tuesday, October 9th, 2007

Intelligent alerting is the key to staying several steps ahead of your users, but it is also the most difficult part of monitoring. The best way to build intelligent alerts is to bring together an expert from the NOC, who understands and can configure the monitoring solution, with the chief architect for the project, who maintains detailed knowledge of each tier in the technology stack. The chief architect is able to define intelligent alerts, while the NOC expert can articulate those rules to the monitoring solution. Many times the chief architect finds it helpful to include representatives from various areas in the technology stack in such a discussion, as his or her vision for intelligent alerts must be substantiated by specific low-level pieces of data. As an example, the chief architect may state that an alert is needed when user requests are pending and the response time varies atypically; the Java EE system administrator can translate that into an alert that fires when queue depth is greater than 5 and the standard deviation of response times differs by more than 25 percent from its average. The chief architect has the vision, while technology administrators have the knowledge both to advise the chief architect and to realize his or her vision.

Level 1 Support

The principal responsibilities of the first level of support are to triage the alert to the appropriate technology domain and track the status of the alert throughout its life cycle. The NOC, or production support help desk, representative answers the following questions:

How was the alert triggered?
What technology component owns the alert?
Are users being affected yet?
What is the severity of the alert?
Who is the technology administrator responsible for handling the alert?

In other words, the NOC representative must establish the context of the alert by answering each of the aforementioned questions. The trigger of the alert is a leading factor in establishing the severity of the alert, but calling the trigger out explicitly on its own is important. Some common triggers include the following, in order of severity from lowest to highest:

Intelligent alerting: These early warning alerts are identified by the monitoring solution.
Violated SLA: SLAs can be evaluated through two mechanisms: passive and active. In the passive monitoring of SLAs, the monitoring solution watches live user requests as they occur and records metrics about those requests. In active monitoring, the monitoring solution generates synthetic transactions that target key points in the application. If either of these indicates that SLAs are violated, then an alert is triggered in the monitoring solution.
User e-mail: One or more users can send e-mail to support indicating poor performance or the failure of a piece of functionality.
User phone calls: Users are dissatisfied enough that they call support directly.
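The severity ordering of the triggers listed above can be captured directly in code. The following is a small illustrative sketch, assuming the hypothetical names `AlertTriggerSketch` and `Trigger`; it relies only on the fact that Java enums compare by declaration order.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical model of the alert triggers and their relative severity. */
final class AlertTriggerSketch {

    /** Declared from lowest to highest severity, matching the order in the text. */
    enum Trigger {
        INTELLIGENT_ALERT,
        VIOLATED_SLA,
        USER_EMAIL,
        USER_PHONE_CALL;

        /** True when a user, rather than the monitoring solution, raised the alert. */
        boolean isUserInitiated() {
            return this == USER_EMAIL || this == USER_PHONE_CALL;
        }
    }

    /** Returns the most severe trigger among those observed for one incident. */
    static Trigger mostSevere(Collection<Trigger> observed) {
        if (observed.isEmpty()) {
            throw new IllegalArgumentException("at least one trigger is required");
        }
        // Enums are Comparable by declaration order, so max() yields the highest severity.
        return Collections.max(observed);
    }
}
```

A help desk tool built along these lines could record every trigger seen for an incident and use the most severe one when setting the ticket's priority.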

Web site development - CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY The NOC

Monday, October 8th, 2007

The NOC (level 1 support) receives the trigger, opens a production support case, and triages the alert to the appropriate technology administrator (level 2 support). While the NOC is responsible for triaging and tracking support cases, the technology administrator is the first tier that can attempt to resolve the problem. For example, if the issue is triaged to a specific database, then it should be forwarded to the database administrator for the offending database; to avoid another bottleneck (created by sending a database issue to all database administrators), the alert is forwarded with context to the appropriate database administrator.

The technology administrator's role is usually confined to nonapplication changes (configuration changes). For example, a database administrator can set up new indexes or move data stores to different hard drives, but unless the administrator also wears the database application support engineer hat, he or she should not change the underlying SQL code. If the technology administrator cannot resolve the issue through the configuration of the technology, then he or she identifies the offending application and forwards the alert to the appropriate application support engineer (level 3 support).

The application support engineer is a technical representative in his or her particular domain; for example, an application support engineer in the application tier has development experience and may have been involved in the team that originally built the application. He or she has the ability to change code to resolve issues, but if the problem is deemed architectural in nature, requires a feature modification, or simply cannot be fixed by the application support engineer, then the alert is elevated to the appropriate development team (level 4 support).

The development team may be local or off-site, but one architect or team lead should be the point of contact within the development team that the application support engineer engages for such alerts. The team lead, then, can determine whom to remove from current development efforts to resolve the problem. In this tier the most visible impact on the organization can be observed: developers working either on a new project or the next version of an existing project must delay their work to resolve a production problem. This delay can impact release schedules and feature sets, which can lead to losses in competitive edge, and hence sales and revenue.

Figure 11-1 illustrates the path that a Java EE application-related alert travels through the production support workflow, but other technologies follow a similar pathway. The key to optimizing this workflow is to identify each tier in your environment and determine how far each alert could conceivably travel before being resolved. Then optimize the workflow accordingly by defining roles at each major checkpoint between the NOC and that final alert recipient. These checkpoints should occur in technology locations where the issue can either be resolved by an individual or forwarded to the next tier.

Triggers

Triggers come in one of two flavors:

User-initiated triggers
Intelligent alerts

User-initiated triggers are particularly bad: your users observed errors or poor performance in your application before you did! If alerts are properly constructed, user-initiated triggers can be avoided. That is not to say that users will not complain, but rather that you should know about the problem before they do, so that you can be working on a solution when they contact you.
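The four-level path described in this excerpt can be sketched as a simple escalation loop. This is an illustrative model only: the names `EscalationSketch`, `SupportLevel`, and `route` are hypothetical, and the `canResolve` predicate stands in for the human judgment made at each checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical model of the production support escalation path. */
final class EscalationSketch {

    enum SupportLevel {
        LEVEL_1_NOC,
        LEVEL_2_TECHNOLOGY_ADMINISTRATOR,
        LEVEL_3_APPLICATION_SUPPORT_ENGINEER,
        LEVEL_4_DEVELOPMENT_TEAM;

        /** The next level to escalate to, or empty when already at the last level. */
        Optional<SupportLevel> next() {
            SupportLevel[] all = values();
            int nextIndex = ordinal() + 1;
            return nextIndex < all.length ? Optional.of(all[nextIndex]) : Optional.empty();
        }
    }

    /**
     * Returns every level the alert passes through. Level 1 only triages and
     * tracks, so resolution is first attempted at level 2; the alert stops at
     * the first level that can resolve it, or at level 4 if none can.
     */
    static List<SupportLevel> route(Predicate<SupportLevel> canResolve) {
        List<SupportLevel> path = new ArrayList<>();
        SupportLevel current = SupportLevel.LEVEL_1_NOC;
        path.add(current);
        Optional<SupportLevel> next = current.next();
        while (next.isPresent()) {
            current = next.get();
            path.add(current);
            if (canResolve.test(current)) {
                break;
            }
            next = current.next();
        }
        return path;
    }
}
```

Recording the returned path on each support ticket is one way to see how far alerts typically travel, which is the information needed to decide where checkpoints belong.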

304 CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY The (Web hosting resellers)

Sunday, October 7th, 2007

The Production Support Workflow

When an alert is triggered, either by user feedback or an intelligent alert, it is sent to the NOC, which triages the issue to the appropriate technology administrator. The issue is then either resolved or forwarded down the chain to the next level of support until the problem is resolved. Figure 11-1 provides a visual representation of the production support workflow.

Figure 11-1. The production support workflow

The trigger that starts the production support workflow is either user feedback or an intelligent alert. The more intelligent and accurate your alerts are, the fewer user complaints you will receive. Being alerted to a problem by an early warning system is much better than hearing about it from your customers! An early warning system should include synthetic transactions and some mechanism for measuring real user experience on the desktops as well.
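As a rough illustration of what a synthetic transaction involves, the sketch below times a scripted action and compares the elapsed time with an SLA limit. Everything here is an assumption made for the example: the names `SyntheticTransactionSketch` and `ProbeResult`, the `Runnable` standing in for a real scripted request, and the injected clock, which is there only so the logic can be exercised without real delays.

```java
import java.util.function.LongSupplier;

/** Hypothetical probe that runs one synthetic transaction against an SLA limit. */
final class SyntheticTransactionSketch {

    /** Outcome of a single probe run. */
    static final class ProbeResult {
        final boolean succeeded;
        final long elapsedMillis;
        final boolean slaViolated;

        ProbeResult(boolean succeeded, long elapsedMillis, boolean slaViolated) {
            this.succeeded = succeeded;
            this.elapsedMillis = elapsedMillis;
            this.slaViolated = slaViolated;
        }
    }

    /**
     * Runs the transaction once and measures it with the supplied millisecond clock.
     * A failed transaction counts as an SLA violation regardless of its duration.
     */
    static ProbeResult probe(Runnable transaction, long slaMillis, LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        boolean succeeded = true;
        try {
            transaction.run();
        } catch (RuntimeException e) {
            succeeded = false;
        }
        long elapsed = clockMillis.getAsLong() - start;
        return new ProbeResult(succeeded, elapsed, !succeeded || elapsed > slaMillis);
    }
}
```

In practice the clock argument would be `System::currentTimeMillis`, and a scheduler would call `probe` at a fixed interval and raise an alert whenever `slaViolated` is true.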

CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY Level

Saturday, October 6th, 2007

Level 1 support: Production support help desk or network operations center (NOC) personnel
Level 2 support: Technology administrator
Level 3 support: Application support engineer, application maintenance engineer, or performance engineer
Level 4 support: Architect or developer

The production support help desk of an NOC consists of a team of individuals responsible for identifying performance issues throughout the organization, including Java applications, application servers, databases, mainframes, operating systems, hardware, networks, load balancers, routers, firewalls, and so on. They are truly Jacks-of-all-trades and masters of both the monitoring tool and the escalation process. They have one of the best views of the entire organization's technology stack and are experts at managing their monitoring tools and triaging issues to the appropriate second tier.

Technology administrators are responsible for the performance and availability of a specific piece of the technology stack. This group includes Java EE administrators, DBAs, system administrators, network administrators, and the like. Any significant technology stack in your organization needs to have an identified administrator for that technology. In the ideal case, an individual or individuals should be dedicated to the technology administrator's role, but depending on the size of your organization, this role may be another hat that someone must wear.

As the third level of support for alerts, application support engineers are responsible for the performance of individual applications or technologies. Application support engineers can exist in any tier running code. For example, a database application support engineer maintains extensive knowledge of stored procedures and functions that are used by an application, while a Java EE support engineer maintains extensive knowledge of a component or components. The point is that application support engineers have not necessarily developed the application but have detailed enough code and architecture knowledge about the application's function to fix bugs and make minor modifications to its behavior. Additionally, their job responsibility includes the ability to isolate code-level issues to the specific area within the code, such as the method, or to identify architectural issues that require code or configuration changes.

Finally, the architect, or developer, is a member of the development team who has deep knowledge of the applications; this person may be a technical lead or technical owner of individual application components. In addition to maintaining intimate knowledge of application code, the architect also has the authority to change code or delegate the responsibility to another team member.

Note: While ideally your organization should have an individual dedicated to each role, these roles can be considered logical. For example, a single individual may fill the database administrator role (configuring tables, indexes, and so on) and the database application support engineer role (managing and maintaining the stored procedures and underlying data). The important thing is that you identify an individual in each of these areas to handle production issues.

Web hosting control panel - 302 CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY An

Friday, October 5th, 2007

An enterprise monitoring solution must also provide a depth of knowledge across a breadth of technologies. Recall that enterprise applications are composed of a series of tiers that interact to solve a business process; they may include one or more Web servers, application servers, databases, operating systems, firewalls, load balancers, messaging systems, legacy systems, Web services, and other external dependencies, as well as network devices to facilitate communications between them. With so many different technology stacks at each tier, no single individual can maintain an up-to-date mastery of them all. Therefore, your monitoring solution needs a depth of knowledge about each layer of each technology stack. For example, monitoring only the application server itself is not enough: this monitoring may point to symptoms of the problem, but without knowing which components it interacts with, the cause can be elusive. The monitor must also provide dedicated user interfaces for the separate core administration groups (for example, database administrators, application administrators, and help desk personnel); without this, these administrators will simply default to their own custom tools.

Combining these requirements, we have a 24×7, unattended monitor that provides a depth of monitoring across a breadth of technologies with intelligent alerts that rise above isolated thresholds to assess the impact of observed behavior on business processes. Mixing such a tool with a proven methodology will maximize your troubleshooting efficiency and minimize application downtime and lost revenue.

Note: Benjamin Franklin is often misquoted as saying "Jack of all trades, master of none." Rather, he said "Jack of all trades, master of one," meaning that a cultured person knows something about everything and everything about one thing. Knowing a little about everything is great, but you should have an area of expertise, which is an area where you know all. Therefore, from a business perspective, you should understand how all of these disparate pieces fit together to solve a business problem, but you also need a specialization in one area where everyone turns to you for answers. And of course, from a personal perspective, if you are a Jack-of-all-trades but a master of none, you might consider what is valuable to you and find your one specialization, but we'll leave that for another discussion.

Production Support Methodology

Production support methodology is based upon configuring intelligent alerts, specific to an individual enterprise environment, and then identifying the optimal path through the support and development organizations to deliver the alert to the appropriate individual or group. The goal, met through using your triaging process, is for an alert to reach the appropriate individual or group to handle that alert without involving unnecessary groups or individuals. Meeting this goal optimizes resolution time and minimizes the impact of production issues on the organization itself, because, for example, a DBA is not involved in troubleshooting an application server issue.

Roles of Support Personnel

Before following an alert through this methodology, let us meet the players:

CHAPTER 11 PRODUCTION TROUBLESHOOTING METHODOLOGY Prerequisites Before (Web hosting providers)

Thursday, October 4th, 2007

Prerequisites

Before diving into the production support methodology, a set of monitoring tools needs to be in place if the methodology is to be effective. Without monitoring tools in place, you have no early defense system, and the first casualties of poor performance will be your users. As an analogy, consider why you have smoke detectors in your house. If a fire breaks out in the middle of the night, you want to be woken up before the fire reaches your bedroom so that you can evacuate. The smoke is a leading indicator of a fire, and your smoke detector is your monitor. The same is true of enterprise applications, and from the perspective of this book, the tragedy is poor performance.

In order for your monitoring tools to be effective, they must exhibit the following qualities:

They must monitor your environment 24×7.
They must support intelligent alerting.
They must exhibit a depth of monitoring across a breadth of technologies that spans, at minimum, end-user experience (both real and synthetic), application servers, and database servers.

Real-time visualization tools provide valuable insight into the internal workings and performance of your primary technologies and contribute significantly to resolution efforts, but unless you plan on maintaining three full-time shifts of highly paid workers monitoring these tools 24 hours a day, 7 days a week, you need more. Unattended 24×7 monitoring is the core requirement for any monitoring software. If a problem occurs at 2:00 AM when no one is watching the environment, the monitoring system must detect the problem and alert the appropriate party. Additionally, the monitoring system has a critical requirement to store sufficient historical detail about problems to enable postmortem diagnostics, so that reproducing production issues in a test environment is not always required.

While simple threshold alerting may be valuable to an individual administrator (for example, an execution queue depth of 7 is valuable to a Java EE administrator), a high-level monitoring solution that watches an entire enterprise needs deeper and more intelligent alerts. An intelligent alert acquires and correlates metrics from multiple sources and derives discernible business values from them. It answers the following question: how is this particular condition affecting users? For example, an execution queue depth of 7 alerts Java EE administrators to a backup of requests and hence a degradation in performance. If the requests still exhibit a response time buffer of 25 percent, then users are not affected, and the problem is not too severe.

Additionally, when a problem occurs that does affect end users, you need to understand how to derive the root cause. The symptoms of a response-time degradation may include missed SLAs, an aggravated queue depth, CPU spikes, and increased garbage collection pause times. Correlating these apparently disparate metrics yields the following conclusion: because the application cycles objects, garbage collections are extended, which causes CPU spikes that delay processing and cause the queue depth to increase. Identifying the root of the problem requires understanding the interaction between these components. Individually, each metric could send you down a different diagnostic pathway, but combined, they reveal the true nature of the problem.

Intelligent alerts must include the ability to define rules that minimize false alarms (otherwise, administrators will either ignore the monitoring system alerts or turn them off). And finally, intelligent alerts must allow for combinational logic across various domains of technology.
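The queue-depth example above can be expressed as a small rule that combines two metrics instead of testing one threshold in isolation. This is a minimal sketch under stated assumptions: the names `IntelligentAlertSketch` and `evaluate` are hypothetical, and "response time buffer" is interpreted here as the fraction of the SLA response time that is still unused, which the text does not define explicitly.

```java
/** Hypothetical rule that correlates queue depth with remaining SLA headroom. */
final class IntelligentAlertSketch {

    enum Severity { NONE, WARNING, CRITICAL }

    // Illustrative thresholds taken from the example figures in the text.
    static final int QUEUE_DEPTH_THRESHOLD = 7;
    static final double MIN_RESPONSE_TIME_BUFFER = 0.25;

    /** Fraction of the SLA still unused; 0.25 means 25 percent headroom (assumed definition). */
    static double responseTimeBuffer(double observedMillis, double slaMillis) {
        if (slaMillis <= 0) {
            throw new IllegalArgumentException("slaMillis must be positive");
        }
        return (slaMillis - observedMillis) / slaMillis;
    }

    /**
     * A deep queue alone yields only a warning; it becomes critical when the
     * response-time headroom has also fallen below the minimum buffer.
     * A shallow queue yields no alert from this rule; direct SLA violations
     * are assumed to be caught by separate SLA monitoring.
     */
    static Severity evaluate(int queueDepth, double observedMillis, double slaMillis) {
        if (queueDepth < QUEUE_DEPTH_THRESHOLD) {
            return Severity.NONE;
        }
        double buffer = responseTimeBuffer(observedMillis, slaMillis);
        return buffer >= MIN_RESPONSE_TIME_BUFFER ? Severity.WARNING : Severity.CRITICAL;
    }
}
```

Because the rule needs both conditions before it escalates, it also illustrates how combining metrics can reduce false alarms compared with alerting on queue depth alone.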

300 CHAPTER 11 (Web server) PRODUCTION TROUBLESHOOTING METHODOLOGY Performance

Wednesday, October 3rd, 2007

Performance Issues in Production

When a performance issue occurs in a production application, the costs can be severe, measured both in terms of the resolution costs and in terms of revenue loss. When an application is unavailable or underperforming, the revenue loss can be quantified in the following three categories:

Business-to-consumer applications: Poor performance can lead to site abandonment and a loss of confidence in your organization.
Business-to-business applications: Poor performance can lead to a loss of confidence in your technical abilities, loss of contractual revenue through violated SLAs, and in the worst case, the loss of a business partner.
Intranet applications: Poor performance can lead to a loss in productivity, as your employees spend more time waiting and less time working.

The impact of this revenue loss is in direct proportion to the significance of the performance problem and the resolution time. In addition to external losses, each individual involved in troubleshooting the cause of the performance issue loses productivity. In the case of a long-running problem that is not properly managed and consumes efforts from multiple resources, the loss can be measured by delayed development schedules or changes in the scope of product delivery, which can cause a loss of competitive edge in the marketplace. For example, consider a performance problem that takes developers away from their primary development responsibilities for two weeks. The product management team now has a decision to make: is the product released on time with missing features, or does the product release date slip? In the former case, your sales force may lose sales opportunities, because your competitors' products have features that you did not have time to implement while you were busy troubleshooting performance issues. In the latter case, slips in release dates may force your prospective customers to buy products from your competition. Regardless of where you experience a loss, the loss is real and quantifiable.

Therefore, reviewing the way many corporations handle production performance issues is beneficial. Corporations all too often troubleshoot production problems by assembling a war room containing the leads of all teams. While the intent is to quickly identify the cause of the problem, the result is usually an activity I like to call a finger-pointing face-off. The application architects point to the database administrators, who point to the system administrators, who point back at the architects. In a flurried attempt to absolve themselves of blame, these otherwise talented individuals waste valuable time and resources. Rather than being any individual's character flaw, this behavior is the result of an environment that has been cultivated by the lack of a formal production workflow process.

A formal process that everyone knows needs to be in place, so that problems are rapidly and accurately triaged to the appropriate party for resolution. For example, if the problem is in the database configuration, then the application architects do not need to be involved, but if the problem is in the application code, then they very well may be involved. Only through a repeatable and proven process can resolutions be rapid and directed, downtime be minimized, and revenues be saved. Again, the cliché "An ounce of prevention is worth a pound of cure" applies: put the tools in place and build a problem-solving process around them before you have problems to solve.