Author: Michael Cowan
Updated: October 2024
Audience: System Engineers, Engineering Leadership, etc.
Products Applicable: Jama Connect®
Use Case
This article examines Jama Software's internal testing practices and provides insights for Jama Connect self-hosted customers on effectively scaling and testing within their environments. We will outline key considerations and share best practices and troubleshooting tips.
If you are using Jama Connect Cloud (our SaaS-based solution), you are already in an environment where we actively monitor and maintain optimal performance. If you are currently self-hosted and want more information about our SaaS offering, please get in touch with your Customer Success Manager.
Whether new to Jama Connect or a seasoned administrator, the information below will help you maximize its benefits as you scale it across your organization.
Understanding the Scaling of a Self-hosted Jama Connect Instance
It is essential to recognize that Jama Connect® is not a static web application with a few isolated workflows. It provides a wide range of features (Reviews, Baselines, Reports, Comment Stream, Test Management) with a rich set of customization options, giving organizations considerable flexibility in how they set up and use Jama Connect®.
While this flexibility is powerful, how Jama Connect is configured can impact performance just as much as the hardware you install it on.
While Jama Connect supports organizations with unique or heavy data/usage requirements, some effort will be required to tune the system so that performance is acceptable for all users.
Jama Connect® Enterprise Performance Recommendations for System Administrators
Multiple areas can impact Jama Connect's performance, and Jama Connect administrators should review each of these to see if any stand out. (A short REST API sketch for checking item and project counts follows the checklist below.)
1. Dataset Profile and Size
- Total number of active items in your instance
- How items are arranged into projects
  - Max number of items in a container
  - Number of containers
- How items are related to each other
  - Number of relationships per item
  - Depth of relationship chains
- How often users comment on items
- Size and number of file attachments
2. Usage and Workflows
- How often items are batch-updated and deleted
  - Operations on large numbers of items have a performance cost
- How users search for items
  - Do you have targeted project filters or large, complex 'All Projects' filters?
- How users manage projects
  - Reuse of large numbers of items
  - Batch imports of items from other sources
  - API integrations and item sync
- Usage of Reviews
  - How many items are reviewed?
  - Have you published a large number of revisions?
- Usage of Baselines
  - Are you creating baselines for many items?
  - Are you updating the baselines often?
- Usage of the Test Center
  - How many Test Cases per Test Plan?
  - How many Test Groups per Test Plan?
  - How many Test Runs per Test Case?
- Usage of data exports and reports
  - How many items are being exported?
  - How often are exports and reports run?
  - What time of day are they being exported or run?
3. Environment
-
Replicated Server (vertical Scale)
- CPU
- Memory
- Disk Space
- Replicated Container Memory Allocation
-
Network Topology
- Firewalls
- Proxies
-
Database Server
- CPU
- Memory
- Replication
- Backup Jobs
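If you want to quantify some of the dataset numbers above, the public REST API can help. The sketch below is a minimal example of pulling total item and project counts; it assumes the /abstractitems and /projects endpoints and the totalResults field behave as described in the REST API documentation, and the base URL, credentials, and regex-based parsing are placeholders you would replace in a real script.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Rough sketch for checking total item and project counts as part of a dataset profile.
// The endpoints and the 'totalResults' field are assumptions about the public Jama REST API;
// the URL and credentials are placeholders, and the regex is a shortcut you would normally
// replace with a proper JSON parser.
object DatasetProfile {
  private val client = HttpClient.newHttpClient()

  private def totalResults(url: String, authHeader: String): Option[Int] = {
    val request = HttpRequest.newBuilder(URI.create(url))
      .header("Authorization", authHeader)
      .header("Accept", "application/json")
      .GET()
      .build()
    val body = client.send(request, HttpResponse.BodyHandlers.ofString()).body()
    // Pull meta.pageInfo.totalResults out of the JSON response.
    """"totalResults"\s*:\s*(\d+)""".r.findFirstMatchIn(body).map(_.group(1).toInt)
  }

  def main(args: Array[String]): Unit = {
    val base = "https://jama.example.com/rest/v1" // placeholder base URL
    val auth = "Basic dXNlcjpwYXNz"               // placeholder credentials
    val items    = totalResults(s"$base/abstractitems?maxResults=1", auth)
    val projects = totalResults(s"$base/projects?maxResults=1", auth)
    println(s"Active items: ${items.getOrElse(-1)}, projects: ${projects.getOrElse(-1)}")
  }
}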
Common Performance Issues
Performance issues can be categorized in several different ways. Understanding the type of issue you encounter will help you tune Jama Connect to address it.
General Slow Performance
If the application is slow and does not recover, your systems are likely resource-constrained. We recommend you collaborate with your IT team to monitor CPU and memory usage on the Replicated and database servers. The fix might be as simple as adding more memory or migrating the database to a more powerful server.
Slow Performance During Specific Times of the Day
Scheduled (internal or external) processes may be running, which cause Jama Connect to consume all of its available resources.
As mentioned above, monitoring and adjusting the system resources is one solution. Internally, Jama Connect has cleanup jobs scheduled to run late at night (talk to your Customer Success Manager for more information).
Additionally, some customers use API integrations or database tasks (e.g., backups) that may add stress to a system when they run. Consider changing the scheduled execution of these tasks to 'off-hours' when users are not on the system. Otherwise, you should ensure Jama has enough resource headroom to handle these tasks and meet user demand.
Random Complaints of Poor Performance
It can be difficult for a Jama Connect Administrator when their users report that Jama 'feels' slow but only 'sometimes.' We often work with the affected users and collect their common workflows, yet are still unable to reproduce the issue internally. In these situations, the performance issues usually come from one or two users executing large, and frequently unnecessary, operations.
In one instance, a product manager created a daily project baseline of over 100,000 items to track changes. After a brief discussion with the Support team, we demonstrated how to apply a filter based on 'Last Modified Date,' which resolved the performance complaints.
In some cases, the issue has been as simple as one user generating large reports at 8 a.m. Having that user generate their reports at 5 p.m. resolved the performance complaints. Sometimes, Jama does not have enough headroom to handle these 'performance spike events': if you have provisioned your systems for a standard user load, periodic large operations may be more than they can absorb.
It's important to remember that Jama Connect is a very performant application. Users can log in, read comments, search for items, create new items, and generally do their work without issue. However, some operations can temporarily strain a system.
Examples
- Generating reports that review thousands of items, or items with dozens or hundreds of relationships each.
- Creating a Baseline or Review of thousands of items
- Reindexing a project with 100,000+ items
- Importing, Exporting, Reusing, or Deleting thousands of items at once
- Duplicating a project with 50,000+ items
- Reordering the Tree and causing thousands of items to be updated
- Batch Deletes of thousands of items
How Jama QA Tests Performance
API performance tests are conducted daily and then again on the official release version. The test results are published in each release as a part of the Jama Validation Kit. For information on pricing and availability of the JVK, please contact your Customer Success Manager.
Our performance tests are built and executed using an open-source tool called Gatling (gatling.io).
Gatling is similar to JMeter and HP LoadRunner. It generates HTTP traffic between a test client and Jama's application web server. Using Gatling, we eliminate the need for physical web browsers and can simulate hundreds of virtual users simultaneously.
Current: API Test Approach
Our current approach to performance testing in Jama emphasizes stressing the system through our public REST API. This method is widely recognized and especially effective in quickly identifying performance degradation caused by changes to the code or database.
Our process involves scripting each of our REST endpoints into a separate test and then running them in isolated batches to measure each endpoint without interference. For example, when we test creating items, we log in a group of virtual users and then schedule them to execute a single REST endpoint. All users create an item simultaneously and then wait for everyone to finish before moving on to the following endpoint.
Because of this rigid isolation and scheduling, we can measure the performance of each endpoint with high accuracy. Without the isolation, we noticed that different users executed the tests at slightly different rates; over time, this drift would skew the final results. The goal of this type of performance baseline is to run the same tests against different versions of Jama and identify performance changes.
We run all of our tests in batches we call steps. For each step (Warmup, Step 1, Step 2, etc.), we run the same tests; the only difference is how many users execute each endpoint simultaneously.
To further ensure accuracy, each step runs the same battery of tests 3 times in a row. Different factors can occasionally impact a performance test and cause anomalies in the results. By running the tests multiple times back to back, we can exclude invalid results caused by network or external issues. We only publish our results when all 3 test runs are within a similar range.
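To make the approach above concrete, here is a minimal sketch of what an isolated, batched Gatling simulation might look like in Scala. It is not our actual test suite: the base URL, credentials, item id, and item payload are placeholders, basic authentication stands in for whatever login your instance uses, and the exact item fields depend on your configuration.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch of the isolated, batched approach: one scenario per endpoint, and each batch
// fires all of its virtual users at once before the next batch starts.
class IsolatedEndpointSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://jama.example.com/rest/v1") // placeholder test server
    .basicAuth("test-user", "test-password")     // placeholder credentials
    .acceptHeader("application/json")

  // One scenario per REST endpoint so each can be measured without interference.
  val getItem = scenario("getItem")
    .exec(http("getItem").get("/items/12345").check(status.is(200)))

  val createItem = scenario("createItem")
    .exec(
      http("createItem")
        .post("/items")
        .body(StringBody("""{"project": 1, "itemType": 99, "fields": {"name": "perf test item"}}""")).asJson
        .check(status.in(200, 201))
    )

  // Each batch fires 25 users simultaneously; the delay before the second batch keeps
  // the endpoints from overlapping (a stand-in for the warmup/step structure above).
  setUp(
    getItem.inject(atOnceUsers(25)),
    createItem.inject(nothingFor(60.seconds), atOnceUsers(25))
  ).protocols(httpProtocol)
}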
We conduct our performance tests every night using our latest development branch. Although our test cases are stored in Jama, we generate thousands of test results daily, so it is not practical to upload all of that data to Jama.
Instead, we upload the raw results to an Amazon S3 bucket. Using these results, we perform our daily triage (looking for regressions or defects) and then push the aggregate data into Jama Analyze for trending reports. We can then safely delete the raw S3 data to save space. Of course, we keep the complete data for all release test runs.
When we triage the results in the morning, three things will trigger an investigation into development changes made the day before:
- Errors – All errors (404, 500, etc.) in the test results get investigated to determine their causes.
- Deltas – Any large deltas in the performance numbers (shifts of 10% or more) get investigated to understand why.
- Thresholds – We have one- and two-second thresholds for different APIs. If the results extend beyond a threshold, they get investigated. We expect smaller [GET] calls to stay under 1 second, while larger [POST/PATCH] calls can move between 1.5 and 2 seconds under heavy load.
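If you build similar tests, Gatling assertions are a convenient way to encode these triage rules so a run fails loudly when a threshold is crossed. The sketch below is illustrative only; the URL, credentials, and request are placeholders, and the thresholds should reflect your own expectations rather than ours.

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Sketch of encoding triage rules as Gatling assertions: no errors at all,
// GET calls under 1 second at the 95th percentile, and nothing over 2 seconds overall.
class ThresholdAssertionSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://jama.example.com/rest/v1") // placeholder
    .basicAuth("test-user", "test-password")     // placeholder
    .acceptHeader("application/json")

  val getItem = scenario("getItem")
    .exec(http("getItem").get("/items/12345").check(status.is(200)))

  setUp(getItem.inject(atOnceUsers(25)))
    .protocols(httpProtocol)
    .assertions(
      global.failedRequests.count.is(0),                    // any 404/500 warrants investigation
      details("getItem").responseTime.percentile3.lt(1000), // smaller GET calls under 1 second
      global.responseTime.max.lt(2000)                      // nothing should exceed 2 seconds
    )
}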
We usually release a new version of Jama every month. After identifying the final version, we conduct all performance tests again and create a comparison graph against the previous release. This chart and any defects are included in the Jama Validation Kit.
Future: Persona/Workflow Test Focus
Using the REST API to measure the impact of code changes on system performance works well. However, these tests cannot answer the common question of how real users are impacted in specific situations. For example, how are Review Center moderators impacted when an admin user generates a significant report in Jama?
Jama QAQC is committed to pursuing an innovative method of defining and measuring ‘simulations’ in which multiple personas execute different workflows simultaneously. This method allows us to measure the performance of each user at each workflow action.
This work is still several months from completion. Mechanically, it’s not very difficult to use Gatling to create a simulation with multiple personas, each performing its workflow.
The first challenge we face is identifying a valid simulation. Customers use Jama in various ways; some focus heavily on the Test Center, while others frequently move items around. As a result, there is no one-size-fits-all test for Jama. We plan to identify several common scenarios to serve as baselines, such as 8 a.m. on a Monday, end-of-month reporting, and sprint planning, and then collaborate with select customers to share our framework, enabling them to identify and test their own simulations. For instance, consider a typical company logging in at 8 a.m. We divide users into groups: some are idle in the application, while others are actively engaged. To effectively test this scenario, we would define the various user personas and workflows, including the number of users logged in and the frequency of their actions.
The second problem arises when we need to report the Simulation's results. The Simulation interacts with dozens of REST endpoints in various ways, generating thousands of individual requests. How can we condense all this data into meaningful values that allow us to compare different versions of Jama?
We plan to define a simulation score based on specified performance thresholds for each Workflow (e.g., 2 seconds). When the Workflow is completed within the threshold, the Persona is satisfied; otherwise, the Persona is not satisfied.
Using these states, we can derive a percentage score rating of satisfaction with performance. We can even roll the score up:
- The Simulation gets an overall score based on all the Personas/Workflows
- Every Persona is scored based on the scores of its Workflows
- Every Workflow is scored based on the time it takes to execute all the requests in the Workflow
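As a rough illustration of this roll-up (and not our actual implementation), the sketch below scores workflow executions against their thresholds and aggregates the satisfaction percentages per persona and for the simulation as a whole. The personas, workflows, and timings in the example are made up.

// Rough sketch of rolling workflow timings up into persona and simulation satisfaction scores.
case class WorkflowResult(persona: String, workflow: String, durationMs: Long, thresholdMs: Long) {
  def satisfied: Boolean = durationMs <= thresholdMs
}

object SimulationScore {
  // Percentage of workflow executions that finished within their threshold.
  private def pct(results: Seq[WorkflowResult]): Double =
    if (results.isEmpty) 100.0
    else results.count(_.satisfied).toDouble / results.size * 100.0

  def workflowScores(results: Seq[WorkflowResult]): Map[(String, String), Double] =
    results.groupBy(r => (r.persona, r.workflow)).view.mapValues(pct).toMap

  def personaScores(results: Seq[WorkflowResult]): Map[String, Double] =
    results.groupBy(_.persona).view.mapValues(pct).toMap

  def simulationScore(results: Seq[WorkflowResult]): Double = pct(results)

  def main(args: Array[String]): Unit = {
    // Made-up example data: one persona misses its 2-second threshold once.
    val results = Seq(
      WorkflowResult("Moderator", "Open review", 1200, 2000),
      WorkflowResult("Moderator", "Open review", 2400, 2000),
      WorkflowResult("Author", "Create item", 800, 2000)
    )
    println(s"Simulation score: ${simulationScore(results)}%")
    println(s"Persona scores: ${personaScores(results)}")
  }
}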
Executing Your Performance Tests
Some customers wish to conduct performance tests and validate Jama updates on an internal test system before deploying them to production. This section aims to make those efforts as successful as possible.
You Cannot Test Jama Like a Static Website
Many companies we have consulted with have encountered challenges when creating their own performance tests. Unlike with a traditional website, it is not as simple as recording the user interface and replaying a user session. Jama has highly complex workflows for creating and modifying data, and our internal DWR web calls can be difficult to interpret.
The UI has a lot of logic built into it. Moving an item looks like a single call, but it generates multiple other calls behind the scenes, and it is challenging to map and parameterize all of them. Internally, we use the REST API to create comparative metrics across versions; it is the most stable and maintainable solution we have found.
Our APIs leverage the same server code as the UI, and changes to Jama that affect performance will generally appear in the API. The UI is very 'chatty' as it tries to keep updated as the user works. Most of these extra UI calls hit our cache server and account for minimal server load (milliseconds).
Finally, metrics used to measure static websites do not work well for rich web applications like Jama. Requests, pages, and errors per second are standard measurements, but Jama continually generates hundreds of small requests as the user works, and only a small number of heavier requests are meaningfully affected by load or stress. The real signal is lost in the noise of all the other calls.
The solution is to identify key Jama operations (Item creation, searching, loading projects, etc.), isolate them from all the noise, and measure the time it takes for those calls to complete. How you set up your data and load/stress depends on what you want to test.
Creating a Test Plan: Ad-hoc vs. Baseline Testing
Before writing code to script calls to Jama, you must understand what you will measure. Our customers perform two common types of performance testing.
Ad-hoc
Ad-hoc testing involves performing random or loosely organized tests. In the context of performance testing, this means executing various calls or scenarios concurrently and observing the system's behavior. Generally, a threshold value is set—such as 2 seconds—and multiple operations are performed in different sequences. The tests are successful if all calls are completed within the 2-second limit.
Development teams often use this type of testing to try many combinations to see which ones fail or have issues. Customers can use this type of testing to build confidence that a system will handle different types of load.
The downside to this type of testing is that the results are often not repeatable. Because the tests are not rigid and isolated, running the same tests over and over will frequently give different results; slight delays or background jobs on the system change the timing. This is why this method requires setting a threshold for failure: if the same operation takes 0.5 seconds the first time and 1.9 seconds the second time, it is still considered a pass.
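For illustration, here is a rough sketch of what an ad-hoc style Gatling run might look like: several operations run concurrently with no isolation, and the run passes as long as every response stays under the 2-second threshold. The base URL, credentials, item id, search parameter, and user counts are placeholders.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch of the ad-hoc style: mixed operations run concurrently, with a single
// global 2-second pass/fail threshold instead of isolated measurements.
class AdHocMixSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://jama.example.com/rest/v1")
    .basicAuth("test-user", "test-password")
    .acceptHeader("application/json")

  val readers = scenario("Readers")
    .during(5.minutes) {
      exec(http("getItem").get("/items/12345").check(status.is(200))).pause(3.seconds)
    }

  val searchers = scenario("Searchers")
    .during(5.minutes) {
      exec(
        http("searchItems")
          .get("/abstractitems")
          .queryParam("contains", "requirement")
          .check(status.is(200))
      ).pause(10.seconds)
    }

  setUp(
    readers.inject(rampUsers(100).during(30.seconds)),
    searchers.inject(rampUsers(20).during(30.seconds))
  ).protocols(httpProtocol)
    .assertions(global.responseTime.max.lt(2000)) // the 2-second pass/fail threshold
}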
Baseline
Baseline testing is a much more rigid and structured way to run a performance test. This type of testing aims to remove the noise and drift of an extended test by isolating the calls and waiting for all the users to complete the request before moving on to the next one.
This type of testing benefits you by revealing clear deltas in performance between different systems. It is required if you want to know the impact of upgrading to a new Jama version or adding more memory to the server. You run your tests, reset the database, upgrade the system, and rerun the tests. You should get a clear view of the impact on performance.
The downside to this type of testing is that it does not reflect how users realistically interact with the system. Thirty users do not press the ‘Save’ button simultaneously, and it does not exercise interactions between different types of operations.
Essential Considerations for Baseline Testing
If you expect the same results running the tests multiple times in a row:
- You need to reset the data between test runs
- You need to run the same tests in the same order
If you expect to compare the results of two different Jama versions:
- Both tests need to run on the same hardware/configuration
- Both Jama instances must use the same data (projects, items, etc.)
If you need to change or reorder the test scripts:
- You need to rerun the tests on both systems
How Many Users Should I Test Within a Five-Minute Window?
It’s essential to pick the correct number of users to simulate with your tests. Too few and you will not stress the system; too many and you will put it in an unrealistic state. If you think about how users access Jama, you realize very few actions happen simultaneously. Most of the time, users are reading or typing in Jama, and the server is only accessed when the user saves or loads a new item.
At Jama, we use the ‘5-minute window’ estimation. We look at the total number of licenses and decide how many users will perform actions within 5 minutes of each other. Then, we take that number and choose how many users will act in the same second. For example, when we simulate 8 a.m. on a weekday, we have 1,000 licenses and decide that around 250 users perform actions in the same 5-minute period.
- Users will periodically perform a workflow where they may click something in Jama Connect every 5 seconds. Multiple users can run these types of workflows simultaneously.
- 300 seconds (5 mins) / 5 seconds = 60 actionable requests per user
- 60 requests per user * 250 users = 15,000 total requests in 300 seconds (5 minutes)
- On average, 50 of these 60 requests will be read (GET) requests, and 10 will be write (update) requests.
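If you model this window in Gatling, the arithmetic above translates fairly directly into a scenario: each virtual user loops for 5 minutes, performing an action roughly every 5 seconds with about a 5:1 read/write mix. The sketch below is one way to express that; the base URL, credentials, item id, and PATCH payload are placeholders rather than a description of any specific Jama instance.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch of the '5-minute window': ~250 active users, each taking an action about
// every 5 seconds, with roughly a 5:1 read/write mix.
class FiveMinuteWindowSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://jama.example.com/rest/v1")
    .basicAuth("test-user", "test-password")
    .acceptHeader("application/json")

  val readOnce =
    exec(http("getItem").get("/items/12345").check(status.is(200)))
      .pause(5.seconds)

  val writeOnce =
    exec(
      http("patchItem")
        .patch("/items/12345")
        .body(StringBody("""[{"op": "replace", "path": "/fields/name", "value": "perf update"}]""")).asJson
        .check(status.in(200, 204))
    ).pause(5.seconds)

  // One ~30 second cycle per user: five reads and one write, each followed by think time.
  // Over 5 minutes this is ~60 actions per user, matching the math above.
  val cycle =
    exec(readOnce).exec(readOnce).exec(readOnce).exec(readOnce).exec(readOnce)
      .exec(writeOnce)

  val activeUser = scenario("Active user")
    .during(5.minutes) { cycle }

  setUp(activeUser.inject(rampUsers(250).during(1.minute)))
    .protocols(httpProtocol)
}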
Errors Invalidate Performance Results Comparisons
When executing performance tests, it’s essential to ensure that your test runs do not have any errors. Errors invalidate the results because they change the dynamic of the tests. Imagine a chain of requests (Create Item, Edit Item, Delete Item). When a request fails, all the chained requests fail as well. Different runs with different errors will have different results that are not comparable.
Performance results are generally based on averages of the time multiple requests take to respond. When Jama encounters an error, the results are affected. A request that takes 5000ms might suddenly return in 100ms, or the reverse may happen. Even if you filter out the errors, the errors and skipped chained requests fundamentally affect the dynamic of the test. Requests align differently and impact each other in different ways.
It is valid to conduct a test to gradually increase usage until Jama encounters errors; however, any results obtained after errors occur should not be compared to others, as they will not align.
HTTP Status 409 - Conflict Errors
A 409 error occurs when the REST API tries to modify an item currently being updated. In Jama, when you add, delete, or move an item, a post-save process updates the GlobalSortOrder for items in the tree. Jama uses this value to maintain the hierarchy of items among their siblings.
The Jama UI prevents this error by locking the item and allowing only one user to modify it at a time. With the REST API, developers need to handle this error with a short delay and a retry, as sketched below.
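For API integrations (outside of performance test scripts), a simple delay-and-retry wrapper is usually enough. The sketch below uses the standard Java HTTP client from Scala; the endpoint, authorization header, payload, and retry policy are placeholders, not recommended values.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Sketch of a delay-and-retry wrapper for 409 Conflict responses.
object RetryOn409 {
  private val client = HttpClient.newHttpClient()

  def patchWithRetry(url: String, authHeader: String, body: String, maxAttempts: Int = 3): HttpResponse[String] = {
    val request = HttpRequest.newBuilder(URI.create(url))
      .header("Authorization", authHeader)
      .header("Content-Type", "application/json")
      .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
      .build()

    var attempt = 1
    var response = client.send(request, HttpResponse.BodyHandlers.ofString())
    // Retry only on 409 Conflict, with a short back-off between attempts.
    while (response.statusCode() == 409 && attempt < maxAttempts) {
      Thread.sleep(500L * attempt)
      response = client.send(request, HttpResponse.BodyHandlers.ofString())
      attempt += 1
    }
    response
  }
}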
When writing tests against the API, you need to eliminate 409 errors. The most common cause of these errors is multiple users creating items in the same folder. Under load, you run into timing issues where numerous items are created simultaneously and try to update each other. One Item wins, and the other gets a 409.
As we discussed above, any errors (even with a retry) will change the dynamic of the test. So, the best practice is to structure the test scripts so that each user creates a unique folder for their tests.
At Jama, we put all test artifacts into a single component. Each user then creates their own structure beneath it (Set, Folder, and then items). We also use a date-time stamp so artifacts from different test runs are easy to tell apart. The sketch below shows one way to structure this in a Gatling script.
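In this rough sketch, each virtual user creates its own uniquely named container and then creates items only inside it, so concurrent users never write to the same parent. The project and item type ids, the request payloads, and the $.meta.id response path are assumptions about the Jama REST API; verify them against your instance before relying on this.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Sketch of the 'one folder per virtual user' pattern for avoiding 409 conflicts.
class PerUserFolderSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://jama.example.com/rest/v1")
    .basicAuth("test-user", "test-password")
    .acceptHeader("application/json")

  val createInOwnFolder = scenario("Create items in a per-user folder")
    // Unique, timestamped name so artifacts from different runs are easy to tell apart.
    .exec(session => session.set("folderName", s"perf-${System.currentTimeMillis()}-${java.util.UUID.randomUUID()}"))
    .exec(
      http("createFolder")
        .post("/items")
        .body(StringBody("""{"project": 1, "itemType": 32, "childItemType": 99, "location": {"parent": {"project": 1}}, "fields": {"name": "#{folderName}"}}""")).asJson
        .check(status.in(200, 201), jsonPath("$.meta.id").saveAs("folderId"))
    )
    .repeat(10) {
      exec(
        http("createItem")
          .post("/items")
          .body(StringBody("""{"project": 1, "itemType": 99, "location": {"parent": {"item": #{folderId}}}, "fields": {"name": "perf test item"}}""")).asJson
          .check(status.in(200, 201))
      ).pause(1.second)
    }

  setUp(createInOwnFolder.inject(atOnceUsers(25))).protocols(httpProtocol)
}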
References
- Jama Connect® Enterprise Performance Recommendations for System Administrators
- Jama Connect Validation Kit
- Gatling
- Success Programs
- Success Catalog
- Datasheets
- Request a Solution Offering or Training from the Success Catalog
Please feel free to leave feedback in the comments below.