Raiders of the Data Dictionary – Part II

An article by Lothar Flatz, Senior Principal Consultant, Diso AG

The Curse of the Buffer Cache

In his rules, designed to define what is required from a database management system, Edgar Codd states in rule number four: “The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database’s structure (catalog) using the same query language that they use to access the database’s data.” (“Is Your DBMS Really Relational?”, ComputerWorld, 14 October 1985)

I have used the exhaustive knowledge that the database holds about its own structure in many ways during my ten years with Oracle Consulting. Sometimes, when you study the structures of the dictionary, chance leads you to a nice discovery, as in this case, when I found that you can draw a kind of map of an application very quickly by querying v$segment_statistics. When will you go ahead and uncover the treasure in your own database?

A Map of Database Activity

On various occasions I have used the query in Figure 1 to understand an application that is new to me. The result is sorted by logical-read percentage, which means it is sorted by activity.
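The statement itself is shown in Figure 1. A minimal sketch of such a query, assuming the documented statistic names 'logical reads' and 'physical reads' in v$segment_statistics and a hypothetical application schema named CLINIC, might look like this:

    -- Minimal sketch (the statement in Figure 1 may differ):
    -- per-segment share of logical and physical reads since
    -- instance startup (v$segment_statistics is cumulative).
    SELECT *
      FROM (SELECT owner, object_name, object_type,
                   ROUND(100 * RATIO_TO_REPORT(
                           SUM(CASE WHEN statistic_name = 'logical reads'
                                    THEN value ELSE 0 END)) OVER (), 2)
                     AS logical_reads_pct,
                   ROUND(100 * RATIO_TO_REPORT(
                           SUM(CASE WHEN statistic_name = 'physical reads'
                                    THEN value ELSE 0 END)) OVER (), 2)
                     AS physical_reads_pct
              FROM v$segment_statistics
             WHERE owner = 'CLINIC'   -- hypothetical application schema
             GROUP BY owner, object_name, object_type)
     ORDER BY logical_reads_pct DESC;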

It is self-evident that the segments showing the highest activity in logical reads are also the ones the application focuses on. Therefore, when looking at the top-activity segments, we also see the segments that are most relevant. Normally you would expect a linear correlation between logical and physical reads. Indeed, the old rule that you should have a hit rate of > 90% would suggest something like "physical reads" <= "logical reads" * 0.1. When I ran the query for the first time, I was curious to see whether reality would be in line with that expectation. Since then I have learned that in most real-life cases, logical and physical read percentages do not go together. It is quite often very instructive to explore their distribution, which will reveal weaknesses, most of the time in the physical design. With some training, you can almost guess from the distribution percentages what the weakness will be.

Fig. 1: Statement to query segment activity

Fig. 2: Segment activity for walk-in clinic

Let’s have a look at a real-world example from a walk-in clinic. We can see that the activity of the patient segment is dominant (Fig. 2), and we can see imbalances in various areas. First, do we believe that 72% of the overall activity is focused on patient? Granted, patient is an important table in this context, but is it really that dominant? To some extent the same rationale applies to treatment. How about patient history? Why is this segment generating 60% of the overall I/O activity? And treatment decision? It accounts for only 1% of all logical reads, yet it generates 17% of the physical I/O.
What is wrong with these tables? Graphically, the imbalances become even more obvious (Fig. 3).

Fig. 3: Graphical comparison between physical and logical reads

Segment Statistics for the PATIENT Segment

So how would we investigate the patient issue? Well, my assumption was that the number of logical reads on patient is too high, probably caused by some inefficiency. Why not use a straightforward query for high buffer gets?
We can query for statements with high buffer gets that refer to patient. Unless statistics_level is set to all, we cannot query directly for high buffer gets on patient itself. However, the indirect approach proves to work well enough. For patient, treatment, and treatment_decision, we discover missing indexes as the root cause.

Fig. 4: Statement to query SQL with high buffer gets
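A minimal sketch of this indirect approach (the exact statement in Figure 4 may differ) queries v$sql for the top statements by buffer gets whose text mentions the segment of interest:

    -- Minimal sketch: filtering on sql_text is the indirect part,
    -- since buffer gets cannot be attributed to a single segment
    -- unless statistics_level = all.
    SELECT *
      FROM (SELECT sql_id, buffer_gets, executions,
                   ROUND(buffer_gets / NULLIF(executions, 0))
                     AS gets_per_exec,
                   sql_text
              FROM v$sql
             WHERE UPPER(sql_text) LIKE '%PATIENT%'
             ORDER BY buffer_gets DESC)
     WHERE ROWNUM <= 20;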

Segment Statistics for the PATIENT_HISTORY Segment

Patient history is a classic among performance problems, and it is worth some discussion. I had previously researched a sanatorium’s database. As you might expect, they have long-term patients; patients came in for treatments lasting from several weeks to a year. Obviously, the history was checked quite often. Since patients returned over a number of years, the data blocks containing the data of one specific patient were scattered all over the segment.
It is practically impossible to cache this segment, since its access pattern makes any prediction based on a hit ratio very difficult. Therefore, reading a patient history generated a high number of physical reads, causing wait times of up to ten minutes. My research revealed that we had an average of 77 rows per patient, with a maximum of 10,400. Under these circumstances you have only one option: you need to cluster the data. You can do this using an index-organized table (IOT), an index cluster, or a hash cluster. Hash partitioning by patient ID helps somewhat; in the case of the sanatorium’s database it reduced the physical I/O by 50%, which was still an unsatisfying result. Clustering by patient ID using an IOT, however, resulted in dramatically lower I/O and better response times.
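As a minimal sketch, with hypothetical column names, rebuilding the history table as an IOT could look like this:

    -- Hypothetical structure: an IOT stores all rows of one patient
    -- physically together, clustered by the primary key.
    CREATE TABLE patient_history_iot (
      patient_id  NUMBER         NOT NULL,
      entry_seq   NUMBER         NOT NULL,
      entry_date  DATE,
      note        VARCHAR2(4000),
      CONSTRAINT patient_history_iot_pk
        PRIMARY KEY (patient_id, entry_seq)
    ) ORGANIZATION INDEX;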
Thus, when I saw the segment name “patient history” I blurted out that I knew the issue already. Whenever you think you are rather clever, fate will teach you humility. Fortunately, I did check the numbers before I proposed a solution. It turned out that in the case of the walk-in clinic we had only 2.13 rows per patient ID on average, residing in 1.4 data blocks. So there is no need for clustering here. What, then, is generating so many I/Os?
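Such a sanity check is quick to run. A minimal sketch, again with hypothetical table and column names:

    -- Rows and distinct data blocks per patient, to decide whether
    -- clustering would pay off at all.
    SELECT AVG(row_cnt)   AS avg_rows_per_patient,
           MAX(row_cnt)   AS max_rows_per_patient,
           AVG(block_cnt) AS avg_blocks_per_patient
      FROM (SELECT patient_id,
                   COUNT(*) AS row_cnt,
                   COUNT(DISTINCT dbms_rowid.rowid_block_number(rowid))
                     AS block_cnt
              FROM patient_history
             GROUP BY patient_id);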

Hit Rate Fever

We can discover the actual cause of the I/O bottleneck by looking into the buffer cache. When we sort the segments by logical I/O percentage, patient history comes out as number three.
When we sort the segments by the percentage to which they are cached, however, patient history is number 70!
Let’s back up here. Segments are kept in the buffer cache based on their hit rate (with some refinements that we skip here), which is basically the number of logical reads per data block. As we know, the number of logical reads for the segment is high; the only possible explanation is that this activity is spread across a high number of blocks compared to the segments that rank before patient history in the cache. Basically, that implies that patient history is pushed out of the cache by smaller segments that show less per-segment activity but more per-block activity.
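This can be verified by a look into the cache itself. A minimal sketch using the documented v$bh and dba_objects views (and assuming non-partitioned segments) might be:

    -- Fraction of each segment's blocks currently in the buffer cache.
    -- v$bh.objd joins to dba_objects.data_object_id; total block
    -- counts come from dba_segments.
    SELECT o.owner,
           o.object_name,
           s.blocks                            AS segment_blocks,
           COUNT(*)                            AS cached_blocks,
           ROUND(100 * COUNT(*) / s.blocks, 1) AS cached_pct
      FROM v$bh b
      JOIN dba_objects  o ON o.data_object_id = b.objd
      JOIN dba_segments s ON s.owner = o.owner
                         AND s.segment_name = o.object_name
     WHERE b.status != 'free'
     GROUP BY o.owner, o.object_name, s.blocks
     ORDER BY cached_pct DESC;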

“In most real-life cases,
logical and physical read percentages do not go together.”

Buffer Cache Contamination

To complete the list of buffer cache diseases, we need to mention buffer cache contamination. In principle, this is a process by which your data blocks contain less and less useful content; to maintain the same amount of physical I/O, the buffer cache needs to be increased. The most obvious example is time-dependent data.
Let’s assume we are talking about a segment whose rows carry a timestamp, like the effective date in PeopleSoft.
As a consequence, any update leads to a new row being created, with the effective date set to the current date and time; the previously current row is now automatically outdated. We can safely assume that most of the time the underlying application will deal with current data. Thus, the data block containing the now-outdated row becomes a little less useful, while the total number of rows grows. That way, the current rows are automatically spread over an increasing number of data blocks, making caching a little more inefficient every time this happens.
The only effective countermeasure seems to be partitioning based on the timestamp, concentrating the current rows in a current partition.
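A minimal sketch of this countermeasure, on a hypothetical table with a PeopleSoft-style effective date and using 11g interval partitioning:

    -- Range partitioning on the effective date keeps the current rows
    -- concentrated in the most recent partitions, so the hot blocks
    -- stay cacheable.
    CREATE TABLE job_assignment (
      emp_id   NUMBER       NOT NULL,
      effdt    DATE         NOT NULL,
      job_code VARCHAR2(10)
    )
    PARTITION BY RANGE (effdt)
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    (PARTITION p_old VALUES LESS THAN (DATE '2015-01-01'));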

Conclusion

There are many reasons why the buffer cache may not function as well as it could. In many cases, improving the physical design (indexing, clustering, partitioning) helps. When we run out of those options, the buffer cache needs to be increased.

Diso AG – The Swiss Data and Cloud Expert

Diso AG is a renowned IT service provider and long-standing Oracle distribution partner headquartered in Switzerland, focusing on database and cloud solutions. For instance, Diso offers its customers Oracle’s “Platform as a Service” solution and the associated data migration. Moreover, Diso’s customers benefit from a complete solution offering: planning, integration, and support, including the operation and monitoring of IT infrastructures and database systems.

In the field of software engineering, Diso develops tailor-made IT and software solutions for company-specific applications, whenever appropriate with a mobile-first approach. Last but not least, Diso is the expert when it comes to software-based performance optimization. For many years, numerous prestigious customers in key industries such as banking, insurance, retail, and public administration have put their confidence in the competence of this established mid-sized IT service provider.

Diso AG designs adaptable IT systems, develops tailor-made software, and enables the efficient use and analysis of data and information.

Lothar Flatz started working with Oracle Database in 1989, in the days of Version 5. He worked for Oracle Corporation for fifteen years and was a member of the Real-World Performance Group for two years. He is a member of the Oak Table network of Oracle scientists and specializes in performance tuning.
At Diso AG he is a Senior Principal Consultant for database optimization and specializes in difficult Oracle challenges. He supports Diso customers in identifying and solving large-scale performance problems. Lothar Flatz is also the author of numerous technical articles and is frequently invited to speak at Oracle user conferences as well as other international events.