Table of contents
IntroductionRequired information to gather
Example 1: an ORA-4031 on the shared pool with a DELETE statement
Investigating the error message
Analyzing the alert file
Analyzing the trace file
Analyzing the incident trace file
Conclusions
Example 2: an ORA-4031 on the Java pool
Investigating the error message
Analyzing the alert file
Analyzing the trace file
Analyzing the V$SGA_RESIZE_OPS output
Conclusions
Introduction
This blog will discuss one of the more difficult and intriguing issues in an Oracle environment: ORA-4031 errors. ORA-4031 errors belong to the family of errors dealing with memory management of the Oracle instance. In particular these errors deal with SGA memory requests.
An instance is comprised of multiple processes and shared memory (called the SGA) managing access to database files. The SGA is comprised of various pieces of memory, each with a specific function. There are various memory areas used for transaction management, such as the shared, large, Streams and Java pools which are used to share transaction information (such as SQL statements for efficient RDBMS operation), a memory area called the buffer cache for caching the database files contents, an area called the log buffer where logs of user transaction activity is maintained, etc..
In Oracle12c the SGA consists of various pieces of memory, among which:
- The fixed SGA structures: this is the internal housekeeping area of the instance. It’s size is fixed and cannot be altered
- The buffer cache: the buffer cache is used to minimize I/O to data files by keeping frequently accessed datafile blocks in memory. The associated instance parameters are DB_CACHE_SIZE (for ASMM and AMM environments) and DB_BLOCK_BUFFERS (for manual SGA management)
- The shared pool: the shared pool is used to contain all data (dictionary cache, shared SQL statements, etc.) shared between the instance processes. The associated instance parameter is SHARED_POOL_SIZE
- The large pool: the large pool is an optional area to be used for memory allocations which are not suitable to go to the shared pool because of their size. The large pool is defined with the LARGE_POOL_SIZE instance parameter
- The Streams pool: the Streams pool is an optional area used for Oracle Streams operations. The size of the Streams pool is identified with STREAMS_POOL_SIZE
- The Java pool: the Java pool is an area in which all Java code and associated session data is stored when running Java programs in the database JVM. The initial Java pool size can be defined with the JAVA_POOL_SIZE instance parameter
- The log buffer: the redo log buffer contains the redo entries which describe all changes to the database. The log buffer size is identifed with LOG_BUFFER
- The server-side result cache: the result cache contains cached query results which can be provided to transactions in case the same data is retrieved more than once, preventing expensive operations such as aggregation to be done more than once. The server-side result cache is governed with the RESULT_CACHE_MAX_SIZE instance parameter
The Streams pool is used by Datapump processing. When explicitly setting MEMORY_TARGET, MEMORY_MAX_TARGET, SGA_TARGET and STREAMS_POOL_SIZE to 0 a piece of memory with the size of 10% of the shared pool is taken from the buffer cache and assigned to the Streams pool, to guarantee that basic Streams functionality is available. For example, consider the following setup:
- DB_CACHE_SIZE set to 100MB
- SHARED_POOL_SIZE set to 80MB
- MEMORY_TARGET, MEMORY_MAX_TARGET, SGA_TARGET, and STREAMS_POOL_SIZE are all set to 0
This results in the following instance configuration:
- Buffer cache of 92MB
- Shared pool of 80MB
- Streams pool of 8MB
When the result cache has been enabled, but no RESULT_CACHE_MAX_SIZE has been specified, then depending on whether AMM or ASMM is used, a certain portion of SGA memory is reserved for the result cache. When MEMORY_TARGET is set, 0.25% of the MEMORY_TARGET is reserved for this cache. Instead, when using ASMM, then 0.5% of SGA_TARGET is reserved for the result cache. And with manual memory management, 1% of SHARED_POOL_SIZE is used for this cache.
Next to these memory structures the buffer caches for non-standard block sizes (indicated by the DB_xK_CACHE_SIZE parameters) are also contained in the SGA. Other parameters (like SHARED_POOL_RESERVED_SIZE) have no direct impact on the SGA in general, but only on the management of specific pools (such as the shared pool in this case).
The SGA_MAX_SIZE instance parameter defines the maximum amount of memory reserved for SGA use. Within this memory, SGA_TARGET is used for the actual SGA size. SGA_TARGET defines the memory area in which all aforementioned pools are allocated. The SGA can grow using an ALTER SYSTEM command up to the SGA_MAX_SIZE limit when needed.
ORA-4031 errors only occur on SGA memory areas such as the shared, large, Streams and Java pools. Other memory areas (such as the buffer cache and log buffer) will never result in any such errors being reported.
The ORA-4031 error message contains valuable clues on how to analyze and potentially resolve the issue at hand. The first clue is in which pool the error occurs. Immediately we can make a distinction between 2 types of errors: the ones against the large, Streams or Java pools and the ones against the shared pool.
For large, Streams and Java pools the resolution most of the time is to enlarge the offending pool by setting its instance configuration parameter (LARGE_POOL_SIZE, STREAMS_POOL_SIZE or JAVA_POOL_SIZE) to a larger value. Of course also analysis on transaction behavior should be performed to see if that pool really needs to be that large.
Errors reported on the shared pool form a different story, mainly because of the function of the shared pool. The shared pool is used to cache and share so-called cursors. Cursors are small memory areas where SQL statements are stored along with their execution context and runtime statistics. Cursor management takes place in a shared pool area called the library cache. And this cursor management is frequently a cause for ORA-4031 errors.
Required information to gather
When trying to analyze and resolve ORA-4031 issues some data should be gathered first:
- the exact error message being reported
- the instance alert file. The alert file contains a chronological record of all interesting events (informational or warning messages or errors) which occurred during the lifetime of the instance. For ORA-4031 analysis it is important to see with what non-default instance parameters the instance had been started, whether any instance reconfiguration took place, ad well as whether any errors have been reported. ORA-4031 analysis should start with analysis of the instance setup at the first ORA-4031 error occurrence since the last successful instance startup, and with the trace information of that very first error occurrence
- the trace files contain the details on the ORA-4031 occurrence. Note that with the introduction of the Uniform Tracing Service as part of the Oracle11g Diagnosibility Framework also incident trace files are generated. Next to the trace file name also the incident trace file name will be reported in the alert file. These files are the most important ones to analyze, as they contain detailed information on what the process was doing at the time the ORA-4031 occurred
- the RDA report (obtainable by running the RDA utility as outlined in My Oracle Support Document 314422.1) provides insight into the OS environment and kernel parameters, as well as the Oracle setup (both Oracle instance parameters and various SGA and database related information). The RDA report provides insight in whether the various pools have been configured and whether ASMM or AMM has been enabled or not. The database information contains both SGA statistics as well as reports on any invalid database objects which might cause unnecessary parsing to occur
- the StatsPack and AWR reports show what actions the instance underwent around the time the ORA-4031 occurred. The reports might show problems (such as latch contention as a result of improper library cache use) occurring which in the end result in ORA-4031 errors
- finally, the output of the V$SGA_RESIZE_OPS dynamic performance table (and as of Oracle11g DBA_HIST_MEMORY_RESIZE_OPS in case the instance has been restarted after the ORA-4031 occurrence) should be examined to see what happened in case ASMM or AMM has been enabled. The RDA report should normally contain this information as well. However, the information in the RDA report is inappropriate if the instance has been restarted between the ORA-4031 occurrence and the generation of the RDA report
Out of the gathered data the following information can be very useful for analysis of the issue:
- the active instance and session parameters at the time of the error (alert file and incident trace file)
- the exact error message being reported and what traces contain detailed information on the error occurrence (alert file)
- a description of what memory allocation request failed (trace file). This includes what request size was made and in what subpool this occurred
- a process state dump with details on what type of client was used ad well as a history of wait events the process recently waited for (trace file). In case the last wait event shows 'SGA forcing component allocation growth' then we apparently deal with Automatic Shared Memory Management (ASMM) or Automatic Memory Management (AMM). Any other wait event can indicate a suboptimal use of the offending pool or a pool which is configured too small
- information on the current memory allocation of subpools (trace file)
- any dynamic SGA memory resizing information (V$SGA_RESIZE_OPS output and incident trace file)
- information on cached cursors and cursor sharing when dealing with shared pool errors (trace file and incident trace file)
- information on library cache statistics (trace file)
As can be seen, especially the trace and incident trace files contain vital pieces of information on the ORA-4031 occurrence. Analysis of the ORA-4031 error will be demonstrated by means of 2 examples. The pieces of information described above will be highlighted in a bit more detail in these examples.
Example 1: an ORA-4031 on the shared pool with a DELETE statement
The first example deals with a DELETE statement failing with an ORA-4031 error. The above information is retrieved from the instance and being analyzed.
Investigating the error message
The error message reported in this example reads:
ORA-04031: 4120 Byte des Shared Memorys konnten nicht zugewiesen werden ("shared pool","DELETE
FROM BstAbgl_T ...","SQLA^3f53ef79","xplSetRws:planLines_ctxdef")
FROM BstAbgl_T ...","SQLA^3f53ef79","xplSetRws:planLines_ctxdef")
This error can either be raised in some application trace or directly in the screen of the user connected to the database. From the error we can find that the error is raised when trying to allocate an additional 4120 bytes in the shared pool for the execution of a DELETE statement. The error is raised when trying to allocate this additional 4120 bytes in the SQL Area.
Analyzing the alert file
The alert file shows the following information:
. . .
Mon Nov 05 13:22:14 2012
Starting ORACLE instance (normal)
. . .
Starting up:
Oracle Database 11g Release 11.2.0.3.0 - 64bit Production.
. . .
System parameters with non-default values:
. . .
memory_target = 7632M
memory_max_target = 7632M
compatible = "11.2.0.0.0"
job_queue_processes = 1000
cursor_sharing = "FORCE"
optimizer_mode = "RULE"
. . .
Mon Nov 05 13:22:24 2012
PMON started with pid=2, OS id=30338
. . .
Sat Nov 10 15:10:56 2012
Errors in file /oracle/diag/rdbms/XXXXXX/XXXXXX/trace/XXXXXX_ora_22303.trc (incident=51379):
ORA-04031: 4120 Byte des Shared Memorys konnten nicht zugewiesen werden ("shared pool","DELETE
FROM BstAbgl_T
...","SQLA^3f53ef79","xplSetRws:planLines_ctxdef")
Incident details in: /oracle/diag/rdbms/XXXXXX/XXXXXX/incident/incdir_50627/XXXXXX_cjq0_30413_i50627.trc
Incident details in: /oracle/diag/rdbms/XXXXXX/XXXXXX/incident/incdir_51379/XXXXXX_ora_22303_i51379.trc
. . .
Mon Nov 05 13:22:14 2012
Starting ORACLE instance (normal)
. . .
Starting up:
Oracle Database 11g Release 11.2.0.3.0 - 64bit Production.
. . .
System parameters with non-default values:
. . .
memory_target = 7632M
memory_max_target = 7632M
compatible = "11.2.0.0.0"
job_queue_processes = 1000
cursor_sharing = "FORCE"
optimizer_mode = "RULE"
. . .
Mon Nov 05 13:22:24 2012
PMON started with pid=2, OS id=30338
. . .
Sat Nov 10 15:10:56 2012
Errors in file /oracle/diag/rdbms/XXXXXX/XXXXXX/trace/XXXXXX_ora_22303.trc (incident=51379):
ORA-04031: 4120 Byte des Shared Memorys konnten nicht zugewiesen werden ("shared pool","DELETE
FROM BstAbgl_T
...","SQLA^3f53ef79","xplSetRws:planLines_ctxdef")
Incident details in: /oracle/diag/rdbms/XXXXXX/XXXXXX/incident/incdir_50627/XXXXXX_cjq0_30413_i50627.trc
Incident details in: /oracle/diag/rdbms/XXXXXX/XXXXXX/incident/incdir_51379/XXXXXX_ora_22303_i51379.trc
. . .
The alert file shows some non-default parameters which might influence shared pool behavior. Immediately there is cause for alert: 2 parameters (CURSOR_SHARING and OPTIMIZER_MODE) have possibly destructive influence on the shared pool use. JOB_QUEUE_PROCESSES might also be a cause for alert in case the # of concurrent jobs put a stress on the shared and/or large pools.
The alert file also shows the exact error message as well as the trace and incident trace files generated for this incident.
Analyzing the trace file
The trace file XXXXXX_ora_22303.trc shows:
. . .
*** 2012-11-10 15:10:55.756
AUTO MEM: PGA get fail 144 for 40, 133, 1, 16777216, 7331643392, 671088640, 40, 8002732032, 6
. . .
Allocation request for: xplSetRws:planLines_ctxdef
Heap: 0x1e188bca0, size: 4120
******************************************************
HEAP DUMP heap name="sga heap" desc=0x60001190
extent sz=0x9800 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=(nil) owner=(nil) nex=(nil) xsz=0x1 heap=(nil)
fl2=0x60, nex=(nil)
ds for latch 1: 0x60057a60 0x600592b8 0x6005ab10 0x6005c368
reserved granule count 0 (granule size 16777216)
******************************************************
HEAP DUMP heap name="SQLA^3f53ef79" desc=0x1e188bca0
extent sz=0xfe8 alt=32767 het=368 rec=0 flg=2 opc=2
parent=0x60001190 owner=0x1e188bb60 nex=(nil) xsz=0x1000000 heap=(nil)
fl2=0x27, nex=(nil), dsxvers=1, dsxflg=0x0
dsx first ext=0xc6704c0
Subheap has 1021408936 bytes of memory allocated
. . .
=========================
User Session State Object
=========================
----------------------------------------
SO: 0x23c9eccd0, type: 4, owner: 0x23c843c20, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x23c843c20, name=session, file=ksu.h LINE:12624, pg=0
(session) sid: 49 ser: 32739 trans: (nil), creator: 0x23c843c20
. . .
oct: 7, prv: 0, sql: 0x13a9de8a0, psql: 0x13a9de8a0, user: 81/XXXXXX
. . .
client details:
O/S info: user: sebastian.krieg, term: unknown, ospid:
machine: STUTE-LP-0167 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
. . .
Session Wait History:
elapsed time of 0.124456 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
. . .
1: waited for 'SGA: allocation forcing component growth'
. . .
Current Instatiation Object
===========================
----- Dump Cursor sql_id=g6n7vrnzp7vvt xsc=0x7fa7a37e5c68 cur=0x7fa7a3a61bb8 -----
LibraryHandle: Address=0x13a9de8a0 Hash=3f53ef79 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=DELETE
FROM BstAbgl_T
WHERE :"SYS_B_0000" = :"SYS_B_0001"
AND (BstAbglId,Tsn) IN ((:"SYS_B_0002",:"SYS_B_0003"),(:"SYS_B_0004",:"SYS_B_0005"),. . .
FullHashValue=0c67a4e48579e7b7f350fbbd3f53ef79 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1062465401 OwnerIdn=81
. . .
ChildTable: size='16'
Child: id='0' Table=0x1e188c488 Reference=0x1e188bed0 Handle=0x18c6f73b0
. . .
Starting SQL statement dump
SQL Information
user_id=81 user_name=XXXXXX module=JDBC Thin Client action=
sql_id=g6n7vrnzp7vvt plan_hash_value=0 problem_type=0
----- Current SQL Statement for this session (sql_id=g6n7vrnzp7vvt) -----
DELETE
FROM BstAbgl_T
WHERE :"SYS_B_0000" = :"SYS_B_0001"
AND (BstAbglId,Tsn) IN ((:"SYS_B_0002",:"SYS_B_0003"),
. . .
:"SYS_B_2004"))
)
. . .
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
. . .
ksm_4031_dump()+101 call ksedst()
ksmasg()+283 call ksm_4031_dump()
kghnospc()+1852 call ksmasg()
kghalf()+964 call kghnospc()
kghalp()+94 call kghalf()
kksalx()+54 call kghalp()
kksalx()+54 call kghalp()
kghsupmm()+339 call kksalx()
kghssgai()+1081 call kghsupmm()
kksmcAlFree()+286 call kghssgai()
xplSetRws()+293 call kksmcAlFree()
xplCompact()+637 call xplSetRws()
. . .
==============================
Memory Utilization of Subpool 1
================================
Allocation Name Size
___________________________ ____________
"free memory " 5877098104
. . .
"SQLA " 1028201968
. . .
LIBRARY CACHE STATISTICS:
namespace gets hit ratio pins hit ratio reloads invalids
-------------- --------- --------- --------- --------- ---------- ----------
SQL AREA 82376518 0.770 362091964 0.997 41936 3861
. . .
*** 2012-11-10 15:10:55.756
AUTO MEM: PGA get fail 144 for 40, 133, 1, 16777216, 7331643392, 671088640, 40, 8002732032, 6
. . .
Allocation request for: xplSetRws:planLines_ctxdef
Heap: 0x1e188bca0, size: 4120
******************************************************
HEAP DUMP heap name="sga heap" desc=0x60001190
extent sz=0x9800 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=(nil) owner=(nil) nex=(nil) xsz=0x1 heap=(nil)
fl2=0x60, nex=(nil)
ds for latch 1: 0x60057a60 0x600592b8 0x6005ab10 0x6005c368
reserved granule count 0 (granule size 16777216)
******************************************************
HEAP DUMP heap name="SQLA^3f53ef79" desc=0x1e188bca0
extent sz=0xfe8 alt=32767 het=368 rec=0 flg=2 opc=2
parent=0x60001190 owner=0x1e188bb60 nex=(nil) xsz=0x1000000 heap=(nil)
fl2=0x27, nex=(nil), dsxvers=1, dsxflg=0x0
dsx first ext=0xc6704c0
Subheap has 1021408936 bytes of memory allocated
. . .
=========================
User Session State Object
=========================
----------------------------------------
SO: 0x23c9eccd0, type: 4, owner: 0x23c843c20, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x23c843c20, name=session, file=ksu.h LINE:12624, pg=0
(session) sid: 49 ser: 32739 trans: (nil), creator: 0x23c843c20
. . .
oct: 7, prv: 0, sql: 0x13a9de8a0, psql: 0x13a9de8a0, user: 81/XXXXXX
. . .
client details:
O/S info: user: sebastian.krieg, term: unknown, ospid:
machine: STUTE-LP-0167 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
. . .
Session Wait History:
elapsed time of 0.124456 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
. . .
1: waited for 'SGA: allocation forcing component growth'
. . .
Current Instatiation Object
===========================
----- Dump Cursor sql_id=g6n7vrnzp7vvt xsc=0x7fa7a37e5c68 cur=0x7fa7a3a61bb8 -----
LibraryHandle: Address=0x13a9de8a0 Hash=3f53ef79 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=DELETE
FROM BstAbgl_T
WHERE :"SYS_B_0000" = :"SYS_B_0001"
AND (BstAbglId,Tsn) IN ((:"SYS_B_0002",:"SYS_B_0003"),(:"SYS_B_0004",:"SYS_B_0005"),. . .
FullHashValue=0c67a4e48579e7b7f350fbbd3f53ef79 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1062465401 OwnerIdn=81
. . .
ChildTable: size='16'
Child: id='0' Table=0x1e188c488 Reference=0x1e188bed0 Handle=0x18c6f73b0
. . .
Starting SQL statement dump
SQL Information
user_id=81 user_name=XXXXXX module=JDBC Thin Client action=
sql_id=g6n7vrnzp7vvt plan_hash_value=0 problem_type=0
----- Current SQL Statement for this session (sql_id=g6n7vrnzp7vvt) -----
DELETE
FROM BstAbgl_T
WHERE :"SYS_B_0000" = :"SYS_B_0001"
AND (BstAbglId,Tsn) IN ((:"SYS_B_0002",:"SYS_B_0003"),
. . .
:"SYS_B_2004"))
)
. . .
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
. . .
ksm_4031_dump()+101 call ksedst()
ksmasg()+283 call ksm_4031_dump()
kghnospc()+1852 call ksmasg()
kghalf()+964 call kghnospc()
kghalp()+94 call kghalf()
kksalx()+54 call kghalp()
kksalx()+54 call kghalp()
kghsupmm()+339 call kksalx()
kghssgai()+1081 call kghsupmm()
kksmcAlFree()+286 call kghssgai()
xplSetRws()+293 call kksmcAlFree()
xplCompact()+637 call xplSetRws()
. . .
==============================
Memory Utilization of Subpool 1
================================
Allocation Name Size
___________________________ ____________
"free memory " 5877098104
. . .
"SQLA " 1028201968
. . .
LIBRARY CACHE STATISTICS:
namespace gets hit ratio pins hit ratio reloads invalids
-------------- --------- --------- --------- --------- ---------- ----------
SQL AREA 82376518 0.770 362091964 0.997 41936 3861
. . .
The trace file starts with some messages related to PGA management. This indicates Automatic Memory Management (AMM) is being used (which can be verified from the non-default parameters in the alert file).
Next in the trace file are the dumps of the top level heap and the heap in which the error was raised>
Then follows the information on the user session: the SQL handle of the current SQL statement, the details on the client process, as well as the wait events. Here it is visible that the shared pool was trying to grow at the time of the failure.
The SQL statement dump of the current cursor (use e.g. the SQL handle of the previous information to get here in the trace file, or go to the incident trace file as listed further on) contains details on the execution of the failing statement. This dump shows that there are only 16 child cursors. As such the # of child cursors or inability to share the SQL statement doesnh't seem to be a problem.
When skipping all "irrelevant" routines on the stack, the trace file shows that the real memory allocation takes place in the xplSetRws() routine.
The subpool statistics show that only 1 subpool is used and that there is a lot of free memory available in the subpool. Furthermore, the largest allocations are in the SQL Area.
The library cache statistics show a relatively "OK" hit ratio (could be higher, but is not extremely bad). The # of reloads and invalidations are small as compared to the # of gets.
Analyzing the incident trace file
The incident trace file XXXXXX_ora_22303_i51379.trc shows:
. . .
----- Session Cursor Dump -----
Current cursor: 2, pgadep=0
. . .
Cursor#2(0x7fa7a3a61bb8) state=SYNTAX curiob=0x7fa7a37e5c68
curflg=4c fl2=0 par=(nil) ses=0x23c9eccd0
----- Dump Cursor sql_id=g6n7vrnzp7vvt xsc=0x7fa7a37e5c68 cur=0x7fa7a3a61bb8 -----
LibraryHandle: Address=0x13a9de8a0 Hash=3f53ef79 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=DELETE
FROM BstAbgl_T
. . .
ChildTable: size='16'
Child: id='0' Table=0x1e188c488 Reference=0x1e188bed0 Handle=0x18c6f73b0
Children:
Child: childNum='0'
. . .
LibraryObject: Address=0x1d74e5d60 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
DataBlocks:
Block: #='0' name=KGLH0^3f53ef79 pins=0 Change=NONE
. . .
Block: #='6' name=SQLA^3f53ef79 pins=0 Change=NONE
. . .
----- Bind Byte Code (IN) -----
Opcode = 13 Bind Twotask Scalar In(not out) Nocopy LiTeral
Offsi = 48, Offsi = 0
Opcode = 13 Bind Twotask Scalar In(not out) Nocopy LiTeral
. . .
----- Auto SGA Transfer History -----
GRANULE SIZE is 16777216
COMPONENT NAME : shared pool, curnum=430, usrsz=0, tgt=430, min=2, st=0, flg=81, r=(nil)
Number of granules in inuse list (listid 2) is 409
Number of granules in partially inuse list (listid 4) is 21
. . .
COMPONENT NAME : SGA Target, curnum=437, usrsz=0, tgt=437, min=0, st=0, flg=0, r=(nil)
COMPONENT NAME : System Memory, curnum=40, usrsz=0, tgt=40, min=0, st=0, flg=0, r=(nil)
Number of granules in inactive list (listid 0) is 40
COMPONENT NAME : DEFAULT buffer cache, curnum=1, usrsz=0, tgt=1, min=1, st=0, flg=81, r=(nil), L2=0
Number of granules in inuse list (listid 2) is 1
COMPONENT NAME : PGA Target, curnum=40, usrsz=0, tgt=40, min=0, st=0, flg=0, r=(nil)
Dumping Work Area Table (level=1)
=====================================
. . .
Mapping for Component id to Component name:
1: shared pool
. . .
7: DEFAULT buffer cache
. . .
Dumping memory transfer history; start=42
42 - Time 498955528: 0x11b842000 to 0x11b85a000 from 7 to 1
43 - Time 498955528: 0x11b85c000 to 0x11b89a000 from 7 to 1
44 - Time 498955528: 0x11b89c000 to 0x11b8e4000 from 7 to 1
45 - Time 498955528: 0x11b8e6000 to 0x11b902000 from 7 to 1
. . .
----- Session Cursor Dump -----
Current cursor: 2, pgadep=0
. . .
Cursor#2(0x7fa7a3a61bb8) state=SYNTAX curiob=0x7fa7a37e5c68
curflg=4c fl2=0 par=(nil) ses=0x23c9eccd0
----- Dump Cursor sql_id=g6n7vrnzp7vvt xsc=0x7fa7a37e5c68 cur=0x7fa7a3a61bb8 -----
LibraryHandle: Address=0x13a9de8a0 Hash=3f53ef79 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=DELETE
FROM BstAbgl_T
. . .
ChildTable: size='16'
Child: id='0' Table=0x1e188c488 Reference=0x1e188bed0 Handle=0x18c6f73b0
Children:
Child: childNum='0'
. . .
LibraryObject: Address=0x1d74e5d60 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
DataBlocks:
Block: #='0' name=KGLH0^3f53ef79 pins=0 Change=NONE
. . .
Block: #='6' name=SQLA^3f53ef79 pins=0 Change=NONE
. . .
----- Bind Byte Code (IN) -----
Opcode = 13 Bind Twotask Scalar In(not out) Nocopy LiTeral
Offsi = 48, Offsi = 0
Opcode = 13 Bind Twotask Scalar In(not out) Nocopy LiTeral
. . .
----- Auto SGA Transfer History -----
GRANULE SIZE is 16777216
COMPONENT NAME : shared pool, curnum=430, usrsz=0, tgt=430, min=2, st=0, flg=81, r=(nil)
Number of granules in inuse list (listid 2) is 409
Number of granules in partially inuse list (listid 4) is 21
. . .
COMPONENT NAME : SGA Target, curnum=437, usrsz=0, tgt=437, min=0, st=0, flg=0, r=(nil)
COMPONENT NAME : System Memory, curnum=40, usrsz=0, tgt=40, min=0, st=0, flg=0, r=(nil)
Number of granules in inactive list (listid 0) is 40
COMPONENT NAME : DEFAULT buffer cache, curnum=1, usrsz=0, tgt=1, min=1, st=0, flg=81, r=(nil), L2=0
Number of granules in inuse list (listid 2) is 1
COMPONENT NAME : PGA Target, curnum=40, usrsz=0, tgt=40, min=0, st=0, flg=0, r=(nil)
Dumping Work Area Table (level=1)
=====================================
. . .
Mapping for Component id to Component name:
1: shared pool
. . .
7: DEFAULT buffer cache
. . .
Dumping memory transfer history; start=42
42 - Time 498955528: 0x11b842000 to 0x11b85a000 from 7 to 1
43 - Time 498955528: 0x11b85c000 to 0x11b89a000 from 7 to 1
44 - Time 498955528: 0x11b89c000 to 0x11b8e4000 from 7 to 1
45 - Time 498955528: 0x11b8e6000 to 0x11b902000 from 7 to 1
. . .
The incident trace file shows once again the current cursor being executed at the time of the ORA-4031 error. This time, the cursor dump shows also the bind variables being used (as well as again details on the # of child cursors).
The bind variables show “Nocopy LiTeral” as well as system generated bind variable names like :SYS_B_xxxx, which means that they were introduced by the Cost Based Optimizer (CBO) as a result of CURSOR_SHARING=SIMILAR or FORCE. This can be verified by the non-default instance parameters.
The SGA Transfer History shows that the SHARED_POOL_SIZE was not set at instance startup (usrsz=0), but currently has grown to 430 granules (curnum=430).
The SGA Transfer History also reveals that there is a constant flow of granules from the buffer cache to the shared pool.
Conclusions
Conclusions which can be drawn:
- CURSOR_SHARING=FORCE and a DELETE statement with a huge IN-list are being used
- OPTIMIZER_MODE=RULE is used
- The current routine on stack is xplSetRws()
- Small # of child cursors being created
- The session crashes when trying to enlarge the shared pool
A search for "ORA-4031 rule xplSetRws" in My Oracle Support reveals bug # 14269072 as being a potential cause. This bug characterizes itself by a huge IN-list being used together with the RULE hint. So the match seems found for the issue at hand.
The bug has been closed as not being a bug and caused by the behavior of bug # 4601087, where it is stated that this feature won’t be fixed in the Rule Based Optimizer. The solution is to disable the RULE optimizer mode and to switch to using the Cost Based Optimizer.
Example 2: an ORA-4031 on the Java pool
The second example deals a Java call failing with an ORA-4031 error.
Investigating the error message
The error message reported in this example reads:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","unknown object","JOXLE^cb5a18f0",":SGAClass")
This error can either be raised in some application trace or directly in the screen of the user connected to the database. From the error we can find that the error is raised when trying to allocate an additional 4096 bytes in the Java pool for the execution of some server-side Java routine.
Analyzing the alert file
The alert file shows the following information:
. . .
Sat Nov 10 19:00:30 2012
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_1/dbs/initYYYYYY.ora
System parameters with non-default values:
. . .
sga_target = 2G
sga_max_size = 3G
compatible = "10.2.0"
processes = 5000
sessions = 10016
olap_page_pool_size = 4M
shared_pool_size = 1G
dml_locks = 10000
session_cached_cursors = 1500
job_queue_processes = 10
cursor_sharing = "EXACT"
parallel_min_servers = 0
parallel_max_servers = 16
open_cursors = 5000
olap_page_pool_size = 4M
. . .
Sun Nov 11 13:22:00 2012
Errors in file /u01/app/oracle/diag/rdbms/YYYYYY/YYYYYY/trace/YYYYYY_m000_4303.trc (incident=400673):
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","unknown object","JOXLE^cb5a18f0",":SGAClass")
Incident details in: /u01/app/oracle/diag/rdbms/YYYYYY/YYYYYY/incident/incdir_400673/YYYYYY_m000_4303_i400673.trc
. . .
Sat Nov 10 19:00:30 2012
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_1/dbs/initYYYYYY.ora
System parameters with non-default values:
. . .
sga_target = 2G
sga_max_size = 3G
compatible = "10.2.0"
processes = 5000
sessions = 10016
olap_page_pool_size = 4M
shared_pool_size = 1G
dml_locks = 10000
session_cached_cursors = 1500
job_queue_processes = 10
cursor_sharing = "EXACT"
parallel_min_servers = 0
parallel_max_servers = 16
open_cursors = 5000
olap_page_pool_size = 4M
. . .
Sun Nov 11 13:22:00 2012
Errors in file /u01/app/oracle/diag/rdbms/YYYYYY/YYYYYY/trace/YYYYYY_m000_4303.trc (incident=400673):
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","unknown object","JOXLE^cb5a18f0",":SGAClass")
Incident details in: /u01/app/oracle/diag/rdbms/YYYYYY/YYYYYY/incident/incdir_400673/YYYYYY_m000_4303_i400673.trc
. . .
The alert file shows some non-default parameters which might influence SGA behavior. Immediately there is cause for alert: the Java pool (or any other SGA component) is not defined and there are parameters (such as SESSION_CACHED_CURSORS, JOB_QUEUE_PROCESSES, and OPEN_CURSORS) together with a possibility of a huge amount of sessions (more than 10,000 concurrently possible) which can result in a massive shared pool size, thereby making it difficult for other pools to grow in this ASMM environment. Next to this, the shared pool reserved area is limited to 200MB only (default it grows and shrinks with the shared pool, but in this case it has been fixed in size). Finally, SGA_TARGET is set to a lower value than SGA_MAX_SIZE, allowing for growth of the SGA.
The alert file also shows the exact error message as well as the trace and incident trace files generated for this incident.
Analyzing the trace file
The trace file YYYYYY_m000_4303.trc shows:
. . .
*** MODULE NAME:(MMON_SLAVE) 2012-11-11 14:38:00.121
*** ACTION NAME:(JAVAVM JIT slave action) 2012-11-11 14:38:00.121
joez_compile_method beginning compile of method oracle/aurora/rdbms/EnvironmentSpecificImpl.setStaticPermissionFlagImpl
. . .
Allocation request for: :SGAClass
Heap: 3da3d5358, size: 4096
******************************************************
HEAP DUMP heap name="java pool" desc=3801abd50
extent sz=0x1040 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1 heap=0
fl2=0x20, nex=0
ds for latch 1: 0x3801ad6a8
ds for latch 2: 0x3801b6f70
reserved granule count 0 (granule size 16777216)
******************************************************
HEAP DUMP heap name="JOXLE^550e779c" desc=3da3d5358
extent sz=0xfe8 alt=32767 het=56 rec=0 flg=66 opc=12
parent=380002100 owner=3d42e4fe8 nex=0 xsz=0x1 heap=0
fl2=0x26, nex=0, dsxvers=1, dsxflg=0x0
dsx first ext=0xd404e2c8
Subheap has 1584 bytes of memory allocated
. . .
=========================
User Session State Object
=========================
----------------------------------------
SO: 0x42dc5e8a8, type: 4, owner: 0x434b8a340, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x434b8a340, name=session, file=ksu.h LINE:12459 ID:, pg=0
(session) sid: 7521 ser: 4521 trans: 0x0, creator: 0x434b8a340
. . .
oct: 3, prv: 0, sql: 0x3e9e5e0d8, psql: 0x3e9e3ddf0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
. . .
Session Wait History:
elapsed time of 0.124456 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
. . .
1: waited for 'SGA: allocation forcing component growth'
. . .
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
ksm_4031_dump()+118 CALL ksedst()
0
ksmasg()+332 CALL ksm_4031_dump()
kghnospc()+2084 PTR_CALL ksmasg()
kghalo()+5588 CALL kghnospc()
ioc_allocate0()+128 CALL kghalo()
0
iocbf_allocate0()+2 CALL ioc_allocate0()
4
ioc_do_call()+592 PTR_CALL iocbf_allocate0()
joet_switched_env_c CALL ioc_do_call()
allback()+252
ioct_allocate0()+76 CALL joet_switched_env_c
allback()
eoa_pagmem_get_page CALL ioct_allocate0()
_memory()+72
eoa_pagmem_add_page CALL eoa_pagmem_get_page
()+488 _memory()
. . .
==============================
Memory Utilization of Subpool 1
================================
Allocation Name Size
___________________________ ____________
"free memory " 270208
"miscellaneous " 0
"JOXLE " 15833664
"joxs heap " 673344
==============================
Memory Utilization of Subpool 2
================================
Allocation Name Size
___________________________ ____________
"free memory " 1318528
"miscellaneous " 0
"JOXLE " 14708736
"joxs heap " 749952
. . .
*** MODULE NAME:(MMON_SLAVE) 2012-11-11 14:38:00.121
*** ACTION NAME:(JAVAVM JIT slave action) 2012-11-11 14:38:00.121
joez_compile_method beginning compile of method oracle/aurora/rdbms/EnvironmentSpecificImpl.setStaticPermissionFlagImpl
. . .
Allocation request for: :SGAClass
Heap: 3da3d5358, size: 4096
******************************************************
HEAP DUMP heap name="java pool" desc=3801abd50
extent sz=0x1040 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1 heap=0
fl2=0x20, nex=0
ds for latch 1: 0x3801ad6a8
ds for latch 2: 0x3801b6f70
reserved granule count 0 (granule size 16777216)
******************************************************
HEAP DUMP heap name="JOXLE^550e779c" desc=3da3d5358
extent sz=0xfe8 alt=32767 het=56 rec=0 flg=66 opc=12
parent=380002100 owner=3d42e4fe8 nex=0 xsz=0x1 heap=0
fl2=0x26, nex=0, dsxvers=1, dsxflg=0x0
dsx first ext=0xd404e2c8
Subheap has 1584 bytes of memory allocated
. . .
=========================
User Session State Object
=========================
----------------------------------------
SO: 0x42dc5e8a8, type: 4, owner: 0x434b8a340, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x434b8a340, name=session, file=ksu.h LINE:12459 ID:, pg=0
(session) sid: 7521 ser: 4521 trans: 0x0, creator: 0x434b8a340
. . .
oct: 3, prv: 0, sql: 0x3e9e5e0d8, psql: 0x3e9e3ddf0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
. . .
Session Wait History:
elapsed time of 0.124456 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
. . .
1: waited for 'SGA: allocation forcing component growth'
. . .
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
ksm_4031_dump()+118 CALL ksedst()
0
ksmasg()+332 CALL ksm_4031_dump()
kghnospc()+2084 PTR_CALL ksmasg()
kghalo()+5588 CALL kghnospc()
ioc_allocate0()+128 CALL kghalo()
0
iocbf_allocate0()+2 CALL ioc_allocate0()
4
ioc_do_call()+592 PTR_CALL iocbf_allocate0()
joet_switched_env_c CALL ioc_do_call()
allback()+252
ioct_allocate0()+76 CALL joet_switched_env_c
allback()
eoa_pagmem_get_page CALL ioct_allocate0()
_memory()+72
eoa_pagmem_add_page CALL eoa_pagmem_get_page
()+488 _memory()
. . .
==============================
Memory Utilization of Subpool 1
================================
Allocation Name Size
___________________________ ____________
"free memory " 270208
"miscellaneous " 0
"JOXLE " 15833664
"joxs heap " 673344
==============================
Memory Utilization of Subpool 2
================================
Allocation Name Size
___________________________ ____________
"free memory " 1318528
"miscellaneous " 0
"JOXLE " 14708736
"joxs heap " 749952
. . .
The trace file shows that the error is raised by a JAVA JIT compilation attempt. It also lists that the error occurs when trying to compile the oracle/aurora/rdbms/EnvironmentSpecificImpl.setStaticPermissionFlagImpl method. Next in the trace file are the dumps of the top level heap and the heap in which the error was raised. Apparently this deals solely with the Java pool.
Then follows the information on the user session: here it is visible that the Java pool was trying to grow at the time of the failure.
When skipping all "irrelevant" routines on the stack, the trace file shows that the real memory allocation takes place in the joet_switched_env_callback() routine.
Apparently for the Java pool 2 subpools were allocated. Both have very few free memory left. As the memory request of 4096 bytes fails where there is more free memory available, this means that the free memory is fragmented so no memory piece is available for the given memory request.
Analyzing the V$SGA_RESIZE_OPS output
The V$SGA_RESIZE_OPS output shows:
. . .
COMPONENT INITIAL_SIZE FINAL_SIZE STATUS CHANGED
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
. . .
COMPONENT INITIAL_SIZE FINAL_SIZE STATUS CHANGED
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
DEFAULT buffer cache 150,994,944 150,994,944 ERROR 11/11/2012 14:55:38
java pool 33,554,432 33,554,432 ERROR 11/11/2012 14:55:38
. . .
The V$SGA_RESIZE_OPS output shows that buffer cache and Java pool resize operations constantly fail.
Conclusions
Conclusions which can be drawn:
- Most memory in the SGA is consumed by the shared pool, leaving few memory for all other SGA components
- No DB_CACHE_SIZE or JAVA_POOL_SIZE parameters have been configured
- The Java subpools are almost full
This seems to point to a too small configured Java pool . Resolution is to set a minimal JAVA_POOL_SIZE value (to guarantee that the Java pool has a minimum size and will not shrink below the given value) and if necessary, enlarge SGA_TARGET to make sure that pools can grow when needed. A starting value for the Java pool size can be obtained by querying the largest TARGET_SIZE value for the Java pool as listed in either V$SGA_RESIZE_OPS or DBA_HIST_MEMORY_RESIZE_OPS (whichever of these 2 values is the largest). Afterwards this value can be finetuned as considered appropriate.
I would appreciate any feedback and/or questions you have on this. Any questions or feedback can also be sent to mvweb@ziggo.nl.