Oracle Data Pump was written from the ground up with an architecture designed to produce high performance with maximum flexibility. Understanding the architecture of Data Pump will help us to take advantage of its speed and features.
Data Pump Architecture :
Master Table :
At the heart of every Data Pump operation is the master table. This is a table created in the schema of the user running a Data Pump job. It is a directory that maintains all details about the job: the current state of every object being exported or imported, the locations of those objects in the dumpfile set, the user-supplied parameters for the job, the status of every worker process, the current set of dump files, restart information, and so on. During a file-based export job, the master table is built during execution and written to the dumpfile set as the last step. Conversely, loading the master table into the current user’s schema is the first step of a file-based import operation, so that the master table can be used to sequence the creation of all objects imported. The use of the master table is the key to the ability of Data Pump to restart a job in the event of a planned or unplanned job stoppage. Because it maintains the status of every object to be processed by the job, Data Pump knows which objects were currently being worked on, and whether or not those objects were successfully completed.
Process Structure :
Process Structure : A Data Pump job comprises several processes. These processes are described in the order of their creation.
Client Process : This is the process that makes calls to the Data Pump API. Oracle Database ships four client utilities of this API. These have a very similar look and feel to the original exp and imp clients, but have many more capabilities.Data Pump is integrated into Oracle Database, a client is not required once a job is underway. Multiple clients may attach and detach from a job as necessary for monitoring and control.
Shadow Process : This is the standard Oracle shadow (or foreground) process created when a client logs into Oracle Database. The shadow services Data Pump API requests.1 Upon receipt of a DBMS_DATAPUMP.OPEN request, the shadow process creates the job, which consists primarily of creating the master table, the Advanced Queuing (AQ) queues used for communication among the various processes, and the master control process. Once a job is running, the main task of the shadow process consists of servicing GET_STATUS requests from the client. If the client detaches, the shadow process also goes away.
Master Control Process (MCP) : As the name implies, the MCP controls the execution and sequencing of a Data Pump job. There is one MCP per Data Pump job, maintaining the job state, job description, restart, and dumpfile information in the master table. A job is divided into various phases of metadata and data unloading or loading, and the MCP hands out work requests to the worker processes appropriate for the current phase. The bulk of MCP processing is performed in this work dispatch loop. The MCP also performs central file management duties, maintaining the active dumpfile list and handing out file pieces as requested by processes unloading data or metadata. An MCP has a process name of the form: <instance>_DMnn_<pid>.
Worker Process : Upon receipt of a START_JOB request, the MCP creates worker processes as needed, according to the value of the PARALLEL parameter. The worker processes perform the tasks requested by the MCP (primarily unloading and loading of metadata and data), and maintain the object rows that make up the bulk of the master table. As database objects are unloaded or loaded, these rows are written and updated with the current status of these objects: pending, completed, failed, and so on. The worker processes also maintain type completion rows, which describe the type of object currently being worked on: tables, indexes, views, and so on. These types completion rows are used during restart. A worker process has a name of the form: ”*DWnn*”.
Parallel Query (PQ) Process : If the External Tables data access method is chosen for loading or unloading a table or partition, some parallel query processes are created by the worker process that was given the load or unload assignment, and the worker process then acts as the query coordinator. These are standard parallel execution slaves that exploit the parallel execution architecture of Oracle Database, and enable intra-partition loading and unloading. The Data Pump public API is embodied in the PL/SQL package DBMS_DATAPUMP.
Data Movement : Data Pump supports four methods of data movement, each of which has different performance and functional characteristics. In descending order of speed, these four methods are:
- Data File Copying (transportable tablespaces)
- Direct Path load and unload
- External Tables
- Conventional Path
Data Pump will choose the best data movement method for a particular operation. It is also possible for the user to specify an access method using command line parameters.The fastest method of moving data is by copying the database data files that contain the data without interpretation or altering of the data. This is the method used to move data when transportable mode is specified at export time. There are some restrictions on the use of data file copying. Some types of data, some types of tables, and some types of table oganization cannot be moved with this method. For example, tables with encrypted columns cannot be moved using this access method. In addition, the character sets must be identical on both the source and target databases in order to use data file copying.
Direct path and external tables are the two main data access methods provided by Oracle Database 11g. The direct path access method is the faster of the two, but does not support intra-partition parallelism. The external tables access method does support this function, and therefore may be chosen to load or unload a very large table or partition. Each access method also has certain restrictions regarding the use of the other. For example, a table being loaded with active referential constraints or global indexes cannot be loaded using the direct path access method. A table with a column of data type LONG cannot be loaded with the external
tables access method. In most cases, you need not be concerned about choosing an access method; the Data Pump job will make the correct choice based on an array of job characteristics. Both methods write to the dumpfile set in a compact, binary stream format that is approximately 15 percent smaller than the original exp data representation.
When neither direct path nor external tables can handle the data to be imported, Data Pump uses a method called conventional path. For example, a table that contains an encrypted column and a LONG column would be imported using conventional path because direct path cannot be used to import encrypted columns and external tables cannot be used to import LONG columns. Data loading with the conventional access method is much slower than the direct path and external tables methods. So, the Data Pump uses this method only when it has no other choice.
Metadata Movement : The Metadata API (DBMS_METADATA) is used by worker processes for all metadata unloading and loading. Unlike the original exp function (which stored object definitions as (SQL DDL), the Metadata API extracts object definitions from the database, and writes them to the dumpfile set as XML documents. This allows great flexibility to apply XML Style sheet Language Transformations (XSLTs) when creating the DDL at import time. For example, an object’s ownership, storage characteristics, and tablespace residence can be changed easily during import. This robust XML might take up more dumpfile space than the old style SQL DDL, but it provides more flexibility and features. In addition, the COMPRESSION parameter can be used to decrease the size of metadata written during a Data Pump export job.
Interprocess Communication : Advanced Queuing (AQ) is used for communicating among the various Data Pump processes. Each Data Pump job has two queues:
- Command and control queue: All processes (except clients) subscribe to this queue. All API commands, work requests and responses, file requests, and log messages are processed on this queue.
- Status queue: Only shadow processes subscribe to read from this queue. It is used to receive work-in-progress and error messages queued by the MCP. The MCP is the only writer to this queue.
File Management : The file manager is distributed across several parts of the Data Pump job. The actual creation of new files and allocation of file segments is handled centrally within the MCP. However, each worker and parallel query process makes local process requests to the file manager to allocate space, read a file chunk, write to a buffer, or update progress statistics. The local file manager determines if the request can be handled locally and if not, forwards it to the MCP using the command and control queue. Reading file chunks and updating file statistics in the master table are handled locally. Writing to a buffer is typically handled locally, but may result in a request to the MCP for more file space.
Directory Management : Because Oracle background server processes handle all dumpfile set I/O, not the user running the job. This presents a security dilemma because oracle is typically a privileged account. Therefore, all directory specifications are made using Oracle directory objects with read/write grants established by the DBA.
For example, the DBA may set up a directory as follows:
Step 1 : Create Directory
SQL> Create directory datapump as ‘D:\dpump\’;
Step 2 : Grant privileges to user
SQL> Grant read, write on directory datapump to scott;
Then scott can specify a dump file on the expdp command line as:
C:\>expdp scott/tiger dumpfile=hr_emp_tab.dmp tables=hr.employees directory=datapump logfile=hr_emp_log.log
Enjoy J J J