Web Content Display Web Content Display

Installing Webmin in Ubuntu Hardy Heron

Webmin is a web-based interface for system administration for Unix. Using any modern web browser, you can setup user accounts, Apache, DNS, file sharing and much more. Webmin removes the need to manually edit Unix configuration files like /etc/passwd, and lets you manage a system from the console or remotely.

You can install webmin for your server web interface to configure apache,mysql servers.Now we will see how to install webmin in Ubuntu 8.04

Preparing your system
First you need to install the following packages
sudo apt-get update
sudo apt-get install perl libnet-ssleay-perl openssl libauthen-pam-perl libpam-runtime libio-pty-perl libmd5-perl

Now download the latest webmin using the following command
wget http://prdownloads.sourceforge.net/webadmin/webmin_1.420_all.deb
Now we have webmin_1.420_all.deb package install this package using the following command
sudo dpkg -i webmin_1.420_all.deb
This will complete the installation.

Web Content Display Web Content Display

Data migration ETL Services

Design and Deploy of Business Intelligence Servers

Design and Deploy of Virtualisation Services

Instant Deploy of apps server services

We have pre-built apps servers (CRM,CMS, email/db/web, firewall, BI, data-migration etc.) which can be deployed in minutes! Our platforms are Vmware ESX or Amazon Elastic Cloud (AEC).

For Business Intelligence (BI) we use the Pentaho suite.The Pentaho BI Platform is a process-centric, workflow-enabled, scalable platform for solving Business Intelligence problems. A direct quote: "The Pentaho BI Project provides enterprise-class reporting, analysis, dashboard, data mining and workflow capabilities that help organizations operate more efficiently and effectively. The software offers flexible deployment options that enable use as embeddable components, customized BI application solutions, and as a complete out-of-the-box, integrated BI platform. Ranked #1 in Open Source Business Intelligence." from their site

http://www.pentaho.com/

Virtualisation services is primarily the free VMware server. We also have experience with VMware from desktop to ESX thus reducing Server deployment costs through server consolidation. Also reduce testing time and cost by using the test system virtualisation use of the VMware snap-shot capability.

We manage hosted Oracle, SAPdb and MYsql Postgres databases and have extensive migration experience between them.

Data ETL and Replication. We use the Pentaho toolset to effortlessly migrate data from anywhere to anywhere, between ANY databes, platform and file system.

Enterprise class Document Management is achieved via Alfresco (a Documentum replacement), for smaller lighter needs we use Knowledgetree (DMS) and Plone (web CMS). Both can be 'rapidly deployed' as an appliance.


Quick HOWTO change Mysql default data directory

By default, MySQL’s datadir is placed in the /var/lib/mysql directory.

Create the directory that will be new datadir (ex:-/home/db)

chown the directory to the mysql:mysql user

sudo chown -R mysql:mysql /home/db/*

You need to stop the mysql server using the following command

sudo /etc/init.d/mysql stop

Now you need to edit the /etc/mysql/my.cnf file

sudo vi /etc/mysql/my.cnf

and look for “datadir = /var/lib/mysql” this si where mysql database default data directory here you need to change this one to your new directory

datadir = /home/db

copy the files from the old datadir to the new location. However, make sure that the files named
ib_arch_log_0000000000, ib_logfile0 etc. are not copied to the newer location.

Make sure that the files and directories are owned by mysql user

Make changes in the my.cnf to point the new datadir.

Restart the MySQL database

sudo /etc/init.d/mysql start

I'm always looking and  can never seem to find this  list AMI's on the amazon site. So here it is:

The Amazon AMI list.

(Especially For Elastic Compute Geeks and others hoping to starts using this fantastic offering.)

Web Content Display Web Content Display

Do You Find Gmail web client slow?

Use Thunderbird 2.0 and get dramatically improved performance.

Follow the instructions below to set up IMAP access in Thunderbird 2.0

Note: if you're using an older version of Thunderbird, we highly recommend upgrading to version 2.0.

To set up your Thunderbird client to work with Gmail:

  1. Enable IMAP in Gmail.
  2. Open Thunderbird, and select Tools > Account Settings.
  3. Click Add Account.
  4. Select the Email account radio button and click Next. The Identity screen appears.
  5. Enter your full name in the Your Name field. Enter your Gmail address (username@gmail.com) in the Email Address field, and click Next. Google Apps users, enter your full address in the format username@your_domain.com.
  6. Select IMAP as the type of incoming server you are using. Enter imap.gmail.com in the Incoming Server field.
  7. Set the Outgoing Server to smtp.gmail.com, and click Next.


  8. Enter your full email address (including @gmail.com @your_domain.com) in the Incoming User Name and Outgoing User Name fields, and click Next.


  9. Enter a name for your email account in the Account Name field, and click Next.
  10. Verify your account information in the dialog box, and click Finish.
  11. Select Server Settings from the folder list below your new account.
  12. Update the Port value to 993.
  13. In the Security Settings section, select SSL from the Use secure connection options.
  14. Select Check for messages at startup and Automatically download new messages
  15. Click Outgoing Server (SMTP) in the folder list


  16. Select the smtp.gmail.com (Default) entry from the list and click Edit. The SMTP Server page appears.
  17. Enter smtp.gmail.com as the Server Name and set the Port to 587.
  18. Select User name and password and enter your full email address (including @gmail.com or @your_domain.com) in the User Name field.
  19. Select TLS from the Use secure connection radio buttons and click OK.
  20. Click OK to save your changes and exit the Account Settings dialog.

* Did you click 'Save Changes' after enabling IMAP in Gmail?

To ensure that Gmail can communicate with your mail client, be sure to click

Save Changes

on the

Forwarding and POP/IMAP

settings tab.

Data Pump in Oracle Database 10g: Foundation

for Ultra-High Speed Data Movement Utilities

Data Pump Overview  

Data Pump is a new facility in the Oracle Database 10g Server that enables very high speed data and metadata loading and unloading to / from the Oracle Database. It automatically manages and schedules multiple, parallel streams of load or unload for maximum throughput. Data Pump infrastructure is callable via the PL/SQL package DBMS_DATAPUMP. Thus, custom data movement utilities can be built using Data Pump. Oracle Database 10g will see four of these: New command line export and import clients (expdp & impdp), a web-based Enterprise Manager export / import interface and a custom interface for the movement of complex Data Mining models.

Data Pump is also the foundation for several other key features in the Oracle server: Streams-based Replication, Logical Standby, Grid and Transportable Tablespaces. For Streams-based Replication and Logical Standby, Data Pump quickly generates the initial configuration at the replicated site using Flashback technology to provide a consistent starting point. Initial Grid instantiation is based on Transportable Tablespaces which relies on Data Pump to move and “hook-up” the metadata for the objects defined in the transported tablespace set.

When gathering requirements for Data Pump with large customers, we repeatedly heard, “Time is money. If you do nothing else, make the export / import operation much faster for large amounts of data”. We took that to heart. Data Pump dramatically decreases the elapsed time for most large export / import operations; in some data-intensive cases, by more than two orders of magnitude. In addition to increased performance, our customers enumerated many other requirements. As a result, Data Pump-based export and import clients support all the functionality of the original exp/imp as well as many new features such as checkpoint restart, job size estimation, very flexible, fine-grained object selection, direct loading of one instance from another and detailed job monitoring.

Data Pump is an integral feature of Oracle Database 10g and therefore is available in all configurations. However, parallelism with a degree greater than one is only available in Enterprise Edition.

This paper will start with an overview of Data Pump's architecture then describe its main features along with some best-use practices, then provide a brief comparison to the original export and import facilities.

Architecture

Master Table

At the heart of every Data Pump operation is the “Master Table” (MT). This is a table created in the schema of the user running a Data Pump job. It is a directory which maintains all facets about the job: The current state of every object exported or imported and their locations in the dumpfile set, the job's user-supplied parameters, the status of every worker process, the current set of dumpfiles, restart information, etc.

The MT is built during a file-based export job and is written to the dumpfile set as the last step. Conversely, loading the MT into the current user's schema is the first step of a file-based import operation and is used to sequence the creation of all objects imported.

During export,  note that the MT cannot span files (as other written objects can). If dumpfile sizes are being limited by the FILESIZE parameter, this must be large enough to contain the master table. As a reference point, a database with approximately 400,000 objects including 10,000 tables creates a master table 189 Mb in size.

The MT is the key to Data Pump's restart capability in the event of a planned or unplanned job stoppage. Since it maintains the status of every object comprising the job, upon restarting, Data Pump knows what objects were currently being worked on and if they completed successfully or not.

Process Structure

There are various processes that comprise a Data Pump job. They are described in the order of their creation.

Client Process - This is the process that makes calls to Data Pump's API. As mentioned earlier, Oracle Database 10g ships four clients of this API. This paper will only discuss the new export / import clients, expdp and impdp. They have a very similar look and feel to the original exp and imp, but have far more capabilities as will be described later. Since Data Pump is completely integrated into the Oracle Database server, a client is not required once a job is underway. Multiple clients may attach and detach from a job as necessary for monitoring and control.

Shadow Process - This is the standard Oracle shadow (or foreground) process created when a client logs in to the Oracle server. The shadow services Data Pump API requests.1Upon receipt of a DBMS_DATAPUMP.OPEN request, the shadow creates the job which primarily consists of creating the master table, creating the AQ queues used for communication among the various processes and creating the Master Control Process. Once a job is running, the shadow's main task usually consists of servicing GET_STATUS requests from the client. If the client detaches, the shadow goes away, too.

Master Control Process (MCP) - There is one MCP per job and as the name implies, the MCP controls the execution and sequencing of a Data Pump job. A job is divided into various metadata and data unloading or loading phases, and the MCP hands out work requests to the worker processes appropriate for the current phase. It mainly sits in this work dispatch loop during a job's execution. It also performs central file management duties, maintaining the active dump file list and handing out file pieces as requested by processes unloading data or metadata. An MCP has a process name of the form: <instance>_DMnn_<pid>. The MCP maintains job state, job description, restart and dumpfile information in the master table.

Worker Process - Upon receipt of a START_JOB request, the MCP creates N worker processes where N is the value of the PARALLEL parameter. The workers perform the tasks requested by the MCP that consist primarily of unloading / loading of metadata and data. The workers maintain the object rows in the master table which comprise the bulk of the table. As database objects are unloaded or loaded, these rows are written and updated with the objects' current status: pending, completed, failed, etc. The workers also maintain what are called “type completion rows” which describe what type of object is currently being worked on: tables, indexes, views, etc. These are used during restart. A worker has a name of the form: <instance>_DWnn_<pid>.

Parallel Query (PQ) Process - If External Tables (ET) is chosen as the data access method for loading or unloading a table or partition, N PQ processes are created by the worker given the load or unload assignment, and the worker acts as the query coordinator. These are standard parallel execution slaves exploiting the Oracle server's parallel execution architecture, thus enabling intra-partition loading and unloading. In a RAC, PQ processes may be created on an instance other than where the Data Pump job was initiated. All other processes described thus far are created on that initial instance.

Data Movement

Two access methods are supported: Direct Path (DP) and External Tables (ET). DP is the faster of the two but does not support intra-partition parallelism. ET does and therefore may be chosen to load or unload a very large table or partition. Each also has certain restrictions requiring the use of the other: For example, a table being loaded with active referential constraints or global indexes must be loaded via ET. A table with a column of data type LONG must be unloaded and loaded with DP. Both methods write to the dumpfile set in a compact, binary stream format that is approximately 15% smaller than original export's data representation.

Metadata Movement

The Metadata API (DBMS_METADATA) is used by worker processes for all metadata unloading and loading. Database object definitions are extracted and written to the dumpfile set as XML documents rather than as SQL DDL as was done by original export. This allows great flexibility to apply XSL-T transformations when creating the DDL at import time. For example, an object's ownership, storage characteristics and tablespace residence can be changed easily during import. Necessarily, XML takes up more dumpfile space than DDL, but the flexibility gained by deferring DDL creation until import is worth the trade-off. Since metadata is typically dwarfed by data in a dumpfile set, the reduction in binary stream size typically produces dumpfile sets the same as or smaller than those produced by original export.

One other small point: The XML representing table metadata is stored within the MT rather than being written immediately to the dumpfile set. This is because it is used for several downstream operations, thus avoiding multiple fetches.

Inter-process Communication

Advanced Queuing (AQ) is used for communicating among the various Data Pump processes. Each job has two queues:

  •  Command and control queue: All processes (except clients) subscribe to this queue. All API commands, work requests and responses, file requests and log messages are processed on this queue.
  •  Status queue: Only shadow processes subscribe to this queue to receive work in progress and error messages queued by the MCP. The MCP is the only writer to this queue.

These queues have names of the form: 'KUPC${C|S}_<job-unique timestamp>

File Management

The file manager is a distributed component: As mentioned earlier, the actual creation of new files and allocation of file segments is handled centrally within the MCP. However, each worker and PQ process make local process requests to the file manager to request space, read a file chunk, write a buffer or update progress statistics. The local file manager determines if the request can be handled locally and if not, forwards it via the command and control queue to the MCP. Reading file chunks and updating file statistics in the master table are handled locally. Writing a buffer is typically handled locally, but may result in a request to the MCP for more file space.

Directory Management

Because all dumpfile set I/O is handled by Oracle background server processes, the operating system persona doing the I/O is “oracle”, not the user running the job. This presents a security dilemma since “oracle” is typically a privileged account. Therefore, all directory specifications are made using Oracle directory objects with read / write grants established by the DBA. For example, the DBA may set up a directory as follows:

Create directory dmpdir1 as '/private1/data/dumps';

Grant read, write on directory dmpdir1 to scott;

Then SCOTT can specify a dumpfile on the expdp command line as:

 expdp scott/tiger dumpfile=dmpdir1:scott.dmp

If the file size is limited for manageability by the FILESIZE parameter, then potentially many dump files can be created. The file manager automatically maintains dumpfile set coherency via a globally unique identifier and other information written into the file headers. An import or SQL file job cannot start until all members of the job's dumpfile set are present in DUMPFILE parameter specifications

Major Features

This section will briefly describe some of the major new features in Data Pump.

Performance

Data Pump based export / import operations are typically much faster than their original exp / imp counterparts. A single thread of Data Pump's direct path data unload is about twice as fast as original exp's direct path unload. A single thread of Data Pump data load is 15X - 45X faster than original imp. And of course, Data Pump operations can be specified with parallel threads of execution2. Also note that parallel threads can be dynamically added and removed to/from running jobs to tailor the job to the changing execution environment.

During export, when there are two or more workers, data and metadata unloading proceed in parallel.

Data Pump will also automatically build each index in parallel up to the parallel degree of the job. The permanent parallel degree of the index remains unchanged from its source value.

I/O bandwidth is most important factor

It is important to make sure there is sufficient I/O bandwidth to handle the number of parallel threads specified; otherwise performance can actually degrade with additional parallel threads. Care should be taken to make sure the dumpfile set is located on spindles other than those holding the instance's data files. Wildcard file support makes it easy to spread the I/O load over multiple spindles. For example, a specification such as:

Dumpfile=dmpdir1:full1%u.dmp,dmpdir2:full2%u.dmp

Dumpfile=dmpdir3:full3%u.dmp,dmpdir4:full4%u.dmp

will create files named full101.dmp,  full201.dmp, full301.dmp, full401.dmp, full102.dmp, full202.dmp, full302.dmp, etc. in a round-robin fashion across the four directories pointed to by the four directory objects.

Initialization Parameters

Essentially no tuning is required to achieve maximum Data Pump performance. Initialization parameters should be sufficient out of the box.

  •  Make sure disk_asynch_io remains TRUE: It has no effect on those platforms whose file systems already support asynchronous I/O, but those that don't are significantly impacted by a value of FALSE.
  •  db_block_checksum's default value is FALSE, but if an integrity issue is being investigated requiring this to be set to TRUE, its impact on data loading and unloading is minimal; less than 5%.

Data Pump's AQ-based communication and the metadata API both require some amount of SGA… make sure shared_pool_size is sufficient.

Both the metadata API during export and the worker process during import may execute some fairly long-running queries that have the potential for exhausting rollback segments. This is mostly an issue in jobs affecting many objects. Make sure these are configured sufficiently large. For example, export / import of a database containing 400,000 objects required two rollback segments, each 750 Mb in size.

Metadata API

Metadata performance is about the same as the original exp and imp utilitities, but much more flexible and extensible. However, since most operations are dominated by data movement, most see a dramatic overall improvement.

Network Mode

Data Pump supports the ability to load one instance directly from another (network import) and unload a remote instance (network export). Rather than using network pipes which are not supported on all platforms, network mode uses DB links.

During network import, the Metadata API executes on the remote node, extracting object definitions and sending them to the local instance for creation where the Data Pump job is executing. Data is fetched and loaded using insert as select statements such as:

Insert into foo (a,b,c,…) select (a,b,c,…) from foo@remote_service_name

These statements incorporate hints on both sides to access the direct path engine for maximum performance.

Network export enables the ability to export read-only databases. Data Pump export cannot run locally on a read-only instance because maintaining the master table, writing messages to queues and creating external tables all require writes on the instance. Network export creates the dumpfile set on the instance where the Data Pump job is running and extracts the metadata and data from the remote instance, just as network import does. Data movement in network export is done exclusively by External Tables since 'create as select@service' style DML statements are required.

With either network mode operation, it is expected that network bandwidth will become the bottleneck. Be careful that the parallel setting does not saturate the network. We discovered that many of our customers would implement a sort of 'network mode' with original exp and imp by exporting into a network pipe and importing out the other end. This overlapped the export and import operations, thus improving elapsed time. Given the dramatic improvement in the performance of file-based Data Pump operations, it is unclear if network mode can provide a significant reduction in elapsed time for instance initialization as it did with original exp and imp.

Restart

All stopped Data Pump jobs can be restarted without loss of data as long as the master table and dumpfile set remain undisturbed while the job is stopped. It doesn't matter if the job was stopped voluntarily with the clients' STOP_JOB command or the stoppage was involuntary due to a crash, power outage, etc. Sufficient context is maintained in the master table to know where to pick up. A client can attach to a stopped job with the ATTACH=<job name> parameter, then start it with the interactive START command.

During import, sometimes an unforeseen, repeating problem with a particular object occurs that prevents further progress. START=SKIP_CURRENT will skip the current object and continue with the next, thus allowing progress to be made.

Fine-grained Object Selection

One could only choose to include or ignore indexes, triggers, grants and constraints with original exp and imp. With various client parameters, a Data Pump job can include or exclude virtually any type of object and any subset of objects within a type.

Exclude

The exclude parameter allows any database object type to be excluded from an export or import operation. The optional name qualifier allows you finer selectivity within each object type specified. For example, the following three lines in a parameter file:

Exclude=function

Exclude=procedure

Exclude=package:”like 'PAYROLL%' “

Would exclude all functions, procedures and packages with names starting with 'PAYROLL' from the job.

Include

The include parameter includes only the specified object types and objects in an operation. For example, if the above three specifications were INCLUDE parameters in a full database export, only functions, procedures and packages with names starting with 'PAYROLL' would be written to the dumpfile set..

Content

The content parameter allows one to request for the current operation just metadata, just data or both. Original exp's 'ROWS=N' parameter was equivalent to content=metadata_only, but there is no equivalent for content=data_only.

Query

The query parameter operates much as it did in original export, but with two significant enhancements:

  1.  It can be qualified with a table name such that it only applies to that table
  2.  It can be used during import as well as export.

Monitoring and Estimates

Another requirement we heard from our customers was to provide better, more detailed monitoring capabilities. In addition to the standard progress and error messages printed by the client and into the log file, the new client interactive command STATUS will show detailed job information including overall percent done, the status of each worker process, the current objects being worked on and the percent done for each one. You can also specify a time interval in seconds for an automatic update of detailed status rather than manually requesting it.

The start of every export job now also includes an estimate phase where the approximate amount of all data to be unloaded is determined. The default method for determining this is to estimate the size of a partition by counting the number of blocks currently allocated to it. If tables have been analyzed, statistics can also be used which should provide a more accurate estimate. This serves two purposes: 1) You get an idea of how much dumpfile space will be consumed. 2) All the information needed to start unloading tables is retrieved and ordered by size descending. This allows the MCP to schedule the unloading of metadata and data in parallel. The objects retrieved during this estimate phase are called table data objects with each representing a partition or the entire table if the table is unpartitioned.

Since 0-N clients may be attached to a running job, one may start a job at work, detach from it, go home, re-attach and monitor it throughout the evening.

New Clients: expdp and impdp

Although the new export and import clients retain a similar look and feel to the original exp and imp clients, 100% parameter compatibility was not a goal since there are so many inconsistencies in the original clients' parameters. Where a concept makes sense for both an export and import operation, we made sure the parameter is the same for both. The new clients also support far greater capabilities:

Interactive command mode - Typing control-C (^C) will invoke the interactive command mode and an 'export>' or 'import>' prompt will appear. From this prompt, you can request help, detailed job status, change monitoring parameters, dynamically add files (including wildcard specifications) to the job's dumpfile set, stop the job leaving it restartable, kill the job leaving it not restartable, change the degree of parallelism for the job, return to logging mode to continue receiving progress messages or exit the client and leave the job running.

All modes of operation are supported: Full, schema, table, tablespace and transportable tablespace. Data Pump and its new expdp and impdp clients are a complete superset of original exp/imp's functionality3.

Flashback is supported for exports and imports as of a certain time.

WHERE clause predicates may be applied to individual tables by both expdp and impdp.

Privileged users (those with either the EXP_FULL_DATABASE or IMP_FULL_DATABASE roles) may attach to and control jobs initiated by other users even if the job is stopped.

Other Useful Features

DDL Transformations - Because object metadata is stored as XML in the dumpfile set, it is easy to apply transformations when DDL is being formed (via XSL-T) during import. Impdp supports several transformations: REMAP_SCHEMA provides the old 'FROMUSER / TOUSER' capability to change object ownership. REMAP_TABLESPACE allows objects to be moved from one tablespace to another. This changes the tablespace definition as well. REMAP_DATAFILE is useful when moving databases across platforms that have different file system semantics. One can also specify via the TRANSFORM parameter that storage clauses should not be generated in the DDL. This is useful if the storage characteristics of the target instance are very different from those of the source.

SQL file - Impdp can also perform a SQL file operation. Rather than creating database objects, this merely writes the equivalent DDL to a file in the form of a SQL script almost ready for execution. Only the embedded 'connect' statements are commented out.

TABLE_EXISTS_ACTION - Original imp would allow rows to be appended to pre-existing tables if IGNORE=Y was specified. Impdp's TABLE_EXISTS_ACTION parameter provides four options:

  1.  SKIP is the default: A table is skipped if it is found to exist.
  2.  APPEND will append rows if the target table's geometry is compatible.
  3.  TRUNCATE will truncate the table, then load rows from the source if: 1) The geometries are compatible    2) A truncate is possible; for example, it is not possible if the table is the target of referential constraints.
  4.  REPLACE will drop the existing table then create and load it from the source.

CONTENT - This new parameter, applicable to both clients allows the movement of DATA_ONLY, METADATA_ONLY or BOTH (the default).

VERSION - expdp supports the VERSION parameter which tells the server-based Data Pump to generate a dumpfile set compatible with the specified version. This will be the means to perform downgrades in the future. There is no need (as there was with original exp) to run older versioned Data Pump clients.

Enterprise Manager supports a fully functional interface to Data Pump.

Data Pump Views - Data Pump maintains a number of user and DBA accessible views to monitor the progress of jobs:

DBA_DATAPUMP_JOBS: This view shows a summary of all active Data Pump jobs on the system.

USER_DATAPUMP_JOBS: This view shows a summary of the current user's active Data Pump jobs.

DBA_DATAPUMP_SESSIONS: This shows all sessions currently attached to Data Pump jobs.

V$SESSION_LONGOPS: A row is maintained in the view showing progress on each active Data Pump job. The OPNAME column displays the Data Pump job name.

 

Original exp and imp

Both original export and import will ship with Oracle Database 10g:

  •  Original imp will be supported forever and will provide the means to import dumpfiles from earlier releases: V5 through V9i. Original and Data Pump-based dumpfiles are not compatible: Neither client can read the other's dump files.
  •  Original exp will ship and be supported in Oracle Database 10g to provide at least 9i functionality, but will eventually be deprecated. Data Pump-based export will be the sole supported means of export moving forward. New features in Oracle Database 10g are not supported in original exp. The Oracle 9i version of exp may be used with version 10g for downgrade purposes. Beyond version 10g, expdp's VERSION parameter should be used for downgrade.

Differences Between Data Pump and Original exp/imp

This section highlights some of the main differences you will notice when first running Data Pump-based export and import:

Data Pump is designed for big jobs with lots of data. This has a number of implications:

  •  Startup time is longer. All of the process and communication infrastructure must be initialized before a job can get underway. This could take about 10 seconds. Also, export start time includes retrieval and ordering of all table data objects so the MCP can start immediately scheduling table unloads.
  •  Data Pump export must write the master table to the dumpfile set and the end of a job. Data Pump import must locate and load the master table, then build its indexes. This should also be about 10 seconds, but if the master table is very large, the index builds at import start time could take longer. Direct path is used to unload and load the MT.
  •  Importing a subset of a dumpfile set deletes non-pertinent rows from the master table: If the subset is very small compared to the export set, this can take a noticeable amount of time to perform the required deletes.
  •  Performance of metadata extraction and creation is about on-par with original exp / imp: It's very difficult to make DDL go faster. The greatest performance improvement in Data Pump occurs when unloading and loading data.
  •  XML metadata in dumpfiles is about 7X bigger than exp's DDL and Data Pump's data stream format is about 15% smaller than exp's row/column format. Standalone compression tools like gzip work quite nicely on dumpfiles dominated by metadata.
  •  Data Pump is as resource-intense as you wish: Reducing elapsed time per-job is first and foremost in the design. Data Pump will consume as much CPU, memory, I/O bandwidth and network bandwidth (in network mode) as your setting of PARALLEL will allow.

The user running Data Pump must have sufficient tablespace quota to create the master table.

Be cognizant of rollback configuration in jobs containing many objects.

The progress messages displayed by the clients are different than those displayed by the original exp/imp clients, but still reflect the current object type. Also note that 'Already exists' errors are flagged as such and included in the total errors count issued at the end of the job.

A log file is generated by default with a name 'export.log' or 'import.log'. This can be over-ridden with the LOGFILE parameter. Original exp and imp do not generate a log file by default.

Conclusion

Data Pump is a new, callable facility in the Oracle 10g database that provides very high speed loading and unloading of data and metadata. New export and import clients, expdp and impdp that fully exploit Data Pump's infrastructure are also provided in this release. They are implemented as complete supersets of original exp / imp functionality and will eventually replace them.