SpyroSoft
An Oracle11g Database Migration to PostgreSQL in Practice
Łukasz Macuga
October 11, 2018

Recently, at our client's request, we had the opportunity to perform an Oracle 11g database migration to PostgreSQL within the infrastructure of one of our existing production environments.

Because of a very strict security policy on the client's side, we were not allowed to access the database servers directly via SSH or create any dumps for our own use. Therefore, we decided to use a handy tool, Ora2Pg, to do the heavy lifting for us. This software was developed by Gilles Darold and is distributed as an open source project. Once the environment was set up, Ora2Pg allowed us to migrate a production database of roughly 15 GB in size in about 30 minutes.

In this article, we will go through the steps required to accomplish this task.

Installation of dependencies

To set up the environment, I recommend using a UNIX-based Docker container (Debian works well, or try Alpine if you prefer a lightweight distribution), as the setup requires installing a few additional tools and defining several environment variables. With such a prepared image, we can easily reuse the setup multiple times in different environments.
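
A minimal sketch of how such a reusable image can be prepared with plain Docker commands (the container and image names below are just examples):

# start from a plain Debian base image and keep the container for provisioning
docker run -it --name ora2pg-build debian:stretch /bin/bash
# ...run the installation steps described below inside the container, then snapshot it:
docker commit ora2pg-build ora2pg-env:latest
# the resulting image can be reused in any environment
docker run -it ora2pg-env:latest /bin/bash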

First, we need to install the Oracle client libraries:

apt-get install unzip
apt-get install alien
apt-get install libaio1
apt-get install unixodbc
unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip
cd Disk1
alien --scripts -d oracle-xe-11.2.0-1.0.x86_64.rpm
dpkg -i oracle-xe_11.2.0-2_amd64.deb

In order to set up the environment variables, paste the following lines into /etc/bash.bashrc:

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/xe
export ORACLE_SID=XE
export NLS_LANG=`$ORACLE_HOME/bin/nls_lang.sh`
export ORACLE_BASE=/u01/app/oracle
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
export PATH=$ORACLE_HOME/bin:$PATH

then execute

source /etc/bash.bashrc

and

echo $ORACLE_HOME

to ensure that the variable is set properly.

As Ora2Pg is written in Perl, we need to install Perl along with the modules that provide the database driver interfaces:

apt-get install perl

then enter the CPAN shell:

perl -MCPAN -e shell

and execute the following commands:

# inside the CPAN shell: fetch and unpack the DBD::Oracle distribution
get DBD::Oracle
quit
# back in the system shell: point the build at the Oracle client libraries and compile
cd ~/.cpan/build/DBD-Oracle*
export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client64/lib
export ORACLE_HOME=/usr/lib/oracle/11.2/client64/
perl Makefile.PL
make
make install

Our goal is to transfer the data directly into the Postgres database without any intermediate storage. For that we need to install the Postgres client libraries and the Perl driver:

apt-get install postgresql-9.6 libpq-dev
perl -MCPAN -e 'install DBD::Pg'
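
Before moving on, it is worth checking that both Perl database drivers load correctly; these one-liners simply print the installed module versions:

perl -MDBD::Oracle -e 'print $DBD::Oracle::VERSION, "\n"'
perl -MDBD::Pg -e 'print $DBD::Pg::VERSION, "\n"'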

Now that the environment is ready, we can install Ora2Pg. Download the latest source code (at the time of writing, version 18.2).

To compile and install the tool, go to the directory with the downloaded archive and execute:

tar xzf ora2pg-18.2.tar.gz
cd ora2pg-18.2/
perl Makefile.PL
make && make install
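
If everything compiled correctly, the ora2pg command should now be available on your PATH; a quick check:

ora2pg --version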

Ora2Pg configuration

Now we can proceed to the Ora2Pg configuration. To initialize a new project, we must execute the following command:

ora2pg --project_base <baseDir>/ora2pg --init_project migration
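
This should create a project skeleton roughly like the one below (the exact layout may vary slightly between Ora2Pg versions):

<baseDir>/ora2pg/migration/
    config/ora2pg.conf
    schema/
    sources/
    data/
    reports/
    export_schema.sh
    import_all.sh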

Under the config directory we can find a default ora2pg.conf file, in which some parameters need to be adjusted. Each directive is described in detail in the default config file itself. For our use case we need to set the following ones:

#Update Oracle DB credentials
ORACLE_DSN dbi:Oracle:host=mydb.mydom.fr;sid=SIDNAME;port=1521
ORACLE_USER    system
ORACLE_PWD manager

#Enable more detailed log to stderr
DEBUG     1

# Export Oracle schema to PostgreSQL schema
EXPORT_SCHEMA  1

# Oracle schema/owner to use
SCHEMA    USER_NAME

#Postgres schema name
PG_SCHEMA  pg_schema

# Type of export. Values can be the following keyword:
#  TABLE     Export tables, constraints, indexes, ...
#  SEQUENCE   Export sequences
#  VIEW      Export views
# other possible values available in documentation
TYPE      TABLE,SEQUENCE,VIEW

# Add a TRUNCATE TABLE instruction before loading data on COPY and INSERT
# export. When activated, the instruction will be added only if there's no
# global DELETE clause or one specific to the current table.
TRUNCATE_TABLE 0

# It will drop all foreign keys before all data import and recreate them at
# the end of the import.
DROP_FKEY  1

# Disables alter of sequences on all tables in COPY or INSERT mode.
# Disable update of sequence during data migration.
DISABLE_SEQUENCE   0

# Disables triggers on all tables in COPY or INSERT mode.
DISABLE_TRIGGERS 0

# Postgres DB credentials
PG_DSN    dbi:Pg:dbname=test_db;host=localhost;port=5432
PG_USER        test
PG_PWD    test


# Comment out, as we want to send all data directly to PG
#OUTPUT       output.sql

# Creates one file per constraints, indexes, fkeys and functions
FILE_PER_CONSTRAINT    0
FILE_PER_INDEX    0
FILE_PER_FKEYS    0
FILE_PER_FUNCTION  0

# Extract data by bulk of DATA_LIMIT tuples at once. Default 10000. If you set
# a high value be sure to have enough memory if you have millions of rows.
DATA_LIMIT 40
# Use this directive to set the database handle's 'LongReadLen' attribute to
# a value that will be the larger than the expected size of the LOB. The default
# is 1MB which may not be enough to extract BLOB objects. If the size of the LOB
# exceeds the 'LongReadLen' DBD::Oracle will return a 'ORA-24345: A Truncation'
# error.
LONGREADLEN    20023776


# Disable this if you don't want to load full content of BLOB and CLOB and use
# LOB locators instead.
NO_LOB_LOCATOR  1

# Enable PLSQL to PLPGSQL conversion.
PLSQL_PGSQL    1

If you increase the value of LONGREADLEN, keep in mind that DATA_LIMIT usually needs to be reduced accordingly. Even if you only have 1 MB BLOBs and LONGREADLEN is set to 1 MB, trying to read 5,000 of them at once will require around 5 GB of memory. This needs to be tuned to the resources available on your machine. Also keep in mind that reducing DATA_LIMIT to a very low value will slow the migration down, as data will be pulled in smaller batches.

In our case, most of the database volume consisted of medium-sized BLOBs, which is why we had to increase LONGREADLEN and decrease DATA_LIMIT. This brought the migration speed down to roughly 15 GB in 30 minutes. If you don't have large BLOBs, the transfer can be sped up to as much as 15 GB in 5 minutes.
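
Before running the full export, it is worth verifying that Ora2Pg can actually reach the Oracle instance with the configured credentials; one simple check is to ask it for the remote database version:

ora2pg -t SHOW_VERSION -c config/ora2pg.conf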

Migration execution

At this point we are finally ready to perform the migration. To export the schema, we need to execute the following command in our project directory:

./export_schema.sh

And to import everything into PostgreSQL:

./import_all.sh -h <host_name> -U <user> -d <database_name> -o <schema> -x

The -x option at the end disables indexes and constraints for the duration of the data transfer and re-enables them once the transfer is complete.
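
For example, with the PostgreSQL connection values used in the configuration above, the call would look roughly like this:

./import_all.sh -h localhost -U test -d test_db -o pg_schema -x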

To verify that all data and objects are present in the same numbers in both databases, we can run the following command:

ora2pg -t TEST --count_rows -c config/ora2pg.conf > migration_diff.txt

Autogenerated reports about our migration can be found under the reports/ directory of the project.
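
Independently of those, Ora2Pg can also produce a migration assessment report on demand; with the same configuration file, something along these lines generates it:

ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf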

Bonus: Streaming binary data with Postgres and Java

Our application relies heavily on streaming. Postgres provides two ways of storing binary data: BYTEA and large objects. BYTEA is a byte array stored directly in our table, but unfortunately Postgres does not provide streaming capabilities for it.

In our case, we need to make use of LargeObjects (please note that this type is generally recommended for data larger than 1 GB, but if you need streaming there are not many alternatives).

If we decide to use LargeObjects, only an OID value (which points to the internal table holding the raw binary data) will be stored in our column.

Here's a sample script used to migrate our existing BYTEA column to an OID column:

alter table test_table add column lo_data oid;

do $$
declare
  loid oid;
  lfd integer;
  lsize integer;
  d test_table;
begin
  for d IN (select * from test_table) loop
    loid := lo_create(0);              -- 0 lets the server assign a new OID
    lfd := lo_open(loid, 131072);      -- 131072 = INV_WRITE
    lsize := lowrite(lfd, d.binary_data);
    perform lo_close(lfd);
    update test_table set lo_data = loid where id = d.id;
  end loop;
end;
$$;

alter table test_table alter column lo_data set not null;
alter table test_table drop column binary_data;
alter table test_table rename lo_data to binary_data;
commit;

There are multiple ways to use the large object API in a Java application (assuming that the project is built with Spring and Hibernate):

  1. Extending PostgreSQL9Dialect in order to properly map the OID stored in the column to whatever data type we need
  2. Modifying our entity so that it holds the OID value as a long, and retrieving the LargeObject on demand
  3. Modifying our entity so that it holds a LargeObject field, mapped via a Hibernate AttributeConverter

Let's assume that a tight coupling between our entity and LargeObject does not bother us, and see how the last option can be implemented.

In our entity we must create a field like the following one:

@Column(name = "BINARY_DATA")
private LargeObject binaryData;

Please note that LargeObjects are lightweight and lazy by default. The AttributeConverter below maps this field to the OID value stored in the column and back:

@Converter(autoApply = true)
@Component
public class LargeObjectToLongConverter implements AttributeConverter<LargeObject, Long> {

    private static LargeObjectConversion largeObjectConversionProvider;

    @Autowired
    public void setDataBaseConnectionProvider(LargeObjectConversion dataBaseConnectionProvider) {
        largeObjectConversionProvider = dataBaseConnectionProvider;
    }

    @Override
    public Long convertToDatabaseColumn(LargeObject value) {
        if (value == null) {
            return null;
        }
        return value.getLongOID();
    }

    @Override
    public LargeObject convertToEntityAttribute(Long value) {
        try {
            return largeObjectConversionProvider.getLargeObjectForOID(value);
        } catch (SQLException e) {
            throw new LargeObjectConversionException(e);
        }
    }
}

The converter delegates to a Spring component that opens LargeObjects through the JDBC driver's LargeObjectManager:

@Component
public class LargeObjectConversionProvider implements LargeObjectConversion {

    @Autowired
    private DataSource dataSource;

    public LargeObject createLargeObject() throws SQLException {
        PgConnection pgConnection = DataSourceUtils.getConnection(dataSource).unwrap(PgConnection.class);
        long oid = pgConnection.getLargeObjectAPI().createLO();
        return pgConnection.getLargeObjectAPI().open(oid, LargeObjectManager.READWRITE);
    }

    public LargeObject createLargeObject(InputStream inputStream) throws SQLException, IOException {
        PgConnection pgConnection = DataSourceUtils.getConnection(dataSource).unwrap(PgConnection.class);
        long oid = pgConnection.getLargeObjectAPI().createLO();
        LargeObject largeObject = pgConnection.getLargeObjectAPI().open(oid, LargeObjectManager.READWRITE);
        try (OutputStream dbOut = largeObject.getOutputStream()) {
            StreamUtils.copy(inputStream, dbOut, 4096);
        }

        return largeObject;
    }

    public LargeObject getLargeObjectForOID(long oid) throws SQLException {
        PgConnection pgConnection = DataSourceUtils.getConnection(dataSource).unwrap(PgConnection.class);
        return pgConnection.getLargeObjectAPI().open(oid, LargeObjectManager.READWRITE);
    }
}

To create a new LargeObject and write to it:

try (OutputStream out = largeObject.getOutputStream()) {
  //write something to OutputStream
}

To read from a LargeObject:

largeObject.getInputStream();

Some additional useful utilities can be found in the LargeObjectManager class provided by the JDBC driver.