Introduction

This article presents an overview of how to integrate Big Data Preparation Cloud Service (BDP) with Business Intelligence Cloud Service (BICS). BDP is a big data cloud service designed for customers interested on cleansing, enriching, and transforming their structured and unstructured business data. BICS is a business intelligence cloud service designed for customers interested on gaining insights into their data with interactive visualizations, data model designs, reports and dashboards. BDP and BICS are both cloud services under the Oracle Platform as a Service (PaaS).

Users can upload data into BICS using various tools and technologies such as Data Sync, Oracle Application Express, PL/SQL, BICS REST APIs, Oracle Data Integrator, among other tools. The BICS REST APIs allows users to programmatically load large volumes of data from both on-premise and cloud data sources into the cloud database service connected to BICS.

In BDP, users can define BICS as a target, and publish the results of a BDP transform script into the cloud database service connected to BICS. BDP uses the BICS REST APIs to accomplish the integration with BICS. BDP users do not need to write REST APIs programs or learn how to use the BICS REST APIs – the only requirement is to create a BDP connection to access the cloud database service of BICS. BDP transform scripts can be executed as BDP policies, and the results can be published directly into BICS.

The next sections of this article demonstrate how to create, execute, and publish the results of a BDP transform script into BICS.

Integrating Big Data Preparation (BDP) Cloud Service with Business Intelligence Cloud Service (BICS)

Figure 1, below, illustrates the BDP main dashboard, which includes a list of metrics such as the number of executed jobs, total configured sources, and the number of rows processed and transformed by the BDP cloud service. A Quick Start option is also available for easy-access when creating sources, transform scripts, and uploading data into BDP.

Figure 1 – BDP Dashboard & Overview Page

The next sections of this article discuss the following concepts:

How to create a connection in BDP to access BICS. In BDP, a connection is known as a BDP Source, and the BDP Source can be used as either a source or target connection. In this article, a BDP Source will be created and used as a target connection to publish into BICS the data results of running a BDP policy.
How to create a BDP Transform Script that uses two source datasets. In BDP, this is known as blending two source datasets or files.
How to create and execute a BDP Policy. BDP policies are used in BDP to configure the executions of BDP transform scripts. In this article, a BDP policy will be created to run a BDP transform script and publish the data results into BICS.
How to view and use the published BDP dataset in the BICS data model and the BICS Visual Analyzer.

The first step to integrate BDP with BICS is to create a BDP Source connection. Use the Quick Start menu to create a new BDP Source, and select CREATE SOURCE. Figure 2, below, illustrates how to create this BDP Source connection. Enter the name of the BDP Source, and select Oracle BICS as the connection type.

Figure 2 – Creating a BDP Source – Oracle BICS

Enter the Service URL, Username, Password, and Domain of the BICS cloud service as shown on Figure 3, below. The Service URL is the Service Instance URL of the BICS cloud service.

Figure 3 – Creating a BDP Source – Oracle BICS Definition

Test the new BDP Source using the Test option as shown on Figure 3, above. Save the new BDP Source once the test connection is successful.

The next step is to create a BDP transform script. Use the Quick Start menu to create a new BDP transform script, and select CREATE TRANSFORM. Figure 4, below, illustrates an example of a new transform script called A_TEAM_CUSTOMERS.

Figure 4 – Creating a BDP Transform – Customer Accounts

On Figure 4, above, the BDP Source for this transform script is called A_TEAM_STORAGE. This BDP Source has been previously defined by a BDP user. The source type for this BDP Source is Oracle Cloud Storage. A structured XSL file called ATeamCustomerAccounts.xls has been previously imported into this BDP Source. This XLS file is used as the source dataset for this new BDP transform script.

Once the new BDP transform script is defined, it is then submitted to the BDP engine for data ingestion, data profiling, data de-duplication, and detection of sensitive data. The BDP engine displays a notification on screen when the BDP transform script is ready for transformations.

Figure 5, below, shows the transform script called A_TEAM_CUSTOMERS after the BDP ingesting process. A series of transformations have been added by the BDP user. These transformations are illustrated on the Transform Script section as follow:

Columns Col_0001, Col_0005, and date_02 have been renamed – respectively – to cust_num, middle_initial, and exp_dt.
Column City has been enriched with a new data element (column) called Population.
The email domain has been extracted from column email.
Columns Col_0013, Col_0014, and Col_0019 have been removed from the transform script.
Columns us_phone and exp_dt have been reformatted to (999) 999-9999 and MM-dd-yyyy, respectively.
Columns credit_card and us_ssn (social security number) have been obfuscated to the first 12 and 5 digits, respectively. BDP has detected that these two columns contain sensitive data.

Figure 5 – Creating a BDP Transform – Transform Script

The BDP section called Recommendations, on Figure 5, above, can be used to enrich the transform script with additional data elements – a feature that is part of the BDP cloud service. For instance, the transformation script can be enriched with new columns such as country_name, capital, continent, and square_km, among others.

The Column Profile section, on Figure 5, above, provides a set of metrics for each of the columns found on the source file. In this example, the column called first_name has been profiled as follow:

A total of 15,000 rows where found on the source file.
A total of 2,101 distinct values, 14.01% of total rows, were found on this column.
A total of 12,899 duplicate names, 85.99% of total rows, were found on this column.
A total of 10 distinct patterns were found on this column.
The type for this column is TEXT.
The bubble graph illustrates the most common data values for this column: Mich, Robert, Mary, James, and John are the most common first names.

In BDP, users can join an existing dataset of a transform script with one additional file – this feature is known as Blending. To blend the existing dataset of a transform script with another file, select the Blend option located on the Transform Script page. Figure 6, below, shows how to add a file.

Figure 6 – Creating a BDP Transform – Adding File to Blend

In this example, on Figure 6, above, a json file called ATeamCustomerTransactionsLog.json will be blended with an existing dataset of a transform script. This json file contains customer transaction logs, which will be used to enrich and add additional data columns to the BDP transform script.

Once the additional file is added to the BDP transform script, the BDP engine analyzes the new file, and recommends a set of blending conditions. Figure 7, below, shows the recommended blending conditions or blending keys for these two datasets.

Figure 7 – Creating a BDP Transform – Blending Configuration

In this example, on Figure 7, above, the BDP engine has recommended to use the cust_num column, which has been found on both datasets, as the blending condition. BDP has an underlying discovery engine that provides blending key recommendations based on data profiling. When blending two datasets, BDP users can choose one of three types of output options:

Rows matching both datasets – All rows on both datasets must match the blending condition. Those rows that do not match the blending condition – on either dataset – will be removed from the blended dataset.
Left Join – All rows from the first dataset will be included on the blended dataset even if the rows from the first dataset do not meet the blending condition.
Right Join – All rows from the second dataset will be included in the blended dataset even if the rows from the second dataset do not meet the blending condition.

Once the blending configuration is ready for submission, the BDP engine performs the blending operation. A message will be displayed on screen when the blending operation is complete and the transform script is ready for additional modifications. The BDP transform script will show a combined set of columns from both the first dataset and the second dataset. BDP users will be able to perform additional transformations or accept additional recommendations on this blending BDP transform script.

In order to execute a BDP transform script, BDP users must create a BDP Policy. Figure 8, below, shows the configuration of a BDP Policy.

Figure 8 – Creating a BDP Policy – Policy Details

When configuring a BDP Policy, as shown on Figure 8, above, BPD users must specify the following parameters:

Name of the BDP Policy.
Name of the BDP transform script.
If the transform script is a blending script, two source datasets are required: Source 1 and Source 2.
Name of the Target output. In this example, the target output is BICS. This is the BDP BICS Source created on a previous section of this article.
The scheduling information such as Time, Start Date, and End Date are required parameters as well.

The BDP Policy can be executed by selecting the Run option from the Policies screen, as shown on Figure 9, below:

Figure 9 – Creating a BDP Policy – Running Policy

Once the BDP Policy is submitted for execution, BDP users can monitor the progress of its execution using the BDP Job Details screen. Figure 10, below, shows an example.

Figure 10 – Running a BDP Policy – Job Details

The Job Details screen, on Figure 10, above, shows the job Id and policy name: 4798122, and A_TEAM_CUSTOMERS, respectively. The status of the execution of this policy is succeeded – the policy has been executed successfully. The Metrics section shows a total of 15K rows – this is the number of rows that met the blending condition. A total of 15K rows were transformed, and no errors were found during the execution of the policy. Additional execution metrics can be found under sections: Ingest Metrics, Prepare Metrics, and Publish Status.

When BDP executes a policy that uses BICS as a target, the BICS RESTful APIs are invoked, and the result-set of the BDP policy gets published into BICS. BDP uses the name of the BDP policy to create a table in the database that is connected to BICS.

Figure 11, below, illustrates the name of the table, A_TEAM_CUSTOMERS, created by BDP during the execution of the BDP policy.

Figure 11 – Integrating BICS with BDP – Inspecting the BDP Data

The newly published table can be seen on the BICS data model module, as shown on Figure 11, above. In this example, on Figure 11, above, some of the dataset columns are illustrated:

The credit card number (CREDIT_CARD) and the social security (US_SSN) have been obfuscated.
The US phone (US_PHONE) and the credit card expiration date (EXP_DT) has been reformatted.

In BICS, this new dataset can be used to create warehouse facts and dimensions. Furthermore, BICS users can expand the use of this dataset to other BICS features such as BICS Visual Analyzer. Figure 12, below, shows an example of how this dataset, A_TEAM_CUSTOMERS, is used in Visual Analyzer:

Figure 12 – Integrating BICS with BDP – Creating a Project in BICS Visual Analyzer

In this example, on Figure 12, above, a set of metrics, the A-Team Metrics, have been created on BICS Visual Analyzer. A new tile chart, Revenue Amount By State, has been created as well. This tile chart uses an aggregated value, REVENUE_AMT, to sum revenue by state. The source of the REVENUE_AMT column comes from the blended dataset – a source column from the blend file, ATeamCustomerTransactionsLog.json, the customer transaction log file. The source of the state column comes from the blended dataset as well – a source column from the first dataset, ATeamCustomerAccounts.xls – the customer accounts file.

Conclusion

BDP users can publish the data produced by BDP policies directly into BICS without writing additional programs or having to learn BICS RESTful APIs. In BICS, the data results of an executed BDP policy can be modeled as facts and dimensions. BICS users can then create dashboards and reports with data that has been transformed by BDP.

For more ODI best practices, tips, tricks, and guidance that the A-Team members gain from real-world experiences working with customers and partners, visit “Oracle A-Team Chronicles for ODI”.