Question:
How can we say that the Union transformation is an active transformation? Can you please give me an example?
A) A transformation is active when it can change the number of rows passing through it or change the row numbers (positions) of those rows. The Union transformation merges rows coming from multiple input pipelines into one output, so the row numbers of the incoming rows change; that is why it is classified as active.
The simplest case of a row-number change is when the number of rows coming in and going out is different.
For example, in the case of a Filter we have data like:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and a filter condition like dept = 4, so the output would be:
id name dept row_num
1 aa 4 1
3 cc 4 2
The row numbers changed, so it is an active transformation.
Question:
Can we add different workflows into one batch and run them sequentially? If possible, how do we do that?
To simulate a batch we can create a Unix script and write pmcmd commands in it that run the different workflows one after the other, like this:
workflowrun.sh (example for Informatica 8):
pmcmd startworkflow -sv IS_INTEGRATION_SERVICE -d DOMAIN1 -u myuser -p mypass -f folder1 -wait workflow1
pmcmd startworkflow -sv IS_INTEGRATION_SERVICE -d DOMAIN1 -u myuser -p mypass -f folder1 -wait workflow2
Save this file and run it. The -wait option makes pmcmd return only after the workflow completes, so workflow1 finishes before workflow2 is triggered.
Question:
How can you create an index on a particular column? How do you choose which column is the best one to index?
In DB2 it would be:
CREATE INDEX idx_1 ON my_table (abc);
(the column's data type is defined on the table, not in the index). Columns that are referenced in the WHERE clause of frequently executed queries are the best candidates for an index.
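A small sketch of the idea, using hypothetical table and column names (emp, dept_id): the column used in the WHERE clause of frequent queries gets the index.

CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER, name VARCHAR(50));

-- dept_id is referenced in the WHERE clause of frequently run queries,
-- so it is a good candidate for an index
CREATE INDEX idx_emp_dept ON emp (dept_id);

-- this query can now use the index instead of a full table scan
SELECT emp_id, name FROM emp WHERE dept_id = 4;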
Question:
I have a flat file containing 'n' records. I need to load half of the records into one target table and the other half into another target table. Explain the procedure.
There will be 2 pipelines.
In the first pipeline, read from the source file and add an Expression transformation; in the Expression take a variable port and increment it by 1 for every row (v = v + 1), then load a staging target T0 that stores this generated sequence in column c1.
After the first pipeline has executed we have:
a) the count of all the rows in the file (the final value of v), and
b) a sequence number for every record in T0, column c1.
In the second pipeline, take T0 as the source, T1 and T2 as targets, and a Router R1 in between.
In R1, create 2 groups:
1st group: c1 <= v/2 - route to T1
2nd group: c1 > v/2 - route to T2
Alternatively, use a variable port or a Sequence Generator to assign a sequence number to each record, then use 2 Filters (or 1 Router) for the two pipelines with the following conditions:
Pipeline 1: mod(sequence_number, 2) = 1
Pipeline 2: mod(sequence_number, 2) = 0
This sends the odd-numbered rows to one target and the even-numbered rows to the other, so each target receives half of the records.
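The same odd/even split can be sketched in SQL for a relational source (hypothetical table src_records with an id column); a window function such as ROW_NUMBER provides the sequence number that the Sequence Generator produces in the mapping:

-- rows with an odd sequence number go to the first target
SELECT id, name
FROM (
    SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM   src_records
) t
WHERE MOD(rn, 2) = 1;

-- rows with an even sequence number go to the other target: MOD(rn, 2) = 0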
Question:
If
I get a flat file as source, I need to get distinct data in the target. How do
I accomplish this? Please give a detailed answer of what all transformations we
have to use and how the data flow will be, and what SQL statement we have to
use.
When we have a flat file as a source, we can use a Sorter transformation to load distinct rows into the target. The Sorter transformation has a Distinct option, which eliminates the duplicate rows, and this works for flat file sources.
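For comparison, if the source were relational the same result would come from a simple SELECT DISTINCT (for example in a Source Qualifier override); the Sorter's Distinct option is the flat-file equivalent. A sketch with hypothetical column names:

SELECT DISTINCT id, name, dept
FROM   src_table;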
Question:
What
is Dimension table Exactly?
Dimension tables give descriptive context about something.
For example, if we take Student as a dimension table, it has attributes like college name, age, gender, etc., which describe a student.
Question:
What
are load types in Informatica and what is delta load?
There are two types of load i) Normal Load ii) Bulk
Load
Normal Load:
The Integration Service writes to the database log and then the data enters the target.
a) Loading performance is lower, but session recovery is possible.
b) Rollback and commit are possible.
Bulk Load:
The Integration Service bypasses the database log and loads directly into the target.
a) Performance increases, but session recovery is not possible.
b) Rollback and commit are also not possible.
In bulk loading we need to consider the following:
1) Do not create primary and foreign keys at the database level (they can still exist in the target definition).
2) Drop indexes before loading into the target and recreate them after loading.
3) Enable or disable the parallel mode option as appropriate.
A delta load (incremental load) means loading only the records that are new or changed since the previous run, rather than reloading the full data set.
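Point 2 is typically implemented as pre- and post-session SQL on the session; a minimal sketch with hypothetical target and index names:

-- pre-session SQL: drop the index so the bulk load is not slowed down
DROP INDEX idx_tgt_cust;

-- post-session SQL: rebuild the index once the load has completed
CREATE INDEX idx_tgt_cust ON tgt_customer (cust_id);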
Question:
Where
to store Informatica rejected data? How to extract the Informatica rejected data?
Rejected rows (for example, rows that violate a unique key constraint) are written by the session into $PMBadFileDir (the default path is <INFA_HOME>/PowerCenter/server/infa_shared/BadFiles), which is configured at the Integration Service level. Every target has a "Reject filename" property that specifies the file in which the rejected rows are stored.
Question:
This
is a scenario in which the source has 2 cols
10 A
10 A
20 C
30 D
40 E
20 C
And there should be 2 targets one to show the duplicate
values and another target for distinct rows.
T1 (duplicate values): 10 A, 20 C
T2 (distinct rows): 10 A, 20 C, 30 D, 40 E
Which transformation can be used to load data into
target?
Ans:
Source -> Source Qualifier -> Expression -> Router -> Targets
Source Qualifier: sort the source data on the key column (e.g. check sorted ports / sort by empno).
Expression transformation ports, in this order:
v_flag (variable) = IIF(col1 = v_prev_col1, 'Y', 'N')
v_prev_col1 (variable) = col1
o_flag (output) = v_flag
Router groups:
1. Duplicate records: condition o_flag = 'Y' -> route to T1
2. Distinct records: condition o_flag = 'N' -> route to T2
Because the data is sorted, the first occurrence of each value is flagged 'N' (distinct) and every repeated occurrence is flagged 'Y' (duplicate).
Alternative approach:
1. Target T2 (distinct rows): from the Source Qualifier, link to a Sorter transformation with the Distinct option checked, then to T2.
2. Target T1 (duplicate values only): Source Qualifier -> Aggregator -> Filter -> T1. In the Aggregator, group by all the columns and add a port that counts the rows in each group; in the Filter, keep only the groups whose count is greater than 1.
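For a relational source the same split could be expressed in SQL; a minimal sketch, assuming a hypothetical table src with columns col1 and col2:

-- distinct rows (equivalent of the Sorter with Distinct checked)
SELECT DISTINCT col1, col2
FROM   src;

-- values occurring more than once (equivalent of the Aggregator + Filter)
SELECT col1, col2
FROM   src
GROUP  BY col1, col2
HAVING COUNT(*) > 1;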
Question:
What
is the need and when do we use a dynamic lookup cache?
When you want to insert new records or update existing data in the target, i.e. for slowly changing dimensions (e.g. SCD Type 2).
Using a dynamic cache in the Lookup transformation is also a performance-tuning technique: if your mapping has a lookup on the target table, you can implement a dynamic cache.
Question:
How can I do incremental aggregation in real time?
For incremental aggregation we need an Aggregator, a Lookup on the target (aggregate) table, and an Expression that sums the count obtained from the new aggregation with the count returned by the lookup.
For a record already present in the aggregate table, its existing count is available through the lookup; the count for the newly arrived rows comes from the Aggregator. Add the two together and update that record in the target.
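The update step can be pictured as a SQL MERGE against the aggregate target; a sketch with hypothetical tables sales_agg (existing aggregates) and sales_new (only the newly arrived rows):

MERGE INTO sales_agg t
USING (
    SELECT product_id, SUM(quantity) AS delta_qty, COUNT(*) AS delta_cnt
    FROM   sales_new
    GROUP  BY product_id
) s
ON (t.product_id = s.product_id)
WHEN MATCHED THEN
    UPDATE SET t.total_qty = t.total_qty + s.delta_qty,
               t.row_count = t.row_count + s.delta_cnt
WHEN NOT MATCHED THEN
    INSERT (product_id, total_qty, row_count)
    VALUES (s.product_id, s.delta_qty, s.delta_cnt);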
Question:
In a scenario I have columns col1, col2, col3 with rows (1, x, y) and (2, a, b), and I want the output with columns col1, col2 as rows (1, x), (1, y), (2, a), (2, b). What is the procedure?
Answers:
Use a Normalizer transformation: create the key port (col1) with Occurs = 1 and a second port with Occurs = 2, mapped to col2 and col3. The Normalizer then generates two output rows per input row; connect the output ports to the target.
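The equivalent result in SQL is a simple unpivot via UNION ALL; a sketch assuming a hypothetical table src_table(col1, col2, col3):

SELECT col1, col2 AS value_col FROM src_table
UNION ALL
SELECT col1, col3 AS value_col FROM src_table;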
Question:
What is the method for loading 5 flat files having the same structure into a single target, and which transformations can I use?
Two methods:
1. Place all the files in one directory and use a file list (indirect file): the list file contains the names of the 5 source files, and in the session you set the source filetype to Indirect.
2. Use a Union transformation to combine the multiple input files into a single target.
Question:
What is meant by the direct and indirect loading options in sessions?
Answers:
Direct loading means the source file itself contains the data, so it applies to a single file; indirect loading means the source file is a file list containing the names of multiple files (of the same structure) to be read.
With direct loading we can perform the recovery process, but with indirect loading we cannot.
Question:
How
many types of dimensions are available in Informatica?
The types of dimensions available are:
1. Junk dimension
2. Degenerate Dimension
3. Conformed Dimension
4. SCD
Question:
When we create the target as a flat file and the source as Oracle, how can I make the first row of the flat file contain the column names?
In Informatica version 8.2, the session properties include a "Header Options" setting. Use it to write the field names as the first row in the target flat file.
Question:
1. Can you explain one critical mapping?
2. Performance-wise, which one is better: a connected lookup transformation or an unconnected one?
Answers:
It depends on your data and the type of operation you are doing.
If you need to look up a value for all the rows, or for most of the rows coming from the source, go for a connected lookup; otherwise go for an unconnected lookup.
Unconnected lookups are especially useful in conditional cases. For example, we have to get the value for a field 'customer' either from the order table or from the customer_data table, based on the following rule:
if customer_name is null then customer = customer_data.customer_id
otherwise customer = order.customer_name
In this case we would go for an unconnected lookup.
Question:
How
can you improve the performance of Aggregate transformation?
1. By using sorted input.
2. By using incremental aggregation.
Note: if you are using incremental aggregation you cannot use sorted input.
3. By using numeric ports in the group by wherever possible.
Question:
Why
did you use stored procedure in your ETL Application?
Usage of stored procedure has the following advantages
1. Checks the status of the target database
2. Drops and recreates indexes
3. Determines if enough space exists in the database
4. Performs a specialized calculation
Question:
Why did you use the Update Strategy in your application?
I have used it in cases where I wanted to insert and update records in the same mapping.
Question:
How do you create a single Lookup transformation using multiple tables?
Write an SQL override query that joins the tables, and adjust the lookup ports to match the columns returned by the query.
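A sketch of such a lookup SQL override, assuming hypothetical orders and customers tables; the lookup ports must match the columns (and aliases) returned by the override:

SELECT o.order_id      AS ORDER_ID,
       o.customer_id   AS CUSTOMER_ID,
       c.customer_name AS CUSTOMER_NAME
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;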
Question:
With an Update Strategy, which gives better performance as a target: a table or a flat file? Why?
Pros of a flat file target: loading, sorting and merging operations are faster because there is no index overhead and the data is in ASCII mode.
Cons: there is no concept of updating existing records in a flat file.
Also, because there are no indexes, lookups against a flat file are slower.
Question:
How
to load time dimension?
We can use SCD Type 1/2/3 to load any Dimensions based
on the requirement.
We can also use procedure to populate Time Dimension
Question:
After dragging the ports of three sources (SQL Server, Oracle, Informix) into a single Source Qualifier, can you map these ports directly to the target?
You cannot pull heterogeneous transformations into a
single Source Qualifier. You have to use separate SQs and then use Joiner to
join the sources.
Question:
How can you work with a remote database in Informatica? Did you work directly using remote connections?
You can work with a remote database, but you have to configure the connection details, e.g. an FTP connection, the IP address and the user authentication.
Question:
What is the difference between partitioning of relational targets and partitioning of file targets?
Partitions can be done on both relational and flat
files.
Informatica supports following partitions
1. Database partitioning
2. Round Robin
3. Pass-through
4. Hash-Key partitioning
5. Key Range partitioning
All of these are applicable to relational targets. For flat file targets all of them apply except database partitioning.
Informatica supports partitioning of file targets as well: you just specify the name of the target file and create the partitions, and the rest is taken care of by the Informatica session.
Question:
What
is parameter file?
When you start a workflow, you can optionally enter the
directory and name of a parameter file. The Informatica Server runs the
workflow using the parameters in the file you specify.
For UNIX shell users, enclose the parameter file name
in single quotes:
-paramfile '$PMRootDir/myfile.txt'
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another machine, use a backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s
SALES:6258 -f east -w wSalesAvg -paramfile '$PMRootDir/myfile.txt'
Question:
Can you copy a session to a different folder or repository?
Yes, it is possible. To copy a session to another folder in the same repository, or to a different repository, we can use the Repository Manager (a client-side tool). Simply drag the session to the target destination and the session will be copied.
Question:
How
the Informatica server increases the session performance through partitioning
the source?
For relational sources Informatica server creates
multiple connections for each partition of a single source and extracts separate
range of data for each connection. Informatica server reads multiple partitions
of a single source concurrently. Similarly for loading also informatica server
creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition of the source file; you can choose to merge the target files.
Question:
Why
we use partitioning the session in informatica?
Performance can be improved by processing data in
parallel in a single session by creating multiple partitions of the pipeline.
Informatica server can achieve high performance by
partitioning the pipeline and performing the extract, transformation, and load
for each partition in parallel.
Question:
How can you recognize whether or not the newly added rows in the source get inserted into the target?
For a Type 2 dimension the newly inserted rows can be identified in the dimension table itself, but if you want the details of all the insert and update statements you need to set the session log tracing level to Verbose Data.
You will then get a complete record of which rows were inserted and which were not.
Question:
What
are the transformations that we use for slowly changing dimension table?
We can use the following transformations for slowly Changing dimension table.
1. Expression
2. Lookup
3. Filter
4. Sequence Generator
5. Update Strategy
Question:
What are the options in the target session for the Update Strategy transformation?
Update as Insert:
All the update records from the source are flagged as inserts into the target. In other words, instead of updating the existing records in the target, they are inserted as new records.
Update else Insert:
Informatica flags the records for update if they already exist in the target, or for insert if they are new records from the source.
The full list of options is: Insert, Delete, Update (Update as Update, Update as Insert, Update else Insert).
Question:
What
is the default source option for update strategy transformation?
The default option for the Update Strategy transformation is DD_INSERT (or its numeric value, 0).
At the session level, when the mapping contains an Update Strategy, Treat Source Rows As defaults to Data Driven.
Question:
What are the basic requirements to join two sources in a Source Qualifier?
Both tables must come from the same database connection and have a common field with the same data type.
It is not necessary that they have a primary-foreign key relationship, although if such a relationship exists it helps from a performance point of view.
Question:
What
is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
Question:
What are the tasks that the Source Qualifier performs?
Join data originating from the same source database.
You can join two or more tables with primary-foreign key relationships by
linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source
data. If you include a filter condition, the Informatica Server adds a WHERE
clause to the default query.
Specify an outer join rather than the default inner
join. If you include a user-defined join, the Informatica Server replaces the
join information specified by the metadata in the SQL query.
Specify sorted ports. If you specify a number for
sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL
query.
Select only distinct values from the source. If you
choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement
to the default SQL query.
Create a custom query to issue a special SELECT
statement for the Informatica Server to read source data. For example, you
might use a custom query to perform aggregate calculations or execute a stored
procedure.
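For instance, a custom Source Qualifier query might combine several of these tasks (join, filter, aggregate, sort) in one statement; a sketch with hypothetical emp and dept tables:

SELECT d.dept_name,
       SUM(e.salary) AS total_salary
FROM   emp  e
JOIN   dept d ON d.dept_id = e.dept_id
WHERE  e.status = 'ACTIVE'
GROUP  BY d.dept_name
ORDER  BY d.dept_name;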
Question:
Why
we use stored procedure transformation?
A Stored Procedure transformation is an important tool
for populating and maintaining databases. Database administrators create stored
procedures to automate time-consuming tasks that are too complicated for
standard SQL statements.
Question:
What
is the Rank index in Rank transformation?
The port on which you want to generate the rank is the rank port; the generated rank values appear in the RANKINDEX output port.
Question:
How does the Informatica server sort string values in the Rank transformation?
The Informatica server can run in either Unicode data movement mode or ASCII data movement mode.
Unicode mode: the server sorts the data according to the sort order configured for the session.
ASCII mode: the server sorts the data in binary order.
Question:
Difference
between static cache and dynamic cache
Static cache - you cannot insert into or update the cache.
Dynamic cache - you can insert rows into the cache as you pass them to the target.
The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
With a dynamic cache, the Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table, and you can pass these rows on to the target table.
Question:
Differences
between connected and unconnected lookup?
Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.
Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use only a static cache.
- The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.
Question:
What
are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
The Joiner transformation supports the following join
types, which you set in the Properties tab:
Normal (Default)
Master Outer
Detail Outer
Full Outer
Question:
In
which conditions we cannot use joiner transformation (Limitations of joiner
transformation)?
We cannot use Joiner after Router Transformation. This
is one of the Limitations of Joiner Transformation
Question:
What are the differences between the Joiner transformation and the Source Qualifier transformation?
Source Qualifier - joins homogeneous sources (tables in the same database); the join condition is executed at the database level.
Joiner - joins heterogeneous sources; the join condition is executed within the Informatica layer.
Question:
What
is aggregate cache in aggregator transformation?
When you run a workflow that uses an Aggregator
transformation, the Informatica Server creates index and data caches in memory
to process the transformation. If the Informatica Server requires more space,
it stores overflow values in cache files.
Question:
Can
you use the mapping parameters or variables created in one mapping into another
mapping?
No. If you want a value to be visible to other mappings/sessions, use a workflow parameter/variable instead.
Question:
What are the mapping parameters and mapping variables?
Mapping variables have two identities:
Start
value and Current
value
Start
value = Current value (when the session starts the execution
of the underlying mapping)
Start value <> Current value ( while the session
is in progress and the variable value changes in one or more occasions)
Current
value - at the end of the session is nothing but the start
value for the subsequent run of the same session.
Question:
What are the unsupported repository objects for a mapplet?
Source definitions. Definitions of database objects
(tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or
files that contain the target data.
Multi-dimensional metadata. Target definitions that are
configured as cubes and dimensions.
Mappings. A set of source and target definitions along
with transformations containing business logic that you build into the
transformation. These are the instructions that the Informatica Server uses to
transform and move data.
Reusable transformations. Transformations that you can
use in multiple mappings.
Mapplets. A set of transformations that you can use in
multiple mappings.
Sessions and workflows. Sessions and workflows store
information about how and when the Informatica Server moves data. A workflow is
a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data. A session is a type of task that
you can put in a workflow. Each session corresponds to a single mapping.
Question:
How
to read rejected data or bad data from bad file and reload it to target?
Correct the rejected data and send it to the target relational tables using the reject loader utility. You can identify why the data was rejected by using the column indicators and row indicators in the bad file.
Question:
How is a two-dimensional array initialized with a pointer?
#include <iostream>
using namespace std;

int main()
{
    int size;
    cout << "Size ";
    cin >> size;

    // allocate an array of row pointers, then one row per pointer
    int **p = new int *[size];
    for (int i = 0; i < size; i++)
    {
        p[i] = new int[size];
    }

    // read the matrix
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            cin >> p[i][j];
        }
    }

    // print the matrix
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            cout << p[i][j] << " ";
        }
        cout << endl;
    }

    // free the allocated memory
    for (int i = 0; i < size; i++)
    {
        delete[] p[i];
    }
    delete[] p;

    return 0;
}
With zero-based indexing, the element A[2][1] (third row, second column) is written in pointer notation as *(*(A+2)+1).
Question:
Difference
between static and dynamic cache? And please explain with one example?
Static cache: it is a read-only cache.
Dynamic cache: it can be both read and written (rows can be inserted or updated).
Static: Informatica returns the lookup value when the condition is true; when it is false it returns the default value in a connected lookup and NULL in an unconnected lookup.
Dynamic: when the condition is false (the row is not found in the cache), the Integration Service inserts the row into the cache and flags it through the NewLookupRow port; when the row is found, the cached values are returned.
Question:
What
is pushdown optimization in pc 8.x with example?
Use pushdown optimization to push transformation logic
to the source or target database. The Integration Service analyzes the
transformation logic, mapping, and session configuration to determine the
transformation logic it can push to the database. At run time, the Integration
Service executes any SQL statement generated against the source or target
tables, and it processes any transformation logic that it cannot push to the
database.
Select one of the following values:
- None. The Integration Service does not push any
transformation logic to the database.
- To Source. The Integration Service pushes as much
transformation logic as possible to the source database.
- To Target. The Integration Service pushes as much
transformation logic as possible to the target database.
- Full. The Integration Service pushes as much
transformation logic as possible to both the source database and target
database.
- $$PushdownConfig. The $$PushdownConfig mapping
parameter allows you to run the same session with different pushdown
optimization configurations at different times. For more information about
configuring the $$PushdownConfig mapping parameter and parameter file, see
Using the $$PushdownConfig Mapping Parameter.
Question :
What are snapshots? What are materialized views &
where do we use them?
A snapshot is a copy of a set of files and directories as they were at a particular point in the past. It is implicitly created at the start of every transaction.
Materialized views are schema objects that can be used to summarize, precompute, replicate and distribute data, for example to construct a data warehouse.
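An Oracle-style sketch of a materialized view that summarizes a hypothetical sales table; BUILD IMMEDIATE populates it at creation time and REFRESH ... ON DEMAND lets it be refreshed later:

CREATE MATERIALIZED VIEW mv_sales_by_month
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount
FROM   sales
GROUP  BY product_id, TRUNC(sale_date, 'MM');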
Question:
Can
we look-up a table from source qualifier transformation? ie. Unconnected lookup
You cannot look-up from a source qualifier directly.
However, you can override the SQL in the source qualifier to join with the
lookup table to perform the lookup.
Question:
Where do we use connected and unconnected lookups?
If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.
Question:
What
are the different Lookup methods used in Informatica?
A lookup can be connected or unconnected, and cached or uncached; a cached lookup can additionally use a static, dynamic, persistent or shared cache.
Question:
How can we execute a shell script from Informatica?
Specify the full path of the shell script in the post-session command properties of the session/workflow.
You can also use a Command task to execute shell scripts, DOS commands or BAT files.
To invoke SQL stored procedures in the same way, use the keyword "call" before the procedure name.
Question:
What
is Full load & Incremental or Refresh load?
Full load: completely erasing the contents of one or more tables and reloading them with fresh data.
Incremental load: applying ongoing changes to one or more tables based on a predefined schedule.
The first time we load the data it is called the initial load or full load. The subsequent loads, in which we load only the new or modified data, are called incremental loads or delta loads.
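A sketch of the difference in SQL, assuming hypothetical src_orders, tgt_orders and etl_run_control tables that track the last load date:

-- full load: clear the target and reload everything
TRUNCATE TABLE tgt_orders;

-- incremental (delta) load: pick up only the rows changed since the last run
SELECT order_id, customer_id, amount, last_updated
FROM   src_orders
WHERE  last_updated > (SELECT MAX(load_date) FROM etl_run_control);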
Question:
Techniques
of Error Handling - Ignore, Rejecting bad records to a flat file, loading the
records and reviewing them (default values)
Records can be rejected either at the database (due to a constraint/key violation) or by the Informatica server while writing data into the target table. These rejected records can be found in the bad files folder, where a reject file is created for each session, and we can check why each record was rejected. The bad file contains a row indicator in the first column, followed by a column indicator for each column value.
The column indicators are of four types:
D - valid data,
O - overflowed data,
N - null data,
T - truncated data.
Depending on these indicators we can correct the data and load it successfully into the target.
Question:
What
is partitioning? What are the types of partitioning?
If you use PowerCenter, you can increase the number of
partitions in a pipeline to improve session performance. Increasing the number
of partitions allows the Informatica Server to create multiple connections to
sources and process partitions of source data concurrently.
When you create a session, the Workflow Manager
validates each pipeline in the mapping for partitioning. You can specify
multiple partitions in a pipeline if the Informatica Server can maintain data
consistency when it processes the partitioned data.
When you configure the partitioning information for a
pipeline, you must specify a partition type at each partition point in the
pipeline.
The partition type determines how the Informatica
Server redistributes data across partition points.
The Workflow Manager allows you to specify the
following partition types:
Round-robin
partitioning: The Informatica Server distributes data
evenly among all partitions. Use round-robin partitioning where you want each
partition to process approximately the same number of rows.
For more information, see Round-Robin Partitioning.
Hash
partitioning: The Informatica Server applies a hash
function to a partition key to group data among partitions. If you select hash
auto-keys, the Informatica Server uses all grouped or sorted ports as the
partition key. If you select hash user keys, you specify a number of ports to
form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition. For more information, see Hash Partitioning.
Key
range partitioning: You specify one or more ports to form a
compound partition key. The Informatica Server passes data to each partition
depending on the ranges you specify for each port. Use key range partitioning
where the sources or targets in the pipeline are partitioned by key range. For
more information, see Key Range Partitioning.
Pass-through
partitioning: The Informatica Server passes all rows at
one partition point to the next partition point without redistributing them.
Choose pass-through partitioning where you want to create an additional
pipeline stage to improve performance, but do not want to change the
distribution of data across partitions.
Question:
What
is Dimensional Modeling?
Dimensional Modeling is a design concept used by many
data warehouse designers to build their data warehouse. In this design model
all the data is stored in two types of tables - Facts table and Dimension
table. Fact table contains the facts/measurements of the business and the
dimension table contains the context of measurements i.e., the dimensions on
which the facts are calculated.
Question:
What
is the main difference between schema in RDBMS and schemas in Data Warehouse?
RDBMS
Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Not suited to extraction-heavy and complex analytical queries
* Poorly modeled
DWH
Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extraction-heavy and complex analytical queries are easily handled
* Very good model
Question:
What
is incremental loading? What is batch processing?
Incremental loading: means loading only the ongoing changes from the OLTP system. (An aggregate table contains measure values aggregated/grouped/summed up to some level of the hierarchy.)
Batch processing: means executing more than one session in a single run. We can execute these sessions in 2 ways:
linear: executing one after another
parallel: executing more than one session at a time
Question:
Differences
between star and snowflake schemas?
Star schema: a logical structure in which fact and dimension tables are arranged in a star formation, with the fact table at the core of the star and the dimension tables along the spikes. The dimensional model is therefore called a star schema.
A star schema needs fewer joins; its dimension tables are denormalized (they contain data redundancy), so query performance is good.
Snowflaking is a method of normalizing the dimension tables in a star schema.
A snowflake schema requires more joins; because it is normalized it does not have data redundancy, but it can have query performance issues.
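A typical star-schema query joins the fact table directly to each dimension; a sketch with hypothetical fact_sales, dim_date and dim_product tables (in a snowflake, dim_product would itself join out to further normalized tables, adding joins):

SELECT d.cal_year,
       p.product_name,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales  f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP  BY d.cal_year, p.product_name;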
Question:
What
is a CUBE in data warehousing concept?
Cubes are logical representation of multidimensional data.
The edge of the cube contains dimension members and the body of the cube
contains data values.
Question:
What
is degenerate dimension table?
Degenerate dimensions: values that are neither dimensions nor measures, e.g. invoice id, order number, empno.
Degenerate dimensions have no attributes or measures of their own and no separate dimension table; they are stored directly inside the fact table.
Question:
What
is Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, while the dimensional model has only a physical model.
E-R
modeling is used for normalizing the OLTP database design.
Dimensional
modeling is used for de-normalizing the ROLAP/MOLAP design.
Question:
Why
fact table is in normal form?
Basically the fact table consists of the index keys (foreign keys) of the dimension/lookup tables and the measures.
Since the table essentially contains only keys and measures, it is already in normal form.
Question:
What
is the definition of normalized and de-normalized view and what are the
differences between them?
Normalization is the process of removing redundancies.
De-normalization is the process of allowing
redundancies.
Question:
What
is junk dimension? What is the difference between junk dimension and
degenerated dimension?
Junk
dimension: Grouping of Random flags and text attributes in a dimension
and moving them to a separate sub dimension.
Degenerate dimension: keeping control information on the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table; in this case the dimension table is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins when retrieving order information.
Question:
What
is the difference between view and materialized view?
Views: at run time, the view's query is executed against the database.
Materialized views: the data for the materialized view query is precomputed and stored when the materialized view is created or refreshed.
Materialized views can be created in the following ways:
1. Immediate - the mview is created along with its data.
2. Deferred - only the mview structure is created; data is populated when you refresh the mview.
We also have the option of refreshing mviews: when the data in the master tables used in the mview query changes, refreshing the mview brings in the updated (new) data.
An mview behaves very much like a table: at run time, the data is retrieved from the stored result set just as it would be from a table, so retrieval is very fast compared with an ordinary view.
Question:
What
is the data type of the surrogate key?
Data type of the surrogate key is either integer or
numeric or number
Question:
What is aggregate table and aggregate fact table ...
any examples of both
Aggregate table: contains summarized data; materialized views are one way to implement aggregate tables.
For example, in sales we may have transactions only at the date level. If we want a report like sales by product per year, we aggregate the date-level values into week_agg, month_agg, quarter_agg and year_agg tables, and we use aggregate functions to populate and query these tables.
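A sketch of building such an aggregate table (Oracle-style, hypothetical sales table): the monthly aggregate is precomputed once and then queried instead of the detail table.

CREATE TABLE sales_month_agg AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount,
       COUNT(*)               AS txn_count
FROM   sales
GROUP  BY product_id, TRUNC(sale_date, 'MM');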
Question:
What
is the difference between ODS and OLTP
ODS: a collection of tables created in the data warehouse environment that maintains only current data.
OLTP: maintains the data for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.
Question:
What
is the difference between data warehouse and BI?
BI is the capability of analyzing the data in a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and arrives at business decisions based on the results of that analysis.
Question:
What
is the difference between OLAP and data warehouse?
Data warehouse is the place where the data is stored
for analyzing
Whereas OLAP is the process of analyzing the data,
managing aggregations, partitioning information into cubes for in-depth
visualization.
ODS A collection of tables created in the Data
warehouse that maintains only current data.
OLTP maintains the data only for transactions, these
are designed for recording daily operations and transactions of a business.
Question:
What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes.
Dimension tables can use bitmap and/or the other types of
clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap
indexes. Only Oracle supports bitmaps.
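A sketch of the two index styles in Oracle syntax, with hypothetical fact and dimension tables: a bitmap index on a low-cardinality fact foreign key, and a B-tree index on a dimension attribute.

-- bitmap index on a low-cardinality foreign key of the fact table
CREATE BITMAP INDEX bix_fact_sales_prod
    ON fact_sales (product_key);

-- ordinary B-tree index on a higher-cardinality dimension attribute
CREATE INDEX idx_dim_cust_name
    ON dim_customer (customer_name);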
Question:
Which
columns go to the fact table and which columns go the dimension table?
The Primary Key columns of the Tables (Entities) go to
the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to
the Fact Tables as Foreign Keys.
Question:
How
do you load the time dimension?
A time dimension is used to represent data or measures over a certain period of time. The server time dimension is the most widely used one, by which we can represent the data hierarchically, e.g. year -> quarter -> month -> week.
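One common way to populate a date/time dimension is with a row-generating query plus date functions; an Oracle-style sketch for a hypothetical dim_date table covering one year:

INSERT INTO dim_date (date_key, cal_date, cal_year, cal_quarter, cal_month, cal_week)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),
       d,
       EXTRACT(YEAR FROM d),
       TO_NUMBER(TO_CHAR(d, 'Q')),
       EXTRACT(MONTH FROM d),
       TO_NUMBER(TO_CHAR(d, 'IW'))
FROM  (SELECT DATE '2020-01-01' + LEVEL - 1 AS d
       FROM   dual
       CONNECT BY LEVEL <= 366);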
Question:
Why
should you put your data warehouse on a different system than your OLTP system?
An OLTP system is basically “data oriented” (ER model)
and not “Subject oriented "(Dimensional Model) .That is why we design a
separate system that will have a subject oriented OLAP system...
Moreover, if a complex query is fired on an OLTP system it causes heavy overhead on the OLTP server, which directly affects the day-to-day business.
Question:
Define SSL and give its expansion.
Secure Sockets Layer (SSL)
Question:
Performance
tuning in Informatica?
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. Session performance can be increased as follows.
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often affect session performance; avoid unnecessary network hops.
· Flat files: if flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.
· Relational data sources: minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.
· Staging areas: if you use staging areas you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
· You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
· Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a character.
· If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
· We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and configure the database connections.
· If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
· Running parallel sessions by using concurrent batches also reduces the loading time, so concurrent batches may increase session performance.
· Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
· If a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
· Avoid transformation errors to improve session performance.
· If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
· If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
· Aggregator, Rank and Joiner transformations may decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports / sorted input option.