Question:
How can we say that the Union transformation is an active transformation? Can you please give me an example?
A) A transformation is active when it can change the number of rows passing through it or change the row numbers (positions) of those rows. The Union transformation merges rows coming from multiple input pipelines into one output, so the row numbers of the incoming rows change; that is why it is classified as active.
The simplest case of a row-number change is when the number of rows coming in and going out is different.
For example, in the case of a Filter we have data like:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and a filter condition like dept = 4, so the output would be:
id name dept row_num
1 aa 4 1
3 cc 4 2
The row numbers changed, so it is an active transformation.
Question:
Can we add different workflows into one batch and run them sequentially? If possible, how do we do that?
To simulate a batch we can create a Unix script and write pmcmd commands in it that run the different workflows one after the other, like this:
workflowrun.sh (example for Informatica 8):
pmcmd startworkflow -sv IS_INTEGRATION_SERVICE -d DOMAIN1 -u myuser -p mypass -f folder1 -wait workflow1
pmcmd startworkflow -sv IS_INTEGRATION_SERVICE -d DOMAIN1 -u myuser -p mypass -f folder1 -wait workflow2
Save this file and run it. The -wait option makes pmcmd return only after the workflow completes, so workflow1 finishes before workflow2 is triggered.
Question:
How can you create an index on a particular column? How do you choose which column is the best one to index?
In DB2 it would be:
CREATE INDEX idx_1 ON my_table (abc);
(the column's data type is defined on the table, not in the index). Columns that are referenced in the WHERE clause of frequently executed queries are the best candidates for an index.
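A small sketch of the idea, using hypothetical table and column names (emp, dept_id): the column used in the WHERE clause of frequent queries gets the index.

CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER, name VARCHAR(50));

-- dept_id is referenced in the WHERE clause of frequently run queries,
-- so it is a good candidate for an index
CREATE INDEX idx_emp_dept ON emp (dept_id);

-- this query can now use the index instead of a full table scan
SELECT emp_id, name FROM emp WHERE dept_id = 4;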
Question:
I have a flat file containing 'n' records. I need to load half of the records into one target table and the other half into another target table. Explain the procedure.
There will be 2 pipelines.
In the first pipeline, read from the source file and add an Expression transformation; in the Expression take a variable port and increment it by 1 for every row (v = v + 1), then load a staging target T0 that stores this generated sequence in column c1.
After the first pipeline has executed we have:
a) the count of all the rows in the file (the final value of v), and
b) a sequence number for every record in T0, column c1.
In the second pipeline, take T0 as the source, T1 and T2 as targets, and a Router R1 in between.
In R1, create 2 groups:
1st group: c1 <= v/2 - route to T1
2nd group: c1 > v/2 - route to T2
Alternatively, use a variable port or a Sequence Generator to assign a sequence number to each record, then use 2 Filters (or 1 Router) for the two pipelines with the following conditions:
Pipeline 1: mod(sequence_number, 2) = 1
Pipeline 2: mod(sequence_number, 2) = 0
This sends the odd-numbered rows to one target and the even-numbered rows to the other, so each target receives half of the records.
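The same odd/even split can be sketched in SQL for a relational source (hypothetical table src_records with an id column); a window function such as ROW_NUMBER provides the sequence number that the Sequence Generator produces in the mapping:

-- rows with an odd sequence number go to the first target
SELECT id, name
FROM (
    SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM   src_records
) t
WHERE MOD(rn, 2) = 1;

-- rows with an even sequence number go to the other target: MOD(rn, 2) = 0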
Question:
If
I get a flat file as source, I need to get distinct data in the target. How do
I accomplish this? Please give a detailed answer of what all transformations we
have to use and how the data flow will be, and what SQL statement we have to
use.
When we have a flat file as a source, we can use a Sorter transformation to load distinct rows into the target. The Sorter transformation has a Distinct option, which eliminates the duplicate rows, and this works for flat file sources.
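For comparison, if the source were relational the same result would come from a simple SELECT DISTINCT (for example in a Source Qualifier override); the Sorter's Distinct option is the flat-file equivalent. A sketch with hypothetical column names:

SELECT DISTINCT id, name, dept
FROM   src_table;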
Question:
What
is Dimension table Exactly?
Dimension tables give descriptive context about something.
For example, if we take Student as a dimension table, it has attributes like college name, age, gender, etc., which describe a student.
Question:
What
are load types in Informatica and what is delta load?
There are two types of load i) Normal Load ii) Bulk
Load
Normal Load:
The Integration Service writes to the database log and then the data enters the target.
a) Loading performance is lower, but session recovery is possible.
b) Rollback and commit are possible.
Bulk Load:
The Integration Service bypasses the database log and loads directly into the target.
a) Performance increases, but session recovery is not possible.
b) Rollback and commit are also not possible.
In bulk loading we need to consider the following:
1) Do not create primary and foreign keys at the database level (they can still exist in the target definition).
2) Drop indexes before loading into the target and recreate them after loading.
3) Enable or disable the parallel mode option as appropriate.
A delta load (incremental load) means loading only the records that are new or changed since the previous run, rather than reloading the full data set.
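Point 2 is typically implemented as pre- and post-session SQL on the session; a minimal sketch with hypothetical target and index names:

-- pre-session SQL: drop the index so the bulk load is not slowed down
DROP INDEX idx_tgt_cust;

-- post-session SQL: rebuild the index once the load has completed
CREATE INDEX idx_tgt_cust ON tgt_customer (cust_id);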
Question:
Where
to store Informatica rejected data? How to extract the Informatica rejected data?
Rejected rows (for example, rows that violate a unique key constraint) are written by the session into $PMBadFileDir (the default path is <INFA_HOME>/PowerCenter/server/infa_shared/BadFiles), which is configured at the Integration Service level. Every target has a "Reject filename" property that specifies the file in which the rejected rows are stored.
Question:
This
is a scenario in which the source has 2 cols
10 A
10 A
20 C
30 D
40 E
20 C
And there should be 2 targets one to show the duplicate
values and another target for distinct rows.
T1 (duplicate values): 10 A, 20 C
T2 (distinct rows): 10 A, 20 C, 30 D, 40 E
Which transformation can be used to load data into
target?
Ans:
Source -> Source Qualifier -> Expression -> Router -> Targets
Source Qualifier: sort the source data on the key column (e.g. check sorted ports / sort by empno).
Expression transformation ports, in this order:
v_flag (variable) = IIF(col1 = v_prev_col1, 'Y', 'N')
v_prev_col1 (variable) = col1
o_flag (output) = v_flag
Router groups:
1. Duplicate records: condition o_flag = 'Y' -> route to T1
2. Distinct records: condition o_flag = 'N' -> route to T2
Because the data is sorted, the first occurrence of each value is flagged 'N' (distinct) and every repeated occurrence is flagged 'Y' (duplicate).
Alternative approach:
1. Target T2 (distinct rows): from the Source Qualifier, link to a Sorter transformation with the Distinct option checked, then to T2.
2. Target T1 (duplicate values only): Source Qualifier -> Aggregator -> Filter -> T1. In the Aggregator, group by all the columns and add a port that counts the rows in each group; in the Filter, keep only the groups whose count is greater than 1.
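For a relational source the same split could be expressed in SQL; a minimal sketch, assuming a hypothetical table src with columns col1 and col2:

-- distinct rows (equivalent of the Sorter with Distinct checked)
SELECT DISTINCT col1, col2
FROM   src;

-- values occurring more than once (equivalent of the Aggregator + Filter)
SELECT col1, col2
FROM   src
GROUP  BY col1, col2
HAVING COUNT(*) > 1;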
Question:
What
is the need and when do we use a dynamic lookup cache?
When you want to insert new records or update existing data in the target, i.e. for slowly changing dimensions (e.g. SCD Type 2).
Using a dynamic cache in the Lookup transformation is also a performance-tuning technique: if your mapping has a lookup on the target table, you can implement a dynamic cache.
Question:
How can I do incremental aggregation in real time?
For incremental aggregation we need an Aggregator, a Lookup on the target (aggregate) table, and an Expression that sums the count obtained from the new aggregation with the count returned by the lookup.
For a record already present in the aggregate table, its existing count is available through the lookup; the count for the newly arrived rows comes from the Aggregator. Add the two together and update that record in the target.
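The update step can be pictured as a SQL MERGE against the aggregate target; a sketch with hypothetical tables sales_agg (existing aggregates) and sales_new (only the newly arrived rows):

MERGE INTO sales_agg t
USING (
    SELECT product_id, SUM(quantity) AS delta_qty, COUNT(*) AS delta_cnt
    FROM   sales_new
    GROUP  BY product_id
) s
ON (t.product_id = s.product_id)
WHEN MATCHED THEN
    UPDATE SET t.total_qty = t.total_qty + s.delta_qty,
               t.row_count = t.row_count + s.delta_cnt
WHEN NOT MATCHED THEN
    INSERT (product_id, total_qty, row_count)
    VALUES (s.product_id, s.delta_qty, s.delta_cnt);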
Question:
In a scenario I have columns col1, col2, col3 with rows (1, x, y) and (2, a, b), and I want the output with columns col1, col2 as rows (1, x), (1, y), (2, a), (2, b). What is the procedure?
Answers:
Use a Normalizer transformation: create the key port (col1) with Occurs = 1 and a second port with Occurs = 2, mapped to col2 and col3. The Normalizer then generates two output rows per input row; connect the output ports to the target.
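The equivalent result in SQL is a simple unpivot via UNION ALL; a sketch assuming a hypothetical table src_table(col1, col2, col3):

SELECT col1, col2 AS value_col FROM src_table
UNION ALL
SELECT col1, col3 AS value_col FROM src_table;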
Question:
What is the method for loading 5 flat files having the same structure into a single target, and which transformations can I use?
Two methods:
1. Place all the files in one directory and use a file list (indirect file): the list file contains the names of the 5 source files, and in the session you set the source filetype to Indirect.
2. Use a Union transformation to combine the multiple input files into a single target.
Question:
What is meant by the direct and indirect loading options in sessions?
Answers:
Direct loading means the source file itself contains the data, so it applies to a single file; indirect loading means the source file is a file list containing the names of multiple files (of the same structure) to be read.
With direct loading we can perform the recovery process, but with indirect loading we cannot.
Question:
How
many types of dimensions are available in Informatica?
The types of dimensions available are:
1. Junk dimension
2. Degenerate Dimension
3. Conformed Dimension
4. SCD
Question:
When we create the target as a flat file and the source as Oracle, how can I make the first row of the flat file contain the column names?
In Informatica version 8.2, the session properties include a "Header Options" setting. Use it to write the field names as the first row in the target flat file.
Question:
1. Can you explain one critical mapping?
2. Performance-wise, which one is better: a connected lookup transformation or an unconnected one?
Answers:
It depends on your data and the type of operation you are doing.
If you need to look up a value for all the rows, or for most of the rows coming from the source, go for a connected lookup; otherwise go for an unconnected lookup.
Unconnected lookups are especially useful in conditional cases. For example, we have to get the value for a field 'customer' either from the order table or from the customer_data table, based on the following rule:
if customer_name is null then customer = customer_data.customer_id
otherwise customer = order.customer_name
In this case we would go for an unconnected lookup.
Question:
How
can you improve the performance of Aggregate transformation?
1. By using sorted input.
2. By using incremental aggregation.
Note: if you are using incremental aggregation you cannot use sorted input.
3. By using numeric ports in the group by wherever possible.
Question:
Why
did you use stored procedure in your ETL Application?
Usage of stored procedure has the following advantages
1. Checks the status of the target database
2. Drops and recreates indexes
3. Determines if enough space exists in the database
4. Performs a specialized calculation
Question:
Why did you use the Update Strategy in your application?
I have used it in cases where I wanted to insert and update records in the same mapping.
Question:
How do you create a single Lookup transformation using multiple tables?
Write an SQL override query that joins the tables, and adjust the lookup ports to match the columns returned by the query.
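A sketch of such a lookup SQL override, assuming hypothetical orders and customers tables; the lookup ports must match the columns (and aliases) returned by the override:

SELECT o.order_id      AS ORDER_ID,
       o.customer_id   AS CUSTOMER_ID,
       c.customer_name AS CUSTOMER_NAME
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id;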
Question:
With an Update Strategy, which gives better performance as a target: a table or a flat file? Why?
Pros of a flat file target: loading, sorting and merging operations are faster because there is no index overhead and the data is in ASCII mode.
Cons: there is no concept of updating existing records in a flat file.
Also, because there are no indexes, lookups against a flat file are slower.
Question:
How
to load time dimension?
We can use SCD Type 1/2/3 to load any Dimensions based
on the requirement.
We can also use procedure to populate Time Dimension
Question:
After dragging the ports of three sources (SQL Server, Oracle, Informix) into a single Source Qualifier, can you map these ports directly to the target?
You cannot pull heterogeneous transformations into a
single Source Qualifier. You have to use separate SQs and then use Joiner to
join the sources.
Question:
How can you work with a remote database in Informatica? Did you work directly using remote connections?
You can work with a remote database, but you have to configure the connection details, e.g. an FTP connection, the IP address and the user authentication.
Question:
What is the difference between partitioning of relational targets and partitioning of file targets?
Partitions can be done on both relational and flat
files.
Informatica supports following partitions
1. Database partitioning
2. Round Robin
3. Pass-through
4. Hash-Key partitioning
5. Key Range partitioning
All of these are applicable to relational targets. For flat file targets all of them apply except database partitioning.
Informatica supports partitioning of file targets as well: you just specify the name of the target file and create the partitions, and the rest is taken care of by the Informatica session.
Question:
What
is parameter file?
When you start a workflow, you can optionally enter the
directory and name of a parameter file. The Informatica Server runs the
workflow using the parameters in the file you specify.
For UNIX shell users, enclose the parameter file name
in single quotes:
-paramfile '$PMRootDir/myfile.txt'
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
Note: When you write a pmcmd command that includes a parameter file located on another machine, use a backslash (\) with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.
pmcmd startworkflow -uv USERNAME -pv PASSWORD -s
SALES:6258 -f east -w wSalesAvg -paramfile '$PMRootDir/myfile.txt'
Question:
Can you copy a session to a different folder or repository?
Yes, it is possible. To copy a session to another folder in the same repository, or to a different repository, we can use the Repository Manager (a client-side tool). Simply drag the session to the target destination and the session will be copied.
Question:
How
the Informatica server increases the session performance through partitioning
the source?
For relational sources Informatica server creates
multiple connections for each partition of a single source and extracts separate
range of data for each connection. Informatica server reads multiple partitions
of a single source concurrently. Similarly for loading also informatica server
creates multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition of the source file; you can choose to merge the target files.
Question:
Why
we use partitioning the session in informatica?
Performance can be improved by processing data in
parallel in a single session by creating multiple partitions of the pipeline.
Informatica server can achieve high performance by
partitioning the pipeline and performing the extract, transformation, and load
for each partition in parallel.
Question:
How can you recognize whether or not the newly added rows in the source get inserted into the target?
For a Type 2 dimension the newly inserted rows can be identified in the dimension table itself, but if you want the details of all the insert and update statements you need to set the session log tracing level to Verbose Data.
You will then get a complete record of which rows were inserted and which were not.
Question:
What
are the transformations that we use for slowly changing dimension table?
We can use the following transformations for slowly Changing dimension table.
1. Expression
2. Lookup
3. Filter
4. Sequence Generator
5. Update Strategy
Question:
What are the options in the target session for the Update Strategy transformation?
Update as Insert:
All the update records from the source are flagged as inserts into the target. In other words, instead of updating the existing records in the target, they are inserted as new records.
Update else Insert:
Informatica flags the records for update if they already exist in the target, or for insert if they are new records from the source.
The full list of options is: Insert, Delete, Update (Update as Update, Update as Insert, Update else Insert).
Question:
What
is the default source option for update strategy transformation?
The default option for the Update Strategy transformation is DD_INSERT (or its numeric value, 0).
At the session level, when the mapping contains an Update Strategy, Treat Source Rows As defaults to Data Driven.
Question:
What are the basic requirements to join two sources in a Source Qualifier?
Both tables must come from the same database connection and have a common field with the same data type.
It is not necessary that they have a primary-foreign key relationship, although if such a relationship exists it helps from a performance point of view.
Question:
What
is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
Question:
What are the tasks that the Source Qualifier performs?
Join data originating from the same source database.
You can join two or more tables with primary-foreign key relationships by
linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source
data. If you include a filter condition, the Informatica Server adds a WHERE
clause to the default query.
Specify an outer join rather than the default inner
join. If you include a user-defined join, the Informatica Server replaces the
join information specified by the metadata in the SQL query.
Specify sorted ports. If you specify a number for
sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL
query.
Select only distinct values from the source. If you
choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement
to the default SQL query.
Create a custom query to issue a special SELECT
statement for the Informatica Server to read source data. For example, you
might use a custom query to perform aggregate calculations or execute a stored
procedure.
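For instance, a custom Source Qualifier query might combine several of these tasks (join, filter, aggregate, sort) in one statement; a sketch with hypothetical emp and dept tables:

SELECT d.dept_name,
       SUM(e.salary) AS total_salary
FROM   emp  e
JOIN   dept d ON d.dept_id = e.dept_id
WHERE  e.status = 'ACTIVE'
GROUP  BY d.dept_name
ORDER  BY d.dept_name;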
Question:
Why
we use stored procedure transformation?
A Stored Procedure transformation is an important tool
for populating and maintaining databases. Database administrators create stored
procedures to automate time-consuming tasks that are too complicated for
standard SQL statements.
Question:
What
is the Rank index in Rank transformation?
The port on which you want to generate the rank is the rank port; the generated rank values appear in the RANKINDEX output port.
Question:
How does the Informatica server sort string values in the Rank transformation?
The Informatica server can run in either Unicode data movement mode or ASCII data movement mode.
Unicode mode: the server sorts the data according to the sort order configured for the session.
ASCII mode: the server sorts the data in binary order.
Question:
Difference
between static cache and dynamic cache
Static cache - you cannot insert into or update the cache.
Dynamic cache - you can insert rows into the cache as you pass them to the target.
The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
With a dynamic cache, the Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table, and you can pass these rows on to the target table.
Question:
Differences
between connected and unconnected lookup?
Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.
Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use only a static cache.
- The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.
Question:
What
are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
The Joiner transformation supports the following join
types, which you set in the Properties tab:
Normal (Default)
Master Outer
Detail Outer
Full Outer
Question:
In
which conditions we cannot use joiner transformation (Limitations of joiner
transformation)?
We cannot use Joiner after Router Transformation. This
is one of the Limitations of Joiner Transformation
Question:
What are the differences between the Joiner transformation and the Source Qualifier transformation?
Source Qualifier - joins homogeneous sources (tables in the same database); the join condition is executed at the database level.
Joiner - joins heterogeneous sources; the join condition is executed within the Informatica layer.
Question:
What
is aggregate cache in aggregator transformation?
When you run a workflow that uses an Aggregator
transformation, the Informatica Server creates index and data caches in memory
to process the transformation. If the Informatica Server requires more space,
it stores overflow values in cache files.
Question:
Can
you use the mapping parameters or variables created in one mapping into another
mapping?
No. If you want a value to be visible to other mappings/sessions, use a workflow parameter/variable instead.
Question:
What are the mapping parameters and mapping variables?
Mapping variables have two identities:
Start
value and Current
value
Start
value = Current value (when the session starts the execution
of the underlying mapping)
Start value <> Current value ( while the session
is in progress and the variable value changes in one or more occasions)
Current
value - at the end of the session is nothing but the start
value for the subsequent run of the same session.
Question:
What are the unsupported repository objects for a mapplet?
Source definitions. Definitions of database objects
(tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or
files that contain the target data.
Multi-dimensional metadata. Target definitions that are
configured as cubes and dimensions.
Mappings. A set of source and target definitions along
with transformations containing business logic that you build into the
transformation. These are the instructions that the Informatica Server uses to
transform and move data.
Reusable transformations. Transformations that you can
use in multiple mappings.
Mapplets. A set of transformations that you can use in
multiple mappings.
Sessions and workflows. Sessions and workflows store
information about how and when the Informatica Server moves data. A workflow is
a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data. A session is a type of task that
you can put in a workflow. Each session corresponds to a single mapping.
Question:
How
to read rejected data or bad data from bad file and reload it to target?
Correct the rejected data and send it to the target relational tables using the reject loader utility. You can identify why the data was rejected by using the column indicators and row indicators in the bad file.
Question:
How is a two-dimensional array initialized with a pointer?
#include <iostream>
using namespace std;

int main()
{
    int size;
    cout << "Size ";
    cin >> size;

    // allocate an array of row pointers, then one row per pointer
    int **p = new int *[size];
    for (int i = 0; i < size; i++)
    {
        p[i] = new int[size];
    }

    // read the matrix
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            cin >> p[i][j];
        }
    }

    // print the matrix
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            cout << p[i][j] << " ";
        }
        cout << endl;
    }

    // free the allocated memory
    for (int i = 0; i < size; i++)
    {
        delete[] p[i];
    }
    delete[] p;

    return 0;
}
With zero-based indexing, the element A[2][1] (third row, second column) is written in pointer notation as *(*(A+2)+1).
Question:
Difference
between static and dynamic cache? And please explain with one example?
Static cache: it is a read-only cache.
Dynamic cache: it can be both read and written (rows can be inserted or updated).
Static: Informatica returns the lookup value when the condition is true; when it is false it returns the default value in a connected lookup and NULL in an unconnected lookup.
Dynamic: when the condition is false (the row is not found in the cache), the Integration Service inserts the row into the cache and flags it through the NewLookupRow port; when the row is found, the cached values are returned.
Question:
What
is pushdown optimization in pc 8.x with example?
Use pushdown optimization to push transformation logic
to the source or target database. The Integration Service analyzes the
transformation logic, mapping, and session configuration to determine the
transformation logic it can push to the database. At run time, the Integration
Service executes any SQL statement generated against the source or target
tables, and it processes any transformation logic that it cannot push to the
database.
Select one of the following values:
- None. The Integration Service does not push any
transformation logic to the database.
- To Source. The Integration Service pushes as much
transformation logic as possible to the source database.
- To Target. The Integration Service pushes as much
transformation logic as possible to the target database.
- Full. The Integration Service pushes as much
transformation logic as possible to both the source database and target
database.
- $$PushdownConfig. The $$PushdownConfig mapping
parameter allows you to run the same session with different pushdown
optimization configurations at different times. For more information about
configuring the $$PushdownConfig mapping parameter and parameter file, see
Using the $$PushdownConfig Mapping Parameter.
Question :
What are snapshots? What are materialized views &
where do we use them?
A snapshot is a copy of a set of files and directories as they were at a particular point in the past. It is implicitly created at the start of every transaction.
Materialized views are schema objects that can be used to summarize, precompute, replicate and distribute data, for example to construct a data warehouse.
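An Oracle-style sketch of a materialized view that summarizes a hypothetical sales table; BUILD IMMEDIATE populates it at creation time and REFRESH ... ON DEMAND lets it be refreshed later:

CREATE MATERIALIZED VIEW mv_sales_by_month
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount
FROM   sales
GROUP  BY product_id, TRUNC(sale_date, 'MM');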
Question:
Can
we look-up a table from source qualifier transformation? ie. Unconnected lookup
You cannot look-up from a source qualifier directly.
However, you can override the SQL in the source qualifier to join with the
lookup table to perform the lookup.
Question:
Where do we use connected and unconnected lookups?
If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.
Question:
What
are the different Lookup methods used in Informatica?
A lookup can be connected or unconnected, and cached or uncached; a cached lookup can additionally use a static, dynamic, persistent or shared cache.
Question:
How can we execute a shell script from Informatica?
Specify the full path of the shell script in the post-session command properties of the session/workflow.
You can also use a Command task to execute shell scripts, DOS commands or BAT files.
To invoke SQL stored procedures in the same way, use the keyword "call" before the procedure name.
Question:
What
is Full load & Incremental or Refresh load?
Full load: completely erasing the contents of one or more tables and reloading them with fresh data.
Incremental load: applying ongoing changes to one or more tables based on a predefined schedule.
The first time we load the data it is called the initial load or full load. The subsequent loads, in which we load only the new or modified data, are called incremental loads or delta loads.
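A sketch of the difference in SQL, assuming hypothetical src_orders, tgt_orders and etl_run_control tables that track the last load date:

-- full load: clear the target and reload everything
TRUNCATE TABLE tgt_orders;

-- incremental (delta) load: pick up only the rows changed since the last run
SELECT order_id, customer_id, amount, last_updated
FROM   src_orders
WHERE  last_updated > (SELECT MAX(load_date) FROM etl_run_control);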
Question:
Techniques
of Error Handling - Ignore, Rejecting bad records to a flat file, loading the
records and reviewing them (default values)
Records can be rejected either at the database (due to a constraint/key violation) or by the Informatica server while writing data into the target table. These rejected records can be found in the bad files folder, where a reject file is created for each session, and we can check why each record was rejected. The bad file contains a row indicator in the first column, followed by a column indicator for each column value.
The column indicators are of four types:
D - valid data,
O - overflowed data,
N - null data,
T - truncated data.
Depending on these indicators we can correct the data and load it successfully into the target.
Question:
What
is partitioning? What are the types of partitioning?
If you use PowerCenter, you can increase the number of
partitions in a pipeline to improve session performance. Increasing the number
of partitions allows the Informatica Server to create multiple connections to
sources and process partitions of source data concurrently.
When you create a session, the Workflow Manager
validates each pipeline in the mapping for partitioning. You can specify
multiple partitions in a pipeline if the Informatica Server can maintain data
consistency when it processes the partitioned data.
When you configure the partitioning information for a
pipeline, you must specify a partition type at each partition point in the
pipeline.
The partition type determines how the Informatica
Server redistributes data across partition points.
The Workflow Manager allows you to specify the
following partition types:
Round-robin
partitioning: The Informatica Server distributes data
evenly among all partitions. Use round-robin partitioning where you want each
partition to process approximately the same number of rows.
For more information, see Round-Robin Partitioning.
Hash
partitioning: The Informatica Server applies a hash
function to a partition key to group data among partitions. If you select hash
auto-keys, the Informatica Server uses all grouped or sorted ports as the
partition key. If you select hash user keys, you specify a number of ports to
form the partition key. Use hash partitioning where you want to ensure that the Informatica Server processes groups of rows with the same partition key in the same partition. For more information, see Hash Partitioning.
Key
range partitioning: You specify one or more ports to form a
compound partition key. The Informatica Server passes data to each partition
depending on the ranges you specify for each port. Use key range partitioning
where the sources or targets in the pipeline are partitioned by key range. For
more information, see Key Range Partitioning.
Pass-through
partitioning: The Informatica Server passes all rows at
one partition point to the next partition point without redistributing them.
Choose pass-through partitioning where you want to create an additional
pipeline stage to improve performance, but do not want to change the
distribution of data across partitions.
Question:
What
is Dimensional Modeling?
Dimensional Modeling is a design concept used by many
data warehouse designers to build their data warehouse. In this design model
all the data is stored in two types of tables - Facts table and Dimension
table. Fact table contains the facts/measurements of the business and the
dimension table contains the context of measurements i.e., the dimensions on
which the facts are calculated.
Question:
What
is the main difference between schema in RDBMS and schemas in Data Warehouse?
RDBMS
Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Not suited to extraction-heavy and complex analytical queries
* Poorly modeled
DWH
Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extraction-heavy and complex analytical queries are easily handled
* Very good model
Question:
What
is incremental loading? What is batch processing?
Incremental loading: means loading only the ongoing changes from the OLTP system. (An aggregate table contains measure values aggregated/grouped/summed up to some level of the hierarchy.)
Batch processing: means executing more than one session in a single run. We can execute these sessions in 2 ways:
linear: executing one after another
parallel: executing more than one session at a time
Question:
Differences
between star and snowflake schemas?
Star schema: a logical structure in which fact and dimension tables are arranged in a star formation, with the fact table at the core of the star and the dimension tables along the spikes. The dimensional model is therefore called a star schema.
A star schema needs fewer joins; its dimension tables are denormalized (they contain data redundancy), so query performance is good.
Snowflaking is a method of normalizing the dimension tables in a star schema.
A snowflake schema requires more joins; because it is normalized it does not have data redundancy, but it can have query performance issues.
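A typical star-schema query joins the fact table directly to each dimension; a sketch with hypothetical fact_sales, dim_date and dim_product tables (in a snowflake, dim_product would itself join out to further normalized tables, adding joins):

SELECT d.cal_year,
       p.product_name,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales  f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP  BY d.cal_year, p.product_name;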
Question:
What
is a CUBE in data warehousing concept?
Cubes are logical representation of multidimensional data.
The edge of the cube contains dimension members and the body of the cube
contains data values.
Question:
What
is degenerate dimension table?
Degenerate dimensions: values that are neither dimensions nor measures, e.g. invoice id, order number, empno.
Degenerate dimensions have no attributes or measures of their own and no separate dimension table; they are stored directly inside the fact table.
Question:
What
is Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, while the dimensional model has only a physical model.
E-R
modeling is used for normalizing the OLTP database design.
Dimensional
modeling is used for de-normalizing the ROLAP/MOLAP design.
Question:
Why
fact table is in normal form?
Basically the fact table consists of the index keys (foreign keys) of the dimension/lookup tables and the measures.
Since the table essentially contains only keys and measures, it is already in normal form.
Question:
What
is the definition of normalized and de-normalized view and what are the
differences between them?
Normalization is the process of removing redundancies.
De-normalization is the process of allowing
redundancies.
Question:
What
is junk dimension? What is the difference between junk dimension and
degenerated dimension?
Junk
dimension: Grouping of Random flags and text attributes in a dimension
and moving them to a separate sub dimension.
Degenerate dimension: keeping control information on the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table; in this case the dimension table is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins when retrieving order information.
Question:
What
is the difference between view and materialized view?
Views: at run time, the view's query is executed against the database.
Materialized views: the data for the materialized view query is precomputed and stored when the materialized view is created or refreshed.
Materialized views can be created in the following ways:
1. Immediate - the mview is created along with its data.
2. Deferred - only the mview structure is created; data is populated when you refresh the mview.
We also have the option of refreshing mviews: when the data in the master tables used in the mview query changes, refreshing the mview brings in the updated (new) data.
An mview behaves very much like a table: at run time, the data is retrieved from the stored result set just as it would be from a table, so retrieval is very fast compared with an ordinary view.
Question:
What
is the data type of the surrogate key?
Data type of the surrogate key is either integer or
numeric or number
Question:
What is aggregate table and aggregate fact table ...
any examples of both
Aggregate table: contains summarized data; materialized views are one way to implement aggregate tables.
For example, in sales we may have transactions only at the date level. If we want a report like sales by product per year, we aggregate the date-level values into week_agg, month_agg, quarter_agg and year_agg tables, and we use aggregate functions to populate and query these tables.
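A sketch of building such an aggregate table (Oracle-style, hypothetical sales table): the monthly aggregate is precomputed once and then queried instead of the detail table.

CREATE TABLE sales_month_agg AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount,
       COUNT(*)               AS txn_count
FROM   sales
GROUP  BY product_id, TRUNC(sale_date, 'MM');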
Question:
What
is the difference between ODS and OLTP
ODS: a collection of tables created in the data warehouse environment that maintains only current data.
OLTP: maintains the data for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.
Question:
What
is the difference between data warehouse and BI?
BI is the capability of analyzing the data in a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and arrives at business decisions based on the results of that analysis.
Question:
What
is the difference between OLAP and data warehouse?
Data warehouse is the place where the data is stored
for analyzing
Whereas OLAP is the process of analyzing the data,
managing aggregations, partitioning information into cubes for in-depth
visualization.
ODS A collection of tables created in the Data
warehouse that maintains only current data.
OLTP maintains the data only for transactions, these
are designed for recording daily operations and transactions of a business.
Question:
What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes.
Dimension tables can use bitmap and/or the other types of
clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap
indexes. Only Oracle supports bitmaps.
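A sketch of the two index styles in Oracle syntax, with hypothetical fact and dimension tables: a bitmap index on a low-cardinality fact foreign key, and a B-tree index on a dimension attribute.

-- bitmap index on a low-cardinality foreign key of the fact table
CREATE BITMAP INDEX bix_fact_sales_prod
    ON fact_sales (product_key);

-- ordinary B-tree index on a higher-cardinality dimension attribute
CREATE INDEX idx_dim_cust_name
    ON dim_customer (customer_name);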
Question:
Which
columns go to the fact table and which columns go the dimension table?
The Primary Key columns of the Tables (Entities) go to
the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to
the Fact Tables as Foreign Keys.
Question:
How
do you load the time dimension?
A time dimension is used to represent data or measures over a certain period of time. The server time dimension is the most widely used one, by which we can represent the data hierarchically, e.g. year -> quarter -> month -> week.
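One common way to populate a date/time dimension is with a row-generating query plus date functions; an Oracle-style sketch for a hypothetical dim_date table covering one year:

INSERT INTO dim_date (date_key, cal_date, cal_year, cal_quarter, cal_month, cal_week)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),
       d,
       EXTRACT(YEAR FROM d),
       TO_NUMBER(TO_CHAR(d, 'Q')),
       EXTRACT(MONTH FROM d),
       TO_NUMBER(TO_CHAR(d, 'IW'))
FROM  (SELECT DATE '2020-01-01' + LEVEL - 1 AS d
       FROM   dual
       CONNECT BY LEVEL <= 366);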
Question:
Why
should you put your data warehouse on a different system than your OLTP system?
An OLTP system is basically “data oriented” (ER model)
and not “Subject oriented "(Dimensional Model) .That is why we design a
separate system that will have a subject oriented OLAP system...
Moreover, if a complex query is fired on an OLTP system it causes heavy overhead on the OLTP server, which directly affects the day-to-day business.
Question:
Define SSL and give its expansion.
Secure Sockets Layer (SSL)
Question:
Performance
tuning in Informatica?
The goal of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server. Session performance can be increased as follows.
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster, so network connections often affect session performance; avoid unnecessary network hops.
· Flat files: if flat files are stored on a machine other than the Informatica server, move those files to the machine on which the Informatica server runs.
· Relational data sources: minimize the connections to sources, targets and the Informatica server to improve session performance. Moving the target database onto the server system may improve session performance.
· Staging areas: if you use staging areas you force the Informatica server to perform multiple data passes. Removing staging areas may improve session performance.
· You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may improve session performance.
· Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a character.
· If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
· We can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and configure the database connections.
· If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
· Running parallel sessions by using concurrent batches also reduces the loading time, so concurrent batches may increase session performance.
· Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.
· If a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
· Avoid transformation errors to improve session performance.
· If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
· If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier.
· Aggregator, Rank and Joiner transformations may decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports / sorted input option.