美文网首页
Data Flow Transformations

Data Flow Transformations

作者: 鲸鱼酱375 | 来源:发表于2019-10-02 09:56 被阅读0次

1.Data Flow Transformations:

  • Transformations are used to apply business rules on data.
  • It can be used to aggregate, merge, distribute, and modify data as per the business requirement.
  • Transformations can also perform lookup operations and generate sample datasets.
  • It can also be used to perform data cleansing and data standardization.
  • it only one can execute in data flow task**

Transformation Types:

Transformations are classified as follows:

Row Transformations: row by row

  • Character Map
  • Copy Column
  • Data Conversion
  • Derived Column,
  • OLEDB Command, etc.

Split and Join Transformations

  • Conditional Split and Multicast non- blocking
  • Union All : semi bolcking
  • Merge and Merge Join: semi bolcking
  • Lookup : row by row,non-blocking
  • Cache Transform: non- blocking
    clear the cache file

Rowset Transformations full blocking

  • Aggregate
  • Sort

Business Intelligence Transformations

  • Slowly Changing Dimension
  • Fuzzy Lookup and Fuzzy Grouping: full blocking
    perform the (?)
    identify the duplicte value(?)

Auditing Transformations

  • Audit unblocking
  • Row Count

Synchronous and Asynchronous Transformations:

SSIS dataflow contain 2 types of components:

Synchronous:

  • Non-Blocking Transformations
  • The output of an synchronous component uses the same buffer as the input.
  • Number of records IN == Number of records OUT

◌row by row processing
◌ doesn't create new buffer

◆when you have pipline, you have buffer(by default 10 mb)

buffer and thread

  • if max buffer rows =1000;maxbuffer=10mb, 超过了其中一个都不会插入
  • buffer size min=64mb,max=100mb

execution tree

  • =execution plan is sql
  • when a new buffer created, until a new buffer was wxecuted.

how to check execution tree?

  • view-other windows log- event
  • ssis- logging- details -add

Asynchronous:

  • Semi-Blocking and Full-Blocking Transformations
  • The output of an asynchronous component uses the new buffer.
  • An asynchronous component can have more or less output records then input records
  • we need create a new buffer, input =/= outpout record

◆ in transformation, we might need wait whole record; or wait partial record

semi-blocking vs full-blocking

semi full
read a portion source and apply trans and send to destination 2 execution tree
may or may not require more buffer must create new buffer
N in =/= N out only execute transflormation when all record were read

Synchronous Vs Asynchronous:

  • Differences between Non-Blocking, Semi-Blocking, Full-Blocking transformations are listed below:
image.png

thread(?): when you creating a new buffer, it required a new engine thread

  • Non-Blocking, Semi-Blocking and Full-Blocking Transformations are listed below:


    image.png

Data Conversion:

  • Converts the data in an input column to a different data type and then copies it to a new output column.

It can be used to:

  • Change the data type

  • Set the column length of string data and the precision and scale on numeric data.

  • Specify a code page. ( configure language: eg: chinese; english)
    we need change everytime, because in hard drive

  • This transformation has one input, one output, and one error output.

image.png

Character Map:

  • Applies string functions, such as conversion from lowercase to uppercase, to character data.
  • Operates only on column with a string data type.
  • Character Map transformation can convert column data in place or add converted data in the new column.
  • This transformation has one input, one output, and one error output.

You configure the Character Map transformation in the following ways:

  • Specify the columns to convert.
  • Specify the operations to apply to each column.

Copy Column:

  • Copy Column transformation creates new columns by copying input columns and adding the new columns to the transformation output
  • It can create multiple copies of a column, or create copies of multiple columns in one operation.
  • This transformation has one input, one output. It does not support an error output.

Derived Column:

  • Creates new column values by applying expressions to transformation input columns.
  • Expression can have variables, functions, operators, and columns from the transformation input.
  • It can be used to:
    ●Concatenate data from different columns
    ●Extract characters from string data by using functions
    ●Apply mathematical functions to numeric data
    ●Create expressions that compare input columns and variables
    ●Extract parts of a datetime value

  • The result can be added as a new column or inserted into an existing column

  • This transformation has one input, one regular output, and one error output.

▶always use when we want to calculate

相关文章

网友评论

      本文标题:Data Flow Transformations

      本文链接:https://www.haomeiwen.com/subject/joqaectx.html