Friday, March 30, 2012

Question with Flattening / Denormalizing a Hiearchy Dynamically

Hello,

I've been trying to figure out a way to handle flattening a Reports To hierarchy table dyamically. The end result I need is this. Example assumes a two person / two level hiearchy. Emplid of 1 is the CEO, Emplid of 2 is the direct report. I am unable to get this result.

Emplid Reports To Level 1 Emplid Reports To Level 2 Id

1 1

2 1 2

Much kudos to Adam Machanic and Itzik Ben-Gan for their code samples and articles (links below). They have gotten me quite far but I am not there yet.

http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1107414,00.html

http://www.sqlmag.com/Article/ArticleID/94268/sql_server_94268.html

After reading these articles I have been able to find out the materialized path and the level in which an employee resides in the Reports To hierarchy. Here is what the data looks like after the recursive CTE (I'll put the code down at the bottom of the blog)

Listing 1:

Emplid Level thePath (Materialized reporting path)

1 1 .1

2 2 .1.2

I have then dynamically invoked the PIVOT command via a stored procedure thanks to help from Itzik again (http://www.sqlmag.com/Article/ArticleID/94268/sql_server_94268.html). This is allowing me to get a result like this below, but it only has the column for the level in the hierarchy in which the employee resides popululated and not the columns for the levels about the employee (Emplid 2 should have a 1 in the second column) See result below.

Listing 2:

Emplid Reports To Level 1 Emplid Reports To Level 2 Id

1 1

2 2

Is there a way to populate the second column for Emplid 2 all in one shot? Right now I am thinking I'll have to land the result of pivot function call to a table and then parse out the thePath column and update the unpopulated columns that way. It seems like this is too complicated. Any suggestions?

Thanks in advance!

Sean

Code is below. Below has a view that contains my recursive CTE and a select from the view (Like Listing 1 Results). The function that dynamically executes the pivot function and then the function call (Like Listing 2 Results).

--Recursive CTE view definition

Use AdventureWorks
;

create view HumanResources.V_EMP_HIER
as

WITH EmployeeCTE
AS
(
SELECT
EmployeeID,
1 AS Level,
CAST('.' + CAST(EmployeeID AS VARCHAR(10)) + '.'
AS VARCHAR(MAX)) AS thePath
FROM HumanResources.Employee
WHERE ManagerID IS NULL

UNION ALL

SELECT
E.EmployeeID,
x.Level + 1 AS Level,
x.thePath + '.' + CONVERT(VARCHAR(MAX), E.EmployeeID) AS thePath
FROM HumanResources.Employee E
JOIN EmployeeCTE x ON x.EmployeeID = E.ManagerID
)
SELECT
EmployeeID,
Level,
thePath
FROM EmployeeCTE

;
GO

--Snipet to list the hierarchy
select
EmployeeID,
Level,
thePath,
REPLICATE(' | ', Level) + CONVERT(varchar,EmployeeID) AS EmpSort

FROM
HumanResources.V_EMP_HIER
order by thePath
;

Dynamic Pivot Function. Thanks Itzik!

Use AdventureWorks
;

IF EXISTS
(SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[usp_pivot]')
AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[usp_pivot]
;
GO

CREATE PROC [dbo].[usp_pivot]
@.schema_name AS sysname = N'dbo', -- schema of table/view
@.object_name AS sysname = NULL, -- name of table/view
@.on_rows AS sysname = NULL, -- group by column
@.on_cols AS sysname = NULL, -- rotation column
@.agg_func AS NVARCHAR(12) = N'MAX', -- aggregate function
@.agg_col AS sysname = NULL -- aggregate column
AS

DECLARE
@.object AS NVARCHAR(600),
@.sql AS NVARCHAR(MAX),
@.cols AS NVARCHAR(MAX),
@.newline AS NVARCHAR(2),
@.msg AS NVARCHAR(500);

SET @.newline = NCHAR(13) + NCHAR(10);
SET @.object = QUOTENAME(@.schema_name) + N'.' + QUOTENAME(@.object_name);

-- Check for missing input
IF @.schema_name IS NULL
OR @.object_name IS NULL
OR @.on_rows IS NULL
OR @.on_cols IS NULL
OR @.agg_func IS NULL
OR @.agg_col IS NULL
BEGIN
SET @.msg = N'Missing input parameters: '
+ CASE WHEN @.schema_name IS NULL THEN N'@.schema_name;' ELSE N'' END
+ CASE WHEN @.object_name IS NULL THEN N'@.object_name;' ELSE N'' END
+ CASE WHEN @.on_rows IS NULL THEN N'@.on_rows;' ELSE N'' END
+ CASE WHEN @.on_cols IS NULL THEN N'@.on_cols;' ELSE N'' END
+ CASE WHEN @.agg_func IS NULL THEN N'@.agg_func;' ELSE N'' END
+ CASE WHEN @.agg_col IS NULL THEN N'@.agg_col;' ELSE N'' END
RAISERROR(@.msg, 16, 1);
RETURN;
END

-- Allow only existing table or view name as input object
IF COALESCE(OBJECT_ID(@.object, N'U'),
OBJECT_ID(@.object, N'V')) IS NULL
BEGIN
SET @.msg = N'%s is not an existing table or view in the database.';
RAISERROR(@.msg, 16, 1, @.object);
RETURN;
END

-- Verify that column names specified in @.on_rows, @.on_cols, @.agg_col exist
IF COLUMNPROPERTY(OBJECT_ID(@.object), @.on_rows, 'ColumnId') IS NULL
OR COLUMNPROPERTY(OBJECT_ID(@.object), @.on_cols, 'ColumnId') IS NULL
OR COLUMNPROPERTY(OBJECT_ID(@.object), @.agg_col, 'ColumnId') IS NULL
BEGIN
SET @.msg = N'%s, %s and %s must'
+ N' be existing column names in %s.';
RAISERROR(@.msg, 16, 1, @.on_rows, @.on_cols, @.agg_col, @.object);
RETURN;
END

-- Verify that @.agg_func is in a known list of functions
-- Add to list as needed and adjust @.agg_func size accordingly
IF @.agg_func NOT IN
(N'AVG', N'COUNT', N'COUNT_BIG', N'SUM', N'MIN', N'MAX',
N'STDEV', N'STDEVP', N'VAR', N'VARP')
BEGIN
SET @.msg = N'%s is an unsupported aggregate function.';
RAISERROR(@.msg, 16, 1, @.agg_func);
RETURN;
END

-- Construct column list
SET @.sql =
N'SET @.result = ' + @.newline +
N' STUFF(' + @.newline +
N' (SELECT N'','' + '
+ N'QUOTENAME(pivot_col) AS [text()]' + @.newline +
N' FROM (SELECT DISTINCT('
+ QUOTENAME(@.on_cols) + N') AS pivot_col' + @.newline +
N' FROM ' + @.object + N') AS DistinctCols' + @.newline +
N' ORDER BY pivot_col' + @.newline +
N' FOR XML PATH('''')),' + @.newline +
N' 1, 1, N'''');'

EXEC sp_executesql
@.stmt = @.sql,
@.params = N'@.result AS NVARCHAR(MAX) OUTPUT',
@.result = @.cols OUTPUT;

-- Check @.cols for possible SQL injection attempt
IF UPPER(@.cols) LIKE UPPER(N'%0x%')
OR UPPER(@.cols) LIKE UPPER(N'%;%')
OR UPPER(@.cols) LIKE UPPER(N'%''%')
OR UPPER(@.cols) LIKE UPPER(N'%--%')
OR UPPER(@.cols) LIKE UPPER(N'%/*%*/%')
OR UPPER(@.cols) LIKE UPPER(N'%EXEC%')
OR UPPER(@.cols) LIKE UPPER(N'%xp[_]%')
OR UPPER(@.cols) LIKE UPPER(N'%sp[_]%')
OR UPPER(@.cols) LIKE UPPER(N'%SELECT%')
OR UPPER(@.cols) LIKE UPPER(N'%INSERT%')
OR UPPER(@.cols) LIKE UPPER(N'%UPDATE%')
OR UPPER(@.cols) LIKE UPPER(N'%DELETE%')
OR UPPER(@.cols) LIKE UPPER(N'%TRUNCATE%')
OR UPPER(@.cols) LIKE UPPER(N'%CREATE%')
OR UPPER(@.cols) LIKE UPPER(N'%ALTER%')
OR UPPER(@.cols) LIKE UPPER(N'%DROP%')
-- Look for other possible strings used in SQL injection here
BEGIN
SET @.msg = N'Possible SQL injection attempt.';
RAISERROR(@.msg, 16, 1);
RETURN;
END

-- Create the PIVOT query
SET @.sql =
N'SELECT *' + @.newline +
N'FROM' + @.newline +
N' ( SELECT ' + @.newline +
N' ' + QUOTENAME(@.on_rows) + N',' + @.newline +
N' ' + QUOTENAME(@.on_cols) + N' AS pivot_col,' + @.newline +
N' ' + QUOTENAME(@.agg_col) + N' AS agg_col' + @.newline +
N' FROM ' + @.object + @.newline +
N' ) AS PivotInput' + @.newline +
N' PIVOT' + @.newline +
N' ( ' + @.agg_func + N'(agg_col)' + @.newline +
N' FOR pivot_col' + @.newline +
N' IN(' + @.cols + N')' + @.newline +
N' ) AS PivotOutput;';

EXEC sp_executesql @.sql;
GO

Call of the function that flattens the hierarchy, but doesn't populate of of the reporting level columns.

Use AdventureWorks
;

EXEC dbo.usp_pivot
@.schema_name = N'HumanResources',
@.object_name = N'V_EMP_HIER',
@.on_rows = N'EmployeeID',
@.on_cols = N'Level',
@.agg_func = N'MAX',
@.agg_col = N'EmployeeID';
;

Sean, we talked at the Microsoft BI Conference last week in Seattle and you were looking for some help on this posting. I’ve followed up with an internal alias that had some T-SQL gurus on it and hopefully one of the following solutions may help you get going in the right direction. Alternatively, you could consider using Microsoft SQL Server Analysis Services (AS) for this kind of reporting which may make it easier and faster. There is the notion of Parent\Child dimensions built into AS.

Good Luck!

-- Scott

-

<Header clipped>

Perhaps this would help…

set nocount on;

use northwind;

go

-- create helper function fn_nums

-- returning an auxiliary table of numbers

if object_id('dbo.fn_nums', 'if') is not null

drop function dbo.fn_nums;

go

create function dbo.fn_nums(@.max as int)

returns table as return

with

c0 as(select 0 as const union all select 0),

c1 as(select 0 as const from c0 as a, c0 as b),

c2 as(select 0 as const from c1 as a, c1 as b),

c3 as(select 0 as const from c2 as a, c2 as b),

c4 as(select 0 as const from c3 as a, c3 as b),

c5 as(select 0 as const from c4 as a, c4 as b),

c6 as(select 0 as const from c5 as a, c5 as b)

select top(@.max) row_number()

over(order by const) as n

from c6;

go

-- create helper function fn_split

-- splitting a string

if object_id('dbo.fn_split') is not null

drop function dbo.fn_split;

go

create function dbo.fn_split

(@.string varchar(max), @.separator char(1) = ',') returns table

as

return

select

n - len(replace(left(array, n), @.separator, '')) + 1 as pos,

substring(array, n,

charindex(@.separator, array + @.separator, n) - n) as element

from (select @.string as array) as d

join dbo.fn_nums(900)

on n <= len(array)

and substring(@.separator + array, n, 1) = @.separator;

go

-- Solution query

with empscte as

(

select employeeid, reportsto,

'.' + cast(employeeid as varchar(max)) + '.' as thepath

from employees

where reportsto is null

union all

select sub.employeeid, sub.reportsto,

mgr.thepath + cast(sub.employeeid as varchar(max)) + '.'

from empscte as mgr

join employees as sub

on sub.reportsto = mgr.employeeid

)

select *

from (select employeeid, pos, element as managerid

from empscte as e

cross apply dbo.fn_split(stuff(e.thepath, 1, 1, ''), '.') as s) as d

pivot(max(managerid) for pos in([1],[2],[3],[4],[5]/*add more levels here*/)) as p;

-

From: Ty Balascio
Sent: Monday, May 14, 2007 11:09 AM
Subject: RE: T-SQL Question from customer

I was part of a Usenet thread on this many years ago. Here was the preferred solution.

<paste>

-- If a node's parent is 0, then it's the root node

-- Each node has a unique id and a name

CREATE TABLE Tree

(

parent INT NOT NULL DEFAULT 0 CHECK (parent >= 0),

node INT NOT NULL CHECK (node > 0) PRIMARY KEY,

name VARCHAR(20) NOT NULL

)

GO

-- UDF to return all descendants of a given node

-- Node is identified by its id number

-- Distance is the number of links between two nodes

CREATE FUNCTION Descendants(@.root_node INT)

RETURNS @.nodes TABLE

(node INT NOT NULL PRIMARY KEY CHECK (node > 0),

name VARCHAR(20) NOT NULL,

distance INT NOT NULL CHECK (distance >= 0))

AS

BEGIN

IF NOT EXISTS (SELECT * FROM Tree WHERE node = @.root_node)

RETURN

DECLARE @.distance INT,

@.next_distance INT

SELECT @.distance = 0,

@.next_distance = 1

INSERT INTO @.nodes (node, name, distance)

SELECT node, name, @.distance

FROM Tree

WHERE node = @.root_node

WHILE EXISTS (SELECT * FROM @.nodes WHERE distance = @.distance)

BEGIN

INSERT INTO @.nodes (node, name, distance)

SELECT T.node, T.name, @.next_distance

FROM @.nodes AS N

INNER JOIN

Tree AS T

ON N.distance = @.distance AND

N.node = T.parent

SELECT @.distance = @.next_distance,

@.next_distance = @.next_distance + 1

END

RETURN

END

GO

-- Sample tree

INSERT INTO Tree (parent, node, name)

SELECT 0, 1, 'A' -- root node

UNION ALL

SELECT 1, 2, 'B'

UNION ALL

SELECT 1, 3, 'C'

UNION ALL

SELECT 2, 4, 'D'

UNION ALL

SELECT 2, 5, 'E'

UNION ALL

SELECT 3, 6, 'F'

UNION ALL

SELECT 5, 7, 'G'

UNION ALL

SELECT 5, 8, 'H'

UNION ALL

SELECT 5, 9, 'I'

-- All nodes

SELECT *

FROM Descendants(1)

ORDER BY distance, node

-- All nodes from node id 2 (named B)

SELECT *

FROM Descendants(2)

ORDER BY distance, node

</paste>

|||

Below are simpler and more efficient solutions. You don't really need the UDFs or PIVOT clause and other objects. You can do it with a single CTE. The level can be encoded in the hierarchy path so it is just a matter of decoding it the easiest way.

Code Snippet

-- Using Northwind Employees table:

with EmpTree
as
(
select e.EmployeeID, cast(cast(e.EmployeeID as binary(4)) as varbinary(max)) as EmpHier
from Northwind.dbo.Employees as e
where e.ReportsTo is null
union all
select c.EmployeeID, cast(p.EmpHier + cast(c.EmployeeID as binary(4)) as varbinary(max))
from EmpTree as p
join Northwind.dbo.Employees as c
on c.ReportsTo = p.EmployeeID
)
select EmployeeID
, nullif(cast(substring(EmpHier, 1, 4) as int), 0) as lvl_1
, nullif(cast(substring(EmpHier, 5, 4) as int), 0) as lvl_2
, nullif(cast(substring(EmpHier, 9, 4) as int), 0) as lvl_3
, nullif(cast(substring(EmpHier, 13, 4) as int), 0) as lvl_4
, nullif(cast(substring(EmpHier, 17, 4) as int), 0) as lvl_5
, nullif(cast(substring(EmpHier, 21, 4) as int), 0) as lvl_6
, nullif(cast(substring(EmpHier, 25, 4) as int), 0) as lvl_7
, nullif(cast(substring(EmpHier, 29, 4) as int), 0) as lvl_8
, nullif(cast(substring(EmpHier, 33, 4) as int), 0) as lvl_9
, nullif(cast(substring(EmpHier, 37, 4) as int), 0) as lvl_10
from EmpTree;


-- Using AdventureWorks Employee table:

with EmpTree
as
(
select e.EmployeeID, cast(cast(e.EmployeeID as binary(4)) as varbinary(max)) as EmpHier
from AdventureWorks.HumanResources.Employee as e
where e.ManagerID is null
union all
select c.EmployeeID, cast(p.EmpHier + cast(c.EmployeeID as binary(4)) as varbinary(max))
from EmpTree as p
join AdventureWorks.HumanResources.Employee as c
on c.ManagerID = p.EmployeeID
)
select EmployeeID
, nullif(cast(substring(EmpHier, 1, 4) as int), 0) as lvl_1
, nullif(cast(substring(EmpHier, 5, 4) as int), 0) as lvl_2
, nullif(cast(substring(EmpHier, 9, 4) as int), 0) as lvl_3
, nullif(cast(substring(EmpHier, 13, 4) as int), 0) as lvl_4
, nullif(cast(substring(EmpHier, 17, 4) as int), 0) as lvl_5
, nullif(cast(substring(EmpHier, 21, 4) as int), 0) as lvl_6
, nullif(cast(substring(EmpHier, 25, 4) as int), 0) as lvl_7
, nullif(cast(substring(EmpHier, 29, 4) as int), 0) as lvl_8
, nullif(cast(substring(EmpHier, 33, 4) as int), 0) as lvl_9
, nullif(cast(substring(EmpHier, 37, 4) as int), 0) as lvl_10
from EmpTree;

|||

Scott, Ty and Umachandar.

Thank you all for your options. The solutions give us what we need and a few alternatives too.

I really appreciate you time and efforts.


Sean

|||

Two other options that were forwarded to me:

--

Here's another option:

with empscte as
(
select employeeid, managerid, 1 as lvl,
cast(employeeid as varbinary(max)) as binpath
from humanresources.employee
where managerid is null

union all

select sub.employeeid, sub.managerid, mgr.lvl + 1,
mgr.binpath + cast(sub.employeeid as binary(4))
from empscte as mgr
join humanresources.employee as sub
on sub.managerid = mgr.employeeid
)
select replicate(' | ', lvl-1) + cast(employeeid as varchar(10)) as emp,
nullif(0+substring(binpath, 1, 4), 0) as l1,
nullif(0+substring(binpath, 5, 4), 0) as l2,
nullif(0+substring(binpath, 9, 4), 0) as l3,
nullif(0+substring(binpath, 13, 4), 0) as l4,
nullif(0+substring(binpath, 17, 4), 0) as l5,
nullif(0+substring(binpath, 21, 4), 0) as l6,
/* add more levels here if needed */
lvl
from empscte
order by binpath;

Itzik

Another response – this one from Adam.

Maybe I'm missing something--how dynamic do you want this to be? Can you use placeholder columns for levels that may not be there? If so, wouldn't something like the following work fine (you can run it in AW):

with emps as

(

select

e.employeeid,

null as ReportsToLevel1,

null as ReportsToLevel2,

null as ReportsToLevel3,

null as ReportsToLevel4,

null as ReportsToLevel5,

null as ReportsToLevel6,

null as ReportsToLevel7,

null as ReportsToLevel8,

1 as theLevel

from humanresources.employee e

where managerid is null

union all

select

e.employeeid,

case theLevel when 1 then e.managerid else x.ReportsToLevel1 end,

case theLevel when 2 then e.managerid else x.ReportsToLevel2 end,

case theLevel when 3 then e.managerid else x.ReportsToLevel3 end,

case theLevel when 4 then e.managerid else x.ReportsToLevel4 end,

case theLevel when 5 then e.managerid else x.ReportsToLevel5 end,

case theLevel when 6 then e.managerid else x.ReportsToLevel6 end,

case theLevel when 7 then e.managerid else x.ReportsToLevel7 end,

case theLevel when 8 then e.managerid else x.ReportsToLevel8 end,

theLevel + 1 as theLevel

from humanresources.employee e

join emps x on x.employeeid = e.managerid

)

select *

from emps

No comments:

Post a Comment