Friday, March 9, 2012

Question on index creation for large table

Hi all--Given the current setup and structure below, I would like to create a clustered index for a table that gets updated once per week, with values I would like to keep unique:

CREATE TABLE [dbo].[DBA_AUDIT_SESSION](
    [INSTANCE_NAME] [varchar](16) NULL,
    [HOST_NAME] [varchar](64) NULL,
    [OS_USERNAME] [varchar](255) NULL,
    [USERNAME] [varchar](30) NULL,
    [USERHOST] [varchar](128) NULL,
    [TERMINAL] [varchar](255) NULL,
    [TIMESTAMP] [datetime] NOT NULL,
    [ACTION_NAME] [varchar](28) NULL,
    [LOGOFF_TIME] [datetime] NULL,
    [LOGOFF_LREAD] [float] NULL,
    [LOGOFF_PREAD] [float] NULL,
    [LOGOFF_LWRITE] [float] NULL,
    [LOGOFF_DLOCK] [varchar](40) NULL,
    [SESSIONID] [float] NOT NULL,
    [RETURNCODE] [float] NOT NULL,
    [CLIENT_ID] [varchar](64) NULL,
    [SESSION_CPU] [float] NULL,
    [Import_Date] [datetime] NULL
)

This is for an existing table within a SQL Server 2005 SP1 database.

I was thinking about adding an IDENTITY column and making it part of a composite primary key with TIMESTAMP, but the following posting seems to suggest that's not good for performance and not the best design:

http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=9&messageid=152846

Average size for this table is 1.75-2.5 million rows, and the TIMESTAMP field doesn't carry completely unique values. The data is sourced from an Oracle 9i and/or 10gR2 database, populated by an SSIS package. Suggestions on creating a good clustered index for this table? Would I gain value by using the NEWSEQUENTIALID function in SQL Server 2005 on a uniqueidentifier datatype column?
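For reference, here's roughly what the two options I'm weighing would look like (the column, constraint, and index names below are just placeholders, not anything that exists yet):

-- Option 1: IDENTITY column plus composite clustered primary key.
ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
ADD [RowID] [int] IDENTITY(1,1) NOT NULL;

ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
ADD CONSTRAINT [PK_DBA_AUDIT_SESSION]
PRIMARY KEY CLUSTERED ([TIMESTAMP], [RowID]);

-- Option 2: uniqueidentifier column populated via NEWSEQUENTIALID(),
-- which can only be used as a column DEFAULT.
ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
ADD [RowGuid] [uniqueidentifier] NOT NULL
CONSTRAINT [DF_DBA_AUDIT_SESSION_RowGuid] DEFAULT NEWSEQUENTIALID();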

Thanks,

Jonathan

The primary question is: why do you think this table needs a clustered index?

If you want the rows stored in load sequence, then clustering on an IDENTITY field is enough.
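Something along these lines (the column and index names here are just illustrative):

-- Add a surrogate identity column to the existing table,
-- then cluster on it so rows are stored in load order.
ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
ADD [AuditID] [int] IDENTITY(1,1) NOT NULL;

CREATE CLUSTERED INDEX [CIX_DBA_AUDIT_SESSION_AuditID]
ON [dbo].[DBA_AUDIT_SESSION] ([AuditID]);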

Typically, audit tables have a minimal need for indexing, and load sequence is rarely the primary access pattern.

|||

Hi Arnie--I'm renovating the entire repository to factor in a couple of changes:

1. The audience for this data (manager, security & network administrator) commented on how slowly queries returned data once three to four months of data had accumulated in this repository.

2. I also have to store this on a VMware-based virtual server with only 1 GB of RAM. (A request for additional resources was turned down a few months ago.) Given that, I'm looking to come up with the best possible indexing strategy, with a need to store at least 3-6 months of data at a time, which leaves up to 11-12 million rows to query. That's an awful lot of data to sort through, so I'm also trying to create a long-term indexing strategy, without reinventing the wheel, for a server with fairly tight resource constraints.

Given that there's a large amount of data that doesn't change more than once per month, a clustered index seems the best candidate for this table and for another, similar table called 'dba_audit_trail', which is based on an Oracle-derived view. Thoughts?

- Jonathan

|||

I'd cluster index the [Timestamp] or [Import_Date] field (whichever one captures the datetime of the insert), and NOT bother with an IDENTITY field.

(Duplicate [Timestamp] values are not an issue for the clustered index.)

If all queries filter on [Timestamp], I'd carefully examine the other filter criteria to determine whether any of the remaining fields need indexing.
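For example (index names are illustrative; the second index assumes queries also filter on [USERNAME], which may or may not match your workload):

-- Duplicate key values are fine; SQL Server adds a hidden
-- uniquifier to non-unique clustered index keys.
CREATE CLUSTERED INDEX [CIX_DBA_AUDIT_SESSION_Timestamp]
ON [dbo].[DBA_AUDIT_SESSION] ([TIMESTAMP]);

-- Hypothetical supporting index for user-based filters:
CREATE NONCLUSTERED INDEX [IX_DBA_AUDIT_SESSION_Username]
ON [dbo].[DBA_AUDIT_SESSION] ([USERNAME]);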

|||

OK--Thanks for the advice.
