Hi all--Given a table with the current setup and structure below, I would like to create a clustered index for a table that gets updated once per week, with values I would like to keep unique:
CREATE TABLE [dbo].[DBA_AUDIT_SESSION](
    [INSTANCE_NAME] [varchar](16) NULL,
    [HOST_NAME] [varchar](64) NULL,
    [OS_USERNAME] [varchar](255) NULL,
    [USERNAME] [varchar](30) NULL,
    [USERHOST] [varchar](128) NULL,
    [TERMINAL] [varchar](255) NULL,
    [TIMESTAMP] [datetime] NOT NULL,
    [ACTION_NAME] [varchar](28) NULL,
    [LOGOFF_TIME] [datetime] NULL,
    [LOGOFF_LREAD] [float] NULL,
    [LOGOFF_PREAD] [float] NULL,
    [LOGOFF_LWRITE] [float] NULL,
    [LOGOFF_DLOCK] [varchar](40) NULL,
    [SESSIONID] [float] NOT NULL,
    [RETURNCODE] [float] NOT NULL,
    [CLIENT_ID] [varchar](64) NULL,
    [SESSION_CPU] [float] NULL,
    [Import_Date] [datetime] NULL
)
This is for an existing table within a SQL Server 2005 SP1 database.
I was thinking about creating an IDENTITY column and creating that as a composite primary key with TIMESTAMP, but the following posting seems to suggest that's not a good idea for performance and not the best design:
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=9&messageid=152846
Average size for this table is 1.75-2.5 million rows, and the TIMESTAMP field doesn't carry completely unique values. The data is sourced from an Oracle 9i and/or 10gR2 database and populated by an SSIS package. Any suggestions on creating a good clustered index for this table? Would I gain anything by using the NEWSEQUENTIALID function in SQL Server 2005 on a uniqueidentifier column?
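(For reference, NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, so the option I'm weighing would look roughly like the sketch below. The column and constraint names are made up for illustration; note a uniqueidentifier key is 16 bytes versus 4-8 for an IDENTITY.)

```sql
-- Hypothetical sketch: a sequential-GUID column for the existing table.
-- NEWSEQUENTIALID() is only valid as a column DEFAULT, not as a function call in queries.
ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
    ADD [RowGuid] UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT [DF_DBA_AUDIT_SESSION_RowGuid] DEFAULT NEWSEQUENTIALID();
```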
Thanks,
Jonathan
The primary question is why you think this table needs a clustered index.
If you want to preserve a 'sequential' load order, then clustering on an IDENTITY field is enough.
Typically, audit tables have a minimal need for indexing, and load sequence is rarely the primary access pattern.
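(Clustering on an IDENTITY field might look something like this sketch; the column and index names are made up for illustration, and on an existing 2-million-row table the ALTER and index build would take some time and log space.)

```sql
-- Hypothetical sketch: add an IDENTITY column and cluster on it,
-- preserving insert order as the physical order of the table.
ALTER TABLE [dbo].[DBA_AUDIT_SESSION]
    ADD [AuditID] BIGINT IDENTITY(1, 1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX [CIX_DBA_AUDIT_SESSION_AuditID]
    ON [dbo].[DBA_AUDIT_SESSION] ([AuditID]);
```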
|||Hi Arnie--I'm renovating the entire repository to factor in a couple of changes:
1. The audience for this data (a manager plus the security and network administrators) commented on how slowly queries returned data once three to four months' data had accumulated in this repository.
2. I also have to store this on a VMware-based virtual server with only 1 GB of RAM. (A request for additional resources was turned down a few months ago.) Given that, I'm looking to come up with the best possible indexing strategy for storing at least 3-6 months' data at a time, which leaves up to 11-12 million rows to query. That's an awful lot of data to sort through, so I'm also trying to build a long-term indexing strategy, without reinventing the wheel, for a server with fairly tight resource constraints.
Given there's a large amount of data that doesn't change more than once per month, a clustered index seems the best candidate for this table and another, similar table called 'dba_audit_trail', based on an Oracle-derived view. Thoughts?
- Jonathan
|||I'd put the clustered index on the [TIMESTAMP] or [Import_Date] field (whichever one captures the datetime of the insert) and NOT bother with an IDENTITY field.
(Duplicate [TIMESTAMP] values are not an issue for the clustered index.)
If all queries have a [TIMESTAMP] criterion, I'd carefully examine the other criteria filters to determine how to index the remaining fields.
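(A sketch of that approach, assuming [TIMESTAMP] reflects insert order; the index names are made up. A non-unique clustered index is fine here because SQL Server silently adds a 4-byte uniquifier to rows with duplicate key values.)

```sql
-- Cluster on the insert datetime; duplicates are allowed.
CREATE CLUSTERED INDEX [CIX_DBA_AUDIT_SESSION_Timestamp]
    ON [dbo].[DBA_AUDIT_SESSION] ([TIMESTAMP]);

-- Hypothetical example of a supporting nonclustered index,
-- if queries also commonly filter on the user, say:
CREATE NONCLUSTERED INDEX [IX_DBA_AUDIT_SESSION_Username]
    ON [dbo].[DBA_AUDIT_SESSION] ([USERNAME]);
```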
|||OK--Thanks for the advice.