I've written an application with a SQL Server backend that collects and stores an extremely large number of records. I've calculated that, at peak, the average volume is somewhere in the neighborhood of 3-4 billion records per day (20 hours of operation).
My original solution (before I'd done the actual calculation of the data volume) was to have my application insert records into the same table that is queried by my clients. That crashed and burned fairly quickly, obviously, because it's impossible to query a table while that many records are being inserted into it.
My second solution was to use 2 databases, one for data received by the application and one for client-ready data.
My application would receive data, chunk it into batches of ~100k records and bulk-insert them into the staging table. After ~100k records the application would, on the fly, create another staging table with the same schema and begin inserting into that one instead. It would create a record in a jobs table with the name of the table that now holds 100k records, and a stored procedure on the SQL Server side would move the data from the staging table(s) into the client-ready production table and then drop the temporary table created by my application.
Both databases have the same set of 5 tables with the same schema, except that the staging database also has the jobs table. The staging database has no integrity constraints, keys, indexes, etc. on the table where the bulk of the records will reside (shown below; the table name is SignalValues_staging). The goal was to have my application slam the data into SQL Server as quickly as possible. The workflow of creating tables on the fly so they can easily be migrated works pretty well.
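For illustration, the rollover the application performs is roughly equivalent to the following T-SQL (a sketch only: the table-naming convention and the jobs-table column list here are my simplifications, not the exact production code):

-- Sketch: roll over to a new staging table and register the full one as a job.
DECLARE @fullTable NVARCHAR(128) = N'SignalValues_staging_00001';  -- table that just reached ~100k rows
DECLARE @nextTable NVARCHAR(128) = N'SignalValues_staging_00002';  -- table the application will fill next
DECLARE @ddl NVARCHAR(MAX);

-- Create the next staging table with the same schema (no keys, indexes or constraints).
SET @ddl = N'SELECT TOP 0 * INTO [Database_Staging].[dbo].[' + @nextTable + ']
             FROM [Database_Staging].[dbo].[SignalValues_staging];';
EXEC sp_executesql @ddl;

-- Register the full table so the stored procedure will move its rows to production.
INSERT INTO [Database_Staging].[dbo].[SignalValueJobs] (JobId, SignalValueStagingTableName, ProcessingComplete)
VALUES (NEWID(), @fullTable, 0);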
The following are the 5 relevant tables from my staging database, plus my jobs table:
[Staging tables: schema screenshot]
The stored procedure I have written handles moving the data from all of the staging tables and inserting it into production. Below is the part of my stored procedure that inserts into production from the staging tables:
-- SignalValues jobs table.
SELECT *
,ROW_NUMBER() OVER (ORDER BY JobId) AS 'RowIndex'
INTO #JobsToProcess
FROM
(
SELECT JobId
,ProcessingComplete
,SignalValueStagingTableName AS 'TableName'
,(DATEDIFF(SECOND, (SELECT last_user_update
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID(DB_NAME())
AND OBJECT_ID = OBJECT_ID(SignalValueStagingTableName))
,GETUTCDATE())) SecondsSinceLastUpdate
FROM SignalValueJobs
) cte
WHERE cte.ProcessingComplete = 1
OR cte.SecondsSinceLastUpdate >= 120
DECLARE @i INT = (SELECT COUNT(*) FROM #JobsToProcess)
DECLARE @jobParam UNIQUEIDENTIFIER
DECLARE @currentTable NVARCHAR(128)
DECLARE @processingParam BIT
DECLARE @sqlStatement NVARCHAR(2048)
DECLARE @paramDefinitions NVARCHAR(500) = N'@currentJob UNIQUEIDENTIFIER, @processingComplete BIT'
DECLARE @qualifiedTableName NVARCHAR(128)
WHILE @i > 0
BEGIN
SELECT @jobParam = JobId, @currentTable = TableName, @processingParam = ProcessingComplete
FROM #JobsToProcess
WHERE RowIndex = @i
SET @qualifiedTableName = '[Database_Staging].[dbo].['+@currentTable+']'
SET @sqlStatement = N'
--Signal values staging table.
SELECT svs.* INTO #sValues
FROM '+ @qualifiedTableName +' svs
INNER JOIN SignalMetaData smd
ON smd.SignalId = svs.SignalId
INSERT INTO SignalValues SELECT * FROM #sValues
SELECT DISTINCT SignalId INTO #uniqueIdentifiers FROM #sValues
DELETE c FROM '+ @qualifiedTableName +' c INNER JOIN #uniqueIdentifiers u ON c.SignalId = u.SignalId
DROP TABLE #sValues
DROP TABLE #uniqueIdentifiers
IF NOT EXISTS (SELECT TOP 1 1 FROM '+ @qualifiedTableName +') --table is empty
BEGIN
-- processing is completed, so drop the table and remove the entry
IF @processingComplete = 1
BEGIN
DELETE FROM SignalValueJobs WHERE JobId = @currentJob
IF '''+@currentTable+''' <> ''SignalValues_staging''
BEGIN
DROP TABLE '+ @qualifiedTableName +'
END
END
END
'
EXEC sp_executesql @sqlStatement, @paramDefinitions, @currentJob = @jobParam, @processingComplete = @processingParam;
SET @i = @i - 1
END
DROP TABLE #JobsToProcess
I use sp_executesql because the table names for the staging tables come as text from the records in the jobs table.
This stored procedure runs every 2 seconds using the trick I learned from this dba.stackexchange.com post.
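For context, that trick boils down to an Agent job scheduled once a minute whose step loops with a short WAITFOR DELAY. Roughly (my paraphrase of the idea, and usp_MoveStagingToProduction is a placeholder name for the stored procedure above):

-- Agent job step body, scheduled every minute; yields ~30 executions per minute.
DECLARE @iterations INT = 0;
WHILE @iterations < 30
BEGIN
    EXEC dbo.usp_MoveStagingToProduction;  -- placeholder name for the stored procedure shown above
    WAITFOR DELAY '00:00:02';
    SET @iterations = @iterations + 1;
END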
The problem I cannot for the life of me resolve is the speed at which the inserts into production are performed. My application creates temporary staging tables and fills them with records incredibly quickly. The insert into production cannot keep up with the number of tables, and eventually there's a backlog of tables into the thousands. The only way I've ever been able to keep up with the incoming data is to remove all keys, indexes, constraints, etc. on the production SignalValues table. The problem I then face is that the table ends up with so many records it becomes impossible to query.
I've tried partitioning the table using [Timestamp] as the partitioning column, to no avail. Any form of indexing at all slows the inserts so much that they can't keep up. In addition, I'd need to create thousands of partitions (one every minute? hour?) years in advance, and I couldn't figure out how to create them on the fly.
I also tried partitioning by adding a computed column to the table called TimestampMinute whose value is, on INSERT, DATEPART(MINUTE, GETUTCDATE()). Still too slow.
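For reference, the partitioning attempts were along these lines (a reconstructed sketch: the boundary values, object names, and the DATETIME2 type for [Timestamp] are assumptions):

-- Sketch: range partitioning on [Timestamp], one partition per hour, all on PRIMARY.
CREATE PARTITION FUNCTION pf_SignalValues_Timestamp (DATETIME2(7))
AS RANGE RIGHT FOR VALUES (
    '2016-01-01T00:00:00',
    '2016-01-01T01:00:00',
    '2016-01-01T02:00:00'   -- ...and so on, one boundary per hour, pre-created far in advance
);

CREATE PARTITION SCHEME ps_SignalValues_Timestamp
AS PARTITION pf_SignalValues_Timestamp ALL TO ([PRIMARY]);

-- SignalValues would then be created ON ps_SignalValues_Timestamp ([Timestamp]).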
I've tried making it a Memory-Optimized Table as per this Microsoft article. Maybe I don't understand how to do it, but the MOT made the inserts slower somehow.
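What I tried looked roughly like this (a sketch: the column list, types and bucket count are assumptions, and the database had a MEMORY_OPTIMIZED_DATA filegroup added beforehand):

-- Sketch of the memory-optimized variant of the table (SQL Server 2014 syntax).
CREATE TABLE dbo.SignalValues_InMemory
(
    SignalValueId UNIQUEIDENTIFIER NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 16777216),
    SignalId      UNIQUEIDENTIFIER NOT NULL,
    [Timestamp]   DATETIME2(7)     NOT NULL,
    [Value]       FLOAT            NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);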
I've checked the Execution Plan of the stored procedure and found that (I think?) the most intensive operation is
SELECT svs.* INTO #sValues
FROM '+ @qualifiedTableName +' svs
INNER JOIN SignalMetaData smd
ON smd.SignalId = svs.SignalId
To me this doesn't make sense: I've added wall-clock logging to the stored procedure, and it shows otherwise.
In terms of time-logging, that particular statement above executes in ~300ms on 100k records.
The statement
INSERT INTO SignalValues SELECT * FROM #sValues
executes in 2500-3000ms on 100k records. Deleting the affected records from the staging table, per:
DELETE c FROM '+ @qualifiedTableName +' c INNER JOIN #uniqueIdentifiers u ON c.SignalId = u.SignalId
takes another 300ms.
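The wall-clock logging itself is nothing fancy; it's essentially timestamp deltas around each statement, along these lines (a sketch, not the exact logging code):

-- Sketch: time a single statement inside the procedure and report it.
DECLARE @t0 DATETIME2(7) = SYSUTCDATETIME();

-- ...statement being timed, e.g. INSERT INTO SignalValues SELECT * FROM #sValues...

DECLARE @elapsedMs INT = DATEDIFF(MILLISECOND, @t0, SYSUTCDATETIME());
RAISERROR('Elapsed: %d ms', 0, 1, @elapsedMs) WITH NOWAIT;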
How can I make this faster? Can SQL Server handle billions of records per day?
If it's relevant, this is SQL Server 2014 Enterprise x64.
Hardware Configuration:
I forgot to include hardware in the first pass of this question. My bad.
I'll preface this with these statements: I know I am losing some performance because of my hardware configuration. I've tried many times, but because of budget, C-level, the alignment of the planets, etc., there's unfortunately nothing I can do to get a better setup. The server is running on a virtual machine and I can't even increase the memory because we simply don't have any more.
Here's my system information:
[System Info screenshot]
The storage is attached to the VM server via an iSCSI interface to a NAS box (this will degrade performance). The NAS box has 4 drives in a RAID 10 configuration. They're 4 TB WD WD4000FYYZ spinning disks with a 6 Gb/s SATA interface. The server has only one datastore configured, so tempdb and my database are on the same datastore.
Max DOP is zero. Should I change this to a constant value, or just let SQL Server handle it? I read up on RCSI: am I correct in assuming that the only benefit from RCSI comes with row updates? There will never be updates to any of these particular records; they'll be INSERTed and SELECTed. Will RCSI still benefit me?
My tempdb is 8 MB. Based on the answer below from jyao, I changed #sValues to a regular table to avoid tempdb altogether. Performance was about the same, though. I will try increasing the size and growth of tempdb, but given that the size of #sValues will more or less always be the same, I don't anticipate much gain.
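Concretely, that change replaced the SELECT ... INTO #sValues inside the dynamic SQL with a permanent work table, roughly like this (a sketch; dbo.SignalValues_Work is a name I'm using here for illustration, not necessarily the real one):

-- Sketch: permanent work table in the user database instead of a tempdb temp table.
TRUNCATE TABLE dbo.SignalValues_Work;

INSERT INTO dbo.SignalValues_Work
SELECT svs.*
FROM [Database_Staging].[dbo].[SignalValues_staging] svs  -- @qualifiedTableName in the real dynamic SQL
INNER JOIN SignalMetaData smd
    ON smd.SignalId = svs.SignalId;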
I have taken an execution plan, which I've attached below. This execution plan is from one iteration of a staging table -- 100k records. The execution of the query was fairly quick, around 2 seconds, but keep in mind that this is without indexes on the SignalValues table and with the SignalValues table, the target of the INSERT, having no records in it.