Comments on: Data Deduplication in Relational Databases

By: surmenok

surmenok — Wed, 27 May 2015 04:23:00 +0000

Performance can be weird, agreed. CTEs are often used to push SQL Server toward a better query plan. But this is a very simple case; usually a query has several CTEs chained together.
Thanks for the max(PaymentID) idea.
Yes, I understand that IDs must increase over time; that's why I made it an IDENTITY field in the example. In some cases, when an application inserts records in the wrong order, you have to use something else, perhaps some kind of timestamp.
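
To illustrate the timestamp fallback, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module rather than SQL Server, only so that it is self-contained. The CreatedAt and Amount columns and the sample rows are assumptions made up for the illustration, and the query assumes timestamps are unique within a transaction.

```python
import sqlite3

# In-memory database; the schema is a hypothetical stand-in for the Payment table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payment (
    PaymentID     INTEGER PRIMARY KEY,
    TransactionID INTEGER NOT NULL,
    CreatedAt     TEXT    NOT NULL,  -- assumed timestamp column
    Amount        REAL    NOT NULL   -- assumed payload column
);
INSERT INTO Payment VALUES
    (1, 100, '2015-05-01 10:00:00', 10.0),
    (2, 100, '2015-05-01 09:00:00', 12.0),  -- inserted later, but an older record
    (3, 200, '2015-05-02 08:00:00',  5.0);
""")

# Latest record per transaction, decided by timestamp instead of by ID.
latest_by_timestamp = conn.execute("""
SELECT p.TransactionID, p.PaymentID
FROM Payment AS p
JOIN (
    SELECT TransactionID, MAX(CreatedAt) AS last_created
    FROM Payment
    GROUP BY TransactionID
) AS m
  ON m.TransactionID = p.TransactionID
 AND m.last_created  = p.CreatedAt
ORDER BY p.TransactionID
""").fetchall()

# The same question answered by the largest ID, for comparison.
latest_by_id = conn.execute("""
SELECT TransactionID, MAX(PaymentID)
FROM Payment
GROUP BY TransactionID
ORDER BY TransactionID
""").fetchall()

print(latest_by_timestamp)  # [(100, 1), (200, 3)]
print(latest_by_id)         # [(100, 2), (200, 3)]
```

For transaction 100 the two approaches disagree: the largest ID picks payment 2, while the timestamp picks payment 1, which is the case where insertion order cannot be trusted.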

By: Evgeny Hramov

Evgeny Hramov — Wed, 27 May 2015 04:02:00 +0000

CTEs are strange :) In some cases they are fast, in others very slow. And if you want to join a CTE to itself twice, it may be a bad idea for performance. You can use the simpler

SELECT TransactionID, MAX(PaymentID) AS m_id
FROM Payment
GROUP BY TransactionID

to select the *last* record for each transaction. And of course you know that IDs don't have to go strictly 1, 2, 3, 4…, and that a bigger ID doesn't necessarily mean a later record.
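
As a rough sketch of how the GROUP BY query above could replace the CTE, its result can be joined back to Payment to fetch the full rows. The example runs on SQLite through Python's sqlite3 module so that it is self-contained; the Amount column and the sample data (with deliberate gaps in the IDs) are assumptions for illustration only.

```python
import sqlite3

# In-memory database with a hypothetical, minimal Payment table.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Payment (
    PaymentID     INTEGER PRIMARY KEY,
    TransactionID INTEGER NOT NULL,
    Amount        REAL    NOT NULL  -- assumed payload column
);
INSERT INTO Payment VALUES
    (1, 100, 10.0),
    (2, 100, 12.0),
    (5, 200,  5.0),  -- IDs need not be consecutive
    (9, 200,  7.5);
""")

# Join the per-transaction maximum ID back to the table to get whole rows.
rows = db.execute("""
SELECT p.PaymentID, p.TransactionID, p.Amount
FROM Payment AS p
JOIN (
    SELECT TransactionID, MAX(PaymentID) AS m_id
    FROM Payment
    GROUP BY TransactionID
) AS last
  ON last.m_id = p.PaymentID
ORDER BY p.TransactionID
""").fetchall()

print(rows)  # [(2, 100, 12.0), (9, 200, 7.5)]
```

This only returns the latest records if a larger PaymentID really does mean a later record, which is exactly the caveat raised above.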