I’ve heard it said that SQL is a set-based language, and that thinking in sets is the way to make proper use of it. So I thought I was on solid ground when I decided to represent a bag of sets as a relation:
CREATE TABLE bag_of_sets ( set_id INTEGER NOT NULL, member CHAR(1) NOT NULL, CONSTRAINT enforce_set PRIMARY KEY (set_id, member) );
As far as I know that’s a good, fully-normalized representation. As intersection is such a fundamental operation on sets, I expected to find a natural way to express it in SQL. But this is the best I can manage:
-- Intersection SELECT member FROM bag_of_sets GROUP BY member HAVING COUNT(*) = (SELECT COUNT(DISTINCT set_id) FROM bag_of_sets);
Ugly, isn’t it? How long does it take you to realise what it’s trying to do? And then, how long does it take you to be sure it’s correct? The intention is hidden amongst lots of implementation detail (a thought I often have about the SQL I write). And having to mention bag_of_sets
twice seems so wrong. Have I missed something better? Continue reading SQL Set Intersection – harder than I thought?