The comment in cassandra.yaml explains the problem well: "When executing a scan, within or across a partition, we need to keep the tombstones seen in memory, so we can return them to the coordinator, which will use them to make sure other replicas also know about the deleted rows. With workloads that generate a lot of tombstones, this can cause performance problems and even exhaust the server heap."
How Do We Get Tombstones in Cassandra?
  • Inserting null values
  • Inserting values into collection columns
  • Expiring data with TTL
  • Frequent updates on a table with TTL
  • Explicit delete statements
  • Materialized views

Inserting Null Values

An upsert on a table can generate a tombstone as well. How? Cassandra does not check whether a value already exists before writing it (a read before every write would slow writes down). Cassandra returns null when a field has no value, so when a field is set to null, Cassandra must delete any existing data, which it does by marking the column with a tombstone.
Table definition used for this example:
CREATE TABLE example ( KEY INT PRIMARY KEY, Column1 text, Column2 text );
Statement with Problem
This statement results in a tombstone for Column2, even if it is the first insertion:
INSERT INTO example (KEY, Column1 , Column2) VALUES (1, 'someValue', null);
The resulting SSTable contains a deletion entry for Column2.
Solution
Do not insert null values; instead, leave the column unset when performing the upsert:
INSERT INTO example (KEY, Column1 ) VALUES (1, 'someValue');
The resulting SSTable contains no deletion entry.

Inserting Values Into Collection Columns

Cassandra collections (list, set, map) result in tombstones even if we never delete a value. Cassandra optimizes for writes and does not check whether the collection has changed (or even existed). Instead, it immediately deletes the existing collection (creating a tombstone on the column) before inserting the new value.
Table definition used for this example:
CREATE TABLE collection_example ( KEY INT PRIMARY KEY, Column1 list<text>, Column2 set<text>, Column3 map<int, text> );
Statement with Problem
This statement creates three tombstones, one for each collection column, even on the first insert, because Cassandra does not check whether a collection already exists:
INSERT INTO collection_example(KEY, Column1 , Column2 ,Column3 ) VALUES (1, ['a', 'b'], {'c', 'd'}, {1 : 'a', 2 : 'b'});
The resulting SSTable contains a deletion entry for each collection column.
Solution
There is no direct solution to this. When designing a table, we need to consider carefully whether a collection column is really required.

Expiring Data with TTL

Expiring data by setting a TTL (Time To Live) is an alternative to deleting data explicitly, but technically it results in the same tombstones being recorded by Cassandra, and they require the same level of attention as other types of tombstones.
Cassandra stores every TTL at the column level, regardless of whether it was defined globally (on the table) or at the row level.
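As a minimal sketch of the two ways to set a TTL, the statements below reuse the example table defined earlier. The key value 2, the column values, and the 86400-second TTL are arbitrary choices made for illustration.

```cql
-- Row-level TTL: every column written by this statement expires after one day.
INSERT INTO example (KEY, Column1, Column2) VALUES (2, 'someValue', 'otherValue') USING TTL 86400;

-- Table-level TTL: applies to subsequent writes that do not specify their own TTL.
ALTER TABLE example WITH default_time_to_live = 86400;

-- TTL() reports the remaining seconds for each column separately.
SELECT TTL(Column1), TTL(Column2) FROM example WHERE KEY = 2;
```

In both cases the expiry is tracked per column, which is why TTL() is queried column by column rather than for the row as a whole.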

Frequent Updates on a Table with TTL

Frequent updates on a table with TTL can create tombstones multiple times, because every update statement on a specific column sets a new TTL for that column, whereas the columns it does not touch keep their older TTL values.
Row level TTL
Table level TTL
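The row-level case can be sketched as follows, again using the example table from earlier. The key value 3 and the one-hour TTL are illustrative assumptions, not values from a real workload.

```cql
-- Initial write: both columns receive a one-hour TTL.
INSERT INTO example (KEY, Column1, Column2) VALUES (3, 'a', 'b') USING TTL 3600;

-- A later update touches only Column1 and gives it a fresh one-hour TTL.
UPDATE example USING TTL 3600 SET Column1 = 'c' WHERE KEY = 3;

-- Column2 is still counting down from the original write, so the two values differ.
SELECT TTL(Column1), TTL(Column2) FROM example WHERE KEY = 3;
```

Because the columns of one row expire at different times, the row does not expire in a single step.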

Explicit Delete Statement

An explicit DELETE statement creates a tombstone, which Cassandra uses to keep the deletion consistent across replicas.
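For illustration, here are two forms of explicit delete against the example table defined earlier (the key value 1 matches the row inserted in that section):

```cql
-- Column-level delete: writes a tombstone for Column2 only.
DELETE Column2 FROM example WHERE KEY = 1;

-- Row-level delete: writes one tombstone covering the whole row
-- (here the whole partition, since the table has no clustering columns).
DELETE FROM example WHERE KEY = 1;
```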

Materialized View

A materialized view is a table that is maintained by Cassandra. One of its main features is that we can define a different primary key from the one in the base table. We can reorder the fields of the base table’s primary key, and we can also add one extra field to the primary key of the view. This is great, as it allows us to define different partitioning or clustering keys, but it also generates more tombstones in the view whenever we update, in the base table, a column that is part of the view’s key.
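A minimal sketch of this situation follows. The users table and the users_by_city view are hypothetical names introduced only for this example, and it assumes materialized views are enabled on the cluster.

```cql
-- Base table keyed by id.
CREATE TABLE users ( id int PRIMARY KEY, city text, name text );

-- View that adds the non-key column city to the primary key.
CREATE MATERIALIZED VIEW users_by_city AS
  SELECT id, city, name FROM users
  WHERE city IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (city, id);

INSERT INTO users (id, city, name) VALUES (1, 'Paris', 'Alice');

-- Changing city in the base table moves the row in the view:
-- the old ('Paris', 1) view row is deleted with a tombstone
-- and a new ('Lyon', 1) row is written.
UPDATE users SET city = 'Lyon' WHERE id = 1;
```

No DELETE is ever issued by the application here, yet the view accumulates a tombstone on every change to city.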

Final Thought

As we’ve seen, tombstones can be tricky, but they are not necessarily a bad thing to be avoided at all costs. They are simply Cassandra’s way of deleting data in an append-only structure. However, they can affect performance, so we should be aware of when they are generated while designing the data model and queries. With this knowledge, we should be able to limit their generation.