Sorting - Cassandra sort and a changing clustering key
I have a data modeling question for cases where data needs to be sorted by keys that can be modified. Say we have a user table:
{
  dept_id text,
  user_id text,
  user_name text,
  mod_date timestamp,
  primary key (dept_id, user_id)
}
Now I can query Cassandra for all users by dept_id.
What if I wanted to query all users in a dept, sorted by mod_date?
So, one way to do that would be:
{
  dept_id text,
  mod_date timestamp,
  user_id text,
  user_name text,
  primary key (dept_id, mod_date, user_id)
}
But mod_date changes every time the user name is updated, so it can't be part of the clustering key.
Attempt 1:
Don't update the row; instead, create a new record for every update.
So, say the record for user foo is as below:
{'dept_id1','timestamp1','user_id1','foo'}
and the name is changed to 'bar' and then to 'baz'. In that case we add another row to the table each time, so it looks like:
{'dept_id1','timestamp3','user_id1','baz'}
{'dept_id1','timestamp2','user_id1','bar'}
{'dept_id1','timestamp1','user_id1','foo'}
Now we can get all users in a dept, sorted by mod_date, but this presents a different problem.
The data returned is duplicated: the same user comes back once for every update.
Attempt 2: add another column to identify the head record, like in a linked list.
{
  dept_id text,
  mod_date timestamp,
  user_id text,
  user_name text,
  next_record text,
  primary key (dept_id, mod_date, user_id)
}
Every time an update happens, it adds a row and also stores the PK of the new record in next_record of the previous head row.
{'dept_id1','timestamp3','user_id1','baz','head'}
{'dept_id1','timestamp2','user_id1','bar','dept_id1#timestamp3'}
{'dept_id1','timestamp1','user_id1','foo','dept_id1#timestamp2'}
I also add a secondary index on the 'next_record' column.
Now I can support getting all users in a dept, sorted by mod_date, with:
select * from users where dept_id=':dept' and next_record='head' order by mod_date
But this looks like a fairly involved solution, and perhaps I am missing something and there is a simpler one.
The other option is to delete and insert, but for high-frequency changes I think Cassandra has issues with tombstones.
Suggestions/feedback are welcome!
As I see it, the simplest way is to sort the users on the application (client code) side. You use dept as the partition key, which means all users in one dept are handled by one Cassandra node. As long as there are not too many users in one dept, they can be sorted on the application side fast enough.
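To make that concrete, here is a minimal sketch of the client-side sort, assuming the original table with primary key (dept_id, user_id) and mod_date as a regular column. The rows list stands in for whatever your driver returns from a query by dept_id; the sample values and the helper name users_by_mod_date are made up for illustration.

```python
from datetime import datetime
from operator import itemgetter

# Hypothetical result of querying all users for one dept_id.
# With a real driver these would be the rows of one partition.
rows = [
    {"dept_id": "dept_id1", "user_id": "user_id1", "user_name": "baz",
     "mod_date": datetime(2015, 3, 1, 12, 0)},
    {"dept_id": "dept_id1", "user_id": "user_id2", "user_name": "alice",
     "mod_date": datetime(2015, 1, 15, 9, 30)},
    {"dept_id": "dept_id1", "user_id": "user_id3", "user_name": "bob",
     "mod_date": datetime(2015, 2, 10, 17, 45)},
]


def users_by_mod_date(rows, newest_first=True):
    """Return one department's rows sorted by mod_date on the client."""
    return sorted(rows, key=itemgetter("mod_date"), reverse=newest_first)


for row in users_by_mod_date(rows):
    print(row["mod_date"].isoformat(), row["user_id"], row["user_name"])
```

Because mod_date stays a regular column here, updating a user name is a plain update of one row: no extra rows, no duplicates and no deletes. The cost is that the whole partition is read and sorted in memory on every request, so this only fits departments of modest size.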