MariaDB Temporal tables – Federico Razzoli – MariaDB Server Fest 2020

hi i’m federico razoli and i’m going to talk about one of the most interesting mariadb feature which is temporal tables i am a database consultant i am specialized in mariadb and mysql and i have a preference for open source technologies i recently started this new company called weta base so let’s start by taking a quick look at the landscape of temporal tables implementations as you can see from this slide um they were implemented by all traditional proprietary database systems which means oracle db2 sql server and also snowflake which is a proprietary cloud database for analysts it was they are also implemented by open source databases but with limitations so postgresql has a complete implementation but it’s implemented as an extension and this extension is not available for example in rds or in cloud sql cockroachdb has a limited but decent implementation crooksdb which is a nosql database well it’s actually built around the concept of by temporal tables but it’s not an sql database the same applies to hbase which has the concepts of raw versions but it’s no sql database so it’s something different and mariadb implemented temporal tables in version 10 temporary initially they were only system timed version and in the following version 104 they implemented application timetables and you can actually use both these features for the same table and you get what we call a by temporal tables we will see what it means during the rest of the talk so let’s start by application time tables application time tables are tables having two special columns i call them temporal columns but it’s my terminology that i use for clarity and you will not find it in the documentation or in technical articles civilized before and basically the versioning is not automatic when you insert rows you take care of registering when the validity of the rows begins and where the validity of the row and using these temporal columns which are basically two time stems the beginning and the end of the version so this is an example that we will use for the rest of the talk a ticket table which basically represents some sort of support ticket there is of course a primary key which is an id column there is a state which could be open verified fixed invalid and when you create a row it’s typically open and then there are a summary of the of the ticket and the longer description but at some point we decide that we want to track the changes to the rows of this table so here’s what we can do so basically okay we do it with a normal altered table i wanted to highlight that this puts a lock on the table a shared lock so basically this means that reads on on the rows are possible during the operation of this alter table but writes to the rows are not possible and basically they are cured and the algorithm used is copy which means that mariadb creates internally a new copy of the table so

this requires more space on disk this can be a problem if you have small disks and the table is big then we create as mentioned before the temporal columns in this case they are date time columns but they could be also date or timestamp if we prefer and then there is a special syntax add period for time period is actually the name of that time period we can specify any name we like but i suggest to use time period or something equally clear because these could be used in some of the queries and if you use x66 well the queries will not be very readable and then we specify the names of the temporal columns not that you can use any storage engine for application timed tables okay here i show a problem of creating a table in this way because basically um the first insert in this slide fails why because we did not specify the values for the temporal columns okay the second insult succeeds because we specify these values so this is one reason why this is a better way to create an application tank table so let’s see step by step what we are doing and why so first we create a new empty copy of the table which we call ticket temp then we alter this copy and we start by creating the temporal columns but the difference is that now we are adding default values so for the valued from column the default value is now so the current timestamp for the valid 2 well typically when we insert a new row it’s still valid right so in this case the default value is the highest possible value for a timestamp then i create two indexes probably too bad indexes but my point is please create indexes because you are going to use um temporal columns so as you can see in this talk i’m specifying not just what you can do but what in my opinion you should or should not do um i think this is probably more important than just explaining a feature uh because basically i’m showing you what experience taught me okay so the next step is modifying the primary key why because if id is the primary key we cannot have two verse sorry two rows with the same id even if they are valid in different periods of time we solved this problem by adding valid to to the primary key okay so we dropped the existing primary key we add a new primary key with id and then value two then we populate the table somehow and then we run rename table um it’s important to notice that rename table is an atomic operation so basically we are switching the names but in no point in time the typical table does not exist okay so there is no service disruption it’s also important to notice that the operation we made with primary key should also be made with all unique indexes okay so here i show that if we run a plane select basically we get not just the current data but also the rows that are not valid anymore okay this is the default to avoid this we add the work close i used syntax i like because i find it more clear because it’s near to what i would say in english so basically i say

where the valid from value is less than now and the valid to value is higher than now okay so the row is actually valid now here i show how to delete rows properly um so we don’t really run a delete on an application time table at least most of the times we will not do this instead we will run an update to change the end of the validity of the row and we change it from at some point in the future to now okay so from this point on the row still exceeds so we can read it but it’s not valid anymore so i did this with the stored procedure so here you can see the stored procedure in action we run a plane select we see the row we run ticket delete then we run the select again the row still exists but the valid to value change it and so the row is not valid anymore um so basically what is the real advantage of creating a c sorry an application time table well um basically by doing so we are then able to use a special syntax that i demonstrated here so suppose i want to physically delete historical rows so i’m not preserving the history anymore because it doesn’t matter anymore right if our application runs for 20 years i don’t care anymore about what happened 15 years ago then i have this special syntax which is the lead from table name for portion of time period name from a certain timestamp to another timestamp okay well in this case i also added a where to show that you can do it but probably most of the times you will not um i suggest that if you do tears the beginning the start timestamp is always as low as possible okay so it’s ideally the beginning of time because in this way we only delete the oldest data but if we have data for like starting from the 1999 and ending today and we decide to delete data from data valid from the year 2000 to the year 2001 well it is possible that some rows um are valid for a longer period of time so what will happen well these rows are split into two versions okay so there is a version which is older than the period that we are deleting and a version that is newer than the period we are deleting and basically i think this is very very very confusing and we should not do it unless we have very good reasons the same syntax for a portion of also exists for update is it useful well yes it is because for example instead of physically deleting a row we could mark a row for a deletion so basically maybe there will be some sort of tool that we read the rows marked for deletions and it will store them elsewhere outside of mysql before they can be actually deleted so let’s talk about system versioning so system version tables i will use the abbreviation c square which is probably only mine i’m not sure if you can find it in the documentation but if i say csv you know what i’m talking about um basically the difference is system version tables are much more automatic when i insert a row the beginning of the validity is now i cannot even change it

when i delete a row well yes i can run delete but the row is not physically deleted instead um older version of the row is stored okay well there is a way to also delete it but i will not demonstrate it now um so let’s back to our example the ticket table so this is the original example so the table is not application timed so this is the alter table that we can use to make the table system versioned again i wanted to highlight that there is a shade lock and that the used algorithm is copy and then we simply add the syntax for system versioning but again there is a better way to do this which is basically we create the temporal columns which i called in this case inserted at and deleted that these columns are actually also generated columns okay so you can see the syntax generated always as row start or roll hand these columns exist anyway even in the first example but the difference is if i’m creating them explicitly and i’m deciding their names i can use them i can mention them in my queries otherwise i cannot but anyway i’m also declaring these columns as invisible which means if i run a select star the values of inserted at and deleted at are not and not returned okay they are only returned if i mention these columns explicitly in my select um you don’t have to do this i’m just recommending to do it because i think it’s more practical then there is at period for system time then i specify again the names of the temporal columns and then i specify add system versioning so the syntax is a bit on that but well you can get used to it um i wanted to highlight um you can actually use any storage engine except for connect because there is a bug that prevents us to use connect for system versioned tables which is probably not super useful but well it could have some use cases so if you’re interested you can follow the bug because mariadb has a public jira and then there is another bug which bug or feature request which is support date time for row start and low end because sql standards specify that you can also use the time then i show how to query assist their table so what i wanted to highlight is in the first case select star from ticket i get only the current data i will not get historical data this is very different from application time tables right then in the second example i mentioned explicitly inserted at and deleted add because i also want to see these values but i’m still getting only the current data the third example shows how to get the historical data as well as the current data and i do this with this syntax from table name and time all okay it means get all the versions of the rows from this table then i show how to get all versions of the rows okay so um well basically i’m showing how to get the rows that were deleted or updated so the first two syntaxes are special syntaxes that are part of the support for temporal tables so as you can see you can add from timestamp to another timestamp or between timestamp

and another timestamp while this is shorter maybe but you have to learn a new syntax and if you use an oram i’m sure that the rm will not support the syntax if you use some special tools they will probably not support or understand this syntax so i suggest to use actually the third syntax which is juiced plain sql i add basically where deleted at less than now so less than current timestamp here i show how to get the history of a particular row so basically i specify again for system time all and then i specify where id equals and also in order by because otherwise it will be probably a bit confusing then i show how to read rows from a specific point in time so for example here the first syntax is actually a special syntax and it’s for system time as of timestamp and then the timestamp but again it’s shorter but i’m sure it’s not supported by any tool or our rm or anything so i prefer to use the plain sql syntax which is well i do it in a way i personally like so timestamp between insulted at and deleted ad but it basically just means if you don’t understand it it means we’re inserted at less than timestamp and deleted at greater than timestamp okay so here i show that you can use what i call a temporal join so as you can see i’m joining a table to itself with two different table aliases of course but take a look at the world clause basically what i’m actually doing is joining the ticket table at a specific point in time to the ticket table to another specific point in time so without digging into all the clauses basically what i’m doing is um i’m getting the rules that were present on a certain time and um the state one was not changed after one month right so let’s say some words about indexes um okay in if we are using a cis verse table the row end column is automatically appended to the primary key and to all unique indexes one thing to notice is you probably know that a query can use a whole index or the leftmost part of an index so it’s first columns so basically this means that if the primary key or a unique index was usable for a query before this change it will also be use usable after this change so the change will not affect negatively some some queries this is actually what i suggested to do explicitly for application time tables and the reason is if we don’t do this explicitly well this will not happen at all for application time tables okay so the final part of the talk is just some hints about what you can or should do so some things you should do are or you can do are well a table can be both cis verse and application timed in this case we will call it a bi-temporal table

okay so we can query the rows based on their system version validity or based on their logical validity we can run statistics on the rows that were added or deleted or changed by year by month weekday day or whatever we can run statistics on the rows lifetime we can get the rows that never change it or we can get the rows that change too often because maybe this is a bit suspect so this indicates that there is something strange and we can also examine the history of a particular row why because well maybe a row we suspect it changed because of an application bag so we want to check what actually happened to that road over the time and we can do it um some more things that we should or should not do well a primary key first of all should never change because otherwise it will become impossible to change to track the history of rose if you are not 100 sure about this maybe you should create a trigger which basically checks if old id is different than new id and if it is different it draws an error so the statement fails for application timetables you should never do hard deletions or hard updates because otherwise you will not have old versions of the rows if you have to drop a colon you can actually move it to a different table because in this way you will avoid losing the history and if you have to add a column and the column is not often read or written well you can consider putting it into a different table because in this way you will not have a new row which doesn’t sorry a new column which doesn’t have a history until now okay and if you run statistics or complex queries that involve temporal columns well you should probably add a persistent generated colon on these on these temporal columns and then add indexes on them to make your queries faster so what we left out this is not even a complete list of what we left out but the time is what it is so uh first of all i did not show any sql for the last two slides because there is no time i did not mention alter table basically alter table of course can delete part of the history or change it even worse so alter table for temporal tables is disabled by default but you can actually enable it you can use partitions one or more partitions to store the history of the rows dumps are affected by system version tables so please check the documentation to find out how it affects dumps about replication and binary log again please check the documentation for possible problems caused by system version tables but the last hint suppose you are using mysql mysql does not support temporal tables maybe at some point you want to use temporal tables because maybe you want to allow analysts to inspect production data what you can do is to attach a mariadb replica to a mysql primary server so you replicate data and then you change existing regular tables in maliadb from from what they are now to temporal

tables system version tables um the main problem with this is that if replication lags of course the timestamps will not be correct but maybe you don’t care in some cases so you can actually do it without major problems in my experience so thank you for attending i hope you found this talk useful as you can see it was quite opinionated because i really believe that it’s important to comment the the features and not just explain what you can actually do because what you should do is sometimes very different from what you can do so this is question time so thank you very much please ask your questions