About time and computers

This is a revised version of my article "El tiempo y los sistemas informáticos", published on my old personal blog aldrin.martoq.cl/techblog on March 8, 2011.

Aldrin Martoq Ahumada
Servicios A0 SpA

--

With yet another modification to winter time in Chile, I keep reading minor (and some big) mistakes about how date and time are handled by computer systems. I looked for a good article that explains all the nuances in detail, but didn't find one. So I wrote one to cover this need.

I want to talk about the following topics:

  1. The official time reference is UTC, not GMT.
  2. There are minutes which have more than 60 seconds.
  3. NTP is not related to time changes, time zones, or summer and winter time.
  4. All systems must abandon local formats like MM/DD/YY or DD/MM/YY.
  5. A lot of software is not prepared to work with different time zones. In a global world, that’s unacceptable.
  6. Even worse, I think no software contemplates the inevitable modifications to time zones.

While all of this is kind of a mess and an unresolved topic, knowing more about it can help you with many of the issues you may be facing now or in the future.

Time, a continuous regular unit of measure

The first assumption a lot of software takes for granted is that time is continuous and uniform across the whole universe. Thanks to Einstein, we now know this is actually not true. We may face more of this when space travel becomes common, but for now we can live with a uniform time, on earth at least.

This clock just fell off the wall today.

By taking time as uniform, we can assume that time is the same anywhere around the world. So we can define a universal time. Every country can have its official time that references this universal time, so we can schedule a meeting in Chile at 10:00 in the morning with people in Finland at 15:00 in the afternoon. Easy, right?

Well, not so much. Let's start with the problems…

It's UTC, not GMT… please

Since the 1880s, the official universal time was calculated from mean solar time. That is: you observe how long the earth takes to complete one turn on its axis, based on the position of the sun. Divide that time by 86400, call the result a second, and done: we have an official time that everyone can synchronize with. This is what is known as Greenwich Mean Time or GMT, since the measurement was made at the Royal Observatory in Greenwich, London.

Deviation of day length from SI based day (source: Wikipedia/Wikimedia)

The problem is that mean solar time is quite irregular. If you look at the graph above, the earth tends to turn slower every year and days become longer. Different effects may explain why the speed changes in both directions, but we can't actually predict how much it is going to change.

This deviation in the measurement of time is unacceptable for many applications and systems. Just imagine everybody constantly changing the length of a second!

Since 1972, GMT has been superseded by a new universal time standard: UTC, which in English stands for "Coordinated Universal Time" and in French for "Temps Universel Coordonné". Yep, they couldn't agree on the abbreviation.

In UTC the length of a second is always the same, because it's measured by atomic clocks. Now we can synchronise many, many systems with little error and it's not necessary to adjust our systems to the vagaries of the mean solar time. Goodness!

The relevant thing here is that nowadays UTC is the universal time used as a reference by all countries. This is important because of the tricks it has.

Leap seconds or syncing with the earth

In UTC we have regular seconds (and milliseconds, microseconds, etc). But remember that every year the length of a day is slightly and randomly longer. UTC solves this problem by making the length of a minute variable.

Most days have 86400 seconds, but on some days the last minute of the last hour has 61 seconds instead of 60. That extra second is called a leap second ("segundo intercalar" in Spanish). For example: December 31, 2008 at 23:59:60 UTC is a valid date.
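As a small illustration of why this matters for software, here is a minimal Python sketch. It only assumes the standard library; the helper name datetime_accepts is mine, made up for this example. It shows that Python's datetime type refuses a second numbered 60, and that POSIX-style timestamps count only one second across the 2008 leap second even though two seconds really elapsed.

```python
import calendar
from datetime import datetime

# The leap second at the end of 2008, written as plain calendar fields.
LEAP = (2008, 12, 31, 23, 59, 60)


def datetime_accepts(fields):
    """Return True if datetime can represent these calendar fields."""
    try:
        datetime(*fields)
        return True
    except ValueError:
        return False


# POSIX-style timestamps pretend that every day has exactly 86400 seconds.
before = calendar.timegm((2008, 12, 31, 23, 59, 59))
after = calendar.timegm((2009, 1, 1, 0, 0, 0))

print(datetime_accepts(LEAP))  # False: datetime only allows seconds 0..59
print(after - before)          # 1, although two real seconds elapsed
```

Other languages and libraries make different choices, so it is worth checking what yours does with 23:59:60.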

This is not so weird if you remember we have leap years that add an extra day on February 29 for the same reason: trying to sync the length of the year to astronomical observations.

A complete list of leap seconds can be found on Wikipedia; there have been 27 so far. Because the deviation is highly unpredictable, this is not something that can be calculated in advance. So all leap seconds are added arbitrarily, according to astronomical observations of the movement of the earth. If the earth turned faster and days became shorter, there could be a minute with 59 seconds, but this hasn't happened so far.

Example leap second (source: Wikipedia)

Please note that while the leap second is added at 23:59:60 of universal time (that is, UTC), it can fall at any time of the day depending on the local time zone.
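To make that concrete, the sketch below prints where the 2008 leap second landed on the wall clock for a few fixed UTC offsets. The offsets are assumptions chosen for illustration (-03:00 for Chile in summer time, +02:00 and +09:00 as other examples), and the function name local_leap_label is hypothetical. Since datetime cannot hold second 60, the code converts the second just before it and writes the ":60" by hand.

```python
from datetime import datetime, timedelta, timezone

# Last regular second before the 2008 leap second, in UTC.
last_second = datetime(2008, 12, 31, 23, 59, 59, tzinfo=timezone.utc)


def local_leap_label(offset_hours):
    """Wall-clock label of the leap second at a fixed UTC offset."""
    local = last_second.astimezone(timezone(timedelta(hours=offset_hours)))
    # datetime cannot store second 60, so it is appended as text.
    return local.strftime("%Y-%m-%d %H:%M") + ":60"


print(local_leap_label(-3))  # 2008-12-31 20:59:60
print(local_leap_label(2))   # 2009-01-01 01:59:60
print(local_leap_label(9))   # 2009-01-01 08:59:60
```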

This can have unintended effects on the software we write, so we must learn how our system handles (or doesn't handle) leap seconds. It can even cause random crashes or system failures, as some system administrators found out during the 2012 leap second.

On many Linux systems, the clock can be slewed over a period of time (say, a day), so a day never has more or fewer than 86400 seconds. This is usually handled automatically if you keep your time zone data updated or use a synchronization service like NTP.
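The arithmetic behind slewing is simple. The following is only a sketch of a linear slew under my own assumptions (a one-day window, one extra second, a made-up function name smear_offset); it is not how any particular system implements it. It shows how much of the leap second has been absorbed at a given point, and how small the change in clock rate is.

```python
def smear_offset(elapsed, window=86400.0, leap=1.0):
    """Seconds of the leap second already absorbed `elapsed` seconds
    into the slewing window (linear slew, clamped to the window)."""
    fraction = min(max(elapsed / window, 0.0), 1.0)
    return leap * fraction


# Rate change needed to hide one second inside one day, in parts per million.
rate_ppm = 1.0 / 86400.0 * 1e6

print(smear_offset(0))        # 0.0
print(smear_offset(43200))    # 0.5 (halfway through the day)
print(smear_offset(86400))    # 1.0
print(round(rate_ppm, 2))     # 11.57
```

In other words, every second of that day is stretched by roughly 11.6 microseconds, which most applications never notice.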

Seems messy? We are just starting. So, let's talk about NTP.

Syncing time

Our next problem is how to keep our systems synchronized, that is, how to keep all of them at the same time with the least possible difference. For this we have NTP, the Network Time Protocol, in which servers that somehow have the correct time provide it to clients that try to catch up.

It may seem trivial, but it is not. First, latency on the Internet is highly variable, and packets (messages) arrive out of order and at different times: 10 round trips from Chile to time.apple.com give a 111 ms average with a 66 ms standard deviation, for example. NTP must compensate for this. If we simply ask the server for the time and use whatever it says, we can easily end up with huge errors! Some people configure their systems like that, something called SNTP for Stupid NT… I mean, Simple NTP.
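To give an idea of the compensation, here is a minimal sketch of the standard offset and delay calculation that NTP performs on the four timestamps of one request/response exchange. The function name and the sample numbers are mine: I assume a client that is 250 ms behind the server, 55 ms of latency in each direction (close to the 111 ms round trip above) and 1 ms of server processing time.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one exchange.

    t0: client clock when the request leaves
    t1: server clock when the request arrives
    t2: server clock when the reply leaves
    t3: client clock when the reply arrives
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Assumed scenario: client is 0.250 s behind, 0.055 s latency each way.
t0, t1, t2, t3 = 100.000, 100.305, 100.306, 100.111
offset, delay = ntp_offset_and_delay(t0, t1, t2, t3)

print(round(offset, 3))  # 0.25  -> the client should move forward 250 ms
print(round(delay, 3))   # 0.11  -> time spent on the network
```

Note that just copying t2 when the reply arrives would leave the client 55 ms behind in this scenario. The estimate is exact only when the latency is the same in both directions; with asymmetric paths the error is half the difference, which is one reason a real client takes many samples and filters them.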

You may think that a second or even a couple of seconds is not such a big deal. But there are many, many applications that will not work well unless you have well synchronized clocks. And with time, you will learn that 100 ms of difference is the most you can tolerate. If you are a system administrator, for example, it is good for all your servers to share the same time.

It's so important that NTP does not simply copy the time from the server. What really happens is that the clock on your computer is not as precise as an atomic clock, so it has an error (let's say it drifts -10 ms every 30 days). So, if you configure your system as an NTP client, it will calculate and compensate for that error. That takes time, but in the long run you will have a very precise clock without having an atomic source. Pretty neat, eh?
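A quick back-of-the-envelope version of that compensation, using the -10 ms every 30 days figure from above. This is a simplified sketch under my own assumptions (a constant drift rate and made-up function names); a real NTP client estimates the rate continuously from many measurements.

```python
def drift_rate(offset_change, interval):
    """Drift per second of real time (negative means the clock runs slow)."""
    return offset_change / interval


def compensate(clock_elapsed, rate):
    """Convert an interval measured by the drifting clock to real seconds."""
    return clock_elapsed / (1.0 + rate)


THIRTY_DAYS = 30 * 86400            # 2592000 seconds
rate = drift_rate(-0.010, THIRTY_DAYS)

print(round(rate * 1e6, 5))         # -0.00386 parts per million

# After 30 real days the uncorrected clock has counted 10 ms too little.
measured = THIRTY_DAYS - 0.010
print(round(compensate(measured, rate), 6))  # 2592000.0
```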

NTP stratums (source: Wikimedia)

To keep time precise, NTP has many strata or levels. Stratum 0 devices are atomic clocks, the most accurate source of time. At Stratum 1, there are servers that synchronize with Stratum 0, the servers at Stratum 2 synchronize with Stratum 1, and so on for the next levels. At any level, servers can also talk to each other to provide a more robust and stable time.

If you are a sysadmin and your systems are in Chile, you should replicate a similar setup: configure a master server to synchronize with ntp.shoa.cl and then sync every computer and server to your master server. This is good, because inside your network the time difference will be minimal. But I repeat, don’t use SNTP.

NTP provides universal time, but it has nothing to do with the local time in your country or daylight-saving/standard time. So, let's cover that now.

Time around the world

So far, we have a universal time (UTC), arbitrary corrections (leap seconds) and a synchronization mechanism (NTP). Now we are ready to define time around the world, or time zones, using UTC as a reference.

Time zones from the TZ database version 2012c (source: Wikimedia)

We must consider that time zones are defined by politics, not by geography. Most areas of Chile, for example, are by longitude closer to UTC-05, but the standard time for continental Chile is UTC-04, and UTC-03 during summer. Chile also has different offsets for Easter Island (UTC-06, and UTC-05 in summer time), and the Magallanes region is at UTC-03 all year round.

Time zones used to have abbreviated names, like Chile Standard Time (CLT) for UTC-04 and Chile Summer Time (CLST) for UTC-03. Current software no longer uses them, because they don't always mean the same thing. For example, Moscow time or MSK means UTC+03 in some years and UTC+04 in others. So, when I ask my computer what time it is, it simply displays the offset from UTC:

$ date
Tue Jan 16 19:30:27 -03 2018
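The same idea in code: a minimal Python sketch, assuming a fixed -03:00 offset for the timestamp above, that prints the numeric offset instead of an abbreviation. Python writes the offset as -0300 or -03:00 rather than the shorter -03 that my date command shows.

```python
from datetime import datetime, timedelta, timezone

# The timestamp from the example above, with an assumed fixed -03:00 offset.
stamp = datetime(2018, 1, 16, 19, 30, 27, tzinfo=timezone(timedelta(hours=-3)))

print(stamp.strftime("%H:%M:%S %z"))  # 19:30:27 -0300
print(stamp.isoformat())              # 2018-01-16T19:30:27-03:00
```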

Today, in a global world, most software stores and handles time internally in UTC. This guarantees that an event can be represented in different time zones around the world. For example, a local meeting in Santiago de Chile on Monday May 9, 2011 at 09:00 AM (UTC-04) is stored in the database as May 9, 2011 at 13:00 UTC. If people from Mexico join the same meeting, the stored time is correctly translated to May 9, 2011 at 07:00 AM (UTC-06).
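The meeting example can be sketched in a few lines of Python. To keep it self-contained I use fixed offsets taken from the text (UTC-04 for Santiago and UTC-06 for Mexico) instead of real time zone rules, so the names SANTIAGO and MEXICO are just labels I chose for those offsets.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets from the example, not real time zone rules.
SANTIAGO = timezone(timedelta(hours=-4))
MEXICO = timezone(timedelta(hours=-6))

# What the user typed: a local meeting time in Santiago.
meeting_local = datetime(2011, 5, 9, 9, 0, tzinfo=SANTIAGO)

# What goes into the database: the same instant expressed in UTC.
stored = meeting_local.astimezone(timezone.utc)

# What a participant in Mexico sees: converted only for display.
shown = stored.astimezone(MEXICO)

print(stored.isoformat())  # 2011-05-09T13:00:00+00:00
print(shown.isoformat())   # 2011-05-09T07:00:00-06:00
```

All three values are the same instant, so comparing or sorting them gives the same result no matter which offset they are displayed in.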

Some systems, like most Linux/UNIX machines, go even further by keeping their internal clock in UTC, so choosing a time zone never touches the internal clock; it only configures how UTC is converted to whatever zone you just configured. Remember that NTP calculates how much your clock drifts over time? If you constantly change the internal clock to match your wall clock, NTP will not be able to correctly calculate that your clock drifts -10 ms every 30 days.

If you are a software architect or developer, by now you may be wondering how you should store your date and time data. You should definitely store it in UTC, and let the software handle the conversion to whatever time zone your users have. Now, how you know that one user is in Africa and another in South America is another story.

Some databases have an option like TIMESTAMP WITH TIME ZONE, which internally saves the time and the zone (well… not actually the time zone, just the offset). For our example, it will save something like 2011-05-09 13:00 (-04:00). Keeping only the offset is not really useful, as we are going to see.
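A small sketch of what goes wrong when only the offset is kept. Suppose the 09:00 meeting repeats in December and the software reuses the saved -04:00 offset. I am assuming that the offset actually in force in Santiago in December is -03:00 (summer time); both offsets are hard-coded here for illustration.

```python
from datetime import datetime, timedelta, timezone

saved_offset = timezone(timedelta(hours=-4))   # what the database kept in May
summer_offset = timezone(timedelta(hours=-3))  # assumption: in force in December

# Next occurrence of the 09:00 meeting, computed from the saved offset.
next_meeting = datetime(2011, 12, 12, 9, 0, tzinfo=saved_offset)

# The same instant on a wall clock that follows summer time.
on_the_wall = next_meeting.astimezone(summer_offset)

print(next_meeting.astimezone(timezone.utc).strftime("%H:%M"))  # 13:00
print(on_the_wall.strftime("%H:%M"))                            # 10:00
```

The meeting that everybody expects at 09:00 shows up at 10:00. The offset says what the clocks did at one moment; it says nothing about the rules of the place.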

Arbitrariness and the Summer Time

So far, so good. If you could stop reading here, there is nothing too complicated about time: there is a universal time, NTP should sync all your system clocks, you should store time as UTC, and use whatever library works well to display time in the user's preferred time zone.

But there is always a but.

The problem is that the official time of a country has been, is, and will be completely arbitrary. Chile has used many different times in its history… For different political, economic, whatever… reasons… time will change. Even the earth gives us leap seconds, and we don’t really know when the next UTC leap second will be defined.

Some countries have daylight-saving or summer time. Chile, for example, used to switch from UTC-04 to UTC-03 in October and then switch back in March… but that rule has never stayed the same and, even worse, it was changed capriciously in the last 10 years. Chile is so long that in 2017 a new time zone was created for the southernmost region of the country, Magallanes, which has no summer time. And Chile is not the only country making all of these arbitrary changes.

All of this is a huge pain in the neck for every system administrator. Just check how many devices at your home keep some form of time: your cellphone, computer, microwave, TV and cable box, etc. And wall clocks, of course. Now imagine you have to keep everything in sync, and how difficult that is because every device is configured in a different way. Now imagine the same problem multiplied by 10, 100 or 1000.

Fortunately, there are people who have been working on solutions.

Olson and the TZ database

The tzdata, zone info, or Olson database is a public initiative created by Arthur David Olson. It consists of code (software) and data files with all the current and past arbitrary modifications of time for every known time zone. It has leap seconds, time zones, daylight-saving rules, and even the history of past times, as accurate as it can be.

In the Olson database, a time zone is “any national region where local clocks have all agreed since 1970”, and each one has a simple but meaningful name like “America/Santiago” with a description like “Chile (most locations)”; or “Pacific/Easter”, which is “Easter Island & Sala y Gomez”. If you have seen this kind of description when choosing a time zone, your system is using some form of the tz database.

For example, this is an extract of the current Chilean rules as of version tzdb-2018a:

Isn’t it funny that Paul Eggert doesn’t believe Magallanes will be back to normal in 2019?

It is important to note that any modification to this database is not official, as you can read from the comments. It’s just good people trying to make something useful for everybody, providing info as exactly and accurately as they can get. And of course, anyone can contribute to it.

The thing is that a lot of software, like Linux, Unix, Java, and many more, treats tzdata as the official source to correctly handle different time zones, leap seconds and daylight-saving time for all kinds of users around the world.

But this database is not online like NTP. It is updated arbitrarily (because the changes are arbitrary) and is published manually at www.iana.org/time-zones. So, when a new version arrives, all software, like operating systems, must ship the new version of this database as an upgrade or patch. Many times, your platform doesn’t use the operating system database for whatever reason (for example Java and PHP). So you have to update your operating system AND your software platform: Java, PHP, the database and whatever else. Yes, more patches means more fun! (NOT).

And that is the real problem here, because a patch to your operating system is actually a big deal. It probably requires a restart of your software or maybe the whole system. Not to mention that not every system can be rebooted whenever a new update is published.

Upgrading software for changes in tzdata is a very expensive operation. And not everyone has the time or resources to do it well, or cares about it. So, even if tzdata is released as soon as possible, many times those updates reach the final users weeks or months later.

This is why sudden changes in time zones are really bad for everyone. Like the incompetent changes made by the Chilean government in 2011, when the scheduled change from UTC-03 to UTC-04 should have occurred on April 2, 2011, but was delayed to May 7, 2011… and this was announced just 6 days before, on March 28. Who elected these fools?

It is simply impossible for tzdata to be updated and distributed to everyone in only 6 days. The end result is that everyone gets confused, some systems are updated, many are messed up, and most of them will simply not be updated. What a nice and huge impact on productivity, thank you.
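Here is a toy model of what happens to a system that missed the update. The two "rules" below are hypothetical and heavily simplified: each is just the UTC instant at which clocks go from UTC-03 to UTC-04, and I assume the switch takes effect at 03:00 UTC on the night of the announced date. Real tzdata rules are far richer than this.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical, simplified rules: the instant Chile goes from UTC-03 to UTC-04.
OLD_RULE = datetime(2011, 4, 3, 3, 0, tzinfo=UTC)  # planned: night of April 2
NEW_RULE = datetime(2011, 5, 8, 3, 0, tzinfo=UTC)  # delayed: night of May 7


def to_local(instant, switch):
    """Local time according to one version of the rules."""
    hours = -4 if instant >= switch else -3
    return instant.astimezone(timezone(timedelta(hours=hours)))


# An event stored in UTC that falls between the two dates.
event = datetime(2011, 4, 15, 13, 0, tzinfo=UTC)

print(to_local(event, OLD_RULE).strftime("%H:%M"))  # 09:00 (system not updated)
print(to_local(event, NEW_RULE).strftime("%H:%M"))  # 10:00 (updated system)
```

The stored UTC value is fine in both cases. What differs is the rule set, so for five weeks two machines in the same office can disagree by one hour about the same event.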

Time formatting

I want to insist that every system today should be built to work globally. That’s why date representation is important.

You know, Chilean dates are formatted DD/MM/YY but in the USA they are MM/DD/YY. It seems trivial, but it’s a big mess when importing or exporting data in any software.
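A two-line demonstration of the mess, using a made-up sample value. The same text parses without any error under both conventions and gives two different days:

```python
from datetime import datetime

raw = "03/04/11"  # sample value, as it could appear in an exported file

as_chile = datetime.strptime(raw, "%d/%m/%y").date()  # read as DD/MM/YY
as_usa = datetime.strptime(raw, "%m/%d/%y").date()    # read as MM/DD/YY

print(as_chile.isoformat())  # 2011-04-03
print(as_usa.isoformat())    # 2011-03-04
```

Nothing fails and nothing warns you, which is exactly why this kind of bug survives until somebody notices the wrong month.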

The solution for this is to always use the ISO 8601 standard, which defines the format YYYY-MM-DD and, of course, covers things like time, UTC offsets, etc. It also covers other things like intervals and durations; for example, 1 month is “P1M”. As always, you should not try to do all of this by yourself: use a library, SDK or whatever your language provides to build your software correctly. And if it still doesn’t exist on your favorite platform, you can create it and share it with the rest of the world ;-)

Maybe your clients will tell you this is stupid and that they want to keep using the old local format, whatever that is. But I strongly disagree. Just tell them that in today's world it is important to share information easily with anyone, and this is part of it.
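As an example of letting the library do the work, this sketch uses Python's standard datetime module (fromisoformat needs Python 3.7 or later) to write and read ISO 8601 text. The -04:00 offset is the one from the meeting example and is hard-coded here as an assumption.

```python
from datetime import datetime, timedelta, timezone

meeting = datetime(2011, 5, 9, 9, 0, tzinfo=timezone(timedelta(hours=-4)))

# Writing: unambiguous text that includes the UTC offset.
text = meeting.isoformat()
print(text)                  # 2011-05-09T09:00:00-04:00

# Reading it back gives exactly the same instant.
restored = datetime.fromisoformat(text)
print(restored == meeting)   # True

# A nice side effect: plain ISO dates sort correctly as text.
dates = ["2011-05-09", "2010-12-31", "2011-01-15"]
print(sorted(dates))         # ['2010-12-31', '2011-01-15', '2011-05-09']
```

Durations like “P1M” are not covered by the standard library functions used here, so check what your platform offers before writing your own parser.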

The end

Well, this is enough for today; I hope I gave you a glimpse of how complicated it is to handle date and time in computers.

This matter is quite complicated, and I am sure I have made at least one error or omission. Leave anything you find in the comments, thank you!

Time is so complicated for software engineers that we should always use an API that correctly handles years, months, days, minutes, etc. If you try to do it yourself, you’ll get it wrong. Remember that all software today must think and work globally.

I did not write about how software could handle all of this mess, even though time zone changes happen all the time. I will talk about solutions in the next post. Stay tuned!

--