I’ve been researching date/time formats for inclusion in a software product, and wanted to share some of the thought processes involved.Use case: an engineer or scientist wants to plot something over some time period. All time periods are possible, so we must provide years down to minutes. For example, a coastal engineer surveyed said he:
“visualizes the results of the global wave prediction models we run. I am typically dealing with long term (20 – 40 year) datasets of hourly to sub-hourly model data.”
Another scientist[1] samples temperature readings in and around geysers located in Yellowstone national park. Samples are typically taken at intervals ranging from 15 seconds up to 30 minutes (though 30-seconds is typical). He might look at data over a period of months, then want to zoom in on smaller areas for particular areas of interest. For example, here are some sample views of Congress Pool, Yellowstone National Park, taken from October 5 – 18, 2005[1]:
October 5 – 12 | 24-hour period, October 10, 2005 | 0:00 – 0:15, October 10,2005 |
Storage: This was perhaps the easiest thing to determine. Representing dates as text strings is problematic because we deal with tens of millions of data points. Converting to and back would make the files larger, be prone to round-off error, and would introduce a lot of problems with supporting different input formats. Madness!
We borrowed from Microsoft Excel, which stores date/time values as a floating point number. The integer is the number of days since December 31, 1899. The decimal is the fraction of a day. For example:
Date/Time representation | FP value | Note |
---|---|---|
Not defined | 0.0 | Not defined |
1 January 1900 00:00:00 | 1.0 | Midnight! |
1 January 1900 01:00:00 | 1.041667 | One hour |
1 January 1900 00:01:00 | 1.00069444 | One minute |
1 January 1900 00:00:01 | 1.000011574 | One second |
1 January 2008 11:00:00 | 39448.45833 | Polar Bear Plunge! |
23 February 2008 12:34:45 | 39501.52413 | As I type this |
There are a couple of instances where someone might want to look at data from prior to 1900, so we allow negative integers. However, we’re only allowing back to 1800 because the Gregorian calendar has some quirks, especially in 1752.
Calendar systems: Gregorian. Seriously. We decided early on that we would not support other calendar systems[4] — like the more accurate Iranian calendar[3] or the artistic French Republican Calendar[5] — until there was a sufficient business case. If someone wants to wave a bunch of Rial or Euro at us, we will reconsider. (Except the US Treasury prevents us from transacting business with Iran, so it’s unlikely.)
Printed values: Now the hard part, what representations to use for dates? For example, consider how most people in the U.S. write dates. “02/24/08” typically means “24 February 2008.” In Europe, this would be abbreviated as “24/02/08.” This isn’t so bad, because there’s obviously no “month 24.” However, if you write 02/08/08, you might be referring to either February 8th or August 2nd. Entirely different seasons demand entirely different wardrobes!
The official standard, ISO 8601 [6], standardizes the ordering in a rational way. Thus, our default suggested format is ISO 8601-happy yyyy-mm-dd hh:mm:ss, or 2008-02-23 12:34:45. However, we leave the determination up to the user. We think with millions of points of data, they’ll use a more compact format. 🙂
Sources:
- [1] Dr. Henry Heasler, Park Geologist, Yellowstone National Park Service.
- [2] Gregorian Calendar, Wolfram.
- [3] Iranian Calendar, Wolfram.
- [4] Calendar comparisons, Kelly Ross.
- [5] Preserving the French Republican Calendar, Alan Taylor.
- [6] ISO 8601 simplified.
- [7] Office of Foreign Asset Control, US Department of Treasury.