Third-party data
Trust but verify — validate external timestamps against an expected format and fail loudly.
External APIs, feeds, and data imports do not share your assumptions about time. A field called created_at might hold UTC epoch milliseconds from one provider, a local-time ISO 8601 string with no offset from another, and a Unix timestamp in seconds from a third. Even within a single provider, formats can change across API versions or differ between endpoints.
The safe approach: read the documentation, pick an expected format, and validate every timestamp field against it on ingestion. If the value does not match — wrong format, wrong units, suspiciously out-of-range — fail loudly with a clear error rather than silently continuing with a mis-parsed value. A corrupt timestamp that slips through will cause confusion far from the ingestion point.
Pin your assumptions in code, not just in comments. A parsing function named parse_epoch_ms(value) is a contract; a comment that says “this field is epoch ms” can silently go stale.
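As a minimal sketch of what such a contract could look like, the function below accepts only integer milliseconds inside a plausibility window and raises otherwise. The bounds MIN_EPOCH_MS and MAX_EPOCH_MS are illustrative assumptions, not values from any provider; choose limits that fit your own data.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed plausibility window; tune these bounds to your own data.
MIN_EPOCH_MS = 946_684_800_000    # 2000-01-01T00:00:00Z
MAX_EPOCH_MS = 4_102_444_800_000  # 2100-01-01T00:00:00Z


def parse_epoch_ms(value: object) -> datetime:
    """Parse integer milliseconds since the Unix epoch, or raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"expected integer epoch milliseconds, got {value!r}")
    if not MIN_EPOCH_MS <= value <= MAX_EPOCH_MS:
        raise ValueError(
            f"epoch milliseconds {value} outside expected range "
            f"[{MIN_EPOCH_MS}, {MAX_EPOCH_MS}]; possible unit mismatch"
        )
    # Integer timedelta arithmetic avoids float rounding of the fraction.
    return EPOCH + timedelta(milliseconds=value)
```

With these bounds, a seconds value such as 1_700_000_000 is rejected instead of being read as a date in January 1970.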
Common traps with external timestamps
The two most common surprises are unit confusion and missing offsets. Epoch timestamps arrive as either seconds or milliseconds since 1970-01-01T00:00:00Z, and both look like plausible integers. A millisecond value interpreted as seconds lands roughly 54,000 years in the future; a seconds value interpreted as milliseconds collapses to about three weeks after the epoch (around 1970-01-21). Both failures are silent without validation.
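The arithmetic behind both failures can be checked directly. This is only an illustration using the sample value 1_700_000_000; the variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year

seconds_value = 1_700_000_000        # 2023-11-14T22:13:20Z, in seconds
millis_value = seconds_value * 1000  # the same instant, in milliseconds

# Seconds mistaken for milliseconds: the result lands in January 1970.
too_early = EPOCH + timedelta(milliseconds=seconds_value)
print(too_early.isoformat())  # 1970-01-20T16:13:20+00:00

# Milliseconds mistaken for seconds: the result is beyond year 9999, which
# Python's datetime cannot represent, so estimate the distance in years.
years_after_epoch = millis_value / SECONDS_PER_YEAR
print(f"about {years_after_epoch:,.0f} years after 1970")
```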
Missing or ambiguous offsets are equally dangerous. An ISO 8601 string with no Z or ±HH:MM suffix does not identify a point in time, yet many parsers will happily treat it as local time or UTC depending on the runtime’s default.
Pitfall: Treating an offset-free string from an external API (e.g. "2024-03-15T10:30:00") as UTC. The provider may have intended local time in an unstated zone. The value will parse without error and produce silently wrong results — typically off by the provider’s server timezone offset.
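One way to avoid this pitfall is to refuse offset-free strings outright. The sketch below uses Python's standard datetime.fromisoformat; the function name parse_iso_with_offset is hypothetical, and the policy of rejecting rather than guessing is an assumption you may want to relax if a provider documents its zone.

```python
from datetime import datetime, timezone


def parse_iso_with_offset(text: str) -> datetime:
    """Parse an ISO 8601 string, rejecting values without an explicit offset."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so map it to an explicit +00:00 first.
    normalized = text[:-1] + "+00:00" if text.endswith("Z") else text
    parsed = datetime.fromisoformat(normalized)  # ValueError on a bad format
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp {text!r} has no UTC offset; refusing to guess")
    # Normalize to UTC so downstream code sees a single representation.
    return parsed.astimezone(timezone.utc)
```

Here "2024-03-15T10:30:00+02:00" is accepted and converted to 08:30 UTC, while "2024-03-15T10:30:00" raises.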
Go deeper: epoch units, schema pinning, and drift over time
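Detecting seconds vs milliseconds. A simple heuristic: a Unix timestamp in seconds is currently ~1.7 × 10⁹; in milliseconds it is ~1.7 × 10¹². If the value is larger than, say, 10¹¹, it is almost certainly milliseconds. The heuristic breaks down for millisecond timestamps earlier than March 1973, which fall below 10¹¹, so use it only where such dates cannot occur. Codify this check rather than relying on developer memory.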
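A possible way to codify the heuristic is to use it as a guard against the unit you have pinned, rather than as a silent auto-converter. The names guess_epoch_unit and require_epoch_unit are hypothetical; the threshold is the 10¹¹ figure mentioned above.

```python
MS_THRESHOLD = 10**11  # values above this are treated as milliseconds


def guess_epoch_unit(value: int) -> str:
    """Guess "s" or "ms" from the magnitude of a non-negative epoch value."""
    if value < 0:
        raise ValueError(f"negative epoch value {value} is not supported here")
    return "ms" if value > MS_THRESHOLD else "s"


def require_epoch_unit(value: int, expected: str) -> int:
    """Return value unchanged if its magnitude matches the expected unit."""
    if expected not in ("s", "ms"):
        raise ValueError(f"unknown unit {expected!r}")
    guessed = guess_epoch_unit(value)
    if guessed != expected:
        raise ValueError(
            f"epoch value {value} looks like {guessed!r}, expected {expected!r}"
        )
    return value
```

Raising on a mismatch keeps the failure at the ingestion point instead of converting a value the provider may have sent in error.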
Schema pinning. When an external schema is available (OpenAPI, JSON Schema, Protobuf), use it to generate typed, validated deserialization code. When it is not, write your own thin adapter layer that normalizes external timestamps to your internal representation (preferably RFC 3339 UTC) immediately on ingestion. Nothing downstream should ever see the raw external format.
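A thin adapter layer could be sketched as follows. The provider names "provider_a" and "provider_b", their formats, and all function names are invented for illustration; a real adapter would take each format from the provider's documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _require_int(raw: object) -> int:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"expected an integer epoch value, got {raw!r}")
    return raw


def _from_epoch_seconds(raw: object) -> datetime:
    return EPOCH + timedelta(seconds=_require_int(raw))


def _from_epoch_ms(raw: object) -> datetime:
    return EPOCH + timedelta(milliseconds=_require_int(raw))


# Hypothetical providers, each pinned to one documented format.
PARSERS: Dict[str, Callable[[object], datetime]] = {
    "provider_a": _from_epoch_ms,
    "provider_b": _from_epoch_seconds,
}


def normalize_created_at(provider: str, raw: object) -> str:
    """Convert a provider-specific timestamp to an RFC 3339 UTC string."""
    parser = PARSERS.get(provider)
    if parser is None:
        raise ValueError(f"no timestamp format pinned for provider {provider!r}")
    utc = parser(raw).astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

For example, normalize_created_at("provider_a", 1_700_000_000_123) yields "2023-11-14T22:13:20.123Z", and an unlisted provider raises rather than falling back to a default format.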
Drift over time. External formats change without notice. A provider that sends seconds today may switch to milliseconds in a new API version. Build monitoring or tests that detect when ingested timestamps are unreasonably far from “now”. For a feed of recent events, for example, timestamps more than a year in the past or future are more likely a unit or format mismatch than real data.
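A plausibility check of this kind might look like the sketch below. The one-year window mirrors the example above and is an assumption, as is the function name is_plausible; feeds that legitimately carry historical or scheduled data need a wider window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_DRIFT = timedelta(days=365)  # assumed window; adjust per feed


def is_plausible(
    ts: datetime,
    now: Optional[datetime] = None,
    max_drift: timedelta = MAX_DRIFT,
) -> bool:
    """Return True if ts lies within max_drift of now (both timezone-aware)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    if now is None:
        now = datetime.now(timezone.utc)
    return abs(ts - now) <= max_drift
```

A monitor could count the share of ingested values for which is_plausible returns False and alert when that share jumps, which is what a sudden unit change would look like.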
See also. For the analogous problem with user-entered dates, see Parsing user input. For how epoch formats are defined, see Unix time.