The network is not invisible
- Insurmountable physical limits: Light travels roughly a third slower in glass fiber than in a vacuum, imposing an unbreakable minimum latency floor.
- The serialization tax: Translating in-memory data structures to textual formats like JSON consumes CPU and adds significant latency compared to binary protocols.
- The distributed N+1 anti-pattern: Designing chatty rather than transactional APIs across microservices multiplies network round trips, so latency grows linearly with the number of calls.
In local development, the network is an abstract concept. Everything responds in fractions of a millisecond. We make local database queries in Docker containers and connect services via local sockets with the illusion that transport is free. However, the moment our software is deployed to production—across multiple availability zones, cloud regions, or microservice architectures—physical reality reclaims its place.
The network is not transparent. It has costs, physical limits dictated by relativistic physics, and an economic penalty in terms of computational capacity. Ignoring these truths is the most common cause of performance degradation and catastrophic failures in modern systems.
Transparent Network Bias and False Expectations
In 1994, L. Peter Deutsch and other engineers at Sun Microsystems formulated the famous fallacies of distributed computing. The second of these, and arguably the most destructive for performance, is: “Latency is zero”.
When a monolithic system is broken down into microservices, in-memory function calls (which take nanoseconds) are replaced by TCP/IP network requests. Even within the same AWS region, a network round-trip time (RTT) between availability zones is commonly on the order of 1 to 2 milliseconds. If resolving a single user transaction requires chaining five sequential calls to independent microservices at 2 ms each, we have introduced a base delay of 10 milliseconds of pure network transit, without counting CPU execution or database queries.
This phenomenon gets much worse when microservices communicate chaotically or without a clear aggregation model, leading to the distributed N+1 anti-pattern. If we make an HTTP call for each product to get its inventory status when rendering a list of 50 products on a page, we perform 50 round-trips. Made sequentially at 2 ms per trip, they add up to 100 ms of network transit alone, a delay the user will notice before any real work has been done.
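The arithmetic above can be sketched with a small simulation. This is only an illustration: `InventoryClient`, `get_one`, and `get_many` are hypothetical names, no real network call is made, and the 2 ms round trip is the figure assumed in the text.

```python
from dataclasses import dataclass

RTT_MS = 2.0  # assumed round-trip time per call, taken from the text


@dataclass
class InventoryClient:
    """Hypothetical client that only counts simulated round trips."""
    stock: dict[str, int]
    round_trips: int = 0

    def get_one(self, product_id: str) -> int:
        self.round_trips += 1  # one simulated HTTP call per product
        return self.stock[product_id]

    def get_many(self, product_ids: list[str]) -> dict[str, int]:
        self.round_trips += 1  # one simulated HTTP call for the whole list
        return {pid: self.stock[pid] for pid in product_ids}


product_ids = [f"sku-{i}" for i in range(50)]
stock = {pid: i for i, pid in enumerate(product_ids)}

# N+1 style: one call per product on the page.
n_plus_one = InventoryClient(stock)
per_item = {pid: n_plus_one.get_one(pid) for pid in product_ids}

# Batched style: one call carrying all 50 identifiers.
batched = InventoryClient(stock)
in_bulk = batched.get_many(product_ids)

n_plus_one_ms = n_plus_one.round_trips * RTT_MS
batched_ms = batched.round_trips * RTT_MS
print(f"N+1: {n_plus_one.round_trips} round trips, {n_plus_one_ms} ms")
print(f"batched: {batched.round_trips} round trip, {batched_ms} ms")
```

Both clients return the same data; only the number of round trips differs, which is the whole cost of the anti-pattern.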
The Physics of Fiber Optics: Why the Speed of Light Doesn’t Scale
It is often assumed that with better cables or more bandwidth, latency will eventually disappear. More bandwidth increases how much data moves per second, not how quickly each bit arrives, and the classic latency numbers every programmer should know illustrate the limit. Expecting latency to vanish runs directly against Einstein’s theory of special relativity.
The speed of light in a vacuum is approximately \(c \approx 299{,}792 \text{ km/s}\). However, network signals do not travel in a vacuum; they travel through glass fiber optic cables. The refractive index of glass (\(n \approx 1.5\)) slows down the speed of light propagation to about two-thirds of its speed in a vacuum:
\[v = \frac{c}{n} \approx 200{,}000 \text{ km/s}\]
This means that, by pure materials physics, the optical signal takes 1 millisecond for every 200 kilometers of glass fiber cable traveled.
The geographical straight-line distance between Paris and New York is about 5,800 kilometers, but the actual undersea cabling route is considerably longer due to marine geography, estimated at roughly 6,500 kilometers.
- The optical signal’s one-way trip will take: \(6{,}500 \text{ km} / 200{,}000 \text{ km/s} = 32.5 \text{ ms}\).
- The theoretical absolute minimum Round-Trip Time (RTT) is 65 milliseconds.
- In practice, due to queuing and processing in intermediate routers and switches, optical amplification, and optoelectronic conversion, the actual RTT ranges between 75 and 90 milliseconds.
No matter what processor you use, what trendy framework you choose, or how much you optimize your code: a synchronous request between Paris and New York cannot complete in less than about 65 ms, and in practice takes 75 ms or more, because of the fundamental laws of our universe.
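The calculation above generalizes to any route length. A minimal sketch, assuming the same refractive index of 1.5 and the 6,500 km route estimate from the text; the function name `min_rtt_ms` is only illustrative:

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in a vacuum
REFRACTIVE_INDEX = 1.5       # approximate value for glass fiber, as in the text


def min_rtt_ms(route_km: float, n: float = REFRACTIVE_INDEX) -> float:
    """Lower bound on round-trip time over a fiber route of the given length."""
    speed_km_s = C_VACUUM_KM_S / n       # propagation speed inside the fiber
    one_way_s = route_km / speed_km_s    # time for a single traversal
    return 2 * one_way_s * 1000          # there and back, in milliseconds


paris_new_york_km = 6_500  # estimated cable route length from the text
print(f"{min_rtt_ms(paris_new_york_km):.1f} ms")  # prints "65.0 ms"
```

This bound covers propagation only. Router queuing, amplification, and protocol handshakes all add to it, which is why measured values sit higher.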
The Serialization Tax and CPU Overhead
Distance and routing are not the only culprits of network slowness. Every time we send data over a socket, it must be converted from its structured in-memory representation (objects, graphs) into a sequential byte stream. This is serialization.
In modern web development, JSON is the undisputed king because of its human readability. But this readability comes at a high computational price:
- String parsing: Converting numbers to text and vice versa (e.g., `12345.67` to `"12345.67"`) requires CPU-intensive division and formatting operations.
- Memory allocation: JSON parsers typically generate a massive number of temporary strings and objects on the heap, triggering frequent garbage collection cycles.
For high-throughput systems or internal microservice communications, the textual format of JSON becomes a severe performance bottleneck. This is where compact binary serialization protocols shine:
| Format | Type | CPU Overhead | Message Size | Schema Support |
|---|---|---|---|---|
| JSON | Textual | High | Large (repetitive) | No (dynamic) |
| Protobuf | Binary | Very Low | Very Small | Yes (strict) |
| Avro | Binary | Low | Small | Yes (dynamic) |
As Martin Kleppmann outlines in Designing Data-Intensive Applications, formats such as Protocol Buffers (Protobuf), which gRPC commonly uses as its wire format, encode data into a compact binary representation that is cheap to parse (for example, using varint encoding for integers). This drastically reduces both CPU overhead and payload size, thereby lowering transmission times.
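To make the varint idea concrete, here is a minimal sketch of base-128 variable-length encoding for non-negative integers, compared against the same value inside a small JSON document. It is a hand-written illustration, not the Protobuf library: a real Protobuf field also carries a tag in front of the value, and the `user_id` field name is an assumption made for the example.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


user_id = 300
as_varint = encode_varint(user_id)
as_json = json.dumps({"user_id": user_id}).encode("utf-8")
print(as_varint.hex(), len(as_varint), len(as_json))  # prints "ac02 2 16"
```

Small integers, which dominate most payloads, fit in one or two bytes, and decoding needs only shifts and masks rather than text parsing.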
Pragmatic Mitigation Strategies
Since we cannot change the speed of light or eliminate the network, software architecture must adapt to mitigate its effects:
1. Aggregated and Transactional APIs (Batching)
Instead of granular interfaces that force the client to make multiple requests to gather information, design APIs tailored to the specific use case. Using patterns like Backend for Frontend (BFF) or technologies like GraphQL/gRPC with complex query support helps consolidate multiple network calls into one.
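A BFF-style aggregation endpoint might look like the sketch below. Everything in it is hypothetical: the three `fetch_*` functions stand in for calls to separate backend services, `asyncio.sleep` simulates the assumed 2 ms internal round trip, and `product_page` is the single endpoint the client would call.

```python
import asyncio

BACKEND_DELAY_S = 0.002  # assumed 2 ms per internal call, as in the text


async def fetch_product(product_id: str) -> dict:
    await asyncio.sleep(BACKEND_DELAY_S)  # stands in for a network call
    return {"id": product_id, "name": "Example product"}


async def fetch_inventory(product_id: str) -> dict:
    await asyncio.sleep(BACKEND_DELAY_S)
    return {"in_stock": 7}


async def fetch_reviews(product_id: str) -> dict:
    await asyncio.sleep(BACKEND_DELAY_S)
    return {"rating": 4.5}


async def product_page(product_id: str) -> dict:
    """Single client-facing endpoint that aggregates three backend calls."""
    # The three calls are independent, so they run concurrently.
    product, inventory, reviews = await asyncio.gather(
        fetch_product(product_id),
        fetch_inventory(product_id),
        fetch_reviews(product_id),
    )
    return {**product, **inventory, **reviews}


page = asyncio.run(product_page("sku-42"))
print(page)
```

The client pays for one long-distance round trip instead of three, and the fan-out happens inside the data center where round trips are cheap.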
2. Asynchronous Decoupling
The best network call is the one that does not have to be made synchronously. If your system can process tasks in the background via message queues (Kafka, RabbitMQ) or event-driven patterns, the latency of downstream calls disappears from the user’s perspective: the initial request still costs one round trip, but it is acknowledged immediately and the heavy processing happens later.
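The acknowledge-now, process-later shape can be sketched with the standard library alone. Here an in-process `queue.Queue` and a worker thread stand in for a real broker such as Kafka or RabbitMQ; `handle_request` and the `order_id` field are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()        # in-process stand-in for a message broker
processed: list[str] = []   # records what the background worker handled


def worker() -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        processed.append(job["order_id"])  # the slow work would happen here
        jobs.task_done()


def handle_request(order_id: str) -> dict:
    """Enqueue the work and acknowledge without waiting for it."""
    jobs.put({"order_id": order_id})
    return {"status": "accepted", "order_id": order_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_request("order-1")  # returns before the job is processed
jobs.join()                           # wait here only to make the demo deterministic
jobs.put(None)                        # tell the worker to stop
thread.join()
print(response, processed)
```

The trade-off is that the caller receives an acknowledgement rather than a result, so the design only fits operations whose outcome can be delivered later.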
3. Co-location of Services
When low-latency synchronous calls are critical, physically group the services. Use the same geographic region and even the same Availability Zones for components that converse heavily. If two services have a strict temporal coupling, they should probably belong to the same monolithic process rather than separate distributed microservices.
Conclusion
Designing distributed systems requires a change in mindset: the network must be treated as a hostile, limited, and intrinsically slow resource. Optimizing internal code to save microseconds achieves little if we then introduce milliseconds of latency through poor API design or inefficient serialization. The physics of software teaches us that to build robust, high-performance systems, we must design our architectures with respect for the speed of light.



