u/No-Resolution-4054

▲ 0 r/docker

What is Docker Compose and Volumes? What problem do they solve?

I am still learning, so if I am wrong anywhere or if there is something important that I should know, please let me know.

In a real application, we usually have multiple containers like a frontend, backend, database, Redis, etc.

Managing all these containers manually is very difficult. Also, Docker images are immutable, so a code change normally means rebuilding the image and recreating the container, and we don't want to do that on every edit during development.

This is where docker-compose.yml comes in.

It lets us define everything in one file. We can define which images to build, which ports to expose, environment variables, volumes, networks, and much more.

Then we can start the entire application with just one command:

docker compose up

and stop everything with:

docker compose down

One thing that confused me a lot was volumes.

Let's say I have a folder named backend on my system, and Docker builds an image where all the code is copied into /app.

Then in docker-compose.yml I write:

volumes:
- ./backend:/app

From what I understood, this bind mount hides (overrides) the /app folder inside the container and mounts my local ./backend folder there instead.

So now the container reads the files directly from my local machine instead of the files that were copied into the image. This is great because whenever I edit my code, I don't have to rebuild the image.

But this creates another problem.

Since the entire backend folder is mounted, it also mounts my local node_modules.

That is not what we want because my local machine could be Windows or macOS, while the container is running Linux. Packages with native (compiled) parts are built for the operating system they were installed on, so using the host's node_modules inside a Linux container can cause issues.

This is where a named volume comes in.

We add another volume:

volumes:
- ./backend:/app
- backend_node_modules:/app/node_modules

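Here, backend_node_modules is just the name of a Docker-managed volume. It also has to be declared under the top-level volumes: key in docker-compose.yml, otherwise Compose refuses to start the service.

If this named volume doesn't already exist, Docker creates it. Since the volume is initially empty, Docker copies the existing /app/node_modules from the image into the named volume. This copy only happens while the volume is empty, so after changing dependencies and rebuilding the image, the volume still holds the old node_modules until it is removed (for example with docker compose down -v).

Now this named volume is mounted at /app/node_modules. Since we already mounted ./backend:/app, the container was using the node_modules from my local Windows/macOS machine. This new mount hides those host node_modules and replaces them with the node_modules stored in the backend_node_modules named volume, which contains the Linux dependencies copied from the image.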

So the result is:

My application code comes directly from my local machine, so changes are reflected instantly.

node_modules comes from the Linux container, so I don't have operating system compatibility issues.

reddit.com
u/No-Resolution-4054 — 3 hours ago
▲ 0 r/webdev

Wanna share my understanding of Docker and would love to hear from seniors if I'm missing something

Docker solves a problem we've all heard about:

"It works on my machine."

Why does that happen?

Because of the environment.

My PC might be running Windows, yours might be Linux or macOS. Every operating system has different binaries, packages may install differently, and even two Windows machines can behave differently because of different Node.js versions or packages that only work with specific versions.

So... what's the solution?

Docker.

Docker solves this by packaging everything needed to run your application into a single environment.

But what does that actually mean?

Let's first understand Images.

I like to think of a Docker image as a mini PC that already contains everything your application needs:

  • A specific Node.js version
  • An Alpine Linux OS
  • All your dependencies (node_modules)
  • Your application code

Now, what is Alpine Linux?

It's a very small and lightweight Linux distribution. It doesn't include unnecessary tools or a graphical interface; it's designed to use very little memory and run applications efficiently.

One important thing I learned is that images are immutable. That means if you change your application code, the image doesn't magically update. You need to rebuild the image and then run a new container from it.

So now you have one consistent environment that behaves the same everywhere.

But...

Where does this image actually run?

It runs inside a container.

A container is basically a running instance of an image.

It has:

  • Its own filesystem
  • Its own processes
  • Its own network interfaces
  • Its own isolated environment

You can think of it as a mini computer running inside your computer.

Because it's isolated, if your app listens on port 3000 inside the container, your browser can't access it directly.

That's why we map the container's port to a port on our host machine:

docker run -p <host_port>:<container_port> <image_name>

For example:

docker run -p 3000:3000 my-app

Now visiting localhost:3000 on your machine forwards the request to port 3000 inside the container.

This is my current understanding after learning Docker. I'm still exploring volumes, networks, Compose, Kubernetes, etc.

If I misunderstood anything I'd appreciate corrections from people who use Docker in production. Always happy to learn!

reddit.com
u/No-Resolution-4054 — 1 day ago
▲ 0 r/webdev

I was building a service like Lovable and whenever I opened the preview URL, the page kept reloading after some time

I was building a service like Lovable where every user gets their own React app running inside a sandbox.

Everything was working fine except one weird issue.

Whenever I opened the preview URL, the page kept reloading after some time. After debugging for hours, I found that the problem was Vite's HMR (Hot Module Replacement), which talks to the dev server over a WebSocket.

I already knew that a WebSocket connection starts as a normal HTTP request with an Upgrade header, asking the server to switch from HTTP to the WebSocket protocol.

The request flow in my setup was:

Preview URL -> Ingress -> Router -> Preview Service -> Pod

A normal HTTP request worked perfectly because the router knew where to forward it.

But when the browser sent the Upgrade request for WebSocket, the router got confused.

Why?

Because every sandbox has its own pod, and the router didn't know which pod should receive the WebSocket upgrade request. Unlike a normal React app (where there's only one backend), my router had to choose between many different sandbox pods.

So I had to manually handle the upgrade event:

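// Node's http.Server emits "upgrade" for requests that ask to switch protocols.
server.on("upgrade", (req, socket, head) => {
  const host = req.headers.host;
  if (!host) return socket.destroy();

  // The first label of the hostname is the sandbox ID.
  const sandboxId = host.split(".")[0];

  // getProxy is my own helper that returns the proxy for that sandbox's pod.
  const proxy = getProxy(sandboxId);
  if (!proxy) return socket.destroy(); // unknown sandbox: close the socket
  proxy.upgrade(req, socket, head);
});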

Now, using the sandbox ID from the hostname, I can proxy the WebSocket upgrade to the correct pod, and HMR works perfectly.

One thing I was initially confused about was:

Why do we need to manually handle server.on("upgrade")?

The answer is that the browser only asks to upgrade the connection. My custom router is the component that has to decide where that upgraded connection should go. If I don't handle the upgrade event, the router treats it like a normal HTTP server and the WebSocket handshake never reaches the correct sandbox.

reddit.com
u/No-Resolution-4054 — 9 days ago


Difference between Ingress and API Gateway (at first I thought they were basically the same thing)

First, what is Ingress?

In Kubernetes, services are usually internal and run inside a cluster. You can think of the cluster as a private network that the outside internet cannot directly access. But we still need a way to expose some services to users. That's where Ingress comes in.

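Ingress takes requests from the internet and routes them to the correct service inside the cluster based on rules such as hostnames and paths. E.g.:

/auth --> auth-service
/orders --> order-service

Its main job is routing traffic into the cluster. The Ingress resource only declares the rules; an Ingress controller (for example ingress-nginx) is what actually applies them.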

Now what is an API Gateway?

It also feels very similar because it acts as a central entry point between clients and microservices. It receives incoming requests, verifies them, and routes them to the correct services.

So how are they different?

Ingress mainly focuses on: HTTP/HTTPS routing, Path-based routing, TLS termination

An API Gateway can do all of that, but it usually provides many additional features such as:

Authentication
Authorization
Rate limiting
API keys
Request transformation
Response transformation
Logging
Analytics
Caching
Load balancing

So an API Gateway is not just routing traffic, it is also enforcing API policies.

Another question I had was: "If we are already using Kubernetes and have Ingress, do we still need an API Gateway?"

From what I learned, the answer is it depends.

For small projects, startups, or simple architectures, Ingress alone is often enough.
But large companies with 100+ microservices may use both.

In that setup:
API Gateway handles things like:

Verify JWT
Check rate limits
Log requests
Add headers
Apply API policies

And Ingress mainly handles routing traffic to the correct service inside the cluster.

If I'm missing anything or you have any corrections, please let me know.

reddit.com
u/No-Resolution-4054 — 17 days ago
▲ 27 r/webdev

Recently I studied Kafka and wanted to share my understanding.

Kafka is used for handling messages/events between different services.

Here's how I understand it:

  1. A Producer sends an event/message to Kafka.
  2. The message contains things like Topic, Key-Value data, and Timestamp.
  3. Kafka stores these messages in Brokers (Kafka servers).
  4. Topics can be divided into multiple Partitions.
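  5. Each partition has one Leader and zero or more Followers (Replicas), depending on the topic's replication factor.
  6. By default, all read and write operations go through the Leader, while Followers copy its data and one of them takes over if the Leader's broker fails.

Now Kafka does not delete messages after they are consumed, unlike many traditional queues. It keeps them for as long as the topic's retention settings (time or size based) allow.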

There is a term called Offsets. You can think of an offset like the index of a message inside a partition.

For example:

A user places an order → payment is processed → email is sent → analytics service processes the event.

Suppose the analytics service goes down partway through. Each consumer group commits the offset it has processed up to, and Kafka stores that committed offset. When the service comes back up, it can continue from that offset instead of starting from the beginning.

This is also one reason why Kafka keeps messages for some time after consumption.
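To make the offset idea concrete, here is a tiny in-memory sketch. It is not a real Kafka client: the Partition class, the group names, and the messages are all made up for illustration, and it only models one partition with per-group committed offsets.

```javascript
// Toy model of a single partition; hypothetical names, not Kafka's API.
class Partition {
  constructor() {
    this.log = [];              // append-only list of messages
    this.committed = new Map(); // consumer group -> next offset to read
  }

  // Append a message and return its offset (its index in the log).
  append(message) {
    this.log.push(message);
    return this.log.length - 1;
  }

  // Return up to `max` messages starting at the group's committed offset.
  poll(group, max = 10) {
    const start = this.committed.get(group) ?? 0;
    return this.log
      .slice(start, start + max)
      .map((value, i) => ({ offset: start + i, value }));
  }

  // Record that everything up to and including `offset` has been processed.
  commit(group, offset) {
    this.committed.set(group, offset + 1);
  }
}

const orders = new Partition();
["order-1", "order-2", "order-3"].forEach((m) => orders.append(m));

// The analytics service processes one message, commits, then "crashes".
const firstBatch = orders.poll("analytics", 1);
orders.commit("analytics", firstBatch[0].offset);

// After the restart it resumes at offset 1; nothing was deleted meanwhile.
const afterRestart = orders.poll("analytics");
console.log(afterRestart.map((m) => m.offset)); // [ 1, 2 ]

// A different consumer group still reads the whole log from offset 0.
const emailBatch = orders.poll("email");
console.log(emailBatch.length); // 3
```

The point of the sketch is that consuming only moves a per-group pointer. The log itself stays untouched, which is why several services can read the same events independently.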

Any corrections? Is there anything else I should know about this topic? Please let me know.

reddit.com
u/No-Resolution-4054 — 27 days ago

▲ 6 r/webdev

Understanding Consistent Hashing: Correct Me If I'm Wrong

Why do we need it?

Suppose we have multiple databases and want to distribute data among them.

Instead of searching every database when we need some data, we use a rule that tells us exactly which database should store a particular record.

Method 1: Modulo Based Distribution

Imagine we have 3 databases: DB-A, DB-B, and DB-C. Each record has a unique ID, and we pick the database using ID % Number_of_Databases. For example, 16 % 3 = 1, so record 16 goes to database index 1 (DB-B).

This works fine until we add another database. The same record becomes 16 % 4 = 0, so record 16 should now be stored in DB-A instead of DB-B. The problem is that when the number of databases changes, a huge amount of data gets remapped to different databases. This can cause massive data migration and increased CPU and network usage.
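To get a feel for how much data moves, here is a small sketch that counts how many IDs land on a different database when going from 3 to 4 databases. The function name and the choice of 12000 sequential IDs are just assumptions for the example.

```javascript
// Fraction of record IDs whose database index changes when the number of
// databases goes from oldCount to newCount under ID % count placement.
function movedFraction(totalRecords, oldCount, newCount) {
  let moved = 0;
  for (let id = 0; id < totalRecords; id++) {
    if (id % oldCount !== id % newCount) moved++;
  }
  return moved / totalRecords;
}

console.log(16 % 3, 16 % 4);             // 1 0
console.log(movedFraction(12000, 3, 4)); // 0.75
```

So with sequential IDs, about three quarters of the records would have to move just because one database was added.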

Method 2: Consistent Hashing

Instead of using modulo, imagine a circular ring numbered from 0 to 99. We place our databases on the ring:

DB-A -> 0
DB-B -> 25
DB-C -> 50
DB-D -> 75

Now we pass the record's unique ID through a hash function, which gives the position of that record on the ring. For example, for User ID = 12345, hash(12345) = 42. From that position we move clockwise and store the data in the first database we encounter. So the data at position 42 is stored in DB-C (position 50).

Now What Happens When We Add a New Database?

Suppose we add DB-E -> 37. Now only the data between positions 26 and 37 needs to move from DB-C to DB-E. The rest of the data stays exactly where it was. This is the biggest advantage of consistent hashing: much less data migration, easier scaling, and lower operational cost.
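Here is a minimal sketch of the ring lookup using the same positions as above. It assumes the hash function has already turned the ID into a position from 0 to 99, and the helper names (buildRing, lookup) are my own.

```javascript
// Turn { node: position } into a list sorted by ring position.
function buildRing(nodes) {
  return Object.entries(nodes)
    .map(([node, position]) => ({ node, position }))
    .sort((a, b) => a.position - b.position);
}

// Walk clockwise: the owner is the first node at or after the position.
function lookup(ring, position) {
  const owner = ring.find((entry) => entry.position >= position);
  return (owner ?? ring[0]).node; // past the last node, wrap to the first
}

const before = buildRing({ "DB-A": 0, "DB-B": 25, "DB-C": 50, "DB-D": 75 });
const after = buildRing({ "DB-A": 0, "DB-B": 25, "DB-C": 50, "DB-D": 75, "DB-E": 37 });

console.log(lookup(before, 42)); // DB-C
console.log(lookup(before, 90)); // DB-A (wraps around the ring)

// Collect every position whose owner changed after adding DB-E.
const moved = [];
for (let p = 0; p < 100; p++) {
  if (lookup(before, p) !== lookup(after, p)) moved.push(p);
}
console.log(moved[0], moved[moved.length - 1], moved.length); // 26 37 12
```

Only 12 of the 100 positions change owner here. A real implementation would use a proper hash function and a binary search over the sorted positions instead of a linear scan.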

There is one more thing in this method: Virtual Nodes

One issue is that with only a few positions on the ring, some databases can end up owning a much larger share of the ring (and so receive much more traffic) than others. To balance the load, the same database can appear multiple times on the ring.

DB-A -> 0, 40, 80
DB-B -> 25, 65
DB-C -> 13, 50, 90
DB-D -> 75

These extra positions are called virtual nodes.
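As a rough sketch, the lookup stays the same with virtual nodes; each database just contributes several ring entries. The names below (placements, vnodeRing, ownerOf) are made up, and the share count simply walks all 100 positions from the example.

```javascript
// Virtual node positions from the example above.
const placements = {
  "DB-A": [0, 40, 80],
  "DB-B": [25, 65],
  "DB-C": [13, 50, 90],
  "DB-D": [75],
};

// One ring entry per virtual node, sorted by position.
const vnodeRing = Object.entries(placements)
  .flatMap(([node, positions]) => positions.map((position) => ({ node, position })))
  .sort((a, b) => a.position - b.position);

// Same clockwise rule as before: first entry at or after the position.
function ownerOf(position) {
  const entry = vnodeRing.find((e) => e.position >= position);
  return (entry ?? vnodeRing[0]).node;
}

// Count how many of the 100 ring positions each database owns.
const share = {};
for (let p = 0; p < 100; p++) {
  const node = ownerOf(p);
  share[node] = (share[node] ?? 0) + 1;
}
console.log(share);
```

With these positions, DB-A, DB-B, and DB-C each own roughly 30% of the ring while DB-D, which has a single position, owns 10%. Giving a database more virtual nodes gives it a larger share.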

Any corrections? Is there anything else I should know about this topic? Please let me know.

reddit.com
u/No-Resolution-4054 — 30 days ago
▲ 0 r/webdev

Studied how the News Feed works in Instagram and other social media platforms.

One important concept I learned is Fanout, which is basically how posts are distributed to users' feeds.

  1. Fanout Push

When a user creates a post, the system immediately pushes that post to the feed cache of all followers.
This is very fast because the feed is already prepared when users open the app.

  2. Fanout Pull

Instead of precomputing feeds, the system generates the feed when a user opens the application by fetching posts from accounts they follow.
It saves storage and avoids unnecessary work for accounts with huge follower counts, but opening the feed is slower because the work happens at read time.

Real systems use a Hybrid Approach

For normal users with a few hundred followers, Fanout Push works well because the cost is manageable and feed loading is fast.

For celebrities like Virat Kohli with 250M+ followers, pushing every post to every follower's feed cache would be extremely expensive. Many followers may not even open the app, so a lot of storage and compute would be wasted. That's why large scale systems often use Fanout Pull (or a hybrid approach) for such accounts.

But How Does the Feed Know to Fetch Celebrity Posts?

A question I had was:

If my normal friends' posts are already present in my feed cache through Fanout Push, how does the system know that it should also fetch posts from celebrity accounts?

One possible approach is that the social graph stores metadata about accounts. Celebrity or high follower accounts can be marked differently. When a user opens the app, the Feed Service:

  1. Loads the feed generated through Fanout Push.
  2. Checks the accounts the user follows in the Social Graph.
  3. Identifies celebrity accounts that use Fanout Pull.
  4. Fetches their latest posts separately.
  5. Merges both results and then applies recommendation algorithms before returning the final feed.
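The steps above can be sketched with plain in-memory maps. Everything here is a stand-in: the maps replace the feed cache, social graph, and post database, the account names are invented, and a newest-first sort takes the place of the recommendation step.

```javascript
const followers = new Map();     // authorId -> Set of follower ids
const following = new Map();     // userId -> Set of followed author ids
const postsByAuthor = new Map(); // authorId -> posts (stand-in for the DB)
const feedCache = new Map();     // userId -> posts pushed at write time
const celebrities = new Set();   // accounts handled with Fanout Pull

function follow(user, author) {
  if (!followers.has(author)) followers.set(author, new Set());
  followers.get(author).add(user);
  if (!following.has(user)) following.set(user, new Set());
  following.get(user).add(author);
}

function createPost(author, text, createdAt) {
  const post = { author, text, createdAt };
  if (!postsByAuthor.has(author)) postsByAuthor.set(author, []);
  postsByAuthor.get(author).push(post);
  if (celebrities.has(author)) return; // pull: nothing is pushed on write
  // Push: copy the post into every follower's feed cache.
  for (const user of followers.get(author) ?? []) {
    if (!feedCache.has(user)) feedCache.set(user, []);
    feedCache.get(user).push(post);
  }
}

function readFeed(user) {
  const pushed = feedCache.get(user) ?? [];
  // Pull: fetch posts only from followed accounts marked as celebrities.
  const pulled = [...(following.get(user) ?? [])]
    .filter((author) => celebrities.has(author))
    .flatMap((author) => postsByAuthor.get(author) ?? []);
  // Merge both sources; newest first stands in for ranking.
  return [...pushed, ...pulled].sort((a, b) => b.createdAt - a.createdAt);
}

celebrities.add("celebrity");
follow("me", "friend");
follow("me", "celebrity");
createPost("friend", "lunch", 1);
createPost("celebrity", "match day", 2);

console.log(readFeed("me").map((p) => p.author)); // [ 'celebrity', 'friend' ]
console.log((feedCache.get("me") ?? []).length);  // 1
```

Note that the celebrity post never enters the feed cache. It only shows up because readFeed checks the follow list at read time.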

Simplified Flow

User Creates Post
Post Service
Store in Database
Fanout Service
Check Social Graph & User Preferences (blocked users, muted users, close friends, etc.)
Create Fanout Tasks
Message Queue
Fanout Workers (Push or Pull Strategy)

I'm still learning system design, so if I've misunderstood anything or missed important concepts related to feed generation, please let me know in the comments.

reddit.com
u/No-Resolution-4054 — 1 month ago
▲ 14 r/webdev

Need honest guidance from seniors: where do I actually stand in tech right now?

I’m currently in my 2nd year of college. After completing 12th, I started learning development seriously and since then I’ve explored a lot of things.

So far I have:

Built MERN stack CRUD apps
Solved 250+ LeetCode questions in Java
Learned ML basics and the math behind it
Studied concepts like ANN and CNN
Made a few small ML projects
Started learning Blender and Three.js for 3D development
Created my own portfolio
Currently learning System Design

The problem is that I get impressed by new technologies very quickly. Whenever I see something cool, I start learning it. Because of that, I feel confused now. I don’t know what I should actually master.

Part of me wants to learn everything because I genuinely enjoy tech, but at the same time I’m scared about my future and earning money. Sometimes I feel like I know many things but I’m not exceptional at one thing yet.

I would really appreciate honest advice from seniors:

Am I going in the right direction?
Should I focus deeply on one field now?
How do I figure out what to master?
And how can I start earning while still learning?

Would really appreciate guidance from people who were once in the same situation.

reddit.com
u/No-Resolution-4054 — 2 months ago

Recently studied how push notifications work. Correct me if I’m wrong anywhere.

There are mainly 3 types of notifications:

  • SMS
  • Email
  • Push notifications (from apps)

Different providers handle them:

  • Email → Mailchimp, ActiveCampaign, etc.
  • SMS → Twilio, Sinch, etc.
  • Push notifications → Firebase Cloud Messaging (FCM)

From what I understood:

When we install an app and allow notification permission, the Firebase SDK inside the app gets an FCM registration token (device token) for that app install. The app sends this token to the backend, which stores it in the database.

Later, whenever the backend wants to send a notification, it sends a request to Firebase with that token, and Firebase knows which device should receive the notification.

I was confused about one thing:
“How do notifications still come even when the app is closed?”

Turns out Android uses Google Play Services in the background, which keeps a persistent connection with Firebase Cloud Messaging. So even if the app is killed, the OS/Play Services can still receive and display notifications. On iOS, FCM delivers through Apple's push service (APNs) instead.

Also, the token can change sometimes, like after uninstalling/reinstalling the app.

Typical flow looks something like:

Services → Authentication → Message Queue → Worker → Firebase → User

Pretty interesting system honestly

reddit.com
u/No-Resolution-4054 — 2 months ago

A Snowflake-style ID generator builds a unique 64 bit ID from different parts:

  • 1 bit = always 0 (sign bit, keeps the ID positive)
  • 41 bits = timestamp (in milliseconds)
  • 5 bits = datacenter ID
  • 5 bits = machine ID
  • 12 bits = sequence number

The 41 bit timestamp gives 2^41 milliseconds, which is roughly 69.7 years, counted from the epoch the system chooses. This is one limitation because the system eventually runs out of timestamp range.

With 5 bits for datacenter ID and 5 bits for machine ID:

  • 2^5 = 32 datacenters
  • Each datacenter can have 2^5 = 32 machines, so 32 × 32 = 1024 machines in total

The 12 bit sequence number helps generate multiple unique IDs in the same millisecond from the same machine: up to 2^12 = 4096 per millisecond.

Example:
If two IDs are generated on the same machine at the exact same millisecond, the sequence number increments to keep them unique.

Some limitations/problems I learned:

  • Clock synchronization issues between servers can create problems. If one machine’s clock moves backward, duplicate IDs may occur.
  • Misconfigured machines with the same datacenter ID and machine ID can also generate collisions.
  • Sequence overflow can happen if too many IDs are generated in a single millisecond on one machine.

Really interesting system design concept because it creates distributed unique IDs without depending on a central database.
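Here is a small sketch of how the bits could be packed together, using BigInt because the ID does not fit in a normal JavaScript number. The custom epoch (2020-01-01 UTC), the class name, and the injectable clock are all assumptions for the example. Waiting for the next millisecond on sequence overflow and throwing on a backwards clock are common ways to handle those two problems, not the only ones.

```javascript
// Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds.
const EPOCH = 1577836800000n;
const SEQUENCE_BITS = 12n;
const MACHINE_BITS = 5n;
const DATACENTER_BITS = 5n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

class SnowflakeGenerator {
  // `now` is injectable so the example can use a fixed clock.
  constructor(datacenterId, machineId, now = () => BigInt(Date.now())) {
    const inRange = (v) => Number.isInteger(v) && v >= 0 && v <= 31;
    if (!inRange(datacenterId) || !inRange(machineId)) {
      throw new RangeError("datacenterId and machineId must be 0..31");
    }
    this.datacenterId = BigInt(datacenterId);
    this.machineId = BigInt(machineId);
    this.now = now;
    this.lastTimestamp = -1n;
    this.sequence = 0n;
  }

  nextId() {
    let timestamp = this.now();
    if (timestamp < this.lastTimestamp) {
      // Going back in time could repeat an ID, so refuse instead.
      throw new Error("clock moved backwards");
    }
    if (timestamp === this.lastTimestamp) {
      this.sequence = (this.sequence + 1n) & MAX_SEQUENCE;
      if (this.sequence === 0n) {
        // Sequence overflow: busy-wait until the next millisecond.
        while ((timestamp = this.now()) <= this.lastTimestamp) {}
      }
    } else {
      this.sequence = 0n;
    }
    this.lastTimestamp = timestamp;
    return (
      ((timestamp - EPOCH) << (DATACENTER_BITS + MACHINE_BITS + SEQUENCE_BITS)) |
      (this.datacenterId << (MACHINE_BITS + SEQUENCE_BITS)) |
      (this.machineId << SEQUENCE_BITS) |
      this.sequence
    );
  }
}

// Fixed clock: both IDs share a millisecond, so only the sequence differs.
const generator = new SnowflakeGenerator(1, 2, () => EPOCH + 5n);
const firstId = generator.nextId();
const secondId = generator.nextId();
console.log(secondId - firstId); // 1n
```

Because the timestamp sits in the highest bits, IDs generated later sort after earlier ones, which is a nice side effect of this layout.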

If you know more about it or how companies solve these issues at scale, please share in the comments. Would love to learn more

reddit.com
u/No-Resolution-4054 — 2 months ago

Recently I learned about Rate Limiters in System Design and wanted to share my understanding.

A rate limiter controls how many requests a user/client can make in a given amount of time.

Example:

  • 5 requests per second
  • 100 requests per minute

Why do we need Rate Limiting?

Rate limiters help:

  • Protect servers from too many requests
  • Prevent spam and brute force attacks
  • Avoid API abuse
  • Control infrastructure cost
  • Maintain system stability during traffic spikes

How does it work?

Usually, we track requests using something fast like Redis.

We identify users through:

  • User ID
  • API Key
  • IP Address
  • Session ID

Whenever a request comes:

  1. Check how many requests the user already made
  2. Compare against the allowed limit
  3. Accept or reject the request

Algorithms I Studied

1. Token Bucket Algorithm

Imagine a bucket containing tokens.

  • Bucket has a fixed capacity
  • Tokens are refilled continuously (per second/minute)
  • Each request consumes one token
  • If tokens are available = request passes
  • If bucket is empty = request gets rejected

This algorithm is nice because it allows small bursts while still controlling the average traffic rate.
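A minimal single-process sketch of the token bucket looks like this. Time is passed in by hand (in milliseconds) so the behaviour is easy to follow; the class name and the numbers are only for illustration, and a real setup would usually keep this state somewhere shared like Redis.

```javascript
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = 0) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = nowMs;
  }

  // Returns true if the request may pass at time nowMs.
  allow(nowMs) {
    // Add the tokens earned since the last call, capped at capacity.
    const elapsedSeconds = (nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = nowMs;
    if (this.tokens < 1) return false; // bucket is empty: reject
    this.tokens -= 1;                  // each request consumes one token
    return true;
  }
}

const bucket = new TokenBucket(5, 1);
const burst = Array.from({ length: 6 }, () => bucket.allow(0));
console.log(burst);              // five true values, then false
console.log(bucket.allow(1000)); // true: one token refilled after 1 second
```

The capacity controls how big a burst can be, and the refill rate controls the long-run average.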

2. Fixed Window Counter

This is the simplest approach.

Example:

  • Limit = 5 requests per minute

Windows:

  • 0–59 sec
  • 60–119 sec

Problem:
A user can send:

  • 5 requests at second 59
  • another 5 requests at second 61

So effectively 10 requests are accepted within about 2 seconds, double the intended limit.

This is called the boundary burst problem.
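The boundary burst is easy to reproduce with a small sketch. The class below is a simplified single-user counter with time passed in as seconds; the names are assumptions for the example.

```javascript
class FixedWindowCounter {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowSeconds = windowSeconds;
    this.currentWindow = -1;
    this.count = 0;
  }

  allow(nowSeconds) {
    // Window index: 0 for 0-59 sec, 1 for 60-119 sec, and so on.
    const windowIndex = Math.floor(nowSeconds / this.windowSeconds);
    if (windowIndex !== this.currentWindow) {
      this.currentWindow = windowIndex; // a new window resets the counter
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

const fixedLimiter = new FixedWindowCounter(5, 60);
let accepted = 0;
for (let i = 0; i < 5; i++) if (fixedLimiter.allow(59)) accepted++;
for (let i = 0; i < 5; i++) if (fixedLimiter.allow(61)) accepted++;
console.log(accepted); // 10
```

All 10 requests pass because the counter resets at second 60, even though they arrived only 2 seconds apart.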

3. Sliding Window Log

This improves Fixed Window.

Instead of tracking only the current window, we store timestamps of recent requests.

Example:

  • Limit = 5 requests per minute

Suppose user made:

  • 1 request at second 0
  • 4 requests at second 50

Now at second 61:

  • requests from second 0 are outside the 60 second range
  • but requests from second 50 are still counted

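So only 1 new request is allowed at second 61, because 4 of the 5 slots are still taken.

This gives smoother rate limiting and handles bursts better, at the cost of storing a timestamp for every recent request.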
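Here is the same example as a small sketch. Again it is a single-user, in-memory version with time passed in as seconds, and the class name is made up.

```javascript
class SlidingWindowLog {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowSeconds = windowSeconds;
    this.timestamps = []; // times of accepted requests, oldest first
  }

  allow(nowSeconds) {
    // Drop timestamps that are a full window old or older.
    const cutoff = nowSeconds - this.windowSeconds;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowSeconds);
    return true;
  }
}

const slidingLimiter = new SlidingWindowLog(5, 60);
slidingLimiter.allow(0);                                 // 1 request at second 0
for (let i = 0; i < 4; i++) slidingLimiter.allow(50);    // 4 requests at second 50

const firstAt61 = slidingLimiter.allow(61);
const secondAt61 = slidingLimiter.allow(61);
console.log(firstAt61, secondAt61); // true false
```

At second 61 the request from second 0 has expired, so one slot is free, and the next request is rejected because the window is full again.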

Other Things I Learned

  • Different APIs can have different limits
  • Payment APIs usually have stricter limits
  • Systems can also have a global rate limiter to control total incoming traffic

Still learning System Design, so feel free to correct me if I missed something.

reddit.com
u/No-Resolution-4054 — 2 months ago