According to Statista, 4.66 billion people were active internet users as of October 2020, about 59% of the global population. I’m sure that number will keep growing, since we rely on the internet for just about everything: checking email, ordering food and clothes, calling an Uber, even investing and making some extra money on the side. The best thing about the internet is that it’s so easy to use, yet as a software engineering student I’m fascinated by the complexity of the interactions behind the screen.
In this article I will try to simplify what goes on behind the screen when you enter a URL into a browser and hit “ENTER”.
Before going through what happens after entering the URL, we must first know what a URL actually is and what its different parts are.
1 - What’s a “URL”? :
A URL, or Uniform Resource Locator, is pretty self-explanatory: it is the location of the resource we want to access, the address of the place that’s going to give us the information we requested. In the same way, your home address tells someone where to go in order to find your house. A Uniform Resource Locator consists of the following components:
- The network protocol tells the browser which protocol it should use, whether it’s http, https, ftp, etc. A protocol is a set of rules used for communication over the network. https is the secure, encrypted version of http.
- The domain name is the address of the website. It is used to reach the server that is responsible for the information served. People often confuse the domain name with the URL. The domain name consists of three levels:
  - Third-level domain (subdomain): when we talk about a third-level domain we usually think of “www” (world wide web), since it became a standard over time and is the best known, but it is not strictly necessary in a URL. Other subdomains exist, for example shop.domain.de or blog.domain.de, which refer to separate areas of a website.
  - Second-level domain: this level contains the actual name of the website (e.g. “holbertonschool”). It is registered through a domain registrar or hosting provider.
  - Top-level domain: this is the ending of the domain, e.g. com, net, de, etc.
  The domain name was mainly created to make it easier for us humans to navigate the web, since computers and servers talk to one another using a unique numeric address called the IP address, e.g. 22.214.171.124 (IPv4) or 3ffe:1900:4545:3:200:f8ff:fe21:67cf (IPv6). So basically a domain name is the nickname of an IP address.
- The file path contains the name of the file as well as its location on the server. The file path can also be followed by dynamic parameters (the query string) that determine what the page displays.
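To make these components concrete, here is a minimal Python sketch that splits a URL into the parts listed above using the standard `urllib.parse` module. The function name `describe_url`, the path `/courses/index.html` and the `lang=en` parameter are made up for illustration. The split into domain levels is deliberately naive and assumes a single-label top-level domain, so it would misread endings like `co.uk`.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Split a URL into protocol, domain levels, file path and parameters."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".") if parts.hostname else []
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        # Naive assumption: the last label is the TLD, the one before it
        # is the second-level domain, and anything left is the subdomain.
        "top_level": labels[-1] if labels else None,
        "second_level": labels[-2] if len(labels) >= 2 else None,
        "subdomain": ".".join(labels[:-2]) or None,
        "path": parts.path,
        "parameters": parse_qs(parts.query),
    }


# Hypothetical URL: only the host name comes from the article.
info = describe_url("https://www.holbertonschool.com/courses/index.html?lang=en")
for name, value in info.items():
    print(f"{name}: {value}")
```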
2- What happens when we enter a URL into a browser?
Now that we know what a URL represents, we can go ahead and explain what happens when we decide to access a certain website. I’ll try to explain this as simply and in as much detail as possible.
Let’s say I’m trying to access the “holbertonschool” official website (www.holbertonschool.com).
1- The physical keyboard action:
When I type the letter “h” in the address bar of the browser, the autocomplete function kicks in: the browser suggests completions based on previous browsing and search history, bookmarks and popular web searches.
2- URL Parsing:
Now that the browser has holbertonschool.com as a string, it starts parsing it to decide whether it is a Uniform Resource Locator or a simple search term. If it is a search term, the browser hands it to the default search engine, usually with some special text appended to the request to tell the search engine that the query came from a particular browser’s URL bar.
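The decision described above could be sketched roughly like this. It is only a toy heuristic, not how any real browser does it: the rule "no spaces and at least one dot means URL" is an assumption, and `search.example` with its `source=urlbar` parameter is a hypothetical search engine endpoint.

```python
import re
from urllib.parse import quote_plus

# Hypothetical search endpoint; a real browser uses its configured engine.
SEARCH_TEMPLATE = "https://search.example/?q={query}&source=urlbar"


def resolve_input(text: str) -> str:
    """Turn address bar input into the URL the browser would request."""
    text = text.strip()
    # Input that already starts with a scheme (http://, https://, ftp://...)
    # is treated as a URL as-is.
    if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", text):
        return text
    # Toy rule: no spaces and at least one dot looks like a host name.
    if " " not in text and "." in text:
        return "http://" + text
    # Anything else is sent to the search engine as a query.
    return SEARCH_TEMPLATE.format(query=quote_plus(text))


print(resolve_input("holbertonschool.com"))
print(resolve_input("holberton school reviews"))
```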
3- HSTS checking:
HSTS stands for HTTP Strict Transport Security. It forces web browsers to interact with websites over secure HTTPS connections instead of plain HTTP, in order to prevent cookie hijacking and downgrade attacks. So the browser will go ahead and check its HSTS list: if the given website is listed, our browser will send its request via HTTPS; otherwise it will send an HTTP request. So what are HTTPS and HTTP?
- HTTP (Hypertext Transfer Protocol): an application protocol for distributed, collaborative, hypermedia information systems that allows users to exchange data on the World Wide Web.
- HTTPS (Hypertext Transfer Protocol Secure): simply HTTP running over Transport Layer Security (TLS), the successor to the now-deprecated Secure Sockets Layer (SSL). TLS is an encryption and authentication protocol widely implemented in browsers and web servers.
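Here is a small sketch of the HSTS check, assuming the list is just a set of host names. The `HSTS_HOSTS` set and its single entry are hypothetical (I am not claiming that site is really on any browser's list), and the sketch assumes every entry also covers its subdomains.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical HSTS list. Real browsers ship a preload list and also
# remember hosts that sent a Strict-Transport-Security response header.
HSTS_HOSTS = {"holbertonschool.com"}


def is_hsts_host(host: str) -> bool:
    """Check the host and each of its parent domains against the list."""
    labels = host.lower().split(".")
    # Assumption: every listed entry also applies to its subdomains.
    return any(".".join(labels[i:]) in HSTS_HOSTS for i in range(len(labels)))


def apply_hsts(url: str) -> str:
    """Upgrade http:// to https:// when the host is on the HSTS list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname and is_hsts_host(parts.hostname):
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(apply_hsts("http://www.holbertonschool.com/"))
print(apply_hsts("http://example.org/"))
```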
4- DNS request:
A DNS lookup is the process of retrieving a DNS record from a DNS server. It’s like looking up a phone number in a phone book. The browser first checks its own cache; if the record is not found there, it calls a library function such as gethostbyname, which checks whether the hostname is listed in the local hosts file before trying to resolve it through DNS.
To find the DNS record, up to four caches are checked. The first is the browser cache, since the browser stores DNS records for previously visited websites. Second is the OS cache: as mentioned before, the browser makes a system call to the computer’s OS to get the record, since the OS maintains its own cache of DNS records. Third, the request goes to the router, which stores its own cache of DNS records, called the router cache. Finally, the ISP (Internet Service Provider) cache is checked. If none of them has the record, the ISP’s DNS server performs a full lookup on our behalf. So what is DNS exactly?
- DNS (Domain Name System): the distributed database that translates host names into IP addresses.
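The cache chain above can be sketched as a simple loop over dictionaries. This is a simplified model, not how a real resolver is implemented: the function name `lookup`, the cache names and the `fake_resolver` are made up, and 192.0.2.10 is an address reserved for documentation, not the real address of any website.

```python
from typing import Callable


def lookup(hostname: str,
           caches: list[tuple[str, dict[str, str]]],
           resolve: Callable[[str], str]) -> tuple[str, str]:
    """Return (ip_address, where_it_was_found) for a host name."""
    # Check each cache in order: browser, OS, router, ISP.
    for name, cache in caches:
        ip = cache.get(hostname)
        if ip is not None:
            return ip, name
    # Every cache missed: do the real lookup and remember the answer.
    ip = resolve(hostname)
    for _, cache in caches:
        cache[hostname] = ip
    return ip, "dns query"


def fake_resolver(hostname: str) -> str:
    # Stand-in for a real DNS query; returns a documentation-only address.
    return "192.0.2.10"


caches = [("browser", {}), ("os", {}), ("router", {}), ("isp", {})]
print(lookup("www.holbertonschool.com", caches, fake_resolver))
print(lookup("www.holbertonschool.com", caches, fake_resolver))
```

To try it against the real network, `socket.gethostbyname` from the standard library can be passed as the `resolve` argument instead of `fake_resolver`.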
5- TCP/IP connections:
Once the browser receives the correct IP address, it builds a connection with the server at that IP address in order to transfer information. Different internet protocols can be used to build these connections; however, TCP is the most used protocol for HTTP requests.
A TCP connection is essential for transferring data packets reliably between the client (your computer) and the server. It is established using the TCP/IP three-way handshake, in which the client and the server exchange SYN (synchronize) and ACK (acknowledge) messages in three steps:
1. The client asks the server for a new connection by sending a SYN packet.
2. If the server has an open port that can accept the connection, it acknowledges the SYN packet by responding with a SYN/ACK packet.
3. The client receives the SYN/ACK packet and acknowledges it by sending back an ACK packet.
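The three steps can be simulated in a few lines of Python. This is only a model of the messages, not real networking: the `Segment` class and the function name are made up, and the starting sequence numbers 1000 and 5000 are arbitrary (real ones are chosen unpredictably). Each side acknowledges the other's sequence number plus one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    flags: tuple            # e.g. ("SYN",) or ("SYN", "ACK")
    seq: int                # sender's sequence number
    ack: Optional[int] = None  # next sequence number expected from the peer


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged to open a TCP connection."""
    # Step 1: the client asks for a connection.
    syn = Segment(("SYN",), seq=client_isn)
    # Step 2: the server accepts and acknowledges the client's SYN.
    syn_ack = Segment(("SYN", "ACK"), seq=server_isn, ack=syn.seq + 1)
    # Step 3: the client acknowledges the server's SYN.
    ack = Segment(("ACK",), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

In a real program we never build these packets by hand: calling something like `socket.create_connection((host, 80))` asks the operating system to perform the handshake for us.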
6- HTTP request:
The browser will now send a GET request containing headers such as User-Agent, Accept and Connection, along with information taken from the browser’s cookies. However, when submitting a form or entering credentials, this would typically be a POST request.
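To see what such a request looks like on the wire, here is a minimal sketch that builds the raw text of an HTTP/1.1 GET request. The function name, the `example-browser/1.0` User-Agent value and the `session=abc123` cookie are invented for illustration; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/", cookies: dict = None) -> bytes:
    """Assemble a plain-text HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: example-browser/1.0",  # hypothetical browser name
        "Accept: text/html",
        "Connection: keep-alive",
    ]
    if cookies:
        pairs = "; ".join(f"{key}={value}" for key, value in cookies.items())
        lines.append(f"Cookie: {pairs}")
    # Header lines end with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("www.holbertonschool.com", cookies={"session": "abc123"})
print(request.decode("ascii"))
```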
7- The server response:
The web server (such as Apache or Nginx) receives the request from the browser and passes it to a request handler (a program written in PHP, Ruby, etc.) that reads the request and then generates and assembles a response in a format such as JSON, XML or HTML.
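A request handler can be sketched as a function that takes the raw request and returns the raw response. This toy version is written in Python rather than PHP or Ruby, and the two routes (`/` and `/api/status`) and their contents are made up; it only reads the first line of the request.

```python
import json


def handle_request(raw: bytes) -> bytes:
    """Read a raw HTTP request and assemble a raw HTTP response."""
    # The first line looks like: GET /path HTTP/1.1
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ")

    # Hypothetical routes: one HTML page and one JSON endpoint.
    if method == "GET" and path == "/":
        status, ctype, body = "200 OK", "text/html; charset=utf-8", "<h1>Hello</h1>"
    elif method == "GET" and path == "/api/status":
        status, ctype, body = "200 OK", "application/json", json.dumps({"status": "ok"})
    else:
        status, ctype, body = "404 Not Found", "text/plain; charset=utf-8", "Not found"

    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {ctype}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


print(handle_request(b"GET / HTTP/1.1\r\nHost: www.holbertonschool.com\r\n\r\n").decode())
```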
On the server side, a few other components are usually involved:
1- Firewall: A firewall can be hardware, software, or both. It monitors outgoing and incoming traffic and establishes a barrier between traffic from external sources and the internal network for better security.
2- Load balancer: A load balancer acts as the “traffic cop”. Its main functions are distributing client requests and network load efficiently across multiple servers, sending requests only to the servers that are online, and adding or removing servers as demand dictates.
4- Database: A database stores important information for the website, for example messages, people’s accounts, posts, etc.
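As an illustration of the load balancer's "traffic cop" role, here is a minimal round-robin sketch that skips servers marked as offline. Round-robin is just one possible strategy that I picked for simplicity, and the class name and the server names `app-1` to `app-3` are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping the ones that are offline."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.online = set(self.servers)
        self._next = 0

    def mark_offline(self, server):
        self.online.discard(server)

    def mark_online(self, server):
        if server in self.servers:
            self.online.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.online:
                return server
        raise RuntimeError("no online servers")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
balancer.mark_offline("app-2")
print([balancer.pick() for _ in range(4)])
```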
And that’s what it takes for a webpage to be displayed after hitting enter. It’s fascinating that such a complex process takes only seconds.