SymfonyCon 2019 Amsterdam presentation by Tobias Sjösten.
As a technical interviewer, one of the questions I like to ask the most is "what happens when I write www.example.com in the browser and then press enter?". The answer reveals a lot about the interviewee's understanding of the vast number of technologies that fringe web development.
In this talk, I will go through exactly what happens, down to excruciating detail, so that you will be better prepared for your future job interview.
Hello, can everyone hear me? Good. Cool. So, second to last session: are we getting a bit tired today, or are spirits up? We're good to go. All right, so my name is Tobias Sjösten. Very happy to see everyone here, a lot of people in here. You've seen my last name: there's this ring with the two dots. That's a Swedish Ö. So if my accent didn't give it away, I'm from Sweden, and this is my fourth SymfonyCon now. I'm very excited to be here and to have stumbled up on stage for the first time. Is anyone here for the first time? Ah, quite a few. Very cool to see new faces here. So, fourth SymfonyCon. I was also at SymfonyLive, back when that was a thing, and before that SymfonyDay. So I've been involved in Symfony for more than 10 years now and I've written a lot of code during this time.
First as a freelancer, then later on running a web agency, and then as a consultant, up until last year. So more than a decade's worth of writing Symfony code for customers. Last year I was employed by this company, Stim. I'm not gonna talk too much corporate sales or anything, but basically Stim is a nonprofit organization. You as a musician can become a member, and then you own part of the organization. And if you write music and your music is later on used to make money, we make sure that you get a piece of that money. My role at Stim is software architect, and that involves a lot of tasks that are non-coding tasks. For example, I do a lot of technical interviews. So when we have a candidate coming in who wants to work with us at Stim, we sit down for one hour and I get to prod them, nudge them and see where they are.
Like, their technical skill level. That was a very weird situation for me to come into as a coder. So initially I did what any sane person does. You know, you go to Google and type in "how to conduct a technical interview". And what Google tells you is that you should ask open-ended questions, to let the candidate speak freely and talk about whatever is important to them, to show the breadth and the depth of their knowledge. And we work mainly with web development at Stim. So I had to think, and I came up with this one question that I thought was very clever. So I started asking our candidates this question to see how they talk their way through this problem and explain the situation. And it turned out to give very good answers. Like, you could see people coming in as front end developers: they really focused on the front end stuff and they didn't care too much about the servers and things like that.
Whereas back enders were the reverse, or full stackers. So I thought it was very clever, coming up with this question, until one candidate came in and said, "Oh, this question again." So it's a very common question. But despite that, a lot of the candidates who came in couldn't really talk their way through this question very well. So I will show the question, actually: what happens when I press enter? I will demo this very quickly. I pull up Chrome, I type in stim.se, I hit enter, and there is a website. So what happens between me pressing enter and this website being rendered? A lot of people couldn't really give a very good answer. So I thought, why don't I go to SymfonyCon, with a lot of developers here, and maybe I can help you answer this question if you are looking for a job in the future. We are hiring, by the way. In Sweden. If you like the cold, just come talk to me after.
So the big overview of what does happen is: your enter key is being pressed, and an electric circuit is being closed, which sends some electric current through your keyboard to the logic circuit of the keyboard. It scans all the keys on the keyboard to determine which ones are being pressed. It encodes that as one signal, which it sends to the operating system. In this case, the enter key is number 13, so it sends that to the operating system. It goes through the drivers of the operating system for this specific keyboard. The operating system interprets the signal and sees that this is signal number 13, so it's the enter key being pressed. The operating system checks which application is open on your computer, and in our case it's Chrome.
So the operating system tells Chrome, "Hey, the enter key is being pressed, do with that what you want." Chrome realizes that it needs to do something. So it looks at the address bar, because in the address bar you can type stim.se, for example, which is a domain. Or you can type "how to conduct a technical interview", for example. And depending on these various cases, Chrome needs to do different things. So Chrome figures out that it needs to talk to a server to get a response back. Chrome contacts the server and sends a request to the server, and the server hands off the request to Apache. Apache of course hands it off to Symfony, right? Symfony does some processing, runs your code, connects to a database, does a lot of calculations. And then the response gets sent back to the browser. The browser renders the HTML in the response and we see our web page. However, I am terrible with electronics, so we're not going to talk any electronics today, and I am really bad at what happens with the whole rendering process. That's a whole talk in itself, so we're not going to touch that. We are only going to talk about the backend stuff, basically.
So first things first, we need to determine what to do with whatever the user has written in the address field. We're going to assume that it's a URL of some kind. So this is one of the most complex forms of a URL. Here you have the protocol starting out, HTTPS, which tells the browser how we're going to communicate with the server. We're going to communicate over a protocol, HTTPS. We are requesting a protected resource, so I'm sending my username together with a password, a secret, to reach this resource. Then it's www.stim.se, which is the domain. The subdomain is www. So we know who we're going to talk to. On this machine there might be several applications running and listening for web traffic.
So we can also define which port, and thereby which application on this server, we're going to talk to. So you can send in 79, for example. By default it's port 80 or port 443. Then comes the /hej, which is Swedish for hello. So if you don't learn anything else today, at least you know some Swedish now. So we want the page /hej at this web server. With ?q=a we are telling the server something that is for the server to interpret. And finally we have a fragment, which is #X in this case. We never send this to the server. Whenever we get the HTML back from the server, our browser can look through the HTML and see if there's an anchor called X, so it knows to jump to that immediately. So we know who to talk to: www.stim.se.
We know how to talk to it, through HTTPS, and we know what to ask from it: we want the /hej page. If you remember my surname, it has this funny character. Domain names can also have these funny characters in them, and the internet for some reason doesn't speak Swedish. So we need to translate our domain name to a proper domain name, which is done through Punycode. So the browser also looks at the domain name now, to see whether it needs to do that Punycode translation.
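To make the URL breakdown above concrete, here is a minimal sketch using Python's standard library. The credentials `user:secret`, the explicit port and the domain `sjösten.example` are made-up values for illustration, not something shown in the talk.

```python
from urllib.parse import urlsplit

# Example URL modelled on the one in the talk; user, secret and port are assumptions.
url = "https://user:secret@www.stim.se:443/hej?q=a#X"
parts = urlsplit(url)

print(parts.scheme)                    # how to talk: the protocol
print(parts.username, parts.password)  # credentials for a protected resource
print(parts.hostname, parts.port)      # who to talk to, and which application
print(parts.path, parts.query)         # what to ask for
print(parts.fragment)                  # stays in the browser, never sent to the server


def to_ascii_hostname(hostname):
    """Translate a hostname with non-ASCII letters to its Punycode (IDNA) form."""
    return hostname.encode("idna").decode("ascii")


# Hypothetical domain containing the Swedish letter from the speaker's surname.
ascii_name = to_ascii_hostname("sjösten.example")
print(ascii_name)
```

Labels that need translation come back with an `xn--` prefix, while plain ASCII labels such as `example` pass through unchanged.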
Next, the domain name stim.se doesn't give us enough information to know how to reach this domain. So for that we're going to use two protocols which power most of the internet. These two together, TCP/IP, are over 50 years old. There's also one other part, called UDP, which is frequently used, though not on the web yet. This suite used to be called the Department of Defense model, by the way, because it was developed by DARPA back in the old days. So the IP part that we're going to start talking about here is the locator, basically, which is the address. A computer that connects to the internet is given an address so we can talk to it and it can talk to us. And of course, when they developed this technology 50 years ago, they came up with an address space which holds just over 4 billion addresses, which of course is never going to run out, right?
There are so many addresses that they were going to last forever, until you start connecting your phone, your computer, your fridge, your microwave, and so on. And then we quickly run out of addresses. So they had to develop a new IP version, which is IP version six, which is being used more and more today. We're still running a lot on IP version four. So to go with IP addresses, there's a protocol called DNS, the domain name system, which is a way for us to actually translate a domain name into an IP address so that we later on can talk to the computer. For the domain name system, most browsers use the operating system's built-in function called getAddr. So they send in the domain name there and they get back an IP address. Chrome for some reason decided not to do that. They have implemented their own DNS lookup. I don't know why, but they probably had some good reasons.
So what the operating system does when it gets a request from an application: it's a bit of a chicken and egg problem here. You have a domain that you want to translate to an IP address. In order to do that, you need to talk to a server to do the translation. The server can only be reached by an IP address. So how do you get an IP address in order to get an IP address? Ah, very clever. You have it pre-installed in your computer when you get the computer, or you can also get the list of IP addresses from the network that you connect to, through DHCP. The first thing, though, that the operating system does when it gets the request to translate a domain is that it looks at your local hosts file.
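The operating system's resolver call described above is exposed in Python as `socket.getaddrinfo` (the POSIX function it wraps is named `getaddrinfo`). The following is a small sketch, with `resolve` as a hypothetical helper name; it looks up `localhost` so that it runs without internet access.

```python
import socket


def resolve(hostname, port=443):
    """Ask the operating system's resolver for the IP addresses of a hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


addresses = resolve("localhost")
print(addresses)
```

Swapping in a real domain such as `stim.se` would trigger the full lookup through the hosts file and the configured name servers.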
So you can override this whole DNS system in your computer. For example, say we have stim.se on the server here, everything's working, and the internet is using that server. I want to move it to a new server. Then I take the site, I copy it over to the new server on a new IP address, and then on my local computer I can update my hosts file. So whenever I go to stim.se, I go to this new server. If everything looks fine, I update the DNS so the internet goes to the new server. So that's a useful way to use your hosts file. If you don't have the domain name in your hosts file, though, the operating system moves on to the resolv.conf file. All of these are text files lying around in your computer, and you're free to edit them and do whatever you want with them.
The resolv.conf file usually contains a list of name servers. You can add your own name server: Google has its 8.8.8.8 and Cloudflare has 1.1.1.1, so you can add one yourself if you want a better name server. So we have an IP address for a name server. Finally we can connect to it, so we make a connection. We ask the name server, "Hey, we have stim.se. What's the IP address?" What you get back is a pointer. It can be either an A pointer or an AAAA pointer, depending on whether it's IP version four or version six, telling us that this domain name has this IP address. It could also give us back a CNAME or an ANAME pointer, which tells us that this domain name is actually an alias for another domain name, so you have to go to that other domain name. There might be a couple of steps until you finally, hopefully, get an IP address back, or a list of IP addresses.
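The lookup order described above (hosts file first, then the name servers from resolv.conf) can be sketched as follows. The file contents are inline sample text, the address `192.0.2.10` is a documentation-range placeholder for the "new server", and the function names are hypothetical.

```python
HOSTS_TEXT = """\
127.0.0.1   localhost
192.0.2.10  stim.se www.stim.se   # hypothetical new server under test
"""

RESOLV_TEXT = """\
# sample resolv.conf
nameserver 8.8.8.8
nameserver 1.1.1.1
"""


def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP (first entry wins)."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, names = fields[0], fields[1:]
            for name in names:
                table.setdefault(name.lower(), ip)
    return table


def parse_nameservers(text):
    """Collect the name server IPs listed in resolv.conf text."""
    servers = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def lookup_plan(name):
    """Return where the answer comes from: the hosts file, or which name servers to ask."""
    hosts = parse_hosts(HOSTS_TEXT)
    if name.lower() in hosts:
        return ("hosts", hosts[name.lower()])
    return ("dns", parse_nameservers(RESOLV_TEXT))


print(lookup_plan("stim.se"))
print(lookup_plan("example.org"))
```

The sketch stops at deciding whom to ask; the actual DNS query and any alias chasing are left out.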
Each pointer has a TTL, or time to live, which by default is usually set very high. So when you want to change your IP address, you might have the experience that it can take up to 72 hours or something like that. That's the TTL being set very high, which is a way for everything in between your computer and the name server to cache the lookup. So the next time you go to stim.se, you don't have to go the whole way to look it up and then back. It can be cached right in your computer for however long this TTL is set to, so next time it will be much faster. So we have one IP address or a list of IP addresses. The next step is to connect to it, to start making our request for what we want from the server. The internet is built upon communication layers. You might have heard of the OSI model, for example, which is an old way of structuring this: you have the electronics at the bottom and then you build layers on top.
The deepest layer that we will go through here, though, is the TCP layer, the transmission control protocol. This is what powers most of our internet traffic today: FTP, SSH, HTTP and so on. And the way it works is that you take the data that you want to send to another computer and you chop it up into small packages. On each package you can set a number of flags in the metadata in the header of the package. You number them one, two, three, four, and then you send them on through the network. Depending on which route the packages take, the receiving part can get number two first, then number three, then number one. And then the receiving part can put them together in order, unpack them and get the whole picture.
So it's a way to optimize, by sending the packages out along several different routes. The packages have one out port and one in port. So we are telling the package that, from our computer, our outgoing port is 12345, for example, so that when the response comes back, the server can tell us that this is going back to our application, Chrome, on 12345. It also has an incoming port, which is the equivalent for the server: where in the server we want to send it. And finally they have a bunch of flags, these TCP packages. These flags are used to communicate where in the process of communication we are.
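The chop, number, and reorder idea above can be sketched in a few lines. This is a simplification under stated assumptions: real TCP numbers bytes rather than whole packages and also handles loss and retransmission, and the function names here are hypothetical.

```python
import random


def to_segments(data, size):
    """Chop data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(segments):
    """Put segments back in sequence order and join their payloads."""
    return b"".join(chunk for _, chunk in sorted(segments, key=lambda s: s[0]))


message = b"GET /hej HTTP/1.1\r\nHost: www.stim.se\r\n\r\n"
segments = to_segments(message, 8)
random.shuffle(segments)          # packages may arrive in any order
print(reassemble(segments) == message)
```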
And in order to establish our TCP connection, first we send one package with the SYN flag, for synchronize, set to true. So we send that off to the server. The server responds by setting the SYN and ACK flags, synchronize and acknowledge, on a package and sends it back to us. And finally our client sends one package with the acknowledge flag to the server. So: synchronize, synchronize-acknowledge, and acknowledge back. So now we have established a connection to the server that we want to communicate with. We might have both IP version four and IP version six addresses, in which case we use a technique called happy eyeballs. The client sends out both an IP version four and an IP version six connection to the server, and whichever responds fastest is the one that we're going to use. So that's also a way to optimize, because the way you've set up your IP version six might be very, very slow. In that case, we want to use the old IP version four.
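Application code never sets the SYN and ACK flags itself; the operating system performs the three-way handshake when a client calls `connect`. Here is a minimal sketch on the loopback interface, so it needs no network access. The helper name `loopback_roundtrip` is hypothetical.

```python
import socket


def loopback_roundtrip(payload):
    """Open a TCP connection to a local listener and send one payload across it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() returns once the kernel has done SYN, SYN-ACK, ACK.
            client.connect(server.getsockname())
            conn, peer = server.accept()
            with conn:
                client.sendall(payload)
                received = conn.recv(1024)
                # The client's outgoing port is what the server sees as the peer port.
                return received, client.getsockname()[1], peer[1]


received, client_port, peer_port = loopback_roundtrip(b"hej")
print(received, client_port, peer_port)
```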
SYN ACK package to you, they send a reset package. So then the connection is dropped and you can't go to Facebook.
Next up, we have our TCP connection established, so we can now communicate with the server. On top of this we might want to put another layer of communication before we are ready to actually start sending data. So for example, if you wrote https:// in the web browser, then we are telling the browser that we want to make an HTTPS connection. HTTPS: hypertext transfer protocol secure. It's a way for our browser to know that whoever we're talking to is actually the one we want to talk to, and that no one can listen in on the traffic in between us. So it's encrypted traffic. Either we write in our browser that we want this specific protocol, or there's a technology called HSTS, which is a way for the server to tell us on the first request that, going forward, we want to use HTTPS for everything. Also, if you have a domain that you know will never want to use regular HTTP, you can submit your domain to the browser lists and they will include it by default in the browser. So everyone always gets HTTPS by default.
The way this works to establish HTTPS is that the client first, on top of the TCP layer, sends a package to the server saying "client hello". Along with this request you also send a list of encryption algorithms that the browser is able to encrypt with. Roughly speaking, you have two types of encryption. You have symmetric encryption, where you have one key that you use to both encrypt and decrypt. You also have asymmetric encryption, which is when you have two keys: you use the first key to encrypt and then you need to use the second key to decrypt, or vice versa. So in this step the client first sends a list of symmetric algorithms to encrypt with. The server responds saying "server hello", and it picks one of these algorithms to go forward with.
It also sends the certificate of the server to the client, and the certificate contains a domain and an expiry date. So from the get go we can see that this is actually the certificate for stim.se, so we are good to go with this one, and that it has not expired. The certificate also contains a public key, which is one half of one of these key pairs, and a signature. Based on the signature we can determine if it's valid, if it actually comes from a source that we trust. In your computer, again to solve the whole chicken and egg problem, you have a list of certificate authorities, which are companies that issue these certificates and that your computer trusts. So given the signature of the certificate, you can check whether it's a validly issued certificate.
We also make a request to this issuer to make sure that it's still a valid certificate, because sometimes these issuers get hacked, and then you are in a world of hurt, because whoever hacked them can issue a new certificate. You don't want to trust those. So we make an extra check just to make sure that the certificate is actually still valid. Now we know that this is a valid certificate. Then we have the public key contained in the certificate, and on the client side we generate a session key. The session key is what we're going to use going forward. We take the session key, we encrypt it with the public key of the server, and we send it on to the server. Now, since we have a public key and a private key, and only the server knows the private key, only the server can decrypt it. So now the client knows a secret key and the server knows a secret key, and we have all decided on an algorithm to use. So everything is in place now. We can encrypt our traffic, we know that we can trust this server and we're ready to go. So the client says "client ready", the server says "server ready", and we have one layer on top of the other. We're ready to go.
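The session-key step above combines both kinds of encryption, which a toy example can illustrate. Everything below is an assumption for illustration only: the RSA numbers are tiny textbook values and the XOR "cipher" is a stand-in for a real symmetric algorithm, so none of it is secure or reflects what a browser actually runs.

```python
import secrets

# Toy asymmetric key pair: (n, e) is public, d is private. Far too small for real use.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse, requires Python 3.8+


def xor_cipher(data, key):
    """Toy symmetric cipher: the same key both encrypts and decrypts."""
    return bytes(byte ^ key for byte in data)


# Client: generate a session key and encrypt it with the server's public key.
session_key = secrets.randbelow(255) + 1      # a value from 1 to 255
wrapped_key = pow(session_key, e, n)

# Server: only the holder of the private exponent d can recover the session key.
unwrapped_key = pow(wrapped_key, d, n)

# Both sides now share a secret and can switch to symmetric encryption.
ciphertext = xor_cipher(b"GET /hej", session_key)
plaintext = xor_cipher(ciphertext, unwrapped_key)
print(plaintext)
```

The design point is the same as in the talk: the slow asymmetric step is used once to agree on a secret, and the fast symmetric key carries the traffic afterwards.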
So in order to actually send some data, we need to format it so that a web server can understand the request. So here's a very simple request. It would look something like this, in clear text, being sent over the internet. We're saying that we want to get the /hej page using protocol HTTP 1.1, and we are asking for this page from the host www.stim.se. Instead of GET, there are many different verbs, I think nine or ten. You can say that you want to delete this resource, or you want to update it with a PUT or PATCH request, and so on. On the protocol part, I think we've also talked earlier about HTTP 3. So you have HTTP 1.1, which is the most common today. We're slowly rolling over to HTTP 2, which does away with having these six simultaneous connections to send data.
So with HTTP 1.1 you send a request, then you wait for the server, the server returns a response, and that's it. Then you send another one, which is very, very slow, and that's why browsers open six simultaneous connections. With HTTP 2 you can instead send all the requests at one time and you get back a bunch of responses in the same connection. HTTP 3 is instead going to use that other protocol: instead of TCP, it's going to use UDP, which is less reliable but very much faster. So that's to speed up the internet. Along with the first line, GET /hej, we also send a list of headers. In this case we were only using the host header. You can also send the content type, to say that the data that I am sending you is in text format or JSON format.
You can also send the accept header, for example, saying that the data you return to me should be in YAML, or whatever you want back. And there's one funny header called the referer header. Whenever you're on one website and you click a link to go to another website, your browser automatically sends the referer header to the new website, so that they know where you came from. Of course, in the English language, referrer is spelled with two Rs. But when they wrote up the spec in the 90s, they used a spelling program called spell, and this spelling program did not have support for referrer. So there was a typo sneaking into the specification. This is an email from Roy Fielding, the big brain behind the REST architecture, just talking about how, "we made a mistake, blame it on France", and that's why we have a misspelled header in our messages today.
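The request format described above is plain text with CRLF line endings and a blank line after the headers. A minimal sketch of assembling it, where `build_request` is a hypothetical helper and the Accept and Referer values are made up for illustration:

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request head as text: request line, headers, blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF, and an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET", "/hej", "www.stim.se",
    {"Accept": "text/html", "Referer": "https://example.com/"},  # note the one-R spelling
)
print(request)
```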
So we have assembled our GET request, and we're almost ready to send it off to the server. Before we do that, though, since the internet does not speak Swedish, nor does it speak English, we need to convert our message into numbers, because computers speak numbers, right? We do that with the ASCII table. Every character is represented by a number or a series of numbers, but to keep it simple: one number. If you remember, when we pressed the enter key, it was key number 13. So for number 13 you see "carriage return" is the name of that key. And you can look up capital G, capital E, capital T and so on, to spell out GET and the whole message. So every character gets a number. But there are many different number systems to pick from. In English and Swedish and many other languages, we use the decimal system, right?
So we have 10 digits to represent a number with, going from zero to nine. There are other number systems, for example the hexadecimal system, which goes from 0 to F, so that's 16 different digits. And then finally we have the binary number system. When sending data over the internet, we do it through fiber optics, right? Because the speed of light is very, very fast. But with light, you have a limitation: the light can only be on or off. So one of these number systems is better suited to represent data through fiber optics. The binary system, with a zero and a one, is very easily translated to the light being on or being off.
So given our number 13, for example, if we want to send the enter key over the internet, we need to count to 13 in the binary system. The way we do that in the decimal system is that we count from one, two, three, four up to nine, and then we don't have any more digits. So we add a one in front and drop down to zero, and then go up to 19, and then 20, and then 99, 100 and so on. That's how we keep growing. The binary system works exactly the same. So let's see if I get this right. You have the decimal system here, and you have the binary system there. Say we start counting the number of attendees in the audience today. We start with zero and then go to one. Then we go to Magnus, and we'd have two, but we don't have the digit two in the binary system.
So then we roll over and we do one zero. The next one, three, is one one, and so on and so on. That way we can easily represent many numbers in the binary system. So doing the translation, we get something like this for the GET request. The G is number 71, which turns into zero one zero zero, zero, blah blah blah. And this is how we get the whole HTTP request in binary format. Now finally we have the connection. We know who to talk to. We have encrypted our message and we are ready to go. So we send our message over the internet in tiny little packages. They arrive at the server, and the server orders them correctly, one, two, three, four, binds them together, unpacks them and extracts the data. It looks at the header of each package.
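The character-to-number and number-to-binary steps above can be checked directly. This short sketch uses only Python built-ins; `to_binary` is a hypothetical helper name.

```python
def to_binary(text):
    """Encode text as ASCII and show each character's number as 8 binary digits."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


# Counting in binary: once the digits run out, roll over and add a digit in front.
counting = [format(number, "b") for number in range(4)]
print(counting)                      # zero, one, two, three

print([ord(char) for char in "GET"]) # the ASCII numbers for G, E and T
print(to_binary("GET"))              # the same numbers as bits
print(ord("\r"), format(13, "b"))    # carriage return is number 13
```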
It sees that this is an incoming package for port 80, and port 80 is listened to by this web server application called Apache. So the server unpacks all the data and sends the data onward to Apache. Apache has a look at this GET request and sees that it's going to the domain stim.se, which is powered by Symfony of course, right? And it sends the GET request onwards to PHP. You can have PHP running in a separate process, which you connect to over the local network, or over the internet if you want. It could also be a sub-process of your Apache program.
PHP starts running. It takes your code, compiles it into opcodes and executes the code. Apache also feeds in a lot of data, like for example the global server variable, the files variable, POST, GET, session, cookie and so on. So you have a mess of global variables now ready for you in PHP to start executing and responding to this request. Of course, we also use Composer to handle our class loading. So we use the Composer autoloading: when Composer pulls down all the packages that we want to use, it builds a map of all these packages, that this specific one is in here, this one is there. So whenever our code asks PHP to load a specific class, Composer can kick in and load that specific file without us having to handle that. We just tell PHP whatever we want, and Composer handles it for us.
Finally we come into Symfony. So if anyone here was coding for the web in the early 2000s, you know it was very common for frameworks like CodeIgniter, CakePHP and so on, Symfony as well, to call themselves an MVC framework, a model view controller framework. That's a graphical user interface architecture from way back when, which doesn't really fit the web. So you know, you had to make modifications to make it work in a web context, and it didn't really solve the problem. So from Symfony 2 and onwards, they said, "We're not going to be an MVC framework. We are going to be a request/response framework", which makes a lot of sense in a web context, right? The way Symfony expresses this is through this beautiful little interface, the HTTP Kernel, which says, "I can handle a request and I will return a response." That's it. That's how we work. But of course this is just an abstraction, an interface, something for us to actually implement and fulfill.
So the way this is being done: when the request comes into Symfony, one of the first things it does is create a request object. It takes all these messy global variables and pulls them into one request object, which it can then send into the framework to do its work. That is brilliant when you want to test stuff, because if you used to write your own PHP code back in 5.2, it was a mess to make sure that all these global variables were up to date, that there were no side effects when you called one function, and that it didn't ruin anything for the other tests. So everything that we need to do is create a request object from all of this, and Symfony takes care of the rest.
When we have the request object: Symfony is a very event-driven framework. If you know about the SOLID principles, the O in SOLID stands for the open/closed principle, which says that your code should be open for extension but closed for modification. So you write your code once, and you should ideally never have to touch it again, but you should still be able to extend it and add functionality or modify functionality. The event system that Symfony has achieves exactly that. We never have to touch Symfony itself, but by attaching listeners to the events that go on inside Symfony, we can extend it and do a lot of fun stuff with it. So using this request object, we tell all our listeners, "Hey guys, we have a request coming in here. Do with that what you want."
The listeners that you set up can respond with a response straight there. You can look at the request and see that this comes from IP blah, blah, blah, and we don't want to serve those guys, so we respond with a 404 for this specific IP. If none of the listeners returns a response, the processing goes on to the router listener, which is a built-in listener in the Symfony framework. The router listener obviously uses a router to take this request and look up which controller should execute this functionality. It takes the controller and sets it on the request object. It doesn't execute anything; it just prepares the request with this data.
We move on to the controller resolver, which is responsible for extracting an actual controller from the request. We have already prepped it, so we can grab the controller straight from the request, because the router listener already put it in there. Then we fire another event, telling everyone who's listening to this event that now we have a controller ready to do its work. Here you can plug in your functionality to do things like this: if there's an incoming request that states that it wants version 1.1 or version 2 of this resource that you have exposed, you can steer it to another controller. That way you can do API versioning, for example. You can also see that we have an incoming request which has an ID in it, but the controller that we want to call requires a user. So we can take this ID, look up a user and inject it into the controller arguments.
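The request-in, response-out flow with listeners described above can be sketched in a few lines. This is a Python analogy under stated assumptions, not Symfony's actual PHP API: the class and function names are hypothetical, the IP addresses are documentation-range placeholders, and only the `_controller` request attribute mirrors the name Symfony's router uses.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    client_ip: str = "203.0.113.5"
    attributes: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


class Kernel:
    """Handles a request and returns a response; behaviour is extended via listeners."""

    def __init__(self, routes):
        self.routes = routes
        self.request_listeners = []   # user listeners run before the router listener

    def handle(self, request):
        for listener in self.request_listeners + [self.router_listener]:
            response = listener(request)
            if response is not None:          # a listener may answer straight away
                return response
        controller = request.attributes["_controller"]   # the "controller resolver" step
        return controller(request)

    def router_listener(self, request):
        controller = self.routes.get(request.path)
        if controller is None:
            return Response(404, "Not Found")            # simplification of error handling
        request.attributes["_controller"] = controller   # prepare only, do not execute
        return None


def hej(request):
    return Response(200, "hej")


def block_ip(request):
    # Refuse one specific address, answering with a 404 as in the talk's example.
    return Response(404) if request.client_ip == "198.51.100.7" else None


kernel = Kernel({"/hej": hej})
kernel.request_listeners.append(block_ip)
print(kernel.handle(Request("/hej")))
print(kernel.handle(Request("/hej", client_ip="198.51.100.7")))
```

Adding `block_ip` changed the kernel's behaviour without editing the `Kernel` class, which is the open/closed idea the event system is meant to provide.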
Moving on, our controller is ready to be called. We call the hej method on it, which basically just instantiates a response object and returns that. We have a response. Again, we call all our listeners. If we have not returned a response in our controller, we might have just returned an error, for example. There's a proceeding event called a
You can have many different responses here. The 200 series means that something went well. 300 is "do something elsewhere". 400: you did something wrong. 500: I did something wrong. And here we also defined that the content that we send back is in HTML format, which obviously it's not. It should be text. You can have many different headers tagged onto this response. For example, if you want to allow anyone from the server and onwards to save this message, you can add some caching headers, or add some cookies for future requests.
PHP returns the response to Apache. Apache does not respond straight to the server, because now we've put a reverse proxy in between. A reverse proxy listens to these cache headers that we have set up and sees that, okay, this response that came back here has been configured to be saved for one day, for example. So the next time someone requests the /hej page, we don't have to go into PHP, we don't have to go into Apache or our controller. We can just return it immediately. It speeds it up very much. In the next slot there's going to be a talk about HTTP caching, so someone's going to go into that here. Finally, Varnish returns the response to the server. The server sends it off in tiny little packages all over the internet, which can be intercepted unless we do HTTPS. They can be intercepted by proxies along the way that also can listen to these HTTP headers and cache the response for future requests, so that in your company building you might have a proxy standing to speed up requests within the company or whatever
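The reverse-proxy behaviour described above comes down to reading the cache header and answering from memory until it expires. A minimal sketch, assuming a `Cache-Control: max-age` header; it is not how Varnish is implemented, and the class, function and backend names are hypothetical.

```python
import re


def max_age(cache_control):
    """Extract the max-age value in seconds from a Cache-Control header, or 0."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else 0


class ReverseProxyCache:
    def __init__(self, backend, clock):
        self.backend = backend      # callable: path -> (body, headers)
        self.clock = clock          # callable returning the current time in seconds
        self.store = {}             # path -> (expires_at, body)

    def get(self, path):
        entry = self.store.get(path)
        if entry and entry[0] > self.clock():
            return entry[1]                      # fresh copy: skip Apache and PHP
        body, headers = self.backend(path)
        ttl = max_age(headers.get("Cache-Control", ""))
        if ttl > 0:
            self.store[path] = (self.clock() + ttl, body)
        return body


backend_calls = []


def backend(path):
    backend_calls.append(path)                   # stands in for Apache, PHP and Symfony
    return "hej", {"Cache-Control": "public, max-age=86400"}   # cache for one day


now = [0]
proxy = ReverseProxyCache(backend, lambda: now[0])
proxy.get("/hej")
proxy.get("/hej")              # served from the cache
print(len(backend_calls))
```

Passing the clock in as a parameter keeps the sketch testable without waiting a day for the entry to expire.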
FIN for finish, send it off to the server saying that we're done. The TCP connection breaks. And that concludes my talk. Thank you very much.