Friday, 20 February 2015

Difference between port and socket

A port is part of the address in the TCP and UDP protocols. It is used to help the OS identify which application should get the data that is received. An OS has to support ports to support TCP and UDP because ports are an intrinsic part of TCP and UDP.
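To see the OS using the port to decide which application gets the data, here is a minimal Python sketch. Python is my choice, since the post itself has no code, and the function name and messages are just illustrative. Two UDP sockets on the same host each bind to a port the OS picks (port 0 means "any free port"), and one datagram is sent to each. The destination IP is identical, so only the port tells the kernel where to deliver each datagram.

```python
import socket


def udp_demux_demo():
    """Send two datagrams to the same IP; the port decides which socket receives each."""
    app_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    app_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the OS to pick a free port for each "application".
        app_a.bind(("127.0.0.1", 0))
        app_b.bind(("127.0.0.1", 0))
        app_a.settimeout(5)  # avoid hanging forever if something goes wrong
        app_b.settimeout(5)
        port_a = app_a.getsockname()[1]
        port_b = app_b.getsockname()[1]

        # Same destination IP for both datagrams; only the port differs.
        sender.sendto(b"for A", ("127.0.0.1", port_a))
        sender.sendto(b"for B", ("127.0.0.1", port_b))

        got_a, _ = app_a.recvfrom(1024)
        got_b, _ = app_b.recvfrom(1024)
        return got_a, got_b
    finally:
        for s in (app_a, app_b, sender):
            s.close()


print(udp_demux_demo())  # (b'for A', b'for B')
```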
A socket is part of the interface the OS presents to applications to allow them to send and receive network data. Most socket implementations support many protocols beyond TCP and UDP, some of which have no concept of ports. An OS does not have to support sockets to support TCP or UDP; it could provide a different interface for applications to use. A socket is simply one way of sending and receiving data on a specific port.
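As an example of a socket family with no ports at all, here is a sketch using a Unix-domain (AF_UNIX) socket, which exists on Unix-like systems. Its address is a filesystem path, not an IP:port pair. The temporary directory and the file name `demo.sock` are hypothetical choices for the demo.

```python
import os
import socket
import tempfile


def unix_socket_demo():
    """Send a datagram over an AF_UNIX socket, whose address is a file path, not IP:port."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket file name
        receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            receiver.bind(path)  # creates the socket file: no IP address, no port
            receiver.settimeout(5)
            sender.sendto(b"hello", path)
            data = receiver.recv(1024)
            return receiver.getsockname(), data
        finally:
            sender.close()
            receiver.close()


address, data = unix_socket_demo()
print(address, data)  # the full path ending in demo.sock, then b'hello'
```

The same `socket`/`bind`/`sendto` calls work here as for UDP. Only the address family and the shape of the address change, which is the point: the socket is the interface, and ports belong to particular protocols.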
Think of your machine as an apartment building:
  • A port is an apartment number.
  • A socket is the door of an apartment.
  • An IP address is the street address of the building.

A computer has an IP address that identifies it as a separate entity on the network. We add an additional number to that to allow us to differentiate between connections to that computer. This is the port number. On the OS side of the connection you need buffers, connection state, etc. This logical object is the socket.

A socket is a communication path to a port. When you want your program to communicate over the network, you need to give it a way of addressing the port. You do this by creating a socket and attaching (binding) it to the port. Basically, socket = IP + port: a socket gives your program access to that IP:port pair.
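Here is a small sketch of "attaching a socket to a port". It binds a TCP socket to 127.0.0.1 with port 0, so the OS chooses a free port, and reads back the socket's local address as an (IP, port) tuple. It then shows that a second socket cannot take the same IP:port. When neither socket sets options such as SO_REUSEADDR, that second `bind` typically fails with EADDRINUSE. The function name is my own.

```python
import socket


def bind_demo():
    """Attach a TCP socket to IP:port, then show a second socket can't take the same pair."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    other = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", 0))    # IP 127.0.0.1, port picked by the OS
        ip, port = s.getsockname()  # the socket's local address: (IP, port)
        try:
            other.bind((ip, port))  # same IP:port is already attached to `s`
            second_bind_ok = True
        except OSError:             # typically EADDRINUSE ("Address already in use")
            second_bind_ok = False
        return ip, port, second_bind_ok
    finally:
        other.close()
        s.close()


print(bind_demo())  # ('127.0.0.1', <some port>, False)
```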

S is a server: let's say it's an HTTP server, so it'll use the well-known port number for HTTP, which is 80. I run it on a host with IP address 10.0.0.4, so it will be listening for connections on 10.0.0.4:80 (because that's where everyone will expect to find it).
Inside S, I'm going to create a socket and bind it to that address: now, the OS knows that connections coming into 10.0.0.4:80 should be routed to my S process via that particular socket.
  • netstat output once the socket is bound and listening:
    $ netstat --tcp -lan
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address               Foreign Address            State
    tcp        0      0 0.0.0.0:80                  0.0.0.0:*                  LISTEN
    
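    NB. the local address is all zeroes because S actually bound its socket to the wildcard address 0.0.0.0 (INADDR_ANY) rather than to 10.0.0.4 alone: S doesn't care which of the host's addresses its clients use to reach it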
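A minimal sketch of S's listening socket, with one assumption flagged: binding port 80 usually needs elevated privileges on Linux, so this hypothetical `make_listening_socket` defaults to port 0 and lets the OS pick. Otherwise it follows the netstat output above: bind to the wildcard address, then `listen`, which is what puts the socket in the LISTEN state.

```python
import socket


def make_listening_socket(port=0):
    """A stand-in for S: bind to the wildcard address and start listening."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # 0.0.0.0 (INADDR_ANY): accept connections arriving on any local address.
    srv.bind(("0.0.0.0", port))
    srv.listen()  # now the socket is in the LISTEN state that netstat reports
    return srv


srv = make_listening_socket()
print(srv.getsockname())  # ('0.0.0.0', <port>), like netstat's 0.0.0.0:80 line
srv.close()
```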
Once S has this socket bound and listening, it will accept connections: each time a new client connects, accept returns a new socket, which is specific to that client
  • netstat output once a connection is accepted:
    $ netstat --tcp -lan
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address               Foreign Address            State
    tcp        0      0 0.0.0.0:80                  0.0.0.0:*                  LISTEN
    tcp        0      0 10.0.0.4:80                 10.0.0.5:55715             ESTABLISHED
    
    • 10.0.0.4:80 represents S's end of the connection, and is associated with the socket returned by accept
    • 10.0.0.5:55715 is the client's end of the connection, and is associated with the socket the client passed to connect. The client's port isn't used for anything except routing packets on this TCP connection to the right process: it's assigned randomly by the client's kernel from the ephemeral port range.
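The two netstat lines can be reproduced on loopback with a sketch like the following. The listener binds to 0.0.0.0 like S, the client connects to 127.0.0.1, and the socket returned by `accept` reports the concrete local address 127.0.0.1 with the same port, mirroring the 10.0.0.4:80 line. Its peer address is the client's kernel-chosen ephemeral port. Loopback stands in for the two hosts, and the function name is my own.

```python
import socket


def accept_demo():
    """Compare the listening socket, the per-client socket from accept, and the client's end."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 0))  # wildcard, like S in the netstat output
    srv.listen()
    port = srv.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))  # kernel picks an ephemeral port
    conn, peer = srv.accept()  # a NEW socket, specific to this one client
    try:
        return {
            "listening": srv.getsockname(),     # ('0.0.0.0', port): the LISTEN line
            "server_end": conn.getsockname(),   # ('127.0.0.1', port): S's end
            "client_end": conn.getpeername(),   # the client's ephemeral port
            "client_local": client.getsockname(),
            "accept_peer": peer,
        }
    finally:
        for s in (conn, client, srv):
            s.close()


for name, addr in accept_demo().items():
    print(name, addr)
```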
Now, S can happily go on accepting more client connections ... each one will get its own socket, each socket will be associated with a unique TCP connection, and each connection will have a unique remote address. S will track client state (if there is any) by associating it with the socket.
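A sketch of that bookkeeping: accept several clients and keep per-client state in a dict keyed by the accepted socket. The state fields (`peer`, `requests`) are hypothetical, just to show where client state would live. Every accepted socket shares the same local address but has a distinct remote address.

```python
import socket


def accept_many(n=3):
    """Accept n clients; each gets its own socket and a distinct remote address."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    addr = srv.getsockname()

    clients = [socket.create_connection(addr) for _ in range(n)]
    state = {}  # per-client state, keyed by the accepted socket object
    try:
        for _ in range(n):
            conn, peer = srv.accept()
            state[conn] = {"peer": peer, "requests": 0}  # hypothetical state fields
        peers = [info["peer"] for info in state.values()]
        local_addrs = [conn.getsockname() for conn in state]
        return peers, local_addrs
    finally:
        for s in [*state, *clients, srv]:
            s.close()


peers, local_addrs = accept_many()
print(peers)        # three different (IP, ephemeral port) pairs
print(local_addrs)  # the same (IP, port) three times
```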
So, roughly:
  • the IP address is for routing between hosts on the network
  • the port is for routing to the correct socket on the host
    • I nearly said correct process, but it's actually possible to have multiple (usually child) processes all accepting on the same socket ...
    • however, each time one of the concurrent accept calls returns, it does so in only one process, so each incoming connection's socket is unique to one instance of the server
  • the socket is the object a process uses to talk to the OS about a particular connection, much like a file descriptor
    • as Dirk says, there are plenty of other uses for sockets that don't use ports at all: for example, socketpair creates a pair of sockets connected together that have no addressing scheme at all. The only way to use that pipe is by being the process which called socketpair, being a child of that process and inheriting one, or being explicitly passed one of the sockets from that process
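The point about several processes accepting on one socket can be sketched like this. It is Unix-only because it uses `os.fork`, and the worker count and the trick of replying with the worker's PID are my own illustration. The parent creates one listening socket and forks workers that inherit it, and each worker calls `accept` exactly once. Two connections therefore land in two different processes, and each accepted socket belongs to only one of them.

```python
import os
import socket


def recv_all(sock):
    """Read until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def shared_accept_demo(n_workers=2):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    addr = listener.getsockname()

    workers = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: inherited the listening socket; accept exactly one connection.
            status = 1
            try:
                conn, _ = listener.accept()
                conn.sendall(str(os.getpid()).encode())
                conn.close()
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        workers.append(pid)

    listener.close()  # the children still hold their own copies of the socket
    served_by = []
    for _ in range(n_workers):
        with socket.create_connection(addr) as client:
            served_by.append(int(recv_all(client)))
    for pid in workers:
        os.waitpid(pid, 0)
    return workers, served_by


workers, served_by = shared_accept_demo()
print(sorted(workers) == sorted(served_by))  # True: each worker served exactly one client
```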
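And a minimal socketpair sketch. Python's `socket.socketpair()` wraps the system call and returns two already-connected sockets (AF_UNIX on Unix-like systems; on Windows Python emulates it with a loopback TCP connection). Neither end is ever bound to an address or port, yet data written to one end comes out of the other. The function name is my own.

```python
import socket


def socketpair_demo():
    """Two connected sockets with no address: what one end writes, the other reads."""
    left, right = socket.socketpair()  # AF_UNIX on Unix-like systems
    try:
        left.sendall(b"ping")
        right.sendall(b"pong")
        return right.recv(1024), left.recv(1024)
    finally:
        left.close()
        right.close()


print(socketpair_demo())  # (b'ping', b'pong')
```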
