import socket
import ssl

ssl_context = ssl.create_default_context()
target = 'swapi.co'
port = 443
resource = '/api/people/1/'
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Wrap the TCP socket in TLS; server_hostname is used for SNI and certificate checks.
secure_client = ssl_context.wrap_socket(client, server_hostname=target)
send_str = 'GET {} HTTP/1.1\r\nHost: {}:{}\r\n\r\n'.format(resource, target, port)

secure_client.connect((target, port))
secure_client.sendall(send_str.encode())  # sendall keeps writing until every byte is sent
print(send_str)

print(len(secure_client.recv(8192)))  # 1282
print(len(secure_client.recv(8192)))  # 5. Why?

Above is a simple Python program that sends an HTTP request to the Star Wars API over a TLS-wrapped TCP socket.

This is the request sent:

GET /api/people/1/ HTTP/1.1
Host: swapi.co:443

The response header contains Transfer-Encoding: chunked. When the first recv is executed, the header and the first chunk are obtained. However, to get the last chunk with the terminator sequence ("0\r\n\r\n"), recv must be called a second time. What is the underlying cause of this behavior?

2 Answers


TCP is a protocol that provides a stream of bytes. It doesn't provide any way to "glue" bytes together into messages. The actual number of bytes you will receive when you call recv is arbitrary and will depend on all kinds of factors that vary such as the exact implementation of the other side, how quickly you got around to calling recv, the network's maximum message size, and so on. It doesn't mean anything.
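Because recv boundaries carry no meaning, a reader normally loops until the data itself says the message is complete. The following is a minimal sketch of that idea; the helper name `recv_until` and its parameters are illustrative, not part of the socket module.

```python
import socket

def recv_until(sock, terminator, bufsize=4096):
    """Call recv repeatedly until `terminator` shows up or the peer closes.

    Hypothetical helper: it relies only on the bytes received so far,
    never on how many bytes a single recv call happened to return.
    """
    data = bytearray()
    while terminator not in data:
        piece = sock.recv(bufsize)
        if not piece:  # empty result means the peer closed the connection
            break
        data += piece
    return bytes(data)
```

With the socket from the question, `recv_until(secure_client, b"\r\n\r\n")` would return at least the complete response header, regardless of how many recv calls that takes.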

Since you indicated in your request that you support HTTP version 1.1, the server is permitted to use any encoding HTTP 1.1 clients are required to support. That includes chunked encoding, which sends the body as one or more "chunks" of data, each preceded by a size indicator. This is convenient for cases where the output is generated by a script and the server won't know how big it is until the entire response is generated. This encoding scheme allows sending to begin immediately.
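To make the format concrete, here is a rough sketch of decoding a chunked body that has already been received in full. Each chunk is a hexadecimal size line, the data, and a trailing CRLF; a size of zero ends the body, which is why the terminator "0\r\n\r\n" is exactly 5 bytes. The function name `decode_chunked` is illustrative, and the sketch ignores trailers and malformed input.

```python
def decode_chunked(body):
    """Decode a complete chunked body (bytes) into its payload.

    Simplified sketch: chunk extensions after ';' are skipped,
    trailer fields after the last chunk are not parsed.
    """
    payload = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii").strip(), 16)  # size is hexadecimal
        pos = line_end + 2
        if size == 0:  # zero-length chunk marks the end of the body
            break
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(payload)
```

The decoder works on the accumulated bytes, so it gives the same result whether the body arrived in one recv call or in twenty.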

Don't claim HTTP 1.1 compliance in an HTTP request unless your code supports everything the HTTP 1.1 standard says a client "MUST" support.
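One possible way to follow that advice without writing a chunked decoder, assuming the server still accepts HTTP/1.0 requests, is to ask for HTTP/1.0. A server must not send Transfer-Encoding in response to such a request, so the body ends at Content-Length or when the connection closes. The helper names below are hypothetical, and the read loop is sketched for a plain socket.

```python
import socket

def build_http10_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes.

    Host is optional in HTTP/1.0 but most servers expect it.
    """
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host).encode("ascii")

def read_until_eof(sock, bufsize=4096):
    """Collect everything the peer sends until it closes the connection."""
    pieces = []
    while True:
        piece = sock.recv(bufsize)
        if not piece:  # empty result means end of stream
            break
        pieces.append(piece)
    return b"".join(pieces)
```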

  • I do not claim HTTP 1.1 compliance in any sense. I know how chunked encoding splits responses whose size is not known beforehand. What I was asking is how this non-overlapping behavior of chunks is achieved at the implementation level. Jul 12, 2019 at 17:34
  • You claim HTTP 1.1 compliance by sending "HTTP/1.1" in the request. That's what that is. What "non overlapping behavior of chunks" are you talking about? Do you mean that you happened to receive the last chunk in a separate call to recv? My first paragraph explains that. Jul 12, 2019 at 17:35
  • Well, I happened to receive the last chunk in a separate recv every time I executed this. I tried it on swapi.co and www.google.com just to be sure. Was it just coincidence each time? Jul 12, 2019 at 17:40
  • @SıddıkAçıl In a sense yes and in a sense no. It's likely Nagle's algorithm interacting with the implementation of their web server which likely uses a separate code path to send the final chunk. But it's coincidence in the sense that they could upgrade their web server tomorrow and it could change. Or a packet could get dropped one time you try it and it could change. Jul 12, 2019 at 17:44
  • @SıddıkAçıl It wasn't engineered by anyone. It's just the way various pieces happen to come together most of the time. All it would take would be a code change to the HTTP server, a slight delay on the HTTP client due to an interrupt, or a packet to drop on the network and the behavior could change. Jul 12, 2019 at 17:55

It's because in chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another.
