reading from a socket in C

Postby DanB91 » Sat Nov 12, 2011 9:56 pm UTC

I am writing an FTP client for college and there seems to be an issue with the read() function. After it reads all the data from the socket, instead of returning 0, it hangs, waiting for more data to be read.

I am connecting to an arbitrary FTP server, such as ftp.freebsd.org. The initial response of the server is "220 Welcome to freebsd.isc.org.". If I just read the data without looping, the string prints out fine.

Would the solution be to put a timeout on the socket or something? Why isn't read() returning 0?

Thanks very much. Here is code.

Code: Select all
//reads fully from a file or stream
void *readFully(int fd)
{
    //will hold the bytes read from the socket
    //data size is the size of the final data
    ssize_t bytesRead, dataSize = BUF_LEN;
   
    //will be the buffer for reading in the data
    char buf[BUF_LEN];
   
    //zero out the buffer
    bzero(buf, BUF_LEN);
   
    void *data = NULL;
   
    data = chk_realloc(data, BUF_LEN);
   
   
    //read reply from server
    while ((bytesRead = read(fd, buf, BUF_LEN)) != 0) { //hangs here
       
       
        //if read failed, print error and exit
        if(bytesRead < 0)
        {
            perror("Error reading from server");
            exit(1);
        }
       
        //zero out the buffer
        bzero(buf, BUF_LEN);
       
        //copy the data into the final pointer
        memcpy(data, buf, BUF_LEN);
       
        //increase the data size
        dataSize += BUF_LEN;
       
        //realloc the data
        data = chk_realloc(data, dataSize);
    }
   
    return data;

}
DanB91
 
Posts: 11
Joined: Fri Nov 26, 2010 5:01 pm UTC

Re: reading from a socket in C

Postby not baby Newt » Sat Nov 12, 2011 10:49 pm UTC

Does the documentation say it should return 0 while the connection exists? (I have no idea)

There probably is a function such as IsThereAnythingToReadRightNow() to let you know.

Edit: reading a bit about this suggests the above is irrelevant.

Edit2: ignore edit1.
Another source suggests read() blocks by default until there is something to read. There is no way to check whether data is available right now, but you can set some flags to get an error instead of blocking, and then check for that error specifically.

Also, look at the order of these two steps in your loop:
//zero out the buffer
//copy the data into the final pointer
Last edited by not baby Newt on Sun Nov 13, 2011 12:24 am UTC, edited 2 times in total.
not baby Newt
 
Posts: 108
Joined: Wed Feb 03, 2010 11:30 pm UTC

Re: reading from a socket in C

Postby Meem1029 » Sat Nov 12, 2011 10:50 pm UTC

I'm not experienced in C, but in most languages I've worked in, the standard approach is to wait until there is data before reading it. One way you could implement this is to have a separate thread that listens for data on the port and writes it into a queue, and then just check whether the queue has any data in it.
Meem1029
 
Posts: 377
Joined: Wed Jul 21, 2010 1:11 am UTC

Re: reading from a socket in C

Postby jareds » Sun Nov 13, 2011 12:28 am UTC

TCP is a transport layer protocol that presents you with a stream of bytes in each direction. It has no knowledge of how (or whether) the stream is split into messages at the application layer, nor can you write a generic routine that splits the stream into messages (e.g., your idea of using timeouts) without the routine being specific to some higher layer protocol. For example, suppose your computer receives "220 Welcome to " and then slightly later receives "freebsd.isc.org.\r\n". Do you want readFully() to return "220 Welcome to " if you call it at the wrong time?

You need to decide when you have built up enough of a message to pass to your higher level handling based on your understanding of the higher level protocol. For example, you could look for "\r\n" if you want to handle one line at a time. (Note that you need to account for all possibilities of reading chunks, including things like "Start of "; "line 1\r\nThis is"; " line 2\r"; "\n", etc.)
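The chunking cases above can be handled with a small accumulation buffer. This is only a minimal sketch: the names line_buf, line_buf_append and line_buf_pop and the 4096-byte capacity are made up for illustration and are not from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BUF_CAP 4096  /* assumed limit on pending, unprocessed bytes */

/* Holds raw bytes from the socket until a complete line has arrived. */
struct line_buf {
    char data[LINE_BUF_CAP];
    size_t len;  /* number of valid bytes in data */
};

/* Append one chunk exactly as read() returned it.
   Returns 0 on success, -1 if the chunk does not fit. */
int line_buf_append(struct line_buf *lb, const char *chunk, size_t n)
{
    assert(lb != NULL && chunk != NULL);
    if (n > LINE_BUF_CAP - lb->len)
        return -1;
    memcpy(lb->data + lb->len, chunk, n);
    lb->len += n;
    return 0;
}

/* If a complete "\r\n"-terminated line is buffered, copy it (without the
   terminator) into out as a C string, drop it from the buffer and return 1.
   Returns 0 if no complete line is buffered yet, -1 if out is too small. */
int line_buf_pop(struct line_buf *lb, char *out, size_t out_cap)
{
    assert(lb != NULL && out != NULL);
    for (size_t i = 0; i + 1 < lb->len; i++) {
        if (lb->data[i] == '\r' && lb->data[i + 1] == '\n') {
            if (i + 1 > out_cap)
                return -1;
            memcpy(out, lb->data, i);
            out[i] = '\0';
            /* Shift whatever follows the terminator to the front. */
            memmove(lb->data, lb->data + i + 2, lb->len - (i + 2));
            lb->len -= i + 2;
            return 1;
        }
    }
    return 0;
}
```

The caller appends after every read() and then pops in a loop until it gets 0, so a "\r" at the end of one chunk and the "\n" at the start of the next are still recognised as one terminator.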

As for the actual mechanics of read(), note that this is working just the same as reading standard input from a terminal in Unix. If you loop calling read() until it returns 0, it won't return 0 when the user enters a line or something: it will return 0 when standard input is closed (often by control-D), and if you keep calling it then it will hang waiting for user input.
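The same behaviour can be seen with a pipe instead of a socket. A minimal sketch follows; drain_until_eof is a hypothetical helper name, not something from the original post.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read and discard everything from fd until end-of-file.
   Returns the number of bytes seen, or -1 on a read error.
   read() only returns 0 once every writer has closed its end; while the
   other side is still open and merely idle, the call blocks instead. */
long drain_until_eof(int fd)
{
    char buf[256];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;        /* EOF: the peer closed the stream */
        else if (errno != EINTR)
            return -1;           /* real error */
    }
}
```

If you write a reply into a pipe and then close the write end, this returns the byte count. If you leave the write end open, it never returns, which is the hang described in the first post.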

It is possible to set it so that read() will operate in a non-blocking mode (returning an error (EAGAIN) if it can't return anything immediately), but I don't think this will help you at this point. For a first pass at an event-driven approach, it is enough to use select() to see when you can read() and/or write(), and then to read() and/or write() exactly once each time that select() says you can.
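For reference, non-blocking mode looks roughly like this. It is a sketch only: set_nonblocking, try_read and the TRY_AGAIN constant are illustrative names, while fcntl(), O_NONBLOCK and EAGAIN are the standard POSIX pieces.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define TRY_AGAIN (-2)  /* illustrative code for "nothing to read right now" */

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Attempt a single read on a non-blocking fd.
   Returns the byte count (> 0), 0 on EOF, TRY_AGAIN if no data is
   available yet, or -1 on any other error. */
ssize_t try_read(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    ssize_t n = read(fd, buf, cap);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return TRY_AGAIN;
    return n;
}
```

Note that TRY_AGAIN still does not tell you whether the message is complete; it only tells you that no more bytes have arrived yet.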
jareds
 
Posts: 317
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: reading from a socket in C

Postby DanB91 » Sun Nov 13, 2011 4:11 pm UTC

jareds wrote:TCP is a transport layer protocol that presents you with a stream of bytes in each direction. It has no knowledge of how (or whether) the stream is split into messages at the application layer, nor can you write a generic routine that splits the stream into messages (e.g., your idea of using timeouts) without the routine being specific to some higher layer protocol. For example, suppose your computer receives "220 Welcome to " and then slightly later receives "freebsd.isc.org.\r\n". Do you want readFully() to return "220 Welcome to " if you call it at the wrong time?

You need to decide when you have built up enough of a message to pass to your higher level handling based on your understanding of the higher level protocol. For example, you could look for "\r\n" if you want to handle one line at a time. (Note that you need to account for all possibilities of reading chunks, including things like "Start of "; "line 1\r\nThis is"; " line 2\r"; "\n", etc.)

As for the actual mechanics of read(), note that this is working just the same as reading standard input from a terminal in Unix. If you loop calling read() until it returns 0, it won't return 0 when the user enters a line or something: it will return 0 when standard input is closed (often by control-D), and if you keep calling it then it will hang waiting for user input.

It is possible to set it so that read() will operate in a non-blocking mode (returning an error (EAGAIN) if it can't return anything immediately), but I don't think this will help you at this point. For a first pass at an event-driven approach, it is enough to use select() to see when you can read() and/or write(), and then to read() and/or write() exactly once each time that select() says you can.


Thanks for the information. The issue is that the command socket is a persistent connection, as opposed to the data connection (which cuts off as soon as the data is sent and, thus, the readFully() method should be useful here). So now I just created a separate method that reads until it sees "\r\n", which works. I guess if I wanted to just keep one method instead of 2, I could have something like this:

Code: Select all
//read reply from server
    while (select(sizeof(fds)*4, &fds, NULL, NULL, &timeout)) { //hangs here
        bytesRead = read(fd, buf, BUF_LEN);
       
        //if read failed, print error and exit
        if(bytesRead < 0)
        {
            perror("Error reading from server");
            exit(1);
        }
       
        //zero out the buffer
        bzero(buf, BUF_LEN);
       
        //copy the data into the final pointer
        memcpy(data, buf, BUF_LEN);
       
        //increase the data size
        dataSize += BUF_LEN;
       
        //realloc the data
        data = chk_realloc(data, dataSize);
    }


Where the timeout would be 1-2 seconds.
Would that be good?
DanB91
 
Posts: 11
Joined: Fri Nov 26, 2010 5:01 pm UTC

Re: reading from a socket in C

Postby Yakk » Sun Nov 13, 2011 7:39 pm UTC

Your code behaves badly if the data incoming is not a multiple of BUF_LEN.
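To make that concrete, here is a sketch of a read-until-EOF routine that appends exactly the number of bytes read() returned, at the current end of the data, without zeroing the buffer first. It assumes plain realloc() in place of chk_realloc() (whose definition was not posted), picks a BUF_LEN of 1024 arbitrarily, and uses the hypothetical name read_all. It is only suitable for a stream that the peer actually closes.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_LEN 1024  /* assumed; the real value was not shown */

/* Read from fd until EOF. Returns a malloc'd, NUL-terminated buffer and
   stores the number of data bytes in *out_len. Returns NULL on error.
   The caller frees the result. */
char *read_all(int fd, size_t *out_len)
{
    char buf[BUF_LEN];
    char *data = NULL;
    size_t size = 0;

    assert(out_len != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(data);
            return NULL;
        }
        /* Grow by exactly n bytes, plus one for the terminating NUL. */
        char *grown = realloc(data, size + (size_t)n + 1);
        if (grown == NULL) {
            free(data);
            return NULL;
        }
        data = grown;
        memcpy(data + size, buf, (size_t)n);  /* append at the end */
        size += (size_t)n;
        data[size] = '\0';
        if (n == 0)
            break;  /* EOF */
    }
    *out_len = size;
    return data;
}
```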

What version of the ftp protocol are you implementing?

For an interactive C single threaded text based program with network IO, you'll want to select on both the "file stream" and the user input stream. The timeout case should result in you trying again (at least once), and the amount of timeouts/retries should be a function of some configuration options. But that is for a more advanced problem, I'm guessing -- for a first run, you can rely on the user aborting the connection (which, of course, requires that you pay attention to stdin).

Generally, I'd write such a program with a core event loop (much like I'd write a game) based around a select. Each file descriptor would have a handler function associated with it, and possibly I'd have multiple priority queues (to check user input before I go and check for more files, so the user can abort out). The "game loop", as it were, would take the prioritized (file descriptor, handler) functions (I'd call this the scheduler), do a select on the file descriptors, and then call the appropriate handler.

The handlers could go and change the scheduler -- so when the user asks to open a connection, I'd create the connection in that task, then set up a scheduler and handler function pair to deal with any data from the connection.

The handler when invoked would slurp data off the connection, and process it.

I'm not certain how ftp determines that a connection is finished -- is it in-connection, or outside? -- but regardless, when this happens, you remove the file descriptor and handler from the scheduler. You'd possibly do some cleanup -- which might be stored in the scheduler as a "cleanup" function -- or maybe handled elsewhere.
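A rough sketch of such a scheduler, without the priority queues or cleanup hooks, might look like the following. All of the names (scheduler, watch, sched_add, sched_remove, sched_run_once, count_bytes_handler) and the limit of 16 descriptors are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WATCH 16  /* assumed upper bound on watched descriptors */

typedef void (*fd_handler)(int fd, void *ctx);

struct watch { int fd; fd_handler handler; void *ctx; };
struct scheduler { struct watch items[MAX_WATCH]; int count; };

/* Register a (file descriptor, handler) pair. Returns 0, or -1 if full
   or if fd cannot be used with select(). */
int sched_add(struct scheduler *s, int fd, fd_handler h, void *ctx)
{
    assert(s != NULL && h != NULL);
    if (s->count >= MAX_WATCH || fd < 0 || fd >= FD_SETSIZE)
        return -1;
    s->items[s->count++] = (struct watch){ fd, h, ctx };
    return 0;
}

/* Stop watching fd (the last entry is moved into its slot). */
void sched_remove(struct scheduler *s, int fd)
{
    for (int i = 0; i < s->count; i++) {
        if (s->items[i].fd == fd) {
            s->items[i] = s->items[--s->count];
            return;
        }
    }
}

/* Block until at least one watched fd is readable, then call the handler
   of each ready fd once. Returns the number of handlers called, or -1. */
int sched_run_once(struct scheduler *s)
{
    fd_set readable;
    struct watch snapshot[MAX_WATCH];
    int n = s->count, maxfd = -1, called = 0;

    if (n == 0)
        return 0;
    FD_ZERO(&readable);
    for (int i = 0; i < n; i++) {
        snapshot[i] = s->items[i];  /* handlers may edit the table */
        FD_SET(snapshot[i].fd, &readable);
        if (snapshot[i].fd > maxfd)
            maxfd = snapshot[i].fd;
    }
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (FD_ISSET(snapshot[i].fd, &readable)) {
            snapshot[i].handler(snapshot[i].fd, snapshot[i].ctx);
            called++;
        }
    }
    return called;
}

/* Example handler: read once and add the byte count to the long that
   ctx points to. */
void count_bytes_handler(int fd, void *ctx)
{
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        *(long *)ctx += n;
}
```

One caveat with this sketch: if a handler closes a descriptor that belongs to a later entry in the same pass, that entry's handler still runs once on the stale descriptor, so a real program would need to guard against that.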

Does that make sense?
Yakk
 
Posts: 10039
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: reading from a socket in C

Postby jareds » Sun Nov 13, 2011 10:18 pm UTC

DanB91 wrote:The issue is that the command socket is a persistent connection, as opposed to the data connection (which cuts off as soon as the data is sent and, thus, the readFully() method should be useful here).

Sorry, no. If you download something large (a Linux .iso, e.g.), do you want to read the entire thing into a buffer in memory before writing it to disk? No, you want to write to disk once you have a reasonable chunk.

I guess if I wanted to just keep one method instead of 2, I could have something like this:

Code: Select all
//read reply from server
    while (select(sizeof(fds)*4, &fds, NULL, NULL, &timeout)) { //hangs here
        bytesRead = read(fd, buf, BUF_LEN);
       
        //if read failed, print error and exit
        if(bytesRead < 0)
        {
            perror("Error reading from server");
            exit(1);
        }
       
        //zero out the buffer
        bzero(buf, BUF_LEN);
       
        //copy the data into the final pointer
        memcpy(data, buf, BUF_LEN);
       
        //increase the data size
        dataSize += BUF_LEN;
       
        //realloc the data
        data = chk_realloc(data, dataSize);
    }


Where the timeout would be 1-2 seconds.
Would that be good?

Sorry, still no. I wasn't joking when I said you can't write a generic method that gets a single message from a socket. If you wait 1-2 seconds for each command, that will be very perceptible to the user, on top of which you still risk a timeout for even a slight network hiccup. Yakk's advice is very good: use an event loop and rely on the user to break if the target stops responding. I should have explained more specifically what you should do.
jareds
 
Posts: 317
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: reading from a socket in C

Postby DanB91 » Mon Nov 14, 2011 12:24 am UTC

Yakk wrote:Your code behaves badly if the data incoming is not a multiple of BUF_LEN.

What version of the ftp protocol are you implementing?

For an interactive C single threaded text based program with network IO, you'll want to select on both the "file stream" and the user input stream. The timeout case should result in you trying again (at least once), and the amount of timeouts/retries should be a function of some configuration options. But that is for a more advanced problem, I'm guessing -- for a first run, you can rely on the user aborting the connection (which, of course, requires that you pay attention to stdin).

Generally, I'd write such a program with a core event loop (much like I'd write a game) based around a select. Each file descriptor would have a handler function associated with it, and possibly I'd have multiple priority queues (to check user input before I go and check for more files, so the user can abort out). The "game loop", as it were, would take the prioritized (file descriptor, handler) functions (I'd call this the scheduler), do a select on the file descriptors, and then call the appropriate handler.

The handlers could go and change the scheduler -- so when the user asks to open a connection, I'd create the connection in that task, then set up a scheduler and handler function pair to deal with any data from the connection.

The handler when invoked would slurp data off the connection, and process it.

I'm not certain how ftp determines that a connection is finished -- is it in-connection, or outside? -- but regardless, when this happens, you remove the file descriptor and handler from the scheduler. You'd possibly do some cleanup -- which might be stored in the scheduler as a "cleanup" function -- or maybe handled elsewhere.

Does that make sense?


Thanks very much. I think I get it. I wrote up a quick outline of something that might be in the ballpark of what you described. The following code assumes the command and data sockets never change throughout the program (which, of course, is not true).

Code: Select all
fd_set fds;
   
    /* Create a descriptor set containing our 3 streams.  */
    FD_ZERO(&fds);
    FD_SET(0, &fds); //0 is stdin
    FD_SET(cmd_socket, &fds);
    FD_SET(data_socket, &fds);
    struct timeval timeout;
   
    timeout.tv_sec = 0;
    timeout.tv_usec = 0;
       
    while (1) {
       
        if(select(FD_SETSIZE, &fds, NULL, NULL, &timeout) < 0)
        {
            perror("select failed");
            exit(1);
        }
       
        if (FD_ISSET(0, &fds)) {
            handleUserInput();
        }
        if (FD_ISSET(cmd_socket, &fds)) {
            handleServerResponse();
        }
       
        if (FD_ISSET(data_socket, &fds)) {
            downloadData();
        }
       
    }



jareds wrote:Sorry, no. If you download something large (a Linux .iso, e.g.), do you want to read the entire thing into a buffer in memory before writing it to disk? No, you want to write to disk once you have a reasonable chunk.

Wow, of course. That was an oversight by me. Sorry! What would you recommend the max size of the memory buffer before it's written to disk?
DanB91
 
Posts: 11
Joined: Fri Nov 26, 2010 5:01 pm UTC

Re: reading from a socket in C

Postby jareds » Mon Nov 14, 2011 2:14 am UTC

DanB91 wrote:
jareds wrote:Sorry, no. If you download something large (a Linux .iso, e.g.), do you want to read the entire thing into a buffer in memory before writing it to disk? No, you want to write to disk once you have a reasonable chunk.

Wow, of course. That was an oversight by me. Sorry! What would you recommend the max size of the memory buffer before it's written to disk?

I'd go with 64k, but this is a very loose recommendation--just a reasonable value off the top of my head. It's also relatively unimportant: probably anything between 4 kB and 4 MB will work just as well (the C library, OS, and disk controller will take good care of buffering).
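As a sketch of what chunked copying looks like, the routine below reads up to 64 kB at a time and writes each chunk out before reading the next, so memory use stays constant however large the file is. The names copy_stream and write_fully are illustrative. This is the simple blocking form; inside an event loop you would do one read per wakeup instead of looping here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (64 * 1024)  /* the loosely recommended 64k */

/* Write exactly n bytes, retrying after short writes. 0 on success. */
static int write_fully(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy in_fd to out_fd until EOF on in_fd, one chunk at a time.
   Returns the total number of bytes copied, or -1 on error. */
long long copy_stream(int in_fd, int out_fd)
{
    static char chunk[CHUNK_SIZE];  /* static: keeps 64 kB off the stack */
    long long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t n = read(in_fd, chunk, sizeof chunk);
        if (n == 0)
            return total;           /* data connection closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* read() may return fewer bytes than requested; write only n. */
        if (write_fully(out_fd, chunk, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```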

Edit 1: With the event loop, remember that you should read at most once per file descriptor each time you loop, in order to avoid blocking. For example, your handleUserInput() routine might be (pseudocode):
Code: Select all
handleUserInput:
    read data
    append data to user input buffer
    while user input buffer contains a complete command:
        remove command from user input buffer
        handle command
Your server response handler would be similar. Yakk was suggesting you abstract things a little more. However, if that seems difficult to understand, it's ok to write your program concretely, with somewhat repetitive code, and then look at the code you've written and think about how you might avoid such repetition in the future. Especially for learning, this can be easier than figuring out how to write it perfectly before writing anything. However, to be a good programmer, you seriously will want to look at what you wrote and think about what can be improved, not just in the functionality of the code, but in the structure of the code.

Edit 2: I noticed your timeout is 0. You really do not want to do that. Rather, you should pass NULL as the timeout (wait forever). With a timeout of 0, your client will peg one core of the CPU at 100% just waiting for user input. Also, you need to clear and re-populate the fd_set (FD_ZERO, then FD_SET for each descriptor) inside the while loop, because select() overwrites the set with only the descriptors that are ready.
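Both points from Edit 2 can be sketched as a single wait step. The function name wait_for_input and the READY_* flags are hypothetical; a descriptor of -1 means "not open right now" (for example, no data connection yet).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>  /* STDIN_FILENO is the usual value for user_fd */

enum { READY_USER = 1, READY_CMD = 2, READY_DATA = 4 };

/* Add fd to the set if it is in use, and track the highest fd seen. */
static void watch_fd(int fd, fd_set *set, int *maxfd)
{
    if (fd < 0)
        return;
    assert(fd < FD_SETSIZE);
    FD_SET(fd, set);
    if (fd > *maxfd)
        *maxfd = fd;
}

/* Block until at least one descriptor is readable. Returns a bitmask of
   READY_* flags, or -1 on error or if there is nothing to watch. */
int wait_for_input(int user_fd, int cmd_fd, int data_fd)
{
    for (;;) {
        fd_set readable;
        int maxfd = -1, mask = 0;

        /* Rebuild the set every time: select() overwrites it. */
        FD_ZERO(&readable);
        watch_fd(user_fd, &readable, &maxfd);
        watch_fd(cmd_fd, &readable, &maxfd);
        watch_fd(data_fd, &readable, &maxfd);
        if (maxfd < 0)
            return -1;

        /* NULL timeout: sleep until something is ready, no busy loop. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (user_fd >= 0 && FD_ISSET(user_fd, &readable)) mask |= READY_USER;
        if (cmd_fd >= 0 && FD_ISSET(cmd_fd, &readable))   mask |= READY_CMD;
        if (data_fd >= 0 && FD_ISSET(data_fd, &readable)) mask |= READY_DATA;
        return mask;
    }
}
```

The main loop would call this once per iteration and then run each handler whose flag is set, reading at most once per descriptor.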
jareds
 
Posts: 317
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: reading from a socket in C

Postby DanB91 » Mon Nov 14, 2011 5:49 am UTC

jareds wrote:I'd go with 64k, but this is a very loose recommendation--just a reasonable value off the top of my head. It's also relatively unimportant: probably anything between 4 kB and 4 MB will work just as well (the C library, OS, and disk controller will take good care of buffering).

Edit 1: With the event loop, remember that you should read at most once per file descriptor each time you loop, in order to avoid blocking. For example, your handleUserInput() routine might be (pseudocode):
Code: Select all
handleUserInput:
    read data
    append data to user input buffer
    while user input buffer contains a complete command:
        remove command from user input buffer
        handle command
Your server response handler would be similar. Yakk was suggesting you abstract things a little more. However, if that seems difficult to understand, it's ok to write your program concretely, with somewhat repetitive code, and then look at the code you've written and think about how you might avoid such repetition in the future. Especially for learning, this can be easier than figuring out how to write it perfectly before writing anything. However, to be a good programmer, you seriously will want to look at what you wrote and think about what can be improved, not just in the functionality of the code, but in the structure of the code.


Thanks very much! I can see that there will be some repetition in the code. I'm not exactly sure how to abstract it yet, but I am sure I'll figure it out as I write it. (Perhaps associating an array of file descriptors with an array of function pointers?)


jareds wrote:Edit 2: I noticed your timeout is 0. You really do not want to do that. Rather, you should use a timeout of NULL (wait forever). With a timeout of 0, your client will peg one core of the CPU to 100% just waiting for user input. Also, you need to clear and set the FD_SET within the while loop.


Originally I was confused why I would want select to wait indefinitely (not realizing the function returns when at least one fd is ready for reading). Then I read the man pages more carefully!

Thanks everyone for the help!
DanB91
 
Posts: 11
Joined: Fri Nov 26, 2010 5:01 pm UTC

Re: reading from a socket in C

Postby Yakk » Mon Nov 14, 2011 2:52 pm UTC

Code: Select all
repeat forever
  do
    wait no time to see if there is user input ready (this gives priority to user input)
    process user input (possibly exiting if the user told you to, after cleanup)
  while there was user input

  wait forever on (download fds) and (user input)
  if it was user input
    process user input
    continue forever loop, unless user told you to quit the program, in which case break loop
  if it was a command fd
    process command (possibly changing which fd you are waiting on)
    continue loop
  if it was a data fd
    download a chunk of the file (up to a certain size, but no more than there is available) into a memory buffer.
    copy some or all of the memory buffer onto disk (note: an advanced program will do this asynchronously, but you can do it synchronously)
    continue forever loop
  if it was an error
    dump a notification message
    break loop

If FD_ISSET does what I think it does (tests if the fd has data), then your code might also be acceptable.

In addition, the first loop above is only required if select doesn't give preference to earlier fds in its list. Its purpose is to prevent a stream of file data from starving the user input. Meanwhile, user input is allowed to starve incoming data, as responsiveness is more important in an ftp program than actually getting the data. You'd need a crazy process hooked up to your stdin to flood it with constant new data, assuming you get a non-trivial time slice; but because the file reading process blocks on writing data out, it could easily starve user input if the incoming data outpaces the program's ability to write it to disk.
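The "wait no time" check in the first loop can be written as a select() call with a zero timeout. A small sketch, with is_readable_now as an invented name:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>  /* STDIN_FILENO is the usual argument */

/* Poll a single descriptor without blocking.
   Returns 1 if a read() on fd would not block right now, 0 if it would,
   and -1 on error. */
int is_readable_now(int fd)
{
    fd_set readable;
    struct timeval no_wait = { 0, 0 };  /* zero timeout: return at once */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    if (select(fd + 1, &readable, NULL, NULL, &no_wait) < 0)
        return -1;
    return FD_ISSET(fd, &readable) ? 1 : 0;
}
```

A zero timeout is fine here because the poll is followed by the blocking "wait forever" select; it only becomes a busy loop when it is the sole wait in the program.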
Yakk
 
Posts: 10039
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

