Thesh wrote:What are you trying to accomplish by avoiding malloc?
Jplus wrote:The only real variation I can think of is to let your procedure take an additional char* as an argument, assume that a large enough buffer has already been allocated to that pointer by the calling environment and write to that instead. But that wouldn't be a good idea.
size_t get_string( char* buf, size_t buf_size );

char stack_buf[1024];
char* answer = 0;
char* allocated_buf = 0;
/* First try the stack buffer; get_string reports how much space it wants. */
size_t wanted = get_string( stack_buf, sizeof(stack_buf)/sizeof(stack_buf[0]) );
if (wanted > sizeof(stack_buf)/sizeof(stack_buf[0]))
{
    /* Too big for the stack buffer: allocate what was asked for and retry. */
    allocated_buf = malloc( wanted );
    size_t wanted2 = get_string( allocated_buf, wanted );
    Assert(wanted2 == wanted);
    answer = allocated_buf;
} else {
    answer = stack_buf;
}
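For anyone who wants to compile the two-pass pattern above, here is a minimal self-contained sketch. The body of get_string is a hypothetical stand-in (the snippet only declares it), and fetch_string is an illustrative wrapper name, not something from the post. It assumes get_string returns the number of bytes the full string needs, terminator included.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for get_string(): copies a fixed message into buf
 * (truncating if needed) and returns the number of bytes, including the
 * terminating '\0', that the full string requires. */
size_t get_string(char *buf, size_t buf_size)
{
    static const char message[] = "hello from get_string";
    size_t needed = sizeof message;
    if (buf != NULL && buf_size > 0) {
        size_t n = needed < buf_size ? needed : buf_size;
        memcpy(buf, message, n);
        buf[n - 1] = '\0';
    }
    return needed;
}

/* Uses stack_buf when the string fits; otherwise mallocs exactly the
 * required size. *heap_buf receives the pointer the caller must free
 * (NULL when the stack buffer was used). Returns NULL if malloc fails. */
char *fetch_string(char *stack_buf, size_t stack_size, char **heap_buf)
{
    size_t wanted = get_string(stack_buf, stack_size);
    *heap_buf = NULL;
    if (wanted <= stack_size)
        return stack_buf;
    *heap_buf = malloc(wanted);
    if (*heap_buf == NULL)
        return NULL;
    size_t wanted2 = get_string(*heap_buf, wanted);
    assert(wanted2 == wanted);
    (void)wanted2; /* silences the unused warning when NDEBUG is defined */
    return *heap_buf;
}
```

The caller passes its own stack array plus a char* to receive any heap pointer, uses the returned string, and then calls free on the heap pointer unconditionally (free(NULL) is a no-op).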
typedef char*(*string_reallocator)(void*, char*, size_t);
char* get_buff( string_reallocator realloc, void* );

char* my_simple_realloc( void* unused, char* old, size_t new_size )
{
    (void)unused; /* no allocator state needed when forwarding to realloc */
    return realloc( old, new_size );
}
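A rough sketch of how the callback approach above lets the caller decide where the memory comes from. The body of get_buff, the fixed_arena type and arena_realloc are all hypothetical, made up for illustration; only the typedef and my_simple_realloc come from the post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *(*string_reallocator)(void *, char *, size_t);

/* Heap-backed reallocator, as in the post. */
char *my_simple_realloc(void *unused, char *old, size_t new_size)
{
    (void)unused;
    return realloc(old, new_size);
}

/* Hypothetical malloc-free alternative: hands out one fixed buffer that
 * lives wherever the caller put the arena (stack, static storage, ...). */
typedef struct {
    char storage[128];
} fixed_arena;

char *arena_realloc(void *ctx, char *old, size_t new_size)
{
    fixed_arena *a = ctx;
    (void)old; /* the single buffer is simply reused in place */
    return new_size <= sizeof a->storage ? a->storage : NULL;
}

/* Hypothetical body for get_buff(): asks the callback for space and fills
 * it. Returns NULL if the callback could not provide enough room. */
char *get_buff(string_reallocator grow, void *ctx)
{
    static const char message[] = "built through a callback";
    char *buf = grow(ctx, NULL, sizeof message);
    if (buf != NULL)
        memcpy(buf, message, sizeof message);
    return buf;
}
```

Calling get_buff(my_simple_realloc, NULL) gives a heap string the caller must free; calling get_buff(arena_realloc, &arena) with a local fixed_arena gives the same text without touching malloc.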
char * function( unsigned int value )
{
    char string[15] = {0};
    char * pointer = string;
    /* do some work, placing result in string[]... */
    return( pointer );
}
korona wrote:If your mentor never used malloc he never had to code a serious program. End of story. Any program that manages a variable amount of objects has to use malloc.
The code you posted has undefined behavior as the memory pointed to by the variable "string" belongs to your function and is not valid anymore after your function returns.
char * int_to_string(int n) {
    /* snprintf with a NULL buffer only measures; add 1 for the terminating '\0' */
    int number_of_characters_needed = 1 + snprintf(NULL, 0, "%d", n);
    char * buf = malloc(number_of_characters_needed);
    if (buf == NULL)
        return NULL;
    sprintf(buf, "%d", n);
    return buf;
}

int int_to_string(int n, char * out, size_t size) {
    return 1 + snprintf(out, size, "%d", n);
}

char* int_to_string(int n) {
    static char buffer[20] = {'\0'};
    snprintf(buffer, 20, "%d", n);
    return buffer;
}

char * one = int_to_string(1);
char * two = int_to_string(2);
printf("%s = %d!\n", one, 1); /* with the static-buffer version this prints "2 = 1!": both pointers refer to the same buffer */
korona wrote:If your mentor never used malloc he never had to code a serious program. End of story. Any program that manages a variable amount of objects has to use malloc.
korona wrote:The code you posted has undefined behavior as the memory pointed to by the variable "string" belongs to your function and is not valid anymore after your function returns. That is definitely not the way to do it and it doesn't work.
PM 2Ring wrote:I don't understand the point of calculating the string length in this particular application. We know the maximum string length, since we're just converting an unsigned int to string. The code required to determine the actual string length wastes more bytes than it saves, and also wastes time.
PM 2Ring wrote:I don't understand the point of calculating the string length in this particular application. We know the maximum string length, since we're just converting an unsigned int to string.
char buffer[(sizeof(int)*CHAR_BIT+2)/3+2]; // sufficient to hold any int value, including sign and null terminator
EvanED wrote:PM 2Ring wrote:I don't understand the point of calculating the string length in this particular application. We know the maximum string length, since we're just converting an unsigned int to string.
No you don't.
Otherwise you wind up with code that assumes that an int is 16 bits and 5 digits is sufficient, and then causes segfaults and security vulnerabilities when compiled with today's compilers. Or you assume that an int is 32 bits and 10 digits is sufficient, and then cause segfaults and security vulnerabilities when compiled on a system that uses an ILP64 model, like Crays.
How many digits is sufficient for a 64-bit int? Do you know off the top of your head? I don't.
And those environments are both things that have actually happened; that's not even getting into the fact that you could theoretically have a 128-bit int or a 2048-bit int by the standard.
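For what it's worth, the (sizeof(int)*CHAR_BIT+2)/3+2 expression posted earlier avoids having to remember digit counts at all. A minimal sketch of using it; the names INT_STR_SIZE and format_int are made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Each decimal digit carries more than 3 bits of information, so
 * ceil(bits / 3) digits always suffice; add 1 for a sign and 1 for the
 * terminating '\0'. The bound is loose but is computed at compile time. */
#define INT_STR_SIZE ((sizeof(int) * CHAR_BIT + 2) / 3 + 2)

/* Formats n into out, which must have room for INT_STR_SIZE chars.
 * Returns the number of characters written, excluding the '\0'. */
int format_int(int n, char *out)
{
    int len = snprintf(out, INT_STR_SIZE, "%d", n);
    assert(len > 0 && (size_t)len < INT_STR_SIZE);
    return len;
}
```

A caller declares char buf[INT_STR_SIZE]; and passes it in. The array size follows whatever width int has on the target, whether that is 16, 32 or 64 bits.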
phlip wrote:[edit] Though, for clarity, I'm in no way suggesting that doing this is a good idea...
PM 2Ring wrote:Fair point. But why not handle that sort of thing with a bunch of #if stuff? At least, the common cases can be handled automatically, and really weird cases can chuck a compile-time error.
Even if (for example) you have a 64 bit environment that will happily run an executable that was compiled to run in 32 bit environment, the 32 bit exe will only "see" the int size it was compiled for. Won't it?
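A sketch of what the "#if stuff" suggested above might look like, assuming only the three common int widths need handling. The macro names INT_DIGITS and INT_BUF_SIZE and the function int_to_buf are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Pick the decimal digit count from INT_MAX; anything unexpected stops
 * the build instead of silently overflowing a buffer at run time. */
#if INT_MAX == 32767
#define INT_DIGITS 5
#elif INT_MAX == 2147483647
#define INT_DIGITS 10
#elif INT_MAX == 9223372036854775807
#define INT_DIGITS 19
#else
#error "unexpected int width: add a case for this platform"
#endif

/* digits + sign + terminating '\0' */
#define INT_BUF_SIZE (INT_DIGITS + 2)

/* Formats n into out, which must have room for INT_BUF_SIZE chars. */
int int_to_buf(int n, char *out)
{
    int len = snprintf(out, INT_BUF_SIZE, "%d", n);
    assert(len > 0 && len < INT_BUF_SIZE);
    return len;
}
```

Since the check happens in the preprocessor, a 32 bit build and a 64 bit build of the same source each get the size that matches the int they were compiled with.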
void int_to_string(int n, char * out, int size) {
    int needed = 1 + snprintf(out, size, "%d", n);
    assert(size >= needed);
}

void int_to_string(int n, char * out, int size) {
    int needed = 1 + sprintf(out, "%d", n);
    assert(size >= needed);
}

typedef struct {
    char string [20];
} string_return_type;
string_return_type int_to_string (int i)
{
    string_return_type r;
    sprintf (r.string, "%d", i);
    return r;
}
csanders wrote:You could also do something like:
typedef struct {
    char string [20];
} string_return_type;
string_return_type int_to_string (int i)
{
    string_return_type r;
    sprintf (r.string, "%d", i);
    return r;
}
Stuffing it into a struct can get a fixed size string copied back to the caller's stack without any malloc. (Of course, replace that size of 20 with whatever more correct formula there is for maximum string lengths.)
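A variant of the struct trick above with the size of 20 replaced by a compile-time bound, as the post suggests. This is only a sketch: the type name int_string is made up, and the bound is the bits/3 estimate (each decimal digit holds more than 3 bits) plus sign and terminator.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

typedef struct {
    /* ceil(bits / 3) digits + sign + terminating '\0' */
    char string[(sizeof(int) * CHAR_BIT + 2) / 3 + 2];
} int_string;

/* Returns the text by value, so every call yields an independent copy. */
int_string int_to_string(int i)
{
    int_string r;
    int len = snprintf(r.string, sizeof r.string, "%d", i);
    assert(len > 0 && (size_t)len < sizeof r.string);
    (void)len; /* silences the unused warning when NDEBUG is defined */
    return r;
}
```

Unlike a static buffer, two results can be held at once: after int_string one = int_to_string(1); and int_string two = int_to_string(2); the first still reads "1".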
Sc4Freak wrote:Otherwise known as the C Programmer's Disease.
csanders wrote:I think after the instructor's suggestion to avoid malloc that it's time to find a new instructor. It's like learning from a carpenter who tells you never to use a screwdriver. "I've built plenty of houses with just a hammer, that's all you should need." Memory allocation is a very important tool in the toolbox, and one you need to learn how to use.
Here is the issue... It is a matter of real-world programming vs. theoretical-world programming.
In the real-world of programming, the vast majority of the time the programmer knows IN-ADVANCE what the constraints of the problem are.
In the theoretical-world of programming, the programmer knows nothing about the environment they are working in.
Simply because things are possible in programming, doesn't mean it is a good thing to do, especially when efficiency is an issue. By efficiency here I mean things like execution speed and memory space. One might argue that with today's huge memories, space isn't really much of a consideration, but that's the way the world was back in the 1950s when littering wasn't considered a bad thing. I can recall my mother telling me to simply throw garbage out of the car window when we were on a trip. No one cared, there was plenty of space out there and how much space does a candy wrapper or a piece of kleenex take up? Well, when everyone does it and when there are a bunch of people it does matter. So, in my world, if all programmers do their best, it is a better world to live in.
So, back to the problem at hand. It is a very rare problem where the programmer doesn't know in advance the scope (size) of the problem. So, in your example, how often is it really a possibility that the string needs to grow infinitely? And, think about the cost associated with using a device like malloc in such a situation. If the goal is to be able to store the most possible data does malloc make sense? No. Why? Because there is a heavy memory overhead associated with malloc, especially when the size of the items being allocated are small.
Now, because no machine is infinite, no data structure can grow infinitely. Because 99.9999% of the time, the programmer knows in advance what the largest amount of data will be, he can simply plan for that ahead of time and deal with the program at design time as opposed to run-time.
C++ has many inherent issues with efficiency (which is why I avoid it). The theory of such programming languages is that they make things "simpler" for the programmer by disconnecting him from the real-world by generalizing things (or abstracting things) to the point where he no longer knows what is going on inside the computer. There are heavy penalties associated with these generalizations. As a result, applications today are huge lumbering things that respond slowly and consume huge computer resources. How many times have you pressed a button or clicked on an icon and nothing has happened? Do you simply wait a while to see if something is going to happen? Do you click or press again? Did the computer "see" the event? Is the button broken?
I grew up in a world where when you pressed a button something happened NOW. The only delays were the speed of electricity in a wire. And in mechanical systems where things don't move that quickly we would get immediate feedback that something was happening because there was a sound associated with the button press.
I guess today folks simply are used to having their requests delayed or ignored. I am of the belief that humans are in charge and the machines are our slaves and they should respond to us (not vice versa).
So, good, efficient programming is a desirable thing. Abstraction leads to inefficiency by its very nature. I know that you know about hardware electronics, what if people built hardware the way people built software? How would that be? The hardware engineer needs to know in advance how much voltage and current will need to pass through a resistor at the time he is designing his circuit. If you think of a resistor as a function, in today's "abstract" world the resistor would have to be designed in advance to handle whatever was thrown at it. This would mean that all resistors would need to be as big as possible, or worse yet, they would need to allocate some resistive material at the time they were called upon to do their work. I know this is an absurd example, but just because it is possible to generalize things in software doesn't mean it should be done.
Of course, programming in the way I propose looks like it would take longer, it really doesn't. Also, if you think about the amount of real-time you'll save for your users and multiply that out, doesn't it make sense to have the programmer spend a bit of extra time up front to make things simpler and faster for the public each and every time they run the application?
If I were given the task to create a data structure that required me to "insert a node", I would ask many questions before I wrote a single line of code. Things like:
1. Exactly how is this going to be used.
2. What is the maximum number of nodes I'll ever see?
3. What is the size of each node?
4. What is the most important consideration?
a. Speed of execution?
b. Speed of programming?
c. Memory utilization?
d. Program maintainability.
e. Other?
Let's say you were working on a huge system where you were told to implement this insert_inorder function. Now let's say there are hundreds of programmers that have called your routine. Now, let's say that on one of the cases, speed is very important, and in all of the other cases, no one cares. Now let's say there are thousands of places in the code where your function is called and you are faced with speeding this thing up. Where do you begin? You could try to optimize the routine but experience tells us that the best way to optimize anything is to KNOW as much in advance as possible. How do you solve such a problem?
On the other hand, if a special purpose routine were implemented for this specific situation, with advanced knowledge as to how it was to be used, and the stresses that would be placed upon it, all of the trade-offs could be done at the time it was written and this crisis I mentioned above most likely would never take place.
I could go on and on, but hopefully you get my point. I don't want to live in a world where every resistor is protected by a fuse because hardware engineers stop designing things up-front. Of course, in that world wouldn't you want to have a circuit-breaker in there to protect the fuse? Where does it end?
Quote: Here is the issue... It is a matter of real-world programming vs. theoretical-world programming.
In the real-world of programming, the vast majority of the time the programmer knows IN-ADVANCE what the constraints of the problem are.
In the theoretical-world of programming, the programmer knows nothing about the environment they are working in.
[...]

Basically he's saying in a long-winded way that (1) you want your code to be as general as possible and (2) general code should pose as few requirements on the execution environment as possible, so (3) your code should be efficient. I wholeheartedly agree with that and besides, efficiency is a virtue in itself.
Quote: [...] If the goal is to be able to store the most possible data does malloc make sense? No. Why? Because there is a heavy memory overhead associated with malloc, especially when the size of the items being allocated are small.

This seems rather misleading though. What does he mean by "heavy memory overhead"? You should know that processes in modern operating systems are granted some number of memory pages that they can use, and that they can ask for more memory pages when they run out of space. Modern allocators are optimized to squeeze as much as they can out of the memory pages that are already available to your program before asking for more. Of course the allocators have to store some additional data to keep track of what is in use and what isn't, but you wouldn't have to be afraid that they'd double the amount of memory required or anything. For really small items you can use special purpose allocators that pool the items into arrays, if necessary.
Quote: Now, because no machine is infinite, no data structure can grow infinitely. Because 99.9999% of the time, the programmer knows in advance what the largest amount of data will be, he can simply plan for that ahead of time and deal with the program at design time as opposed to run-time.

This is wrong, and I can tell you why with a simple counterexample: the phpBB post form. Forum members may post messages of anywhere from a single character (see Random bitstream!) to lengthy works like this one. Of course, you can arbitrarily assume that users will never post a message of more than 20k characters (this post is 10234 characters). So your mentor is suggesting to determine that size at design time and use a fixed size array.
Quote: C++ has many inherent issues with efficiency (which is why I avoid it). The theory of such programming languages is that they make things "simpler" for the programmer by disconnecting him from the real-world by generalizing things (or abstracting things) to the point where he no longer knows what is going on inside the computer. [...]

Your mentor obviously knows very little about C++ -- or about C, for that matter. High-level languages with modern, advanced compilers or interpreters, which definitely includes C, never reliably reflect what's going on in your machine. The only way to find that out is to inspect the assembly code that the compiler generates. One compiler may generate vastly different assembly from another, and of course different machine architectures require different assembly code. That's the entire point of all high-level languages: you're abstracting away from your machine so you can write portable code.
Quote: [...] Abstraction leads to inefficiency by its very nature. [...]

This is only marginally true. Statically typed, compiled, GC-free languages such as C, C++, Fortran and ATS produce very efficient assembly code. Only if you're highly skilled at assembly programming and thoroughly knowledgeable about the target architecture will you be able to improve such code, and even then the speedup will be insignificant most of the time.
Quote: Of course, programming in the way I propose looks like it would take longer, it really doesn't.

It does, and very much so: it causes you to avoid all kinds of existing, very useful tools and you'll have to reinvent the wheel many times. You can do it as an exercise once in a while, but don't waste too much time on it, especially since it doesn't train you to do anything useful.
Quote: Also, if you think about the amount of real-time you'll save for your users and multiply that out, [...]

Such practices will only cost your users, both because it takes you longer to release your software and because you write sub-optimal solutions.
Quote: If I were given the task to create a data structure that required me to "insert a node", [...]

In real-world situations you shouldn't be given such a task, because such datastructures are already available. And if you ever, ever find a need to implement some new kind of datastructure, please do it in C++ so you can just wrap your class in a template and never need to duplicate your code or use ugly preprocessor trickses to make it work for new value types.
Quote: Let's say you were working on a huge system where you were told to implement this insert_inorder function. Now let's say there are hundreds of programmers that have called your routine. Now, let's say that on one of the cases, speed is very important, and in all of the other cases, no one cares. Now let's say there are thousands of places in the code where your function is called and you are faced with speeding this thing up. Where do you begin? You could try to optimize the routine but experience tells us that the best way to optimize anything is to KNOW as much in advance as possible. How do you solve such a problem?
On the other hand, if a special purpose routine were implemented for this specific situation, with advanced knowledge as to how it was to be used, and the stresses that would be placed upon it, all of the trade-offs could be done at the time it was written and this crisis I mentioned above most likely would never take place.

Here your mentor seems to be contradicting himself. Does he want you to write code that is as general as possible, so it will be efficient enough in all situations for which it is intended, or does he want you to write new code every time your old code turns out not to be efficient enough? Curious.
Jplus wrote:
Quote: [...] If the goal is to be able to store the most possible data does malloc make sense? No. Why? Because there is a heavy memory overhead associated with malloc, especially when the size of the items being allocated are small.

This seems rather misleading though. What does he mean by "heavy memory overhead"? You should know that processes in modern operating systems are granted some number of memory pages that they can use, and that they can ask for more memory pages when they run out of space. Modern allocators are optimized to squeeze as much as they can out of the memory pages that are already available to your program before asking for more. Of course the allocators have to store some additional data to keep track of what is in use and what isn't, but you wouldn't have to be afraid that they'd double the amount of memory required or anything. For really small items you can use special purpose allocators that pool the items into arrays, if necessary.
Now, this is not to say there are no problems with memory allocations. For one, memory allocation may add heavy time overhead. Context switches are required to add more pages, which can take a lot of time by itself, and if your memory gets severely fragmented allocators may have to do a lot of searching in order to find a gap of the right size. Another problem is that the responsibility to manage the allocated memory is on the programmer (if not wrapped in some kind of convenient handle like a C++ std::string), which turns out to be something that humans are not really good at.
So you may want to avoid free store allocation because of time efficiency, but not because of space efficiency (see below). The management pitfalls are a good reason to avoid manual allocation, but since you should especially work on your weaknesses, it's a good idea to exercise such responsibilities before you walk into them while doing serious, real-world work.
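To make the "pool the items into arrays" idea mentioned above a bit more concrete, here is a minimal sketch of a fixed-capacity node pool. Everything in it (node, node_pool, POOL_CAPACITY and the function names) is hypothetical, and a real pool would typically chain several such arrays together instead of failing once one array is full.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 64 /* hypothetical limit, chosen for illustration */

typedef struct node {
    int value;
    struct node *next;
} node;

/* All nodes live in one array, so there is no per-node allocator
 * bookkeeping; released nodes are chained through their own next field. */
typedef struct {
    node slots[POOL_CAPACITY];
    node *free_list; /* nodes that were released and can be reused */
    size_t used;     /* slots handed out from the array so far */
} node_pool;

void pool_init(node_pool *p)
{
    p->free_list = NULL;
    p->used = 0;
}

/* Returns a node, or NULL when the pool is exhausted. */
node *pool_alloc(node_pool *p)
{
    if (p->free_list != NULL) {
        node *n = p->free_list;
        p->free_list = n->next;
        return n;
    }
    if (p->used < POOL_CAPACITY)
        return &p->slots[p->used++];
    return NULL;
}

/* Gives a node back to the pool; it becomes the next one handed out. */
void pool_free(node_pool *p, node *n)
{
    n->next = p->free_list;
    p->free_list = n;
}
```

Both allocation and release are a handful of pointer operations with no searching, which is where the time saving over a general-purpose allocator would come from in this kind of design.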
Quote: C++ has many inherent issues with efficiency (which is why I avoid it). The theory of such programming languages is that they make things "simpler" for the programmer by disconnecting him from the real-world by generalizing things (or abstracting things) to the point where he no longer knows what is going on inside the computer. [...]

Your mentor obviously knows very little about C++ -- or about C, for that matter. High-level languages with modern, advanced compilers or interpreters, which definitely includes C, never reliably reflect what's going on in your machine. The only way to find that out is to inspect the assembly code that the compiler generates. One compiler may generate vastly different assembly from another, and of course different machine architectures require different assembly code. That's the entire point of all high-level languages: you're abstracting away from your machine so you can write portable code.
[...]or in embedded systems, or in a situation where you have discovered a performance bottleneck, his approach makes sense.
korona wrote:For many problems malloc actually helps to save memory as you do not over-allocate any buffers when you know their exact size.
Yakk wrote:In addition, there are reasons to write inefficient code -- it is often programmer-time efficient, and there are more software development tasks than programmers to do them in general.
Jplus wrote:Modern allocators are optimized to squeeze as much as they can out of the memory pages that are already available to your program before asking for more. Of course the allocators have to store some additional data to keep track of what is in use and what isn't, but you wouldn't have to be afraid that they'd double the amount of memory required or anything. For really small items you can use special purpose allocators that pool the items into arrays, if necessary.
Jplus wrote:This is wrong, and I can tell you why with a simple counterexample: the phpBB post form.
Little Richie wrote:Yakk wrote:In addition, there are reasons to write inefficient code -- it is often programmer-time efficient, and there are more software development tasks than programmers to do them in general.
Sure, there are reasons, but I don't want to learn to be inefficient. I'd rather know both ways to solve a problem, and leave it up to me when to be inefficient.
Yakk wrote:I sort of don't like relying on the compiler to magically optimize without saying "by the way, this can be optimal". It seems overly brittle: with something like emplace insert or operator move, it happens (and if it doesn't, because it isn't possible, you get a compiler error). When you have an optimization like (N)RVO, doing something you don't realize is important might break the compiler's ability to optimize it, and silently your code gets (possibly significantly) slower.