C NOTES


Memory Segments of an object file

Memory Segments

Text

The code segment, also known as the text segment or simply as text, is the portion of an object file, or the corresponding section of the program's address space, that contains executable instructions. It is generally read-only and has a fixed size.

Data

In computing, a data segment (often denoted .data) is a portion of an object file or the corresponding address space of a program that contains initialized static variables, that is, global variables and static local variables. The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.

The data segment is read/write, since the values of variables can be altered at run time. This is in contrast to the read-only data segment (rodata segment or .rodata), which contains static constants rather than variables; it also contrasts to the code segment, also known as the text segment, which is read-only on many architectures. Uninitialized data, both variables and constants, is instead in the BSS segment.

The .data segment contains any global or static variables which have a pre-defined value and can be modified. That is any variables that are not defined within a function (and thus can be accessed from anywhere) or are defined in a function but are defined as static so they retain their address across subsequent calls. Examples, in C, include:

int val = 3;
char string[] = "Hello World";


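On embedded systems that run from ROM or flash, the initial values for these variables are stored in read-only memory (typically alongside .text) and are copied into the .data segment in RAM by the program's start-up routine. On hosted systems, the loader maps the initial values directly from the executable file.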

Note that in the above example, if these variables had been declared from within a function, they would default to being stored in the local stack frame.

BSS [Block Started by Symbol]

The BSS segment, also known as uninitialized data, is usually adjacent to the data segment. The BSS segment contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code. For instance, a variable defined as static int i; would be contained in the BSS segment.

More on BSS

In C code, any variable with static storage duration is defined to be initialized to 0 by the spec (Section 6.7.8 Initialization, paragraph 10):

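If an object that has static storage duration is not initialized explicitly, then:

- if it has pointer type, it is initialized to a null pointer;
- if it has arithmetic type, it is initialized to (positive or unsigned) zero;
- if it is an aggregate, every member is initialized (recursively) according to these rules;
- if it is a union, the first named member is initialized (recursively) according to these rules.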

Some program loaders fill the whole section with zeroes up front, and others fill it on demand as a performance improvement. So while the .bss section may not physically contain all zeroes at the moment the C code starts executing, it logically does. In any case, with a standard-compliant toolchain you can treat it as being all zero.

Any variables that are initialized to non-zero values will never end up in the .bss section; they are handled in the .data or .rodata sections, depending on their particular characteristics.
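The placement rules above can be sketched in a few lines. The names below are hypothetical, and the section named in each comment is the usual outcome on common toolchains, not something the C standard requires.

```c
int initialized_global = 3;      /* typically .data: non-zero initializer */
int zero_global;                 /* typically .bss: no explicit initializer */
static int zero_static = 0;      /* typically .bss: explicitly zero */
const int read_only_value = 7;   /* typically .rodata: static constant */

/* Hypothetical helper: the static local keeps its value between calls,
   so it lives in .data rather than in the stack frame. */
int next_ticket(void)
{
    static int ticket = 100;     /* typically .data */
    int step = 1;                /* automatic: stack frame or a register */

    ticket += step;
    return ticket;
}

/* Sum of the objects that the loader or start-up code must have zeroed. */
int bss_sum(void)
{
    return zero_global + zero_static;
}
```

On GCC-style toolchains, running a tool such as size or objdump on the resulting object file is one way to check where each object actually ended up.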

Static storage duration

"Global variables" are defined at file scope, outside any function. All variables that are defined at file scope and all variables that are declared with the keyword static have something called static storage duration. This means that they will be allocated in a separate part of the memory and exist throughout the whole lifetime of the program.

It also means that they are guaranteed to be initialized to zero on any C compiler.
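A small sketch of that guarantee for the different kinds of object, assuming nothing beyond standard C. The struct and variable names are made up for illustration.

```c
#include <stddef.h>

struct point { int x; double y; };

/* None of these has an initializer, yet all start out as zero. */
static int counter;             /* arithmetic type: 0 */
static double ratio;            /* arithmetic type: 0.0 */
static char *name;              /* pointer type: null pointer */
static struct point origin;     /* aggregate: every member zeroed */

/* Returns 1 if every object above holds its zero value. */
int statics_are_zeroed(void)
{
    return counter == 0 && ratio == 0.0 && name == NULL
        && origin.x == 0 && origin.y == 0.0;
}
```

An automatic (local) variable declared without an initializer gets no such treatment; reading it before assigning a value is an error.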

Heap

The heap area commonly begins at the end of the .bss and .data segments and grows to larger addresses from there. The heap area is managed by malloc, calloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size (note that the use of brk/sbrk and a single "heap area" is not required to fulfill the contract of malloc/calloc/realloc/free; they may also be implemented using mmap/munmap to reserve/unreserve potentially non-contiguous regions of virtual memory into the process' virtual address space). The heap area is shared by all threads, shared libraries, and dynamically loaded modules in a process.

Stack

The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory. A "stack pointer" register tracks the top of the stack; it is adjusted each time a value is "pushed" onto the stack. The set of values pushed for one function call is termed a "stack frame". A stack frame consists at minimum of a return address. Automatic variables are also allocated on the stack.

The stack area traditionally adjoined the heap area and they grew towards each other; when the stack pointer met the heap pointer, free memory was exhausted. With large address spaces and virtual memory techniques they tend to be placed more freely, but they still typically grow in a converging direction. On the standard PC x86 architecture the stack grows toward address zero, meaning that more recent items, deeper in the call chain, are at numerically lower addresses and closer to the heap. On some other architectures it grows the opposite direction.

where is rvalue stored in c?

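Where an rvalue is stored is entirely up to the compiler; the standard does not dictate this. A constant is typically encoded directly in an instruction, within the .text segment.

int a;
a = 10 + 5 - 3;

Here is a disassembly from MSVC. The compiler has folded 10 + 5 - 3 into the immediate value 0Ch (12), which is part of the instruction itself:

0041338E mov dword ptr [a],0Ch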

Why are global variables always initialized to '0', but not local variables?

Because the C standard says so. The reason is efficiency:

Static variables have a known, fixed address, so zeroing them is cheap. Strictly speaking they are still initialized at run time: the loader or the C runtime (crt) sets them up before calling main. This happens only once per program run.

Automatic variables can have a different address on each call, so they would have to be initialized every time the function is called, incurring a run-time cost that may not be needed. If you do need that initialization, request it explicitly.

Note that the zeroing cannot happen at compile time. The program has to be loaded into memory before the contents of the .bss section can be zeroed, so this is only possible at run time.

Literal

In computer science, a literal is a notation for representing a fixed value in source code. Almost all programming languages have notations for atomic values such as integers, floating-point numbers, and strings, and usually for booleans and characters; some also have notations for elements of enumerated types and compound values such as arrays, records, and objects. An anonymous function is a literal for the function type.

In contrast to literals, variables or constants are symbols that can take on one of a class of fixed values, the constant being constrained not to change. Literals are often used to initialize variables, for example, in the following, 1 is an integer literal and the three letter string in "cat" is a string literal:

int a = 1;
string s = "cat";


Expressions

An expression is a sequence of operators and their operands, that specifies a computation.

Expression evaluation may produce a result (e.g., evaluation of 2+2 produces the result 4), may generate side-effects (e.g. evaluation of printf("%d",4) sends the character '4' to the standard output stream), and may designate objects or functions.

Unevaluated expressions

The operands of the sizeof operator, the _Alignof operator (since C11), and the controlling expression of a generic selection (since C11) are not evaluated; the one exception is a sizeof operand whose type is a variable-length array (since C99). Thus, size_t n = sizeof(printf("%d", 4)); does not perform console output.
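A minimal sketch of the sizeof case; the function names are hypothetical. Some compilers warn about a side effect inside an unevaluated operand, which is exactly the point being illustrated.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the value of i after it appeared inside sizeof. */
int value_after_sizeof(void)
{
    int i = 0;
    size_t n = sizeof(i++);   /* operand is not evaluated: i stays 0 */

    (void)n;                  /* silence "unused variable" warnings */
    return i;
}

/* Only the type of the printf call (int) is used; nothing is printed. */
size_t size_of_printf_result(void)
{
    return sizeof(printf("%d", 4));
}
```

Here value_after_sizeof() returns 0, and size_of_printf_result() equals sizeof(int) because printf returns int.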

Value categories

The words "lvalue" and "rvalue" (that's how the C standard spells them) have a long history. The terms come from 'l' for "left" and 'r' for "right", referring to the left and right sides of an assignment.

In some contexts, an expression may be either evaluated for its lvalue or evaluated for its rvalue. Given those definitions of the terms, an "rvalue" is what you'd normally think of as the value of an expression; evaluating 2+2 yields 4. Evaluating an expression for its lvalue meant determining what object it refers to. For example, given:

int x;
x = 2 + 2;

the right side of the assignment, 2 + 2, would be evaluated for its rvalue, yielding 4, and the left side would be evaluated for its lvalue, which means determining the object to which it refers. (The rvalue of the left side is not evaluated; the value previously stored in x, if any, is not used.)

The C standard defines them differently. In C, an lvalue is not a value; it's a kind of expression. Specifically, quoting the 2011 ISO C standard, section 6.3.2.1:

An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

(The word "potentially" was added to cover cases like *ptr, where ptr is a pointer object; if ptr == NULL then *ptr doesn't currently designate an object, but it's still an lvalue. You can always determine at compile time whether a given expression is an lvalue or not. Earlier editions of the C standard had flawed definitions of lvalue.)

So basically an lvalue in C is an expression that designates an object. You can think of it as an expression that can appear on the left side of an assignment, though that's not entirely accurate; for example, the name of a const object can't be on the LHS of an assignment, but it's still an lvalue. (As you can see, nailing down a precise and consistent definition for lvalue can be tricky.)

Neither x++ nor ++x is an lvalue in C.

The C standard doesn't use the term rvalue beyond mentioning it in a single footnote:

What is sometimes called "rvalue" is in this International Standard described as the "value of an expression".

So, as C defines the terms, an lvalue is a kind of expression (something that exists in C source code), but an rvalue is the result of evaluating an expression (something that exists during program execution).

Is a pointer an lvalue or rvalue?

A pointer is not the kind of thing that can be an rvalue or an lvalue. A pointer is a type. The only thing that can be an rvalue or an lvalue is an expression.

Consider this similar question: "Is an integer an lvalue or an rvalue?" Well, neither. 3 is an integer expression and an rvalue, so "3 = i;" is illegal. But "i = 3;" is legal if i is an integer variable, so i is an integer expression and an lvalue.

What is the reason for explicitly declaring L or UL for long values

Because the suffix does not act after the conversion; it acts before it.

First you have the literal, with its own type; then it is converted to the type of the variable you are initializing or assigning.

When a suffix such as L or UL is not used, the compiler picks the first type that can represent the constant from a list (see the C99 standard, clause 6.4.4.1, paragraph 5; for an unsuffixed decimal constant the list is int, long int, long long int).

As a consequence, most of the time the suffix is not necessary and does not change the meaning of the program. It matters when the constant takes part in arithmetic before any conversion happens (for example, a shift or a multiplication that would overflow int), and the U part of the suffix is needed to give a decimal constant an unsigned type.
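To see which type a constant actually gets, one option is a C11 generic selection. This is a sketch that assumes a C11 compiler; TYPE_NAME and the function names are made up for illustration.

```c
/* Maps an expression to the name of its type (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x),                 \
    int: "int",                                    \
    unsigned int: "unsigned int",                  \
    long: "long",                                  \
    unsigned long: "unsigned long",                \
    long long: "long long",                        \
    unsigned long long: "unsigned long long",      \
    default: "other")

const char *type_of_plain(void) { return TYPE_NAME(1); }    /* "int" */
const char *type_of_u(void)     { return TYPE_NAME(1U); }   /* "unsigned int" */
const char *type_of_l(void)     { return TYPE_NAME(1L); }   /* "long" */
const char *type_of_ul(void)    { return TYPE_NAME(1UL); }  /* "unsigned long" */

/* The suffix matters here: the shift is performed in the type of the
   left operand, and 1 << 40 would overflow a 32-bit int. */
unsigned long long bit_40(void)
{
    return 1ULL << 40;
}
```

The type is fixed by the constant and its suffix alone; the type of the variable on the left of an initialization plays no part in it.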

Using Assembly Language with C

The asm keyword allows you to embed assembler instructions within C code. GCC provides two forms of inline asm statements. A basic asm statement is one with no operands (see Basic Asm), while an extended asm statement (see Extended Asm) includes one or more operands. The extended form is preferred for mixing C and assembly language within a function, but to include assembly language at top level you must use basic asm.

Basic Asm — Assembler Instructions Without Operands

A basic asm statement has the following syntax:

asm asm-qualifiers ( AssemblerInstructions )
        

The asm keyword is a GNU extension. When writing code that can be compiled with -ansi and the various -std options, use __asm__ instead of asm.

AssemblerInstructions is a literal string that specifies the assembler code. The string can contain any instructions recognized by the assembler, including directives. GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input.

You may place multiple assembler instructions together in a single asm string, separated by the characters normally used in assembly code for the system. A combination that works in most places is a newline to break the line, plus a tab character (written as ‘\n\t’). Some assemblers allow semicolons as a line separator. However, note that some assembler dialects use semicolons to start a comment.

Extended Asm - Assembler Instructions with C Expression Operands

With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:

asm asm-qualifiers ( AssemblerTemplate
                     : OutputOperands
                     [ : InputOperands
                     [ : Clobbers ] ])

asm asm-qualifiers ( AssemblerTemplate
                     :
                     : InputOperands
                     : Clobbers
                     : GotoLabels)

In the last form, asm-qualifiers contains goto; in the first form, it does not.


The asm keyword is a GNU extension.

Parameters

asm statements may not perform jumps into other asm statements, only to the listed GotoLabels. GCC’s optimizers do not know about other jumps; therefore they cannot take account of them when deciding how to optimize.

The total number of input + output + goto operands is limited to 30.

Examples:

In the example below, %0 is the first operand listed after the assembler template and %1 is the second. Depending on the constraints and on the code the compiler generates, an operand may live in a register, on the stack, or elsewhere in memory.

For example:

int a = 10, b;
asm ("movl %1, %%eax\n\t"      /* copy a into eax */
     "movl %%eax, %0"          /* copy eax into b */
     : "=r" (b)                /* output */
     : "r" (a)                 /* input */
     : "%eax"                  /* clobbered register */
     );


%0 is b in this case and %1 is a.
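A slightly more defensive sketch of the same idea, wrapped in a function. It assumes GCC or Clang with the default AT&T syntax on x86; the function name is hypothetical, and other compilers or architectures take the plain C fallback.

```c
/* Adds b to a. Uses extended asm on x86 with GCC/Clang, plain C elsewhere. */
int add_with_asm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __asm__ ("addl %1, %0"
             : "+r" (a)     /* %0: read-write operand kept in a register */
             : "r" (b)      /* %1: input operand kept in a register */
             : "cc");       /* the add instruction modifies the flags */
    return a;
#else
    return a + b;           /* fallback for other compilers and targets */
#endif
}
```

The "+r" constraint marks a as both read and written, which avoids the explicit copy through eax and the register clobber used in the example above.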

Another Example:

int current_task;
asm ("str %[output]"
     : [output] "=r" (current_task)
     );


Pointer - Value type casting

Example - UINT32_T & FLOAT

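#include <stdint.h>

/* Reinterpret the bits of a float as a uint32_t through a pointer cast.
   Note: accessing a float through a uint32_t * (and vice versa) breaks the
   strict aliasing rule, so an optimizing compiler may miscompile this. */
uint32_t FloatToUint(float n) {
    return *(uint32_t *)&n;
}

float UintToFloat(uint32_t n) {
    return *(float *)&n;
}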
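
Because casting between pointer types like this runs into the strict aliasing rule, a commonly used alternative is to copy the bytes with memcpy. The sketch below assumes float is 32 bits wide (checked at compile time, C11) and uses hypothetical function names.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copy the object representation of a float into a uint32_t. */
uint32_t float_to_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}

/* Copy the bits of a uint32_t back into a float. */
float bits_to_float(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof f);
    return f;
}
```

On a platform with IEEE 754 single-precision floats, float_to_bits(1.0f) yields 0x3F800000. Compilers generally turn a fixed-size memcpy like this into a plain register move.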
        

Difference between const char *p, char * const p and const char * const p

const keyword applies to whatever is immediately to its left. If there is nothing to its left, it applies to whatever is immediately to its right.

const char *ptr : This is a pointer to a constant character. You cannot change the value pointed to by ptr, but you can change the pointer itself. In other words, const char * is a (non-const) pointer to a const char.

NOTE: There is no difference between const char *p and char const *p; both are pointers to a const char, because const is on the same side of the * (asterisk) in both.

char *const ptr : This is a constant pointer to a non-constant character. You cannot change the pointer ptr, but you can change the value pointed to by ptr.

const char * const ptr : This is a constant pointer to a constant character. You can change neither the value pointed to by ptr nor the pointer ptr.

NOTE: char const * const ptr is the same as const char * const ptr.
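
The three declarations side by side, as a compilable sketch. The function name is made up, and the statements a compiler would reject are left in comments.

```c
/* Returns 1 if the permitted modifications had the expected effect. */
int const_pointer_demo(void)
{
    char a[] = "ab";
    char b[] = "cd";

    const char *p1 = a;         /* pointer to const char */
    p1 = b;                     /* OK: the pointer itself may change */
    /* *p1 = 'x'; */            /* error: pointee is read-only via p1 */

    char *const p2 = a;         /* const pointer to char */
    *p2 = 'x';                  /* OK: the pointee may change */
    /* p2 = b; */               /* error: the pointer is read-only */

    const char *const p3 = b;   /* const pointer to const char */
    /* *p3 = 'y'; p3 = a; */    /* error: neither may change */

    return *p1 == 'c' && a[0] == 'x' && *p3 == 'c';
}
```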

Difference between int* var and int *var

Prefer int *i; because the parser attaches the star to the variable, not to the type.

This only becomes meaningful when you define two variables on one line. Regardless of how you write it:

int* i,j;
int*i,j;
int *i,j;

in each of those, i is a pointer to an int, while j is just an int. The last syntax makes that clearer.

size_t type

According to the 1999 ISO C standard (C99), size_t is an unsigned integer type of at least 16 bit (see sections 7.17 and 7.18.3).

This type is used to represent the size of an object. Library functions that take or return sizes use size_t for those parameters and return values. The sizeof operator also yields a value of type size_t.

It is guaranteed to be big enough to contain the size of the biggest object the host system can handle. The underlying type depends on the compiler and target: on many 32-bit targets it is a typedef (i.e., alias) for unsigned int, while on 64-bit targets it is typically a typedef for unsigned long or unsigned long long. The size_t data type is never negative.
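
A short sketch of size_t in use as a count and loop index. The function names are hypothetical; SIZE_MAX comes from stdint.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums count elements; the count and the index are both size_t. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Because size_t is unsigned, 0 - 1 wraps around to SIZE_MAX instead of
   becoming negative. Returns 1 to confirm that behaviour. */
int size_t_wraps_around(void)
{
    size_t zero = 0;

    return zero - 1 == SIZE_MAX;
}
```

The wrap-around is why a countdown loop written as for (size_t i = n - 1; i >= 0; i--) never terminates: the condition i >= 0 is always true for an unsigned type.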

To convert the decimal into floating point, we have 3 elements in a 32-bit floating point representation:

i) Sign (MSB)

ii) Exponent (8 bits after MSB)

iii) Mantissa (Remaining 23 bits)

Sign bit is the first bit of the binary representation. '1' implies negative number and '0' implies positive number.

Example: To convert -17 into 32-bit floating point representation:

Sign bit = 1

Exponent is decided by the nearest power of two, 2^n, that is smaller than or equal to the number. For 17, 16 is the nearest 2^n. Hence the exponent of 2 will be 4, since 2^4 = 16. 127 is the unique number for 32 bit floating point representation. It is known as bias. It is determined by 2^(k-1) - 1, where 'k' is the number of bits in the exponent field. Thus bias = 127 for 32 bit (2^(8-1) - 1 = 128 - 1 = 127).

Now, 127 + 4 = 131, i.e. 10000011 in binary representation.

Mantissa: 17 in binary = 10001. Move the binary point so that there is only one bit to its left. Adjust the exponent of 2 so that the value does not change. This is normalizing the number: 1.0001 x 2^4. Now consider the fractional part, represented as 23 bits by appending zeros:

00010000000000000000000

Thus the floating point representation of -17 is 1 10000011 00010000000000000000000
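
The worked example can be checked in code by splitting a float into its three fields. This sketch assumes a 32-bit IEEE 754 float; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased by 127 */
    uint32_t mantissa;   /* 23 bits, the fraction after the leading 1 */
};

/* Splits a float into sign, biased exponent and mantissa. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    memcpy(&bits, &f, sizeof bits);          /* copy the raw bits */
    out.sign     = (unsigned)(bits >> 31);
    out.exponent = (unsigned)((bits >> 23) & 0xFFu);
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For -17.0f this gives sign 1, exponent 131 (127 + 4) and mantissa 0x080000, which is the 23-bit pattern 00010000000000000000000 from the derivation.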

__cplusplus macro

The __cplusplus preprocessor macro is defined if the compilation unit is compiled with a C++ compiler. If defined, its value corresponds to the C++ standard that the compiler uses to compile a compilation unit.

extern C with __cplusplus macro

extern "C" is recognized by a C++ compiler and tells it that the enclosed declarations use C linkage: the function names are not mangled, so the functions can be linked against code compiled as C.

Example:

#ifdef __cplusplus
extern "C" {
#endif

/* declarations to be given C linkage */

#ifdef __cplusplus
}
#endif


static variables

Static variables preserve their value even after execution leaves their scope.

If a static global variable is declared in a header file and that header is included in multiple source files, the variable is treated as an independent (separate) variable in each source file.

In the C programming language, static is used with global variables and functions to set their scope to the containing file. In local variables, static is used to store the variable in the statically allocated memory instead of the automatically allocated memory. While the language does not dictate the implementation of either type of memory, statically allocated memory is typically reserved in data segment of the program at compile time, while the automatically allocated memory is normally implemented as a transient call stack.

Static global variables and functions are also possible in C/C++. The purpose of these is to limit scope of a variable or function to a file.

Static variables cannot be declared inside a structure in C. The reason is that the C compiler requires all the members of a structure to be placed together, i.e. the memory for structure members must be contiguous (apart from padding). A structure can be declared inside a function (stack segment), allocated dynamically (heap segment), or be global (BSS or data segment). Whatever the case, all structure members must reside in the same memory segment, because a member is accessed by adding its offset to the starting address of the structure. Moving one member alone into the data segment would break this scheme. It is possible, however, to declare an entire structure variable as static.

In C, functions are global by default. The “static” keyword before a function name makes it static.

Unlike global functions in C, access to static functions is restricted to the file where they are declared. Therefore, when we want to restrict access to functions, we make them static. Another reason for making functions static can be reuse of the same function name in other files.

The C language is pass-by-value without exception. Passing a pointer as a parameter does not mean pass-by-reference.

A function is not able to change the caller's actual argument.

#include <stdio.h>

void function2(int *param) {
    printf("param's address %p\n", (void *)param);
    param = NULL;   /* changes only the local copy of the pointer */
}

int main(void) {
    int variable = 111;
    int *ptr = &variable;

    function2(ptr);
    printf("ptr's address %p\n", (void *)ptr);
    return 0;
}

The result will be that the two addresses are equal; ptr is not set to NULL.

Example result (the exact value and format vary by platform and run):

param's address 0x91ef5f54
ptr's address 0x91ef5f54


Header and source files in C

Converting C source code files to an executable program is normally done in two steps: compiling and linking.

First, the compiler converts the source code to object files (*.o). Then the linker takes these object files, together with statically linked libraries, and creates an executable program.

In the first step, the compiler takes a compilation unit, which is normally a preprocessed source file (so, a source file with the contents of all the headers that it #includes) and converts that to an object file.

In each compilation unit, all the functions that are used must be declared, to let the compiler know that the function exists and what its arguments are. For example, suppose the declaration of a function returnSeven is in the header file header.h. When you compile main.c, you include the header with the declaration so that the compiler knows that returnSeven exists when it compiles main.c.

When the linker does its job, it needs to find the definition of each function. Each function has to be defined exactly once in one of the object files - if there are multiple object files that contain the definition of the same function, the linker will stop with an error.

Linkage

There is external linkage and internal linkage.

By default, functions have external linkage, which means that the compiler makes these functions visible to the linker. If you make a function static, it has internal linkage - it is only visible inside the compilation unit in which it is defined (the linker won't know that it exists). This can be useful for functions that do something internally in a source file and that you want to hide from the rest of the program.

Location of local pointer

const char *func(void)
{
    const char *ptr = "OK";
    return ptr;
}

The string "OK" is stored in a read-only part of the program image (typically .rodata, or the code section on some toolchains), and ptr itself is stored on the stack.

The location of "OK" is still accessible after func() returns, because the string has static storage duration and its address is what func() returns.

Because that memory is read-only, the function should be declared as returning a pointer to const:

const char * func ()

How to change a pointer inside a function (addr. it points, not value) / pointer to pointer

Use a pointer to a pointer (double pointer).

Use ** when you want a memory allocation or pointer assignment made inside a function to remain visible after the call returns.

#include <stdlib.h>

void allocate(int **p)
{
    *p = malloc(sizeof **p);
}

int main(void)
{
    int *p = NULL;
    allocate(&p);
    if (p != NULL) {    /* malloc may fail */
        *p = 42;
        free(p);
    }
    return 0;
}


Also,

If you want to have a list of characters (a word), you can use char *word

If you want a list of words (a sentence), you can use char **sentence

If you want a list of sentences (a monologue), you can use char ***monologue

so on...

realloc()

Size of dynamically allocated memory can be changed by using realloc().

void *realloc(void *ptr, size_t size);

realloc deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object is identical to that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.

The point to note is that realloc() should only be used for dynamically allocated memory.
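A common way to call realloc safely is to keep the old pointer until the call is known to have succeeded. The helper below is a hypothetical sketch, and it assumes new_count is greater than zero and that new_count * sizeof(int) does not overflow.

```c
#include <stdlib.h>

/* Resizes *buf to hold new_count ints.
   Returns 1 on success. Returns 0 on failure, in which case *buf is
   left untouched and still has to be freed by the caller. */
int resize_int_buffer(int **buf, size_t new_count)
{
    int *tmp = realloc(*buf, new_count * sizeof **buf);

    if (tmp == NULL)
        return 0;       /* old block is still valid */
    *buf = tmp;         /* old pointer must not be used any more */
    return 1;
}
```

Writing buf = realloc(buf, n) directly would lose the only pointer to the old block when realloc fails, leaking it.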

Finding element size of arrays in functions

sizeof( a ) / sizeof( a[0] )

NOTE:

Parameters declared like arrays are adjusted to pointers to the type of the array element.

void f( int a[10] )
{
    size_t n = sizeof( a ) / sizeof( a[0] );
    //...
}

is the same as

void f( int *a );

and within the function, in the expression

size_t n = sizeof( a ) / sizeof( a[0] );

the parameter a is just a pointer, so the result is sizeof( int * ) / sizeof( int ), not the number of elements.

Pointers do not keep an information about whether they point to a single object or the first object of some array.

In this case you should declare the function with second parameter that specifies the number of elements in the array.
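
A sketch of that approach: the element count is computed where the array is still an array and passed along explicitly. ARRAY_LEN and largest are hypothetical names.

```c
#include <stddef.h>

/* Valid only where a is a real array, not a pointer parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns the largest of the n elements; assumes n >= 1. */
int largest(const int *a, size_t n)
{
    int best = a[0];

    for (size_t i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}
```

A caller that owns an array int data[] = {3, 9, 4}; would write largest(data, ARRAY_LEN(data)).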

Difference between char str[10] = "string" and char *str = "string"

char str[10] = "string"

char *str = "string"

What is the difference between these initializations?

char a[] = "string literal";
        char *p  = "string literal";
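
In short, the array form creates a modifiable array initialized with a copy of the characters, while the pointer form creates a pointer to the string literal itself, which must not be modified. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>

/* sizeof an array counts every character plus the terminating '\0'. */
size_t array_size(void)
{
    char a[] = "string literal";
    return sizeof a;               /* 14 characters + '\0' */
}

/* sizeof a pointer is the size of the pointer, not of the text. */
size_t pointer_size(void)
{
    const char *p = "string literal";
    return sizeof p;
}

/* The array holds its own copy, so writing to it is fine. Writing
   through a pointer to the literal would be undefined behaviour. */
char modify_array_copy(void)
{
    char a[] = "string literal";
    a[0] = 'S';
    return a[0];
}

/* With an explicit size, the unused tail is filled with zeros. */
int tail_is_zeroed(void)
{
    char str[10] = "string";
    return sizeof str == 10 && str[6] == '\0' && str[9] == '\0';
}
```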
        

Alignment

Every complete object type has a property called alignment requirement, which is an integer value of type size_t representing the number of bytes between successive addresses at which objects of this type can be allocated. The valid alignment values are non-negative integral powers of two.

The alignment requirement of a type can be queried with _Alignof. (since C11)

In order to satisfy alignment requirements of all members of a struct, padding may be inserted after some of its members.

#include <stdio.h>
#include <stdalign.h>

// objects of struct S can be allocated at any address
// because both S.a and S.b can be allocated at any address
struct S {
    char a; // size: 1, alignment: 1
    char b; // size: 1, alignment: 1
}; // size: 2, alignment: 1

// objects of struct X must be allocated at 4-byte boundaries
// because X.n must be allocated at 4-byte boundaries
// because int's alignment requirement is (usually) 4
struct X {
    int n;  // size: 4, alignment: 4
    char c; // size: 1, alignment: 1
    // three bytes padding
}; // size: 8, alignment: 4

int main(void)
{
    printf("sizeof(struct S) = %zu\n", sizeof(struct S));
    printf("alignof(struct S) = %zu\n", alignof(struct S));
    printf("sizeof(struct X) = %zu\n", sizeof(struct X));
    printf("alignof(struct X) = %zu\n", alignof(struct X));
}


Difference between sizeof and alignof

REF: sizeof_alignof.c

#include <stdio.h>
#include <stdalign.h>
#include <stdint.h>

typedef struct good_struct {
    uint64_t eight_byte;
    uint16_t two_byte;
    uint8_t one_byte;
} GOOD_STRUCT;

typedef struct bad_struct {
    uint8_t one_byte;
    uint32_t four_byte;
    uint16_t two_byte;
    GOOD_STRUCT gd_struct;
} BAD_STRUCT;

int main(void) {
    /* sizeof and alignof yield size_t, which is printed with %zu */
    printf("GOOD: sizeof -> %zu   alignof -> %zu\n", sizeof(GOOD_STRUCT), alignof(GOOD_STRUCT));
    printf("BAD : sizeof -> %zu   alignof -> %zu", sizeof(BAD_STRUCT), alignof(BAD_STRUCT));
    return 0;
}


Output:

GOOD: sizeof -> 16   alignof -> 8
BAD : sizeof -> 32   alignof -> 8


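Here sizeof of BAD is 32 because GOOD is embedded inside BAD, so the stricter alignment of GOOD (8) applies to BAD as well: the first three members occupy 10 bytes including padding, 6 more bytes of padding bring gd_struct to an 8-byte boundary at offset 16, and gd_struct itself takes 16 bytes.

An alignment of 4 means that data of this type should (or must, depending on the CPU) be stored starting at an address that is a multiple of 4.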
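
Padding can be inspected with offsetof, and it can often be reduced by ordering members from largest to smallest. The structs and function names below are hypothetical, and the sizes in the comments are what common ABIs produce, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

struct badly_ordered {
    uint8_t  a;   /* offset 0, usually followed by 3 bytes of padding */
    uint32_t b;   /* usually offset 4 */
    uint16_t c;   /* usually offset 8, then 2 bytes of tail padding */
};                /* usually 12 bytes */

struct well_ordered {
    uint32_t b;   /* offset 0 */
    uint16_t c;   /* usually offset 4 */
    uint8_t  a;   /* usually offset 6, then 1 byte of tail padding */
};                /* usually 8 bytes */

size_t size_badly_ordered(void) { return sizeof(struct badly_ordered); }
size_t size_well_ordered(void)  { return sizeof(struct well_ordered); }

/* The offset of b is always a multiple of its alignment requirement. */
size_t offset_of_b(void) { return offsetof(struct badly_ordered, b); }
```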

lvalue and rvalue

An lvalue (locator value) represents an object that occupies some identifiable location in memory (i.e. has an address).

rvalues are defined by exclusion. Every expression is either an lvalue or an rvalue, so, an rvalue is an expression that does not represent an object occupying some identifiable location in memory.

For example, an assignment expects an lvalue as its left operand, so the following is valid:

int i = 10;

But this is not:

int i;
10 = i;

This is because i has an address in memory and is an lvalue, while 10 doesn't have an identifiable memory location and hence is an rvalue. So assigning the value of i to 10 doesn't make any sense.

Unlike in C++, the prefix increment and decrement operators do not yield an lvalue in C.

Type groups

Compiling multiple C files and linking them

If you have two source files, you can compile them into object files without linking, like so:

gcc main.c -o main.o -c
gcc module.c -o module.o -c

The -c flag tells the compiler to stop after the compilation phase, without linking. Then you can link your two object files like so:

gcc -o myprog main.o module.o


Undefined behaviour (not exactly undefined) while using scanf()

Check the return value of scanf to find out whether the input buffer is polluted with unconsumed characters.

On success, the function returns the number of items of the argument list successfully filled. This count can match the expected number of items or be less (even zero) due to a matching failure, a reading error, or the reach of the end-of-file.

If a reading error happens or the end-of-file is reached while reading, the proper indicator is set (feof or ferror). And, if either happens before any data could be successfully read, EOF is returned.

If an encoding error happens interpreting wide characters, the function sets errno to EILSEQ.
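The return-value convention can be observed deterministically with sscanf, which follows the same rules as scanf but reads from a string. The wrapper name is hypothetical.

```c
#include <stdio.h>

/* Tries to read two ints from text.
   Returns the number of items converted (0, 1 or 2), or EOF if the
   input ends before the first conversion. */
int parse_two_ints(const char *text, int *first, int *second)
{
    return sscanf(text, "%d %d", first, second);
}
```

"10 20" yields 2, "10 abc" yields 1, "abc" yields 0 (a matching failure), and an empty string yields EOF. Comparing the result against the expected item count is therefore more reliable than testing it for non-zero, because EOF is non-zero too.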

Example:

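#include <stdio.h>

/* Discard the rest of the current input line and prompt again
   until scanf converts one integer or input ends. */
void scanf_err_handle_flush_n_get_ip(int *ip_buff) {

    int c;
    do {
        while((c = getchar()) != '\n' && c != EOF)
            /* stdin ip discard */ ;
        if(c == EOF)
            return;   /* no more input; *ip_buff is left unchanged */

        printf("\nEnter option: ");
    } while(scanf("%d", ip_buff) != 1);
}

/* In the caller: compare against 1, because scanf returns EOF (not 0)
   when input ends before any conversion. */
int usr_option = 0;
printf("Enter option: ");
if(scanf("%d", &usr_option) != 1)
    scanf_err_handle_flush_n_get_ip(&usr_option);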

Difference between pointers and reference

A pointer is a variable that holds a memory address. A reference (a C++ feature; C has no references) is an alias for an existing object and has the same memory address as the item it references.

Example:


        // C++ program to swap two numbers using 
        // pass by reference. 
        
        #include <iostream> 
        using namespace std; 
        void swap(int& x, int& y) 
        { 
            int z = x; 
            x = y; 
            y = z; 
        } 
        
        int main() 
        { 
            int a = 45, b = 35; 
            cout << "Before Swap\n"; 
            cout << "a = " << a << " b = " << b << "\n"; 
        
            swap(a, b); 
        
            cout << "After Swap with pass by reference\n"; 
            cout << "a = " << a << " b = " << b << "\n"; 
        }
        

A pointer to a class/struct uses the '->' (arrow) operator to access its members, whereas a reference uses the '.' (dot) operator. A pointer needs to be dereferenced with * to access the memory location it points to, whereas a reference can be used directly.

Turn on different warnings while compiling C files:

Example:

gcc main.c -o main.o -c -pedantic -Wall -Wextra -Wconversion

-Wpedantic
        -pedantic
        

Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used.

-Wall
        

This enables all the warnings about constructions that some users consider questionable, and that are easy to avoid (or modify to prevent the warning), even in conjunction with macros. This also enables some language-specific warnings described in C++ Dialect Options and Objective-C and Objective-C++ Dialect Options.

-Wall turns on the following warning flags:
-Waddress
-Warray-bounds=1 (only with -O2)
-Wbool-compare
-Wbool-operation
-Wc++11-compat
-Wc++14-compat
-Wcatch-value (C++ and Objective-C++ only)
-Wchar-subscripts
-Wcomment
-Wduplicate-decl-specifier (C and Objective-C only)
-Wenum-compare (in C/ObjC; this is on by default in C++)
-Wenum-conversion (in C/ObjC)
-Wformat
-Wint-in-bool-context
-Wimplicit (C and Objective-C only)
-Wimplicit-int (C and Objective-C only)
-Wimplicit-function-declaration (C and Objective-C only)
-Winit-self (only for C++)
-Wzero-length-bounds
-Wlogical-not-parentheses
-Wmain (only for C/ObjC and unless -ffreestanding)
-Wmaybe-uninitialized
-Wmemset-elt-size
-Wmemset-transposed-args
-Wmisleading-indentation (only for C/C++)
-Wmissing-attributes
-Wmissing-braces (only for C/ObjC)
-Wmultistatement-macros
-Wnarrowing (only for C++)
-Wnonnull
-Wnonnull-compare
-Wopenmp-simd
-Wparentheses
-Wpessimizing-move (only for C++)
-Wpointer-sign
-Wreorder
-Wrestrict
-Wreturn-type
-Wsequence-point
-Wsign-compare (only in C++)
-Wsizeof-pointer-div
-Wsizeof-pointer-memaccess
-Wstrict-aliasing
-Wstrict-overflow=1
-Wswitch
-Wtautological-compare
-Wtrigraphs
-Wuninitialized
-Wunknown-pragmas
-Wunused-function
-Wunused-label
-Wunused-value
-Wunused-variable
-Wvolatile-register-var

-Wextra
        

This enables some extra warning flags that are not enabled by -Wall. (This option used to be called -W. The older name is still supported, but the newer name is more descriptive.)

The option -Wextra also prints warning messages for the following cases:

A pointer is compared against integer zero with <, <=, >, or >=.
(C++ only) An enumerator and a non-enumerator both appear in a conditional expression.
(C++ only) Ambiguous virtual bases.
(C++ only) Subscripting an array that has been declared register.
(C++ only) Taking the address of a variable that has been declared register.
(C++ only) A base class is not initialized in the copy constructor of a derived class.

-Wconversion
        

Warn for implicit conversions that may alter a value. This includes conversions between real and integer, like abs (x) when x is double; conversions between signed and unsigned, like unsigned ui = -1; and conversions to smaller types, like sqrtf (M_PI). Do not warn for explicit casts like abs ((int) x) and ui = (unsigned) -1, or if the value is not changed by the conversion like in abs (2.0). Warnings about conversions between signed and unsigned integers can be disabled by using -Wno-sign-conversion.

For C++, also warn for confusing overload resolution for user-defined conversions; and conversions that never use a type conversion operator: conversions to void, the same type, a base class or a reference to them. Warnings about conversions between signed and unsigned integers are disabled by default in C++ unless -Wsign-conversion is explicitly enabled.

Warnings about conversion from arithmetic on a small type back to that type are only given with -Warith-conversion.

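Why are (arr + 1) and (&arr + 1) different even though arr and &arr point to the same location?

They have different types. arr has the type int [size], which decays to int * in most expressions, whereas &arr has the type int (*)[size].

So, &arr points to the entire array, whereas arr (after decay) points to the first element of the array. Adding 1 therefore advances arr by one element, but advances &arr by the size of the whole array.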

The ``Clockwise/Spiral Rule''

By David Anderson

There is a technique known as the ``Clockwise/Spiral Rule'' which enables any C programmer to parse in their head any C declaration!

There are three simple steps to follow:

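Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:

[X] or [] => Array X size of... or Array undefined size of...
(type1, type2) => function passing type1 and type2 returning...
* => pointer(s) to...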

Keep doing this in a spiral/clockwise direction until all tokens have been covered.

Always resolve anything in parentheses first!

Examples:
                              +-----------------------------+
                              |                  +---+      |
                              |  +---+           |+-+|      |
                              |  ^   |           |^ ||      |
                        void (*signal(int, void (*fp)(int)))(int);
                         ^    ^      |      ^    ^  ||      |
                         |    +------+      |    +--+|      |
                         |                  +--------+      |
                         +----------------------------------+
        
Question we ask ourselves: What is `signal'?
        
        Notice that signal is inside parenthesis, so we must resolve this first!
        
        Moving in a clockwise direction we see `(' so we have...
        ``signal is a function passing an int and a...
        
        Hmmm, we can use this same rule on `fp', so... What is fp? fp is also inside parenthesis so continuing we see an `*', so...
        fp is a pointer to...
        
        Continue in a spiral clockwise direction and we get to `(', so...
        ``fp is a pointer to a function passing int returning...''
        
        Now we continue out of the function parenthesis and we see void, so...
        ``fp is a pointer to a function passing int returning nothing (void)''
        
        We have finished with fp so let's catch up with `signal', we now have...
        ``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning...
        
        We are still inside parenthesis so the next character seen is a `*', so...
        ``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to...
        
        We have now resolved the items within parenthesis, so continuing clockwise, we then see another `(', so...
        ``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning...
        
        Finally we continue and the only thing left is the word `void', so the final complete definition for signal is:
        ``signal is a function passing an int and a pointer to a function passing an int returning nothing (void) returning a pointer to a function passing an int returning nothing (void)''
        

Can enum member be the size of an array in ANSI-C?

Yes. An enumeration constant is an integer constant expression, so it can be used as an array size.

C const actually means read only.

The const qualifier really means ``read-only''; an object so qualified is a run-time object which cannot (normally) be assigned to. The value of a const-qualified object is therefore not a constant expression in the full sense of the term, and cannot be used for array dimensions, case labels, and the like. (C is unlike C++ in this regard.) When you need a true compile-time constant, use a preprocessor #define (or perhaps an enum).
        
        References: ISO Sec. 6.4 H&S Secs. 7.11.2,7.11.3 pp. 226-7
        

Can a const variable be used to declare the size of an array in C?

In C, const is a misnomer for read-only. const variables can change their value, e.g. it is perfectly okay to declare

const volatile int timer_tick_register; /* A CPU register. */
        

which you can read and get a different value with each read, but not write to. The language specification therefore does not treat const-qualified objects as constant expressions suitable for array sizes.

Put simply: the compiler must know the dimension of the array at compile time, and since a const variable can be initialized at run time, it cannot serve that purpose. The size of a statically declared array must be a constant expression, and a const variable is not one. For a constant expression, use either a macro (#define) or an enum.

When and for what purposes should the const keyword be used in C for variables?

int find(const int *data, size_t size, int value);
        
const double PI = 3.14;
        
// don't add const to 'value' or 'size'
        int find(const int *data, size_t size, const int value);
        
const volatile int32_t *DEVICE_STATUS =  (int32_t*) 0x100;
        

Is extern needed for functions in header file

Functions declared in header files do not need to be declared extern. They are implicitly declared with "extern".

Difference between defining a variable and declaring a variable

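It is important to understand the difference between defining a variable and declaring a variable:

A variable is declared when the compiler is informed that the variable exists (and what its type is); no storage is allocated at that point.
A variable is defined when the compiler allocates the storage for the variable.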

You may declare a variable multiple times (though once is sufficient); you may only define it once within a given scope. A variable definition is also a declaration, but not all variable declarations are definitions.

Best way to declare and define global variables

The clean, reliable way to declare and define global variables is to use a header file to contain an extern declaration of the variable.

The header is included by the one source file that defines the variable and by all the source files that reference the variable. For each program, one source file (and only one source file) defines the variable. Similarly, one header file (and only one header file) should declare the variable. The header file is crucial; it enables cross-checking between independent TUs (translation units — think source files) and ensures consistency.

What is external linkage and internal linkage?

When you write an implementation file (.c, .cpp, .cxx, etc.), your compiler generates a translation unit. This is the source file from your implementation plus all the headers you #included in it.

Internal linkage refers to everything only in scope of a translation unit.

External linkage refers to things that exist beyond a particular translation unit. In other words, accessible through the whole program, which is the combination of all translation units (or object files).

You can explicitly control the linkage of a symbol by using the extern and static keywords. If the linkage is not specified, C gives file-scope functions and variables external linkage by default, const or not. C++ differs: there, the default is external for non-const symbols and internal (static) for const symbols.

What is the difference between scope and linkage?

"scope" is a namespace of the compiler; "linkage" is about compiled units.

To explain a bit more: a variable declared in a function has the scope of that function, i.e. it is visible only within that function. A variable declared as static in a source file can be seen only by the code in that source file (and all included files). Variables can also have global scope: they can be referred to in one source file while being defined (allocated) in another source file.

Instead of "source file" we should say "compilation unit", as it is the C source file being compiled plus all included files. Scope refers to everything the compiler can "see" in a compilation unit. These are namespaces.

After compilation of a project there are a number of object files, one for each compilation unit. Each may refer to variables that are not defined in that compilation unit. The linker must now resolve these references between object files: linkage.

This also holds for functions.

How can I access structure fields by name at run time?

A: Keep track of the field offsets as computed using the offsetof() macro (see question 2.14). If structp is a pointer to an instance of the structure, and field f is an int having offset offsetf, f's value can be set indirectly with

*(int *)((char *)structp + offsetf) = value;

Bit Fields in C

In C, we can specify the size (in bits) of structure and union members. The idea is to use memory efficiently when we know that the value of a field or group of fields will never exceed a limit or is within a small range.

#include <stdio.h> 
        
        // Space optimized representation of the date 
        struct date { 
            // d has value between 1 and 31, so 5 bits 
            // are sufficient 
            unsigned int d : 5; 
        
            // m has value between 1 and 12, so 4 bits 
            // are sufficient 
            unsigned int m : 4; 
        
            unsigned int y; 
        }; 
        
        int main() 
        { 
            printf("Size of date is %zu bytes\n", sizeof(struct date));
            struct date dt = { 31, 12, 2014 };
            printf("Date is %d/%d/%u", dt.d, dt.m, dt.y);
            return 0; 
        }
        

Facts about bit fields in C.

Is there a quick way to determine endianness of your machine?

#include <stdio.h> 
        int main()  
        { 
           unsigned int i = 1; 
           char *c = (char*)&i; 
           if (*c)     
               printf("Little endian"); 
           else
               printf("Big endian"); 
           getchar(); 
           return 0; 
        }
        

The behavior of code which contains multiple, ambiguous side effects has always been undefined. (Loosely speaking, by ``multiple, ambiguous side effects'' we mean any combination of increment, decrement, and assignment operators (++, --, =, +=, -=, etc.) in a single expression which causes the same object either to be modified twice or modified and then inspected.)

Example:

int i = 3;
        i = i++;
        

This code has been tried on several compilers: some gave i the value 3, and some gave 4. Neither result is wrong, because the behavior is undefined.

short-circuiting behavior in C

while((c = getchar()) != EOF && c != '\n')

With the && and || operators, the right-hand side is not evaluated if the left-hand side determines the outcome (i.e. is true for || or false for &&). Therefore, left-to-right evaluation is guaranteed, as it also is for the comma operator. Furthermore, all of these operators (along with ?:) introduce an extra internal sequence point.

Note:

Why did printf("%d %d", f1(), f2()); call f2 first? I thought the comma operator guaranteed left-to-right evaluation.

The comma operator does guarantee left-to-right evaluation, but the commas separating the arguments in a function call are not comma operators. The order of evaluation of the arguments to a function call is unspecified.

sequence point

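A sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete. The sequence points listed in the C standard are:

at the end of the evaluation of a full expression (a full expression is an expression statement, or any other expression which is not a subexpression within any larger expression);
at the ||, &&, ?:, and comma operators; and
at a function call (after the evaluation of all the arguments, and just before the actual call).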

How can I avoid these undefined evaluation order difficulties if I don't feel like learning the complicated rules?

A: The easiest answer is that if you steer clear of expressions which don't have reasonably obvious interpretations, for the most part you'll steer clear of the undefined ones, too. (Of course, ``reasonably obvious'' means different things to different people. This answer works as long as you agree that a[i] = i++ and i = i++ are not ``reasonably obvious.'')

To be a bit more precise, here are some simpler rules which, though slightly more conservative than the ones in the Standard, will help to make sure that your code is ``reasonably obvious'' and equally understandable to both the compiler and your fellow programmers:

Why doesn't this code int a = 1000, b = 1000; long int c = a * b; work?

Under C's integral promotion rules, the multiplication is carried out using int arithmetic, and the result may overflow or be truncated before being promoted and assigned to the long int left-hand side. Use an explicit cast on at least one of the operands to force long arithmetic:

long int c = (long int)a * b;

or perhaps

long int c = (long int)a * (long int)b;
        

(both forms are equivalent).

Notice that the expression (long int)(a * b) would not have the desired effect. An explicit cast of this form (i.e. applied to the result of the multiplication) is equivalent to the implicit conversion which would occur anyway when the value is assigned to the long int left-hand side, and like the implicit conversion, it happens too late, after the damage has been done.

If I have a struct in C / C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?

If you have the opportunity to design the struct yourself, it should be possible. The basic idea is to design it so that there is no need for the compiler to insert pad bytes into it. The second trick is that you must handle differences in endianness.

I'll describe how to construct the struct using scalars, but you should be able to use nested structs, as long as you apply the same design to each included struct.

First, a basic fact in C and C++ is that the alignment of a type cannot exceed the size of the type. If it did, it would not be possible to allocate an array of N elements using malloc(N*sizeof(the_type)).

Lay out the struct, starting with the largest types.

 struct
         {
           uint64_t alpha;
           uint32_t beta;
           uint32_t gamma;
           uint8_t  delta;
        

Next, pad out the struct manually, so that in the end you will match up the largest type:

   uint8_t  pad8[3];    // Match uint32_t
           uint32_t pad32;      // Even number of uint32_t
         }
        

The next step is to decide whether the struct should be stored in little-endian or big-endian format. The best way is to "swap" all the elements in situ before writing or after reading the struct, if the storage format does not match the endianness of the host system.

TO DO: