
An interesting discussion has arisen in the comments to a recent question. Although the language there is C, the discussion has drifted to what the C++ Standard says constitutes undefined behaviour when accessing the elements of a multidimensional array using a function like std::memcpy.

First, here's the code from that question, converted to C++ and using const wherever possible:

#include <iostream>
#include <cstring>

void print(const int arr[][3], int n)
{
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < n; ++c) {
            std::cout << arr[r][c] << " ";
        }
        std::cout << std::endl;
    }
}

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];
    print(arr, 3);
    std::memcpy(arr_copy, arr, sizeof arr);
    print(arr_copy, 3);
    return 0;
}

The issue is in the call to std::memcpy: the arr argument will yield (by decay) a pointer to the first int[3] subarray, so, according to one side of the discussion (led by Ted Lyngmo), when the memcpy function accesses data beyond the third element of that subarray, there is formally undefined behaviour (and the same would apply to the destination, arr_copy).

However, the other side of the debate (to which mediocrevegetable1 and I subscribe) argues that each of the 2D arrays will, by definition, occupy contiguous memory and that, since the arguments to memcpy are just void* pointers to those locations (and the third, size, argument is valid), there cannot be UB here.
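
To make the contiguity claim concrete, here is a minimal sketch of that rationale (my addition, using only the arr and arr_copy declarations from the code above):

#include <cstring>

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];

    // An int[3][3] is, by definition, three contiguous int[3] objects, so no
    // padding can appear between the rows:
    static_assert(sizeof arr == 9 * sizeof(int), "2D array is one contiguous block");

    // arr (after decay), &arr and &arr[0][0] all refer to the same address, so
    // memcpy receives the same void* whichever form is written:
    std::memcpy(arr_copy, arr, sizeof arr);    // the form under debate
    std::memcpy(&arr_copy, &arr, sizeof arr);  // the form nobody disputes

    return 0;
}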

Here's a summary of some of the comments most pertinent to the debate, in case any "clean-up" occurs on the original question (bolding for emphasis mine):

I don't think there's any out-of-bounds here. Just like memcpy works for an array of ints, it works for an array of int [3]s, and both should be contiguous (but I'm not 100% sure). – mediocrevegetable1

The out of bounds access happens when you copy the first byte from arr[0][3]. I've never seen it actually fail, but, in C++, it has UB. – Ted Lyngmo

But the memcpy function/call doesn't do any array indexing - it's just given two void* pointers and copies memory from one to the other. – Adrian Mole

I can't say for sure if that matters in C. In C++ it doesn't. You get a pointer to the first int[3] and any access out of its range has UB. I haven't found any exception to that in the C++ standard. – Ted Lyngmo

I don't think the arr[0][3] thing applies. By that logic, I think copying the second int of an int array through memcpy would be UB as well. int [3] is simply the type of arr's elements, and the bounds of arr as a whole in bytes should be sizeof (int [3]) * 3. I'm probably missing something though :/ – mediocrevegetable1

Are there any C++ Language-Lawyers who can settle the matter – preferably with (an) appropriate citation(s) from the C++ Standard?

Also, relevant citations from the C Standard may be helpful – especially if the two language Standards differ – so I've included the C tag in this question.

  • Somewhat related C question, which cites the C standard, but is not specific to memcpy: Cast T[][] to T*
  • I'm sure there are dups about memcpy beyond the object bounds. (C11 7.24.1 String function conventions p.1 «If an array [of characters comprising an object] is accessed beyond the end of an object, the behavior is undefined.»)
  • I would think that library functions, being part of the implementation, would be exempt from some of the rules regarding how to access objects. – dbush
  • Why is the fact that the arrays are 2D relevant here? The question is simpler: «If one wants to memcpy from an object denoted by s to an object denoted by d, they write memcpy(&d, &s, sizeof s). What gives one permission, in case the objects are of array type, to pass pointers to the first elements of the arrays, i.e. memcpy(d, s, sizeof s)?» (Or memcpy(d, &s, sizeof s), or memcpy(&d, s, sizeof s).)
  • Since I'm quoted, I'd just like to make my logic clear, as I feel I might not have been fully clear about what exactly I was saying in my original comments (it was 1am at the time :p). As an example, if you're copying one int foo[3] to another int bar[3] with memcpy(bar, foo, sizeof bar);, I think everyone agrees this is well-defined. Just like that, arr and arr_copy are both simply arrays of int [3]s, and I think they should not behave differently than any other array type. HolyBlackCat definitely explains it better.

Answer (score 15)

std::memcpy(arr_copy, arr, sizeof arr); (your example) is well-defined.

std::memcpy(arr_copy, arr[0], sizeof arr);, on the other hand, would cause undefined behavior.


Multidimensional arrays are 1D arrays of arrays. As far as I know, they don't get much (if any) special treatment compared to true 1D arrays (i.e. arrays with elements of non-array type).

Consider an example with a 1D array:

int a[3] = {1,2,3}, b[3];
std::memcpy(b, a, sizeof(int) * 3);

This is obviously well-defined, so I'm not going to cite the standard.

Notice that memcpy receives a pointer to the first element of the array, and can access other elements.

The element type doesn't affect the validity of this example. If you use a 2D array, the element type becomes int[N] rather than int, but the validity is not affected.
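
For instance (my own illustration, not part of the original answer), the 2D analogue of the example above has exactly the same shape; only the element type changes from int to int[2]:

int a[2][2] = {{1,2},{3,4}}, b[2][2];
std::memcpy(b, a, sizeof(int[2]) * 2); // pointer to the first int[2], size covers both elements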

Now, consider a different example:

int a[2][2] = {{1,2},{3,4}}, b[4];
std::memcpy(b, a[0], sizeof(int) * 4);
//             ^~~~

This one causes UB: since memcpy is given a pointer to the first element of a[0], it can only access the elements of a[0] (a[0][i]), and not a[j][i] for j > 0.

But, if you want my opinion, this is a "tame" kind of UB, likely to not cause problems in practice (but, as always, UB should be avoided if possible).

  • std::memcpy(arr_copy, arr, sizeof arr); (your example) is well-defined. Except that it is not.
  • @LanguageLawyer So, the debate continues. Care to offer an answer?
  • @LanguageLawyer strcpy is bound by the same limitations, and it always receives const char * as an input parameter. By your logic, giving it a non-empty string would always cause UB, correct?
  • @LanguageLawyer We both know that your second example is ok by common sense. Even if it turned out to be technically UB, this would be an obvious defect in the standard. Trying to avoid this supposed "UB" makes no sense.
  • This is obviously well-defined, so I'm not going to cite the standard. The most sound answer I've seen to a language-lawyer tagged question.

Answer (score 4)

Is copying 2D arrays with "memcpy" technically undefined behaviour?

(n.b., this only covers C, per the draft C11 standard at https://port70.net/~nsz/c/c11/n1570.html)

No, it is not.

TLDR Summary:

6.7.6.3 Function declarators (including prototypes), paragraph 7 defines decay of arrays to pointers in function calls. BUT that decay is done under the auspices of 6.9.1 Function definitions, paragraph 7, which states "... in either case, the type of each parameter is adjusted as described in 6.7.6.3 for a parameter type list; the resulting type shall be a complete object type."

That directly refutes the notion that the pointer resulting from array decay, when an array is passed as a function argument, does not refer to the entire array.

Detailed Answer

First, arrays are "complete objects".

Why arrays must be "complete objects"

(If someone can find a statement in the standard[s] defining arrays as "complete objects" this entire section of this answer is redundant.)

While not explicitly defined as such in the (draft) C11 standard (at least not anywhere that I have been able to find), arrays are implicitly "complete objects" in multiple statements, such as statements where arrays are explicitly removed from the "complete object" category:

6.5.2.2 Function calls, paragraph 1:

The expression that denotes the called function shall have type pointer to function returning void or returning a complete object type other than an array type.

6.7.2.1 Structure and union specifiers does not explicitly allow array members of structures and unions, other than the "flexible array members" described in paragraph 18:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. ...

The only paragraph of 6.7.2.1 Structure and union specifiers that allows array members at all is paragraph 9:

A member of a structure or union may have any complete object type other than a variably modified type.

That is the only statement in the (draft) C11 standard that allows for the inclusion of arrays in structures and unions.

Array initialization is covered by 6.7.9 Initialization, paragraph 3:

The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type.

That only covers arrays of fixed, known size via the category "complete object".

Function return values have arrays explicitly removed from the "complete object" category by 6.9.1 Function definitions, paragraph 3:

The return type of a function shall be void or a complete object type other than array type.

So, we have established that arrays are "complete objects".

Parameters to functions are "complete object types"

Per 6.9.1 Function definitions, Semantics, paragraph 7:

the type of each parameter is adjusted as described in 6.7.6.3 for a parameter type list; the resulting type shall be a complete object type.

Why "complete object" is important

6.5.2.1 Array subscripting, paragraph 1 states:

One of the expressions shall have type ''pointer to complete object type'', the other expression shall have integer type, and the result has type ''type''.

And per 6.9.1p7, the array was passed as a "complete object type", which means the pointer can be dereferenced to access the entire array.

Q.E.D.

  • @AdrianMole Thanks. But as I just added, this only covers C, per the C11 draft standard.
  • Drive-by DV, care to explain your reasoning?
  • NMDV, and I have yet to form an opinion on most of the content of this answer, but its focus on the term "complete object" seems to indicate a mis-parse. "Complete object type" should be read as "complete (object type)", and it describes a type that is both an object type (as opposed to a function type) and a complete type (as opposed, for example, to an array type with one dimension unspecified).
  • Now having read this answer more completely, I'm not buying it at all. Although I agree that the behavior of copying an array via a function call of the form memcpy(dest_array, src_array, sizeof(src_array)); is well defined for arrays with any element type, including other array types, I don't accept that this argument establishes that.
  • Again, I agree with you that the behavior in question is well defined. But no, I'm still not buying your argument for that. I think we agree that the question is whether the pointer must be interpreted as pointing to only one object for memcpy()'s purposes, and I think the fact that memcpy() receives the result of its conversion to void *, which carries no information about the pointed-to type, is probably a better handle on that.

Answer (score 4)

It's well-defined, even if you use memcpy(arr_cpy, arr, size) rather than
memcpy(&arr_cpy, &arr, size) (which @LanguageLawyer has finally explained is what they've been arguing for the whole time), for reasons explained by @HolyBlackCat and others.

Experimental evidence that compiler developers interpret the standard as letting memcpy read the whole outer object (the array-of-array-of-int) pointed to by the void* arg, even when that void* was obtained as a pointer to the first element (i.e. to the first array-of-int):

If you pass a size that's too large, you do get a warning, and for GCC the warning even spells out exactly what object and what size it sees being memcpyed:

#include <cstring>

int dst[2][2];
void foo(){
    int arr[2][2] = {{1,1},{1,1}};
    std::memcpy(dst, arr, sizeof(arr));  // compiles cleanly
}

void size_too_large(){
    int arr[2][2] = {{1,1},{1,1}};
    std::memcpy(dst, arr, sizeof(arr)+4);
}

Using &dst, &src makes no difference here to warnings or lack thereof.
Godbolt compiler explorer for GCC and clang -O2 -Wall -Wextra -pedantic -fsanitize=undefined, and MSVC -Wall.

GCC's warning for size_too_large() is:

warning: 'void* memcpy(void*, const void*, size_t)' forming offset [16, 19] is  \
  out of the bounds [0, 16] of object 'dst' with type 'int [2][2]' [-Warray-bounds]
   11 |     std::memcpy(dst, arr, sizeof(arr)+4);
      |     ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
<source>:3:5: note: 'dst' declared here
    3 | int dst[2][2];

clang's warning doesn't spell out the object type, but does still show the sizes:

<source>:11:5: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 20 [-Wfortify-source]
    std::memcpy(dst, arr, sizeof(arr)+4);
    ^

So it's clearly safe in practice with real compilers, a fact which we already knew. Both see the destination arg as being the whole 16-byte int [2][2] object.

The only question is whether there's any merit to the argument that the standard's wording is defective in how it identifies which object the phrase "beyond the end of an object" refers to: whether access is limited to the single pointed-to element after array-to-pointer "decay" when passing an arg to memcpy. (And yes, that would be a defect in the standard; it's widely assumed that you don't need, and shouldn't use, &arr with an array type for memcpy, or basically ever, AFAIK.)

To me, that sounds like a misinterpretation of the standard, but I may be biased because I of course want to read it as saying what we all know is true in practice. I still think that having it be well-defined is a valid interpretation of the wording in the standard, but the other interpretation may also be valid. (i.e. it could be ambiguous whether it's UB or not, which would be a defect.)

A void* pointing to the first element of an array can be cast back to an int (*)[2] to access the whole array object. That isn't how memcpy uses it, but it shows that the pointer hasn't lost its status as a pointer to the whole N-dimensional array. I think the authors of the standard are assuming this reasoning, that this void* can be considered a pointer to the whole object, not just the first element.
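
As a sketch of that cast-back idea (my illustration; the variable names are hypothetical):

#include <cstring>

int main()
{
    int arr[2][2] = {{1, 2}, {3, 4}};

    void *p = arr;   // decayed: points at arr[0], the first int[2]

    // Casting back recovers a pointer through which the whole 2D object is
    // reachable: arr[0] really is the first element of an array of two int[2]s,
    // so stepping to rows[1] is ordinary, valid pointer arithmetic.
    int (*rows)[2] = static_cast<int (*)[2]>(p);
    int last = rows[1][1];                 // arr[1][1], i.e. 4 (well-defined)

    int copy[2][2];
    std::memcpy(copy, p, sizeof arr);      // same void*, covers the whole array object
    return last + copy[0][0];              // just to use the results
}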

  • The C++ standard is essentially silent about memcpy and refers to the C standard; the C standard appears to assume that a bytewise copy doesn't need much of an explanation, a notion I agree with, and is otherwise unconcerned about the ways we arrived at the void pointers, as it should: They are void pointers. That a compiler warns about out-of-bounds access for known objects is almost irrelevant here: Pass the addresses through a function in a different TU and the information about the source object is not accessible any longer anyway. The only question is whether the memory is accessible.
  • @Peter-ReinstateMonica: Some who are arguing that this memcpy could be UB are basing the argument on a reading of the standard which only defines the behaviour for memcpy to access the object it gets passed a pointer to. So the question becomes, which object is that; the whole 2D array, or the first 1D array element of it, when you have a true array, not just a pointer to one. (It seems from @LanguageLawyer's arguments that calling a foo(int (*)[2]) function which in turn calls memcpy should be fine since you already have a pointer, or maybe they think one should use &arr there, too?)
  • @Peter-ReinstateMonica: It's a well-known fact that passing args through non-inline functions will hide UB from the compiler, depriving it of the opportunity to capriciously break code that assumes any asm-level thinking like bytes are just bytes. Anyway, that's irrelevant; the key is the warning I got containing positive evidence that it does agree we passed a 16-byte object.

Answer (score 2)

The question is about C++; I can only answer for C. In C, this is well-defined behavior. I'll be quoting from a December 11, 2020 draft of the C2x standard, found at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf; all emphasis will be as in the original.

The question is whether we can apply a memcpy to an int[3][3]. An int[3][3] is an array of arrays, while memcpy works on bytes. So we will need to know what the standard says about the representation of arrays as bytes.

We start with arrays. Section 6.2.5, "Types", paragraph 22, defines array types:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.

An int[3][3] is therefore a contiguously allocated nonempty set of three int[3] objects. Each of those is a contiguously allocated nonempty set of three int objects.

Let's first ask about int objects. Everyone expects a memcpy of a single int to work. To see that the standard requires this, we look in section 6.2.6.1, "General", paragraph 2:

Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

So an int is a contiguous sequence of one or more bytes. Therefore our int[3][3] is a contiguous sequence of three contiguous sequences of three contiguous sequences of sizeof(int) bytes; the standard requires that it is 9 × sizeof(int) contiguous bytes.

The standard also puts requirements on how these bytes relate to the array indices. Section 6.5.2.1, "Array subscripting," paragraph 2, says:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

So arr[1] == *((arr)+(1)) is the second int[3], arr[1][2] == *((*((arr)+(1)))+(2)) is its third element, and this must be the sixth int from the start of arr (five ints past it). Paragraph 3 is explicit about this:

Successive subscript operators designate an element of a multidimensional array object. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ··· × k, then E (used as other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with dimensions j × ··· × k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue. It follows from this that arrays are stored in row-major order (last subscript varies fastest).
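
As a concrete check of that row-major layout, here is a small sketch (mine, written as C++ like the question's code, though the same holds in C):

#include <cassert>
#include <cstddef>

int main()
{
    int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };

    // Row-major order: arr[1][2] is the sixth int of the object, i.e. it
    // starts 5 * sizeof(int) bytes past the start of arr.
    assert(reinterpret_cast<char *>(&arr[1][2]) - reinterpret_cast<char *>(arr)
           == static_cast<std::ptrdiff_t>(5 * sizeof(int)));

    return 0;
}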

Despite this, you're still not allowed to access arr[0][4]. As Ted Lyngmo's answer notes, Appendix J.2 specifically says:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

But since memcpy is really about bytes, it's okay. Its source and destination aren't multidimensional arrays but void *. 7.24.2.1, "The memcpy function," explains:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

A "character" can have three meanings according to section 3.7. The relevant one seems to be "single-byte character" (3.7.1), and therefore memcpy copies n bytes. Hence memcpy(arr_copy, arr, sizeof(arr)) must copy arr to arr_copy correctly.
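
A quick sanity check of that conclusion (my sketch, again written as C++):

#include <cassert>
#include <cstring>

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];

    std::memcpy(arr_copy, arr, sizeof arr);

    // Byte-for-byte identical, and the individual elements came through.
    assert(std::memcmp(arr_copy, arr, sizeof arr) == 0);
    assert(arr_copy[2][1] == 8);
    return 0;
}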

Though come to think of it, memcpy doesn't say that it copies n contiguous bytes. I suppose it could copy the same byte n times. Or pick n random bytes. That would make debugging ... interesting.

Answer (score 1)

With all due respect, HolyBlackCat is utterly wrong, as a matter of first principles. My C17 standard draft says in 7.24.1: "For all functions in this subclause [containing memcpy], each character shall be interpreted as if it had the type unsigned char." The C standard doesn't really make any type considerations for these trivial functions: memcpy copies memory. As far as semantics are considered at all, that memory is treated as a sequence of unsigned characters. Therefore, the following first principle of C applies:

As long as there is an initialized object at an address you can access it through a char pointer.

Let's repeat it for emphasis and clarity:

Any initialized object can be accessed by a char pointer.

If you know that an object is at a specific address 0x42, for example because the hardware of your computer maps the x coordinate of your mouse there, you can convert that into a char pointer and read it. If the coordinate is a 16 bit value you can read the next byte too.

Nobody cares how you know that there is an integer: If there is one, you can read it.

Now that we have 3 arrays of 3 ints we know that 9 ints are placed consecutively in memory; that is a language requirement. The entire memory there is full of ints, and we can iterate manually over it through char pointers, or we can turf it to memcpy. Whether we use arr or arr[0] or obtain the address through a stack offset from some other variable or some other magic or simply make an educated guess is entirely irrelevant as long as the address is correct, and of that there is no doubt here.
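
Spelled out (my sketch, written as C++ to match the question's code; the same loop works in C with the casts adjusted), the manual walk and the memcpy call do the same job:

#include <cstring>

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int copy_a[3][3], copy_b[3][3];

    // Manual byte-wise copy through unsigned char pointers: every byte of the
    // nine consecutive ints is reachable this way.
    const unsigned char *src = reinterpret_cast<const unsigned char *>(arr);
    unsigned char *dst = reinterpret_cast<unsigned char *>(copy_a);
    for (std::size_t i = 0; i < sizeof arr; ++i)
        dst[i] = src[i];

    // Or hand the same job to memcpy.
    std::memcpy(copy_b, arr, sizeof arr);

    return copy_a[2][2] == copy_b[2][2] ? 0 : 1;
}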

  • Interesting - and a well-made case. But that's your C17 Draft Standard. What does the C++17 Standard have to say?
  • Remember that C (and C++) don't assume a flat memory model. Pointers don't have to be simple integers like 0x42. A hypothetical C++ on a segmented memory model (which doesn't extend the language with far pointers) might have a max object size of 64k, and arrange for single objects not to cross segment boundaries when accessed with a seg:off pointer derived in a valid way. (Because there can be multiple representations for a pointer to the same byte, but some of them would have an off component too close to wrapping around to iterate over the rest of the object).
  • See my answer on Does C have an equivalent of std::less from C++? for some discussion of that kind of thing. But the key point is that you only get into trouble when you try to get a pointer into one object by adding or subtracting relative to a pointer to a different object. So it's still valid to memcpy because any pointers it creates will be derived from a pointer to the start of the entire 2D array object, and it must be smaller than the implementation's max object size (else you already have UB).
  • So TL:DR: I agree with the idea of the reasoning in this answer, but it goes a bit too far in its argument by assuming a flat memory model. Of course everything is easy in that case. As it is for accesses within a single object, using pointers derived from the start of it. However, just because something would make sense in asm does not mean it's legal in C. e.g. signed integer wraparound is UB in C, despite (almost?) every machine having add instructions that wrap, e.g. MIPS addu. IDK about old 1's complement machines, if any of them didn't have easy wrapping.

Answer (score -3)

This answer is based on some wrong assumptions I made by reading something a long time ago. I'll leave the answer and the comments up to perhaps prevent other people from walking into the same mind-trap.


What is passed to the function decays into pointers to the first elements; that is, in this case, two int(*)[3]s.

C draft Annex J (informative) Portability issues J.2 Undefined behavior:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

memcpy(arr_copy, arr, sizeof arr); gets two int(*)[3]s and will access both out of range; hence, UB.

  • How is the reference to 6.5.6 relevant here? memcpy is not pointer arithmetic on pointers to arr or arr_copy elements. memcpy doesn't even need to be implemented in C, so you can't apply core language rules to its «body».
  • @LanguageLawyer The reference to 6.5.6 was in the paragraph I copied. What's relevant is that it gets pointers to int[3]s. – Ted Lyngmo
  • "What's relevant is that it gets pointers to int[3]s" The fact that pointers to the first elements of arrays are passed may be relevant, but since memcpy behavior is not specified as pointer arithmetic on int(*)[3], how do the rules about such arithmetic apply?
  • By your logic: if you have int foo[3]; and call memcpy(foo, .., sizeof foo);, memcpy() gets an int*, which is a pointer to a single integer. Is that undefined behavior too? – jpa
  • @PeterCordes I think I get it. My interpretation seems to have been way off. That's nice :-) – Ted Lyngmo

Answer (score -4)

C++ standard says ([cstring.syn]/1):

The contents and meaning of the header <cstring> are the same as the C standard library header <string.h>.

C11 7.24.2.1 The memcpy function says:

Synopsis

1

         #include <string.h>
         void *memcpy(void * restrict s1,
              const void * restrict s2,
              size_t n);

Description

2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1

Given this description, one may wonder what happens if n is greater than the size of the object pointed to by s1/s2. «Common sense» suggests that copying more than, say, sizeof(int) bytes from an int object should be meaningless.

And indeed, there is 7.24.1 String function conventions p.1 saying:

The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type. … Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.

Thus, when passing a pointer to the first element of an array, that first element is «the object» from memcpy p.2, and trying to copy more bytes than this object has is UB.

  • You probably want to explain how pointer decay works with what they point to. – Passer By
  • I do not see any basis for your implicit claim that an array decaying to a pointer to the first element when passed to a function means that pointer no longer refers to the entire array. Without that implicit assumption, this entire line of argument falls apart.
  • What is your argument here? Would (void*)arr get a valid pointer to the whole array-of-arrays object, so memcpy((void*)arr_copy, (void*)arr, sizeof(arr)) would be well-defined, but isn't without the casts? Or does pointer decay always mean you just have a pointer to a single element that you're not allowed to offset? If so, is *(char*) (p + 1) UB for a function that takes an int *p arg, even if the caller passes a pointer to (the first element of) an array? It seems like for this argument to be valid, it would by extension make basically everything involving arrays in C undefined.
  • In C, a pointer to a whole array is the same thing as a pointer to the first element. 6.3.2.1p3 doesn't mean you can no longer offset that pointer. Your reasoning would make it UB to use memcpy on int arr_1d[2], and I think everyone agrees that should be allowed, so as @HolyBlackCat argued, anything that would make that UB is an obvious defect in the standard, or a misinterpretation of what it says.
  • If you want to argue that the standard's wording implies that memcpy is unusable to copy more than one primitive type in any situation, because of the "an object" wording (and thus is defective), that's fine, but make that clear in your answer. The question as-asked assumes that memcpy is fine on 1D arrays of primitive types, so if you're contradicting that, too, you should really say so.
