Data Transmission with Delphi (Part 2: Arrays and Pointer Math)
In this blog post I continue with arrays and how they are used when communicating cross-platform via DLLs.
The purpose of this blog series is to understand how to transfer raw data at high speed with direct access using pointers to structures that can be supported by most programming platforms.
Arrays
Arrays are contiguous blocks of items of the same size. In Delphi, arrays can be declared explicitly in a few ways.
Fixed Length Arrays
If we know the size of the array of data we need to declare, or if the size never varies, then the array can be declared as follows:
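A minimal sketch of such declarations, assuming a ten-element array of Double named LStaticArray (the name used later in this post) and a hypothetical enumerated type TColor to show the second indexing form:

```delphi
program FixedArrayDecl;
{$APPTYPE CONSOLE}

type
  TColor = (Red, Green, Blue);             // hypothetical enumerated type

var
  LStaticArray: array[0..9] of Double;     // ten doubles, indexed 0..9
  LByColor: array[TColor] of Integer;      // indexed by an enumerated type

begin
  LStaticArray[0] := 1.5;
  LByColor[Green] := 42;
  Writeln(Length(LStaticArray));  // 10
  Writeln(Length(LByColor));      // 3
end.
```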
The notation within [] allows for a start and end index, or you can provide an enumerated type as an index. You should usually start the index range at 0, but if you cannot, then the examples below will need to be offset by the value of the start index.
Once the array is declared, a block of memory exists consisting of 10 doubles in a row. The address of a static array is the same as the address of its first element. This means @LStaticArray and @LStaticArray[0] return the same value.
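This can be checked with a short sketch. The casts to NativeUInt are only there to compare the two addresses as numbers:

```delphi
program StaticArrayAddress;
{$APPTYPE CONSOLE}

var
  LStaticArray: array[0..9] of Double;

begin
  // The array variable and its first element start at the same address.
  Writeln(NativeUInt(@LStaticArray) = NativeUInt(@LStaticArray[0]));  // TRUE
end.
```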
If we call SizeOf on the static array we get the total size of all the elements. The value of SizeOf is the same size as the structure in memory, because the distance from element to element (stride) is simply the size of the element type.
In the example above @LStaticArray[0] and @LStaticArray[1] are pointers to our first and second element. We can see that their addresses differ by the size of the element: SizeOf(Double). In production code many developers would rather use SizeOf(LStaticArray[0]) because the code will continue to work if the type of the array needs to change.
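A small sketch of both observations, again assuming the ten-element array of Double. LStride is a name introduced here for illustration:

```delphi
program StaticArrayStride;
{$APPTYPE CONSOLE}

var
  LStaticArray: array[0..9] of Double;
  LStride: NativeUInt;

begin
  // Total size: ten elements of 8 bytes each.
  Writeln(SizeOf(LStaticArray));  // 80

  // Distance between two neighbouring elements.
  LStride := NativeUInt(@LStaticArray[1]) - NativeUInt(@LStaticArray[0]);
  Writeln(LStride = NativeUInt(SizeOf(LStaticArray[0])));  // TRUE
end.
```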
If we assign one fixed array to another we copy by value. The array does not move around in memory but stays at its initial location. Assignment simply copies the entire memory content.
Note: We had to declare a type to allow assignment. If you’d like to understand why, read about the rules of assignment compatibility.
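To illustrate copy by value, here is a minimal sketch. The type name TDoubleArray10 is an assumption made for this example:

```delphi
program StaticArrayAssign;
{$APPTYPE CONSOLE}

type
  TDoubleArray10 = array[0..9] of Double;  // named type makes assignment legal

var
  LSource, LCopy: TDoubleArray10;

begin
  LSource[0] := 1.0;
  LCopy := LSource;          // copies the whole memory block
  LCopy[0] := 2.0;           // changes only the copy
  Writeln(LSource[0]:0:1);   // 1.0
  Writeln(LCopy[0]:0:1);     // 2.0
  // Each variable keeps its own location in memory.
  Writeln(NativeUInt(@LSource) <> NativeUInt(@LCopy));  // TRUE
end.
```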
When we assign a record, the data contained within the size of the record’s definition is copied, but the memory locations remain the same. Static arrays and record structures can have an overhead due to copy by value. If we were to declare a method that takes our fixed length array or record as a parameter, we should usually declare the parameter as var or const.
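As a sketch of that advice, the two hypothetical routines below (Sum and Fill are names invented for this example) take the array as const and var so that no copy of the array has to be made for the call:

```delphi
program ArrayParams;
{$APPTYPE CONSOLE}

type
  TDoubleArray10 = array[0..9] of Double;

// const: read-only access to the caller's array.
function Sum(const AValues: TDoubleArray10): Double;
var
  I: Integer;
begin
  Result := 0;
  for I := Low(AValues) to High(AValues) do
    Result := Result + AValues[I];
end;

// var: passed by reference, so changes are visible to the caller.
procedure Fill(var AValues: TDoubleArray10; AValue: Double);
var
  I: Integer;
begin
  for I := Low(AValues) to High(AValues) do
    AValues[I] := AValue;
end;

var
  LValues: TDoubleArray10;

begin
  Fill(LValues, 0.5);
  Writeln(Sum(LValues):0:1);  // 5.0
end.
```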
If we declare a fixed array as a field in our record it will contribute to the size of the record. The memory will be contained within the record structure. This provides for simple memory management.
Fixed arrays can easily be used cross platform when defined as part of our records.
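A minimal sketch of a record with a fixed array field. TSample and its fields are hypothetical and exist only to show that the array is stored inline:

```delphi
program RecordWithArray;
{$APPTYPE CONSOLE}

type
  TSample = record                    // hypothetical record for illustration
    Id: Integer;
    Values: array[0..3] of Double;    // stored inside the record itself
  end;

var
  LSample: TSample;

begin
  LSample.Values[3] := 1.0;
  // The array contributes to the record size (alignment padding may add more).
  Writeln(SizeOf(TSample) >= SizeOf(Integer) + 4 * SizeOf(Double));  // TRUE
  // The first array element lies within the record's own memory block.
  Writeln(NativeUInt(@LSample.Values[0]) - NativeUInt(@LSample)
    < NativeUInt(SizeOf(TSample)));  // TRUE
end.
```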
Dynamic Arrays
Dynamic arrays are declared without specifying the number of elements:
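For example, a minimal sketch showing both the classic and the generic declaration. The variable names and the Double element type are assumptions for illustration:

```delphi
program DynArrayDecl;
{$APPTYPE CONSOLE}

var
  LClassic: array of Double;      // classic syntax
  LGenericA: TArray<Double>;      // generic syntax (TArray<T> is declared in System)
  LGenericB: TArray<Double>;

begin
  // No element memory has been allocated yet.
  Writeln(Length(LClassic));    // 0
  Writeln(Length(LGenericA));   // 0
  // Two TArray<Double> variables are assignment compatible.
  LGenericB := LGenericA;
  Writeln(Length(LGenericB));   // 0
end.
```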
The classic syntax (array of T) and the generic syntax (TArray<T>) declare the same kind of array, but I recommend that you use the generic syntax because it allows us to assign arrays of the same type.
Dynamic arrays are allocated by calling SetLength().
Like our static array, the distance between elements is the size of the elements.
However, the memory address of a dynamic array variable is not the same as the address of its first element.
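The three statements above can be sketched as follows, assuming a TArray<Double> named LDynamicArray:

```delphi
program DynArrayAddress;
{$APPTYPE CONSOLE}

var
  LDynamicArray: TArray<Double>;

begin
  SetLength(LDynamicArray, 10);   // allocates ten elements

  // The stride equals the element size, as with the static array.
  Writeln(NativeUInt(@LDynamicArray[1]) - NativeUInt(@LDynamicArray[0])
    = NativeUInt(SizeOf(Double)));  // TRUE

  // The variable lives at one address, the element data at another.
  Writeln(NativeUInt(@LDynamicArray) <> NativeUInt(@LDynamicArray[0]));  // TRUE

  // The value stored in the variable is the address of the first element.
  Writeln(NativeUInt(Pointer(LDynamicArray)) = NativeUInt(@LDynamicArray[0]));  // TRUE
end.
```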
If we change our dynamic array size we will find that the address of the dynamic array variable stays the same, while the pointer to the data (the raw memory array) may change. SetLength also copies the memory of the array to the newly allocated array as needed.
Note: When calling SetLength we may get the same memory address for our raw data, usually when shrinking the array. You may need to grow or shrink the array by many elements to force a new address.
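A sketch of this behavior. Whether the data actually moves depends on the memory manager, so the second result is not guaranteed and no output is claimed for it:

```delphi
program DynArrayResize;
{$APPTYPE CONSOLE}

var
  LDynamicArray: TArray<Double>;
  LVarAddr, LDataBefore, LDataAfter: Pointer;

begin
  SetLength(LDynamicArray, 10);
  LDynamicArray[0] := 3.14;
  LVarAddr := @LDynamicArray;
  LDataBefore := Pointer(LDynamicArray);

  SetLength(LDynamicArray, 1000000);     // grow by a large amount
  LDataAfter := Pointer(LDynamicArray);

  Writeln(LVarAddr = @LDynamicArray);    // TRUE: the variable itself never moves
  Writeln(LDataBefore = LDataAfter);     // may be TRUE or FALSE
  Writeln(LDynamicArray[0]:0:2);         // 3.14: existing contents are preserved
end.
```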
So what exactly is a dynamic array? It is a pointer variable type that holds the address to the first element of data.
Let us think of pointers in simple terms again, they are essentially just numeric values. If we assign an integer variable to another their values match, even though the variables that hold these values have their own independent addresses. Similarly, each dynamic array variable has its own address, but once we assign a dynamic array the value held in this variable is shared between the two.
However, as soon as we call SetLength() on a dynamic array a new value may be written, so the arrays may be independent again.
Since assigning dynamic arrays sets the two variables to refer to the same data, we may encounter some unexpected consequences.
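A sketch of such a surprise, using two hypothetical variables LFirst and LSecond. Writing through one variable changes what the other one sees until SetLength gives one of them its own copy:

```delphi
program DynArraySharing;
{$APPTYPE CONSOLE}

var
  LFirst, LSecond: TArray<Integer>;

begin
  SetLength(LFirst, 3);
  LFirst[0] := 1;

  LSecond := LFirst;      // both variables now refer to the same data
  LSecond[0] := 99;       // writes into the shared data
  Writeln(LFirst[0]);     // 99

  SetLength(LSecond, 3);  // the data is shared, so LSecond receives its own copy
  LSecond[0] := 7;
  Writeln(LFirst[0]);     // 99
  Writeln(LSecond[0]);    // 7
end.
```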
The benefit of this behavior is that we can have large amounts of data passed around using dynamic arrays. Dynamic arrays are reference counted and automatically disposed. Any API that uses the raw data pointer in a dynamic array must ensure that:
- The dynamic array is referenced for as long as we need a pointer to the raw data
- The pointer to the raw data is updated after SetLength is called against the dynamic array
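The two rules above can be sketched as follows. SumRaw is a hypothetical stand-in for an API that only receives a raw pointer and a count; it is not part of any library:

```delphi
program RawPointerRules;
{$APPTYPE CONSOLE}

// Hypothetical consumer that only sees a raw pointer and an element count.
function SumRaw(AData: PDouble; ACount: Integer): Double;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to ACount - 1 do
  begin
    Result := Result + AData^;
    Inc(AData);               // advance by SizeOf(Double)
  end;
end;

var
  LValues: TArray<Double>;    // stays referenced while LData is in use
  LData: PDouble;

begin
  SetLength(LValues, 2);
  LValues[0] := 1;
  LValues[1] := 2;
  LData := @LValues[0];
  Writeln(SumRaw(LData, Length(LValues)):0:1);  // 3.0

  SetLength(LValues, 3);      // the data may have moved
  LValues[2] := 3;
  LData := @LValues[0];       // so the raw pointer must be refreshed
  Writeln(SumRaw(LData, Length(LValues)):0:1);  // 6.0
end.
```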
Note: If you are interested in how dynamic arrays know the element count and number of references, look at TDynArrayRec in System.pas. This record is stored right before the first element of the array.
Dynamic arrays, strings and interfaces are reference counted and are automatically initialized to nil.
One last note on dynamic arrays. Since a dynamic array is simply a pointer to an array, it should not be a surprise that SizeOf(LDynamicArray) for any dynamic array is always SizeOf(Pointer).
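A short sketch of the difference between the size of the variable and the size of the data it refers to:

```delphi
program DynArraySizeOf;
{$APPTYPE CONSOLE}

var
  LDynamicArray: TArray<Double>;

begin
  Writeln(SizeOf(LDynamicArray) = SizeOf(Pointer));   // TRUE while still nil
  SetLength(LDynamicArray, 1000);
  Writeln(SizeOf(LDynamicArray) = SizeOf(Pointer));   // TRUE after allocation
  // The size of the element data has to be computed from the length.
  Writeln(Length(LDynamicArray) * SizeOf(Double));    // 8000
end.
```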
Records with Variable Length Arrays
We can declare records that hold arrays where we use the syntax of a fixed length array (typically with a single element), but where the actual size of the array varies.
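In Part 1 we encountered one of these: DEV_BROADCAST_DEVICEINTERFACE. In the Windows headers its last field, dbcc_name, is declared as a character array with a single element, and in Part 1 I translated the structure to a Delphi record.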
Strictly speaking, my direct conversion of dbcc_name should have been declared as dbcc_name: array[0..0] of Char, a fixed length array of one character! The actual name of the device is of course longer than just one character. The record’s size as described in dbcc_size includes the length of the string contained in dbcc_name, including the null character.
In Delphi PChar is a pointer to Char. We can read characters beyond that first character by indexing (e.g. LMyPChar[5]). Delphi provides support to convert from PChar to String. When a variable of type PChar is converted to String the characters are read until a null character (character with value 0) is reached. This means that we can read a null terminated character array without knowing its size, or in this case without reading the size of the record.
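A minimal sketch of indexing a PChar and converting it to a string. The source text with an embedded null character is made up to show where the conversion stops:

```delphi
program PCharToString;
{$APPTYPE CONSOLE}

var
  LSource, LText: string;
  LMyPChar: PChar;

begin
  LSource := 'Hello'#0'hidden';   // embedded null character after 'Hello'
  LMyPChar := PChar(LSource);

  Writeln(LMyPChar[4]);           // o
  LText := LMyPChar;              // reads up to the first null character
  Writeln(LText);                 // Hello
  Writeln(Length(LText));         // 5
end.
```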
Watch out! Assigning a variable of type DEV_BROADCAST_DEVICEINTERFACE will transfer the dbcc_size, but only the first character of dbcc_name.
Typically records such as these are used in APIs where a single record is passed by reference. Since the size of DEV_BROADCAST_DEVICEINTERFACE varies we cannot place it in an array. We can place a series of these records in a row, but indexing will not work because we do not have a fixed stride length between elements. In that case we would have to move our pointer by the size of each record in turn.
To create a record such as this you would allocate memory equal to the size of the record definition plus the size in bytes of the character string. That size value would then also be written to the dbcc_size field.
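A sketch of that allocation. The record below is modelled on the documented Unicode layout of DEV_BROADCAST_DEVICEINTERFACE, but the type names and the device name are assumptions for illustration, and Cardinal stands in for DWORD so that the example needs no Windows unit:

```delphi
program VarLengthRecord;
{$APPTYPE CONSOLE}

type
  PDevBroadcastDeviceInterface = ^TDevBroadcastDeviceInterface;
  TDevBroadcastDeviceInterface = record   // illustrative declaration
    dbcc_size: Cardinal;
    dbcc_devicetype: Cardinal;
    dbcc_reserved: Cardinal;
    dbcc_classguid: TGUID;
    dbcc_name: array[0..0] of Char;       // first character of a longer string
  end;

var
  LName: string;
  LSize: Cardinal;
  LRec: PDevBroadcastDeviceInterface;

begin
  LName := 'ExampleDevice';               // made-up device name
  // The one Char in the definition provides room for the null terminator.
  LSize := SizeOf(TDevBroadcastDeviceInterface) + Length(LName) * SizeOf(Char);
  GetMem(LRec, LSize);
  try
    LRec^.dbcc_size := LSize;
    // Copy the characters plus the terminating null character.
    Move(PChar(LName)^, LRec^.dbcc_name[0], (Length(LName) + 1) * SizeOf(Char));
    // Read the name back without knowing its length.
    Writeln(string(PChar(@LRec^.dbcc_name[0])));  // ExampleDevice
  finally
    FreeMem(LRec);
  end;
end.
```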
Bonus: A dynamic array has a record header, TDynArrayRec, written before the raw data of the array. Knowing this, you may notice the similarity between the dynamic array data and the DEV_BROADCAST_DEVICEINTERFACE record. Both have headers before the raw data. Both are allocated with space for the header and the number of elements in the data. In the case of dynamic arrays the SetLength() procedure allocates the header and data space for us and stores the address of the first element of the array data in the variable. If we want to read the number of elements or the reference count we move a pointer to the raw data back by SizeOf(NativeInt) to read the Length: NativeInt and back by an additional 4 bytes to read RefCnt: Integer. Next I will discuss ways we can move our pointer.
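The header read described in the bonus note can be sketched as below. This relies on the internal TDynArrayRec layout described above, which is an implementation detail and may differ between compiler versions, so treat it as an experiment rather than something to ship:

```delphi
program DynArrayHeader;
{$APPTYPE CONSOLE}

var
  LValues: TArray<Double>;
  LData: PByte;
  LLength: NativeInt;
  LRefCnt: Integer;

begin
  SetLength(LValues, 5);
  LData := PByte(Pointer(LValues));    // address of the first element

  Dec(LData, SizeOf(NativeInt));       // step back to the Length field
  LLength := PNativeInt(LData)^;

  Dec(LData, SizeOf(Integer));         // step back 4 more bytes to RefCnt
  LRefCnt := PInteger(LData)^;

  // Assumes the header layout described in the text.
  Writeln('Length from header: ', LLength);
  Writeln('RefCnt from header: ', LRefCnt);
end.
```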
Pointer Math
We can pass a pointer to the first element of our array to our DLL, and we can also pass the number of elements. If we know what stride to take from the start of one item to the next, then we can walk our pointer by simply adding (or subtracting) the size of this stride to the numeric value of the pointer. As we walk our pointer and de-reference a memory address, we can read the items in our array.
The question then is, what is the stride between my elements? Can we safely use the size of our record?
In Part 1 we saw that our records are aligned to memory boundaries, but the record may also be padded at its end. If another record of the same type is now placed in memory directly after the first, then its starting address will be correct, and no alignment is needed. This end padding counts towards the size of the record, which means that we can use the size of the record as our stride from one element to the next.
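A sketch that checks this with a record that needs end padding. The fields of TTestRec are made up for this example, and the second result assumes the default record alignment:

```delphi
program RecordStride;
{$APPTYPE CONSOLE}

type
  TTestRec = record       // hypothetical fields chosen to force end padding
    Value: Double;
    Flag: Byte;
  end;

var
  LRecs: array[0..1] of TTestRec;

begin
  // The distance between neighbouring records equals SizeOf(TTestRec).
  Writeln(NativeUInt(@LRecs[1]) - NativeUInt(@LRecs[0])
    = NativeUInt(SizeOf(TTestRec)));                       // TRUE
  // With default alignment the size includes padding after Flag.
  Writeln(SizeOf(TTestRec) > SizeOf(Double) + SizeOf(Byte));  // TRUE
end.
```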
Inc and Dec
Calling Inc and Dec increments and decrements our pointer’s value by the size of the pointer’s underlying type.
As we call Inc on our pointer starting at the first element it will simply advance and reference element by element.
So, with a strongly typed pointer and the count we can easily access all of our elements in the array in a sequential fashion.
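A minimal sketch of walking an array with Inc, assuming a hypothetical TTestRec with two fields and a typed pointer PTestRec:

```delphi
program WalkWithInc;
{$APPTYPE CONSOLE}

type
  PTestRec = ^TTestRec;
  TTestRec = record        // hypothetical fields for illustration
    Id: Integer;
    Value: Double;
  end;

var
  LRecs: array[0..2] of TTestRec;
  LTestPtr: PTestRec;
  I: Integer;

begin
  for I := 0 to 2 do
  begin
    LRecs[I].Id := I;
    LRecs[I].Value := I * 1.5;
  end;

  LTestPtr := @LRecs[0];             // start at the first element
  for I := 0 to 2 do
  begin
    Writeln(LTestPtr^.Id, ' ', LTestPtr^.Value:0:1);
    Inc(LTestPtr);                   // advance by SizeOf(TTestRec)
  end;
  // prints: 0 0.0 / 1 1.5 / 2 3.0 (one pair per line)
end.
```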
Treat the Pointer as a number
To avoid the Inc and Dec side effect of modifying the pointer, we can also do the math ourselves. All we must do is multiply the size of the element with the number of elements to walk and then add to our starting point.
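This calculation can be sketched as a small helper. ElementAt is a hypothetical function name, and TTestRec again uses made-up fields:

```delphi
program PointerAsNumber;
{$APPTYPE CONSOLE}

type
  PTestRec = ^TTestRec;
  TTestRec = record
    Id: Integer;
    Value: Double;
  end;

// Returns the address of element AIndex without modifying AStart.
function ElementAt(AStart: PTestRec; AIndex: Integer): PTestRec;
begin
  Result := PTestRec(NativeUInt(AStart)
    + NativeUInt(AIndex) * NativeUInt(SizeOf(TTestRec)));
end;

var
  LRecs: array[0..2] of TTestRec;
  LTestPtr: PTestRec;
  I: Integer;

begin
  for I := 0 to 2 do
    LRecs[I].Id := I * 10;

  LTestPtr := @LRecs[0];
  Writeln(ElementAt(LTestPtr, 2)^.Id);   // 20
  Writeln(LTestPtr^.Id);                 // 0: the starting pointer is unchanged
end.
```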
There is a shorthand for this notation.
Pointer Indexing
Writing LTestPtr[i] is essentially the same as PTestRec(NativeUInt(LTestPtr) + i * SizeOf(TTestRec))^ and allows us to use the same indexing notation as we would use with arrays. This notation works with a few built-in types, most notably PChar, but we can use the {$POINTERMATH ON} directive to allow indexing on our own pointer types.
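A sketch of pointer indexing with the directive switched on before the pointer type is declared. TTestRec and its fields are hypothetical:

```delphi
program PointerIndexing;
{$APPTYPE CONSOLE}
{$POINTERMATH ON}   // pointer types declared from here on can be indexed

type
  PTestRec = ^TTestRec;
  TTestRec = record
    Id: Integer;
    Value: Double;
  end;

var
  LRecs: array[0..2] of TTestRec;
  LTestPtr: PTestRec;
  I: Integer;

begin
  for I := 0 to 2 do
    LRecs[I].Id := I * 10;

  LTestPtr := @LRecs[0];
  // Index notation on the pointer.
  Writeln(LTestPtr[2].Id);   // 20
  // The same element via explicit arithmetic.
  Writeln(PTestRec(NativeUInt(LTestPtr) + 2 * SizeOf(TTestRec))^.Id);  // 20
end.
```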
Most of you would have indexed PChar. The index syntax here is the same, just expanded for our own pointer types. PChar is different from most other data pointers in that it usually does not need to be paired with a size. A PChar is usually terminated by a null character. With other pointer types we can potentially index data beyond the end of our array, so we need to have the element count passed via our API. With the count and a pointer to a known structure type (or some way to derive this information), we can read data in arrays in most programming languages.
Bonus: You may be wondering why the spacing of our elements works out so easily. In Delphi the sizes of the basic data types are powers of two (1, 2, 4, 8 etc.), so they align to multiples of their size. There is one type that violates this rule: Extended, and only on some platforms. On 32-bit Windows Extended has 10 bytes, but on 64-bit Windows it is an alias for Double. This type is the only type that sometimes requires the rarely used packed array. I will not cover that type since it is hard to use cross-platform and we should avoid it in APIs.
Section Conclusion
Arrays are contiguous blocks of same size values. Pointers to the raw data of arrays can be used to traverse the elements in an array. To successfully traverse an array in our API we need a pointer to the start of the data of the array, the size of the elements and the number of elements to read.
In the next section I will cover the basic API of our DLL that will allow it to receive records and arrays.