Streaming SIMD Extensions (SSE) Intrinsics
Data types
C/C++ code works with vector data, which holds several packed values, through the following data types:
__m64 - 64 bits (MMX register): 1 x 64-bit integer, 2 x 32-bit integers, 4 x 16-bit integers, or 8 x 8-bit integers.
__m128 - 128 bits (XMM register): 4 x 32-bit single-precision floating-point values (SSE). SSE2 adds the separate types __m128d (2 x 64-bit double-precision values) and __m128i (packed integers).
Values of these types must be aligned in memory on the corresponding boundary. For example, an array of __m64 elements must start on an 8-byte boundary, and an array of __m128 elements on a 16-byte boundary. Aligned memory is allocated with the function:
void *_mm_malloc(size_t size, size_t align);
size - the amount of memory to allocate, in bytes (as in malloc),
align - the alignment, in bytes.
Memory allocated this way is released with the function:
void _mm_free(void *p);
For example:
float *x; // array to be processed with SSE
x = (float *)_mm_malloc(N * sizeof(float), 16);
// ... processing goes here ...
_mm_free(x);
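The allocation pattern above can be sketched in a slightly fuller form. This is a minimal illustration and not code from the original text: the helper names alloc_floats_aligned and add_arrays_sse are hypothetical, and the sketch assumes that the array length is a multiple of 4 and that xmmintrin.h makes _mm_malloc and _mm_free visible (which is the case for the common GCC, Clang, and MSVC headers).

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; assumed to also declare _mm_malloc/_mm_free */

/* Hypothetical helper: allocates n floats on a 16-byte boundary.
   The result must be released with _mm_free, not free. */
static float *alloc_floats_aligned(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), 16);
}

/* Hypothetical helper: c[i] = a[i] + b[i], four elements per step.
   Assumes n is a multiple of 4 and all pointers are 16-byte aligned,
   because _mm_load_ps and _mm_store_ps require aligned addresses. */
static void add_arrays_sse(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(c + i, _mm_add_ps(va, vb));
    }
}
```

A caller would allocate the three arrays with alloc_floats_aligned, fill the inputs, call add_arrays_sse, and release every array with _mm_free.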
SSE intrinsics for floating-point numbers
The header file xmmintrin.h declares the SSE intrinsics.
Arithmetic functions
Function | Instruction | Operation | R0 | R1 | R2 | R3
_mm_add_ss | ADDSS | addition | a0 [op] b0 | a1 | a2 | a3
_mm_add_ps | ADDPS | addition | a0 [op] b0 | a1 [op] b1 | a2 [op] b2 | a3 [op] b3
_mm_sub_ss | SUBSS | subtraction | a0 [op] b0 | a1 | a2 | a3
_mm_sub_ps | SUBPS | subtraction | a0 [op] b0 | a1 [op] b1 | a2 [op] b2 | a3 [op] b3
_mm_mul_ss | MULSS | multiplication | a0 [op] b0 | a1 | a2 | a3
_mm_mul_ps | MULPS | multiplication | a0 [op] b0 | a1 [op] b1 | a2 [op] b2 | a3 [op] b3
_mm_div_ss | DIVSS | division | a0 [op] b0 | a1 | a2 | a3
_mm_div_ps | DIVPS | division | a0 [op] b0 | a1 [op] b1 | a2 [op] b2 | a3 [op] b3
_mm_sqrt_ss | SQRTSS | square root | [op] a0 | a1 | a2 | a3
_mm_sqrt_ps | SQRTPS | square root | [op] a0 | [op] a1 | [op] a2 | [op] a3
_mm_rcp_ss | RCPSS | reciprocal (approximate) | [op] a0 | a1 | a2 | a3
_mm_rcp_ps | RCPPS | reciprocal (approximate) | [op] a0 | [op] a1 | [op] a2 | [op] a3
_mm_rsqrt_ss | RSQRTSS | reciprocal square root (approximate) | [op] a0 | a1 | a2 | a3
_mm_rsqrt_ps | RSQRTPS | reciprocal square root (approximate) | [op] a0 | [op] a1 | [op] a2 | [op] a3
_mm_min_ss | MINSS | minimum | [op](a0,b0) | a1 | a2 | a3
_mm_min_ps | MINPS | minimum | [op](a0,b0) | [op](a1,b1) | [op](a2,b2) | [op](a3,b3)
_mm_max_ss | MAXSS | maximum | [op](a0,b0) | a1 | a2 | a3
_mm_max_ps | MAXPS | maximum | [op](a0,b0) | [op](a1,b1) | [op](a2,b2) | [op](a3,b3)
__m128 _mm_add_ss(__m128 a, __m128 b);
Adds the lower single-precision floating-point values of a and b; the upper three single-precision floating-point values are passed through from a.
r0 := a0 + b0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_add_ps(__m128 a, __m128 b);
Adds the four single-precision floating-point values of a and b.
r0 := a0 + b0
r1 := a1 + b1
r2 := a2 + b2
r3 := a3 + b3
__m128 _mm_sub_ss(__m128 a, __m128 b);
Subtracts the lower single-precision floating-point value of b from that of a; the upper three single-precision floating-point values are passed through from a.
r0 := a0 - b0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_sub_ps(__m128 a, __m128 b);
Subtracts the four single-precision floating-point values of b from those of a.
r0 := a0 - b0
r1 := a1 - b1
r2 := a2 - b2
r3 := a3 - b3
__m128 _mm_mul_ss(__m128 a, __m128 b);
Multiplies the lower single-precision floating-point values of a and b; the upper three single-precision floating-point values are passed through from a.
r0 := a0 * b0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_mul_ps(__m128 a, __m128 b);
Multiplies the four single-precision floating-point values of a and b.
r0 := a0 * b0
r1 := a1 * b1
r2 := a2 * b2
r3 := a3 * b3
__m128 _mm_div_ss(__m128 a, __m128 b);
Divides the lower single-precision floating-point value of a by that of b; the upper three single-precision floating-point values are passed through from a.
r0 := a0 / b0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_div_ps(__m128 a, __m128 b);
Divides the four single-precision floating-point values of a by those of b.
r0 := a0 / b0
r1 := a1 / b1
r2 := a2 / b2
r3 := a3 / b3
__m128 _mm_sqrt_ss(__m128 a);
Computes the square root of the lower single-precision floating-point value of a; the upper three single-precision floating-point values are passed through.
r0 := sqrt(a0)
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_sqrt_ps(__m128 a);
Computes the square roots of the four single-precision floating-point values of a.
r0 := sqrt(a0)
r1 := sqrt(a1)
r2 := sqrt(a2)
r3 := sqrt(a3)
__m128 _mm_rcp_ss(__m128 a);
Computes an approximation of the reciprocal of the lower single-precision floating-point value of a; the upper three single-precision floating-point values are passed through.
r0 := recip(a0)
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_rcp_ps(__m128 a);
Computes approximations of the reciprocals of the four single-precision floating-point values of a.
r0 := recip(a0)
r1 := recip(a1)
r2 := recip(a2)
r3 := recip(a3)
__m128 _mm_rsqrt_ss(__m128 a);
Computes an approximation of the reciprocal of the square root of the lower single-precision floating-point value of a; the upper three single-precision floating-point values are passed through.
r0 := recip(sqrt(a0))
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_rsqrt_ps(__m128 a);
Computes approximations of the reciprocals of the square roots of the four single-precision floating-point values of a.
r0 := recip(sqrt(a0))
r1 := recip(sqrt(a1))
r2 := recip(sqrt(a2))
r3 := recip(sqrt(a3))
__m128 _mm_min_ss(__m128 a, __m128 b);
Computes the minimum of the lower single-precision floating-point values of a and b; the upper three single-precision floating-point values are passed through from a.
r0 := min(a0, b0)
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_min_ps(__m128 a, __m128 b);
Computes the minima of the four single-precision floating-point values of a and b.
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)
__m128 _mm_max_ss(__m128 a, __m128 b);
Computes the maximum of the lower single-precision floating-point values of a and b; the upper three single-precision floating-point values are passed through from a.
r0 := max(a0, b0)
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_max_ps(__m128 a, __m128 b);
Computes the maxima of the four single-precision floating-point values of a and b.
r0 := max(a0, b0)
r1 := max(a1, b1)
r2 := max(a2, b2)
r3 := max(a3, b3)
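The difference between the scalar (_ss) and packed (_ps) forms described above can be seen in a small sketch. The helper names add_scalar_demo and add_packed_demo are hypothetical and exist only for illustration; unaligned loads and stores are used so that ordinary arrays can be passed in.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: out = _mm_add_ss(a, b).
   Only element 0 is added; elements 1..3 are copied from a. */
static void add_scalar_demo(const float a[4], const float b[4], float out[4])
{
    assert(a && b && out);
    __m128 r = _mm_add_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}

/* Hypothetical helper: out = _mm_add_ps(a, b).
   All four elements are added pairwise. */
static void add_packed_demo(const float a[4], const float b[4], float out[4])
{
    assert(a && b && out);
    __m128 r = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}
```

With a = (1, 2, 3, 4) and b = (10, 20, 30, 40), the scalar form yields (11, 2, 3, 4) and the packed form yields (11, 22, 33, 44).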
Logical functions
Function | Operation | Instruction
_mm_and_ps | bitwise AND | ANDPS
_mm_andnot_ps | bitwise AND-NOT | ANDNPS
_mm_or_ps | bitwise OR | ORPS
_mm_xor_ps | bitwise exclusive OR | XORPS
__m128 _mm_and_ps(__m128 a, __m128 b);
Computes the bitwise AND of the four single-precision floating-point values of a and b.
r0 := a0 & b0
r1 := a1 & b1
r2 := a2 & b2
r3 := a3 & b3
__m128 _mm_andnot_ps(__m128 a, __m128 b);
Computes the bitwise AND-NOT of the four single-precision floating-point values of a and b.
r0 := ~a0 & b0
r1 := ~a1 & b1
r2 := ~a2 & b2
r3 := ~a3 & b3
__m128 _mm_or_ps(__m128 a, __m128 b);
Computes the bitwise OR of the four single-precision floating-point values of a and b.
r0 := a0 | b0
r1 := a1 | b1
r2 := a2 | b2
r3 := a3 | b3
__m128 _mm_xor_ps(__m128 a, __m128 b);
Computes the bitwise XOR (exclusive OR) of the four single-precision floating-point values of a and b.
r0 := a0 ^ b0
r1 := a1 ^ b1
r2 := a2 ^ b2
r3 := a3 ^ b3
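One common use of the bitwise functions is manipulating the sign bit of floating-point values. The following sketch computes absolute values with _mm_andnot_ps; the helper name abs_sse is hypothetical, and the sketch assumes that the array length is a multiple of 4 and that the constant -0.0f has only its sign bit set (true for IEEE 754 single precision).

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: x[i] = |x[i]| in place.
   _mm_andnot_ps(mask, v) computes ~mask & v, so with a mask holding only
   the sign bit the operation clears the sign of every element. */
static void abs_sse(float *x, size_t n)
{
    assert(n % 4 == 0);
    const __m128 sign_mask = _mm_set1_ps(-0.0f); /* 0x80000000 in each element */
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);
        _mm_storeu_ps(x + i, _mm_andnot_ps(sign_mask, v));
    }
}
```

The same mask works with _mm_xor_ps to flip the sign and with _mm_or_ps to force it.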
Функции сравнения
Каждая встроенная функция
сравнения выполняет сравнение операндов a и b. В
векторной форме сравниваются четыре вещественных значения параметра a с четырьмя вещественными
значениями параметра b,
и возвращается 128-битная маска. В скалярной форме сравниваются младшие
значения параметров, возвращается 32-битная маска, остальные три старших
значения копируются из параметра a. Маска устанавливается в значение 0
xffffffff
для тех
элементов, результат сравнения которых истина, и 0
x
0, где результат сравнения ложь.
Name | Comparison | Instruction
_mm_cmpeq_ss | equal | CMPEQSS
_mm_cmpeq_ps | equal | CMPEQPS
_mm_cmplt_ss | less than | CMPLTSS
_mm_cmplt_ps | less than | CMPLTPS
_mm_cmple_ss | less than or equal | CMPLESS
_mm_cmple_ps | less than or equal | CMPLEPS
_mm_cmpgt_ss | greater than | CMPLTSS (operands swapped)
_mm_cmpgt_ps | greater than | CMPLTPS (operands swapped)
_mm_cmpge_ss | greater than or equal | CMPLESS (operands swapped)
_mm_cmpge_ps | greater than or equal | CMPLEPS (operands swapped)
_mm_cmpneq_ss | not equal | CMPNEQSS
_mm_cmpneq_ps | not equal | CMPNEQPS
_mm_cmpnlt_ss | not less than | CMPNLTSS
_mm_cmpnlt_ps | not less than | CMPNLTPS
_mm_cmpnle_ss | not less than or equal | CMPNLESS
_mm_cmpnle_ps | not less than or equal | CMPNLEPS
_mm_cmpngt_ss | not greater than | CMPNLTSS (operands swapped)
_mm_cmpngt_ps | not greater than | CMPNLTPS (operands swapped)
_mm_cmpnge_ss | not greater than or equal | CMPNLESS (operands swapped)
_mm_cmpnge_ps | not greater than or equal | CMPNLEPS (operands swapped)
_mm_cmpord_ss | ordered | CMPORDSS
_mm_cmpord_ps | ordered | CMPORDPS
_mm_cmpunord_ss | unordered | CMPUNORDSS
_mm_cmpunord_ps | unordered | CMPUNORDPS
_mm_comieq_ss | equal | COMISS
_mm_comilt_ss | less than | COMISS
_mm_comile_ss | less than or equal | COMISS
_mm_comigt_ss | greater than | COMISS
_mm_comige_ss | greater than or equal | COMISS
_mm_comineq_ss | not equal | COMISS
_mm_ucomieq_ss | equal | UCOMISS
_mm_ucomilt_ss | less than | UCOMISS
_mm_ucomile_ss | less than or equal | UCOMISS
_mm_ucomigt_ss | greater than | UCOMISS
_mm_ucomige_ss | greater than or equal | UCOMISS
_mm_ucomineq_ss | not equal | UCOMISS
__m128 _mm_cmpeq_ss(__m128 a , __m128 b );
Compares for equality.
r0 := (a0 == b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpeq_ps(__m128 a , __m128 b );
Compares for equality.
r0 := (a0 == b0) ? 0xffffffff : 0x0
r1 := (a1 == b1) ? 0xffffffff : 0x0
r2 := (a2 == b2) ? 0xffffffff : 0x0
r3 := (a3 == b3) ? 0xffffffff : 0x0
__m128 _mm_cmplt_ss(__m128 a , __m128 b );
Compares for less than.
r0 := (a0 < b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmplt_ps(__m128 a, __m128 b );
Compares for less than.
r0 := (a0 < b0) ? 0xffffffff : 0x0
r1 := (a1 < b1) ? 0xffffffff : 0x0
r2 := (a2 < b2) ? 0xffffffff : 0x0
r3 := (a3 < b3) ? 0xffffffff : 0x0
__m128 _mm_cmple_ss(__m128 a , __m128 b );
Compares for less than or equal.
r0 := (a0 <= b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmple_ps(__m128 a , __m128 b );
Compares for less than or equal.
r0 := (a0 <= b0) ? 0xffffffff : 0x0
r1 := (a1 <= b1) ? 0xffffffff : 0x0
r2 := (a2 <= b2) ? 0xffffffff : 0x0
r3 := (a3 <= b3) ? 0xffffffff : 0x0
__m128 _mm_cmpgt_ss(__m128 a , __m128 b );
Compares for greater than.
r0 := (a0 > b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpgt_ps(__m128 a, __m128 b );
Compares for greater than.
r0 := (a0 > b0) ? 0xffffffff : 0x0
r1 := (a1 > b1) ? 0xffffffff : 0x0
r2 := (a2 > b2) ? 0xffffffff : 0x0
r3 := (a3 > b3) ? 0xffffffff : 0x0
__m128 _mm_cmpge_ss(__m128 a , __m128 b );
Compares for greater than or equal.
r0 := (a0 >= b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpge_ps(__m128 a, __m128 b );
Compares for greater than or equal.
r0 := (a0 >= b0) ? 0xffffffff : 0x0
r1 := (a1 >= b1) ? 0xffffffff : 0x0
r2 := (a2 >= b2) ? 0xffffffff : 0x0
r3 := (a3 >= b3) ? 0xffffffff : 0x0
__m128 _mm_cmpneq_ss(__m128 a , __m128 b );
Compares for inequality.
r0 := (a0 != b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpneq_ps(__m128 a , __m128 b );
Compares for inequality.
r0 := (a0 != b0) ? 0xffffffff : 0x0
r1 := (a1 != b1) ? 0xffffffff : 0x0
r2 := (a2 != b2) ? 0xffffffff : 0x0
r3 := (a3 != b3) ? 0xffffffff : 0x0
__m128 _mm_cmpnlt_ss(__m128 a , __m128 b );
Compares for not less than.
r0 := !(a0 < b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpnlt_ps(__m128 a, __m128 b );
Compares for not less than.
r0 := !(a0 < b0) ? 0xffffffff : 0x0
r1 := !(a1 < b1) ? 0xffffffff : 0x0
r2 := !(a2 < b2) ? 0xffffffff : 0x0
r3 := !(a3 < b3) ? 0xffffffff : 0x0
__m128 _mm_cmpnle_ss(__m128 a , __m128 b );
Compares for not less than or equal.
r0 := !(a0 <= b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpnle_ps(__m128 a , __m128 b );
Compares for not less than or equal.
r0 := !(a0 <= b0) ? 0xffffffff : 0x0
r1 := !(a1 <= b1) ? 0xffffffff : 0x0
r2 := !(a2 <= b2) ? 0xffffffff : 0x0
r3 := !(a3 <= b3) ? 0xffffffff : 0x0
__m128 _mm_cmpngt_ss(__m128 a , __m128 b );
Compares for not greater than.
r0 := !(a0 > b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpngt_ps(__m128 a , __m128 b );
Compares for not greater than.
r0 := !(a0 > b0) ? 0xffffffff : 0x0
r1 := !(a1 > b1) ? 0xffffffff : 0x0
r2 := !(a2 > b2) ? 0xffffffff : 0x0
r3 := !(a3 > b3) ? 0xffffffff : 0x0
__m128 _mm_cmpnge_ss(__m128 a , __m128 b );
Compares for not greater than or equal.
r0 := !(a0 >= b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpnge_ps(__m128 a , __m128 b );
Compares for not greater than or equal.
r0 := !(a0 >= b0) ? 0xffffffff : 0x0
r1 := !(a1 >= b1) ? 0xffffffff : 0x0
r2 := !(a2 >= b2) ? 0xffffffff : 0x0
r3 := !(a3 >= b3) ? 0xffffffff : 0x0
__m128 _mm_cmpord_ss(__m128 a, __m128 b);
Compares for ordered.
r0 := (a0 ord? b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpord_ps(__m128 a, __m128 b);
Compares for ordered.
r0 := (a0 ord? b0) ? 0xffffffff : 0x0
r1 := (a1 ord? b1) ? 0xffffffff : 0x0
r2 := (a2 ord? b2) ? 0xffffffff : 0x0
r3 := (a3 ord? b3) ? 0xffffffff : 0x0
__m128 _mm_cmpunord_ss(__m128 a, __m128 b);
Compares for unordered.
r0 := (a0 unord? b0) ? 0xffffffff : 0x0
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cmpunord_ps(__m128 a, __m128 b);
Compares for unordered.
r0 := (a0 unord? b0) ? 0xffffffff : 0x0
r1 := (a1 unord? b1) ? 0xffffffff : 0x0
r2 := (a2 unord? b2) ? 0xffffffff : 0x0
r3 := (a3 unord? b3) ? 0xffffffff : 0x0
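The masks returned by the packed comparisons are typically combined with the logical functions to select values without branching. The sketch below is one possible illustration; the helper names select_smaller and lanes_less_than are hypothetical, and _mm_movemask_ps is used only to turn the mask into an ordinary integer.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = (a[i] < b[i]) ? a[i] : b[i].
   The all-ones / all-zeros mask from _mm_cmplt_ps keeps a where the
   comparison is true and b where it is false. */
static void select_smaller(const float a[4], const float b[4], float out[4])
{
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 mask = _mm_cmplt_ps(va, vb);
    __m128 r = _mm_or_ps(_mm_and_ps(mask, va), _mm_andnot_ps(mask, vb));
    _mm_storeu_ps(out, r);
}

/* Hypothetical helper: bit i of the result is 1 when a[i] < b[i]. */
static int lanes_less_than(const float a[4], const float b[4])
{
    assert(a && b);
    return _mm_movemask_ps(_mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

For a = (1, 5, 3, 8) and b = (2, 4, 3, 9), the comparison is true in elements 0 and 3, so the selection gives (1, 4, 3, 8) and the integer mask is 9 (binary 1001).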
The following functions compare only the lowest values and return 0 or 1 depending on the result.
int _mm_comieq_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for equality. If a and b are equal, 1 is returned. Otherwise, 0 is returned.
r := (a0 == b0) ? 0x1 : 0x0
int _mm_comilt_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a less than b. If a is less than b, 1 is returned. Otherwise, 0 is returned.
r := (a0 < b0) ? 0x1 : 0x0
int _mm_comile_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise, 0 is returned.
r := (a0 <= b0) ? 0x1 : 0x0
int _mm_comigt_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise, 0 is returned.
r := (a0 > b0) ? 0x1 : 0x0
int _mm_comige_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise, 0 is returned.
r := (a0 >= b0) ? 0x1 : 0x0
int _mm_comineq_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise, 0 is returned.
r := (a0 != b0) ? 0x1 : 0x0
int _mm_ucomieq_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for equality. If a and b are equal, 1 is returned. Otherwise, 0 is returned.
r := (a0 == b0) ? 0x1 : 0x0
int _mm_ucomilt_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a less than b. If a is less than b, 1 is returned. Otherwise, 0 is returned.
r := (a0 < b0) ? 0x1 : 0x0
int _mm_ucomile_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a less than or equal to b. If a is less than or equal to b, 1 is returned. Otherwise, 0 is returned.
r := (a0 <= b0) ? 0x1 : 0x0
int _mm_ucomigt_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a greater than b. If a is greater than b, 1 is returned. Otherwise, 0 is returned.
r := (a0 > b0) ? 0x1 : 0x0
int _mm_ucomige_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a greater than or equal to b. If a is greater than or equal to b, 1 is returned. Otherwise, 0 is returned.
r := (a0 >= b0) ? 0x1 : 0x0
int _mm_ucomineq_ss(__m128 a, __m128 b);
Compares the lower single-precision floating-point values of a and b for a not equal to b. If a and b are not equal, 1 is returned. Otherwise, 0 is returned.
r := (a0 != b0) ? 0x1 : 0x0
Type conversion operations
Function | Operation | Instruction
_mm_cvtss_si32 | converts the lowest float to a 32-bit integer | CVTSS2SI
_mm_cvtps_pi32 | converts the two lowest floats to two packed 32-bit integers | CVTPS2PI
_mm_cvttss_si32 | converts the lowest float to a 32-bit integer, discarding the fractional part | CVTTSS2SI
_mm_cvttps_pi32 | converts the two lowest floats to two packed 32-bit integers, discarding the fractional part | CVTTPS2PI
_mm_cvtsi32_ss | converts a 32-bit integer to a float | CVTSI2SS
_mm_cvtpi32_ps | converts two packed 32-bit integers to the two lowest floats | CVTPI2PS
_mm_cvtpi16_ps | converts four packed 16-bit integers to packed floats | composite
_mm_cvtpu16_ps | converts four packed unsigned 16-bit integers to packed floats | composite
_mm_cvtpi8_ps | converts the four lowest packed 8-bit integers to four packed floats | composite
_mm_cvtpu8_ps | converts the four lowest packed unsigned 8-bit integers to four packed floats | composite
_mm_cvtpi32x2_ps | converts two pairs of packed 32-bit integers to four packed floats | composite
_mm_cvtps_pi16 | converts four packed floats to four 16-bit integers | composite
_mm_cvtps_pi8 | converts four packed floats to the four lowest 8-bit integers | composite
int _mm_cvtss_si32(__m128 a);
Converts the lower single-precision floating-point value of a to a 32-bit integer according to the current rounding mode.
r := (int)a0
__m64 _mm_cvtps_pi32(__m128 a);
Converts the two lower single-precision floating-point values of a to two 32-bit integers according to the current rounding mode, returning the integers in packed form.
r0 := (int)a0
r1 := (int)a1
int _mm_cvttss_si32(__m128 a);
Converts the lower single-precision floating-point value of a to a 32-bit integer with truncation.
r := (int)a0
__m64 _mm_cvttps_pi32(__m128 a);
Converts the two lower single-precision floating-point values of a to two 32-bit integers with truncation, returning the integers in packed form.
r0 := (int)a0
r1 := (int)a1
__m128 _mm_cvtsi32_ss(__m128 a, int b);
Converts the 32-bit integer value b to a single-precision floating-point value; the upper three single-precision floating-point values are passed through from a.
r0 := (float)b
r1 := a1 ; r2 := a2 ; r3 := a3
__m128 _mm_cvtpi32_ps(__m128 a, __m64 b);
Converts the two 32-bit integer values in packed form in b to two single-precision floating-point values; the upper two single-precision floating-point values are passed through from a.
r0 := (float)b0
r1 := (float)b1
r2 := a2
r3 := a3
__m128 _mm_cvtpi16_ps(__m64 a);
Converts the four 16-bit signed integer values in a to four single-precision floating-point values.
r0 := (float)a0
r1 := (float)a1
r2 := (float)a2
r3 := (float)a3
__m128 _mm_cvtpu16_ps(__m64 a);
Converts the four 16-bit unsigned integer values in a to four single-precision floating-point values.
r0 := (float)a0
r1 := (float)a1
r2 := (float)a2
r3 := (float)a3
__m128 _mm_cvtpi8_ps(__m64 a);
Converts the lower four 8-bit signed integer values in a to four single-precision floating-point values.
r0 := (float)a0
r1 := (float)a1
r2 := (float)a2
r3 := (float)a3
__m128 _mm_cvtpu8_ps(__m64 a);
Converts the lower four 8-bit unsigned integer values in a to four single-precision floating-point values.
r0 := (float)a0
r1 := (float)a1
r2 := (float)a2
r3 := (float)a3
__m128 _mm_cvtpi32x2_ps(__m64 a, __m64 b);
Converts the two 32-bit signed integer values in a and the two 32-bit signed integer values in b to four single-precision floating-point values.
r0 := (float)a0
r1 := (float)a1
r2 := (float)b0
r3 := (float)b1
__m64 _mm_cvtps_pi16(__m128 a);
Converts the four single-precision floating-point values in a to four signed 16-bit integer values.
r0 := (short)a0
r1 := (short)a1
r2 := (short)a2
r3 := (short)a3
__m64 _mm_cvtps_pi8(__m128 a);
Converts the four single-precision floating-point values in a to the lower four signed 8-bit integer values of the result.
r0 := (char)a0
r1 := (char)a1
r2 := (char)a2
r3 := (char)a3
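The difference between conversion under the current rounding mode and conversion with truncation can be checked with a short sketch. The helper names round_current_mode and truncate_toward_zero are hypothetical, and the values quoted after the code assume the default rounding mode (round to nearest, ties to even), which holds unless the program has changed the control register.

```c
#include <xmmintrin.h>

/* Hypothetical helper: converts using the current rounding mode (CVTSS2SI). */
static int round_current_mode(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Hypothetical helper: converts by discarding the fractional part (CVTTSS2SI). */
static int truncate_toward_zero(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

Under the default rounding mode, round_current_mode gives 3 for 2.7f, -3 for -2.7f, and 2 for 2.5f, while truncate_toward_zero gives 2, -2, and 2 for the same inputs.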
Other functions
Function | Operation | Instruction
_mm_shuffle_ps | shuffle packed values | SHUFPS
_mm_shuffle_pi16 | shuffle packed values | PSHUFW
_mm_unpackhi_ps | interleave the upper values | UNPCKHPS
_mm_unpacklo_ps | interleave the lower values | UNPCKLPS
_mm_loadh_pi | load the upper values | MOVHPS reg, mem
_mm_storeh_pi | store the upper values | MOVHPS mem, reg
_mm_movehl_ps | move the upper half to the lower half | MOVHLPS
_mm_movelh_ps | move the lower half to the upper half | MOVLHPS
_mm_loadl_pi | load the lower values | MOVLPS reg, mem
_mm_storel_pi | store the lower values | MOVLPS mem, reg
_mm_movemask_ps | create a sign mask | MOVMSKPS
_mm_getcsr | read the control and status register | STMXCSR
_mm_setcsr | set the control and status register | LDMXCSR
__m128 _mm_shuffle_ps(__m128 a, __m128 b, int i);
Selects four specific single-precision floating-point values from a and b, based on the mask i. The mask must be an immediate. See the section on the shuffle macro later in this document for a description of the shuffle semantics.
__m64 _mm_shuffle_pi16(__m64 a, int imm);
Shuffles the four signed or unsigned 16-bit integers in a as specified by imm. The shuffle value imm must be an immediate.
__m128 _mm_unpackhi_ps(__m128 a, __m128 b);
Selects and interleaves the upper two single-precision floating-point values from a and b.
r0 := a2
r1 := b2
r2 := a3
r3 := b3
__m128 _mm_unpacklo_ps(__m128 a, __m128 b);
Selects and interleaves the lower two single-precision floating-point values from a and b.
r0 := a0
r1 := b0
r2 := a1
r3 := b1
__m128 _mm_loadh_pi(__m128 a, __m64 *p);
Sets the upper two single-precision floating-point values with 64 bits of data loaded from the address p; the lower two values are passed through from a.
r0 := a0
r1 := a1
r2 := *p0
r3 := *p1
void _mm_storeh_pi(__m64 *p, __m128 a);
Stores the upper two single-precision floating-point values of a to the address p.
*p0 := a2
*p1 := a3
__m128 _mm_movehl_ps(__m128 a, __m128 b);
Moves the upper two single-precision floating-point values of b to the lower two single-precision floating-point values of the result. The upper two single-precision floating-point values of a are passed through to the result.
r3 := a3
r2 := a2
r1 := b3
r0 := b2
__m128 _mm_movelh_ps(__m128 a, __m128 b);
Moves the lower two single-precision floating-point values of b to the upper two single-precision floating-point values of the result. The lower two single-precision floating-point values of a are passed through to the result.
r3 := b1
r2 := b0
r1 := a1
r0 := a0
__m128 _mm_loadl_pi(__m128 a, __m64 *p);
Sets the lower two single-precision floating-point values with 64 bits of data loaded from the address p; the upper two values are passed through from a.
r0 := *p0
r1 := *p1
r2 := a2
r3 := a3
void _mm_storel_pi(__m64 *p, __m128 a);
Stores the lower two single-precision floating-point values of a to the address p.
*p0 := a0
*p1 := a1
int _mm_movemask_ps(__m128 a);
Creates a 4-bit mask from the most significant bits of the four single-precision floating-point values.
r := sign(a3)<<3 | sign(a2)<<2 | sign(a1)<<1 | sign(a0)
unsigned int _mm_getcsr(void);
Returns the contents of the control register.
void _mm_setcsr(unsigned int i);
Sets the control register to the value specified.
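The data-movement functions above are often combined to reduce a vector to a single value. The following sketch sums the four elements of a vector with _mm_movehl_ps and _mm_shuffle_ps. The helper name hsum_sse is hypothetical, and this is only one of several possible instruction sequences, not a method prescribed by the original text.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3]. */
static float hsum_sse(const float v4[4])
{
    assert(v4);
    __m128 v = _mm_loadu_ps(v4);
    /* Upper half moved to the lower half: (v2, v3, v2, v3). */
    __m128 hi = _mm_movehl_ps(v, v);
    /* Elements 0 and 1 now hold v0+v2 and v1+v3. */
    __m128 pair = _mm_add_ps(v, hi);
    /* Broadcast element 1 so that it can be added to element 0. */
    __m128 second = _mm_shuffle_ps(pair, pair, _MM_SHUFFLE(1, 1, 1, 1));
    float out;
    _mm_store_ss(&out, _mm_add_ss(pair, second));
    return out;
}
```

Because floating-point addition is not associative, the result can differ in the last bits from a left-to-right scalar sum for inputs that are not exactly representable.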
Memory and initialization operations
Load operations
Function | Operation | Instruction
_mm_load_ss | load the lowest value and clear the other three | MOVSS
_mm_load1_ps | load one value into all four positions | MOVSS + shuffling
_mm_load_ps | load four values from an aligned address | MOVAPS
_mm_loadu_ps | load four values from an unaligned address | MOVUPS
_mm_loadr_ps | load four values in reverse order | MOVAPS + shuffling
__m128 _mm_load_ss(float *p);
Loads a single-precision floating-point value into the low word and clears the upper three words.
r0 := *p
r1 := 0.0 ; r2 := 0.0 ; r3 := 0.0
__m128 _mm_load1_ps(float *p);
-or-
__m128 _mm_load_ps1(float *p);
Loads a single single-precision floating-point value, copying it into all four words.
r0 := *p
r1 := *p
r2 := *p
r3 := *p
__m128 _mm_load_ps(float *p);
Loads four single-precision floating-point values. The address must be 16-byte aligned.
r0 := p[0]
r1 := p[1]
r2 := p[2]
r3 := p[3]
__m128 _mm_loadu_ps(float *p);
Loads four single-precision floating-point values. The address does not need to be 16-byte aligned.
r0 := p[0]
r1 := p[1]
r2 := p[2]
r3 := p[3]
__m128 _mm_loadr_ps(float *p);
Loads four single-precision floating-point values in reverse order. The address must be 16-byte aligned.
r0 := p[3]
r1 := p[2]
r2 := p[1]
r3 := p[0]
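The load variants above can be combined in an ordinary array loop. The sketch below multiplies an array by a constant, using _mm_load1_ps to broadcast the factor and _mm_loadu_ps so that the array needs no particular alignment. The helper name scale_sse is hypothetical, and the scalar tail loop is an assumption made to handle lengths that are not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: x[i] *= *factor for i in [0, n). */
static void scale_sse(float *x, size_t n, const float *factor)
{
    assert(x && factor);
    const __m128 f = _mm_load1_ps(factor); /* factor in all four positions */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);    /* no alignment requirement */
        _mm_storeu_ps(x + i, _mm_mul_ps(v, f));
    }
    for (; i < n; ++i) {                   /* remaining 0..3 elements */
        x[i] *= *factor;
    }
}
```

When the array is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can replace the unaligned variants in the main loop.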
Set operations
Function | Operation | Instruction
_mm_set_ss | sets the lowest value and clears the other three | composite
_mm_set1_ps | sets all four positions to one value | composite
_mm_set_ps | sets the four values from four arguments | composite
_mm_setr_ps | sets the four values from four arguments in reverse order | composite
_mm_setzero_ps | clears all four values | composite
__m128 _mm_set_ss(float w);
Sets the low word of a single-precision floating-point value to w and clears the upper three words.
r0 := w
r1 := r2 := r3 := 0.0
__m128 _mm_set1_ps(float w);
-or-
__m128 _mm_set_ps1(float w);
Sets the four single-precision floating-point values to w.
r0 := r1 := r2 := r3 := w
__m128 _mm_set_ps(float z, float y, float x, float w);
Sets the four single-precision floating-point values to the four inputs.
r0 := w
r1 := x
r2 := y
r3 := z
__m128 _mm_setr_ps(float z, float y, float x, float w);
Sets the four single-precision floating-point values to the four inputs in reverse order.
r0 := z
r1 := y
r2 := x
r3 := w
__m128 _mm_setzero_ps(void);
Clears the four single-precision floating-point values.
r0 := r1 := r2 := r3 := 0.0
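The argument order of _mm_set_ps and _mm_setr_ps is easy to confuse, so a short sketch may help. The helper names store_set_demo and store_setr_demo are hypothetical; each stores the resulting vector to memory so that the element order can be inspected.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: the LAST argument of _mm_set_ps becomes element 0. */
static void store_set_demo(float out[4])
{
    assert(out);
    _mm_storeu_ps(out, _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
}

/* Hypothetical helper: the FIRST argument of _mm_setr_ps becomes element 0. */
static void store_setr_demo(float out[4])
{
    assert(out);
    _mm_storeu_ps(out, _mm_setr_ps(3.0f, 2.0f, 1.0f, 0.0f));
}
```

After store_set_demo the array holds (0, 1, 2, 3) in memory order; after store_setr_demo it holds (3, 2, 1, 0).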
Store operations
Function | Operation | Instruction
_mm_store_ss | store the lowest value | MOVSS
_mm_store1_ps | store the lowest value to all four positions | MOVSS + shuffling
_mm_store_ps | store four values to an aligned address | MOVAPS
_mm_storeu_ps | store four values to an unaligned address | MOVUPS
_mm_storer_ps | store four values in reverse order | MOVAPS + shuffling
_mm_move_ss | set the lowest value and leave the other three unchanged | MOVSS
void _mm_store_ss(float *p, __m128 a);
Stores the lower single-precision floating-point value.
*p := a0
void _mm_store1_ps(float *p, __m128 a);
-or-
void _mm_store_ps1(float *p, __m128 a);
Stores the lower single-precision floating-point value across four words.
p[0] := a0
p[1] := a0
p[2] := a0
p[3] := a0
void _mm_store_ps(float *p, __m128 a);
Stores four single-precision floating-point values. The address must be 16-byte aligned.
p[0] := a0
p[1] := a1
p[2] := a2
p[3] := a3
void _mm_storeu_ps(float *p, __m128 a);
Stores four single-precision floating-point values. The address does not need to be 16-byte aligned.
p[0] := a0
p[1] := a1
p[2] := a2
p[3] := a3
void _mm_storer_ps(float *p, __m128 a);
Stores four single-precision floating-point values in reverse order. The address must be 16-byte aligned.
p[0] := a3
p[1] := a2
p[2] := a1
p[3] := a0
__m128 _mm_move_ss(__m128 a, __m128 b);
Sets the low word to the single-precision floating-point value of b. The upper three single-precision floating-point values are passed through from a.
r0 := b0
r1 := a1
r2 := a2
r3 := a3
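Cache support in SSE
Function | Operation | Instruction
_mm_prefetch | prefetch a cache line | PREFETCH
_mm_stream_pi | store 64 bits without polluting the caches | MOVNTQ
_mm_stream_ps | store 128 bits without polluting the caches | MOVNTPS
_mm_sfence | store fence | SFENCE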
void _mm_prefetch(char *p, int i);
Loads one cache line of data from address p to a location closer to the processor. The value i specifies the type of prefetch operation: the constants _MM_HINT_T0, _MM_HINT_T1, _MM_HINT_T2, and _MM_HINT_NTA, corresponding to the type of prefetch instruction, should be used.
void _mm_stream_pi(__m64 *p, __m64 a);
Stores the data in a to the address p without polluting the caches. This intrinsic requires you to empty the multimedia state for the MMX register (_mm_empty()).
void _mm_stream_ps(float *p, __m128 a);
Stores the data in a to the address p without polluting the caches. The address must be 16-byte aligned.
void _mm_sfence(void);
Guarantees that every preceding store is globally visible before any subsequent store.
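A typical use of the non-temporal store is filling a large buffer that will not be read again soon. The sketch below is a minimal illustration: the helper name fill_stream is hypothetical, and the sketch assumes a 16-byte aligned destination whose length is a multiple of 4. Whether streaming stores are actually faster than ordinary stores depends on the buffer size and the processor, so this should be measured rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[i] = value for i in [0, n), bypassing the caches. */
static void fill_stream(float *dst, size_t n, float value)
{
    assert(n % 4 == 0);
    assert(((size_t)dst % 16) == 0); /* _mm_stream_ps needs an aligned address */
    const __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_ps(dst + i, v);
    }
    /* Make the streamed stores globally visible before any later store. */
    _mm_sfence();
}
```

The destination can be obtained from _mm_malloc with an alignment of 16, as described in the data types section.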
Shuffle macro for SSE
SSE provides a special macro that helps build the constants describing shuffle operations. The macro takes four integers (each in the range 0 to 3) and combines them into the 8-bit immediate used by the SHUFPS instruction. The four arguments act as selectors: w and x choose which values of the first operand are placed in result positions 0 and 1, and y and z choose which values of the second operand are placed in result positions 2 and 3.
_MM_SHUFFLE(z, y, x, w)
/* expands to the following value */
(z<<6) | (y<<4) | (x<<2) | w
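As an illustration of the selector semantics, the following sketch reverses the order of the four elements of a vector. The helper name reverse4 is hypothetical. Passing the same vector as both operands of _mm_shuffle_ps means every selector picks from the same four values.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: out = (in[3], in[2], in[1], in[0]). */
static void reverse4(const float in[4], float out[4])
{
    assert(in && out);
    __m128 v = _mm_loadu_ps(in);
    /* w = 3 selects v[3] for position 0, x = 2 selects v[2] for position 1,
       y = 1 selects v[1] for position 2, z = 0 selects v[0] for position 3. */
    __m128 r = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(out, r);
}
```

By the same rule, _MM_SHUFFLE(3, 2, 1, 0) expands to 0xE4 and leaves the order unchanged when both operands are the same vector.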
Matrix transpose macro
_MM_TRANSPOSE4_PS(row0, row1, row2, row3)
Transposes a 4x4 matrix of single-precision floating-point values. The arguments row0, row1, row2, row3 are __m128 values holding the rows of the matrix. The result is returned in the same arguments, which then hold the rows of the transposed matrix (that is, the columns of the original matrix).
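A minimal sketch of the macro in use, assuming a 4x4 matrix stored row by row in a plain array of 16 floats. The helper name transpose4x4 is hypothetical. The macro modifies its arguments, so they must be __m128 variables and not expressions.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical helper: transposes a row-major 4x4 matrix in place. */
static void transpose4x4(float m[16])
{
    assert(m);
    __m128 row0 = _mm_loadu_ps(m + 0);
    __m128 row1 = _mm_loadu_ps(m + 4);
    __m128 row2 = _mm_loadu_ps(m + 8);
    __m128 row3 = _mm_loadu_ps(m + 12);
    _MM_TRANSPOSE4_PS(row0, row1, row2, row3); /* rows now hold the old columns */
    _mm_storeu_ps(m + 0, row0);
    _mm_storeu_ps(m + 4, row1);
    _mm_storeu_ps(m + 8, row2);
    _mm_storeu_ps(m + 12, row3);
}
```

After the call, the element at row i and column j equals the original element at row j and column i.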