Skip to content
版 本

asc_copy_gm2l0a

产 品 支 持 情 况

产 品是 否 支 持
Atlas A3 训 练 系 列 产 品/Atlas A3 推 理 系 列 产 品
Atlas A2 训 练 系 列 产 品/Atlas A2 推 理 系 列 产 品

功 能 说 明

将 数 据 从GM搬 运 到L0A Buffer。

函 数 原 型

  • 常 规 搬 运

    C++
    __aicore__ inline void asc_copy_gm2l0a(__ca__ bfloat16_t* dst, __gm__ bfloat16_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ half* dst, __gm__ half* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ float* dst, __gm__ float* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ int32_t* dst, __gm__ int32_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ uint8_t* dst, __gm__ uint8_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ int8_t* dst, __gm__ int8_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ uint32_t* dst, __gm__ uint32_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a(__ca__ int4b_t* dst, __gm__ int4b_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    
  • 同 步 搬 运

    C++
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ bfloat16_t* dst, __gm__ bfloat16_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ half* dst, __gm__ half* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ float* dst, __gm__ float* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ int32_t* dst, __gm__ int32_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ uint8_t* dst, __gm__ uint8_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ int8_t* dst, __gm__ int8_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ uint32_t* dst, __gm__ uint32_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    __aicore__ inline void asc_copy_gm2l0a_sync(__ca__ int4b_t* dst, __gm__ int4b_t* src, uint16_t base_idx, uint8_t repeat, uint16_t src_stride, uint16_t dst_gap)
    

参 数 说 明

参 数 名输 入/输 出描 述
dst输 出目 的 操 作 数 起 始 地 址。
src输 入源 操 作 数 起 始 地 址。
base_idx输 入分 形 矩 阵ID,搬 运 起 始 位 置 为 源 操 作 数 的 第 几 个 分 形(0为 源 操 作 数 中 第1个 分 形 矩 阵)。取 值 范 围:[0, 65535]。
repeat输 入迭 代 次 数,每 个 迭 代 可 以 处 理512B数 据。取 值 范 围:[1, 255]。
src_stride输 入相 邻 迭 代 间,源 操 作 数 前 一 个 分 形 与 后 一 个 分 形 起 始 地 址 的 间 隔,单 位:512B。取 值 范 围:[0, 65535]。
dst_gap输 入相 邻 迭 代 间,目 的 操 作 数 前 一 个 迭 代 第 一 个 分 形 的 结 束 地 址 到 下 一 个 迭 代 第 一 个 分 形 起 始 地 址 的 间 隔。单 位:512B。取 值 范 围:[0, 65535]。

返 回 值 说 明

流 水 类 型

PIPE_MTE2

约 束 说 明

操 作 数 地 址 重 叠 约 束 请 参 考通 用 地 址 重 叠 约 束

调 用 示 例

C++
__ca__ half dst[256];
__gm__ half src[256];
asc_copy_gm2l0a(dst, src, 0, 1, 64, 64);

免 责 声 明:本 站 内 容 由 asc-devkit 仓 master 分 支 自 动 编 译 生 成,属 于 持 续 开 发 版 本,可 能 存 在 缺 陷,仅 供 预 览 与 参 考。如 需 稳 定 及 商 用 资 料,请 查 阅 官 方 昇 腾 社 区