
asc_copy_l0c2ub

Product support

Product    Supported
Ascend 950PR/Ascend 950DT

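Function description

After the matrix computation completes, this interface quantizes the result and then moves the processed result to the Unified Buffer. There are two quantization parameters, quant_pre and quant_post, which apply to the pre-processing and post-processing stages respectively.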

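quant_pre supports the following quantization modes:

  • NoQuant: quantization is disabled.
  • F322BF16: quantizes float to bfloat16_t. The quantized result does not support INF_NAN mode.
  • F322F16: quantizes float to half. The quantized result supports INF_NAN mode.
  • DEQF16: quantizes int32_t to half. The quantized result does not support INF_NAN mode.
  • VDEQF16: quantizes int32_t to half. The quantized result does not support INF_NAN mode.
  • QF322B8_PRE: quantizes float to uint8_t/int8_t, scalar quantization.
  • VQF322B8_PRE: quantizes float to uint8_t/int8_t, vector quantization.
  • REQ8: quantizes int32_t to uint8_t/int8_t, scalar quantization.
  • VREQ8: quantizes int32_t to uint8_t/int8_t, vector quantization.
  • QF322FP8_PRE: quantizes float to fp8_e4m3fn_t, scalar quantization.
  • VQF322FP8_PRE: quantizes float to fp8_e4m3fn_t, vector quantization.
  • QF322HIF8_PRE: quantizes float to hifloat8_t (Half to Away Round), scalar quantization.
  • VQF322HIF8_PRE: quantizes float to hifloat8_t (Half to Away Round), vector quantization.
  • QF322HIF8_PRE_HYBRID: quantizes float to hifloat8_t (Hybrid Round), scalar quantization.
  • VQF322HIF8_PRE_HYBRID: quantizes float to hifloat8_t (Hybrid Round), vector quantization.
  • QS322BF16_PRE: quantizes int32_t to bfloat16_t, scalar quantization.
  • VQS322BF16_PRE: quantizes int32_t to bfloat16_t, vector quantization.
  • QF322F16_PRE: quantizes float to half, scalar quantization.
  • VQF322F16_PRE: quantizes float to half, vector quantization.
  • QF322BF16_PRE: quantizes float to bfloat16_t, scalar quantization.
  • VQF322BF16_PRE: quantizes float to bfloat16_t, vector quantization.
  • QF322F32_PRE: quantizes float to float, scalar quantization. This mode cannot meet the double one-ten-thousandth precision criterion, but can meet the double one-thousandth criterion.
  • VQF322F32_PRE: quantizes float to float, vector quantization. This mode cannot meet the double one-ten-thousandth precision criterion, but can meet the double one-thousandth criterion.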
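
The mode list above can be kept next to host-side code as a small lookup table, which makes it easy to check that a chosen quant_pre mode matches the source and destination element types of the overload being called. The following is a minimal sketch under that assumption: `QuantPreInfo`, `kQuantPreTable` and `find_quant_pre` are hypothetical names introduced for illustration and are not part of the asc API, and the table stores type names as strings because the numeric values of the modes are not given here.

```cpp
#include <cassert>
#include <cstring>

// Illustrative host-side table transcribed from the quant_pre list.
// The struct and function names are hypothetical, not part of the asc API.
struct QuantPreInfo {
    const char *mode;  // mode name as documented
    const char *src;   // source element type in L0C
    const char *dst;   // destination element type in the Unified Buffer
};

static const QuantPreInfo kQuantPreTable[] = {
    {"F322BF16", "float", "bfloat16_t"},
    {"F322F16", "float", "half"},
    {"DEQF16", "int32_t", "half"},
    {"VDEQF16", "int32_t", "half"},
    {"QF322B8_PRE", "float", "uint8_t/int8_t"},
    {"VQF322B8_PRE", "float", "uint8_t/int8_t"},
    {"REQ8", "int32_t", "uint8_t/int8_t"},
    {"VREQ8", "int32_t", "uint8_t/int8_t"},
    {"QF322FP8_PRE", "float", "fp8_e4m3fn_t"},
    {"VQF322FP8_PRE", "float", "fp8_e4m3fn_t"},
    {"QF322HIF8_PRE", "float", "hifloat8_t"},
    {"VQF322HIF8_PRE", "float", "hifloat8_t"},
    {"QF322HIF8_PRE_HYBRID", "float", "hifloat8_t"},
    {"VQF322HIF8_PRE_HYBRID", "float", "hifloat8_t"},
    {"QS322BF16_PRE", "int32_t", "bfloat16_t"},
    {"VQS322BF16_PRE", "int32_t", "bfloat16_t"},
    {"QF322F16_PRE", "float", "half"},
    {"VQF322F16_PRE", "float", "half"},
    {"QF322BF16_PRE", "float", "bfloat16_t"},
    {"VQF322BF16_PRE", "float", "bfloat16_t"},
    {"QF322F32_PRE", "float", "float"},
    {"VQF322F32_PRE", "float", "float"},
};

// Returns the table entry for a mode name, or nullptr if the name is not listed.
inline const QuantPreInfo *find_quant_pre(const char *mode) {
    for (const QuantPreInfo &entry : kQuantPreTable) {
        if (std::strcmp(entry.mode, mode) == 0) {
            return &entry;
        }
    }
    return nullptr;
}
```

For example, `find_quant_pre("DEQF16")->src` yields "int32_t", so pairing DEQF16 with a float source pointer would be flagged as a mismatch by such a check.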

quant_post supports the following quantization modes:

  • NoConv: quantization is disabled.
  • QS162B8_POST: quantizes int16_t to uint8_t/int8_t, scalar quantization.
  • VQS162B8_POST: quantizes int16_t to uint8_t/int8_t, vector quantization.
  • QF162B8_POST: quantizes half to uint8_t/int8_t, scalar quantization.
  • VQF162B8_POST: quantizes half to uint8_t/int8_t, vector quantization.
  • QS162S4_POST: quantizes int16_t to int4_t, scalar quantization.
  • VQS162S4_POST: quantizes int16_t to int4_t, vector quantization.
  • QF162S4_POST: quantizes half to int4_t, scalar quantization.
  • VQF162S4_POST: quantizes half to int4_t, vector quantization.
  • QS162S16_POST: quantizes int16_t to int16_t, scalar quantization.
  • VQS162S16_POST: quantizes int16_t to int16_t, vector quantization.
  • QF162S16_POST: quantizes half to int16_t, scalar quantization.
  • VQF162S16_POST: quantizes half to int16_t, vector quantization.
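
The mode names in the two lists follow a visible pattern: wherever a description states scalar or vector quantization, the vector variant is the scalar name with a leading V, and the quant_post names encode the source type as S16 (int16_t) or F16 (half). The sketch below only illustrates that naming pattern on the host side. The helper names `is_vector_mode` and `quant_post_src_type` are hypothetical, and the pattern is an observation drawn from the lists, not a documented rule.

```cpp
#include <cassert>
#include <string>

// True when the mode name starts with 'V'. In the lists, every mode whose
// description states "vector quantization" has this prefix
// (for example VQS162B8_POST versus QS162B8_POST).
inline bool is_vector_mode(const std::string &mode) {
    return !mode.empty() && mode[0] == 'V';
}

// Source element type implied by a quant_post mode name.
// Returns an empty string for NoConv or any name that does not match.
inline std::string quant_post_src_type(const std::string &mode) {
    // Drop the optional leading 'V' so scalar and vector names share one check.
    const std::string base = is_vector_mode(mode) ? mode.substr(1) : mode;
    if (base.compare(0, 4, "QS16") == 0) {
        return "int16_t";
    }
    if (base.compare(0, 4, "QF16") == 0) {
        return "half";
    }
    return "";
}
```

With these helpers, `quant_post_src_type("VQF162S4_POST")` returns "half" and `is_vector_mode("NoConv")` returns false.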

Function prototype

  • Regular copy

    C++
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ bfloat16_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ half *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ fp8_e4m3fn_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ fp8_e5m2_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ hifloat8_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ int8_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ uint8_t *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ float *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ bfloat16_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ half *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ fp8_e4m3fn_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ fp8_e5m2_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ hifloat8_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ int8_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ uint8_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ int32_t *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ void *dst_addr, __cc__ float *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub(__ubuf__ void *dst_addr, __cc__ int32_t *src_addr, uint16_t n_size, uint16_t m_size,
                                uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                                uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                                uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                                bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    
  • Synchronous copy

    C++
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ bfloat16_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ half* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ fp8_e4m3fn_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ fp8_e5m2_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ hifloat8_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ int8_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ uint8_t* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ float* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ bfloat16_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ half* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ fp8_e4m3fn_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ fp8_e5m2_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ hifloat8_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ int8_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ uint8_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ int32_t* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ void* dst_addr, __cc__ float* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    __aicore__ inline void asc_copy_l0c2ub_sync(__ubuf__ void* dst_addr, __cc__ int32_t* src_addr, uint16_t n_size, uint16_t m_size,
                            uint32_t loop_dst_stride, uint16_t loop_src_stride, uint8_t dual_dst_ctl, bool sub_blockid, uint8_t clip_relu_pre,
                            uint8_t unit_flag_ctl, uint64_t quant_pre, uint8_t relu_pre, bool split_en, bool NZ2ND_en, uint64_t quant_post,
                            uint8_t relu_post, bool clip_relu_post, uint8_t eltwise_op, bool eltwise_antq_en,
                            bool C0_pad_en, bool broadcast_en, bool NZ2DN_en)
    

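Parameter description

Each entry gives the parameter name, whether it is an input or an output, and its description.

  • dst_addr (output): Start address of the destination operand (vector).
  • src_addr (input): Start address of the source operand (vector).
  • n_size (input): Size of the source NZ matrix in the N direction.
      - NZ2ND disabled: value range [1, 4095].
      - NZ2ND enabled: value range [1, 4095].
  • m_size (input): Size of the source NZ matrix in the M direction.
      - NZ2ND disabled: value range [1, 65535].
      - NZ2ND enabled: value range [1, 8192].
  • loop_dst_stride (input):
      - NZ2ND disabled: start-address offset between adjacent Z layouts in the destination NZ matrix. Must not be 0. Unit: element.
      - NZ2ND/NZ2DN enabled: number of elements in each row of the destination ND matrix. Must not be 0. Unit: element.
  • loop_src_stride (input): Start-address offset between adjacent Z layouts in the source NZ matrix. Value range: [0, 65535]. Unit: C0_Size (16*sizeof(T), where T is the data type of src_addr).
  • dual_dst_ctl (input): Dual-destination control parameter.
  • sub_blockid (input): Sub-block ID.
  • clip_relu_pre (input): Enables clip_relu in the pre-processing stage. It must be used together with normal relu (the normalized relu function), and quantization must be enabled.
  • unit_flag_ctl (input): Related to the unit_flag parameter. Values:
      - 0: reserved value.
      - 2: unit_flag is enabled; after the hardware finishes executing the instruction, it does not set the register.
      - 3: unit_flag is enabled; after the hardware finishes executing the instruction, it turns unit_flag off.
  • quant_pre (input): Quantization parameter for the pre-processing stage. See Function description for the values.
  • relu_pre (input): Enables relu in the pre-processing stage.
  • split_en (input): Whether to enable channel split. The default is false (disabled). Channel split can be enabled only when src_addr and dst_addr are both float, and split_en cannot be enabled together with NZ2ND.
  • NZ2ND_en (input): NZ2ND switch.
      - false: disabled.
      - true: enabled.
  • quant_post (input): Quantization parameter for the post-processing stage. See Function description for the values.
  • relu_post (input): Enables relu in the post-processing stage.
  • clip_relu_post (input): Enables clip_relu in the post-processing stage. It must be used together with normal relu, and quantization must be enabled.
  • eltwise_op (input): Defines the destination operand address and channel stride used when data is moved from L0C to UB.
  • eltwise_antq_en (input): Bitwise enable for the element dequantization operation.
  • C0_pad_en (input): Enables padding bits for C0. C0 is the destination stride of the channel loop.
  • broadcast_en (input): Whether to enable broadcast.
      - false: disabled.
      - true: enabled; data is broadcast along the M axis during the copy.
  • NZ2DN_en (input): NZ2DN switch.
      - false: disabled.
      - true: enabled.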
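
Several of the parameters above carry numeric limits or mutual-exclusion rules that are easy to get wrong at a call site. As a rough illustration, the host-side sketch below converts loop_src_stride from C0_Size units to bytes and checks the documented ranges before a call. The function names `c0_size_bytes`, `src_stride_bytes` and `check_l0c2ub_args` are hypothetical helpers, not part of the asc API, and the check covers only the rules stated in the parameter description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes in one C0_Size unit for source element type T: 16 * sizeof(T).
template <typename T>
constexpr std::size_t c0_size_bytes() {
    return 16 * sizeof(T);
}

// Byte offset between adjacent Z layouts in the source for a given loop_src_stride.
template <typename T>
constexpr std::size_t src_stride_bytes(std::uint16_t loop_src_stride) {
    return static_cast<std::size_t>(loop_src_stride) * c0_size_bytes<T>();
}

// Hypothetical pre-check of the documented limits. Returns true when the
// arguments satisfy every rule listed in the parameter description.
inline bool check_l0c2ub_args(std::uint16_t n_size, std::uint16_t m_size,
                              std::uint32_t loop_dst_stride, bool split_en,
                              bool NZ2ND_en, bool src_and_dst_are_float) {
    // n_size: [1, 4095] with or without NZ2ND.
    if (n_size < 1 || n_size > 4095) {
        return false;
    }
    // m_size: [1, 8192] with NZ2ND, [1, 65535] without.
    const std::uint32_t m_max = NZ2ND_en ? 8192u : 65535u;
    if (m_size < 1 || m_size > m_max) {
        return false;
    }
    // loop_dst_stride must not be 0.
    if (loop_dst_stride == 0) {
        return false;
    }
    // Channel split needs float source and destination and excludes NZ2ND.
    if (split_en && (NZ2ND_en || !src_and_dst_are_float)) {
        return false;
    }
    return true;
}
```

For a float or int32_t source, one C0_Size unit is 64 bytes, so a loop_src_stride of 2 corresponds to a 128-byte offset.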

Return value

None.

Pipeline type

PIPE_FIX

Constraints

  • The start address of src_addr must be aligned to the size in bytes of its data type.
  • The start address of dst_addr must be 32-byte aligned.
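
The two alignment rules can be expressed as a simple address check. The sketch below is a host-side illustration only: it works on plain integer addresses, and `is_aligned` and `check_l0c2ub_alignment` are hypothetical helper names, not part of the asc API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when addr is a multiple of alignment (alignment must be nonzero).
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    return alignment != 0 && addr % alignment == 0;
}

// Checks both constraints: dst_addr 32-byte aligned, src_addr aligned to
// the size of the source element type SrcT.
template <typename SrcT>
inline bool check_l0c2ub_alignment(std::uintptr_t dst_addr, std::uintptr_t src_addr) {
    return is_aligned(dst_addr, 32) && is_aligned(src_addr, sizeof(SrcT));
}
```

With a float source, a destination address of 64 and a source address of 4 pass the check, while a destination address of 48 fails because 48 is not a multiple of 32.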

Call example

C++
__ubuf__ bfloat16_t dst[256];
__cc__ float src[256];
uint16_t n_size = 1;
uint16_t m_size = 1;
uint32_t loop_dst_stride = 16;    // elements per destination ND row (NZ2ND enabled); must not be 0
uint16_t loop_src_stride = 0;
uint8_t dual_dst_ctl = 5;
bool sub_blockid = true;
uint8_t clip_relu_pre = 0;
uint8_t unit_flag_ctl = 0;
uint64_t quant_pre = F322BF16;    // float -> bfloat16_t, matching src and dst
uint8_t relu_pre = 0;
bool split_en = false;            // channel split needs float src and dst and excludes NZ2ND
bool NZ2ND_en = true;
uint64_t quant_post = NoConv;     // post-processing quantization disabled
uint8_t relu_post = 0;
bool clip_relu_post = false;      // clip_relu needs relu and quantization enabled
uint8_t eltwise_op = 0;
bool eltwise_antq_en = true;
bool C0_pad_en = true;
bool broadcast_en = false;
bool NZ2DN_en = false;
asc_copy_l0c2ub(dst, src, n_size, m_size, loop_dst_stride, loop_src_stride, dual_dst_ctl, sub_blockid, clip_relu_pre,
        unit_flag_ctl, quant_pre, relu_pre, split_en, NZ2ND_en, quant_post, relu_post, clip_relu_post,
        eltwise_op, eltwise_antq_en, C0_pad_en, broadcast_en, NZ2DN_en);

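Disclaimer: The content of this site is built automatically from the master branch of the asc-devkit repository. It reflects a version under continuous development, may contain defects, and is provided for preview and reference only. For stable, commercial-grade documentation, consult the official Ascend community.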