
DumpTensor

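Product Support

Product                                                                  Supported
Ascend 950PR/Ascend 950DT                                                √
Atlas A3 Training Series Products/Atlas A3 Inference Series Products     √
Atlas A2 Training Series Products/Atlas A2 Inference Series Products     √
Atlas 200I/500 A2 Inference Products                                     √
Atlas Inference Series Products AI Core                                  √
Atlas Inference Series Products Vector Core                              x
Atlas Training Series Products                                           x
Kirin X90                                                                √
Kirin 9030                                                               √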

Function Description

Header file path: "basic_api/kernel_operator_dump_tensor_intf.h"

Prints the contents of a Tensor. It can also print a user-defined tag (only uint32_t values are supported), such as the current line number. In the operator's kernel-side implementation, call DumpTensor wherever Tensor data needs to be printed. For example:

C++
AscendC::DumpTensor(srcLocal, 5, dataLen);

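Note

This interface is intended mainly for debugging and analysis. Enabling it affects operator performance, so it is normally used only during debugging and should be disabled in production.
By default, calling this interface prints the relevant content. Developers can disable it as needed; see the instructions on disabling ASCENDC_DUMP.

Sample output: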

DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[19.000000, 4.000000, 38.000000, 50.000000, 39.000000, 67.000000, 84.000000, 98.000000, 21.000000, 36.000000, 18.000000, 46.000000, 10.000000, 92.000000, 26.000000, 38.000000, 39.000000, 9.000000, 82.000000, 37.000000, 35.000000, 65.000000, 97.000000, 59.000000, 89.000000, 63.000000, 70.000000, 57.000000, 35.000000, 3.000000, 16.000000, 42.000000]
DumpTensor: desc=5, addr=100, data_type=float16, position=UB, dump_size=32
[6.000000, 34.000000, 52.000000, 38.000000, 73.000000, 38.000000, 35.000000, 14.000000, 67.000000, 62.000000, 30.000000, 49.000000, 86.000000, 37.000000, 84.000000, 18.000000, 38.000000, 18.000000, 44.000000, 21.000000, 86.000000, 99.000000, 13.000000, 79.000000, 84.000000, 9.000000, 48.000000, 74.000000, 52.000000, 99.000000, 80.000000, 53.000000]
...
DumpTensor: desc=5, addr=0, data_type=float16, position=UB, dump_size=32
[35.000000, 41.000000, 41.000000, 22.000000, 84.000000, 49.000000, 60.000000, 0.000000, 90.000000, 14.000000, 67.000000, 80.000000, 16.000000, 46.000000, 16.000000, 83.000000, 6.000000, 70.000000, 97.000000, 28.000000, 97.000000, 62.000000, 80.000000, 22.000000, 53.000000, 37.000000, 23.000000, 58.000000, 65.000000, 28.000000, 4.000000, 29.000000]
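Because desc only has to be a uint32_t, one way to tag a dump with the current line number is to pass the standard C++ `__LINE__` macro; in a kernel that would look like `AscendC::DumpTensor(srcLocal, __LINE__, dataLen);`. The sketch below is a hypothetical host-side stand-in so the idiom can be compiled and checked off-device: `StubDumpTensor`, `DumpLog`, `DumpRecord`, and the `DUMP_WITH_LINE` macro are illustrative names, not part of Ascend C, and the stub only records its arguments.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one dump call; not an Ascend C type.
struct DumpRecord {
    uint32_t desc;
    uint32_t dumpSize;
};

// Hypothetical log that collects every stub call, in call order.
inline std::vector<DumpRecord>& DumpLog() {
    static std::vector<DumpRecord> log;
    return log;
}

// Hypothetical host-side stub standing in for AscendC::DumpTensor.
// It ignores the tensor contents and only records desc and dumpSize.
template <typename Tensor>
void StubDumpTensor(const Tensor& /*tensor*/, uint32_t desc, uint32_t dumpSize) {
    DumpLog().push_back(DumpRecord{desc, dumpSize});
}

// Hypothetical macro: tag each dump with the source line of the call site.
// __LINE__ is a standard predefined macro that expands to an integer literal.
#define DUMP_WITH_LINE(tensor, n) \
    StubDumpTensor((tensor), static_cast<uint32_t>(__LINE__), (n))
```

Calls placed on different source lines receive different desc values, which is what lets their outputs be told apart in the dump log.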

Function Prototypes

  • Printing without Tensor shape

    C++
    template <typename T>
    __aicore__ inline void DumpTensor(const LocalTensor<T> &tensor, uint32_t desc, uint32_t dumpSize)
    template <typename T>
    __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize)
    
  • Printing with Tensor shape

    C++
    template <typename T>
    __aicore__ inline void DumpTensor(const LocalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
    template <typename T>
    __aicore__ inline void DumpTensor(const GlobalTensor<T>& tensor, uint32_t desc, uint32_t dumpSize, const ShapeInfo& shapeInfo)
    

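Parameters

Table 1 Template parameters

Parameter   Description
T           Data type of the Tensor to dump.

Table 2 Parameters

Parameter   Input/Output   Description
tensor      Input          Tensor to dump.
                           • If the tensor resides in Unified Buffer/L1 Buffer/L0 Buffer, pass it as a LocalTensor.
                           • If the tensor resides in Global Memory, pass it as a GlobalTensor.
desc        Input          User-defined additional information (a line number or another custom number).
                           The desc parameter lets you attach custom information to each DumpTensor call so that dump output from different call sites can be distinguished. This helps pinpoint the output of a specific DumpTensor call and makes debugging and analysis more efficient.
dumpSize    Input          Number of elements to dump.
shapeInfo   Input          Shape information of the Tensor; elements are printed according to this shape.
                           • If the shape size is greater than dumpSize, elements are printed according to ShapeInfo, and positions with no dumped data are shown as "-".
                           • If the shape size is less than or equal to dumpSize, elements are printed according to ShapeInfo, and the extra dumped data is not shown.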
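The two shapeInfo rules can be modeled on the host for the 2-D case. The sketch below is a hypothetical illustration, not the Ascend C implementation: `FormatWithShape2D` lays out the dumped elements row by row according to a (rows, cols) shape, prints "-" where the shape has no corresponding dumped element, and never visits dumped elements beyond the shape. The six-decimal formatting mirrors the sample output but is an assumption about presentation only.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical host-side model of the documented shapeInfo rules for a 2-D
// shape (rows, cols). Not the Ascend C implementation.
std::string FormatWithShape2D(const std::vector<float>& dumped,
                              uint32_t rows, uint32_t cols) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(6);  // six decimals, like the sample output
    out << "[";
    for (uint32_t r = 0; r < rows; ++r) {
        out << (r == 0 ? "[" : ",\n[");  // one shape row per output line
        for (uint32_t c = 0; c < cols; ++c) {
            const std::size_t idx = static_cast<std::size_t>(r) * cols + c;
            if (c != 0) {
                out << ",";
            }
            if (idx < dumped.size()) {
                out << dumped[idx];  // dumped element at this shape position
            } else {
                out << "-";          // shape size > dumpSize: no data here
            }
        }
        out << "]";
    }
    // Elements with idx >= rows * cols are never visited, so when the shape
    // size is <= dumpSize the extra dumped data is simply not shown.
    out << "]";
    return out.str();
}
```

For example, four dumped elements formatted with shape (3, 2) produce two full rows followed by `[-,-]`, while the same data with shape (1, 2) shows only the first two elements.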

Data Types

Ascend 950PR/Ascend 950DT: T supports bool, int8_t, uint8_t, fp4x2_e2m1_t, fp4x2_e1m2_t, hifloat8_t, fp8_e8m0_t, fp8_e5m2_t, fp8_e4m3fn_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Atlas A3 Training Series Products/Atlas A3 Inference Series Products: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Atlas A2 Training Series Products/Atlas A2 Inference Series Products: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Atlas 200I/500 A2 Inference Products: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Atlas Inference Series Products AI Core: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Atlas Training Series Products: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Kirin X90: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

Kirin 9030: T supports bool, int8_t, uint8_t, int16_t, uint16_t, half, bfloat16_t, int32_t, uint32_t, float, int64_t, uint64_t.

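Constraints

  • Currently only Tensors stored in Unified Buffer/L1 Buffer/L0C Buffer/Global Memory can be printed.
    For Ascend 950PR/Ascend 950DT, printing L1 Tensor data with this interface requires HDK version 25.7.0 or later.
  • For operand address alignment requirements, see General Address Alignment Constraints.
  • The total amount of data printed by a single DumpTensor call must not exceed 30 KB (this also includes a small amount of header and trailer information required by the framework, which can usually be ignored). If this limit is exceeded, the data is not printed.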
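To stay under the 30 KB per-call limit, dumpSize can be bounded by the element size. The sketch below assumes that 30 KB means 30 × 1024 bytes and, as the constraint suggests, ignores the small framework header and trailer, so the real safe bound is slightly lower. `MaxDumpElements` and `FitsDumpLimit` are hypothetical helpers, not Ascend C APIs. Host C++ has no `half`, so `uint16_t` stands in for a 2-byte type; packed types such as fp4x2_e2m1_t are not modeled.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: "30KB" = 30 * 1024 bytes; framework header/trailer bytes ignored.
constexpr std::size_t kDumpLimitBytes = 30u * 1024u;

// Hypothetical helper: largest dumpSize whose payload fits in the limit.
template <typename T>
constexpr std::size_t MaxDumpElements() {
    return kDumpLimitBytes / sizeof(T);
}

// Hypothetical helper: do dumpSize elements of T fit in one DumpTensor call?
template <typename T>
constexpr bool FitsDumpLimit(uint32_t dumpSize) {
    return static_cast<std::size_t>(dumpSize) * sizeof(T) <= kDumpLimitBytes;
}

// Compile-time checks for a 2-byte type (half, modeled as uint16_t) and float.
static_assert(MaxDumpElements<uint16_t>() == 15360, "2-byte bound");
static_assert(MaxDumpElements<float>() == 7680, "4-byte bound");
```

Under these assumptions, a single call can print at most 15360 half elements or 7680 float elements, minus whatever the framework header consumes.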

Examples

  • Printing without Tensor shape

    C++
    AscendC::DumpTensor(srcLocal, 5, dataLen);
    
  • Printing with Tensor shape

    • Shape size equal to dumpSize

      C++
      uint32_t array[] = {static_cast<uint32_t>(8), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo(2, array);       // dim is 2, shape is (8, 8)
      AscendC::DumpTensor(x, 2, 64, shapeInfo);     // dump 64 elements of x, laid out as (8, 8) according to shapeInfo
      

      Output:

      DumpTensor: desc=2, addr=0x0, data_type=float16, position=UB, dump_size=64
      [[2.048828,0.113037,4.042969,3.505859,4.554688,4.019531,0.598633,2.160156],
      [2.707031,0.117981,1.134766,4.835938,1.190430,3.085938,1.334961,0.406250],
      [2.658203,1.674805,3.791016,0.747070,3.541016,4.546875,0.394043,2.455078],
      [1.161133,2.775391,0.453857,2.857422,2.837891,1.052734,2.654297,1.828125],
      [0.358643,4.765625,3.681641,0.850098,2.250000,2.001953,0.446777,0.830078],
      [2.154297,4.781250,1.773438,0.201294,0.028412,3.285156,0.772949,3.261719],
      [0.532227,2.789062,0.588867,4.316406,0.146606,2.201172,3.775391,2.023438],
      [2.820312,2.835938,2.957031,2.398438,4.449219,0.516113,4.796875,0.786133]]
      
    • Shape size less than dumpSize

      C++
      uint32_t array1[] = {static_cast<uint32_t>(7), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo1(2, array1); // dim is 2, shape is (7, 8)
      AscendC::DumpTensor(x1, 3, 64, shapeInfo1); // shape size <= dumpSize: elements are printed according to ShapeInfo; extra dumped data is not shown
      

      Output:

      DumpTensor: desc=3, addr=0x0, data_type=float16, position=UB, dump_size=64
      shape is [7, 8], dumpSize is 64, dumpSize is greater than shapeSize.
      [[2.048828,0.113037,4.042969,3.505859,4.554688,4.019531,0.598633,2.160156],
      [2.707031,0.117981,1.134766,4.835938,1.190430,3.085938,1.334961,0.406250],
      [2.658203,1.674805,3.791016,0.747070,3.541016,4.546875,0.394043,2.455078],
      [1.161133,2.775391,0.453857,2.857422,2.837891,1.052734,2.654297,1.828125],
      [0.358643,4.765625,3.681641,0.850098,2.250000,2.001953,0.446777,0.830078],
      [2.154297,4.781250,1.773438,0.201294,0.028412,3.285156,0.772949,3.261719],
      [0.532227,2.789062,0.588867,4.316406,0.146606,2.201172,3.775391,2.023438]]
      
    • Shape size greater than dumpSize

      C++
      uint32_t array2[] = {static_cast<uint32_t>(9), static_cast<uint32_t>(8)};
      AscendC::ShapeInfo shapeInfo2(2, array2); // dim is 2, shape is (9, 8)
      AscendC::DumpTensor(x2, 4, 64, shapeInfo2); // shape size > dumpSize: elements are printed according to ShapeInfo; missing data is shown as "-"
      

      Output:

      DumpTensor: desc=4, addr=0x0, data_type=float16, position=UB, dump_size=64
      shape is [9, 8], dumpSize is 64, data is not enough.
      [[2.048828,0.113037,4.042969,3.505859,4.554688,4.019531,0.598633,2.160156],
      [2.707031,0.117981,1.134766,4.835938,1.190430,3.085938,1.334961,0.406250],
      [2.658203,1.674805,3.791016,0.747070,3.541016,4.546875,0.394043,2.455078],
      [1.161133,2.775391,0.453857,2.857422,2.837891,1.052734,2.654297,1.828125],
      [0.358643,4.765625,3.681641,0.850098,2.250000,2.001953,0.446777,0.830078],
      [2.154297,4.781250,1.773438,0.201294,0.028412,3.285156,0.772949,3.261719],
      [0.532227,2.789062,0.588867,4.316406,0.146606,2.201172,3.775391,2.023438],
      [2.820312,2.835938,2.957031,2.398438,4.449219,0.516113,4.796875,0.786133],
      [-,-,-,-,-,-,-,-]]
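The three shape examples differ only in how the shape size (the product of the dimensions: 8 × 8 = 64, 7 × 8 = 56, 9 × 8 = 72) compares with dumpSize = 64. The hypothetical helpers below, `ShapeSize` and `Classify`, make that comparison explicit for any number of dimensions; they illustrate the documented rules and are not part of the Ascend C API.

```cpp
#include <cstdint>
#include <vector>

// Which of the three documented cases a (shape, dumpSize) pair falls into.
enum class ShapeCase { Equal, ExtraDataHidden, PaddedWithDash };

// Hypothetical helper: shape size is the product of all dimensions.
uint64_t ShapeSize(const std::vector<uint32_t>& dims) {
    uint64_t size = 1;
    for (uint32_t d : dims) {
        size *= d;
    }
    return size;
}

// Hypothetical helper mapping a shape and dumpSize to the documented behavior.
ShapeCase Classify(const std::vector<uint32_t>& dims, uint32_t dumpSize) {
    const uint64_t shapeSize = ShapeSize(dims);
    if (shapeSize == dumpSize) {
        return ShapeCase::Equal;            // every dumped element is shown
    }
    if (shapeSize < dumpSize) {
        return ShapeCase::ExtraDataHidden;  // trailing dumped data is not shown
    }
    return ShapeCase::PaddedWithDash;       // missing positions printed as "-"
}
```

Classify({8, 8}, 64), Classify({7, 8}, 64), and Classify({9, 8}, 64) correspond to the three examples in order.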
      

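Disclaimer: The content of this site is automatically built from the master branch of the asc-devkit repository. It is a continuously developed version that may contain defects and is provided for preview and reference only. For stable and commercial documentation, refer to the official Ascend community.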