Reproduction Attempt, Part 1: My Dearest Image Processing

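This is the first installment of my attempts to reproduce projects while learning PYNQ. Today I picked an edge detection project; the original repository is https://github.com/Sahanasannagoudar/Edge-detection-using-PYNQ-Z2-board#. It implements the Sobel edge detection algorithm on PYNQ as a full-stack hardware/software co-design, covering the whole flow from C++ HLS down to building the FPGA hardware, so it should be well suited for learning (?)

The goal is to run the Sobel edge detection algorithm on the PYNQ board while making good use of the FPGA's parallelism to process images quickly, keeping latency and energy consumption as low as possible.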

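Step 1: Creating the IP core with Vitis HLS

There are three classic ways to create an IP core: Verilog, block design, and HLS. This project uses HLS.

However, as of version 2024.2 AMD appears to have deprecated vitis -mode hls and the associated HLS Tcl mode, folding HLS into the Vitis IDE and its Python API. So it seems that HLS compilation should now go through v++, with Python replacing Tcl for building the project.

Some analysis of the original C++ code

Once the Sobel algorithm has been written in C++, we have a source file sobel.cpp with the following contents (the original file from the repository):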

#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

typedef ap_axis<32, 2, 5, 6> pixel_t;

void sobel_filter(hls::stream<pixel_t> &input, hls::stream<pixel_t> &output,
                  int rows, int cols) {
#pragma HLS INTERFACE axis port = input
#pragma HLS INTERFACE axis port = output
#pragma HLS INTERFACE s_axilite port = rows
#pragma HLS INTERFACE s_axilite port = cols
#pragma HLS INTERFACE s_axilite port = return

    int Gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int Gy[3][3] = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}};

    pixel_t window[3][3];
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
#pragma HLS PIPELINE
            pixel_t pixel_in = input.read();
            // Update window
            for (int k = 0; k < 2; k++) {
                for (int l = 0; l < 3; l++) {
                    window[k][l] = window[k + 1][l];
                }
            }
            window[2][0] = window[2][1];
            window[2][1] = window[2][2];
            window[2][2] = pixel_in;

            // Apply Sobel filter
            int Gx_val = 0;
            int Gy_val = 0;
            for (int k = 0; k < 3; k++) {
                for (int l = 0; l < 3; l++) {
                    Gx_val += window[k][l].data * Gx[k][l];
                    Gy_val += window[k][l].data * Gy[k][l];
                }
            }

            // Calculate magnitude using L1 norm (|Gx| + |Gy|)
            int abs_Gx_val = Gx_val < 0 ? -Gx_val : Gx_val;
            int abs_Gy_val = Gy_val < 0 ? -Gy_val : Gy_val;
            int magnitude = abs_Gx_val + abs_Gy_val;
            pixel_t pixel_out;
            pixel_out.data = magnitude > 255 ? 255 : magnitude;
            pixel_out.keep = pixel_in.keep;
            pixel_out.strb = pixel_in.strb;
            pixel_out.user = pixel_in.user;
            pixel_out.last = pixel_in.last;
            pixel_out.id = pixel_in.id;
            pixel_out.dest = pixel_in.dest;
            output.write(pixel_out);
        }
    }
}

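Some notes on the code:

ap_axi_sdata.h contains the data structures used to define AXI4-Stream interfaces.

ap_int.h provides the arbitrary-bit-width integer types ap_int and ap_uint (the fixed-point types live in ap_fixed.h).

hls_stream.h provides the hls::stream class, which models a hardware FIFO.

typedef ap_axis<32, 2, 5, 6> pixel_t defines an AXI4-Stream word with 32 bits of data plus the optional side-channel signals (2-bit user, 5-bit id, 6-bit dest); one such word represents one pixel.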
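To make the template parameters concrete, here is a small Python sketch that lists the bit width of each AXI4-Stream signal implied by ap_axis<32, 2, 5, 6>. The helper name axis_field_widths is made up for illustration; the rule that keep and strb carry one bit per data byte is an assumption about how the header defines them, though it agrees with the port widths in the generated Verilog shown later in this post.

```python
def axis_field_widths(data_bits, user_bits, id_bits, dest_bits):
    """Bit width of each AXI4-Stream signal for ap_axis<data, user, id, dest>.

    Hypothetical helper for illustration; it is not part of any Xilinx library.
    """
    byte_lanes = (data_bits + 7) // 8  # assumed: one keep/strb bit per data byte
    return {
        "TDATA": data_bits,
        "TKEEP": byte_lanes,
        "TSTRB": byte_lanes,
        "TUSER": user_bits,
        "TLAST": 1,
        "TID": id_bits,
        "TDEST": dest_bits,
    }


widths = axis_field_widths(32, 2, 5, 6)
for name, bits in widths.items():
    print(f"{name}: {bits} bit(s)")
```

The printed widths can be compared against the input_r_* and output_r_* port declarations of the generated module.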

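void sobel_filter(hls::stream<pixel_t> &input, hls::stream<pixel_t> &output, int rows, int cols) is the top-level function.

#pragma HLS introduces directives specific to the HLS tool that tell the compiler how to map the C++ code to hardware. For example, #pragma HLS INTERFACE axis port = input maps the input argument to an AXI4-Stream interface, so data arrives from outside as a stream; #pragma HLS INTERFACE s_axilite port = rows maps the rows argument to an AXI4-Lite interface, so the CPU or another controller can write the value of rows.

Next come the two core matrices of the Sobel filter: Gx detects vertical edges and Gy detects horizontal edges. window[3][3] is a 3x3 sliding window that stores the current pixel and its neighbors, and the convolution is computed over the pixels in this window.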
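As a quick software check of which kernel responds to which edge, the following Python sketch applies the same Gx and Gy coefficients to two hand-made 3x3 windows. The function name sobel_response and the test windows are illustrative assumptions, not part of the original project.

```python
# Same coefficients as Gx and Gy in sobel.cpp.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def sobel_response(window):
    """Return (gx, gy) for a 3x3 window given as a list of three rows."""
    gx = sum(window[k][l] * GX[k][l] for k in range(3) for l in range(3))
    gy = sum(window[k][l] * GY[k][l] for k in range(3) for l in range(3))
    return gx, gy


# Dark on the left, bright on the right: a vertical edge.
vertical_edge = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
# Bright on top, dark below: a horizontal edge.
horizontal_edge = [[255, 255, 255], [0, 0, 0], [0, 0, 0]]

print("vertical edge   ->", sobel_response(vertical_edge))
print("horizontal edge ->", sobel_response(horizontal_edge))
```

For the vertical edge only Gx is non-zero, and for the horizontal edge only Gy is, which matches the description above.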

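#pragma HLS PIPELINE pipelines the loop so that successive iterations overlap, with the aim of processing one pixel per clock cycle and thus raising throughput.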
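A rough way to see why pipelining matters is the usual cycle-count estimate for a loop: without pipelining every iteration waits for the previous one to finish, whereas a pipelined loop starts a new iteration every II (initiation interval) cycles. The sketch below uses that textbook approximation; the iteration latency of 6 cycles and the 640x480 frame size are made-up numbers for illustration, not values reported by the tool.

```python
def loop_cycles(trip_count, iteration_latency, initiation_interval=None):
    """Approximate cycle count of a loop.

    initiation_interval=None models a non-pipelined loop in which the
    iterations run back to back; otherwise a new iteration starts every
    `initiation_interval` cycles.
    """
    if trip_count <= 0:
        return 0
    if initiation_interval is None:
        return trip_count * iteration_latency
    return iteration_latency + initiation_interval * (trip_count - 1)


rows, cols = 480, 640   # hypothetical frame size
latency = 6             # hypothetical latency of one loop iteration, in cycles

sequential = loop_cycles(rows * cols, latency)
pipelined = loop_cycles(rows * cols, latency, initiation_interval=1)
print(f"non-pipelined: {sequential} cycles")
print(f"pipelined, II=1: {pipelined} cycles")
```

Whether the tool actually reaches II=1 for this loop has to be read from the synthesis report.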

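Inside the loop, the following is done for every pixel:

Read one pixel with input.read(), then update the window (shift everything up, then push the new pixel into the bottom-right corner);

Compute the convolution;

Compute the gradient magnitude using the L1 norm;

Output the pixel: clamp the gradient magnitude to 0-255, create a pixel_out object, fill in its fields, and emit it with output.write(pixel_out).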
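The window update and the clamping in the steps above can be traced in software. The Python sketch below mirrors the update in sobel.cpp on plain integers; update_window and clamp_magnitude are illustrative names, and the zero-initialized window is an assumption (the C++ array is not explicitly initialized).

```python
def update_window(window, pixel):
    """Mirror of the window update in sobel.cpp, on a 3x3 list of lists."""
    for k in range(2):            # rows 0 and 1 take the contents of the row below
        for l in range(3):
            window[k][l] = window[k + 1][l]
    window[2][0] = window[2][1]   # bottom row shifts left by one
    window[2][1] = window[2][2]
    window[2][2] = pixel          # new pixel enters at the bottom-right corner


def clamp_magnitude(gx, gy):
    """L1 gradient magnitude |gx| + |gy|, saturated at 255."""
    return min(abs(gx) + abs(gy), 255)


window = [[0] * 3 for _ in range(3)]   # assumed initial state
for pixel in range(1, 6):              # feed stream values 1, 2, 3, 4, 5
    update_window(window, pixel)

for row in window:
    print(row)
```

In this trace every entry of the window comes from the five most recent stream values, so all of them belong to the same scan line. If a true two-dimensional neighborhood is intended, line buffers holding the previous two image rows would be needed; this is a reading of the listing as shown here, not something the original repository states.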

High Level Synthesis

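Development can be done entirely with the command-line tools of Vitis and Vitis HLS. The advantages are:

  • in this project the Vitis IDE, beyond serving as an editor, is only used to click through a few files and options, so it is not essential;
  • working with Tcl makes it possible to write automation scripts, which is convenient whenever the source code or related settings need to change.

On to the main part. My operating system is Arch Linux with kernel 6.16.4-arch1-1, and Vivado/Vitis/Vitis HLS are at version 2024.2 (Vivado Standard). In this version vitis -mode hls has been deprecated, so a different method is used here to build the project.

We first create a .tcl file to automate the HLS design flow. In this step the HLS tool takes the C++ source sobel.cpp from a high-level language to an RTL hardware description, ultimately producing Verilog or VHDL.

The contents of run_hls.tcl are as follows: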

open_project sobel_hls_ip
set_top sobel_filter
add_files hls_src/sobel.cpp
#add_files -tb sobel_filter_tb.cpp


open_solution "solution1"
set_part {xc7z010clg400-1}
create_clock -period 10
csynth_design


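In this file we have vitis-hls open a project named sobel_hls_ip, explicitly set the top-level function to sobel_filter (the entry point for synthesis), and add the sobel.cpp shown above as a source file. I have not written a testbench yet, so that line is commented out for now.

We create a solution named solution1 (a project can hold several solutions, each with its own design choices and constraints), set the target part to xc7z010clg400-1, and create a clock for the design with a period of 10 ns. Note that the packaging script further down targets xc7z020clg400-1, which is the part on the PYNQ-Z2; it is safer to keep the two consistent.

Strictly speaking, this period deserves some care. First, the board has to be able to supply a clock at that frequency (on the ZedBoard, for example, it is 10 ns). Second, it must meet the performance requirement: if x images per second have to be processed and that needs a frequency of at least y, then pick something slightly above y. That said, a poor choice is easy to fix later: change the frequency and check whether the resulting increase in resource usage is acceptable.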
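The relation between frame rate and clock frequency can be estimated with a few lines of Python. The sketch assumes the pipeline accepts one pixel per clock cycle and ignores DMA and control overhead; the 1920x1080 at 30 frames per second figures are example numbers, not a requirement of this project.

```python
def min_clock_hz(rows, cols, frames_per_second, cycles_per_pixel=1):
    """Lowest clock frequency that keeps up with the pixel rate.

    Assumes a fully pipelined design and no transfer or control overhead.
    """
    return rows * cols * frames_per_second * cycles_per_pixel


def period_ns(clock_hz):
    """Clock period in nanoseconds for a frequency in hertz."""
    return 1e9 / clock_hz


required = min_clock_hz(1080, 1920, 30)   # example workload
chosen = 100_000_000                      # the 10 ns clock used in run_hls.tcl

print(f"required: {required / 1e6:.1f} MHz (period {period_ns(required):.2f} ns)")
print(f"chosen:   {chosen / 1e6:.1f} MHz (period {period_ns(chosen):.2f} ns)")
print("chosen clock is sufficient:", chosen >= required)
```

Under these assumptions the 10 ns clock leaves some headroom for this example workload.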

The final csynth_design runs the HLS step itself and performs high-level synthesis.

We now run the following command:

vitis-run --mode hls --tcl run_hls.tcl

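A folder with the same name as the project is then generated in the specified directory (I did not specify one, so it defaults to the current directory), and the generated files can be found under sobel_hls_ip/solution1/impl/verilog. Here is one of them:
(It is very long; I suggest glancing at it and collapsing it. It is not important.)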

// ==============================================================
// Generated by Vitis HLS v2024.2
// Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
// Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.
// ==============================================================

`timescale 1 ns / 1 ps

(* CORE_GENERATION_INFO="sobel_filter_sobel_filter,hls_ip_2024_2,{HLS_INPUT_TYPE=cxx,HLS_INPUT_FLOAT=0,HLS_INPUT_FIXED=0,HLS_INPUT_PART=xc7z010-clg400-1,HLS_INPUT_CLOCK=10.000000,HLS_INPUT_ARCH=others,HLS_SYN_CLOCK=6.923000,HLS_SYN_LAT=-1,HLS_SYN_TPT=none,HLS_SYN_MEM=0,HLS_SYN_DSP=0,HLS_SYN_FF=1096,HLS_SYN_LUT=1589,HLS_VERSION=2024_2}" *)

module sobel_filter (
ap_clk,
ap_rst_n,
input_r_TDATA,
input_r_TVALID,
input_r_TREADY,
input_r_TKEEP,
input_r_TSTRB,
input_r_TUSER,
input_r_TLAST,
input_r_TID,
input_r_TDEST,
output_r_TDATA,
output_r_TVALID,
output_r_TREADY,
output_r_TKEEP,
output_r_TSTRB,
output_r_TUSER,
output_r_TLAST,
output_r_TID,
output_r_TDEST,
s_axi_control_AWVALID,
s_axi_control_AWREADY,
s_axi_control_AWADDR,
s_axi_control_WVALID,
s_axi_control_WREADY,
s_axi_control_WDATA,
s_axi_control_WSTRB,
s_axi_control_ARVALID,
s_axi_control_ARREADY,
s_axi_control_ARADDR,
s_axi_control_RVALID,
s_axi_control_RREADY,
s_axi_control_RDATA,
s_axi_control_RRESP,
s_axi_control_BVALID,
s_axi_control_BREADY,
s_axi_control_BRESP,
interrupt
);

parameter ap_ST_fsm_state1 = 6'd1;
parameter ap_ST_fsm_state2 = 6'd2;
parameter ap_ST_fsm_state3 = 6'd4;
parameter ap_ST_fsm_state4 = 6'd8;
parameter ap_ST_fsm_state5 = 6'd16;
parameter ap_ST_fsm_state6 = 6'd32;
parameter C_S_AXI_CONTROL_DATA_WIDTH = 32;
parameter C_S_AXI_CONTROL_ADDR_WIDTH = 5;
parameter C_S_AXI_DATA_WIDTH = 32;

parameter C_S_AXI_CONTROL_WSTRB_WIDTH = (32 / 8);
parameter C_S_AXI_WSTRB_WIDTH = (32 / 8);

input ap_clk;
input ap_rst_n;
input [31:0] input_r_TDATA;
input input_r_TVALID;
output input_r_TREADY;
input [3:0] input_r_TKEEP;
input [3:0] input_r_TSTRB;
input [1:0] input_r_TUSER;
input [0:0] input_r_TLAST;
input [4:0] input_r_TID;
input [5:0] input_r_TDEST;
output [31:0] output_r_TDATA;
output output_r_TVALID;
input output_r_TREADY;
output [3:0] output_r_TKEEP;
output [3:0] output_r_TSTRB;
output [1:0] output_r_TUSER;
output [0:0] output_r_TLAST;
output [4:0] output_r_TID;
output [5:0] output_r_TDEST;
input s_axi_control_AWVALID;
output s_axi_control_AWREADY;
input [C_S_AXI_CONTROL_ADDR_WIDTH - 1:0] s_axi_control_AWADDR;
input s_axi_control_WVALID;
output s_axi_control_WREADY;
input [C_S_AXI_CONTROL_DATA_WIDTH - 1:0] s_axi_control_WDATA;
input [C_S_AXI_CONTROL_WSTRB_WIDTH - 1:0] s_axi_control_WSTRB;
input s_axi_control_ARVALID;
output s_axi_control_ARREADY;
input [C_S_AXI_CONTROL_ADDR_WIDTH - 1:0] s_axi_control_ARADDR;
output s_axi_control_RVALID;
input s_axi_control_RREADY;
output [C_S_AXI_CONTROL_DATA_WIDTH - 1:0] s_axi_control_RDATA;
output [1:0] s_axi_control_RRESP;
output s_axi_control_BVALID;
input s_axi_control_BREADY;
output [1:0] s_axi_control_BRESP;
output interrupt;

reg ap_rst_n_inv;
wire ap_start;
reg ap_done;
reg ap_idle;
(* fsm_encoding = "none" *) reg [5:0] ap_CS_fsm;
wire ap_CS_fsm_state1;
reg ap_ready;
wire [31:0] rows;
wire [31:0] cols;
reg [31:0] cols_read_reg_168;
wire [30:0] smax_fu_144_p3;
reg [30:0] smax_reg_173;
wire [30:0] smax3_fu_152_p3;
reg [30:0] smax3_reg_178;
wire ap_CS_fsm_state2;
wire [61:0] grp_fu_120_p2;
reg [61:0] mul_ln7_reg_193;
wire ap_CS_fsm_state3;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_done;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_idle;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_ready;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TREADY;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_input_r_TREADY;
wire [31:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDATA;
wire grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID;
wire [3:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TKEEP;
wire [3:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TSTRB;
wire [1:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TUSER;
wire [0:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TLAST;
wire [4:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TID;
wire [5:0] grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDEST;
reg grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg;
reg [5:0] ap_NS_fsm;
wire ap_NS_fsm_state4;
wire ap_CS_fsm_state5;
reg [31:0] output_r_TDATA_reg;
reg [3:0] output_r_TKEEP_reg;
reg [3:0] output_r_TSTRB_reg;
reg [1:0] output_r_TUSER_reg;
reg [0:0] output_r_TLAST_reg;
reg [4:0] output_r_TID_reg;
reg [5:0] output_r_TDEST_reg;
wire [30:0] grp_fu_120_p0;
wire [30:0] grp_fu_120_p1;
wire [0:0] empty_fu_138_p2;
wire [30:0] trunc_ln7_1_fu_128_p1;
wire [0:0] cmp2281_fu_132_p2;
wire [30:0] trunc_ln7_fu_124_p1;
wire ap_CS_fsm_state6;
wire regslice_both_output_r_V_data_V_U_apdone_blk;
reg ap_ST_fsm_state1_blk;
wire ap_ST_fsm_state2_blk;
wire ap_ST_fsm_state3_blk;
wire ap_ST_fsm_state4_blk;
reg ap_ST_fsm_state5_blk;
reg ap_ST_fsm_state6_blk;
wire regslice_both_input_r_V_data_V_U_apdone_blk;
wire [31:0] input_r_TDATA_int_regslice;
wire input_r_TVALID_int_regslice;
reg input_r_TREADY_int_regslice;
wire regslice_both_input_r_V_data_V_U_ack_in;
wire regslice_both_input_r_V_keep_V_U_apdone_blk;
wire [3:0] input_r_TKEEP_int_regslice;
wire regslice_both_input_r_V_keep_V_U_vld_out;
wire regslice_both_input_r_V_keep_V_U_ack_in;
wire regslice_both_input_r_V_strb_V_U_apdone_blk;
wire [3:0] input_r_TSTRB_int_regslice;
wire regslice_both_input_r_V_strb_V_U_vld_out;
wire regslice_both_input_r_V_strb_V_U_ack_in;
wire regslice_both_input_r_V_user_V_U_apdone_blk;
wire [1:0] input_r_TUSER_int_regslice;
wire regslice_both_input_r_V_user_V_U_vld_out;
wire regslice_both_input_r_V_user_V_U_ack_in;
wire regslice_both_input_r_V_last_V_U_apdone_blk;
wire [0:0] input_r_TLAST_int_regslice;
wire regslice_both_input_r_V_last_V_U_vld_out;
wire regslice_both_input_r_V_last_V_U_ack_in;
wire regslice_both_input_r_V_id_V_U_apdone_blk;
wire [4:0] input_r_TID_int_regslice;
wire regslice_both_input_r_V_id_V_U_vld_out;
wire regslice_both_input_r_V_id_V_U_ack_in;
wire regslice_both_input_r_V_dest_V_U_apdone_blk;
wire [5:0] input_r_TDEST_int_regslice;
wire regslice_both_input_r_V_dest_V_U_vld_out;
wire regslice_both_input_r_V_dest_V_U_ack_in;
reg [31:0] output_r_TDATA_int_regslice;
wire output_r_TVALID_int_regslice;
wire output_r_TREADY_int_regslice;
wire regslice_both_output_r_V_data_V_U_vld_out;
wire regslice_both_output_r_V_keep_V_U_apdone_blk;
reg [3:0] output_r_TKEEP_int_regslice;
wire regslice_both_output_r_V_keep_V_U_ack_in_dummy;
wire regslice_both_output_r_V_keep_V_U_vld_out;
wire regslice_both_output_r_V_strb_V_U_apdone_blk;
reg [3:0] output_r_TSTRB_int_regslice;
wire regslice_both_output_r_V_strb_V_U_ack_in_dummy;
wire regslice_both_output_r_V_strb_V_U_vld_out;
wire regslice_both_output_r_V_user_V_U_apdone_blk;
reg [1:0] output_r_TUSER_int_regslice;
wire regslice_both_output_r_V_user_V_U_ack_in_dummy;
wire regslice_both_output_r_V_user_V_U_vld_out;
wire regslice_both_output_r_V_last_V_U_apdone_blk;
reg [0:0] output_r_TLAST_int_regslice;
wire regslice_both_output_r_V_last_V_U_ack_in_dummy;
wire regslice_both_output_r_V_last_V_U_vld_out;
wire regslice_both_output_r_V_id_V_U_apdone_blk;
reg [4:0] output_r_TID_int_regslice;
wire regslice_both_output_r_V_id_V_U_ack_in_dummy;
wire regslice_both_output_r_V_id_V_U_vld_out;
wire regslice_both_output_r_V_dest_V_U_apdone_blk;
reg [5:0] output_r_TDEST_int_regslice;
wire regslice_both_output_r_V_dest_V_U_ack_in_dummy;
wire regslice_both_output_r_V_dest_V_U_vld_out;
wire [61:0] grp_fu_120_p00;
wire [61:0] grp_fu_120_p10;
wire ap_ce_reg;

// power-on initialization
initial begin
#0 ap_CS_fsm = 6'd1;
#0 grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg = 1'b0;
end

sobel_filter_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2 grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.ap_start(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start),
.ap_done(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_done),
.ap_idle(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_idle),
.ap_ready(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_ready),
.input_r_TVALID(input_r_TVALID_int_regslice),
.output_r_TREADY(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TREADY),
.mul_ln7(mul_ln7_reg_193),
.cols(cols_read_reg_168),
.input_r_TDATA(input_r_TDATA_int_regslice),
.input_r_TREADY(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_input_r_TREADY),
.input_r_TKEEP(input_r_TKEEP_int_regslice),
.input_r_TSTRB(input_r_TSTRB_int_regslice),
.input_r_TUSER(input_r_TUSER_int_regslice),
.input_r_TLAST(input_r_TLAST_int_regslice),
.input_r_TID(input_r_TID_int_regslice),
.input_r_TDEST(input_r_TDEST_int_regslice),
.output_r_TDATA(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDATA),
.output_r_TVALID(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.output_r_TKEEP(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TKEEP),
.output_r_TSTRB(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TSTRB),
.output_r_TUSER(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TUSER),
.output_r_TLAST(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TLAST),
.output_r_TID(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TID),
.output_r_TDEST(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDEST)
);

sobel_filter_control_s_axi #(
.C_S_AXI_ADDR_WIDTH( C_S_AXI_CONTROL_ADDR_WIDTH ),
.C_S_AXI_DATA_WIDTH( C_S_AXI_CONTROL_DATA_WIDTH ))
control_s_axi_U(
.AWVALID(s_axi_control_AWVALID),
.AWREADY(s_axi_control_AWREADY),
.AWADDR(s_axi_control_AWADDR),
.WVALID(s_axi_control_WVALID),
.WREADY(s_axi_control_WREADY),
.WDATA(s_axi_control_WDATA),
.WSTRB(s_axi_control_WSTRB),
.ARVALID(s_axi_control_ARVALID),
.ARREADY(s_axi_control_ARREADY),
.ARADDR(s_axi_control_ARADDR),
.RVALID(s_axi_control_RVALID),
.RREADY(s_axi_control_RREADY),
.RDATA(s_axi_control_RDATA),
.RRESP(s_axi_control_RRESP),
.BVALID(s_axi_control_BVALID),
.BREADY(s_axi_control_BREADY),
.BRESP(s_axi_control_BRESP),
.ACLK(ap_clk),
.ARESET(ap_rst_n_inv),
.ACLK_EN(1'b1),
.rows(rows),
.cols(cols),
.ap_start(ap_start),
.interrupt(interrupt),
.ap_ready(ap_ready),
.ap_done(ap_done),
.ap_idle(ap_idle)
);

sobel_filter_mul_31ns_31ns_62_2_1 #(
.ID( 1 ),
.NUM_STAGE( 2 ),
.din0_WIDTH( 31 ),
.din1_WIDTH( 31 ),
.dout_WIDTH( 62 ))
mul_31ns_31ns_62_2_1_U17(
.clk(ap_clk),
.reset(ap_rst_n_inv),
.din0(grp_fu_120_p0),
.din1(grp_fu_120_p1),
.ce(1'b1),
.dout(grp_fu_120_p2)
);

sobel_filter_regslice_both #(
.DataWidth( 32 ))
regslice_both_input_r_V_data_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TDATA),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_data_V_U_ack_in),
.data_out(input_r_TDATA_int_regslice),
.vld_out(input_r_TVALID_int_regslice),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_data_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 4 ))
regslice_both_input_r_V_keep_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TKEEP),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_keep_V_U_ack_in),
.data_out(input_r_TKEEP_int_regslice),
.vld_out(regslice_both_input_r_V_keep_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_keep_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 4 ))
regslice_both_input_r_V_strb_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TSTRB),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_strb_V_U_ack_in),
.data_out(input_r_TSTRB_int_regslice),
.vld_out(regslice_both_input_r_V_strb_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_strb_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 2 ))
regslice_both_input_r_V_user_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TUSER),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_user_V_U_ack_in),
.data_out(input_r_TUSER_int_regslice),
.vld_out(regslice_both_input_r_V_user_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_user_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 1 ))
regslice_both_input_r_V_last_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TLAST),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_last_V_U_ack_in),
.data_out(input_r_TLAST_int_regslice),
.vld_out(regslice_both_input_r_V_last_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_last_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 5 ))
regslice_both_input_r_V_id_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TID),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_id_V_U_ack_in),
.data_out(input_r_TID_int_regslice),
.vld_out(regslice_both_input_r_V_id_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_id_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 6 ))
regslice_both_input_r_V_dest_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(input_r_TDEST),
.vld_in(input_r_TVALID),
.ack_in(regslice_both_input_r_V_dest_V_U_ack_in),
.data_out(input_r_TDEST_int_regslice),
.vld_out(regslice_both_input_r_V_dest_V_U_vld_out),
.ack_out(input_r_TREADY_int_regslice),
.apdone_blk(regslice_both_input_r_V_dest_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 32 ))
regslice_both_output_r_V_data_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TDATA_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(output_r_TREADY_int_regslice),
.data_out(output_r_TDATA),
.vld_out(regslice_both_output_r_V_data_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_data_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 4 ))
regslice_both_output_r_V_keep_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TKEEP_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_keep_V_U_ack_in_dummy),
.data_out(output_r_TKEEP),
.vld_out(regslice_both_output_r_V_keep_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_keep_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 4 ))
regslice_both_output_r_V_strb_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TSTRB_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_strb_V_U_ack_in_dummy),
.data_out(output_r_TSTRB),
.vld_out(regslice_both_output_r_V_strb_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_strb_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 2 ))
regslice_both_output_r_V_user_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TUSER_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_user_V_U_ack_in_dummy),
.data_out(output_r_TUSER),
.vld_out(regslice_both_output_r_V_user_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_user_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 1 ))
regslice_both_output_r_V_last_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TLAST_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_last_V_U_ack_in_dummy),
.data_out(output_r_TLAST),
.vld_out(regslice_both_output_r_V_last_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_last_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 5 ))
regslice_both_output_r_V_id_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TID_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_id_V_U_ack_in_dummy),
.data_out(output_r_TID),
.vld_out(regslice_both_output_r_V_id_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_id_V_U_apdone_blk)
);

sobel_filter_regslice_both #(
.DataWidth( 6 ))
regslice_both_output_r_V_dest_V_U(
.ap_clk(ap_clk),
.ap_rst(ap_rst_n_inv),
.data_in(output_r_TDEST_int_regslice),
.vld_in(grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID),
.ack_in(regslice_both_output_r_V_dest_V_U_ack_in_dummy),
.data_out(output_r_TDEST),
.vld_out(regslice_both_output_r_V_dest_V_U_vld_out),
.ack_out(output_r_TREADY),
.apdone_blk(regslice_both_output_r_V_dest_V_U_apdone_blk)
);

always @ (posedge ap_clk) begin
if (ap_rst_n_inv == 1'b1) begin
ap_CS_fsm <= ap_ST_fsm_state1;
end else begin
ap_CS_fsm <= ap_NS_fsm;
end
end

always @ (posedge ap_clk) begin
if (ap_rst_n_inv == 1'b1) begin
grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg <= 1'b0;
end else begin
if (((1'b1 == ap_CS_fsm_state3) & (1'b1 == ap_NS_fsm_state4))) begin
grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg <= 1'b1;
end else if ((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_ready == 1'b1)) begin
grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg <= 1'b0;
end
end
end

always @ (posedge ap_clk) begin
if ((1'b1 == ap_CS_fsm_state1)) begin
cols_read_reg_168 <= cols;
smax3_reg_178 <= smax3_fu_152_p3;
smax_reg_173 <= smax_fu_144_p3;
end
end

always @ (posedge ap_clk) begin
if ((1'b1 == ap_CS_fsm_state3)) begin
mul_ln7_reg_193 <= grp_fu_120_p2;
end
end

always @ (posedge ap_clk) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TDATA_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDATA;
output_r_TDEST_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDEST;
output_r_TID_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TID;
output_r_TKEEP_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TKEEP;
output_r_TLAST_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TLAST;
output_r_TSTRB_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TSTRB;
output_r_TUSER_reg <= grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TUSER;
end
end

always @ (*) begin
if ((ap_start == 1'b0)) begin
ap_ST_fsm_state1_blk = 1'b1;
end else begin
ap_ST_fsm_state1_blk = 1'b0;
end
end

assign ap_ST_fsm_state2_blk = 1'b0;

assign ap_ST_fsm_state3_blk = 1'b0;

assign ap_ST_fsm_state4_blk = 1'b0;

always @ (*) begin
if ((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_done == 1'b0)) begin
ap_ST_fsm_state5_blk = 1'b1;
end else begin
ap_ST_fsm_state5_blk = 1'b0;
end
end

always @ (*) begin
if ((regslice_both_output_r_V_data_V_U_apdone_blk == 1'b1)) begin
ap_ST_fsm_state6_blk = 1'b1;
end else begin
ap_ST_fsm_state6_blk = 1'b0;
end
end

always @ (*) begin
if (((1'b1 == ap_CS_fsm_state6) & (regslice_both_output_r_V_data_V_U_apdone_blk == 1'b0))) begin
ap_done = 1'b1;
end else begin
ap_done = 1'b0;
end
end

always @ (*) begin
if (((1'b1 == ap_CS_fsm_state1) & (ap_start == 1'b0))) begin
ap_idle = 1'b1;
end else begin
ap_idle = 1'b0;
end
end

always @ (*) begin
if (((1'b1 == ap_CS_fsm_state6) & (regslice_both_output_r_V_data_V_U_apdone_blk == 1'b0))) begin
ap_ready = 1'b1;
end else begin
ap_ready = 1'b0;
end
end

always @ (*) begin
if ((1'b1 == ap_CS_fsm_state5)) begin
input_r_TREADY_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_input_r_TREADY;
end else begin
input_r_TREADY_int_regslice = 1'b0;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TDATA_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDATA;
end else begin
output_r_TDATA_int_regslice = output_r_TDATA_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TDEST_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TDEST;
end else begin
output_r_TDEST_int_regslice = output_r_TDEST_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TID_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TID;
end else begin
output_r_TID_int_regslice = output_r_TID_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TKEEP_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TKEEP;
end else begin
output_r_TKEEP_int_regslice = output_r_TKEEP_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TLAST_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TLAST;
end else begin
output_r_TLAST_int_regslice = output_r_TLAST_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TSTRB_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TSTRB;
end else begin
output_r_TSTRB_int_regslice = output_r_TSTRB_reg;
end
end

always @ (*) begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
output_r_TUSER_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TUSER;
end else begin
output_r_TUSER_int_regslice = output_r_TUSER_reg;
end
end

always @ (*) begin
case (ap_CS_fsm)
ap_ST_fsm_state1 : begin
if (((1'b1 == ap_CS_fsm_state1) & (ap_start == 1'b1))) begin
ap_NS_fsm = ap_ST_fsm_state2;
end else begin
ap_NS_fsm = ap_ST_fsm_state1;
end
end
ap_ST_fsm_state2 : begin
ap_NS_fsm = ap_ST_fsm_state3;
end
ap_ST_fsm_state3 : begin
ap_NS_fsm = ap_ST_fsm_state4;
end
ap_ST_fsm_state4 : begin
ap_NS_fsm = ap_ST_fsm_state5;
end
ap_ST_fsm_state5 : begin
if (((grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_done == 1'b1) & (1'b1 == ap_CS_fsm_state5))) begin
ap_NS_fsm = ap_ST_fsm_state6;
end else begin
ap_NS_fsm = ap_ST_fsm_state5;
end
end
ap_ST_fsm_state6 : begin
if (((1'b1 == ap_CS_fsm_state6) & (regslice_both_output_r_V_data_V_U_apdone_blk == 1'b0))) begin
ap_NS_fsm = ap_ST_fsm_state1;
end else begin
ap_NS_fsm = ap_ST_fsm_state6;
end
end
default : begin
ap_NS_fsm = 'bx;
end
endcase
end

assign ap_CS_fsm_state1 = ap_CS_fsm[32'd0];

assign ap_CS_fsm_state2 = ap_CS_fsm[32'd1];

assign ap_CS_fsm_state3 = ap_CS_fsm[32'd2];

assign ap_CS_fsm_state5 = ap_CS_fsm[32'd4];

assign ap_CS_fsm_state6 = ap_CS_fsm[32'd5];

assign ap_NS_fsm_state4 = ap_NS_fsm[32'd3];

always @ (*) begin
ap_rst_n_inv = ~ap_rst_n;
end

assign cmp2281_fu_132_p2 = (($signed(cols) > $signed(32'd0)) ? 1'b1 : 1'b0);

assign empty_fu_138_p2 = (($signed(rows) > $signed(32'd0)) ? 1'b1 : 1'b0);

assign grp_fu_120_p0 = grp_fu_120_p00;

assign grp_fu_120_p00 = smax_reg_173;

assign grp_fu_120_p1 = grp_fu_120_p10;

assign grp_fu_120_p10 = smax3_reg_178;

assign grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_ap_start_reg;

assign grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TREADY = (output_r_TREADY_int_regslice & ap_CS_fsm_state5);

assign input_r_TREADY = regslice_both_input_r_V_data_V_U_ack_in;

assign output_r_TVALID = regslice_both_output_r_V_data_V_U_vld_out;

assign output_r_TVALID_int_regslice = grp_sobel_filter_Pipeline_VITIS_LOOP_19_1_VITIS_LOOP_20_2_fu_86_output_r_TVALID;

assign smax3_fu_152_p3 = ((cmp2281_fu_132_p2[0:0] == 1'b1) ? trunc_ln7_fu_124_p1 : 31'd0);

assign smax_fu_144_p3 = ((empty_fu_138_p2[0:0] == 1'b1) ? trunc_ln7_1_fu_128_p1 : 31'd0);

assign trunc_ln7_1_fu_128_p1 = rows[30:0];

assign trunc_ln7_fu_124_p1 = cols[30:0];

endmodule //sobel_filter


The above is the top-level file generated automatically by the HLS tool.
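One part of this file that is worth a look is the CORE_GENERATION_INFO attribute near the top, which records a short summary of the synthesis run. The Python sketch below splits that string into key/value pairs. The string is copied from the listing above; reading HLS_SYN_CLOCK as the estimated clock period in nanoseconds is my interpretation of the field name, and parse_core_info is a helper made up for this example.

```python
import re

# Copied from the CORE_GENERATION_INFO attribute of the generated sobel_filter module.
CORE_INFO = (
    "sobel_filter_sobel_filter,hls_ip_2024_2,{HLS_INPUT_TYPE=cxx,"
    "HLS_INPUT_FLOAT=0,HLS_INPUT_FIXED=0,HLS_INPUT_PART=xc7z010-clg400-1,"
    "HLS_INPUT_CLOCK=10.000000,HLS_INPUT_ARCH=others,HLS_SYN_CLOCK=6.923000,"
    "HLS_SYN_LAT=-1,HLS_SYN_TPT=none,HLS_SYN_MEM=0,HLS_SYN_DSP=0,"
    "HLS_SYN_FF=1096,HLS_SYN_LUT=1589,HLS_VERSION=2024_2}"
)


def parse_core_info(text):
    """Turn the {KEY=value,...} part of the attribute into a dict of strings."""
    body = re.search(r"\{(.*)\}", text).group(1)
    return dict(item.split("=", 1) for item in body.split(","))


info = parse_core_info(CORE_INFO)
clock_ns = float(info["HLS_SYN_CLOCK"])   # assumed: estimated period in ns

print("part:", info["HLS_INPUT_PART"])
print("requested period:", float(info["HLS_INPUT_CLOCK"]), "ns")
print(f"estimated period: {clock_ns} ns (about {1000 / clock_ns:.1f} MHz)")
print("FF:", info["HLS_SYN_FF"], "LUT:", info["HLS_SYN_LUT"])
```

If that interpretation holds, the estimate sits comfortably below the requested 10 ns period.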

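Packaging the IP core

Packaging can be done with the export command of the vitis-hls tool (the Tcl command is export_design) or with the IP packager in Vivado; the results are much the same.

However, both of these are tied to the GUI, which feels inelegant and goes against the automation mindset described above, so I chose not to use them (really it is because I could not be bothered to set up the GUI, it is a hassle QWQ).

Once again we create a .tcl file. This time the Verilog RTL files need to be wrapped into a complete, usable IP core.

The code of verilog_to_ip.tcl is as follows: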

create_project sobel_ip_proj ./ip_sobel/ -part xc7z020clg400-1 -force




set hls_rtl_dir /home/jiao/vitis-projects/edge_detection/sobel_hls_ip/solution1/impl/verilog
read_verilog [glob $hls_rtl_dir/*.v]
set_property top sobel_filter [current_fileset]

update_compile_order -fileset sources_1

ipx::package_project -root_dir ./ip_sobel/ -vendor xilinx.com -library user -taxonomy /UserIP -import_files



set core [ipx::current_core]
set_property name sobel_filter $core
set_property display_name {sobel_filter HLS IP} $core
set_property version 1.0 $core
set_property vendor xilinx.com $core
set_property library user $core

#set_property CONFIG.FREQ_HZ 100000000 [ipx::get_bus_interfaces ap_clk $core]
#set_property CONFIG.ASSOCIATED_BUSIF {input_r:output_r:s_axi_control} [ipx::get_bus_interfaces ap_clk $core]




ipx::save_core $core

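In this file we create a Vivado IP project named sobel_ip_proj and specify the target part.

Next we set the directory holding the RTL generated in the previous step and read every file in it (the Verilog output is used here), then explicitly set sobel_filter as the top-level module of the design.

ipx::package_project starts the IP packaging process; we then set the IP properties, and finally ipx::save_core $core completes the packaging.

The clock settings are commented out here. One reason is that omitting them only produces a warning at run time and does not affect packaging; the other is that the clock can be set and changed later in the design, so there is no need to fix it this early.

Automation, go!

This step looks dull but is actually rather practical. Who would not want to run their project with one command?

So I wrote start.sh as follows: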

source ~/xilinx/Vitis/2024.2/settings64.sh
#source $XILINX_VITIS/settings64.sh
vitis-run --mode hls --tcl run_hls.tcl
vivado -mode batch -source verilog_to_ip.tcl

Whenever the source changes, the project can be rebuilt in one go, which is rather satisfying.
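One thing a plain shell script like this does not do by default is stop when a step fails. As a possible alternative, here is a minimal Python sketch of the same two steps that aborts on the first error. It assumes the Xilinx settings script has already been sourced in the calling shell so that vitis-run and vivado are on the PATH; run_steps and its dry_run flag are names invented for this example.

```python
import subprocess

# The same two commands as in start.sh, as argument lists.
STEPS = [
    ["vitis-run", "--mode", "hls", "--tcl", "run_hls.tcl"],
    ["vivado", "-mode", "batch", "-source", "verilog_to_ip.tcl"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order; with dry_run=True only print them.

    check=True makes subprocess.run raise CalledProcessError as soon as a
    command exits with a non-zero status, so later steps are skipped.
    """
    executed = []
    for argv in steps:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed


# Dry run by default so the sketch can be tried without the tools installed.
done = run_steps(STEPS, dry_run=True)
```

Calling run_steps(STEPS, dry_run=False) would actually launch the tools.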

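Integrating the IP into a project

We have now gone all the way from C++ to an IP core. The HLS tool lets us concentrate on the logic rather than on complicated interface design, and spares us from writing large amounts of Verilog or VHDL. The next step is to use the IP core we designed to build the complete project.

Importing our HLS-built IP core

After creating an empty project, run the following commands in the Tcl console: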

set_property ip_repo_paths ~/vitis-projects/edge_detection/ip_sobel/ [current_project]

update_ip_catalog

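Because the IP was assigned to the UserIP category when it was packaged, it shows up under UserIP in the IP Catalog on the left.

[Figure: IP Catalog]

This is the packaged IP:

[Figure: IP show]

Doing the Block Design

The finished block design is as follows:

I modified the original project's block design slightly, switching to two AXI Interconnects.

bitstream

The wrapper file can be generated easily from the BD; after that, once the constraints file is written, the bitstream can be generated.

Since only the DDR-related pins are used here, constraints are needed only for DDR; adapting the ones from the official documentation is enough.

Control from the PYNQ board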