HeroGian

Accessing PL-written memory from PS

Question

Hello,

I'm writing a simple Vivado example in which an HLS IP performs a memcpy in hardware. From bare-metal software I pass src and dst pointers, and the IP copies the src memory range into the dst memory range. This is the hardware design that I implemented:

[Image: screenshot of the Vivado block design]

And this is a portion of the bare-metal code:

#define N 12
#define SRC_ADDR 0x10000000
#define DST_ADDR (SRC_ADDR + N)
...
...
uint8_t *src = (uint8_t *)SRC_ADDR;
uint8_t *dst = (uint8_t *)DST_ADDR;
...
...
XCopymem XCopyMemInstance;
XCopymem_Initialize(&XCopyMemInstance, 0);

XCopymem_Set_dst(&XCopyMemInstance, dst);
XCopymem_Set_src(&XCopyMemInstance, src);
XCopymem_Set_bytes(&XCopyMemInstance, N);

XCopymem_Start(&XCopyMemInstance);

while(!XCopymem_IsDone(&XCopyMemInstance));

Xil_DCacheInvalidate();

After the Xil_DCacheInvalidate call I can correctly read the values written by the IP. Now I am trying to port this software to Linux.
I created a kernel image with PetaLinux 2018.1 and it works, but I have to invalidate the data cache as I did in bare metal. How can I implement this functionality in Linux userspace? Is there something like Xil_DCacheInvalidate?

I tried __clear_cache(), but it does not seem to work. This is the code that I wrote:

/*
 * Copyright (c) 2012 Xilinx, Inc.  All rights reserved.
 *
 * Xilinx, Inc.
 * XILINX IS PROVIDING THIS DESIGN, CODE, OR INFORMATION "AS IS" AS A
 * COURTESY TO YOU.  BY PROVIDING THIS DESIGN, CODE, OR INFORMATION AS
 * ONE POSSIBLE   IMPLEMENTATION OF THIS FEATURE, APPLICATION OR
 * STANDARD, XILINX IS MAKING NO REPRESENTATION THAT THIS IMPLEMENTATION
 * IS FREE FROM ANY CLAIMS OF INFRINGEMENT, AND YOU ARE RESPONSIBLE
 * FOR OBTAINING ANY RIGHTS YOU MAY REQUIRE FOR YOUR IMPLEMENTATION.
 * XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO
 * THE ADEQUACY OF THE IMPLEMENTATION, INCLUDING BUT NOT LIMITED TO
 * ANY WARRANTIES OR REPRESENTATIONS THAT THIS IMPLEMENTATION IS FREE
 * FROM CLAIMS OF INFRINGEMENT, IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE.
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>	/* memset */

#include "xcopymem/xcopymem.h"

#define N 12
#define SRC_ADDR 0x10000000
#define DST_ADDR (SRC_ADDR + N)	/* parenthesized so that DST_ADDR / pagesize divides the whole sum */

int main() {

	size_t pagesize = sysconf(_SC_PAGE_SIZE);

	off_t page_base_src 	= (SRC_ADDR / pagesize) * pagesize;
	off_t page_base_dst 	= (DST_ADDR / pagesize) * pagesize;

	off_t page_offset_src 	= SRC_ADDR - page_base_src;
	off_t page_offset_dst 	= DST_ADDR - page_base_dst;

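	/* O_RDWR is required for a writable shared mapping of /dev/mem */
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0) {
		printf("Can't open /dev/mem\n");
		return -1;
	}

	/* MAP_SHARED so that writes reach physical memory instead of a private copy */
	uint8_t * src_map = mmap(NULL, page_offset_src + N, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_base_src);
	uint8_t * dst_map = mmap(NULL, page_offset_dst + N, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_base_dst);

	if (src_map == MAP_FAILED) {
		printf("Can't map src memory\n");
		return -1;
	}
	if (dst_map == MAP_FAILED) {
		printf("Can't map dst memory\n");
		return -1;
	}

	/* mmap returns the page base; add the in-page offset to reach the buffers */
	uint8_t * src = src_map + page_offset_src;
	uint8_t * dst = dst_map + page_offset_dst;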

	for(int i = 0; i < N; ++i){
		src[i] = i;
	}
	memset(dst, 0, N);

	printf("\nsrc =\n");
	for(int i = 0; i < N; ++i) {
		printf("%X ", src[i]);
	}
	printf("\n");

	printf("\ndst ( before memcpy ) =\n\n");
	for(int i = 0; i < N; ++i) {
		printf("%X ", dst[i]);
	}
	printf("\n");

	XCopymem XCopyMemInstance;
	XCopymem_Initialize(&XCopyMemInstance, "copyMem");

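	/* The IP is an AXI master, so it needs physical addresses, not the mmap'ed virtual pointers */
	XCopymem_Set_dst(&XCopyMemInstance, DST_ADDR);
	XCopymem_Set_src(&XCopyMemInstance, SRC_ADDR);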
	XCopymem_Set_bytes(&XCopyMemInstance, N);

	XCopymem_Start(&XCopyMemInstance);

	while(!XCopymem_IsDone(&XCopyMemInstance));

	__clear_cache(dst, dst + N);

	printf("\ndst ( after memcpy and clear cache ) =\n\n");
	for(int i = 0; i < N; ++i) {
		printf("%X ", dst[i]);
	}
	printf("\n");

	close(fd);

    return 0;
}

Thank you.

3 answers to this question

@HeroGian

Your question is somewhat tricky. Disabling the cache is doable on ARM, but as far as I can gather it is not dynamic: you either start without caches, in which case everything runs slower, or you keep using them. Changing this at run time is risky because you may freeze your OS. If you want to try it, you can find more info here:

https://forums.xilinx.com/t5/Embedded-Linux/how-disable-cache-in-Linux/td-p/740556

There might be a better way to do it. What you actually want is a hardware-accelerated memcpy in DDR. You could handle this like a DMA transfer into DDR from the PL and use a cache-coherency function for it. These are kernel-space functions, so I guess you will have to write a kernel driver. I strongly suggest reading about contiguous memory (CMA), coherent data transfers, and the ACP port of the Zynq before proceeding. Here are some links that might give you insight into what I am suggesting and why.

https://forums.xilinx.com/t5/Embedded-Processor-System-Design/Cache-Flushing/td-p/635653

https://forums.xilinx.com/t5/Embedded-Processor-System-Design/XC7Z010-can-I-automatically-invalidate-cache-with-my-PL-to-DDR/td-p/715220

Although you are using a Zynq, there might be some useful information about what you need in the Xilinx ZynqMP wiki.

https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842098/Zynq+UltraScale+MPSoC+Cache+Coherency

Sorry for not being able to give you a more coherent and straightforward answer; what you are attempting is tricky.

Good Luck,

-Ciprian
