Ahmed Alfadhel

Members
  • Content Count

    113

Reputation Activity

  1. Like
    Ahmed Alfadhel reacted to zygot in Does it possible to implement Python code on ARTY 7 board using Vivado HLS?   
    Vivado HLS has nothing to do with running Python code on an FPGA target. If you use a soft processor with suitable Linux support you can run a Python interpreter on a non-ZYNQ based FPGA board. Depending on your OS and installed services you can certainly run Python on a ZYNQ based board.
    PYNQ is a rather self-contained framework.
  2. Like
    Ahmed Alfadhel reacted to zygot in Verilog Simulator   
    I've worked as a consultant or employee for a lot of companies that have years of accrued experience developing programmable logic based products that have been deployed by customers. All of them use the standard logic simulator tools. Some only use the free tools that come with the FPGA vendors' toolset. Some buy the full versions of ModelSim or other logic simulators. This isn't by accident or lack of sophistication.
    It's true that only a subset of Verilog and VHDL is supported by programmable logic synthesis tools. Verilog and VHDL were developed to do simulation long before the IEEE added standard libraries to them so that programmable logic vendors would adapt them for use as a synthesis source. All FPGA vendors provide documentation on what exactly their synthesis tools support for these languages. Logic simulators, even Xilinx's home grown one, support most of Verilog and VHDL. After all they are simulators! All of my testbenches use non-synthesizable Verilog or VHDL. Beginners and self-taught people often get into FPGA development with a lot of bad assumptions and conceptualizations. They also don't read the documentation provided by the vendors of the products that they want to use. I'd agree that good guidance on how to write a testbench is hard to come by. Beginners should start off using the minimal features of Verilog or VHDL until they've mastered those and then add to their repertoire as they gain experience.
    Verilator is a cycle based simulator. That makes it fast; but it's fast because it simulates a greatly simplified model of digital logic. If you're simulating a PDP11 it's a good tool. If you're simulating the latest Intel Xeon processor I'm pretty sure that you don't have a tool that has complete code or behavioral coverage. I strongly disagree that sub-clock timing isn't important to beginners. I would argue that simplifying logic design into a cycle based conceptualization is more confusing and counter-productive. There certainly are valid reasons to use a cycle based simulator; even for logic design. These types of simulators are not preferred nor should they be the first option though. They are simply not capable of performing all of the simulator duties required for a complete programmable logic design flow. Use the simulator provided by your FPGA vendor. Learn basic simulator language concepts. Learn about limits of FPGA device resources and support for whatever source format you choose to submit to their synthesis tools. Most importantly, take some time to develop a basic understanding of digital design using real devices and real wires. I realize that LUT based programmable logic devices aren't a bunch of gates but the same concepts important to designing circuits with LSI and MSI devices on a PCB apply to FPGA design. If you find that ModelSim or Vivado simulator is taking too long you can adjust the sample unit of time to speed things up. A one pico-second timescale isn't necessary for most designs. You can find ways to work around a lot of simulation time that isn't important to your analysis.
    All of the applications that you mention above are relatively simple designs in terms of timing analysis. At some point you may want to do something more challenging and will have to provide proper timing and even placement constraints in order to get the vendor's tools to properly synthesize and place logic so that your design will work. At some point sub-clock timing considerations will dominate the timing analysis of a high clock rate design. Verilator and SymbiYosys can't help you do that. Time based simulators can. You can only go so far with simple models of real things. My advice is to become reasonably expert with the tools recommended by the vendors who make the devices that you are developing designs for. Once you have a level of competence with those logic simulation tools then branch out and try other options.
    I would agree with anyone suggesting that Intel and Xilinx have done a poor job with code coverage and, as you say, formal analysis of HDL source code. This is one area that needs to be addressed. There are companies that address this shortcoming.
    [edit] I forgot to mention that a lot of people resist post-route timing simulation because it often involves a bit more work. In the commercial environment where teams are working in parallel to make integration and delivery schedules having this analysis is vital to success. You need traditional logic time based simulators like the one that comes with your vendor's tools to do this. One more reason to become adept at using them.
  3. Like
    Ahmed Alfadhel reacted to [email protected] in Verilog Simulator   
    @xc6lx45,
    This is a valid question, and a common response I get when recommending Verilator.  Let's examine a couple of points here.
    Verilog is a very large language, consisting of both synthesizable and non-synthesizable subsets.  I've seen more than one student get these two subsets mixed up, using constructs like "always @* clk <= #4 !clk;" and struggling to figure out why their design either doesn't work or fails to synthesize.
    I've seen a lot of students/beginners try to use these non-synthesizable constructs to generate "programs" rather than "designs".  Things like "if (reset) for(k=0; k<MEMSIZE; k=k+1) mem[k] = 0;", or "always @(*) sum = 0; @(posedge clk) if (A[0]) sum = B; @(posedge clk) if (A[1]) sum = sum + (B<<1)", etc.
    Since verilator doesn't support #delay's, nor does it support 'x values, in many ways it does a better job matching what the synthesizer and the hardware will do together, leaving less room for confusion.
    C++ Verilator based wrappers can be used just as easily as Verilog for bench testing components.  That said, ...
    The Verilog simulation language is a fairly poor scripting language for finding bugs in a module when compared to formal methods.  More than once I've been deceived into thinking my design works, only to find a couple cases (or twenty) once I get to hardware where it didn't work.  Indeed, both Xilinx and Intel messed up their AXI demonstration designs--designs that passed simulation but not a formal verification check.  As a result, many individuals have posted unsolved bugs on the forums, complained about design quality, etc.  (Xilinx has been deleting posts that aren't flattering to their methodology.  I'm not yet sure about Intel in this regard.)  Formal methods tend not to have this problem.  Why waste a student's time teaching a broken design methodology?
    So, if you aren't using Verilog for your bench test, then what other simulation based testing do you need?  Integration testing where all the modules come together to interact with the hardware in some (potentially) very complex ways.  At this point, you need hardware emulation, and Verilator provides a much better environment for integrating C/C++ hardware emulators into your design.
    My favorite example of this is building a VGA.  VGA's are classically debugged using a scope and a probe since the definition of "working" tends to be "what my monitor will accept."  The problem with this is that you lose access to all of the internal signals when abandoning your simulation environment.  On one project I was working on, this one for the Basys3 where there was a paucity of memory for a video framebuffer, I chose to use the flash and to place prior compressed frames onto the flash.  I would then decompress these frames on the fly as they were being displayed.  My struggle was then how to debug decompression failures, since I could only "see" them when the design ran from hardware.  Verilator fixes this, by allowing you to integrate a display emulator with your design making it easier to find where in the VCD/trace output file the bug lies.
    Another example would be a flash simulation.  Most of my designs include a 16MB flash emulation as part of their simulation.  This allows me to debug flash interactions in a way that I doubt you could using iverilog.  This allows me to simulate things like reading from flash, erasing and programming flash--even before I ever get to actual hardware, or perhaps after I've taken my design to hardware and then discovered a nasty bug.  More than once I've found a bug after reading through all 16MB of flash memory, or in the middle of programming when something doesn't read back properly.  I'm not sure how I would debug this with iverilog.
    A third example would be SD-card simulation.  I'm currently working with a Nexys Video design with an integrated SD card.  It's not a challenge to create a 32GB FAT based image on my hard drive and then serve sectors from it to my running Verilator simulation, but I'm not sure how I would do this from iverilog.  So far in this project, I've been able to demonstrate an ability to read a file from the SD card--FAT system and all, and my next step will be writing data files to it via the FATFS library.  I find this to be an important simulation requirement, something provided by Verilator and quite valuable.
    Finally, I tend to interact with many of my designs over the serial port.  I find it valuable to interact with the simulation in (roughly) the same way as with hardware, and so I use a program to forward the serial port over a TCP/IP link.  I can do the same from Verilator (try that with iverilog), and so all of the programs that interact with my designs can do so in the same fashion regardless of whether the design is running in simulation or in hardware.
    Yes, there are downsides to using Verilator.
    - It doesn't support non-synthesizable parts of the language.  This is the price you pay for getting access to the fastest simulator on the market--even beating out the various commercial simulators out there.
    - Verilator is an open source simulator, and so it doesn't have the encryption keys necessary to run encrypted designs--such as Vivado's FFT or even the FIFO generator that's a core component of their S2MM, MM2S, and their interconnect ... and probably quite a few other components as well.  This is one of the reasons why I've written alternative, open source designs to many of these common components. [FFT, S2MM, MM2S, AXI interconnect, etc.]  As to which components are "better", it's a mixed bag--but that's another longer story for another day.
    - Verilator does not support sub-clock timing simulations, although it can support multi-clock simulations.  At the same time, most students don't need to know the details of sub-clock timing in their first course.  (I'm not referring to clock-domain crossing issues here, since those are rarely simulated properly anyway.)
    Still, I find Verilator to be quite a valuable choice and one I highly recommend learning early on in the learning process.  This is the reason why my beginners Verilog tutorial centers around using both Verilator and SymbiYosys.
    Dan
  4. Like
    Ahmed Alfadhel reacted to xc6lx45 in Verilog Simulator   
    As a 2nd opinion, I would not recommend Verilator to learn the language. It does work on Windows (MSYS) but I'd ask for a good reason why you need Verilator in the first place instead of a conventional simulator.
    Have a look at iverilog / gtkwave: http://iverilog.icarus.com/
    It works fine from standard Windows (no need to create a virtual machine). You'd call it through the command line though (hint: create a .bat file with the simulation commands to keep them together with the project. Hint, the abovementioned MSYS environment is pretty good for this, e.g. use a makefile or shell script).
  5. Like
    Ahmed Alfadhel reacted to [email protected] in Verilog Simulator   
    @Ahmed Alfadhel,
    Don't put spaces between the "-" and "Wall", or between the "-" and the "cc" and you'll do better,
    Dan
  6. Like
    Ahmed Alfadhel reacted to [email protected] in Verilog Simulator   
    @Ahmed Alfadhel,
    I put instructions together for that some time ago.
    Let me know if they need to be updated any.
    Dan
  7. Like
    Ahmed Alfadhel reacted to zygot in Verilog   
    I haven't bought a textbook for quite a few years now so I don't have any suggestions. There are a lot of levels to learning an HDL. One is the language syntax and basic concepts of timing, concurrency and other aspects of simulating a model. Then there is the usage of languages like VHDL and Verilog for synthesis. Both are central aspects of designing logic in programmable devices. I can't emphasize enough how important learning basic digital design concepts is to developing competency in FPGA design regardless of your source preferences. I doubt that there is a good text that covers all of these facets. Unless there is a University nearby, finding a place to browse through books to see if they might be worth the investment is a difficult proposition these days.
    I know that @[email protected] verilator. He's the only one concentrating on programmable logic design that I know of who uses it. Be aware that it is a cycle based simulator. These tend to be a lot faster than regular logic simulators and certainly have a place. Don't be afraid of the simulator tools widely used in industry where products are programmable logic based. The native Vivado simulator and ModelSim provided by Intel are preferred simulation tools for programmable logic. These are time based simulations that can simulate in units of picoseconds if that's warranted. They also use compiled libraries that understand the vendors device architecture. Best of all they can do post route timing simulations. Learn how to write good testbenches that work with the vendors simulators. Part of the design process is being able to conceptualize real world device behavior; the less idealistic the more complete your design process and logic will be.
  8. Like
    Ahmed Alfadhel reacted to [email protected] in Verilog   
    @Ahmed Alfadhel,
    Perhaps the most complete tutorial out there is asic-world's tutorial.  You might also find it the most vacuous, since although it tells you all the details of the language it doesn't really give you the practice or the tools to move forward from there.  There's also a litexsoc (IIRC) by enjoy-digital that I've heard about, but never looked into.
    An alternative might be my own tutorial.  Admittedly, it's only a beginner's tutorial.  It'll only get you from blinky to a serial port with an attached FIFO.  That said, it does go over a lot of FPGA Verilog design practice and principles.  It also integrates learning how to use a simulator, in this case Verilator, and a formal verification tool, such as SymbiYosys, into your design process so that you can start learning how to build designs that work the first time they meet hardware.
    I'm also in the process of working to prepare an intermediate tutorial.  For now, if you are interested, you'd need to find most of the information that would be in such a tutorial on my blog.  (It's not all there ... yet, although there are articles on how to create AXI peripherals ..)
    Feel free to check it out.  Let me know what you think,
    Dan
  9. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    The error is because the magnitude has to be one bit longer than the inputs, as the magnitude of (0xFFFFFF, 0xFFFFFF) is 0x16A09E4, which will overflow if you put it into a 25-bit signed value.
    It will however fit nicely into a 25-bit unsigned value, and as it is a magnitude it will be positive. So maybe snip off the top bit in the assignment, but remember it is unsigned!
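The bit-width argument above can be checked numerically. What follows is a minimal C sketch rather than anything posted in the thread: the helper names `fits_signed` and `fits_unsigned` are hypothetical, and it assumes both components sit at the full-scale value 0xFFFFFF. It compares squared values, so no square root (and no math library) is needed.

```c
#include <stdint.h>

/* Squared magnitude x*x + y*y, computed in 64 bits to avoid overflow. */
static uint64_t mag_squared(uint32_t x, uint32_t y)
{
    return (uint64_t)x * x + (uint64_t)y * y;
}

/* Hypothetical helper: does sqrt(x*x + y*y) fit in a signed value of
 * `bits` bits?  The largest positive signed value is 2^(bits-1) - 1. */
int fits_signed(uint32_t x, uint32_t y, unsigned bits)
{
    uint64_t max = ((uint64_t)1 << (bits - 1)) - 1;
    return mag_squared(x, y) <= max * max;
}

/* Hypothetical helper: does sqrt(x*x + y*y) fit in an unsigned value of
 * `bits` bits?  The largest unsigned value is 2^bits - 1. */
int fits_unsigned(uint32_t x, uint32_t y, unsigned bits)
{
    uint64_t max = ((uint64_t)1 << bits) - 1;
    return mag_squared(x, y) <= max * max;
}
```

Calling `fits_signed(0xFFFFFF, 0xFFFFFF, 25)` reports that the value does not fit, while `fits_unsigned(0xFFFFFF, 0xFFFFFF, 25)` reports that it does, which is the case described in the post.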
  10. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Using a tool for what it is meant to do is easy. Using a tool for something where it isn't suited, that is where the learning begins!
    (I now go back to doing dental surgery with a steamroller, or maybe digging a tunnel with a teaspoon).
  11. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Oh, a quick hack of a CORDIC magnitude
     
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity magnitude is
        Port ( clk           : in  std_logic;
               x_in          : in  std_logic_vector;
               y_in          : in  std_logic_vector;
               x_out         : out std_logic_vector := (others => '0');
               y_out         : out std_logic_vector := (others => '0');
               magnitude_out : out std_logic_vector := (others => '0') -- Accurate to 5 bits or so
        );
    end magnitude;

    architecture Behavioral of magnitude is
        type a_x is array(0 to 5) of signed(x_in'high+1 downto 0);
        type a_y is array(0 to 5) of signed(y_in'high+1 downto 0);
        type a_x_delay is array(0 to 5) of std_logic_vector(x_in'high downto 0);
        type a_y_delay is array(0 to 5) of std_logic_vector(y_in'high downto 0);
        signal x : a_x := (others => (others => '0'));
        signal y : a_y := (others => (others => '0'));
        signal x_delay : a_x_delay := (others => (others => '0'));
        signal y_delay : a_y_delay := (others => (others => '0'));
    begin
        magnitude_out <= std_logic_vector(y(5));
        x_out <= x_delay(x_delay'high);
        y_out <= y_delay(y_delay'high);

        process(clk)
        begin
            if rising_edge(clk) then
                if x(4) >= 0 then
                    -- x(5) is not needed
                    y(5) <= y(4) + x(4)(x(4)'high downto 4);
                else
                    -- x(5) is not needed
                    y(5) <= y(4) - x(4)(x(4)'high downto 4);
                end if;

                if x(3) >= 0 then
                    x(4) <= x(3) - y(3)(y(3)'high downto 3);
                    y(4) <= y(3) + x(3)(x(3)'high downto 3);
                else
                    x(4) <= x(3) + y(3)(y(3)'high downto 3);
                    y(4) <= y(3) - x(3)(x(3)'high downto 3);
                end if;

                if x(2) >= 0 then
                    x(3) <= x(2) - y(2)(y(2)'high downto 2);
                    y(3) <= y(2) + x(2)(x(2)'high downto 2);
                else
                    x(3) <= x(2) + y(2)(y(2)'high downto 2);
                    y(3) <= y(2) - x(2)(x(2)'high downto 2);
                end if;

                if x(1) >= 0 then
                    x(2) <= x(1) - y(1)(y(1)'high downto 1);
                    y(2) <= y(1) + x(1)(x(1)'high downto 1);
                else
                    x(2) <= x(1) + y(1)(y(1)'high downto 1);
                    y(2) <= y(1) - x(1)(x(1)'high downto 1);
                end if;

                if x(0) >= 0 then
                    x(1) <= x(0) - y(0)(y(0)'high downto 0);
                    y(1) <= y(0) + x(0)(x(0)'high downto 0);
                else
                    x(1) <= x(0) + y(0)(y(0)'high downto 0);
                    y(1) <= y(0) - x(0)(x(0)'high downto 0);
                end if;

                if y_in(y_in'high) = '1' then
                    x(0) <= signed(x_in(x_in'high) & x_in);
                    y(0) <= signed(to_signed(0,y_in'length+1)-signed(y_in));
                else
                    x(0) <= signed(x_in(x_in'high) & x_in);
                    y(0) <= signed(y_in(y_in'high) & y_in);
                end if;

                -- Delay to output the inputs, so they are aligned with the magnitudes
                x_delay(1 to 5) <= x_delay(0 to 4);
                y_delay(1 to 5) <= y_delay(0 to 4);
                x_delay(0) <= x_in;
                y_delay(0) <= y_in;
            end if;
        end process;
    end Behavioral;

    Chaining the two together, and it seems to work. Top trace is the input, second trace is the delayed input, third is the delayed output of the Hilbert filter, and the last is the scaled magnitude of the complex x+iy signal.
    NOTE: I know for sure that these are buggy (as they have range overflows), but they should give the idea of how @Ahmed Alfadhel could implement it.
     

    magnitude.vhd hilbert_transformer.vhd tb_hilbert_transformer.vhd
  12. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Going to incorporate it into my (MCU based) guitar tuner... but it is a nice tool to have in the kit.
  13. Like
    Ahmed Alfadhel reacted to [email protected] in Enevlope Detection using FPGA board   
    @hamster,
    Not bad, not bad at all ... just some feedback for you though:
    - The "official" Hilbert transform tap generation suffers from the same Gibbs phenomena that keeps folks from using the "ideal lowpass filter" (i.e. sin x/x).  You could "window" the filter to get better performance, or you could try using Parks-McClellan to get better taps.
    - There are tricks to designing filters with quantized taps as well ... however the ones I know are ad-hoc and probably about the same as what you did above.
    - There's symmetry in the filter.  For half as many multiplies you can take sample differences, and then apply the multiplies to those sample differences.
    Other than that, pretty cool!  Did you find anything useful to test it on?
    Dan
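The symmetry remark can be illustrated with a short C sketch. It is not code from the thread: the function names are hypothetical, and the tap values are assumed to be the ones from hamster's hilbert_transformer example in this thread (kernel8 through kernel14, with the mirrored taps negated). The direct form spends one multiply per non-zero tap, while the folded form subtracts mirrored samples first and multiplies once per pair.

```c
#include <stdint.h>

/* Tap magnitudes for delay-line positions 8, 10, 12 and 14; positions
 * 6, 4, 2 and 0 carry the same values with the opposite sign. */
static const int32_t TAPS[4] = { 326, 109, 66, 47 };

/* Direct form: eight multiplies, one per non-zero tap. */
int32_t hilbert_direct(const int32_t d[15])
{
    int32_t acc = 0;
    for (int k = 0; k < 4; k++) {
        acc += d[8 + 2 * k] * TAPS[k];   /* d[8], d[10], d[12], d[14] */
        acc += d[6 - 2 * k] * -TAPS[k];  /* d[6], d[4],  d[2],  d[0]  */
    }
    return acc;
}

/* Folded form: take the sample differences first, then four multiplies. */
int32_t hilbert_folded(const int32_t d[15])
{
    int32_t acc = 0;
    for (int k = 0; k < 4; k++)
        acc += (d[8 + 2 * k] - d[6 - 2 * k]) * TAPS[k];
    return acc;
}
```

Both functions return the same sum for any delay-line contents. In hardware the folded form trades four multipliers for four extra subtractors, at the cost of one more bit of width on each difference.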
  14. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    I was intrigued enough by the Hilbert Transform to actually learn, experiment and implement it in VHDL. The math behind it is pretty nifty.
    Here's the very naively implemented example, using a short FIR filter.
    You can find this, a test bench and simulation output at http://hamsterworks.co.nz/mediawiki/index.php/Hilbert_Transform
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity hilbert_transformer is
        Port ( clk      : in  STD_LOGIC;
               real_in  : in  STD_LOGIC_VECTOR (9 downto 0);
               real_out : out STD_LOGIC_VECTOR (10 downto 0) := (others => '0');
               imag_out : out STD_LOGIC_VECTOR (10 downto 0) := (others => '0'));
    end hilbert_transformer;

    architecture Behavioral of hilbert_transformer is
        -- Constants are 2/(n * pi) * 512, for n of -7,-5,-3,-1,1,3,5,7
        constant kernel0  : signed(real_in'length-1 downto 0) := to_signed( -47, real_in'length);
        constant kernel2  : signed(real_in'length-1 downto 0) := to_signed( -66, real_in'length);
        constant kernel4  : signed(real_in'length-1 downto 0) := to_signed(-109, real_in'length);
        constant kernel6  : signed(real_in'length-1 downto 0) := to_signed(-326, real_in'length);
        constant kernel8  : signed(real_in'length-1 downto 0) := to_signed( 326, real_in'length);
        constant kernel10 : signed(real_in'length-1 downto 0) := to_signed( 109, real_in'length);
        constant kernel12 : signed(real_in'length-1 downto 0) := to_signed(  66, real_in'length);
        constant kernel14 : signed(real_in'length-1 downto 0) := to_signed(  47, real_in'length);

        type a_delay is array (0 to 14) of signed(real_in'high downto 0);
        signal delay : a_delay := (others => (others => '0'));

        signal tap0  : signed(real_in'length+kernel0'length-1 downto 0)  := (others => '0');
        signal tap2  : signed(real_in'length+kernel2'length-1 downto 0)  := (others => '0');
        signal tap4  : signed(real_in'length+kernel4'length-1 downto 0)  := (others => '0');
        signal tap6  : signed(real_in'length+kernel6'length-1 downto 0)  := (others => '0');
        signal tap8  : signed(real_in'length+kernel8'length-1 downto 0)  := (others => '0');
        signal tap10 : signed(real_in'length+kernel10'length-1 downto 0) := (others => '0');
        signal tap12 : signed(real_in'length+kernel12'length-1 downto 0) := (others => '0');
        signal tap14 : signed(real_in'length+kernel14'length-1 downto 0) := (others => '0');
    begin
        process(clk)
            variable imag_tmp : signed(real_in'length*2-1 downto 0);
        begin
            if rising_edge(clk) then
                real_out <= std_logic_vector(resize(delay(8),real_out'length)); -- deliberately advanced by one due to latency
                imag_tmp := tap0 + tap2 + tap4 + tap6 + tap8 + tap10 + tap12 + tap14;
                imag_out <= std_logic_vector(imag_tmp(imag_tmp'high downto imag_tmp'high-imag_out'high));

                tap0  <= delay(0)  * kernel0;
                tap2  <= delay(2)  * kernel2;
                tap4  <= delay(4)  * kernel4;
                tap6  <= delay(6)  * kernel6;
                tap8  <= delay(8)  * kernel8;
                tap10 <= delay(10) * kernel10;
                tap12 <= delay(12) * kernel12;
                tap14 <= delay(14) * kernel14;

                -- Update the delay line
                delay(1 to 14) <= delay(0 to 13);
                delay(0) <= signed(real_in);
            end if;
        end process;
    end Behavioral;
  15. Like
    Ahmed Alfadhel reacted to xc6lx45 in Enevlope Detection using FPGA board   
    Yes, for an application with basic requirements, like receiver gain control, this will probably work just fine (it's equivalent to an analog envelope detector). Now it needs a fairly high bandwidth margin between the modulation and the carrier, and that may make it problematic in more sophisticated DSP applications (say "polar" signal processing when I try to reconstruct the signal from the envelope) where the tolerable noise level is orders of magnitude lower.
     
     
  16. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Oh, for what it's worth I've been toying with the Hilbert Transform. Here is an example of it:

    #include <math.h>
    #include <stdio.h>

    #define SAMPLES 1000
    #define HALF_WIDTH 11 /* e.g. 11 filters from -11 to 11 */

    float x[SAMPLES];

    int main(int argc, char *argv[]) {
        int i;

        /* Build some test data */
        for(i = 0; i < SAMPLES; i++) {
            x[i] = cos(2*M_PI*i/10.3);
        }

        /* Now apply the Hilbert Transform and see what we get */
        /* It should be close to sin(2*M_PI*i/10.3) */
        for(i = HALF_WIDTH; i < SAMPLES-HALF_WIDTH-1; i++) {
            double h = 0;
            int j;

            /* Apply the kernel */
            for(j = 1; j <= HALF_WIDTH; j+=2)
                h += (x[i-j]-x[i+j]) * 2.0/(j*M_PI);

            /* Print result */
            printf("%8.5f, %8.5f\n", x[i], h);
        }
    }
  17. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Oh having a look at the full signal chain, it looks like you just need to apply a low-pass filter on the absolute value of the signal. It might be just as simple as:
    if sample < 0 then
        filter := filter - filter/64 - sample;
    else
        filter := filter - filter/64 + sample;
    end if;

    The value of "64" changes depending on your sample rates and desired cutoff frequency. Or if your needs get very complex you might need to use a FIR low pass filter.
    Run some sample data through it in Matlab or Excel (or heavens forbid,  some C code) and see what happens.
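For anyone taking the "some C code" route, here is a minimal sketch of the same rectify-and-leak filter. It is an illustration rather than hamster's code: the function names and the `DECAY` constant are hypothetical, and the square-wave helper exists only to give the filter a known input.

```c
#include <stdint.h>

/* Divisor from the post; larger values give a lower cutoff frequency. */
#define DECAY 64

/* One filter update: leak 1/DECAY of the state, then add |sample|. */
int32_t envelope_step(int32_t filter, int32_t sample)
{
    int32_t rectified = (sample < 0) ? -sample : sample;
    return filter - filter / DECAY + rectified;
}

/* Hypothetical test helper: feed `steps` samples of a square wave that
 * alternates between +amplitude and -amplitude, and return the filter
 * state divided by DECAY (the filter's steady-state gain). */
int32_t envelope_of_square_wave(int32_t amplitude, int steps)
{
    int32_t filter = 0;
    for (int i = 0; i < steps; i++) {
        int32_t sample = (i & 1) ? -amplitude : amplitude;
        filter = envelope_step(filter, sample);
    }
    return filter / DECAY;
}
```

The state settles where the leak equals the input, that is at `DECAY` times the rectified level, so dividing by `DECAY` recovers the envelope. For example, `envelope_of_square_wave(1000, 2000)` returns 1000 once the filter has settled.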
  18. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Hi @Ahmed Alfadhel
    I had the C code handy because I have been working on an atan2(y,x) implementation for FPGAs, and had been testing ideas.
    I left it in C because I don't really know your requirements, but I wanted to give you a working algorithm, complete with proof that it does work, and so you can tinker with it, see how it works, and make use of it. Oh, and I must admit that it was also because I am also lazy 😀
    But seriously:
    - I don't know if you use VHDL or Verilog, or some HLS tool
    - I don't know if your inputs are 4 bits or 40 bits long,
    - I don't know if you need the answer to be within 10% or 0.0001%
    - I don't know if it has to run at 40 MHz or 400 MHz
    - I don't know if you have 1000s of cycles to process each sample, or just one.
    - I don't even know if you need the algorithm at all!
    But it has been written to be trivially converted to any HDL as it only uses bit shifts and addition/subtraction. But maybe more importantly you can then use it during any subsequent debugging to verify that you correctly implemented it.
    For an example of how trivial it is to convert to HDL:
    tx = x; ty = y;
    if(x > 0) { x += -ty/8; y += tx/8;} else { x += ty/8; y += -tx/8;}

    could be implemented as

    -- tx and ty hold the old values, matching the C code above
    tx := x; ty := y;
    IF x(x'high) = '0' THEN
        x := x - resize(ty(ty'high downto 3), ty'length);
        y := y + resize(tx(tx'high downto 3), tx'length);
    ELSE
        x := x + resize(ty(ty'high downto 3), ty'length);
        y := y - resize(tx(tx'high downto 3), tx'length);
    END IF;

    My suggestion is that should you choose to use it, compile the C program, making the main() function a sort of test bench, and then work out exactly what you need to implement in your HDL. You will then spend very little time writing, debugging and improving the HDL because you will have a very clear idea of what you are implementing.
  19. Like
    Ahmed Alfadhel reacted to hamster in Enevlope Detection using FPGA board   
    Hi, sorry to barge in, but if anybody can point me to the Hilbert Transformer info I would be very grateful.
    However, here is an FPGA friendly way to calculate   mag = sqrt(x*x+y*y), with about a 99% accuracy. You can easily see the pattern to get whatever accuracy you need.
     
    #include <math.h>
    #include <stdio.h>

    #define M_SCALE (16) /* Scaling for the magnitude calc */

    void cordic_mag(int x,int y, int *mag) {
        int tx, ty;

        x *= M_SCALE;
        y *= M_SCALE;

        /* This step makes the CORDIC gain about 2 */
        if(y < 0) {
            x = -(x+x/4-x/32-x/256);
            y = -(y+y/4-y/32-y/256);
        } else {
            x = (x+x/4-x/32-x/256);
            y = (y+y/4-y/32-y/256);
        }

        tx = x; ty = y;
        if(x > 0) { x += -ty/1;  y += tx/1;}  else { x += ty/1;  y += -tx/1;}
        tx = x; ty = y;
        if(x > 0) { x += -ty/2;  y += tx/2;}  else { x += ty/2;  y += -tx/2;}
        tx = x; ty = y;
        if(x > 0) { x += -ty/4;  y += tx/4;}  else { x += ty/4;  y += -tx/4;}
        tx = x; ty = y;
        if(x > 0) { x += -ty/8;  y += tx/8;}  else { x += ty/8;  y += -tx/8;}
        tx = x; ty = y;
        if(x > 0) { x += -ty/16; y += tx/16;} else { x += ty/16; y += -tx/16;}

        *mag = ty/M_SCALE/2; /* the 2 is to remove the CORDIC gain */
    }

    int main(int argc, char *argv[]) {
        int i;
        int cases = 300;

        printf("Input Calculated CORDIC Error\n");
        for(i = 0; i < cases; i++) {
            float angle = 2*M_PI*i/cases;
            int x = sin(angle)*20000;
            int y = cos(angle)*20000;
            int mag, a_mag = (int)sqrt(x*x+y*y);

            cordic_mag(x,y, &mag);
            printf("%6i %6i = %6i vs %6i %4i\n", x, y, a_mag, mag, mag-a_mag);
        }
    }

    Oh, here is the output with a couple more iterations added.

    Input Calculated CORDIC Error
         0  20000 =  20000 vs  19999   -1
       418  19995 =  19999 vs  19995   -4
       837  19982 =  19999 vs  20001    2
      1255  19960 =  19999 vs  19998   -1
      1673  19929 =  19999 vs  19995   -4
      2090  19890 =  19999 vs  20001    2
      2506  19842 =  19999 vs  19998   -1
      2921  19785 =  19999 vs  19996   -3
      3335  19719 =  19999 vs  20001    2
      3747  19645 =  19999 vs  19998   -1
      4158  19562 =  19999 vs  19996   -3
      4567  19471 =  19999 vs  20001    2
      4973  19371 =  19999 vs  19997   -2
      5378  19263 =  19999 vs  19996   -3
      5780  19146 =  19999 vs  20001    2
      6180  19021 =  19999 vs  19998   -1
      6577  18887 =  19999 vs  19999    0
      6971  18745 =  19999 vs  20001    2
      7362  18595 =  19999 vs  19993   -6
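The "CORDIC gain about 2" comment in the listing can be checked by hand. As a worked sketch (the arithmetic is not from the post, it ignores integer truncation, and it assumes the standard result that a CORDIC stage with shift $i$ stretches the vector by $\sqrt{1+2^{-2i}}$), the shift-and-add pre-scale factor $s$ times the combined gain $K_5$ of the five rotation stages in the listing comes out just under 2:

```latex
\begin{aligned}
s   &= 1 + \tfrac{1}{4} - \tfrac{1}{32} - \tfrac{1}{256} = 1.21484375,\\
K_5 &= \prod_{i=0}^{4} \sqrt{1 + 2^{-2i}} \approx 1.6457,\\
s \cdot K_5 &\approx 1.9993 \approx 2 .
\end{aligned}
```

Adding further rotation stages barely changes the product, since the per-stage gain approaches 1 very quickly.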
  20. Like
    Ahmed Alfadhel reacted to [email protected] in Enevlope Detection using FPGA board   
    @hamster,
    The Hilbert transform is actually pretty simple.  You can build it from a half band lowpass filter that's just shifted up in frequency so that it cuts off at 0Hz and Nyquist.  Perhaps this core might give you some ideas, although ... if I recall correctly this implementation only calculates the imaginary part of the Hilbert transform.  (The real part doesn't change, but it does need to be delayed so that the two match.)
    @Ahmed Alfadhel,
    I agree with @hamster, you are asking the wrong question.  The "envelope" of a signal is the amplitude of the function that gets multiplied by the carrier, such as the m(t) in the expression  m(t)cos(2*pi*f_c *t).  An FSK signal should not have any envelope to it at all, since all the information is contained in the frequency--something like the m(t) in cos(2*pi*(f_c + m(t))*t).  Technically, the signal that results should all be at the same (complex) amplitude so ... something in your question doesn't make sense.  Amplitude shift keying, phase shift keying, quadrature amplitude modulation, etc., those will all have non-constant envelopes to them, but they don't match the drawings in your second figure.  Now if you apply a matched filter to your FSK signal, that might put a bit of an amplitude on the result ... but that's another story.
    My guess is that either you aren't working with FSK, or there's something missing from your charts up above--the FM discriminator, but that's a longer discussion to have elsewhere.  Just to make matters worse, some FM discriminators are susceptible to amplitude variations and ... that'd really mess up what you are trying to accomplish above.
    Dan
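The distinction drawn above between amplitude and frequency modulation can be written out compactly. This is a sketch in standard textbook notation rather than anything posted in the thread: $A$ is an assumed constant carrier amplitude, $\phi(t)$ is an assumed phase term carrying the frequency modulation (a cleaner form of the informal FSK expression in the post), and $\hat{s}(t)$ denotes the Hilbert transform of $s(t)$.

```latex
\begin{aligned}
\text{AM:}\quad  s(t) &= m(t)\cos(2\pi f_c t), &
  \text{envelope} &= |m(t)|,\\
\text{FSK:}\quad s(t) &= A\cos\bigl(2\pi f_c t + \phi(t)\bigr), &
  \text{envelope} &= A \quad \text{(constant)},\\
\text{in general:}\quad z(t) &= s(t) + j\,\hat{s}(t), &
  \text{envelope} &= |z(t)| = \sqrt{s(t)^2 + \hat{s}(t)^2}.
\end{aligned}
```

The last line is what the Hilbert-transform-plus-magnitude approach discussed elsewhere in this thread computes.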
  21. Like
    Ahmed Alfadhel reacted to xc6lx45 in Enevlope Detection using FPGA board   
    Well yes and no. The question I'd ask is, can you use a local oscillator somewhere in your signal path with a 90 degree offset replica. In many cases this is trivially easy ("trivially" because I can e.g. divide digitally from double frequency or somewhat less trivially, use, say, a polyphase filter. In any way, it's probably easier on the LO than on the information signal because it's a single discrete frequency at a time, where the Hilbert transform approach needs to deal with the information signal bandwidth).
    If so, downconvert with sine and cosine ("direct conversion") and the result will be just the same. After lowpass filtering, square, add, take the square root, and there's your envelope. When throughput / cost matters (think "envelope tracking" on cellphones) it is not uncommon to design RTL in square-of-envelope units to avoid the square root operation. Or, if accuracy is not that critical, consider a nonlinear bit-level approximation (see "root of less evil", R. Lyons).
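    The direct-conversion recipe above can be sketched as follows. This is a floating-point illustration under stated assumptions, not RTL: the function name, the moving-average lowpass, and the gain of 2 on the mixers are choices made for this example. The moving average is assumed to span a whole number of carrier periods so that the double-frequency mixing product cancels.

```python
import math

def iq_envelope(x, f_lo, avg_len, squared=False):
    """Envelope via direct conversion: mix with cos/sin, lowpass, combine.

    f_lo is in cycles per sample; avg_len is the moving-average length.
    """
    i_mix = [2.0 * s * math.cos(2 * math.pi * f_lo * k) for k, s in enumerate(x)]
    q_mix = [-2.0 * s * math.sin(2 * math.pi * f_lo * k) for k, s in enumerate(x)]
    env = []
    for k in range(avg_len - 1, len(x)):
        i_lp = sum(i_mix[k - avg_len + 1:k + 1]) / avg_len  # lowpass on I
        q_lp = sum(q_mix[k - avg_len + 1:k + 1]) / avg_len  # lowpass on Q
        power = i_lp * i_lp + q_lp * q_lp                   # square and add
        env.append(power if squared else math.sqrt(power))
    return env

# Carrier at 0.1 cycles/sample, amplitude 0.7, arbitrary phase offset.
carrier = [0.7 * math.cos(2 * math.pi * 0.1 * k + 0.3) for k in range(100)]
env = iq_envelope(carrier, f_lo=0.1, avg_len=10)
env_sq = iq_envelope(carrier, f_lo=0.1, avg_len=10, squared=True)
```

    With `squared=True` the result stays in square-of-envelope units, which is the variant that skips the square root.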
    Of course, Hilbert transform is a viable alternative, just a FIR filter (if complex-valued).
    In case you can't tell the answer right away, I recommend you experiment in the design tools to see what happens if you try to reach 0 Hz (hint: "Time-bandwidth product, Mr. Heisenberg". Eventually it boils down to fractional bandwidth, and phase-shifting DC remains an unsolved problem...).
     
     
  22. Like
    Ahmed Alfadhel reacted to [email protected] in Visualizing FIR filter output on oscilloscope   
    @Ahmed Alfadhel,
    SPI transfers don't need to be 32 bits in width.  Neither do they need to be 8 bits in width.  This particular SPI device appears to want 16-bit data.  I can't speak to whether or not Xilinx's SPI library is broken and unable to handle 16-bit transfers, but I would be surprised if that were the case.
    Looking over your block design, it looks like you are running a DDS into an FIR filter and then .... reading the results from a GPIO port??  That doesn't make any sense.
    Signal processing in general is very sensitive to lost/dropped/inserted samples.  CPUs aren't known for processing data at known rates, and MicroBlaze CPUs aren't any different.   That means I'd expect your design to have gaps where data is getting dropped just from examining this structure alone.
    Consider redesigning your structure to (ideally) remove the MicroBlaze/CPU from your data path entirely.  If  you can't do that, then at least place FIFOs between the data source and the CPU, as well as between the CPU and the DAC.  Otherwise you are likely to spend quite a bit of time chasing even stranger bugs than this one.
    Dan
  23. Like
    Ahmed Alfadhel reacted to [email protected] in Visualizing FIR filter output on oscilloscope   
    @Ahmed Alfadhel,
    The PModDA3 has a 16-bit DAC.  Why are you sending it 8-bits at a time?  Why are your samples sourced from a 32-bit value?  Have you "packed" them somehow and so need to unpack them?
    Dan
  24. Like
    Ahmed Alfadhel reacted to [email protected] in Visualizing FIR filter output on oscilloscope   
    @Ahmed Alfadhel,
    Any ideas?  Yes--watch the sign bit.  Some DACs count from zero to maximum, others from a negative number to a maximum.  If you get the sign bit wrong, you'll get a bunch of sudden and large jumps anytime your output signal crosses zero.
    Dan
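    To make the sign-bit point concrete, here is a small sketch. It assumes a DAC that expects unsigned codes counting from zero to maximum, fed from a filter that produces signed two's-complement samples; `to_offset_binary` is a hypothetical helper name, and whether a particular DAC needs this conversion has to be checked against its datasheet.

```python
def to_offset_binary(sample, bits=16):
    """Map a signed two's-complement sample onto an unsigned DAC code."""
    mask = (1 << bits) - 1
    return (sample & mask) ^ (1 << (bits - 1))  # flip the sign bit

samples = [-2, -1, 0, 1, 2]                      # signal crossing zero
raw = [s & 0xFFFF for s in samples]              # sign bit left alone
fixed = [to_offset_binary(s) for s in samples]   # sign bit flipped
print(raw)    # [65534, 65535, 0, 1, 2]
print(fixed)  # [32766, 32767, 32768, 32769, 32770]
```

    The raw codes jump from near full scale to zero at the zero crossing, while the converted codes step smoothly through mid-scale.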
  25. Like
    Ahmed Alfadhel reacted to xc6lx45 in IIR compiler   
    IIR filters are more challenging for several reasons (bitwidth / coefficient quantization, internal gain boosting / biquad Q, limit cycles, nonlinear group delay, non-feedforward data flow, ...)
    You will probably find that once you've gone all the way through a fixed point implementation, IIR filters are not as attractive as suggested by the MAC count.
    Of course, they do exist and it may work just fine (depends also on the values of the coefficients you're trying to implement).

    Your filtering problem from the other post had a fairly narrow passband and a huge stopband. This is a very expensive use of a FIR filter...

    If you use a more sophisticated (multirate) architecture, you'll be able to get the same or better filtering with maybe 1..5 % of the MAC count.

    One approach is:
    - design an inexpensive band stop that suppresses the alias band of the following decimation step
    - discard every 2nd sample (said alias band folds over your wanted signal but we've made sure there is no significant energy)
    - repeat the procedure as many times as possible
    - design a final filter that provides steep edges and equalizes the sum of all earlier stages
    The point is that the cost of later stages gets much lower because the sampling rate drops (you may actually find most of the MACs get used in the first stage, and the last one is basically for free thanks to the much lower sampling rate).
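    The first three steps of that procedure can be sketched as below. This is only an illustration under assumptions: the 3-tap filter `(0.25, 0.5, 0.25)`, which places a null at Nyquist, stands in for the "inexpensive band stop", and the function names are hypothetical. The final steep-edged equalizing filter is not shown. A real design would choose each stage's taps from the actual signal and alias bands.

```python
def decimate_by_2(x, taps=(0.25, 0.5, 0.25)):
    """Suppress the band around Nyquist, then keep every 2nd sample.

    Only the outputs that survive decimation are computed.
    """
    out = []
    for k in range(len(taps) - 1, len(x), 2):
        out.append(sum(t * x[k - i] for i, t in enumerate(taps)))
    return out

def cascade(x, stages):
    """Repeat the filter-and-discard step 'stages' times."""
    for _ in range(stages):
        x = decimate_by_2(x)
    return x

def macs_per_input_sample(stages, taps_per_stage=3):
    """Stage s produces one output per 2**(s+1) input samples."""
    return sum(taps_per_stage / 2 ** (s + 1) for s in range(stages))

# DC (wanted) plus a tone at Nyquist (would alias onto DC if not removed).
x = [1.0 + (-1.0) ** k for k in range(64)]
y = cascade(x, stages=3)
cost = macs_per_input_sample(3)  # 1.5 + 0.75 + 0.375 = 2.625
```

    Each stage costs half as much per input sample as the one before it, so the total stays below twice the cost of the first stage no matter how many stages are added.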

    Now this isn't trivial, people like me get paid for this... fortunately there aren't (yet) wizards for everything.