I was able to get it working using the MCS method in Vivado. It didn't work at first, but eventually it did, with no changes to the SPI design or anything else. I am now testing again with another program using the same MCS method. I have listed a few points below.
1) I made sure to run only through "Implementation" first and then open the implemented design. Once it was open, I could change the bitstream settings to also generate a .bin (needed for SDK) and to modify the advanced settings. These let you enable compression, which I did, and set the SPI configuration for the device, where you select SPI x4. Once you apply the settings, you will be prompted to save them as constraints; then regenerate the bit file.
2) Once the initial bit file had been exported as a compressed .bin and sent to SDK, I generated the spi_srec bootloader that SDK provides. Once it was configured, I changed the compiler flags to -g and -Os (-Os optimizes for size), turned on the -s (strip symbols) flag in the linker, and left VERBOSE on in the .c file so I could see output on the UART. For all other settings, such as the xilisf changes, I followed the guide.
3) Before going back to Vivado to make the MCS, I tested the compiled Hello_World, the bootloader, and the FPGA bitstream from SDK. I loaded an SREC version of Hello_World into the flash first, power-cycled the board, programmed the FPGA, and then ran the bootloader on bare metal in debug mode. This confirmed that the bootloader printed everything I expected and would correctly run the SREC version of Hello_World. I then generated my own SREC file (SDK only creates one in its cache and then deletes it) using the SDK's bash shell under the Xilinx tools. The command is "mb-objcopy -O srec <name_of_program>.elf <name_of_program>.srec".
4) Back in Vivado, I attached the bootloader .elf to the design in IP Generator for synthesis, and also tagged it for simulation.
5) Once the bit file had generated, I opened the Hardware Manager and programmed the FPGA. With the programmed FPGA open, I added the required memory device, and under "Tools" in Vivado used the option to generate an MCS file. Make sure you use the device flag, select the correct Spansion model, and then fill out the fields accordingly. I also checked the "overwrite" and "checksum" options so that I got a clean new MCS file (whatever you named it) and a checksum to use later for verification when programming the flash.
6) From the Hardware Manager, I right-clicked the memory device and selected the option to program it. I selected the .mcs file and the checksum file for the next two fields, checked all the boxes, let the device program, and then reset the device. It worked immediately.
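For step 1, one quick way to confirm that the .bin handed to SDK is a raw bitstream, rather than a .bit with its header, is to look at how it starts. The Python sketch below assumes a 7-series part (my assumption; the device family isn't stated above) and searches for the bus-width auto-detection pattern (0x000000BB 0x11220044) and the sync word (0xAA995566) described in Xilinx UG470. In a raw .bin only 0xFF padding should come before them. The function names are hypothetical.

```python
"""Check that a Vivado .bin looks like a raw 7-series bitstream (sketch)."""

# 7-series configuration preamble words (see UG470); device family is assumed.
BUS_WIDTH_DETECT = bytes.fromhex("000000BB11220044")
SYNC_WORD = bytes.fromhex("AA995566")


def inspect_bitstream(data: bytes) -> dict:
    """Report where the preamble words sit and whether only padding precedes them."""
    bw = data.find(BUS_WIDTH_DETECT)
    sync = data.find(SYNC_WORD)
    return {
        "size": len(data),
        "bus_width_detect_offset": bw,  # -1 if not found
        "sync_word_offset": sync,       # -1 if not found
        # A raw .bin has only 0xFF padding before the bus-width pattern;
        # a .bit file has its text header there instead.
        "raw_bitstream": (
            bw >= 0 and sync > bw and all(b == 0xFF for b in data[:bw])
        ),
    }


def inspect_file(path: str) -> dict:
    """Read a bitstream file from disk and inspect it."""
    with open(path, "rb") as f:
        return inspect_bitstream(f.read())
```

For example, inspect_file("design_1_wrapper.bin") (hypothetical file name) should report raw_bitstream as True, while the matching .bit should report False because of its header.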
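For step 3, the SREC made with mb-objcopy is plain text, so it can be sanity-checked on the host before it goes into flash. This Python sketch (function names are my own) validates each record's byte count and checksum, which is the ones' complement of the low byte of the sum of the count, address and data bytes, and reports how many data bytes the S1/S2/S3 records carry and the address range they span.

```python
"""Validate an SREC file produced by mb-objcopy (sketch)."""

# Address field width in bytes for each S-record type (S4 is reserved).
ADDRESS_BYTES = {"0": 2, "1": 2, "2": 3, "3": 4, "5": 2, "6": 3,
                 "7": 4, "8": 3, "9": 2}
DATA_TYPES = {"1", "2", "3"}


def parse_srec_line(line: str):
    """Return (type, address, data) for one record, or raise ValueError."""
    line = line.strip()
    if len(line) < 4 or line[0] != "S" or line[1] not in ADDRESS_BYTES:
        raise ValueError(f"not an S-record: {line!r}")
    raw = bytes.fromhex(line[2:])  # count, address, data, checksum
    n_addr = ADDRESS_BYTES[line[1]]
    if raw[0] != len(raw) - 1 or raw[0] < n_addr + 1:
        raise ValueError("byte count does not match record length")
    # Checksum: ones' complement of the low byte of the sum of the
    # count, address and data bytes.
    expected = 0xFF - (sum(raw[:-1]) & 0xFF)
    if raw[-1] != expected:
        raise ValueError(f"checksum mismatch: {raw[-1]:02X} != {expected:02X}")
    address = int.from_bytes(raw[1:1 + n_addr], "big")
    return line[1], address, raw[1 + n_addr:-1]


def summarize_srec(text: str):
    """Return (data bytes, lowest address, highest address + 1) over S1/S2/S3."""
    total, lo, hi = 0, None, None
    for line in text.splitlines():
        if not line.strip():
            continue
        rtype, address, data = parse_srec_line(line)
        if rtype in DATA_TYPES:
            end = address + len(data)
            total += len(data)
            lo = address if lo is None else min(lo, address)
            hi = end if hi is None else max(hi, end)
    return total, lo, hi
```

Calling summarize_srec on the text of <name_of_program>.srec raises ValueError at the first corrupt record; otherwise it returns the byte count and address range, which should line up with where the program is linked to run.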
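For steps 5 and 6, the .mcs that Vivado writes is in Intel HEX format, so the same kind of host-side check works there. This sketch is my own helper and is independent of Vivado's checksum option: it verifies each record's checksum (all bytes of a record, checksum included, must sum to 0 mod 256), follows type 04 extended linear address records, and reports the flash range covered by the data records. That is a quick way to see where the image ends before deciding where anything else goes in flash.

```python
"""Check an .mcs (Intel HEX) file and report the flash range it covers (sketch)."""


def parse_hex_line(line: str):
    """Return (record_type, offset, data) for one Intel HEX record."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError(f"not an Intel HEX record: {line!r}")
    raw = bytes.fromhex(line[1:])  # count, offset(2), type, data, checksum
    if len(raw) < 5 or raw[0] != len(raw) - 5:
        raise ValueError("byte count does not match record length")
    # The sum of every byte, including the checksum, must be 0 mod 256.
    if sum(raw) & 0xFF:
        raise ValueError("checksum mismatch")
    offset = int.from_bytes(raw[1:3], "big")
    return raw[3], offset, raw[4:-1]


def mcs_extent(text: str):
    """Return (data bytes, lowest address, highest address + 1) over data records."""
    upper = 0  # upper 16 address bits, set by type 04 records
    total, lo, hi = 0, None, None
    for line in text.splitlines():
        if not line.strip():
            continue
        rtype, offset, data = parse_hex_line(line)
        if rtype == 0x00:  # data record
            start = (upper << 16) + offset
            end = start + len(data)
            total += len(data)
            lo = start if lo is None else min(lo, start)
            hi = end if hi is None else max(hi, end)
        elif rtype == 0x04:  # extended linear address record
            upper = int.from_bytes(data, "big")
        elif rtype == 0x01:  # end-of-file record
            break
    return total, lo, hi
```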
NOTE: When resetting the device, I still had issues with it not running the bootloader unless I held the button down for a count of two; once I did that, it worked every time. I am not sure whether some residual power keeps part of the device initialized during a quick button press, or whether it is just my board.
I did notice that same option in the SPI IP and changed it from Micron to Spansion; I'm not sure if that helped in any way, since I was already specifying the correct Spansion model in SDK. I do believe SDK is wiping the whole flash memory, so any user program gets overwritten. At least, that is what my troubleshooting suggests. I can't find any setting in SDK to change that.
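If the bitstream and a user program share the same flash, the offset the bootloader reads the application SREC from has to start on an erase-sector boundary past the end of the bitstream; otherwise erasing sectors for one image can overwrite part of the other. In the Xilinx SPI SREC bootloader template I believe this offset is the FLASH_IMAGE_BASEADDR define in blconfig.h, but treat that name as an assumption and check your copy. The sketch below computes the first usable offset from the .bin size, assuming the bitstream starts at flash address 0 and a 64 KiB erase sector (check your Spansion part's datasheet for the real sector size). Function names are hypothetical.

```python
"""Compute a sector-aligned flash offset for the application image (sketch)."""
import os

# Assumed 64 KiB erase sector; confirm against your flash part's datasheet.
SECTOR_SIZE = 64 * 1024


def align_up(value: int, alignment: int) -> int:
    """Round a non-negative value up to the next multiple of alignment (> 0)."""
    return -(-value // alignment) * alignment


def app_flash_offset(bitstream_bytes: int, sector_size: int = SECTOR_SIZE,
                     bitstream_base: int = 0) -> int:
    """First sector boundary at or after the end of a bitstream at bitstream_base."""
    return align_up(bitstream_base + bitstream_bytes, sector_size)


def app_flash_offset_for_file(bin_path: str, sector_size: int = SECTOR_SIZE) -> int:
    """Same as app_flash_offset, using the size of a .bin file on disk."""
    return app_flash_offset(os.path.getsize(bin_path), sector_size)
```

With a hypothetical 2,192,012-byte bitstream and 64 KiB sectors, app_flash_offset returns 0x220000; a compressed bitstream is smaller, so the offset can move down accordingly. This only avoids overlap between the two images; it won't help if the flash programmer erases the entire device.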
Overall, I used a design and process that combined the Avnet guide you provided, the Digilent one I originally posted, and the MCS guide from the Xilinx forum.
I am currently working on getting Vitis 2019.2 installed to see how that changes things, especially since it seems that, moving forward, Xilinx is going to bundle all its tools together into that package. I will see if I can post a link to my Google Drive later with my package files so you can look at my project and how I set things up.