Putting the NIC in the second x16 slot may drop the other x16 slot to x8, but the PCIe NIC itself should be faster. If you are OK with your graphics card running at x8, then you could just move the GPU into the other x16 slot and then plug the NIC into the x1. Most of the extension cables do look to be right around 1cm.
Full-duplex means uploading and downloading at the same time, so full-duplex Gbit is generally only something you can saturate on the local network. Uploading alone only uses one direction (effectively half-duplex), so it cannot exceed 1Gbit of total throughput (which the PCI bus has enough bandwidth to fully supply), and no game is going to come anywhere near that kind of download bandwidth. Saturating both directions over the internet would require 1000/1000 service from your ISP and either a direct connection to the modem or a PC acting as the router: SmallNetBuilder has tested many routers that can move WAN-to-LAN or LAN-to-WAN at ~940Mbps, but none that can do both at the same time. They just don't have enough CPU.
If you are moving that kind of data, then it's probably more important to get a NIC that can offload as much of the work as possible rather than leaving it to the CPU (which can take up to 2GHz of a single core just to do this), so a discrete card can be a substantial upgrade over the onboard. IPv6 has no IP header checksum at all (that offload only applies to IPv4), but the TCP and UDP checksums still have to be calculated over a pseudo-header plus the payload, and in IPv6 the UDP checksum is no longer optional because there is no IP header checksum to fall back on. Multi-port Intel cards can also offload IPsec AH and ESP used for VPNs.
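To give an idea of what checksum offload saves the CPU from doing on every packet, here is a rough Python sketch of the UDP-over-IPv6 checksum as I understand it from RFC 1071 and RFC 8200. The function names (internet_checksum, pseudo_header6, udp6_checksum) and the example addresses are made up for illustration; a real NIC does this in hardware and a real stack does it in optimized C, so treat this as a sketch only.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def pseudo_header6(src: str, dst: str, length: int, next_header: int) -> bytes:
    """IPv6 pseudo-header: addresses, upper-layer length, 3 zero bytes, next header."""
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I", length)
        + b"\x00\x00\x00"
        + bytes([next_header])
    )


def udp6_checksum(src: str, dst: str, udp_segment: bytes) -> int:
    """Checksum for a UDP header + payload; the checksum field must be zero on input."""
    pseudo = pseudo_header6(src, dst, len(udp_segment), 17)  # 17 = UDP
    csum = internet_checksum(pseudo + udp_segment)
    return csum or 0xFFFF  # a computed zero is sent as 0xFFFF


# Example with documentation-range addresses (assumed values, not from the post).
payload = b"hello"
header = struct.pack("!HHHH", 12345, 53, 8 + len(payload), 0)
print(hex(udp6_checksum("2001:db8::1", "2001:db8::2", header + payload)))
```

Every byte of every packet goes through that summing loop, which is why pushing it onto the card matters once you get near line rate.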
Not even 10 Gigabit Ethernet cards need more than x4. Many people have also found when using RAID cards that CPU-connected PCIe lanes often only work with GPUs, so the x1 card may not even work in the other x16 slot. Some boards do have a 3rd or 4th x16 slot that is attached to the chipset and only wired x4 (the CPU-to-chipset connection is itself only equivalent to x4); those should work.
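If you want to check the lane math yourself, here is a small Python sketch. The per-lane rates and line encodings are the published PCIe figures (2.5 and 5 GT/s with 8b/10b for gen 1 and 2, 8 GT/s with 128b/130b for gen 3); the function names are just mine, and the numbers ignore packet and protocol overhead, so real usable throughput is somewhat lower.

```python
# Per-lane raw rate in GT/s and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def pcie_gbps(generation: int, lanes: int) -> float:
    """Approximate bandwidth per direction in Gbit/s, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


def fits(nic_gbps: float, generation: int, lanes: int) -> bool:
    """True if the slot can carry the NIC's line rate in each direction."""
    return pcie_gbps(generation, lanes) >= nic_gbps


for gen, lanes, nic in [(1, 1, 1.0), (2, 4, 10.0), (3, 4, 10.0)]:
    print(f"gen{gen} x{lanes}: {pcie_gbps(gen, lanes):.1f} Gbit/s, "
          f"fits {nic:g}GbE: {fits(nic, gen, lanes)}")
```

So a gigabit NIC is comfortable even in a gen 1 x1 slot, and a 10 Gigabit card fits in x4 as long as the slot is gen 2 or newer (gen 1 x4 comes up a bit short by this estimate).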