Regarding teraflops: one reason is that the teraflop figure in the RX 480 specifications is calculated at the maximum boost clock, not at the clock the card sustains on average. For the GTX 1060 it is the other way around: the teraflop figure in the specs is on the low side, because the card really runs at much higher average clocks than stated anywhere in the specs.
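To put rough numbers on that, here is a small sketch of how the spec-sheet teraflop figure is usually derived: shader count × 2 FLOPs per clock (one fused multiply-add) × clock speed. The shader counts and boost clocks are the reference-card figures as I remember them from the public spec sheets, so double-check them. The "sustained" clocks are purely hypothetical values I picked for illustration, not measurements.

```python
def peak_tflops(shader_count: int, clock_mhz: float) -> float:
    """Theoretical FP32 throughput in TFLOPS.

    Assumes each shader retires one fused multiply-add (2 FLOPs) per clock,
    which is the usual convention behind spec-sheet teraflop numbers.
    """
    return shader_count * 2 * clock_mhz * 1e6 / 1e12


# Reference-card figures as recalled from public spec sheets (verify before relying on them).
RX480_SHADERS, RX480_BOOST_MHZ = 2304, 1266
GTX1060_SHADERS, GTX1060_BOOST_MHZ = 1280, 1708

# Hypothetical sustained clocks, chosen only to illustrate the argument.
RX480_SUSTAINED_MHZ = 1200     # assumption: card drops below its boost clock under load
GTX1060_SUSTAINED_MHZ = 1850   # assumption: card boosts above its rated boost clock

if __name__ == "__main__":
    for name, shaders, spec_mhz, real_mhz in [
        ("RX 480", RX480_SHADERS, RX480_BOOST_MHZ, RX480_SUSTAINED_MHZ),
        ("GTX 1060", GTX1060_SHADERS, GTX1060_BOOST_MHZ, GTX1060_SUSTAINED_MHZ),
    ]:
        on_paper = peak_tflops(shaders, spec_mhz)
        in_practice = peak_tflops(shaders, real_mhz)
        print(f"{name}: {on_paper:.2f} TFLOPS on paper, {in_practice:.2f} TFLOPS at the assumed clock")
```

With clocks like these the gap on paper shrinks noticeably in practice, which is the point above. This only covers the clock-speed part of the story, not how well a game actually keeps the shaders busy.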
Another reason: AMD has additional hardware, the Asynchronous Compute Engines (ACEs), sitting in front of the stream processors (the SPs that the teraflop figure counts). These engines can help with computation by offloading some work from the CPU to the GPU, but they also act a bit like a stop-light, making it harder to fully use the teraflops unless the software is programmed for them perfectly.
There are other differences: for example, nVidia has better color compression, and compressed data needs less bandwidth (so nVidia can...