At any given time, you probably have a hundred or more "apps" (system services, hardware drivers/modules, start-up applications, desktop environment services, etc.) already running before you even open your first tab. Most of them sit essentially idle, using no execution resources until they are called on to provide whatever function they exist for.
When you start opening tabs in Chrome, each one spawns a new renderer process (which itself contains several threads). Initially, when a page is loaded in that tab, the compute workload is very high as the data for the page is downloaded, decompressed, parsed, compiled, and executed. After the page has been rendered, the compute workload typically drops to a fairly low level, but this varies from page to page. A wiki page, for example, will have almost no ongoing compute overhead, as there is typically nothing on it (animations, automatic reloading/updates, effects, etc.) that calls for execution resources once the page has been rendered. In other cases, say a page displaying your email, the tab will probably check with the server every few seconds to see whether you have any new messages. This requires the attention of some execution resources, though not many.

When we have dozens of tabs open, usually at least a few of them have some form of continuous or repeating workload. Examples of sites that demand execution resources either continuously, or in short blips every few seconds, are "live" news feeds, stock information pages, weather, etc. And then there's advertising. If you're like me, you probably run an ad-blocking plugin, which stops the vast majority of this stuff. However, if you don't run one, or you visit sites on which it doesn't work or for which you've disabled it, then advertising can show up as a continuous workload (often animated, and often rotating in new adverts every few seconds or minutes).
Short version: A bunch of open tabs can have a compute overhead of nearly nothing, or quite a lot, depending on what is loaded and running (or not running) within those tabs.
--------------
All modern operating systems can manage these threads and assign them to different physical or logical cores according to adjustable policies. The task scheduler can be tuned for maximum compute efficiency or for maximum performance. Most threads can be moved from core to core by the scheduler without restriction. Software can be written to prevent this (or an application can be launched with flags to prevent it), but pinning is generally only beneficial for real-time workloads that would be negatively affected by the latency introduced by migrating from core to core (for example, a computer running a CNC milling operation).
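As a concrete illustration of that pinning, on Linux a process can restrict itself to a subset of logical CPUs via the kernel's affinity interface. A minimal sketch (Linux-only; which CPU to pin to is just an example, not a recommendation):

```python
import os

# Ask the kernel which logical CPUs this process may currently run on.
allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
print("allowed logical CPUs:", sorted(allowed))

# Pin this process to a single logical CPU (here: the lowest-numbered one).
# After this call, the scheduler will no longer migrate us between cores.
os.sched_setaffinity(0, {min(allowed)})
print("now restricted to:", sorted(os.sched_getaffinity(0)))

# Restore the original mask so the scheduler is free to balance us again.
os.sched_setaffinity(0, allowed)
```

The same effect can be had without code via `taskset` on Linux, or the "Set affinity" context-menu item in the Windows Task Manager.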
In your case, the reason you are seeing more of the workload appear on only 2 "cores" is because of the performance/power policy of the scheduler and how it works on a hyperthreaded system.
For simplicity's sake, think of hyper-threading as an additional inlet pipe leading to the same CPU core, and think of an Intel CPU core as a multi-port execution engine with 8 internal "execution ports." For best performance, the scheduler will prioritize balancing work across the physical cores first, until those inlet pipes are saturated; only then will it try to increase core saturation by scheduling work on the extra pipes (the hyperthreads) leading to those same cores. This ordering means you will often see only 2 of those 4 "cores" getting work scheduled to them, even if the workload generates many more than 2 threads. The Intel core is internally very parallel, so it is often possible to achieve higher execution-port saturation when multiple inlet pipes are used simultaneously (that's hyperthreading), but from a performance perspective it doesn't make sense to start doing this until all of the opportunities to schedule work on physical cores have been exhausted. So in a CPU with logical processors 0, 1, 2, and 3, where 0 and 1 belong to the first core and 2 and 3 belong to the second core, the scheduler will saturate "threads" 0 and 2 before assigning work to 1 and 3.
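On Linux you can see which logical CPUs share a physical core by reading `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. A small sketch that parses the two formats that file uses (a comma list like "0,2" or a range like "0-1"; the example strings below are hypothetical, not read from a real machine):

```python
def parse_siblings(text: str) -> set:
    """Parse a thread_siblings_list string such as '0,2' or '0-1'
    into the set of logical CPU numbers sharing one physical core."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# Hypothetical enumerations: some machines list siblings as '0,2'
# (core-major numbering), others as '0-1' (sibling-major numbering).
print(parse_siblings("0,2"))  # {0, 2}
print(parse_siblings("0-1"))  # {0, 1}
```

Two logical CPUs that appear in the same siblings set are the two "inlet pipes" of one physical core.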
The same thing happens on an AMD platform, which has CPU "modules," each containing a pair of multi-port (4 port) execution engines, but sharing numerous other resources, like L2 cache, instruction decoders, instruction fetch, branch prediction, FPU and some schedulers. In an 8 core AMD CPU, there are 4 "modules." Performance in unrelated workloads scales best when the work is assigned to separate modules first, then to the remaining cores on each module once saturation on the first 4 "cores" is reached.
When the system scheduler is in power-saving mode, it prioritizes saturating entire cores or modules first, so that more of the CPU can remain gated off (in low-power states). Thus, in the example above with a 4-thread CPU where logical processors 0 and 1 share one core or module and 2 and 3 share another, the scheduler would schedule onto "threads" 0 and 1 first, until saturation is reached, before scheduling onto the second physical core/module (2 and 3). This scheduling policy maximizes compute efficiency at the expense of the best performance scaling.
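The two fill orders can be sketched as a toy model. This assumes the hypothetical 2-core/4-thread layout from the example above (logical CPUs 0/1 on one core, 2/3 on the other); it only illustrates the ordering, not how any real scheduler is implemented:

```python
# Map each physical core to its logical CPUs (hypothetical layout from the text).
CORES = {0: [0, 1], 1: [2, 3]}

def fill_order(cores, policy):
    """Return the order in which logical CPUs receive new work.

    'performance'  -> one logical CPU per physical core first, then siblings.
    'power_saving' -> fill every logical CPU of a core before waking the next.
    """
    if policy == "performance":
        # Round-robin across cores: first thread of each core, then second...
        order = []
        for i in range(max(len(t) for t in cores.values())):
            for core in sorted(cores):
                if i < len(cores[core]):
                    order.append(cores[core][i])
        return order
    if policy == "power_saving":
        # Exhaust one core's logical CPUs before touching the next core.
        return [cpu for core in sorted(cores) for cpu in cores[core]]
    raise ValueError(policy)

print(fill_order(CORES, "performance"))   # [0, 2, 1, 3]
print(fill_order(CORES, "power_saving"))  # [0, 1, 2, 3]
```

Either way, a lightly threaded workload lands on only half of the logical CPUs, which is exactly the lopsided usage graph described above.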
In both cases (power saving or performance scheduling policies), it is normal to see favoritism in scheduling towards half of the "threads" on Intel hyperthreaded CPUs and on AMD construction architecture CPUs. This doesn't mean your workload can't leverage more threads/cores/modules, it just means your workload at the time you are observing it is "fitting" within the ideal parameters of one of these power policies.
If you move to a non-hyperthreaded i5, you'll undoubtedly see CPU usage "balanced" out more evenly across the 4 cores when multi-tasking heavily. On an FX-83XX, the effect you are observing now with hyperthreading will still occur, but not as strictly, as there are cases where keeping multiple threads on one module performs better than splitting them across separate modules (especially if they are working on the same dataset, which can then fit within the module's shared L2 cache). I see all sorts of scheduling behaviors on my FX-8350 for different workloads, but rest assured, multi-tasking in Chrome scales just fine to 8-core / 8-thread CPUs.
----------
Confused yet?