How to Use VLOOKUP in Excel or Google Sheets

Status
Not open for further replies.

elroy.coltof

Commendable
Jun 29, 2018
13
5
1,515
Please don't, I swear a kitten is killed everytime someone uses VLOOKUP when INDEX should be used. I've spent too many wasted hours fixing painfully slow excel sheets that do searches when a direct indexing is possible.

The kitten is only maimed if you use VLOOKUP when INDEX and MATCH should be used. INDEX and MATCH are much more robust against column insertions and you can directly read what column your results will be pulled from. Not to mention the huge advantage that you can now skip repeating your key column because you couldn't count out the 20+ columns in between to determine the column number in your vlookup search area. (The key column could even be to the right of your results column, but try to prevent that if at all possible.
 

Soaptrail

Distinguished
Jan 12, 2015
302
96
19,420
As an accountant we hired a senior accountant who did not know how to do a vlookup. I was flabergasted. But i knew he would not last long at our company and I was right. Excel is everyone's best friend they just don't realize it.
 

silverblue

Distinguished
Jul 22, 2009
1,199
4
19,285
Ah VLOOKUP... how I dislike you. Articles on INDEX and MATCH would be very helpful, especially when somebody's working with a lot of data (and especially when you're on a fair to middling laptop).

Array formulas take the cake, though. :)
 
I need better Excel skills like / unlike this; not sure judging by comments.

At work I do alot of traveling between locations pretty much at random.

I may go from branch 4 to branch 2 on Monday and then branch 7 to branch 1 to branch 11 on Tuesday.

So in an effort to make it easier to calculate distances between branches I hard coded in excel's visual basic "i believe it's visual basic" the distance between all 16 branches to every other branch on the spreadsheet for the first 3 lines on the excel page.

if a1 = branch 1 and a2 = branch 2 then a3 = 4 (distance between branches)

Each row in excel has at minimum 16x16 or 256 if then statements
So with 3 rows I choose to write that's 256x3 or 768 lines of code instead of a chart lookup table thingy.

Payroll also wants the actual name of the branch so I made another hard coded script that replaces "branch 1" with the actual name and address of the branch.

If a1 = branch 1 then a1 = Branch 1 Street, Streetville, OH

This effectively doubles the previous number from 768 to 1536 lines of code.

I am 99% sure this is the wrong way to do it but this 1500+ line monstrosity works well and is remarkably quick.
 

hermitboy

Distinguished
Feb 11, 2013
9
1
18,515
A little history could be nice as well - which can help with optimization.
Back when computers were much less powerful, doing any sort of lookup could take a really long time.
Then along comes Vlookup.
Vlookup works best when your data is sorted, because it was created with a specific shortcut in mind - narrowing the search range by half.
Vlookup compares your lookup value to the middle value in the primary column. Then it will compare the lookup value to middle of the top half or bottom half - and will continue comparing the lookup value to increasingly smaller ranges until it finds the match.

Not sorting the data defeats the purpose of Vlookup's shortcut processing - and can make the actual lookup take much longer than if Index-Match were used.

Personally, I prefer index-match. Especially in situations where I need the data sorted on a column that is not related to the lookup value. But there are times when I'll use Vlookup instead.
 

elroy.coltof

Commendable
Jun 29, 2018
13
5
1,515
Reading back all the comments I feel it's best to explain why VLOOKUP is a terrible choice for this particular problem.

First off it should be noted that VLOOKUP has two wildly different modes of operation. If the last parameter is TRUE (the default if you omit it) it expects a sorted first column and is really fast using a binary search. However it also means it looks for an aproximate match. If the value you look for is not in the table it returns the closest result after the position after where your result should have been. BUT IT DOESNT TELL YOU THIS! The search is O(logN)

The second mode, last parameter FALSE looks for an exact match and does not require sorted data, it does a lineair search and is bog slow. It does however return an N/A if no exact match is found. The search is O(N).

To speed things up therefore you could sort the data you're searching in and use a double aproximate vlookup in an if statement: The first one you is in the condition and checks against the search value, if they are different then the item is not present, return N/A manually. If they are equal do the VLOOKUP again and return the value. If you do it in this order at least the second lookup would not be done if an item is not present. If you have a large amount of data and you are able to sort them this is faster than a single exact vlookup! Never use FALSE on large sorted datasets!!

Now consider that if you want to merge multiple columns you have to repeat those VLOOKUPs for every column, so 20 columns is 20x as slow!

And now INDEX/MATCH: The power here is that they are two different functions.

INDEX allows you to take a range and directly reference a value in that range by row and column number. No searching happens, it's O(1)!

MATCH determines the row number the matching item is in using a lineair search (the slow one, I know, but because of the way it's implemented it's still faster and uses less memory). You can store this value in a cell and then use Index for every column you want to match. It's O(N) for an exact match (last param = 0) and O(log(N)) for an aproximate search (last param = -1 or 1) and you only have to do it once for every row and not for every column! You can do the same trick with a double aproximate MATCH as you could do with VLOOKUP to benefit from the speed of binary search and still be guarantueed to be told if there is no exact match.

So how do you then match up to data tables with the same key column?
You make a helper column with MATCH that has the row numbers in the second file in the order they appear in the first file. This column can be on a different sheet or even in a different file. And then you use INDEX to add the data to the first file.
 
Jun 30, 2018
1
0
10
Everything that Elroy said about VLOOKUP (not to mention notorious LOOKUP) is right. Today. But, VLOOKUP was made in times when spreadsheet was ment to be spreadsheet, and not database as is case today. Times when you were very limited by size of the spreadsheet and where people would be making small tables (coutning hundreds of rows) where VLOOKUP made sense. It still does if your tables are small and you want fast rezults.
I use it on dayly basis in such cases, but if tables grow over 10000 rows INDEX/MATCH combo cannot be beaten! Especially if you have 50+ columns and key column is third or 10th and not the first one.
Also purpose of TRUE variant and FALSE variant are totaly different! TRUE variant gives you exact match when that match exists. But if you deal with ranges and only limits of the ranges are quoted in compare to table, FALSE variant is the only one usefull and right.
 

elroy.coltof

Commendable
Jun 29, 2018
13
5
1,515
Nice Peter, a handy macro if you do that type of work a lot.

I rarely use Excel anymore, except that I recieve a lot of data in Excel files and I'm supposed to deliver results back in Excel.

I've switched to using Python and Pandas, but I feel that's a bit to far out of the left field for this thread. Read into dataframes from excel, manipulate data in with python/pandas/numpy/scikit-learn and export dataframe to excel.

The python stuff is version controlled for future reference.
 
G

Guest

Guest
+1 Elroy.coltof

=INDEX(ResultColumn,MATCH(SearchTerm,SearchColumn,0)) is 10x faster and the SearchColumn does not have to be on the left.
 

bit_user

Polypheme
Ambassador

Given a list of branches to visit in a week and the distances between them, can you show me how to write a spreadsheet that computes the optimal order in which to visit them?

Every time I try, my spreadsheet just hangs if I enter more than a few locations.
 

sebastienm

Distinguished
May 18, 2005
26
0
18,530
Note1 -- When the column to match is not the first column of the table:
If the key column is not the first column of a table -- even if it is on the right side of the columns you need to return -- you can adjust the VLOOKUP with the CHOOSE function, eg:
You key is in D1:D10 and you want to return corresponding the values from B1:B10:
=VLOOKUP( key_here , CHOOSE( {1,2} , D1:D10 , B1:B10 ) , 2 , FALSE )
--> in the CHOOSE section is put D column first to search on it.
--> no need to touch the source data

Note2 -- Returning multiple columns of data
INDEX and Match are definitely more efficient as =INDEX( MATCH( )
, but especially if you need return values from multiple columns.
- You can use a 'dummy' column, say col A, to return the row of the match with MATCH() in the data table
- then in B,C,D... use INDEX( using value in in A) in all subsequent columns.
This way, the hard part 'search and compare' handled by the MATCH only once per row ( in col A).

Note 3:regarding the 'travel to all branches' problem -- this is a TSP problem. You can use the Excel Solver addin to solve this types of problems when the number of places to visit is small. Worth trying in your case.

Excel is an incredible piece of software.
 


For me working in I.T it is almost impossible to predict when something will break, with 16 branches electronics/computers break all the time.

Thankfully I am not the only technician that repairs and our boss tends to give out work orders for branches that are physically near each other.

What you are asking may be impossible to do with Excel due to Excel not knowing where each branch is in relation to another branch.

For example
If the distance between branch 1 and branch 2 is 3 miles
and the distance between branch 2 and branch 3 is 4 miles
We could use a^2 + b^2 = c^2 to correctly deduce that the absolute distance between branch 1 and branch 3 is 5 miles, solving for the hypotenuse obviously lol.

And this works great with geometry, but in the real world maybe there is a swamp in between branch 1 and 3 which would force you to backtrack from branch 3 to branch 2 then to branch 1.

Thankfully I rarely travel between more than 3 branches in a day and I do try to structure it so the last stop is nearest my house, for all those mileage dollars.

Of course if I was traveling between 6+ branches in a day I would use MapQuest to figure out the most optimal path.

MapQuest knows where branches are in relation to each other, branch 1 is northwest of branch 2, and more importantly 99% of all the roads used to get to those branches.

To actually do this with excel would basically be recreating the background logic of Mapquest which I imagine requires a lot of memory unless you limited the map area to your city.

 

bit_user

Polypheme
Ambassador

Dude, I'm sorry. It was a CS troll. When I saw you talking about traveling between branches and computing distances, I just could not resist.

https://en.wikipedia.org/wiki/Travelling_salesman_problem

I was hoping somebody would catch it and "get" the joke, before anyone tried to answer it seriously. Seemed to be enough nerds up in this thread.

To optimally solve this for significant numbers of branches, you would need a quantum computer.
 
Status
Not open for further replies.