Hello everyone! I have been working with a data set of mine that needed this issue resolved, and I think I got it. I seem to have resolved it in a very roundabout (and still somewhat unstable, I know - but bear with me) fashion, but it works . Let me give you my reasoning.
My goal is to have essentially make a "live" Top 5 chart in Excel, that when data is posted, it automatically makes the decisions about what goes where using logic. HOWEVER, here are my constraints:
No VBA/macros allowed - many people I work with have difficulty with normal PC tasks, much less intimate Excel use - and even LESS VBA and macro usage. Also, I will not be constantly tending this file.
No array equations allowed - Again, array equations are sensitive to Ctrl + Shift + Enter, and there is a chance that they will be turned off of array mode and interpret incorrect data. Also, the nature of the data is large enough to where large amounts of array equations can bog the file itself down. (one of my last gigs, I was using array equations on ~200,000 cells with a single core processor-let's just say it took a little while to calculate)
So the goal was to make these transmutations by using normal Excel logic, that can be readily transferred from computer to computer and still count on it functioning.
I want to post this, to A) help everyone out that may be in the same predicament as me and B) I hope for a little input, to drastically refine it and remove some of the unnecessary steps. So any input is appreciated
So here goes the problem:
I am tasked with developing a Top 5 chart (can be top 10, 100, etc.). Let's make a scenario with car dealerships. Let's say this Top 5 chart wants to show the dealerships that sold the most vehicles in the country (let's say ~100 dealerships, each dealership has a 3-character code). Of those dealerships, there are two vehicles that can be sold - cars and trucks.
Of the Top 5 Dealerships, which of these dealerships sold the most cars as opposed to trucks? (forgive me if the table doesn't line up)
-----------------A ------------------------ B ---------------C------ D
1 --Dealership Code - Total Vehicles Sold - Cars - Trucks
2 ---DU2 -------------------------- 10 ------------------- 4 -------- 6
3 ---6F1 --------------------------- 16 ----------------- 14 -------- 2
4 ---5F3 --------------------------- 10 -------------------5 --------- 5
5 ---31C --------------------------- 9 ---------------------5 --------- 4
6 ---719 --------------------------- 14 ------------------ 10 --------- 4
Also, the dealership codes can be ANY CHARACTER - which can throw a wrench in things, since it's formatted as text. Anything from 111 to ZZZ so I can't "predict" what I'll get, all I have is an index showing me what COULD show up. And as you know, multiple dealerships can sell the same amount of vehicles/cars/trucks.
I plan on using the Large(range,k) function in conjunction with INDEX/MATCH to pick out the Dealership Code and the Total Vehicles sold. Upon doing so, you get these equations, with the table below them:
Cell E7's equation would look something like this:
=INDEX($A$2:$D$6,MATCH(LARGE($B$2:$B$6,1),$B$2:$B$6,0),1)
Cell F7's equation would look something like this:
=INDEX($A$2:$D$6,MATCH(LARGE($B$2:$B$6,1),$B$2:$B$6,0),2)
-------------E -----------F
7 ---------6F1 --------16
8 ---------719 ------- 14
9 ---------5F3 ------- 10
10 -------5F3 ------- 10
11 -------31C -------- 9
Yuck, big problem. There's no good way for Excel to understand you're looking for duplicates in a range when it's only focusing on one cell value at a time (k). There's also no way to sort the text reliably (keep in mind! -This must be done completely automatically, and without special code or arrays. No grubby error-prone human fingers doing any kind of moving, sorting, pivot tables, etc. either! )
Also, if we solve this - we essentially solve the "how many cars over trucks did they sell" problem as well. If you want me to go into it (it's the same thing, just replicated another step) I can.
An index column is already supplied with the data based on the dealer codes, with three numeric spaces, 100-999. I can append these to the Vehicles Sold values using "&"...
(aka, if "Total Vehicles Sold" cell = 10, with index number 106 = 10106. Looks like =G4&A4 or something like that.)
I'm sticking to an index that maintains the same number of spaces, so that when we use LARGE() or SMALL() again, we won't lose the meaning of the numbers we want. I'll explain in a bit.
Now we have numeric values for each dealership code, appended with its own index. This essentially makes the dealership code itself searchable by the LARGE() function a little later.
So:
--------A----------------------------- G
2 6F1 =(B3&T1) ------------ 16100
3 719 =(B6&T2) ------------ 14101
4 5F3 =(B4&T3) ----------- 10102
5 DU2 =(B2&T4) ----------- 10103
6 31C = (B5&T5) ------------ 9104
(Column T being the index)
Notice how 5F3 and DU2 are no longer duplicated...
However! Now if you try to use THIS with the LARGE() function, you'll quickly find out that Excel still treats it as text since it's a formula. "Well, that STILL puts us back where we started in that case, but now in ONE cell instead of two! What the heck, Will?!"
Since we know for a fact that all of our numeric codes are (in fact) numbers now, we can force excel to treat them as numbers. Encompass your equations that result in our finished numeric codes () and add a +0 on the end..
[Example: =(B3&T1)+0]
So now excel thinks they're numbers! (well, they ARE numbers.) I understand this will not sort the original values alphabetically or anything, but at least we now no longer have duplicate values. Now reverting back to the INDEX/MATCH style to look for data, do THIS! (based on table # 1 and #4, if they were adjacent to each other)
=INDEX($A$2:$G$6,MATCH(LARGE($G$2:$G$6,1),$G$2:$G$6,0),2)
=INDEX($A$2:$G$6,MATCH(LARGE($G$2:$G$6,1),$G$2:$G$6,0),3)
etc.
Use this in iterations of the column operator at the end of INDEX/MATCH, and you will end up with this table:
7 ----6F1 ------------16
8 ----719 -------------14
9 ----5F3 ------------ 10
10 --DU2 ---------- 10
11 --31C ------------ 9
All right.
With the help of an index column and a subsequent helper column, you can use the LARGE() and SMALL() functions to reliably (as reliable as your index is, that is) find values and their corresponding info without fear of duplicating them. Also, to solve the "how many cars over trucks did they sell?" question, take it one step further and use "Cars Sold" in place of "Total Vehicles Sold". You can use the resultant table directly above this paragraph, just use INDEX/MATCH in the same fashion to return all of the column values you wish to include.
Well, what do you think? I know it may seem much ado about nothing, but I am using it for a *colossal* data set which complexity is not represented by the car problem I have posed here, so I want a method that can reliably extrapolate itself without me ushering it along every few steps.
This allows me to dump in the data - grab my charts and go.
Again, I would really like your input to see if it can be greatly simplified and to increase its robustness. Also, let me know if anything's unclear. It seemed like a good thing to post, since it appears the only way to answer this question is using array equations and VB, and people seem to be having trouble with those instances.
Thanks everyone!
My goal is to have essentially make a "live" Top 5 chart in Excel, that when data is posted, it automatically makes the decisions about what goes where using logic. HOWEVER, here are my constraints:
No VBA/macros allowed - many people I work with have difficulty with normal PC tasks, much less intimate Excel use - and even LESS VBA and macro usage. Also, I will not be constantly tending this file.
No array equations allowed - Again, array equations are sensitive to Ctrl + Shift + Enter, and there is a chance that they will be turned off of array mode and interpret incorrect data. Also, the nature of the data is large enough to where large amounts of array equations can bog the file itself down. (one of my last gigs, I was using array equations on ~200,000 cells with a single core processor-let's just say it took a little while to calculate)
So the goal was to make these transmutations by using normal Excel logic, that can be readily transferred from computer to computer and still count on it functioning.
I want to post this, to A) help everyone out that may be in the same predicament as me and B) I hope for a little input, to drastically refine it and remove some of the unnecessary steps. So any input is appreciated
So here goes the problem:
I am tasked with developing a Top 5 chart (can be top 10, 100, etc.). Let's make a scenario with car dealerships. Let's say this Top 5 chart wants to show the dealerships that sold the most vehicles in the country (let's say ~100 dealerships, each dealership has a 3-character code). Of those dealerships, there are two vehicles that can be sold - cars and trucks.
Of the Top 5 Dealerships, which of these dealerships sold the most cars as opposed to trucks? (forgive me if the table doesn't line up)
-----------------A ------------------------ B ---------------C------ D
1 --Dealership Code - Total Vehicles Sold - Cars - Trucks
2 ---DU2 -------------------------- 10 ------------------- 4 -------- 6
3 ---6F1 --------------------------- 16 ----------------- 14 -------- 2
4 ---5F3 --------------------------- 10 -------------------5 --------- 5
5 ---31C --------------------------- 9 ---------------------5 --------- 4
6 ---719 --------------------------- 14 ------------------ 10 --------- 4
Also, the dealership codes can be ANY CHARACTER - which can throw a wrench in things, since it's formatted as text. Anything from 111 to ZZZ so I can't "predict" what I'll get, all I have is an index showing me what COULD show up. And as you know, multiple dealerships can sell the same amount of vehicles/cars/trucks.
I plan on using the Large(range,k) function in conjunction with INDEX/MATCH to pick out the Dealership Code and the Total Vehicles sold. Upon doing so, you get these equations, with the table below them:
Cell E7's equation would look something like this:
=INDEX($A$2:$D$6,MATCH(LARGE($B$2:$B$6,1),$B$2:$B$6,0),1)
Cell F7's equation would look something like this:
=INDEX($A$2:$D$6,MATCH(LARGE($B$2:$B$6,1),$B$2:$B$6,0),2)
-------------E -----------F
7 ---------6F1 --------16
8 ---------719 ------- 14
9 ---------5F3 ------- 10
10 -------5F3 ------- 10
11 -------31C -------- 9
Yuck, big problem. There's no good way for Excel to understand you're looking for duplicates in a range when it's only focusing on one cell value at a time (k). There's also no way to sort the text reliably (keep in mind! -This must be done completely automatically, and without special code or arrays. No grubby error-prone human fingers doing any kind of moving, sorting, pivot tables, etc. either! )
Also, if we solve this - we essentially solve the "how many cars over trucks did they sell" problem as well. If you want me to go into it (it's the same thing, just replicated another step) I can.
An index column is already supplied with the data based on the dealer codes, with three numeric spaces, 100-999. I can append these to the Vehicles Sold values using "&"...
(aka, if "Total Vehicles Sold" cell = 10, with index number 106 = 10106. Looks like =G4&A4 or something like that.)
I'm sticking to an index that maintains the same number of spaces, so that when we use LARGE() or SMALL() again, we won't lose the meaning of the numbers we want. I'll explain in a bit.
Now we have numeric values for each dealership code, appended with its own index. This essentially makes the dealership code itself searchable by the LARGE() function a little later.
So:
--------A----------------------------- G
2 6F1 =(B3&T1) ------------ 16100
3 719 =(B6&T2) ------------ 14101
4 5F3 =(B4&T3) ----------- 10102
5 DU2 =(B2&T4) ----------- 10103
6 31C = (B5&T5) ------------ 9104
(Column T being the index)
Notice how 5F3 and DU2 are no longer duplicated...
However! Now if you try to use THIS with the LARGE() function, you'll quickly find out that Excel still treats it as text since it's a formula. "Well, that STILL puts us back where we started in that case, but now in ONE cell instead of two! What the heck, Will?!"
Since we know for a fact that all of our numeric codes are (in fact) numbers now, we can force excel to treat them as numbers. Encompass your equations that result in our finished numeric codes () and add a +0 on the end..
[Example: =(B3&T1)+0]
So now excel thinks they're numbers! (well, they ARE numbers.) I understand this will not sort the original values alphabetically or anything, but at least we now no longer have duplicate values. Now reverting back to the INDEX/MATCH style to look for data, do THIS! (based on table # 1 and #4, if they were adjacent to each other)
=INDEX($A$2:$G$6,MATCH(LARGE($G$2:$G$6,1),$G$2:$G$6,0),2)
=INDEX($A$2:$G$6,MATCH(LARGE($G$2:$G$6,1),$G$2:$G$6,0),3)
etc.
Use this in iterations of the column operator at the end of INDEX/MATCH, and you will end up with this table:
7 ----6F1 ------------16
8 ----719 -------------14
9 ----5F3 ------------ 10
10 --DU2 ---------- 10
11 --31C ------------ 9
All right.
With the help of an index column and a subsequent helper column, you can use the LARGE() and SMALL() functions to reliably (as reliable as your index is, that is) find values and their corresponding info without fear of duplicating them. Also, to solve the "how many cars over trucks did they sell?" question, take it one step further and use "Cars Sold" in place of "Total Vehicles Sold". You can use the resultant table directly above this paragraph, just use INDEX/MATCH in the same fashion to return all of the column values you wish to include.
Well, what do you think? I know it may seem much ado about nothing, but I am using it for a *colossal* data set which complexity is not represented by the car problem I have posed here, so I want a method that can reliably extrapolate itself without me ushering it along every few steps.
This allows me to dump in the data - grab my charts and go.
Again, I would really like your input to see if it can be greatly simplified and to increase its robustness. Also, let me know if anything's unclear. It seemed like a good thing to post, since it appears the only way to answer this question is using array equations and VB, and people seem to be having trouble with those instances.
Thanks everyone!
Last edited: