Vlookup not quite what I need.....

  • Thread starter Thread starter Mike McKenzie
  • Start date Start date
M

Mike McKenzie

I have a large spreadsheet which is effectively being used
as a database (names of people who have been on training)

I want to be able to view info on a particular individuals
training. My boss just wants to type a person's name into
a cell and see ALL instances of that person in the
spreadsheet. The Vlookup function would work perfectly if
it wasn't for the fact that it only finds the FIRST
instance of that person in the spreadsheet - the person
can occur many times

Is there a function (apart from AutoFilter) that can tell
me all instances of a particular record??
 
If the names are in the same column, what's wrong with using data filter.
You could have a worksheet change event or button assigned to a macro to
make the autofilter automatic.
 
Please read the answer to my question "Lookup solutions" perhaps that will
work for you...

/Melker
 
...
...
I want to be able to view info on a particular individuals
training. My boss just wants to type a person's name into
a cell and see ALL instances of that person in the
spreadsheet. The Vlookup function would work perfectly if
it wasn't for the fact that it only finds the FIRST
instance of that person in the spreadsheet - the person
can occur many times

Is there a function (apart from AutoFilter) that can tell
me all instances of a particular record??

There's SQL.REQUEST. It requires that the ranges you're accessing be set up as
tables, meaning that the top row contain valid, distinct field names and that
these ranges including their top row of field names be given defined names. If
you had such a table named Tbl in a file named C:\foo\bar.xls, you could fetch
all records for 'Smith, John' in Tbl's Name field where this name was entered in
cell A2 (without the single quotes) using the array formula

=SQL.REQUEST("DSN=Excel Files;DBQ=C:\foo\bar.xls",,,
"Select * From Tbl Where Name='"&A2&"'",1)

You could also extract the distinct names from Tbl using the array formula

=SQL.REQUEST("DSN=Excel Files;DBQ=C:\foo\bar.xls",,,
"Select Distinct Name From Tbl",0)

in a range such as AA1:AA100, where the 100 is large enough to ensure all
distinct people (Name field entries) are included in the result, then create a
dynamic named range such as Names referring to

=OFFSET($AA$1:$AA$100,0,0,COUNTIF($AA$1:$AA$100,"?*"),1)

Then set the cell your boss would use to enter employee names to use a Data >
Validation drop-down list and make the Source =Names (include the '=').

I've finally gained enlightenment! If you're trying to use Excel as a database,
you should use SQL.REQUEST often. It can even do some things in a single
function call that Excel itself needs a few nested functions to achieve, e.g.,

=SQL.REQUEST("DSN=Excel Files;DBQ=C:\foo\bar.xls",,,
"Select Count(*) From (Select Distinct Name From Tbl Where Name Is Not Null)",0)

Can anyone be as tedious as he who's just attained enlightenment?
 
Harlan Grove said:
There's SQL.REQUEST. It requires that the ranges
you're accessing be set up as tables, meaning
that the top row contain valid, distinct field
names ...

No, this is not a requirement. If you omit column headers they will be
given default names of F1, F2, F3 etc.
... and that
these ranges including their top row of field
names be given defined names.

No, this is not a requirement. You can query a book level name e.g.

SELECT MyCol1, MyCol2 FROM [BookLevelName]

or a sheet level name e.g.

SELECT MyCol1, MyCol2 FROM [Sheet1$SheetLevelName]

or a worksheet name e.g.

SELECT MyCol1, MyCol2 FROM [Sheet1$]

or a range address e.g.

SELECT MyCol1, MyCol2 FROM [Sheet1$A1:B999]

However, a dynamic array of the

=OFFSET($AA$1:$AA$100,0,0,COUNTIF($AA$1:$AA$100,"?*"),1)

kind cannot be used :-(
you had such a table named Tbl in a file named
C:\foo\bar.xls, you could fetch all records for
'Smith, John' in Tbl's Name field

In RDBMS theory, the correct terminology is 'column' and 'row' rather
than 'field' and 'record'. While one could argue that querying Excel
in this way is using an ISAM to which 'field' and 'record' have
correct technical meaning, I'd argue that 'column' and 'row' have more
meaning to an Excel user.
If you're trying to use Excel as a database,
you should use SQL.REQUEST often.

It can be very slow if you are using SQL.REQUEST in many cell formulas
because it must establish a connection for each formula when e.g. the
sheet recalculates. So I wouldn't recommend you use SQL.REQUEST often
in one workbook. And then there's the question of whether it is safe
to use SQL.REQUEST to query an open workbook...
Can anyone be as tedious as he who's just attained enlightenment?

A know-all pedant who's hard on the 'newbie'? ;-) And for real
enlightenment, try ADO in VBA code.

--
 
what kind of training are you tracking? i have written a program that tracks
training for safety related purposes but can be used for anything. i hope to
market it someday but would like to get a few people using it to make sure
the bugs are out of it.
 
...
...

Thanks for further enlightenment. Even better, even more flexible than Advanced
Filters or Pivot Tables.
No, this is not a requirement. If you omit column headers they will be
given default names of F1, F2, F3 etc.
...

If all fields/columns were text, how would SQL.REQUEST distinguish between top
row as field names vs top row as first record/row? I'm genuinely curious.
In RDBMS theory, the correct terminology is 'column' and 'row' rather
than 'field' and 'record'. While one could argue that querying Excel
in this way is using an ISAM to which 'field' and 'record' have
correct technical meaning, I'd argue that 'column' and 'row' have more
meaning to an Excel user.

Why aren't you picking on me for using 'table' rather than the theoretically
correct term 'relation'? Anyway, field and record are the terms used in online
help for DSUM and similar functions, though column and row are used in online
help for Advanced Filter. Microsoft seems to use these terms interchangeably.
It can be very slow if you are using SQL.REQUEST in many cell formulas
because it must establish a connection for each formula when e.g. the
sheet recalculates. So I wouldn't recommend you use SQL.REQUEST often
in one workbook. And then there's the question of whether it is safe
to use SQL.REQUEST to query an open workbook...
...

OK, this is reasonable. The advantage is that SQL.REQUEST updates during recalc,
while Advanced Filters and Pivot Tables require user intervention. It's a
time/reliability trade-off.

What's the concern about querying open workbooks? The fact that SQL.REQUEST is
still written XLM, and XLM is suspect if not outright dangerous?
 
Response in-line:

Harlan Grove wrote...
"onedaywhen" wrote...

Jamie, if you prefer.
If all fields/columns were text, how would SQL.REQUEST distinguish between top
row as field names vs top row as first record/row? I'm genuinely curious.

It's a registry key:

Hkey_current_user\Software\ODBC\ODBC.INI\Excel
Files\Engines\Excel\FirstRowHasNames

where 0 (zero binary) is FirstRowHasNames=No and 1 (one binary) is
FirstRowHasNames=Yes.

One *should* be able to override the registry setting in the
connection string i.e.

DSN=Excel Files;DBQ= C:\foo\bar.xls;FirstRowHasNames=0;

However, as is all too common with drivers, there is a bug:

Microsoft Knowledge Base Article - 288343
BUG: Excel ODBC Driver Disregards the FirstRowHasNames or Header
Setting
http://support.microsoft.com/default.aspx?scid=kb;EN-US;288343

So you have to manually or programmatically toggle the registry
settings, easy to do but permissions can be a problem. Anyhow, it
means that if your cell formulas are relying on column headers being
present you should be checking that the registry setting for the
current user will yield the behavior you expect. Could be even more of
a pain that toggling it yourself.

Note you don't have this problem when using ADO and the MS OLEDB
provider for Jet on an Excel data source. You can use e.g. HDR=No in
the connection string and it is honored to override the registry
setting

Hkey_Local_Machine\Software\Microsoft\Jet\4.0\Engines\Excel\FirstRowHasNames=1

(and note the Jet registry key is local machine).
Why aren't you picking on me for using 'table' rather than the theoretically
correct term 'relation'?

Because we're not talking in theoretical terms. It strikes me as more
pragmatic to use 'column', 'row' and 'table' when talking about Excel
data because they have some meaning to Excel users, 'fields',
'records' and 'relations' less so. It's just convenient to my argument
that 'column' and 'row' are also used in RDBMS theory. I regret that
the Jet, MS Access and SQL Server documentation use 'field' and
'record' because this seems to be theoretically and technically
incorrect (depends on whether you consider Jet to be a file-based ISAM
or a true DBMS: the use of 'field' and 'record' suggest the former but
I'm sure MS would like us to conclude the latter so why the
terminology? Beats me).
The advantage is that SQL.REQUEST updates during recalc,
while Advanced Filters and Pivot Tables require user intervention.

I use neither, and don't doubt your expertise on this, but assume they
could be programmatically refreshed in e.g. a Worksheet_Calculate
event handler.
What's the concern about querying open workbooks? The fact that SQL.REQUEST is
still written XLM, and XLM is suspect if not outright dangerous?

I can offer no evidence that it is. I *think* I've seen it mentioned
(by Jake Marx?) in this ng but I can't find any Usenet threads in the
google archive to support this. Querying an open workbook using ADO
results in a demonstrable memory leak; I've previously done some
testing with ODBC and could detect no equivalent memory leak. I will
offer that problems could arise because there is no way AFAIK of
effectively specifying database-style lock type, cursor type, etc for
an Excel workbook so querying and especially executing SQL against an
open (and potentially modified and unsaved) workbook may yield
unexpected results.

Jamie.

--
 
Harlan Grove wrote... ...
It's a registry key:

Hkey_current_user\Software\ODBC\ODBC.INI\Excel
Files\Engines\Excel\FirstRowHasNames

where 0 (zero binary) is FirstRowHasNames=No and 1 (one binary) is
FirstRowHasNames=Yes.

One *should* be able to override the registry setting in the
connection string i.e.

DSN=Excel Files;DBQ= C:\foo\bar.xls;FirstRowHasNames=0;

However, as is all too common with drivers, there is a bug:

Microsoft Knowledge Base Article - 288343
BUG: Excel ODBC Driver Disregards the FirstRowHasNames or Header
Setting
http://support.microsoft.com/default.aspx?scid=kb;EN-US;288343

So you have to manually or programmatically toggle the registry
settings, easy to do but permissions can be a problem. Anyhow, it
means that if your cell formulas are relying on column headers being
present you should be checking that the registry setting for the
current user will yield the behavior you expect. Could be even more of
a pain that toggling it yourself.
...

Right. So as a practical matter if one wants to avoid VBA, one must either set
this registry value to 0 or 1 and adapt all SQL.REQUEST calls to the setting. If
it defaults to first row having names, then since I didn't tell the OP to go
changing registry settings, it was appropriate to tell him/her to make sure
there were field/column names in the top row.
Because we're not talking in theoretical terms. It strikes me as more
pragmatic to use 'column', 'row' and 'table' when talking about Excel
data because they have some meaning to Excel users, 'fields',
'records' and 'relations' less so. It's just convenient to my argument
that 'column' and 'row' are also used in RDBMS theory. I regret that
the Jet, MS Access and SQL Server documentation use 'field' and
'record' because this seems to be theoretically and technically
incorrect (depends on whether you consider Jet to be a file-based ISAM
or a true DBMS: the use of 'field' and 'record' suggest the former but
I'm sure MS would like us to conclude the latter so why the
terminology? Beats me).

Intruding a bit of history, breadth and low level representations into this
discussion, fields and records are terms that predate computing - see

http://www.cs.uiowa.edu/~jones/cards/history.html

Codd intentionally used different terminology in laying the mathematical
foundation for the relational model, but everyone who had used databases prior
to that knew that columns = fields and rows = records. If we're worried about
what terminology would make the most sense for newer/younger users, if they have
any programming experience, they know about composite mixed type data
structures, which are called records in COBOL and Pascal, structs in C/C++
(classes are something else), and types in VB[A]. Most scripting languages,
certainly awk and Perl, use field/record terminology. And if users have any
experience with, um, record keeping they may also be familiar with field/record
terminology.

Then there's your equivalence of ISAM with field/record terminology. The
relational model doesn't specify storage or access mechanisms, and a fair number
of RDBMS tasks are performed at low level using ISAM. Then there's the fact that
computer memory is one big linear, single dimension expanse. In memory there are
no columns and rows, just offsets from various base memory addresses.

Fields and records are colloquially equivalent to columns and rows in the
relational model even if that model distinguishes between them.
I use neither, and don't doubt your expertise on this, but assume they
could be programmatically refreshed in e.g. a Worksheet_Calculate
event handler.

Yup. And if the goal were database access, then ADO would make more sense still.
However, if one wants to avoid VBA, which is often the case, one can't use event
handlers.
I can offer no evidence that it is. I *think* I've seen it mentioned
(by Jake Marx?) in this ng but I can't find any Usenet threads in the
google archive to support this. Querying an open workbook using ADO
results in a demonstrable memory leak; I've previously done some
testing with ODBC and could detect no equivalent memory leak. I will
offer that problems could arise because there is no way AFAIK of
effectively specifying database-style lock type, cursor type, etc for
an Excel workbook so querying and especially executing SQL against an
open (and potentially modified and unsaved) workbook may yield
unexpected results.

Good point. It's still a design trade-off - reliability of data refresh on
recalc vs reliability in terms of avoiding the nastier aspects of such a bug. If
the workbook would be open for long periods and recalced often, then such a bug
may argue against using SQL.REQUEST against open workbooks. If the workbook
wouldn't be open for long periods or the SQL.REQUEST calls wouldn't recalc
often, then such a bug may not affect normal use of the workbook.

With regard to cursors, not an issue. SQL.REQUEST would return full query
results. Cursors wouldn't/couldn't be used. With respect to table locking, not
an issue querying a workbook open in the same Excel instance since the normal
recalc sequence would effectively lock the table during SQL.REQUEST access.
However, I suppose there could be mucho nastiness if there were any circular
references between cell formulas in the table and SQL.REQUEST results. That
could be dangerous.

I have to conclude by saying that anyone (like me) who got used to 123's @DSUM
criteria expressions as opposed to criteria ranges and its ability to operate on
multiple tables via effective inner joins just has to laugh (ruefully) at
Excel's archaic DSUM criteria ranges. SUMPRODUCT formulas are usually a better
approach, but they're inefficient, cumbersome and complicated. SQL.REQUEST, if
there were no or not too severe memory leak bugs, is a far more elegant way to
achieve the same results.
 
Harlan,
Reponses in-line:
Intruding a bit of history, breadth and low level
representations into this discussion <snipped>

That's a good summary, well argued and much appreciated. I stand by
what I originally said:

"While one could argue ...'field' and 'record' have correct technical
meaning [and you did :-) ], I'd argue that 'column' and 'row' have
more meaning to an Excel user."

Though now I'm thinking perhaps it's best to use both.
since I didn't tell the OP to go
changing registry settings, it was appropriate
to tell him/her to make sure
there were field/column names in the top row.

Again, I stand by what I originally said:

"No, this is not a requirement [that the ranges contain valid,
distinct field names]"

I was making a general point. I often find I want to extract a single
cell value from a worksheet that contains only a single cell value
(for which SQL.REQUEST is ideally suited), say in cell A1, so I'd use:

SELECT F1 FROM [Sheet1$A1:A1]

I wouldn't want readers to think this was unfeasible because it isn't
within a defined name with column/field headers.
And if the goal were database access then ADO
would make more sense still

Didn't you originally say, "If you're trying to use Excel as a
database you should use SQL.REQUEST often"? I jest of course, we can
all see your distinction I'm sure. As a final word from me, I offer
that if you found your recent discovery of SQL.REQUEST enlightening
and you've yet to discover ADO, you have more fun to come.

Jamie.

P.S. I bet you haven't been called a newbie in a while, even if my
tongue was firmly embedded in cheek!

--
 
...
...
since I didn't tell the OP to go
changing registry settings, it was appropriate
to tell him/her to make sure
there were field/column names in the top row.

Again, I stand by what I originally said:

"No, this is not a requirement [that the ranges contain valid,
distinct field names]"

I was making a general point. I often find I want to extract a single
cell value from a worksheet that contains only a single cell value
(for which SQL.REQUEST is ideally suited), say in cell A1, so I'd use:

SELECT F1 FROM [Sheet1$A1:A1]

I wouldn't want readers to think this was unfeasible because it isn't
within a defined name with column/field headers.

Semantics. With default registry settings, isn't it *necessary* that tables
accessed using SQL.REQUEST have field names in the top row? (Ain't I a stinker
mixing field and row? Some may respond that mixing field and row is superfluous
to whether or not I'm a stinker.) If so, then the complete directions may be
that either field names in the top row or a change of registry settings is
necessary.

With respect to the query, should that be a dollar sign or an exclamation point
in the range address?
Didn't you originally say, "If you're trying to use Excel as a
database you should use SQL.REQUEST often"? I jest of course, we can
all see your distinction I'm sure. As a final word from me, I offer
that if you found your recent discovery of SQL.REQUEST enlightening
and you've yet to discover ADO, you have more fun to come.

If you want to use VBA, ADO may be the better option. I haven't used it. I have
tried DAO a while ago. Does ADO require using cursors? Back to my point: if you
*don't* want to use VBA, then SQL.REQUEST or some other add-in would be the only
options aside from Excel's cloned 20-year-old 123-like advanced filters and (all
too often huge and complicated) array formulas. I suppose there are also pivot
tables, and if anyone can show me how to use them with text as the calculated
variables (e.g., given quarters across the top and products down the side, show
the region [TEXT!] selling the most of each product in each quarter) I may use
them some day. BTW, I know how to do this with array formulas, so I'm not asking
for general help with this.
P.S. I bet you haven't been called a newbie in a while, even if my
tongue was firmly embedded in cheek!

Ignorance is nothing to be ashamed of as long as one doesn't cling to it. I
freely admit that I'm with regard to ADO I'm a rank beginner. I've used SQL
before interactively and in 123 against Notes databases, but I haven't written
low-level code, so I only know about cursors from what little I've read. All
things considered, I don't want to work with them if I can avoid doing so.
 
Harlan Grove wrote ...
should that be a dollar sign or an exclamation
point in the range address?

The query should be as posted i.e. a dollar sign:

SELECT F1 FROM [DropMe$A1:A1]

The dollar sign is used as a 'sheet indicator':

SELECT F1 FROM [MySheet$]
SELECT F1 FROM [MySheet$MySheetLevelName]

A bit counterintuitive for Excel users :-(
Does ADO require using cursors?

Usually it does. The Connection object can be used to execute SQL e.g.

UPDATE [MySheet$] SET F1='onedaywhen'

but the most widely used object is the Recordset which is essentially
a cursor (has fields and records, even BOF and EOF!)

--
 
Back
Top