Can't Explain This In Few Words

  • Thread starter Thread starter mcl
  • Start date Start date
M

mcl

I'm a climatologist and have asked many questions here before although not
for a few months. I'm trying to import climate data from different source
and their format is very different and it's causing a problem.
It's comma delimited (that's no problem) in this format:
Station ID, Element Code (ie, TMAX for max Temp, PCRP for precipitaion),
Unit (ie. F), Year, Month, Day01, Day02, Day03, etc.
In previous data sources DAY was an element with numbers 1-31. But here each
day is a separate element. That's the problem. Of course Days 29-31 can be
blank (-99999) which is not a problem).
The logic of trying to crunch this data when each day is a different element
has me stumped. For example, I tried to generate in a query with one extra
column where for each Year & Month where it would display the Max temp by
Year and Month and couldn't do it. In past cases where Day was one element
with number 1-31 I just pulled the Max of Tmax by Year, Month. In this case
I tried Max([Day01,Day02, etc) and it would do it. I am able to "Total" for
PRCP by using [Day01]+[Day02]+[etc] (doing that right now while also typing
this. But max([Day01],[Day02],etc) seemed to go nowhere although MS Access
help seemed to show it would BUT said could only handle up to 30 values (of
course I need 31).
 
OK, ran into a problem with the ([Day01]+[Day02]+etc). Totals for months
with missing values were blank even if it were just because the month had <
31 days. How do I tell it to give me the totals whether there are missing
values or not?
 
Dear Marc:

This may not make you very happy, but the problem is the way you've
designed the table.

This data is not normalized. Normalization means conformance to a set
of rules that make subsequent evaluation easier and much quicker.

You will probably need to have an additional table, perhaps like this:

StationID
DateMeasured
TMax
PCRP

Without details, I don't know what to do with Unit. Perhaps it is C/F
for temperature, inches/cm for precipitation, etc. However, it would
be difficult to sort, agregate, or filter such data of mixed units, so
I'd suggest they all be converted to common units before entry into
the table.

A table like the above will accomodate missing data by just not having
a row for a date where there is no data.

The Max() aggregate function will get the maximum across an
"unlimited" number of values, so you could get the maximum for a
month, a week, a year, or a century. Sums and averages will be easy
across any range of dates, not just months.

I guarantee you this is the best way to go.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


OK, ran into a problem with the ([Day01]+[Day02]+etc). Totals for months
with missing values were blank even if it were just because the month had <
31 days. How do I tell it to give me the totals whether there are missing
values or not?

mcl said:
I'm a climatologist and have asked many questions here before although not
for a few months. I'm trying to import climate data from different source
and their format is very different and it's causing a problem.
It's comma delimited (that's no problem) in this format:
Station ID, Element Code (ie, TMAX for max Temp, PCRP for precipitaion),
Unit (ie. F), Year, Month, Day01, Day02, Day03, etc.
In previous data sources DAY was an element with numbers 1-31. But here each
day is a separate element. That's the problem. Of course Days 29-31 can be
blank (-99999) which is not a problem).
The logic of trying to crunch this data when each day is a different element
has me stumped. For example, I tried to generate in a query with one extra
column where for each Year & Month where it would display the Max temp by
Year and Month and couldn't do it. In past cases where Day was one element
with number 1-31 I just pulled the Max of Tmax by Year, Month. In this case
I tried Max([Day01,Day02, etc) and it would do it. I am able to "Total" for
PRCP by using [Day01]+[Day02]+[etc] (doing that right now while also typing
this. But max([Day01],[Day02],etc) seemed to go nowhere although MS Access
help seemed to show it would BUT said could only handle up to 30 values (of
course I need 31).
 
OK, NZ fixed this problem. I had asked this once before and yes I e-mail
replies here to myself and keep them.
 
I didn't design anything. It's based on the format used by the National
Climatic Data Center (NCDC) when you go on their site and try to get data
for a station. It's when you choose comma delimited and no flags (flags make
it an even bigger mess).
http://www.ncdc.noaa.gov/oa/ncdc.html
Specifically try:
http://hurricane.ncdc.noaa.gov/CDO/cdo
and check out sample data. The sample here is space delimited (although I'm
having trouble accessing that link righ now).

I imported the data into Access and then try to do something with the mess.
Yes, Unit for Temp Data is either C or F. For precip it's Inches. The first
thing I did was build queries to pull a specific element. For example my
primary precip query sets the element code to "PRCP". The Max Temp sets it
to "TMAX", etc. All other queries and attempts to crunch this data use the
results of these queries NOT the original table.

Tom Ellison said:
Dear Marc:

This may not make you very happy, but the problem is the way you've
designed the table.

This data is not normalized. Normalization means conformance to a set
of rules that make subsequent evaluation easier and much quicker.

You will probably need to have an additional table, perhaps like this:

StationID
DateMeasured
TMax
PCRP

Without details, I don't know what to do with Unit. Perhaps it is C/F
for temperature, inches/cm for precipitation, etc. However, it would
be difficult to sort, agregate, or filter such data of mixed units, so
I'd suggest they all be converted to common units before entry into
the table.

A table like the above will accomodate missing data by just not having
a row for a date where there is no data.

The Max() aggregate function will get the maximum across an
"unlimited" number of values, so you could get the maximum for a
month, a week, a year, or a century. Sums and averages will be easy
across any range of dates, not just months.

I guarantee you this is the best way to go.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


OK, ran into a problem with the ([Day01]+[Day02]+etc). Totals for months
with missing values were blank even if it were just because the month had <
31 days. How do I tell it to give me the totals whether there are missing
values or not?

mcl said:
I'm a climatologist and have asked many questions here before although not
for a few months. I'm trying to import climate data from different source
and their format is very different and it's causing a problem.
It's comma delimited (that's no problem) in this format:
Station ID, Element Code (ie, TMAX for max Temp, PCRP for precipitaion),
Unit (ie. F), Year, Month, Day01, Day02, Day03, etc.
In previous data sources DAY was an element with numbers 1-31. But here each
day is a separate element. That's the problem. Of course Days 29-31 can be
blank (-99999) which is not a problem).
The logic of trying to crunch this data when each day is a different element
has me stumped. For example, I tried to generate in a query with one extra
column where for each Year & Month where it would display the Max temp by
Year and Month and couldn't do it. In past cases where Day was one element
with number 1-31 I just pulled the Max of Tmax by Year, Month. In this case
I tried Max([Day01,Day02, etc) and it would do it. I am able to "Total" for
PRCP by using [Day01]+[Day02]+[etc] (doing that right now while also typing
this. But max([Day01],[Day02],etc) seemed to go nowhere although MS Access
help seemed to show it would BUT said could only handle up to 30 values (of
course I need 31).
 
Dear Marc:

I wasn't referring to the form of your source data. I realize you
probably don't have control of this. I'm referring to the form of the
data in Access.

What I would typically do is import such data "as is" into tables that
match the format of that information. Then I would alter its
appearance and put it into tables that are normalized. All the work
done in the database subsequent to that would be on the normalized
tables.

I believe this approch would work well for you as well.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


I didn't design anything. It's based on the format used by the National
Climatic Data Center (NCDC) when you go on their site and try to get data
for a station. It's when you choose comma delimited and no flags (flags make
it an even bigger mess).
http://www.ncdc.noaa.gov/oa/ncdc.html
Specifically try:
http://hurricane.ncdc.noaa.gov/CDO/cdo
and check out sample data. The sample here is space delimited (although I'm
having trouble accessing that link righ now).

I imported the data into Access and then try to do something with the mess.
Yes, Unit for Temp Data is either C or F. For precip it's Inches. The first
thing I did was build queries to pull a specific element. For example my
primary precip query sets the element code to "PRCP". The Max Temp sets it
to "TMAX", etc. All other queries and attempts to crunch this data use the
results of these queries NOT the original table.

Tom Ellison said:
Dear Marc:

This may not make you very happy, but the problem is the way you've
designed the table.

This data is not normalized. Normalization means conformance to a set
of rules that make subsequent evaluation easier and much quicker.

You will probably need to have an additional table, perhaps like this:

StationID
DateMeasured
TMax
PCRP

Without details, I don't know what to do with Unit. Perhaps it is C/F
for temperature, inches/cm for precipitation, etc. However, it would
be difficult to sort, agregate, or filter such data of mixed units, so
I'd suggest they all be converted to common units before entry into
the table.

A table like the above will accomodate missing data by just not having
a row for a date where there is no data.

The Max() aggregate function will get the maximum across an
"unlimited" number of values, so you could get the maximum for a
month, a week, a year, or a century. Sums and averages will be easy
across any range of dates, not just months.

I guarantee you this is the best way to go.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


OK, ran into a problem with the ([Day01]+[Day02]+etc). Totals for months
with missing values were blank even if it were just because the month had <
31 days. How do I tell it to give me the totals whether there are missing
values or not?

"mcl" <marc.levine_at_no_spam_afccc.af.mil_no_spam> wrote in message
I'm a climatologist and have asked many questions here before although not
for a few months. I'm trying to import climate data from different source
and their format is very different and it's causing a problem.
It's comma delimited (that's no problem) in this format:
Station ID, Element Code (ie, TMAX for max Temp, PCRP for precipitaion),
Unit (ie. F), Year, Month, Day01, Day02, Day03, etc.
In previous data sources DAY was an element with numbers 1-31. But here
each
day is a separate element. That's the problem. Of course Days 29-31 can be
blank (-99999) which is not a problem).
The logic of trying to crunch this data when each day is a different
element
has me stumped. For example, I tried to generate in a query with one
extra
column where for each Year & Month where it would display the Max temp by
Year and Month and couldn't do it. In past cases where Day was one element
with number 1-31 I just pulled the Max of Tmax by Year, Month. In this
case
I tried Max([Day01,Day02, etc) and it would do it. I am able to "Total"
for
PRCP by using [Day01]+[Day02]+[etc] (doing that right now while also
typing
this. But max([Day01],[Day02],etc) seemed to go nowhere although MS Access
help seemed to show it would BUT said could only handle up to 30 values
(of
course I need 31).
 
I kinda did that with the initial queries for each elemented I wanted and
then use that query as the input for other things.
The unit is really informational (Don't want to crash any spacecraft into
Mars because you mix up english and metric units.). The initial query I
build for precip for example just pulls the station ID, Year, Month, and
then each of the days [Day01], [Day02], etc. I do this by selecting element
code eq "PRCP". I have done many analysis on other databases where DAY was
just one of the parameters with values of missing or 1-31. That works much
better. In this case here Day01 has the precip value for day one, etc. To
find the max temps by month for the station I was working on (data from 1945
to 2004/01) I just took the query result with ID, Year, Month, Day01-Day31
and copied the damn thing into Excel then created a max column and did a
pivot table there. Pure brute force type of stuff.

Tom Ellison said:
Dear Marc:

I wasn't referring to the form of your source data. I realize you
probably don't have control of this. I'm referring to the form of the
data in Access.

What I would typically do is import such data "as is" into tables that
match the format of that information. Then I would alter its
appearance and put it into tables that are normalized. All the work
done in the database subsequent to that would be on the normalized
tables.

I believe this approch would work well for you as well.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


I didn't design anything. It's based on the format used by the National
Climatic Data Center (NCDC) when you go on their site and try to get data
for a station. It's when you choose comma delimited and no flags (flags make
it an even bigger mess).
http://www.ncdc.noaa.gov/oa/ncdc.html
Specifically try:
http://hurricane.ncdc.noaa.gov/CDO/cdo
and check out sample data. The sample here is space delimited (although I'm
having trouble accessing that link righ now).

I imported the data into Access and then try to do something with the mess.
Yes, Unit for Temp Data is either C or F. For precip it's Inches. The first
thing I did was build queries to pull a specific element. For example my
primary precip query sets the element code to "PRCP". The Max Temp sets it
to "TMAX", etc. All other queries and attempts to crunch this data use the
results of these queries NOT the original table.

Tom Ellison said:
Dear Marc:

This may not make you very happy, but the problem is the way you've
designed the table.

This data is not normalized. Normalization means conformance to a set
of rules that make subsequent evaluation easier and much quicker.

You will probably need to have an additional table, perhaps like this:

StationID
DateMeasured
TMax
PCRP

Without details, I don't know what to do with Unit. Perhaps it is C/F
for temperature, inches/cm for precipitation, etc. However, it would
be difficult to sort, agregate, or filter such data of mixed units, so
I'd suggest they all be converted to common units before entry into
the table.

A table like the above will accomodate missing data by just not having
a row for a date where there is no data.

The Max() aggregate function will get the maximum across an
"unlimited" number of values, so you could get the maximum for a
month, a week, a year, or a century. Sums and averages will be easy
across any range of dates, not just months.

I guarantee you this is the best way to go.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts


OK, ran into a problem with the ([Day01]+[Day02]+etc). Totals for months
with missing values were blank even if it were just because the month
had
<
31 days. How do I tell it to give me the totals whether there are missing
values or not?

"mcl" <marc.levine_at_no_spam_afccc.af.mil_no_spam> wrote in message
I'm a climatologist and have asked many questions here before
although
not
for a few months. I'm trying to import climate data from different source
and their format is very different and it's causing a problem.
It's comma delimited (that's no problem) in this format:
Station ID, Element Code (ie, TMAX for max Temp, PCRP for precipitaion),
Unit (ie. F), Year, Month, Day01, Day02, Day03, etc.
In previous data sources DAY was an element with numbers 1-31. But here
each
day is a separate element. That's the problem. Of course Days 29-31
can
be
blank (-99999) which is not a problem).
The logic of trying to crunch this data when each day is a different
element
has me stumped. For example, I tried to generate in a query with one
extra
column where for each Year & Month where it would display the Max
temp
by
Year and Month and couldn't do it. In past cases where Day was one element
with number 1-31 I just pulled the Max of Tmax by Year, Month. In this
case
I tried Max([Day01,Day02, etc) and it would do it. I am able to "Total"
for
PRCP by using [Day01]+[Day02]+[etc] (doing that right now while also
typing
this. But max([Day01],[Day02],etc) seemed to go nowhere although MS Access
help seemed to show it would BUT said could only handle up to 30 values
(of
course I need 31).
 
Back
Top