Straw Man: data validation against a secured SQL Server db over a VPN tunnel

  • Thread starter Thread starter Clif McIrvin
  • Start date Start date
C

Clif McIrvin

I'm looking for help in knowing the right questions to ask.

Situation: Main Office accounting software data lives in a SQL Server
database in another city. Local office connects to the accounting suite
(as I understand it) through a VPN tunnel. Historically, daily sales and
delivery data from various points of sale was hand carried to the local
office, where it was manually entered into the accounting system.

Recently, one of the points of sale received a software upgrade that
makes it possible for the local office to download a .csv file of the
daily sales info, which can be imported into the accounting system.

Apparently, there is no validation step before the import. If the point
of sale created a new customer (or item) that the accounting software
doesn't know about, the import dies a violent and messy death. (The
obvious question - why isn't there a validation process - has been
asked, but no-one admits to knowing the an$wer.)

I think I know enough Access to create a tool to validate the incoming
..csv against the current customer and item tables, *if* I knew how to
obtain that data.

I accept as a given that I won't ever get authorization from the
IT/Accounting gods to get at the master data directly. I gather that
it's possible to create read-only views of the various tables that I
could link to to run my validation against ... but I don't know what to
ask for.

Can someone in this room point me in the right direction to get started?
 
I'm looking for help in knowing the right questions to ask.

Situation: Main Office accounting software data lives in a SQL Server
database in another city. Local office connects to the accounting suite
(as I understand it) through a VPN tunnel. Historically, daily sales and
delivery data from various points of sale was hand carried to the local
office, where it was manually entered into the accounting system.

Recently, one of the points of sale received a software upgrade that
makes it possible for the local office to download a .csv file of the
daily sales info, which can be imported into the accounting system.

Apparently, there is no validation step before the import. If the point
of sale created a new customer (or item) that the accounting software
doesn't know about, the import dies a violent and messy death. (The
obvious question - why isn't there a validation process - has been
asked, but no-one admits to knowing the an$wer.)

I think I know enough Access to create a tool to validate the incoming
.csv against the current customer and item tables, *if* I knew how to
obtain that data.

I accept as a given that I won't ever get authorization from the
IT/Accounting gods to get at the master data directly.  I gather that
it's possible to create read-only views of the various tables that I
could link to to run my validation against ... but I don't know what to
ask for.

Can someone in this room point me in the right direction to get started?

I have a similar situation, including a SQL Server database in another
city and what I imagine to be a VPN tunnel, except that my situation
is not quite as dreary :-). What I do until that happy day when I
will be able to connect directly to the master data is request that a
report that can be exported to .csv or even .xls format be created in
the accounting software. Getting someone in the Accounting Dept. to
create that custom view or report can possibly take weeks where I work
(it takes perhaps ten minutes to do something similar in Access).
Buzz phrases like "local business requirement" and "will save the
company time and money" seem to help the process along. That data is
not usually exactly where I'd like it, so I have an import routine in
Access that allows me to specify which spreadsheet columns to import
and to specify the starting row for the data. That import routine
only has to be created once. When you have the data imported into an
Access table, you can reuse a link to the imported table location to
validate new customers. Most accounting software programs can export
their tables into a delimited format (perhaps labeled something like
'ODBC export'). Tab delimited format seems better to me than CSV, if
the accounting software has that export option available. In my case,
a SAP report is exported to a 'Local File' in 'Spreadsheet' format
(turns out to be .xls format). In summary, my recommendation is to
have Accounting provide a view or report of the active customer list
that can be exported to .xls format. If that's not possible, ask for
a view or report that can be exported to .csv format. Import that
data into Access and link to it to provide a source for customer
validation. Post back if I've missed the point of your question.

BTW, so far, where I work (not a parts production environment), all
the non-management level users up through lower management consider
SAP to be a nightmare. Yet I remind them that it would take me years
to implement in Access much of what SAP can do out of the box. I
think if they can eventually get SAP to be half as efficient as the
legacy Access system was within a few of years of SAP modifications,
they'll be ecstatic. They've even put a lot more optic fiber in the
office and guard the VPN bandwidth jealously. A few of our largest
customers have even reverted from using SAP. One employee estimated
that our main office has invested nearly four million dollars so far
into implementing SAP. Their original plan was to roll over each
sister company to SAP every few months, but for some reason they now
want to wait to see how the new processes work out here first. I am
cautiously optimistic.

James A. Fortune
(e-mail address removed)

I've seen many battles over getting local business requirements
integrated into SAP. If it can be demonstrated that it is indeed a
local business requirement, the local company usually gets its way
from the main office. -- Ron Long, V. P., Dassian, Inc.
 
Hi Clif,

This is a fairly common problem where I work. My scenario stems from
disparate and non-communicative (or collaborative) systems each with
its own 'master' data. In merging and importing data from these
systems into my departments required formats and structures the
validation issue kept on coming up. At first I tried to solve this
with Access alone, and kept having to modify the codebase and SQL to
match ever changing requirements and unforseen circumstances (read:
dirty data).

What I have done in the end is to approach the problem from an ETL
perspective (Extract Transform and Load) and built a series of ETL
transformations to read and validate the data in any number of ways I
want. You have probably heard me rambling on before about this
product: Pentaho Data Integration (www.pentaho.com) (also available on
SourceForge.net for download). It has allowed me to approach the
problem in a very clean, fast and robust way. Thats not a combination
I come across frequently so I find myself taken with the products
capabilities.

In short I use Pentaho to read in first the header row of a CSV or
Excel file (take your pick) and check its layout and structure against
a definition that I defined. If it passes this test the file is
considered suitable for processing. I then load the contents of the
file into a 'raw data' table and the filename into a 'filename' table
giving each filename a unique 'batch' number and each row in each file
is sequentially numbered. This way I know which row of data belongs to
which file at all times (standard normalisation, nothing special
here).

I read the 'raw data table' with the tool as the next step in the
process. Each row for each file is parsed through a data validation
routine, and a result is recorded in an extra field indicating the
outcome. PDI has native data validation steps that are exceedingly
fast and flexible. If any errors are encountered they are recorded in
two extra additional fields, one that records the error fields name(s)
and the other is the specific error itself (eg/ the field failed to
match a data type, or a regex). Any errors are then immediately
identifiable and can be manually addressed (and sometimes
automatically too.)

Lets say now we have 'clean' raw data to work with. The next thing I
do in similar scenarios as yours is to perform lookups on the fields
from PDI against another 'trusted' data source. In your case you would
need a customer list with the PK's and the names. The lookup for a
customer for example would check the name against the known list of
names and return the PK for that customer. If there isnt one then you
have found a problem for your import routine. I repeat this for all
relevant fields and end up with a 'stage' table similar to the 'raw'
table that specifies the batch, rownum, and incoming fields as well as
an extra field for each lookup performed and a final field to indicate
the status of the row. You can probably see by now where I am going
with this.

The Access apps that I build for these processes simply provide user
forms for correcting / approving the incoming data and allowing the
process to complete successfully. I would imagine in your scenario
that outputting a list of customers that the recipient system doesnt
know about might be a good start. You could also always have PDI
create the customer entries for you. When all the 'status lights' are
green and the errors taken care of the import can proceed and the file
is marked as finished / done. One of the apps I built also performs
logging of user actions for the data correction / approval so that
'who does what' can always be answered (though after implementing it
no one has ever actually asked for it to be shown!)

Most of the processes I build that are PDI based run between 2000 rows
per second and 20k rows per second with mdb files as the source /
target for most of the work. On an Oracle data warehouse I can
approach 100k rows per second if the connection can handle it at the
time.

Validation and 'sync' processes that used to take ages by hand or with
Access alone now take place in just minutes and are infinitely more
reliable and flexible at the same time. You can always review your
processes for 'sense checking' just like reviewing a flow chart.

I hope this helps.

Cheers

The Frog
 
The Frog said:
Validation and 'sync' processes that used to take ages by hand or with
Access alone now take place in just minutes and are infinitely more
reliable and flexible at the same time. You can always review your
processes for 'sense checking' just like reviewing a flow chart.

I hope this helps.

Cheers

The Frog

Thanks muchly! Ideas were what I asked for, and you supplied plenty <g>.
At this point, my understanding is that the primary concern is checking
for customers or items that don't exist in the accounting software; but
there are other known validation issues that sounds like could be
addressed with the structure and tools you outlined. Definately a
direction for research.

Thanks again.

If this project actually gets off the ground, I'll plan to post back
under this thread with an update on how it went.
 
I'm looking for help in knowing the right questions to ask.
<...>
I have a similar situation, including a SQL Server database in another
city and what I imagine to be a VPN tunnel, except that my situation
is not quite as dreary :-). What I do until that happy day when I
will be able to connect directly to the master data is request that a
report that can be exported to .csv or even .xls format be created in
the accounting software. Getting someone in the Accounting Dept. to
create that custom view or report can possibly take weeks where I work
(it takes perhaps ten minutes to do something similar in Access).
Buzz phrases like "local business requirement" and "will save the
company time and money" seem to help the process along. That data is
not usually exactly where I'd like it, so I have an import routine in
Access that allows me to specify which spreadsheet columns to import
and to specify the starting row for the data. That import routine
only has to be created once. When you have the data imported into an
Access table, you can reuse a link to the imported table location to
validate new customers. Most accounting software programs can export
their tables into a delimited format (perhaps labeled something like
'ODBC export'). Tab delimited format seems better to me than CSV, if
the accounting software has that export option available. In my case,
a SAP report is exported to a 'Local File' in 'Spreadsheet' format
(turns out to be .xls format). In summary, my recommendation is to
have Accounting provide a view or report of the active customer list
that can be exported to .xls format. If that's not possible, ask for
a view or report that can be exported to .csv format. Import that
data into Access and link to it to provide a source for customer
validation. Post back if I've missed the point of your question.

<...>

James A. Fortune
(e-mail address removed)

I've seen many battles over getting local business requirements
integrated into SAP. If it can be demonstrated that it is indeed a
local business requirement, the local company usually gets its way
from the main office. -- Ron Long, V. P., Dassian, Inc.

------

Exported copies of the relevant master table data is one of the ideas
that had occurred to me, and is more likely to be available to me than
creation of a read-only view.

"Business need" is definately a relevant component of the discussion;
perhaps one of the biggest obstacles at this point is a variation of
"but it's never been done that way before."

Thanks for the suggestions -- you read my question well.
 
what the **** are you talking about RETARD?

he's talking about using VPN. Jet DOESN'T ****ING WORK over VPN or
WAN.

So I'll reccomend he uses ADP -WHENEVER-THE-****-IT-IS-THE-BEST-
SOLUTION-





"a a r o n . k e m p f @gmail.com [MCITP: DBA]" <[email protected]>
wrote in messageyou shoud use Access Data Projects-- they work great over a VPN.

-----

Sheesh ... did you even read the question?

I'm looking for help in knowing the right questions to ask.
Situation: Main Office accounting software data lives in a SQL Server
database in another city. Local office connects to the accounting
suite
(as I understand it) through a VPN tunnel. Historically, daily sales
and
delivery data from various points of sale was hand carried to the
local
office, where it was manually entered into the accounting system.
Recently, one of the points of sale received a software upgrade that
makes it possible for the local office to download a .csv file of the
daily sales info, which can be imported into the accounting system.
Apparently, there is no validation step before the import. If the
point
of sale created a new customer (or item) that the accounting software
doesn't know about, the import dies a violent and messy death. (The
obvious question - why isn't there a validation process - has been
asked, but no-one admits to knowing the an$wer.)
I think I know enough Access to create a tool to validate the incoming
.csv against the current customer and item tables, *if* I knew how to
obtain that data.
I accept as a given that I won't ever get authorization from the
IT/Accounting gods to get at the master data directly. I gather that
it's possible to create read-only views of the various tables that I
could link to to run my validation against ... but I don't know what
to
ask for.
Can someone in this room point me in the right direction to get
started?
(clare reads his mail with moe, nomail feeds the bit bucket :-)
 
On Fri, 14 Jan 2011 04:12:51 -0800 (PST), The Frog

Validation and 'sync' processes that used to take ages by hand or with
Access alone now take place in just minutes and are infinitely more
reliable and flexible at the same time. You can always review your
processes for 'sense checking' just like reviewing a flow chart.

Exactly what I've done in the past although I didn't use colour
coding.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeupdater.com/
 
The said:
<snip>
I also do something like this, but instead of using my ETL tool (SSIS - SQL
Server Integration Services) to validate the data, I use a stored procedure.
I suspect that using an internal process (stored procedure) vs an external
process would be more efficient. I, however, deal only with a single
destination server brand. Your approach will allow you to use many
destination servers with a single solution, so my efficiency is trumped by
your ease of maintenance :-)
 
a a r o n . k e m p f @ g m a i l . c o m said:
what the **** are you talking about RETARD?

You are a naughty boy, Aaron. If you talk like that to people in person,
sooner or later, even in a generally peaceful place like Washington, you're
likely to run into someone that'll make you wish you had just stayed in the
slammer and learned to appreciate the protection it offered you.
he's talking about using VPN. Jet DOESN'T
****ING WORK over VPN or WAN.

MDB and Jet or ACCDB and ACE, ODBC, SQL Server _or any other ODBC-compliant
server_ works very nicely (and has longer than the "extensive" database
career you claim); and, although you don't keep sufficiently up-to-date to
know it, it is what is recommended by the Access team at Microsoft as the
preferred solution. ADP, on the other hand, is "to all intents and
purposes, deprecated" and has not been enhanced in the last several versions
of Access
So I'll reccomend he uses ADP -WHENEVER-
THE-****-IT-IS-THE-BEST-SOLUTION-

If your posting history is any indication, you are lying, because you have a
history of recommending people use ADP when it is clearly not the "best"
solution, except in your warped and biased view.
 
Clif or anyone else,

If you want to see a sample approach that uses Pentaho Data
Integration (PDI) for loading files to a table and validating them as
part of the process I am happy to skooch a few examples I have created
to you. One other thing you can do with PDI that I havent mentioned,
and is especially useful if you have giant data sets to work with, is
the use of PDI slaves. You can run PDI as a slave (background process)
in either windows or linux and have an independant machine (or army of
machines) doing the integration process heavy lifting, including
pushing the data to and fro from db servers. You can also execute the
ETL process(es) you want from command line if it works for you best.

Cheers

The Frog
 
The Frog said:
Clif or anyone else,

If you want to see a sample approach that uses Pentaho Data
Integration (PDI) for loading files to a table and validating them as
part of the process I am happy to skooch a few examples I have created
to you. One other thing you can do with PDI that I havent mentioned,
and is especially useful if you have giant data sets to work with, is
the use of PDI slaves. You can run PDI as a slave (background process)
in either windows or linux and have an independant machine (or army of
machines) doing the integration process heavy lifting, including
pushing the data to and fro from db servers. You can also execute the
ETL process(es) you want from command line if it works for you best.

Cheers

The Frog


Very generous offer! and is a sample of why I consider these ng's my
primary source of information and instruction! <g>

For my situation, this is pre-mature, but I'll definately keep it in
mind.

Thanks!
 
Back
Top