File Storage Question

  • Thread starter: Redhairs

Redhairs

In a web farm environment, how should I store files that users upload so
they can be accessed later?
Should they go in the database, the local file system, or a centralized file
server?

If I store the files in the local file system or on a centralized file
server, how does the web server that receives the upload sync the file to
each web server's local file system, or to the centralized file server?

Should I create only one folder or lots of subfolders to organize the files
for better access from the file system?
 
It depends on what you want to do with the files and what they represent,
which you haven't said. In the simplest case, ASP.NET uploads the file into
server memory, buffers it, and then writes it to disk. Where you choose to
store it depends on how frequently you want to access it and how responsive
you want the application to be, among other things.
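
A rough sketch of the buffer-then-write step described above, in Python
rather than ASP.NET. The function name, the chunk size, and the fake upload
are assumptions made for illustration; this is not how ASP.NET implements
it internally.

```python
import io
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size: 64 KiB per read


def save_upload(stream, dest_path, chunk_size=CHUNK_SIZE):
    """Copy an uploaded stream to dest_path one buffer at a time.

    Returns the number of bytes written.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read means the upload is finished
                break
            out.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    # Stand-in for a request body: 200,000 bytes of fake image data.
    fake_upload = io.BytesIO(b"\xff" * 200_000)
    target = os.path.join(tempfile.mkdtemp(), "photo.jpg")
    print(save_upload(fake_upload, target))
```

Reading in fixed-size chunks keeps memory use bounded no matter how large
the upload is.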

--
--
Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
The O.W.C. Black Book, 2nd Edition
Exclusively on www.lulu.com/owc $19.99
 
Thanks for your reply.
It's a simple photo album web app where users upload image files and review
them later.


 
In solutions that I've designed that have this as a requirement, I've
typically used Network Attached Storage.

In most cases, all the servers in the web farm are on the same gigabit
network segment, and the NAS (which is usually multi-homed) sits on that
same segment.

In most cases, storing files in the database will lead to long-term
problems. As amazing as current databases are, they're still not file
systems...
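
A minimal sketch of the shared-storage idea, assuming every web server in
the farm mounts the same NAS share and passes its mount point as
`share_root`. The function name and the `.part` suffix are hypothetical.
The rename is atomic on a local file system; over a network share the
guarantee depends on the protocol, so treat that part as an assumption.

```python
import os
import tempfile
import uuid


def store_on_share(data, share_root, extension=".jpg"):
    """Write data under share_root and return the stored file name."""
    # A random name avoids collisions when several servers write at once.
    name = uuid.uuid4().hex + extension
    final_path = os.path.join(share_root, name)
    # Write to a temporary file in the same directory first, so other
    # servers never pick up a half-written image.
    fd, tmp_path = tempfile.mkstemp(dir=share_root, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, final_path)  # rename within one file system
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return name
```

Because every server reads and writes the same share, no server-to-server
sync step is needed.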
 
Chris,

What type of long-term problems?
Can you share, please? I store images in a database.

Thank You,

LVP
 
Any tips on file structure design based on performance and maintenance
concerns?
For example, one folder for all image files, or many subfolders?
 
The biggest problem I've seen with storing files in a database is that the
database grows without bound, and current databases don't seem to do very
well with really large data sets. In the large range (1TB -> 1PB), file
systems (in my experience) deal with large sets more effectively. The
database is great for storing "Which computers have that image on it?", but
for the actual image... not so hot.

If you are adding files quickly enough, you'll need to add disk space to
your database every few hours, days, or weeks. Even with current large
drives, this is both expensive and time-consuming. With a DB, there's really
no good way to scatter the data across a large number of computers. You
could use FileGroup in SQL2K5 (Enterprise only), or something similar, but
these are not very good for dynamically adding data.

If you're not adding drives, you're archiving files. This means a job that
runs and moves images from the DB to disk. You then need to design the
system to first check the DB, then check the disk. This makes things 2x as
complicated as just skipping the DB step and storing files on disk.

My preference at this point is:
- Store files on a NAS, or a set of NAS devices.
- Store the index (file names, metadata, location) in the database.
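
The split in the two bullets above can be sketched roughly as follows.
SQLite stands in for the real database, `storage_root` stands in for the NAS
mount, and the table layout and function names are hypothetical.

```python
import os
import sqlite3
import uuid


def open_index(db_path):
    """Open (or create) the index database. The table layout is illustrative."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               id INTEGER PRIMARY KEY,
               original_name TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               location TEXT NOT NULL
           )"""
    )
    return conn


def add_file(conn, storage_root, original_name, data):
    """Write the bytes to storage_root and record them in the index."""
    stored_name = uuid.uuid4().hex + os.path.splitext(original_name)[1]
    location = os.path.join(storage_root, stored_name)
    with open(location, "wb") as f:
        f.write(data)
    cur = conn.execute(
        "INSERT INTO files (original_name, size_bytes, location) VALUES (?, ?, ?)",
        (original_name, len(data), location),
    )
    conn.commit()
    return cur.lastrowid


def read_file(conn, file_id):
    """Look up the location in the index, then read the bytes from disk."""
    row = conn.execute(
        "SELECT location FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    with open(row[0], "rb") as f:
        return f.read()
```

The database rows stay small because they hold only names, sizes, and
locations, while the image bytes live on the file system.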

I would love to hear of an out-of-the-box approach people have used for
this, especially in Oracle land (I'm familiar with the SQL Server
landscape).

--
Chris Mullins
 
This seems to be a somewhat one-sided argument. Storing in the DB works well
in some situations, particularly if the file sizes aren't too large. If you
are storing several TB of images then you probably don't want to put them in
the database, but at smaller sizes it should not be a problem.

The argument you make about disk space really isn't a database issue, it is
a disk issue: you'll run out of disk space just as quickly in the file
system as in the DB.

The advantages of storing in the database are a central location and
synchronized backups between the images and the database. The primary
problem is that if you have a lot of clients (in this case, web servers in
the farm) hitting the database, they will chew up bandwidth into your DB
server to get the images. Another disadvantage, as noted in the parent post,
is that the file system is generally faster.

However, I have used SQL Server before for storing images for web use. In
those cases I limited the file size of each image to 55KB or less and used
some caching on the web server, and I never saw a performance problem. Also,
because I controlled the point of entry for the images, I would scale them
down with GDI+ on the server side before saving them to the DB.
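
A small sketch of the size cap plus web-server cache described above. An
in-memory SQLite database stands in for SQL Server, the cache size and
function names are assumptions, and the GDI+ scaling step is left out.

```python
import sqlite3
from functools import lru_cache

MAX_IMAGE_BYTES = 55 * 1024  # the 55KB cap mentioned above

# In-memory SQLite database as a stand-in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")


def save_image(data):
    """Store an image in the database, rejecting anything over the cap."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds %d bytes" % MAX_IMAGE_BYTES)
    cur = conn.execute("INSERT INTO images (data) VALUES (?)", (data,))
    conn.commit()
    return cur.lastrowid


@lru_cache(maxsize=256)  # hypothetical cache size
def load_image(image_id):
    """Fetch an image; repeat requests are served from the local cache."""
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return bytes(row[0])
```

With the cache in place, only the first request for a given image on each
web server reaches the database.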

 


To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?
http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2006-45
 
You may want to consider a tree-like hierarchy where the top folder
represents the logged-on user account, with a subfolder that holds the image
files. It's easier to troubleshoot or retrieve images that way, since you
index on the logged-on user and then read all the files in that folder. When
you get a file upload from the user, simply find the logged-on user account
and store the file inside that user's directory. If uploads are anonymous,
that approach won't work.
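
A minimal sketch of that per-user layout. The `images` subfolder name, the
character-cleaning rule, and the function names are assumptions for
illustration only.

```python
import os
import re


def user_folder(storage_root, username):
    """Return (and create) <storage_root>/<user>/images for one account."""
    # Keep only characters that are safe in a folder name. This rule is
    # an assumption; distinct account names could collide after cleaning.
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", username)
    if not safe:
        raise ValueError("an account name is required")
    folder = os.path.join(storage_root, safe, "images")
    os.makedirs(folder, exist_ok=True)
    return folder


def save_for_user(storage_root, username, filename, data):
    """Store an uploaded file inside the logged-on user's folder."""
    # basename() drops any directory part of the client-supplied name,
    # based on the host operating system's path separator.
    name = os.path.basename(filename)
    path = os.path.join(user_folder(storage_root, username), name)
    with open(path, "wb") as f:
        f.write(data)
    return path


def list_user_files(storage_root, username):
    """Read all the file names stored for one user."""
    return sorted(os.listdir(user_folder(storage_root, username)))
```

Retrieving a user's album is then a single directory listing, for example
`list_user_files(root, "alice")`.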

--
--
Regards,
Alvin Bruney [MVP ASP.NET]
