How Would you Store this Data?

  • Thread starter Thread starter Jonathan Wood
  • Start date Start date
J

Jonathan Wood

I'm thinking about building a site that would include user-contributed
articles, file attachments, images, and maybe videos.

I'm trying to determine the best way to store this data. Specifically:

A) What are the pros and cons of storing the file attachments, images, and
videos in a database vs. storing them as separate files?

B) I've decided user-contributed articles should not be submitted as HTML.
I'd have some code to convert newlines to <br /> and otherwise convert the
articles to HTML. I'm thinking about separate database columns for both the
original and the converted articles. The advantages would be that the site
would be fast because no conversion would be needed when serving the
articles, and it would be possible for a contributor to obtain the original
if they wanted to edit it and resubmit. I'd be curious if others thought
this was the best approach.

I'm interested in hearing all ideas.

Thanks for any suggestions
 
Jonathan said:
I'm thinking about building a site that would include user-contributed
articles, file attachments, images, and maybe videos.

I'm trying to determine the best way to store this data. Specifically:

A) What are the pros and cons of storing the file attachments, images,
and videos in a database vs. storing them as separate files?

B) I've decided user-contributed articles should not be submitted as
HTML. I'd have some code to convert newlines to <br /> and otherwise
convert the articles to HTML. I'm thinking about separate database
columns for both the original and the converted articles. The advantages
would be that the site would be fast because no conversion would be
needed when serving the articles, and it would be possible for a
contributor to obtain the original if they wanted to edit it and
resubmit. I'd be curious if others thought this was the best approach.

I'm interested in hearing all ideas.

Those are files and you need a control that does file uploads and
downloads. The files should be kept on a file share or file share array.

The database table would only have a record that has the name of the
file, and the path to the file on the file share a pointer record.

You need a viewer control so when the file is selected for viewing, it
takes the file extension and makes a determination as to what
application would be used to allow the viewing of the file. If is's a
*.doc, then the viewer is going to select MS Word to view the file, etc,
etc.

The various application used to view a file will reside on the Web
server like a file association that the Windows O/S does is most likely
going to be invoked.

That's how you do a document repository that's controlled by a Web
application using a browser.
 
I'm thinking about building a site that would include user-contributed
articles, file attachments, images, and maybe videos.

I'm trying to determine the best way to store this data. Specifically:

A) What are the pros and cons of storing the file attachments, images, and
videos in a database vs. storing them as separate files?

B) I've decided user-contributed articles should not be submitted as HTML..
I'd have some code to convert newlines to <br /> and otherwise convert the
articles to HTML. I'm thinking about separate database columns for both the
original and the converted articles. The advantages would be that the site
would be fast because no conversion would be needed when serving the
articles, and it would be possible for a contributor to obtain the original
if they wanted to edit it and resubmit. I'd be curious if others thought
this was the best approach.

I'm interested in hearing all ideas.

Thanks for any suggestions

A) It depends on size of attachments and many other things. You should
estimate how many users, files and traffic you will get. Imagine that
your database would be 10 TB, are you sure that you can manage it? In
most cases you will get an answer that going for a file share is the
easiest and scalable way.

B) Replacing <br> on-the-fly should not be a performance problem
 
Jonathan Wood said:
I'm thinking about building a site that would include user-contributed
articles, file attachments, images, and maybe videos.

I'm trying to determine the best way to store this data. Specifically:

A) What are the pros and cons of storing the file attachments, images, and
videos in a database vs. storing them as separate files?
There are lots of differences in storing in databases VS file system files.
The main thing for me is if the files them selves are going to be somewhat
on the huge size, consider file system storage. The good side of database
storage is that your files/videos are secured by default (you have to do
almost no extra work to keep them away from prying fingers) except the
streaming factor from the database to the browser. In file system storage,
you need to implement file security (and that can be quite a difficult area
to deal with). Either way you decide, make sure not to put your files that
will be accessed through the browser in the app_data folder. I tried this
and coldn't figure out why nothing worked. Come to find out, it has special
permissions on it preventing browser access of any kind.
B) I've decided user-contributed articles should not be submitted as HTML.
I'd have some code to convert newlines to <br /> and otherwise convert the
articles to HTML. I'm thinking about separate database columns for both
the original and the converted articles. The advantages would be that the
site would be fast because no conversion would be needed when serving the
articles, and it would be possible for a contributor to obtain the
original if they wanted to edit it and resubmit. I'd be curious if others
thought this was the best approach.
Converting from html to plain text on the fly shouldn't be a problem. Here
is a good article that shows you how to do it with Regex.
http://www.4guysfromrolla.com/webtech/042501-1.shtml

If you don't want people sending articles and stuff in html format, there
isn't any point in keeping the html version. If you don't be careful, it
could leave room for hackers into your database. Just be careful to close
all your holes in database access code if you do decide to keep the original
html version. You also might want to consider scanning the incoming article
text for any kind of scripting languages as well as any sql code. That way
you disarm someone if the intent is to mess with your website. Wish I had
some links for the scripting/sql scanning/removal code, but I use DotNetNuke
5.2 which has the security features I talked about already built in. Any
html/scripting/sql code will be rendered as plain text. The link above just
removes the html/xml code alltogether.
 
Alexey said:
A) It depends on size of attachments and many other things. You should
estimate how many users, files and traffic you will get. Imagine that
your database would be 10 TB, are you sure that you can manage it? In
most cases you will get an answer that going for a file share is the
easiest and scalable way.

What do you mean by "are you sure that you can manage it?" What part of
managing do you question if I can do? (Sure not disk space as that would be
needed if I used files as well.)
B) Replacing <br> on-the-fly should not be a performance problem

There would be other changes too, which could possibly include color-coding
source code.

Thanks.
 
Andy said:
There are lots of differences in storing in databases VS file system
files. The main thing for me is if the files them selves are going to be
somewhat on the huge size, consider file system storage. The good side of
database storage is that your files/videos are secured by default (you
have to do almost no extra work to keep them away from prying fingers)
except the streaming factor from the database to the browser. In file
system storage, you need to implement file security (and that can be quite
a difficult area to deal with). Either way you decide, make sure not to
put your files that will be accessed through the browser in the app_data
folder. I tried this and coldn't figure out why nothing worked. Come to
find out, it has special permissions on it preventing browser access of
any kind.

Thanks for the last tip.

I can't think of any case where access to the files would be restricted to
anybody--they would be available to all anonymous users. If that's what you
meant by "secured", then that won't be an issue.
Converting from html to plain text on the fly shouldn't be a problem. Here
is a good article that shows you how to do it with Regex.
http://www.4guysfromrolla.com/webtech/042501-1.shtml

Thanks for the link, but the articles won't initially have HTML. They might
be closer to Wikipedia articles that have their own coding, which must then
be converted to HTML. Also, as I mentioned elsewhere, my conversion might be
fairly involved and include color coding source code and other processing.
html version. You also might want to consider scanning the incoming
article text for any kind of scripting languages as well as any sql code.
That way you disarm someone if the intent is to mess with your website.
Wish I had some links for the scripting/sql scanning/removal code, but I
use DotNetNuke 5.2 which has the security features I talked about already
built in. Any html/scripting/sql code will be rendered as plain text.

Right, that's the main reason I don't want to accept articles that already
have HTML. (Any existing HTML not removed on review will simply be
HTML-encoded.)

Thanks.

Jon
 
Well, that doesn't add much to the discuss about the pros and cons of
storing as files vs. storing in the database, but thanks for your comments.

Jon
 
Jonathan Wood said:
Thanks for the last tip.

I can't think of any case where access to the files would be restricted to
anybody--they would be available to all anonymous users. If that's what
you meant by "secured", then that won't be an issue.

That's what I originally meant. The thing you would want to decide is this:
1. Do you want the files to be accessed directly from a URL? Something like
www.yourDomain.com/Filename.mv4. This would allow anyone to bookmark/put it
in their favorites. Depending, this [might] cause some possible security
issues but I wouldn't know at this point what ones they would be. or
2. Have the website be a proxy for the files themselves. Possibly have a
generic handler dealing with the files directly. This would give you better
control on what happens to the files as they go to the browser stream.
Meaning you can tell the server what to do with the files before they are
downloaded. A good example would be when you see an mv4 file that is 300MB
in size on a website, you download it but find out it was actually zipped
and is now 200MB instead of the original uncompressed video. I want to use
this idea for my rss feeds sometime down the road. or
3. Database option. The link would be something like
www.yourDomain.com/fileID=12948&compress=1. This would get the file from the
database and set a query string flag to let you compress the file (if you
wanted to provide that feature). Database option for files has some
drawbacks:
A. It takes more load on the server especially for larger files because the
webserver has to make 2 trips. 1 to the database to get the file stream and
1 to the browser to give the user their content. All of a sudden, a 300MB
file just used 600MB of bandwidth. If you have to worry about bandwidth with
huge files, this might not be for you.
B. You have to write the code to get the files (binary) out of the database,
figure out how to temporarly store it (if you don't just stream it directly)
and then how you are going to get it to the user.
- Are you going to directly stream it with Response.Write? or
- Write it to the filesystem and generate a page with a link on it for
download.
 
What do you mean by "are you sure that you can manage it?" What part of
managing do you question if I can do? (Sure not disk space as that would be
needed if I used files as well.)


There would be other changes too, which could possibly include color-coding
source code.

Thanks.

Managing large database could be a challenge. It's not only about disk
space but also hardware and software requirements, and technical
personnel. The database server for a large database would need more
support than a file share.

I have a project with a small 1GB SQL Server database where I have
about 20,000 small pictures (~50-100 Kb). It works well. This way was
chosen because of data replication over multiple web and SQL Servers,
easy backup and security. In order to decrease the load on the
database and to improve the system's performance, I use pre-cached
files for most requested pictures.
 
Back
Top