What’s the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful insights to whoever is interested in the data. Data mining is used today in a wide variety of contexts: in fraud detection, as an aid in marketing campaigns, and even by supermarkets to study their customers.
Data warehousing can be said to be
the process of centralizing or aggregating data from multiple
sources into one common repository.
Example of data mining
If you’ve ever used a credit card,
then you may know that credit card companies will alert you when they think
that your credit card is being fraudulently used by someone other than you.
This is a perfect example of data mining – credit card companies have a
history of your purchases from the past and know geographically where those
purchases have been made. If all of a sudden some purchases are made in a city far from where you live, the credit card company is alerted to possible fraud, since its data mining shows that you don’t normally make purchases in that city. The company can then decline that transaction or simply flag your card for suspicious activity.
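A toy sketch of the location check described above, assuming each past purchase is recorded only by its city and that a city counts as "unusual" when it makes up less than 5% of the cardholder's history. The function name and the threshold are hypothetical; real fraud systems typically weigh many more signals (amounts, merchants, timing, distances).

```python
from collections import Counter


def is_suspicious(history_cities, new_city, min_share=0.05):
    """Return True if new_city is rare or absent in the card's purchase history.

    history_cities: the city of each past purchase (a hypothetical input format).
    min_share: an assumed threshold; a city that accounts for less than this
    fraction of past purchases is treated as unusual.
    """
    if not history_cities:
        return False  # no history yet, so there is nothing to compare against
    counts = Counter(history_cities)
    share = counts[new_city] / len(history_cities)
    return share < min_share


history = ["Chicago"] * 45 + ["Milwaukee"] * 5
print(is_suspicious(history, "Chicago"))    # False
print(is_suspicious(history, "Milwaukee"))  # False (10% of past purchases)
print(is_suspicious(history, "Paris"))      # True (never seen before)
```

A flagged purchase would then be declined or marked for review, as described above; the threshold trades missed fraud against false alarms.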
Another interesting example of data mining is how one grocery store in the USA used the data it collected on its shoppers to find patterns in their shopping habits. It found that men who bought diapers on Thursdays and Saturdays also had a strong tendency to buy beer. The store could have used this valuable information to increase its profits. One thing it could have done, odd as it sounds, is move the beer display closer to the diapers. Or it could simply have made sure not to discount beer on Thursdays and Saturdays. This is data mining in action: extracting meaningful patterns from a huge data set.
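The diapers-and-beer finding is an association rule, and a common way to measure such a rule is with support, confidence and lift. Below is a minimal sketch; the baskets are invented for illustration and are not the store's data, and `rule_stats` is a hypothetical helper name.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule "antecedent -> consequent".

    transactions: a list of baskets, each a set of item names.
    """
    n = len(transactions)
    with_a = sum(1 for t in transactions if antecedent in t)
    with_c = sum(1 for t in transactions if consequent in t)
    with_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = with_both / n  # share of all baskets containing both items
    confidence = with_both / with_a if with_a else 0.0  # P(consequent | antecedent)
    # Lift > 1 means the items are bought together more often than chance predicts.
    lift = confidence / (with_c / n) if with_c else 0.0
    return support, confidence, lift


# Invented Thursday baskets, for illustration only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"milk", "diapers", "beer"},
]
support, confidence, lift = rule_stats(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
# support=0.50 confidence=0.75 lift=1.125
```

Here 75% of diaper baskets also contain beer, against a 67% baseline for beer overall, so the lift is slightly above 1. A real analysis would also filter by day of week and shopper, as in the story above.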
Example of data warehousing: Facebook
A great example of data warehousing that everyone can relate to is what Facebook does. Facebook basically gathers all of your data (your friends, your likes, who you stalk, etc.) and then stores that data in one central repository. Even though Facebook most likely stores your friends, your likes, and so on in separate databases, it wants to take the most relevant and important information and put it into one central, aggregated database. Why would it want to do this? For many reasons: it wants to make sure that you see the ads you are most likely to click on, that the friends it suggests are the most relevant to you, and so on. Keep in mind that this second part, extracting meaningful data and patterns from the aggregated data, is the data mining phase. But underlying all these motives is the main one: to make more money. After all, Facebook is a business.
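A minimal sketch of the aggregation step, assuming two hypothetical source databases (one for friendships, one for page likes) that are combined into a single per-user summary table. In-memory SQLite databases stand in for the real systems; all table and column names are invented for illustration and say nothing about how Facebook actually stores data.

```python
import sqlite3

# Two hypothetical "source" databases, each holding one kind of data.
friends_db = sqlite3.connect(":memory:")
friends_db.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
friends_db.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3), (2, 1)])

likes_db = sqlite3.connect(":memory:")
likes_db.execute("CREATE TABLE likes (user_id INTEGER, page TEXT)")
likes_db.executemany(
    "INSERT INTO likes VALUES (?, ?)", [(1, "Chess"), (2, "Chess"), (2, "Jazz")]
)

# The central repository ("warehouse") that combines both sources.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE user_summary (user_id INTEGER PRIMARY KEY, friend_count INTEGER, likes TEXT)"
)


def load_warehouse():
    """Extract from each source, combine the data per user, load it centrally."""
    friend_counts = dict(
        friends_db.execute("SELECT user_id, COUNT(*) FROM friends GROUP BY user_id")
    )
    likes = {}
    for user_id, page in likes_db.execute("SELECT user_id, page FROM likes ORDER BY page"):
        likes.setdefault(user_id, []).append(page)
    for user_id in sorted(set(friend_counts) | set(likes)):
        warehouse.execute(
            "INSERT OR REPLACE INTO user_summary VALUES (?, ?, ?)",
            (user_id, friend_counts.get(user_id, 0), ",".join(likes.get(user_id, []))),
        )
    warehouse.commit()


load_warehouse()
for row in warehouse.execute("SELECT * FROM user_summary ORDER BY user_id"):
    print(row)
# (1, 2, 'Chess')
# (2, 1, 'Chess,Jazz')
```

Once the data sits in one table, later questions (the data mining phase) need only a single query against the warehouse instead of joins across separate systems.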
We can say that data warehousing
is basically a process in which data from multiple sources/databases is
combined into one comprehensive and easily accessible database. Then this
data is readily available to any business professionals, managers, etc. who
need to use the data to create forecasts – and who basically use the data for
data mining.
Data warehousing vs. data mining
Remember that data warehousing is
a process that must occur before any data mining can take place. In other
words, data warehousing is the process of compiling and organizing data into
one common database, and data mining is the process of extracting meaningful
data from that database. The data mining process relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.
In the Facebook example that we gave, the data mining will typically be done by business users who are not engineers, but who will most likely get help from engineers when they are trying to manipulate their data. The data warehousing phase is an engineering phase in which business users are generally not involved. This gives us another way of defining the two terms: data mining is typically done by business users with the assistance of engineers, and data warehousing is typically a process done exclusively by engineers.
SQL Injection
A SQL injection attack is exactly what the name suggests: a hacker tries to “inject” his own harmful SQL code into the SQL that someone else’s application sends to its database, and so forces that database to run his SQL. This could potentially ruin their database tables, or extract valuable or private information from them. The idea behind SQL injection is to have the application under attack run SQL that it was never supposed to run. How do hackers do this? As always, it’s best to show this with examples that will act as a tutorial on SQL injection.
SQL Injection Example
In this tutorial on SQL injection, we present a few different examples of SQL injection attacks, along with how those attacks can be prevented. SQL injection attacks typically start with a hacker typing his or her malicious code into a specific form field on a website. You have almost certainly used a website ‘form’ before: when you log into Facebook, for instance, you fill in a login form. A form field is any input on a form that asks for your information, whether it’s an email address or a password.
For our example of SQL injection, we
will use a hypothetical form which many people have probably dealt with before:
the “email me my password” form, which many websites have in case one of their
users forgets their password.
The way a typical “email me my
password” form works is this: it takes the email address as an input from the
user, and then the application does a search in the database for that email
address. If the application does not find anything in the database for that
particular email address, then it simply does not send out an email with
a new password to anyone. However, if the application does successfully
find that email address in its database, then it will send out an email to that
email address with a new password, or whatever information is required to reset
the password.
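The flow just described might be sketched like this with Python’s standard sqlite3 module. The members table, the `handle_password_reset` name and the `send_reset_email` callback are hypothetical; a real application would typically generate a reset token and hand it to its mail service instead.

```python
import sqlite3


def handle_password_reset(conn, email_input, send_reset_email):
    """Look the address up; send a reset email only if it is on file.

    send_reset_email is a hypothetical callback standing in for the mailer.
    """
    row = conn.execute(
        "SELECT email FROM members WHERE email = ?",  # ? keeps the input as a plain value
        (email_input,),
    ).fetchone()
    if row is None:
        return False  # unknown address: nothing is sent
    send_reset_email(row[0])
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (data TEXT, email TEXT)")  # hypothetical table
conn.execute("INSERT INTO members VALUES ('joe data', 'joe@ymail.com')")

outbox = []
print(handle_password_reset(conn, "joe@ymail.com", outbox.append))      # True
print(handle_password_reset(conn, "nobody@example.com", outbox.append))  # False
print(outbox)  # ['joe@ymail.com']
```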
But, since we are talking about SQL
injection, what would happen if a hacker was not trying to input a valid email
address, but instead some harmful SQL code that he wants to run on someone
else’s database to steal their information or ruin their data? Well, let’s explore that with an example, starting with how a hacker would typically go about figuring out how a system works.
Starting the SQL Injection Process
The SQL that would retrieve the
email address in the “email me my password” form would typically look something
like this:
SELECT data
FROM table
WHERE Emailinput = '$email_input';
This is, of course, a guess at what
the SQL being run by the application would look like, because a hacker would
not know this information since he does not have access to the application
code. The “$email_input” variable is used to hold whatever text the user inputs
into the email address form field.
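In application code, the guessed query amounts to something like the following Python sketch, which pastes the form input directly into the SQL text. The function name is hypothetical, and `table`, `data` and `Emailinput` are the article’s placeholders (`table` is a reserved word, so this exact string would not run against a real database); the point is only to show how the input becomes part of the statement itself.

```python
def build_lookup_sql(email_input):
    """Paste the user's input straight into the guessed query text.

    This is the unsafe pattern the attack relies on: whatever the user types
    becomes part of the SQL statement, quotes and all.
    """
    return f"SELECT data FROM table WHERE Emailinput = '{email_input}';"


print(build_lookup_sql("joe@ymail.com"))
# SELECT data FROM table WHERE Emailinput = 'joe@ymail.com';
print(build_lookup_sql("hacker@programmerinterview.com'"))
# SELECT data FROM table WHERE Emailinput = 'hacker@programmerinterview.com'';
```

The second call uses the probe input from Step 1 and reproduces the broken SQL discussed there.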
Step 1: Figure out how the application handles bad inputs
Before a hacker can really start
taking advantage of a weak or insecure application, he must figure out how the
application handles a simple bad input first. Think of this initial step as the
hacker “feeling out” his opponent before he releases the really bad SQL.
So, with that in mind, the first
step a hacker would typically take is inputting an email address with a quote
appended to the end into the email form field. We will of course explain why
further down below. But for now, the input from the hacker would look something
like this – pay special attention to the fact that there is a quote appended to
the end of the email address:
hacker@programmerinterview.com'
If the hacker puts that exact text
into the email address form field then there are basically 2 possibilities:
- 1. The application will first “sanitize” the input by removing the extra quote at the end, since a quote cannot appear in the domain part of an email address. Sanitizing data is the act of stripping out any characters that aren’t needed from the supplied data (in our case, the email address). The application may then use the sanitized input in the database query and search for that particular email address in the database (without the quote, of course).
- 2. The application will not sanitize the input first, and will take the input from the hacker and immediately run it as part of the SQL. This is what the hacker is hoping for, and we will assume that this is what our hypothetical application does. This is also known as constructing the SQL literally, without sanitizing. It means that the SQL being run by the application would look like this; pay extra attention to the extra quote at the end of the WHERE clause in the SQL below:
SELECT data
FROM table
WHERE Emailinput = 'hacker@programmerinterview.com'';
Now, what would happen if the SQL above is executed by the application? Well, SQL reads two quotes in a row inside a string as one escaped quote character, so the parser would find that the string is never closed, and it would abort with a syntax error.
The error response is key, and tells the hacker a lot
But what will the hacker see on the actual form page when he tries to input this email address with a quote at the end? It really depends on how the application is set up to handle database errors, but the key here is that the hacker will most likely not receive a message like “This email address is unknown. Please register to create an account”, which is what he would see if the application were actually sanitizing the input. It’s more likely that he would see something like “Internal error” or “Database error”, and that tells the hacker a lot, because it tells him whether or not the application is sanitizing its input. And if the application is not sanitizing its input, then the database can most probably be exploited, destroyed, and/or manipulated in ways that could be very bad for the application owner.
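A sketch of how that difference might surface on the page, assuming an application that pastes the input into its query and shows a generic “Database error” whenever the query fails. The function name, the messages and the members table are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def reset_page_message(conn, email_input):
    """Message the form page might show, assuming the app pastes the input
    into the SQL and displays a generic message whenever the database errors."""
    sql = f"SELECT data FROM members WHERE Emailinput = '{email_input}';"
    try:
        row = conn.execute(sql).fetchone()
    except sqlite3.Error:
        return "Database error"  # the tell-tale sign the hacker is looking for
    if row is None:
        return "This email address is unknown. Please register to create an account"
    return "A password reset email has been sent"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (data TEXT, Emailinput TEXT)")  # hypothetical table

print(reset_page_message(conn, "hacker@programmerinterview.com"))
# This email address is unknown. Please register to create an account
print(reset_page_message(conn, "hacker@programmerinterview.com'"))
# Database error
```

The two inputs differ by a single character, yet they produce different kinds of response, and that difference is exactly what reveals the unsanitized query.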
Step 2: Run the actual SQL injection attack
Now, let’s say that the hacker now
knows that the database is vulnerable, and that he can attack further to get
some really good information. What could our hacker do? Well, if he’s been able
to successfully figure out the layout of the table, he could just type this
harmful code on the form field (where the email address would normally go):
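Y'; UPDATE table SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com
Note how this input is put together. The quote right after the Y closes the string literal that the application’s query opened, and the semicolon ends the SELECT statement, which lets the hacker start a second statement of his own (provided the application’s database driver accepts more than one statement per call). Notice also that there is no closing quote after joe@ymail.com: the quote and semicolon that the application itself adds after the input finish off the hacker’s UPDATE statement for him.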
Then, if this malicious code is run
by the application under attack, it would look like this:
SELECT data
FROM table
WHERE Emailinput = 'Y';
UPDATE table
SET email = 'hacker@ymail.com'
WHERE email = 'joe@ymail.com';
Can you see what this code is doing?
Well, it is resetting the email address that belongs to “joe@ymail.com” to
“hacker@ymail.com”. This means that the hacker is now changing a user’s account
so that it uses his own email address – hacker@ymail.com. This then means that
the hacker can reset the password – and have it sent to his own email address!
Now, he also has a login and a password to the application, but it is under
someone else’s account.
In the example above, we did skip
some steps that a hacker would have taken to figure out the table name and the
table layout, because we wanted to keep this article relatively short. But, the
idea is that SQL injection is a real threat, and taking measures to prevent it
is extremely important.
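A minimal sketch of both the attack and the standard defense, parameterized queries, using Python’s built-in sqlite3 module as a stand-in for the unknown application and database. The members table and the function names are hypothetical, and a single email column is used for both the lookup and the UPDATE to keep the demo small. The vulnerable version calls `executescript()`, which runs every statement in a string; sqlite3’s ordinary `execute()` refuses more than one statement at a time, which is the kind of driver behavior this particular attack depends on.

```python
import sqlite3


def make_db():
    # Hypothetical table standing in for the article's placeholder "table".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (data TEXT, email TEXT)")
    conn.execute("INSERT INTO members VALUES ('joe data', 'joe@ymail.com')")
    conn.commit()
    return conn


def vulnerable_lookup(conn, email_input):
    # Unsafe: the input is pasted into the SQL text, and executescript() runs
    # every statement in that text, so an injected UPDATE runs too.
    conn.executescript(f"SELECT data FROM members WHERE email = '{email_input}';")


def safe_lookup(conn, email_input):
    # Safer: the ? placeholder sends the input separately from the SQL, so the
    # database only ever compares it as a value and never runs it as SQL.
    return conn.execute(
        "SELECT data FROM members WHERE email = ?", (email_input,)
    ).fetchone()


attack = "Y'; UPDATE members SET email = 'hacker@ymail.com' WHERE email = 'joe@ymail.com"

db = make_db()
vulnerable_lookup(db, attack)
print(db.execute("SELECT email FROM members").fetchall())  # [('hacker@ymail.com',)]

db = make_db()
print(safe_lookup(db, attack))  # None: no member has that odd "email"
print(db.execute("SELECT email FROM members").fetchall())  # [('joe@ymail.com',)]
```

With the placeholder, the whole attack string is just an unusual email value that matches no row, so nothing is updated. Unlike stripping characters, this does not depend on guessing in advance which characters are dangerous.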