What’s the difference between data mining and data warehousing?
Data mining is the process of
  finding patterns in a given data set. These patterns can often provide
  meaningful insight to whoever is interested in that data. Data
  mining is used today in a wide variety of contexts: in fraud detection, as
  an aid in marketing campaigns, and in supermarkets that study their
  consumers.  
Data warehousing is
  the process of centralizing or aggregating data from multiple
  sources into one common repository.  
Example of data mining
If you’ve ever used a credit card,
  then you may know that credit card companies will alert you when they think
  that your credit card is being fraudulently used by someone other than you.
  This is a perfect example of data mining – credit card companies have a
  history of your purchases from the past and know geographically where those
  purchases have been made. If all of a sudden some purchases are made in a
  city far from where you live, the credit card companies are put on alert to a
  possible fraud since their data mining shows that you don’t normally make
  purchases in that city. Then, the credit card company can disable your card
  for that transaction or just put a flag on your card for suspicious activity.
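
As a rough illustration of the idea, the sketch below flags a purchase made in a city that rarely or never appears in the cardholder's history. The purchase data, the function name `is_suspicious`, and the rule itself are assumptions made up for this example; a real fraud system would use far richer signals.

```python
from collections import Counter

# Hypothetical purchase history for one cardholder: (city, amount) pairs.
history = [
    ("Austin", 42.10),
    ("Austin", 13.75),
    ("Dallas", 88.00),
    ("Austin", 5.20),
]

def is_suspicious(city, history, min_visits=1):
    """Flag a purchase if the cardholder has fewer than min_visits
    past purchases in that city."""
    visits = Counter(c for c, _ in history)
    return visits[city] < min_visits

print(is_suspicious("Austin", history))  # a familiar city
print(is_suspicious("Moscow", history))  # a city never seen before
```

With this toy history, only the second purchase is flagged, because the cardholder has never bought anything in that city.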
   
Another interesting example of
  data mining is how one grocery store in the USA used the data it collected on
  its shoppers to find patterns in their shopping habits. The store found that when
  men bought diapers on Thursdays and Saturdays, they also had a strong
  tendency to buy beer. The grocery store could have used this valuable
  information to increase its profits. One thing it could have done, odd
  as it sounds, is move the beer display closer to the diapers. Or it could
  have simply made sure not to give any discounts on beer on Thursdays and
  Saturdays. This is data mining in action: extracting meaningful data from a
  huge data set.  
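
A pattern like "diapers go with beer" can be measured with a simple ratio: of all the baskets that contain diapers, what fraction also contain beer? The sketch below computes that ratio (often called the confidence of the rule). The baskets and the function name `confidence` are invented for illustration and are not the store's actual data or method.

```python
# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = [b for b in with_antecedent if consequent in b]
    return len(both) / len(with_antecedent)

# Two of the three diaper baskets above also include beer.
print(confidence(baskets, "diapers", "beer"))
```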
Example of data warehousing – Facebook
A great example of data
  warehousing that everyone can relate to is what Facebook does. Facebook
  gathers all of your data (your friends, your likes, who you stalk,
  etc.) and then stores that data in one central repository. Even though
  Facebook most likely stores your friends, your likes, etc. in separate
  databases, it wants to take the most relevant and important information
  and put it into one central aggregated database. Why would it want to do
  this? For many reasons: to make sure that you see the
  ads that you’re most likely to click on, to make sure that
  the friends it suggests are the most relevant to you, and so on. Keep in
  mind that this second step is the data mining phase, in which meaningful data and
  patterns are extracted from the aggregated data. But underlying all these
  motives is the main motive: to make more money. After all, Facebook is a
  business.  
We can say that data warehousing
  is a process in which data from multiple sources/databases is
  combined into one comprehensive and easily accessible database. This
  data is then readily available to any business professionals, managers, etc. who
  need to use the data to create forecasts, and who in effect use the data for
  data mining.  
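
To make "combining multiple sources into one database" concrete, here is a minimal sketch that loads two separate sources into a single table and then runs one query across both. The source lists, the table name `user_facts`, and its columns are assumptions for this example only; they say nothing about how Facebook actually stores its data.

```python
import sqlite3

# Hypothetical source systems, each holding one kind of data about users.
friends_source = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
likes_source = [("alice", "hiking"), ("bob", "cooking"), ("alice", "jazz")]

# The "warehouse": a single database that aggregates both sources.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE user_facts (user_name TEXT, kind TEXT, detail TEXT)"
)
warehouse.executemany(
    "INSERT INTO user_facts VALUES (?, 'friend', ?)", friends_source
)
warehouse.executemany(
    "INSERT INTO user_facts VALUES (?, 'like', ?)", likes_source
)
warehouse.commit()

def facts_per_user(conn):
    """Return {user_name: number of facts} from the combined table."""
    rows = conn.execute(
        "SELECT user_name, COUNT(*) FROM user_facts "
        "GROUP BY user_name ORDER BY user_name"
    )
    return dict(rows.fetchall())

print(facts_per_user(warehouse))
```

Once the data sits in one place, a question that spans both sources (here, how much is known about each user) becomes a single query. That is the hand-off point from warehousing to mining.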
Data warehousing vs. data mining
Remember that data warehousing is
  a process that must occur before any data mining can take place. In other
  words, data warehousing is the process of compiling and organizing data into
  one common database, and data mining is the process of extracting meaningful
  data from that database. The data mining process relies on the data compiled
  in the data warehousing phase in order to detect meaningful patterns.  
In the Facebook example that we
  gave, the data mining will typically be done by business users who are not
  engineers, but who will most likely receive assistance from engineers when
  they are trying to manipulate their data. The data warehousing phase is a
  strictly engineering phase, where no business users are involved. And this
  gives us another way of defining the two terms: data mining is typically done
  by business users with the assistance of engineers, and data warehousing is
  typically a process done exclusively by engineers.
SQL Injection 
A SQL injection attack is exactly
what the name suggests – it is where a hacker tries to “inject” his
harmful/malicious SQL code into someone else’s database, and force that
database to run his SQL. This could potentially ruin their database tables, and
even extract valuable or private information from their database tables. The
idea behind SQL injection is to have the application under attack run SQL that
it was never supposed to run. How do hackers do this? As always, it’s best to
show this with examples that will act as a tutorial on SQL injection. 
SQL Injection Example
In this tutorial on SQL injection,
we present a few different examples of SQL injection attacks, along with how
those attacks can be prevented. SQL injection attacks typically start with a
hacker entering his or her harmful/malicious code in a specific form field on
a website. A website ‘form’, if you don’t already know, is something you have
definitely used: when you log into Facebook, you are using a form to
log in. A form input field is any field on a form that asks for your
information, whether it’s an email address or a password.
For our example of SQL injection, we
will use a hypothetical form which many people have probably dealt with before:
the “email me my password” form, which many websites have in case one of their
users forgets their password. 
The way a typical “email me my
password” form works is this: it takes the email address as an input from the
user, and then the application does a search in the database for that email
address. If the application does not find anything in the database for that
particular email address, then it simply does not send out an email with
a new password to anyone. However, if the application does successfully
find that email address in its database, then it will send out an email to that
email address with a new password, or whatever information is required to reset
the password. 
But, since we are talking about SQL
injection, what would happen if a hacker was not trying to input a valid email
address, but instead some harmful SQL code that he wants to run on someone
else’s database to steal their information or ruin their data? Well, let’s
explore that with an example, starting from how a hacker would typically get
started in order to figure out how a system works. 
Starting the SQL Injection Process
The SQL that would retrieve the
email address in the “email me my password” form would typically look something
like this: 
SELECT data
  FROM table
 WHERE Emailinput = '$email_input';
This is, of course, a guess at what
the SQL being run by the application would look like, because a hacker would
not know this information since he does not have access to the application
code. The “$email_input” variable is used to hold whatever text the user inputs
into the email address form field. 
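
To see how such a query might be assembled, here is a minimal sketch of application code that pastes the form input straight into the SQL text. The application's real language is unknown, so Python and the function name `build_query` are assumptions used only for illustration.

```python
def build_query(email_input):
    # Vulnerable on purpose: the user's text is pasted straight into the SQL.
    return "SELECT data FROM table WHERE Emailinput = '" + email_input + "';"

# A well-behaved user produces a well-formed query.
print(build_query("joe@ymail.com"))
```

Nothing in this function separates the SQL written by the developer from the text typed by the user, and that is the weakness the rest of the example relies on.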
Step 1: Figure out how the application handles bad inputs
Before a hacker can really start
taking advantage of a weak or insecure application, he must figure out how the
application handles a simple bad input first. Think of this initial step as the
hacker “feeling out” his opponent before he releases the really bad SQL. 
So, with that in mind, the first
step a hacker would typically take is inputting an email address with a quote
appended to the end into the email form field. We will of course explain why
further down below. But for now, the input from the hacker would look something
like this – pay special attention to the fact that there is a quote appended to
the end of the email address: 
hacker@programmerinterview.com'
If the hacker puts that exact text
into the email address form field then there are basically 2 possibilities: 
- 1. The application will first “sanitize” the input by removing the extra quote at the end, since email addresses cannot have quotes. Sanitizing data is the act of stripping out any characters that aren’t needed from the data that is supplied (in our case, the email address). Then, the application may run the sanitized input in the database query, and search for that particular email address in the database (without the quote, of course).
- 2. The application will not sanitize the input first, and will take the input from the hacker and immediately run it as part of the SQL. This is what the hacker is hoping will happen, and we will assume that this is what our hypothetical application is doing. This is also known as constructing the SQL literally, without sanitizing. It means that the SQL being run by the application would look like this. Pay extra attention to the fact that there is now an extra quote at the end of the WHERE clause in the SQL below:
SELECT data
  FROM table
 WHERE Emailinput = 'hacker@programmerinterview.com'';
Now, what would happen if the SQL
above is executed by the application? Well, the SQL parser would see that there
is an extra quote mark at the end, and it will abort with a syntax error. 
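
The effect of the stray quote can be reproduced with a small in-memory database. This is only a sketch: the `members` table, the `lookup_unsafe` function, and the choice of SQLite are assumptions for the demonstration, and other databases word their error messages differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT, data TEXT)")

def lookup_unsafe(conn, email_input):
    # Vulnerable on purpose: the input is concatenated into the SQL text.
    sql = "SELECT data FROM members WHERE email = '" + email_input + "';"
    return conn.execute(sql).fetchall()

try:
    lookup_unsafe(conn, "hacker@programmerinterview.com'")
    outcome = "query ran"
except sqlite3.Error as exc:
    # The unbalanced quote makes the statement impossible to parse.
    outcome = "database error"
    print("Database error:", exc)

print(outcome)
```

The query never runs. The database rejects the statement while parsing it, and it is up to the application to decide how much of that failure the visitor gets to see.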
The error response is key, and tells the hacker a lot
But, what will the hacker see on the
actual form page when he tries to input this email address with a quote at the
end? Well, it really depends on how the application is set up to handle errors
in the database, but the key here is that the hacker will most likely not
receive an error saying something like “This email address is unknown. Please
register to create an account” – which is what the hacker would see if the application
is actually sanitizing the input. It’s more likely that the hacker would see
something like “Internal error” or “Database error”, and that tells the
hacker a lot, because it suggests that the application is not
sanitizing its input. And if the application is not sanitizing its input,
then the database can most probably be exploited, destroyed,
and/or manipulated in some way that could be very bad for the application
owner. 
Step 2: Run the actual SQL injection attack
Let’s say that the hacker now
knows that the database is vulnerable, and that he can attack further to get
some really good information. What could our hacker do? Well, if he’s been able
to successfully figure out the layout of the table, he could just type this
harmful code on the form field (where the email address would normally go): 
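     Y';
     UPDATE table
        SET email = 'hacker@ymail.com'
      WHERE email = 'joe@ymail.com
Note the SQL syntax at the start of the input: the quote followed by a
semicolon closes the application’s own statement, which lets the hacker
run another statement of his own. The input deliberately leaves off the final
closing quote and semicolon, because the application’s query already
supplies them after the input.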
Then, if this malicious code is run
by the application under attack, it would look like this: 
SELECT data
  FROM table
 WHERE Emailinput = 'Y';
UPDATE table
   SET email = 'hacker@ymail.com'
 WHERE email = 'joe@ymail.com';
Can you see what this code is doing?
Well, it is resetting the email address that belongs to “joe@ymail.com” to
“hacker@ymail.com”. This means that the hacker is now changing a user’s account
so that it uses his own email address – hacker@ymail.com. This then means that
the hacker can reset the password – and have it sent to his own email address!
Now, he also has a login and a password to the application, but it is under
someone else’s account. 
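
The attack can be replayed end to end against a throwaway in-memory database. This is a sketch under stated assumptions: the `members` table and `lookup_unsafe` function are hypothetical, and the vulnerable code uses `executescript`, which accepts several statements at once. Many database drivers run only one statement per call, in which case this particular two-statement attack would fail. The input in this sketch leaves off the final closing quote, because the application's own query text supplies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT, data TEXT)")
conn.execute("INSERT INTO members VALUES ('joe@ymail.com', 'joe-profile')")
conn.commit()

def lookup_unsafe(conn, email_input):
    # Vulnerable on purpose: concatenated SQL, run through an API that
    # accepts several statements at once.
    sql = "SELECT data FROM members WHERE email = '" + email_input + "';"
    conn.executescript(sql)

injected = (
    "Y'; UPDATE members SET email = 'hacker@ymail.com' "
    "WHERE email = 'joe@ymail.com"
)
lookup_unsafe(conn, injected)

# Joe's row now carries the hacker's email address.
emails = [row[0] for row in conn.execute("SELECT email FROM members")]
print(emails)
```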
In the example above, we did skip
some steps that a hacker would have taken to figure out the table name and the
table layout, because we wanted to keep this article relatively short. But, the
idea is that SQL injection is a real threat, and taking measures to prevent it
is extremely important. 
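
One widely used preventive measure is a parameterized query, where the user's input is sent to the database as a value and never becomes part of the SQL text. The sketch below reuses the same hypothetical `members` table. The function name `lookup_safe` is an assumption, and the `?` placeholder is the syntax of Python's sqlite3 module; other drivers use different placeholder styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT, data TEXT)")
conn.execute("INSERT INTO members VALUES ('joe@ymail.com', 'joe-profile')")
conn.commit()

def lookup_safe(conn, email_input):
    # The ? placeholder passes the input as a value, never as SQL text.
    return conn.execute(
        "SELECT data FROM members WHERE email = ?", (email_input,)
    ).fetchall()

injected = (
    "Y'; UPDATE members SET email = 'hacker@ymail.com' "
    "WHERE email = 'joe@ymail.com"
)

print(lookup_safe(conn, "joe@ymail.com"))  # the legitimate lookup still works
print(lookup_safe(conn, injected))         # treated as an odd-looking email address
print(conn.execute("SELECT email FROM members").fetchall())  # table is unchanged
```

Here the injected text is simply compared against the email column as one long string. It matches nothing, and the UPDATE hidden inside it is never executed.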