Oracle DBA: data mining and data warehousing

What’s the difference between data mining and data warehousing?

Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful and insightful data to whoever is interested in that data. Data mining is used today in a wide variety of contexts – in fraud detection, as an aid in marketing campaigns, and even supermarkets use it to study their consumers.

Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository.

Example of data mining

If you’ve ever used a credit card, then you may know that credit card companies will alert you when they think that your credit card is being fraudulently used by someone other than you. This is a perfect example of data mining – credit card companies have a history of your purchases from the past and know geographically where those purchases have been made. If all of a sudden some purchases are made in a city far from where you live, the credit card companies are put on alert to a possible fraud since their data mining shows that you don’t normally make purchases in that city. Then, the credit card company can disable your card for that transaction or just put a flag on your card for suspicious activity.

Another interesting example of data mining is how one grocery store in the USA used the data it collected on it’s shoppers to find patterns in their shopping habits. They found that when men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer. The grocery store could have used this valuable information to increase their profits. One thing they could have done – odd as it sounds – is move the beer display closer to the diapers. Or, they could have simply made sure not to give any discounts on beer on Thursdays and Saturdays. This is data mining in action – extracting meaningful data from a huge data set.

Example of data warehousing – Facebook

A great example of data warehousing that everyone can relate to is what Facebook does. Facebook basically gathers all of your data – your friends, your likes, who you stalk, etc – and then stores that data into one central repository. Even though Facebook most likely stores your friends, your likes, etc, in separate databases, they do want to take the most relevant and important information and put it into one central aggregated database. Why would they want to do this? For many reasons – they want to make sure that you see the most relevant ads that you’re most likely to click on, they want to make sure that the friends that they suggest are the most relevant to you, etc – keep in mind that this is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. But, underlying all these motives is the main motive: to make more money – after all, Facebook is a business.

We can say that data warehousing is basically a process in which data from multiple sources/databases is combined into one comprehensive and easily accessible database. Then this data is readily available to any business professionals, managers, etc. who need to use the data to create forecasts – and who basically use the data for data mining.

Datawarehousing vs Datamining

Remember that data warehousing is a process that must occur before any data mining can take place. In other words, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful data from that database. The data mining process relies on the data compiled in the datawarehousing phase in order to detect meaningful patterns.

In the Facebook example that we gave, the data mining will typically be done by business users who are not engineers, but who will most likely receive assistance from engineers when they are trying to manipulate their data. The data warehousing phase is a strictly engineering phase, where no business users are involved. And this gives us another way of defining the 2 terms: data mining is typically done by business users with the assistance of engineers, and data warehousing is typically a process done exclusively by engineers.

SQL Injection

A SQL injection attack is exactly what the name suggests – it is where a hacker tries to “inject” his harmful/malicious SQL code into someone else’s database, and force that database to run his SQL. This could potentially ruin their database tables, and even extract valuable or private information from their database tables. The idea behind SQL injection is to have the application under attack run SQL that it was never supposed to run. How do hackers do this? As always, it’s best to show this with examples that will act as a tutorial on SQL injection.

SQL Injection Example

In this tutorial on SQL injection, we present a few different examples of SQL injection attacks, along with how those attacks can be prevented. SQL injection attacks typically start with a hacker inputting his or her harmful/malicious code in a specific form field on a website. A website ‘form’, if you don’t already know, is something you have definitely used – like when you log into Facebook you are using a form to login, and a form input field can be any field on a form that asks for your information – whether it’s an email address or a password, these are all form fields.

For our example of SQL injection, we will use a hypothetical form which many people have probably dealt with before: the “email me my password” form, which many websites have in case one of their users forgets their password.

The way a typical “email me my password” form works is this: it takes the email address as an input from the user, and then the application does a search in the database for that email address. If the application does not find anything in the database for that particular email address, then it simply does not send out an email with a new password to anyone. However, if the application does successfully find that email address in its database, then it will send out an email to that email address with a new password, or whatever information is required to reset the password.

But, since we are talking about SQL injection, what would happen if a hacker was not trying to input a valid email address, but instead some harmful SQL code that he wants to run on someone else’s database to steal their information or ruin their data? Well, let’s explore that with an example, starting from how a hacker would typically get started in order to figure out a system works.

Starting the SQL Injection Process

The SQL that would retrieve the email address in the “email me my password” form would typically look something like this:

SELECT data

FROM table

WHERE Emailinput = '$email_input';

This is, of course, a guess at what the SQL being run by the application would look like, because a hacker would not know this information since he does not have access to the application code. The “$email_input” variable is used to hold whatever text the user inputs into the email address form field.

Step 1: Figure out how the application handles bad inputs

Before a hacker can really start taking advantage of a weak or insecure application, he must figure out how the application handles a simple bad input first. Think of this initial step as the hacker “feeling out” his opponent before he releases the really bad SQL.

So, with that in mind, the first step a hacker would typically take is inputting an email address with a quote appended to the end into the email form field. We will of course explain why further down below. But for now, the input from the hacker would look something like this – pay special attention to the fact that there is a quote appended to the end of the email address:

hacker@programmerinterview.com'

If the hacker puts that exact text into the email address form field then there are basically 2 possibilities:

1. The application will first “sanitize” the input by removing the extra quote at the end, since email addresses can not have quotes. Sanitizing data is the act of stripping out any characters that aren’t needed from the data that is supplied – in our case, the email address. Then, the application may run the sanitized input in the database query, and search for that particular email address in the database (without the quote of course).

2. The application will not sanitize the input first, and will take the input from the hacker and immediately run it as part of the SQL. This is what the hacker is hoping would happen, and we will assume that this is what our hypothetical application is doing. This is also known as constructing the SQL literally, without sanitizing. What it means is that the SQL being run by the application would look like this – pay extra attention to the fact that there is now an extra quote at the end of the WHERE statement in the SQL below:

SELECT data

FROM table

WHERE Emailinput = 'hacker@programmerinterview.com'';

Now, what would happen if the SQL above is executed by the application? Well, the SQL parser would see that there is an extra quote mark at the end, and it will abort with a syntax error.

The error response is key, and tells the hacker a lot

But, what will the hacker see on the actual form page when he tries to input this email address with a quote at the end? Well, it really depends on how the application is set up to handle errors in the database, but the key here is that the hacker will most likely not receive an error saying something like “This email address is unknown. Please register to create an account” – which is what the hacker would see if the application is actually sanitizing the input. It’s more likely that the hacker would see something like “Internal error” or “Database error” – and that tells the hacker a lot – because it tells him whether or not the application is sanitizing its input. And if the application is not sanitizing it’s input then it means that the database can most probably be exploited, destroyed, and/or manipulated in some way that could be very bad for the application owner.

Step 2: Run the actual SQL injection attack

Now, let’s say that the hacker now knows that the database is vulnerable, and that he can attack further to get some really good information. What could our hacker do? Well, if he’s been able to successfully figure out the layout of the table, he could just type this harmful code on the form field (where the email address would normally go):

Y';

UPDATE table

SET email = 'hacker@ymail.com'

WHERE email = 'joe@ymail.com';

Note the use of the SQL compliant code – the extra quote followed by a semicolon, which allows the hacker to close the statement and then incredibly run another statement of his own!

Then, if this malicious code is run by the application under attack, it would look like this:

SELECT data

FROM table

WHERE Emailinput = 'Y';

UPDATE table

SET email = 'hacker@ymail.com'

WHERE email = 'joe@ymail.com';

Can you see what this code is doing? Well, it is resetting the email address that belongs to “joe@ymail.com” to “hacker@ymail.com”. This means that the hacker is now changing a user’s account so that it uses his own email address – hacker@ymail.com. This then means that the hacker can reset the password – and have it sent to his own email address! Now, he also has a login and a password to the application, but it is under someone else’s account.

In the example above, we did skip some steps that a hacker would have taken to figure out the table name and the table layout, because we wanted to keep this article relatively short. But, the idea is that SQL injection is a real threat, and taking measures to prevent it is extremely important.

Oracle DBA

Monday, July 15, 2013

data mining and data warehousing

No comments:

Post a Comment

Hi. I am Sandeep Pannu DBA