Data Warehousing - Question 1

The main idea of the mini-batch K-means algorithm is to work on small random batches of data of a fixed size, so that each batch can be held in memory. Each iteration draws a new random sample from the data set and uses it to update the clusters. This is repeated until the algorithm converges. Each mini-batch updates the clusters using a convex combination of the prototype (centroid) values and the data, applying a learning rate that decreases with the number of iterations.
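The update described above can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not in the answer itself: the function name `mini_batch_kmeans`, the per-center learning rate of 1/count, and the fixed iteration budget (used here in place of a convergence test) are all choices made for the example.

```python
import random


def mini_batch_kmeans(points, k, batch_size=10, n_iters=100, seed=0):
    """Illustrative mini-batch k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: initialise the centers from k distinct random data points.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far

    for _ in range(n_iters):  # fixed budget instead of a convergence check
        batch = rng.sample(points, min(batch_size, len(points)))
        for p in batch:
            # Index of the nearest center by squared Euclidean distance.
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            counts[j] += 1
            lr = 1.0 / counts[j]  # learning rate shrinks as updates accumulate
            # Convex combination of the old prototype and the new data point.
            centers[j] = [(1 - lr) * c + lr * a for c, a in zip(centers[j], p)]
    return centers


# Made-up data: two tight groups, one near (0, 0) and one near (10, 10).
data = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
data += [(10 + i * 0.1, 10 + j * 0.1) for i in range(5) for j in range(5)]
print(sorted(mini_batch_kmeans(data, k=2)))
```

Only `batch_size` points are touched per iteration, which is what keeps the memory footprint small.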

Data Warehousing - Question 2

Overfitting is a term from statistics for the modeling error that occurs when a function is fitted too closely to a particular data set. Validation metrics typically improve up to a point and then stagnate or degrade once the model starts to overfit. During the upward trend the model is still moving toward a good fit; once that fit is achieved, the trend begins to stagnate or decline.

The easiest way to avoid overfitting is to ensure that the number of independent parameters in the fit is much smaller than the number of data points you have. The number of independent parameters is not the number of independent variables; it is the number of polynomial coefficients, or the number of weights and biases in a neural network. My rule of thumb is to choose the form of the approximation so that the number of data points is 5 to 10 times the number of parameters. If you cannot afford that luxury, use no fewer than twice as many data points as parameters. A simple example: with 10 data points for a single-variable function y = f(x), a ninth-degree polynomial fits every point exactly, a classic example of overfitting. Following my rule of thumb, I would try fitting a quadratic or quartic curve instead.
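The 10-point example can be checked with a short sketch. The helper names `max_parameters` and `lagrange_eval` and the noisy sample values are assumptions made for illustration; Lagrange interpolation is used here simply as one convenient way to build the polynomial that passes through all 10 points.

```python
def max_parameters(n_points, ratio=5):
    """Largest parameter count allowed by a 'ratio data points per parameter' rule."""
    return n_points // ratio


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Made-up data: a straight-line trend y = 2x plus small fixed "noise" values.
xs = [float(x) for x in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.25, -0.1, 0.35, -0.3, 0.15, -0.25]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

# The ninth-degree polynomial reproduces every training point, noise included.
max_residual = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))
print("max training residual:", max_residual)

# The rule of thumb allows far fewer parameters than the 10 used above.
print("parameters at 5x:", max_parameters(10, ratio=5))
print("parameters at 2x:", max_parameters(10, ratio=2))
```

A zero training residual here says nothing about how the curve behaves between the points, which is exactly the problem overfitting describes.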

Data Warehousing - Question 3

Density-based spatial clustering of applications with noise (DBSCAN) is a well-known data clustering algorithm commonly used in data mining and machine learning. The DBSCAN algorithm requires two parameters.

eps: Specifies how close the points must be to each other to be considered part of the cluster. This means that if the distance between the two points is less than or equal to this value (eps), these points are considered neighbors.

minPoints: The minimum number of points to form a dense area. For example, if you set the minPoints parameter to 5, you must have at least 5 points to form a dense region.
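The role of the two parameters can be illustrated with a minimal sketch. It covers only the neighborhood test and the dense-region ("core point") check, not the full clustering procedure. The function names and sample points are assumptions, as is the convention that a point counts toward its own neighborhood.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], the point itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def core_points(points, eps, min_points):
    """Indices of points whose eps-neighborhood holds at least min_points points."""
    return [
        i
        for i in range(len(points))
        if len(region_query(points, i, eps)) >= min_points
    ]


# Made-up data: four corners of a unit square, its center, and one far-away point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]

print(region_query(sample, 0, eps=1.0))         # neighbors of (0, 0)
print(core_points(sample, eps=1.0, min_points=5))
```

With eps = 1.0 and minPoints = 5, only the center of the square has enough neighbors to form a dense region; the far-away point has no neighbors besides itself.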

Data Warehousing - Question 4

Supervised learning:

Supervised learning is the data mining task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (usually a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new examples. In the ideal case, the algorithm correctly determines the class labels of unseen instances. This requires the algorithm to generalize from the training data to unseen situations in a "reasonable" way.

Unsupervised Learning:

In data mining, the problem of unsupervised learning is that of trying to find hidden structure in unlabeled data. Because the examples given to the learner are unlabeled, there is no error or reward signal with which to evaluate a potential solution.

Data Warehousing - Question 5

Here are four features of a data warehouse:

  1. Subject-oriented. The data warehouse is organized around subjects rather than the organization's ongoing operations. These subjects include products, customers, suppliers, sales, and revenue. A subject-oriented warehouse lets you answer questions such as "Who was the best customer of this item last year?" or "Who do you think will be our best customer next year?" Defining the warehouse by subject matter, sales in this case, is what makes it subject-oriented.
  2. Integrated. The data warehouse must hold data from different sources in a consistent format. Issues such as naming conflicts and discrepancies between units of measure need to be resolved.
  3. Non-volatile. Non-volatile means that data is not changed after it has been entered into the data warehouse.
  4. Available. The update-driven approach consolidates information from multiple heterogeneous sources in advance and stores it in the warehouse, where it is available for direct querying and analysis. More information can still be obtained online, and the data is typically available to all departments.

Data Warehousing - Question 6

Snowflake schema will be the most appropriate schema to draw the conceptual model.

Data Warehousing - Question 7

To classify a Red Domestic SUV, we calculate the probabilities

P(Red|Yes), P(SUV|Yes), P(Domestic|Yes),

P(Red|No), P(SUV|No) and P(Domestic|No)

and multiply them by P(Yes) and P(No) respectively (an AND operation).

For P(Red|Yes), there are 5 cases where d_j = Yes (the car was stolen, whatever its type), and in 3 of those cases k_i = Red.

Therefore, for P(Red|Yes), n (the number of 'Yes' cases) = 5 and n_c (the number of red cars that were stolen) = 3.

Note: there is no Red Domestic SUV in our data set. All attributes are binary (two possible values), so the prior estimate is p = 1/2 = 0.5 in all cases.

m is an arbitrary constant. Assume the equivalent sample size m = 3.

So for P(Red|Yes), n = 5, n_c = 3, p = 0.5 and m = 3. The probabilities for all attribute values are listed below.

Using the m-estimate formula

P(k_i|d_j) = (n_c + m*p) / (n + m)

P(Red|Yes) = (3 + 3*0.5) / (5 + 3) = 0.5625

P(Red|No) = (2 + 3*0.5) / (5 + 3) = 0.4375

P(SUV|Yes) = (1 + 3*0.5) / (5 + 3) = 0.3125

P(SUV|No) = (3 + 3*0.5) / (5 + 3) = 0.5625

P(Domestic|Yes) = (2 + 3*0.5) / (5 + 3) = 0.4375

P(Domestic|No) = (3 + 3*0.5) / (5 + 3) = 0.5625

We know P(Yes) = 0.5 and P(No) = 0.5.

Applying these in the equation

d_NB = argmax_{d_j ∈ V} P(d_j) ∏_i P(k_i|d_j)

for d = Yes, we have P(Yes)*P(Red|Yes)*P(SUV|Yes)*P(Domestic|Yes) = 0.5*0.5625*0.3125*0.4375 = 0.0385

and for d = No

P(No)*P(Red|No)*P(SUV|No)*P(Domestic|No) = 0.5*0.4375*0.5625*0.5625 = 0.0692

Since 0.0692 > 0.0385, this example is classified as 'No'.
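The calculation above can be reproduced with a short sketch. The counts, p, m and the class priors are the ones quoted in the answer; the function and variable names are assumptions made for illustration. The code keeps intermediate values unrounded, so its scores can differ in the third decimal place from figures rounded by hand.

```python
def m_estimate(n_c, n, p, m):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)


# Counts quoted in the answer: cases per class in which the attribute value occurs.
COUNTS = {
    "Yes": {"Red": 3, "SUV": 1, "Domestic": 2},
    "No": {"Red": 2, "SUV": 3, "Domestic": 3},
}
N_PER_CLASS = 5   # n: number of cases in each class
P_PRIOR = 0.5     # p: prior estimate for a binary attribute
M = 3             # m: equivalent sample size
PRIORS = {"Yes": 0.5, "No": 0.5}


def score(label, features):
    """Class prior multiplied by the m-estimated likelihood of each feature."""
    result = PRIORS[label]
    for feature in features:
        result *= m_estimate(COUNTS[label][feature], N_PER_CLASS, P_PRIOR, M)
    return result


query = ["Red", "SUV", "Domestic"]
scores = {label: score(label, query) for label in PRIORS}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Taking the label with the highest score is the argmax step of the naive Bayes decision rule.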

Data Warehousing - Question 8

A data warehouse acts as a repository that stores historical data that can be used for analysis. OLAP is an online analytical processing technique that you can use to analyze and evaluate the data in a warehouse. The warehouse holds data from different sources. An OLAP tool helps you organize the data in your warehouse using a multidimensional model.

Data Warehousing - Question 9

s({i5}) = 8 / 10 = 0.8
s({i2, i4}) = 2 / 10 = 0.2
s({i2, i4, i5}) = 2 / 10 = 0.2

s({i5}) = 4 / 5 = 0.8
s({i2, i4}) = 5 / 5 = 1
s({i2, i4, i5}) = 4 / 5 = 0.8
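Support values like these are the fraction of baskets that contain every item of an itemset. The sketch below shows the computation on five made-up baskets; they are not the assignment's data set, which is not reproduced here, and were chosen only so that the results line up with the second set of figures above. The function name `support` is likewise an assumption.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


# Hypothetical baskets for illustration only (not the original data).
baskets = [
    {"i2", "i4", "i5"},
    {"i1", "i2", "i4", "i5"},
    {"i2", "i4"},
    {"i2", "i4", "i5"},
    {"i2", "i3", "i4", "i5"},
]

for items in (["i5"], ["i2", "i4"], ["i2", "i4", "i5"]):
    print(items, support(items, baskets))
```

Changing what counts as a basket (for example, one basket per transaction versus one per customer) changes the denominator, which is one way the same itemset can end up with different support values.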
