Tue Feb 12 2019
What is a mining algorithm and how does it work?
Data mining is the process of discovering actionable information from large sets of data. Data mining uses mathematical analysis to derive patterns and trends that exist in data. Typically, these patterns cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data. Building a mining model is part of a larger process that includes everything from asking questions about the data and creating a model to answer those questions, to deploying the model into a working environment.
An algorithm in data mining (or machine learning) is a set of heuristics and calculations that create a model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.
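To make the idea of iterating toward optimal parameters concrete, here is a minimal Python sketch. It assumes a one-parameter model (y is roughly w times x), squared error as the measure of fit, and gradient descent as the search method. All three are illustrative choices, and the name fit_slope is hypothetical; real mining algorithms use many different models and search strategies.

```python
def fit_slope(xs, ys, learning_rate=0.01, iterations=500):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad
    return w


# Toy data (assumed for illustration) that follows y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = fit_slope(xs, ys)
print(round(w, 3))
```

Each pass uses the error from the previous pass to adjust the parameter, which is the "analysis over many iterations" described above in its simplest form.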
Data Classification Methods
Statistical Algorithms
Analysts have long used statistical analysis systems such as SAS and SPSS to detect unusual patterns and to explain them with statistical models such as linear models. Such systems have their place and will continue to be used.
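As a rough illustration of what a linear model does in this setting, the Python sketch below fits a straight line by ordinary least squares and flags records that sit far from it. It is not how SAS or SPSS are implemented; the function names, the toy data, and the "two standard deviations" cutoff are all assumptions made for the example.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def unusual_points(xs, ys, threshold=2.0):
    """Indices whose residual exceeds `threshold` standard deviations."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if spread and abs(r) > threshold * spread]


# Toy data (assumed): y = 2 * x everywhere except one unusual record.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]
print(unusual_points(xs, ys))  # [4]
```

The fitted line explains the general pattern, and the residuals point to the record that does not follow it.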
Neural Networks
Artificial neural networks mimic the pattern-finding capacity of the human brain, and hence some researchers have suggested applying neural network algorithms to pattern-mapping. Neural networks have been applied successfully in a few applications that involve classification.
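The smallest building block of such a network is a single artificial neuron. The Python sketch below trains one with the classic perceptron learning rule to classify the four input pairs of a logical AND. It is a deliberately tiny example, not a full multi-layer network, and the function names and training settings are assumptions made for illustration.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train a single artificial neuron with the perceptron learning rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction
            # Nudge the weights toward the correct answer when wrong.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def classify(weights, bias, inputs):
    """Apply the trained neuron to one input vector."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Training data: inputs and the class each should map to (logical AND).
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([classify(weights, bias, x) for x, _ in and_gate])  # [0, 0, 0, 1]
```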
Genetic algorithms
Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
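A minimal Python sketch of those three ingredients follows. It assumes a toy problem (evolve a bit string with as many 1s as possible), and the population size, mutation rate, and the extra step of carrying the best individual forward unchanged are illustrative choices rather than part of any standard recipe.

```python
import random


def evolve(fitness, genome_length=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Evolve bit-string genomes toward higher `fitness` scores."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Natural selection: only the fitter half may reproduce.
        parents = ranked[:population_size // 2]
        # Keep the current best unchanged so progress is never lost.
        children = [ranked[0][:]]
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Genetic combination: splice two parents at a random cut point.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally flip a bit.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Fitness here is simply the number of 1s in the genome.
best = evolve(sum)
print(best, sum(best))
```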
Nearest neighbor method
A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset. Sometimes called the k-nearest neighbor technique.
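A short Python sketch of the idea, assuming numeric records, straight-line (Euclidean) distance as the measure of similarity, and a simple majority vote as the way of combining classes. The historical dataset and labels are made up for the example.

```python
import math
from collections import Counter


def knn_classify(history, record, k=3):
    """Label `record` by majority vote among its k nearest historical records."""
    by_distance = sorted(history, key=lambda item: math.dist(item[0], record))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Historical dataset (assumed): feature vectors with known classes.
history = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((6.0, 6.0), "high"), ((7.0, 6.5), "high"), ((6.5, 8.0), "high"),
]
print(knn_classify(history, (1.2, 1.4)))  # low
print(knn_classify(history, (6.8, 7.0)))  # high
```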
Rule induction
The extraction of useful if-then rules from data based on statistical significance.
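One simple way to sketch this in Python is to count how often each attribute value occurs together with each outcome and keep only the rules that hold often enough. The sketch below uses minimum support and confidence thresholds as a stand-in for a formal significance test; that substitution, the function name, and the toy records are all assumptions.

```python
from collections import Counter


def induce_rules(records, target, min_support=2, min_confidence=0.8):
    """Return (attribute, value, predicted_class, confidence) rules."""
    condition_counts = Counter()
    joint_counts = Counter()
    for row in records:
        for attribute, value in row.items():
            if attribute == target:
                continue
            condition_counts[(attribute, value)] += 1
            joint_counts[(attribute, value, row[target])] += 1
    rules = []
    for (attribute, value, outcome), hits in joint_counts.items():
        # Confidence: how often the "then" part holds when the "if" part does.
        confidence = hits / condition_counts[(attribute, value)]
        if hits >= min_support and confidence >= min_confidence:
            rules.append((attribute, value, outcome, confidence))
    return sorted(rules)


records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]
for attribute, value, outcome, confidence in induce_rules(records, "play"):
    print(f"IF {attribute} = {value} THEN play = {outcome} ({confidence:.0%})")
```

On this data only the two outlook rules survive; the windy attribute does not predict the outcome reliably enough to pass the confidence threshold.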
Data visualization
The visual interpretation of complex relationships in multidimensional data.
Many existing algorithms suggest abstracting the test data before classifying it into various classes. There are several alternatives for doing abstraction before classification: a data set can be generalized to a minimally generalized abstraction level, an intermediate abstraction level, or a rather high abstraction level. Too low an abstraction level may result in scattered classes, bushy classification trees, and difficulty in arriving at a concise semantic interpretation, whereas too high a level may result in a loss of classification accuracy. The generalization-based multi-level classification process has been implemented in the DB-Miner system.
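The trade-off between abstraction levels can be seen in a small Python sketch. It assumes a hypothetical concept hierarchy (city, then country, then continent) and is not taken from the DB-Miner implementation; it only shows how climbing the hierarchy reduces the number of distinct values a classifier has to deal with.

```python
# Hypothetical concept hierarchy: each value maps to its parent concept.
PARENT = {
    "Vancouver": "Canada", "Toronto": "Canada",
    "Seattle": "USA", "Chicago": "USA",
    "Canada": "North America", "USA": "North America",
}


def generalize(value, levels):
    """Climb `levels` steps up the hierarchy, stopping at the top."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


cities = ["Vancouver", "Toronto", "Seattle", "Chicago"]
for levels in (0, 1, 2):
    print(levels, sorted({generalize(city, levels) for city in cities}))
# 0 ['Chicago', 'Seattle', 'Toronto', 'Vancouver']
# 1 ['Canada', 'USA']
# 2 ['North America']
```

At level 0 every record keeps its own value, which is what leads to scattered classes and bushy trees. At level 2 every record looks the same, so the attribute can no longer help tell classes apart.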
How ether mining works
- For each block of transactions, miners use computers to repeatedly and very quickly guess answers to a puzzle until one of them wins.
- More specifically, the miners will run the block’s unique header metadata (including timestamp and software version) through a hash function (which will return a fixed-length, scrambled string of numbers and letters that looks random), only changing the ‘nonce value’, which impacts the resulting hash value.
- If the miner finds a hash that matches the current target, the miner will be awarded ether and broadcast the block across the network for each node to validate and add to their own copy of the ledger. If miner B finds the hash, miner A will stop work on the current block and repeat the process for the next block.
- It’s difficult for miners to cheat at this game. There’s no way to fake this work and come away with the correct puzzle answer. That’s why the puzzle-solving method is called ‘proof-of-work’.
- On the other hand, it takes almost no time for others to verify that the hash value is correct, which is exactly what each node does.
- Approximately every 12–15 seconds, a miner finds a block. If miners start to solve the puzzles more quickly or more slowly than this, the algorithm automatically readjusts the difficulty of the problem so that the solution time springs back to roughly 12–15 seconds.
- Miners earn ether at random: their profitability depends on luck and on the amount of computing power they devote to mining.
- The specific proof-of-work algorithm that Ethereum uses, Ethash, was designed to require more memory, making it harder to mine with expensive ASICs, the specialized mining chips that are now the only profitable way of mining bitcoin.
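The guessing game described in the list above can be sketched in a few lines of Python. This is a simplified illustration only: it uses SHA-256 from the standard library as a stand-in for Ethereum's own memory-hard hash function, the header bytes and target are made up, and the retarget function is a plain proportional rule, not Ethereum's actual difficulty formula.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    """Hash the header together with a nonce and read the digest as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, target: int) -> int:
    """Try nonces one after another until the hash falls below the target."""
    nonce = 0
    while block_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def verify(header: bytes, nonce: int, target: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return block_hash(header, nonce) < target


def retarget(target: int, actual_seconds: float, desired_seconds: float) -> int:
    """Blocks found too fast shrink the target (harder); too slow grow it (easier)."""
    return int(target * actual_seconds / desired_seconds)


header = b"example block header"   # hypothetical header bytes
target = 2 ** 256 // 2 ** 12       # about 1 in 4096 hashes succeeds
nonce = mine(header, target)
print(nonce, verify(header, nonce, target))
```

Finding the nonce takes thousands of hash attempts on average at this target, while verify needs just one. That asymmetry is what makes the work hard to fake and cheap for every node to check.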