Definition
An artificial neuron is a computational model inspired by the biological neuron: it receives input signals, combines them according to learned weights, and produces an output through an activation function. It serves as the fundamental building block of the artificial neural networks used in machine learning and artificial intelligence.
Overview
Artificial neurons are organized into layers within an artificial neural network (ANN). Each neuron receives a vector of inputs, multiplies each input by a corresponding weight, adds a bias term, and then passes the resulting sum through a non‑linear activation function (e.g., sigmoid, ReLU, tanh). The output can be forwarded to neurons in subsequent layers or, in the case of output neurons, interpreted as the network’s prediction. Learning in ANNs typically involves adjusting the weights and biases based on error gradients derived from a loss function, a process commonly implemented via back‑propagation and gradient descent algorithms.
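The computation described above — weighted sum, bias, activation — can be sketched in plain Python. This is a minimal illustration; the input values, weights, and bias used in the example are arbitrary, not taken from any real network:

```python
import math

def neuron_forward(inputs, weights, bias):
    """Output of a single artificial neuron: sigmoid(w . x + b)."""
    # Weighted sum of inputs plus the bias term
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation squashes z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative example: two inputs with made-up weights and bias
# z = 0.5*0.8 + (-1.0)*0.2 + 0.1 = 0.3, so output = sigmoid(0.3)
output = neuron_forward([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(round(output, 4))  # about 0.5744
```

Swapping the sigmoid for another activation (e.g., `max(0.0, z)` for ReLU) changes only the final line of the function; the weighted-sum stage is the same across activation choices.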
Etymology/Origin
The term combines “artificial,” indicating a man‑made construct, with “neuron,” the basic functional unit of the nervous system. Early theoretical models of artificial neurons were introduced in the 1940s and 1950s, notably by Warren McCulloch and Walter Pitts (1943), who proposed a simplified binary model of neuronal firing. Subsequent developments, such as the perceptron by Frank Rosenblatt (1958) and multilayer perceptrons in the 1980s, refined the concept into the modern artificial neuron.
Characteristics
- Inputs (x₁, x₂, …, xₙ): Numerical values representing features or signals from other neurons.
- Weights (w₁, w₂, …, wₙ): Adjustable parameters that modulate the influence of each input.
- Bias (b): An additional parameter that allows the activation function to be shifted, improving model flexibility.
- Summation Function: Typically a weighted sum, Σ wᵢxᵢ + b, which aggregates the input contributions.
- Activation Function: A non‑linear mapping (e.g., sigmoid σ(z) = 1/(1+e⁻ᶻ), ReLU(z) = max(0, z)) that introduces non‑linearity, enabling the network to approximate complex functions.
- Learning Rule: Algorithms such as stochastic gradient descent, Adam, or variants that iteratively update weights and bias to minimize a defined loss.
- Output: The transformed value, which can be a scalar, vector, or probability distribution, depending on the network architecture and task.
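To show how these components interact during learning, the sketch below trains a single sigmoid neuron with stochastic gradient descent on a squared-error loss to learn logical OR (a linearly separable task, so one neuron suffices). The learning rate, epoch count, and initialization here are illustrative assumptions, not prescribed values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=5000, lr=0.5):
    """Train one sigmoid neuron by stochastic gradient descent on
    squared error. A minimal sketch, not a library API."""
    n = len(samples[0][0])
    weights = [0.0] * n  # illustrative zero initialization
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = sigmoid(z)
            # For L = (y - t)^2 / 2 with a sigmoid, dL/dz = (y - t) * y * (1 - y)
            grad = (y - target) * y * (1.0 - y)
            # Gradient-descent update of weights and bias
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

# Learn logical OR from its four input/target pairs
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_neuron(data)
```

After training, an input whose weighted sum exceeds zero yields an output above 0.5, which can be thresholded into a binary prediction. Multi-neuron networks apply the same update rule per parameter, with back-propagation supplying each gradient.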
Related Topics
- Artificial neural network (ANN)
- Perceptron
- Multilayer perceptron (MLP)
- Activation function (e.g., ReLU, sigmoid, tanh)
- Back‑propagation
- Gradient descent optimization
- Deep learning
- Computational neuroscience
- Biological neuron (for comparative study)