Fighting Back Against Attacks in Federated Learning 


Federated Learning (FL) is changing the way we train AI models. Instead of sending all your sensitive data to a central location, FL keeps the data where it is and only shares model updates. This preserves privacy and enables AI to run closer to where the data is generated.

However, with computation and data spread across many devices, new security challenges arise. Attackers can join the training process and subtly influence it, leading to degraded accuracy, biased outputs or hidden backdoors in the model.

In this project, we set out to investigate how we can detect and mitigate such attacks in FL. To do this, we built a multi-node simulator that enables researchers and industry professionals to reproduce attacks and test defences more efficiently.

Why This Matters

  • A non-technical example: Think of a shared recipe book that chefs from many restaurants contribute to. Each chef updates a few recipes with their own improvements. A dishonest chef could deliberately add the wrong ingredients to sabotage the dish, or quietly insert a special flavour that only they know how to fix. If no one checks the recipes carefully, all future diners across all restaurants could end up with ruined or manipulated meals.
  • A technical example: The same concept appears in FL as data poisoning (manipulating training examples) and model poisoning (altering weight updates). These attacks are especially damaging when the federation has non-IID data distributions, imbalanced data partitions or late joining clients. Contemporary defences such as Multi-KRUM, Trimmed Mean and Divide and Conquer can still fail in certain scenarios.

Building the Multi Node FL Attack Simulator

To evaluate the resilience of federated learning against real-world threats, we built a multi-node attack simulator on top of the Scaleout Systems FEDn framework. This simulator makes it possible to reproduce attacks, test defences, and scale experiments with hundreds or even thousands of clients in a controlled environment.

Key capabilities: 

  • Flexible deployment: Runs distributed FL jobs using Kubernetes, Helm and Docker.
  • Realistic data settings: Supports IID/non-IID label distributions, imbalanced data partitions and late joining clients.
  • Attack injection: Includes implementations of common poisoning attacks (Label Flipping, Little is Enough) and allows new attacks to be defined with ease.
  • Defense benchmarking: Integrates existing aggregation strategies (FedAvg, Trimmed Mean, Multi-KRUM, Divide and Conquer) and allows a range of defensive strategies and aggregation rules to be tested.
  • Scalable experimentation: Simulation parameters such as the number of clients, the malicious share and participation patterns can all be tuned from a single configuration file (see the sketch below).
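
To give a feel for how such an experiment might be parameterised, here is a minimal sketch of a configuration expressed as a Python dictionary. The field names (num_clients, malicious_fraction, and so on) are assumptions made for illustration; the actual simulator configuration file may use different keys and values.

```python
# Hypothetical simulator configuration, shown as a Python dict for illustration.
# Field names are assumptions; the real configuration file may differ.
simulation_config = {
    "num_clients": 100,              # total number of FL clients in the federation
    "malicious_fraction": 0.2,       # share of clients running an attack
    "attack": {
        "type": "label_flipping",    # or "little_is_enough"
        "interval": 3,               # e.g. apply a model-poisoning attack every 3rd round
    },
    "data": {
        "distribution": "non_iid",   # "iid" or "non_iid" label distribution
        "imbalance": "partial",      # balanced, partially or fully imbalanced partitions
    },
    "participation": {
        "late_joiners": "malicious", # which group joins late
        "late_join_round": 5,        # round from which late clients participate
    },
    "aggregation": "ee_trmean",      # fedavg, trmean, multikrum, dnc, ee_trmean
    "rounds": 50,
}
```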

Building on FEDn’s architecture means the simulations benefit from its robust training orchestration and client management, and can be monitored visually through the Studio web interface.

It is also important to note that the FEDn framework supports Server Functions. This feature makes it possible to implement new aggregation strategies and evaluate them using the attack simulator.
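
To illustrate the idea behind such a server-side hook, the sketch below shows a custom aggregation rule in simplified form. The class and method names are assumptions made for this post, not the exact FEDn Server Functions interface; please consult the FEDn documentation for the real API.

```python
import numpy as np

# Illustrative sketch only: class and method names are assumptions,
# not the exact FEDn Server Functions interface.
class CustomAggregation:
    """Server-side hook that replaces plain averaging with a robust rule."""

    def aggregate(self, client_updates):
        # client_updates: list of flattened model parameter arrays, one per client
        stacked = np.stack(client_updates)   # shape: (num_clients, num_params)
        # Example robust rule: coordinate-wise median instead of the mean
        return np.median(stacked, axis=0)
```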

To get started with a first example project using FEDn, see the quickstart guide.

The FEDn framework is free for all academic and research projects, as well as for industrial testing and trials.

The attack simulator is available and ready to use as open-source software.

The Attacks We Studied

  • Label Flipping (Data Poisoning) – Malicious clients flip labels in their local datasets, such as changing “cat” to “dog”, to reduce accuracy.
  • Little is Enough (Model Poisoning) – Attackers make small but targeted adjustments to their model updates to shift the global model output toward their own goals. In this thesis we applied the Little is Enough attack every 3rd round (simplified sketches of both attacks follow below).
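
To make the two attacks concrete, here is a minimal NumPy sketch of each. The function names and parameters (such as the class pair being flipped and the scale z) are illustrative assumptions, not the simulator’s exact implementation.

```python
import numpy as np

def label_flip(labels, src=3, dst=5):
    """Data poisoning: a malicious client relabels class `src` as class `dst`
    (e.g. "cat" -> "dog") before local training."""
    flipped = labels.copy()
    flipped[labels == src] = dst
    return flipped

def little_is_enough(benign_updates, z=1.0):
    """Model poisoning: craft a malicious update that stays within roughly z
    standard deviations of the benign updates, so it is hard for robust
    aggregators to filter out while still shifting the global model."""
    stacked = np.stack(benign_updates)   # shape: (num_clients, num_params)
    mu = stacked.mean(axis=0)
    sigma = stacked.std(axis=0)
    return mu - z * sigma                # shifted, but still "close enough"
```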

Beyond Attacks — Understanding Unintentional Impact

While this study focuses on deliberate attacks, it is equally valuable for understanding the effects of poor or unintentionally harmful contributions caused by misconfigurations or device malfunctions in large-scale federations.

In our recipe example, even an honest chef might accidentally use the wrong ingredient because their oven is broken or their scale is inaccurate. The mistake is unintentional, but it still changes the shared recipe in ways that could be harmful if repeated by many contributors.

In cross-device or fleet learning setups, where thousands or millions of heterogeneous devices contribute to a shared model, faulty sensors, outdated configurations or unstable connections can degrade model performance in similar ways to malicious attacks. Studying attack resilience also reveals how to make aggregation rules robust to such unintentional noise.

Mitigation Strategies Explained

In FL, aggregation rules decide how to combine model updates from clients. Robust aggregation rules aim to reduce the influence of outliers, whether caused by malicious attacks or faulty devices. Here are the strategies we tested:

  • FedAvg (baseline) – Simply averages all updates without filtering. Very vulnerable to attacks.
  • Trimmed Mean (TrMean) – Sorts each parameter across clients, then discards the highest and lowest values before averaging. Reduces extreme outliers but can miss subtle attacks.
  • Multi-KRUM – Scores each update by how close it is to its nearest neighbours in parameter space, keeping only those with the smallest total distance. Very sensitive to the number of updates selected (k). Simplified sketches of TrMean and Multi-KRUM follow this list.
  • EE-Trimmed Mean (newly developed) – An adaptive version of TrMean that uses epsilon-greedy scheduling to decide when to test different client subsets. More resilient to changing client behaviour, late arrivals and non-IID distributions.
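
As a reference point, here are minimal sketches of the two classical rules. Parameter names such as trim_k and num_selected are illustrative assumptions; the simulator’s actual implementations may differ in their details.

```python
import numpy as np

def trimmed_mean(updates, trim_k=1):
    """Coordinate-wise Trimmed Mean: sort each parameter across clients and
    drop the trim_k highest and lowest values before averaging."""
    stacked = np.sort(np.stack(updates), axis=0)   # (num_clients, num_params)
    return stacked[trim_k:len(updates) - trim_k].mean(axis=0)

def multi_krum(updates, num_malicious=1, num_selected=3):
    """Multi-KRUM: score each update by the sum of squared distances to its
    closest neighbours, then average the num_selected lowest-scoring updates."""
    stacked = np.stack(updates)
    n = len(updates)
    dists = np.linalg.norm(stacked[:, None] - stacked[None, :], axis=2) ** 2
    scores = []
    for i in range(n):
        # sum distances to the n - num_malicious - 2 nearest neighbours
        # (index 0 of the sorted row is the zero distance to itself)
        nearest = np.sort(dists[i])[1:n - num_malicious - 1]
        scores.append(nearest.sum())
    selected = np.argsort(scores)[:num_selected]
    return stacked[selected].mean(axis=0)
```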

The tables and plots presented in this post were originally designed by the Scaleout team.

Experiments

Across 180 experiments we evaluated different aggregation strategies under varying attack types, malicious client ratios and data distributions. For further details, please read the full thesis here.


The table above shows one series of experiments using a label-flipping attack with a non-IID label distribution and partially imbalanced data partitions. The table reports Test Accuracy and Test Loss AUC, computed over all participating clients. Each aggregation strategy’s results are shown in two rows, corresponding to the two late-joining policies (benign clients participating from the 5th round, or malicious clients participating from the 5th round). Columns separate the results at the three malicious proportions, yielding six experiment configurations per aggregation strategy. The best result in each configuration is shown in bold.

While the table shows a relatively homogeneous response across all defense strategies, the individual plots present a completely different view. In FL, although a federation may reach a certain level of accuracy, it is equally important to examine client participation—specifically, which clients successfully contributed to the training and which were rejected as malicious. The following plots illustrate client participation under different defense strategies.

Fig-1: TrMean – Label Flipping – non-IID Partially Imbalanced – 20% Malicious activity

With 20% malicious clients under a label-flipping attack on non-IID, partially imbalanced data, Trimmed Mean (Fig-1) maintained overall accuracy but never fully blocked any client from contributing. While coordinate trimming reduced the impact of malicious updates, it filtered parameters individually rather than excluding entire clients, allowing both benign and malicious participants to remain in the aggregation throughout training.

In a scenario with 30% late-joining malicious clients and non-IID, imbalanced data, Multi-KRUM (Fig-2) mistakenly selected a malicious update from round 5 onward. High data heterogeneity made benign updates appear less similar, allowing the malicious update to rank as one of the most central and persist in one-third of the aggregated model for the rest of training.

Fig-2: Multi-KRUM – Label Flipping Attack – non-IID Imbalanced – 30% Malicious Activity (k=3)

Why we need adaptive aggregation strategies

Existing robust aggregation rules generally rely on static thresholds to decide which client updates to include when aggregating the new global model. This is a shortcoming of current aggregation strategies that can make them vulnerable to late-participating clients, non-IID data distributions or data volume imbalances between clients. These insights led us to develop EE-Trimmed Mean (EE-TrMean).

EE-TrMean: An epsilon-greedy aggregation strategy

EE-TrMean builds on the classical Trimmed Mean, but adds an epsilon-greedy exploration-vs-exploitation layer for client selection.

  • Exploration phase: All clients are allowed to contribute and a normal Trimmed Mean aggregation round is executed.
  • Exploitation phase: Only the clients that have been trimmed the least are included, based on an average score accumulated over the rounds in which each client participated.
  • The switch between the two phases is controlled by an epsilon-greedy policy with a decaying epsilon and an alpha ramp.

Each client earns a score based on whether its parameters survive trimming in each round. Over time the algorithm increasingly favors the highest-scoring clients, while occasionally exploring others to detect changes in behaviour. This adaptive approach allows EE-TrMean to remain resilient in cases where data heterogeneity and malicious activity are high.
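
The sketch below illustrates this selection logic in simplified form. The scoring rule, the decay schedule and the above-average selection criterion are assumptions made for illustration; the actual EE-TrMean implementation in the thesis may differ in its exact details.

```python
import random

def choose_participants(all_clients, scores, round_idx, eps0=1.0, decay=0.9):
    """Epsilon-greedy switch between exploration and exploitation."""
    epsilon = eps0 * (decay ** round_idx)      # decaying exploration probability
    if not scores or random.random() < epsilon:
        # Exploration: every client contributes, as in plain Trimmed Mean
        return list(all_clients)
    # Exploitation: keep only clients whose parameters have survived trimming
    # most often so far (above-average score)
    avg = sum(scores.values()) / len(scores)
    return [c for c in all_clients if scores.get(c, 0.0) >= avg]

def update_scores(scores, survived_fraction):
    """Update each client's score from how much of its update survived trimming."""
    for c, frac in survived_fraction.items():
        # running average of how often the client's parameters survive trimming
        scores[c] = 0.8 * scores.get(c, 0.0) + 0.2 * frac
    return scores
```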

Fig-3: EE-TrMean – Label Flipping – non-IID Partially Imbalanced – 20% Malicious activity

In a label-flipping scenario with 20% malicious clients and late benign joiners on non-IID, partially imbalanced data, EE-TrMean (Fig-3) alternated between exploration and exploitation phases: initially allowing all clients, then selectively blocking low-scoring ones. While it occasionally excluded a benign client due to data heterogeneity (still far better than the established strategies), it successfully identified and minimized the contributions of malicious clients during training. This simple yet powerful modification improves the quality of the contributions that make it into the global model. The literature reports that as long as the majority of clients are honest, the model’s accuracy remains reliable.
