[Paper Review] Discovering covert node in networked organization
This paper proposes a likelihood-based method for detecting covert nodes in complex networks: entities that take part in interactions but do not appear in the observed communication or collaboration records. The method fits a network model by maximum likelihood and uses it to flag suspicious records and nodes. Reported precision, recall, and F1 scores approach the theoretical limit when the ratio of observed data to possible communication patterns is high.
Abstract—This paper addresses a method to solve a node discovery problem in a complex network. Covert nodes which exist in a social network do not appear in the records which are observed on the communication or collaborative activities among the nodes. Discovering the covert nodes refers to identifying suspicious records in which the covert nodes would appear, or suspicious nodes which would be the neighbors of the covert nodes, if the covert nodes became overt. The mathematical model is developed for the maximal likelihood estimation of the network and for the identification of the suspicious records and nodes. Precision, recall, and F value characteristics are demonstrated with the test dataset generated from network models (real organization and mathematical model). The performance is close to the theoretical limit for any target covert nodes, network topologies, and network sizes if the ratio of the number of the observed data to the number of the possible communication patterns is high. Index Terms—Complex network, Likelihood, Link discovery, Node discovery, Organization, Social network.
Motivation & Objective
- To address the challenge of detecting 'covert nodes' in complex networks that are invisible in observed communication or collaboration records.
- To develop a mathematical model for maximum likelihood estimation of the network structure, and for identifying where covert nodes are likely to be present.
- To identify suspicious records and neighboring nodes that would be linked to covert nodes if they became overt.
- To evaluate performance across diverse network topologies, sizes, and covert node targets.
- To demonstrate the method's robustness and near-optimal performance under high data coverage conditions.
Proposed method
- Formulates a probabilistic network model and fits it by maximum likelihood estimation to infer the network structure from the observed records.
- Models the likelihood of observed communication or collaboration patterns to estimate the probability of covert node presence.
- Identifies suspicious nodes and records by evaluating deviations from expected patterns under the estimated model.
- Uses synthetic datasets generated from real organizational networks and mathematical models to test the method.
- Applies precision, recall, and F1 score metrics to evaluate detection performance.
- Assesses performance across varying network sizes, topologies, and ratios of observed data to possible communication patterns.
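The general idea behind the steps above (fit a model to the observed records, then flag the records the model finds least plausible) can be illustrated with a toy sketch. It is not the paper's model: it substitutes a simple pairwise co-occurrence model whose conditional probabilities are estimated by relative frequency, which is the maximum likelihood estimate for that simplified model. The function names `fit_cooccurrence`, `record_score`, and `rank_suspicious`, and the `floor` parameter, are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter
from itertools import permutations


def fit_cooccurrence(records):
    """Estimate P(j appears | i appears) by relative frequency.

    Toy stand-in for the paper's network model (an assumption of this sketch).
    """
    node_count = Counter()
    pair_count = Counter()
    for rec in records:
        nodes = set(rec)
        node_count.update(nodes)
        pair_count.update(permutations(nodes, 2))  # ordered pairs (i, j)
    return {(i, j): c / node_count[i] for (i, j), c in pair_count.items()}


def record_score(record, cond_prob, floor=1e-3):
    """Mean log-probability over the ordered node pairs in one record.

    Probabilities are floored to avoid log(0) for unseen pairs.
    """
    pairs = list(permutations(set(record), 2))
    if not pairs:
        return 0.0
    total = sum(math.log(max(cond_prob.get(p, 0.0), floor)) for p in pairs)
    return total / len(pairs)


def rank_suspicious(records, floor=1e-3):
    """Return record indices ordered from least to most plausible."""
    cond_prob = fit_cooccurrence(records)
    scored = [(record_score(r, cond_prob, floor), idx)
              for idx, r in enumerate(records)]
    return [idx for _, idx in sorted(scored)]


if __name__ == "__main__":
    # Two tight groups, plus one record that bridges them unexpectedly.
    records = [{"a", "b"}] * 5 + [{"c", "d"}] * 5 + [{"a", "d"}]
    print(rank_suspicious(records)[:3])
```

In this toy data the bridging record is the one a hidden intermediary would most plausibly explain, so it is ranked first. The paper's own likelihood model and ranking criterion may differ in detail.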
Experimental results
Research questions
- RQ1: How can covert nodes be detected in complex networks when they leave no trace in observed communication or collaboration records?
- RQ2: What likelihood-based model can accurately estimate network structure and identify suspicious records and nodes?
- RQ3: How does the method’s performance vary with respect to network size, topology, and data coverage ratio?
- RQ4: To what extent does the method approach theoretical performance limits in detecting covert nodes?
- RQ5: How do precision, recall, and F1 scores vary across different network configurations and data availability levels?
Key findings
- The method achieves high precision, recall, and F1 scores in detecting suspicious nodes and records.
- Performance is close to the theoretical limit when the ratio of observed data to possible communication patterns is high.
- The method remains effective across diverse network topologies, sizes, and target covert node configurations.
- The likelihood-based model successfully identifies suspicious nodes that would be neighbors of covert nodes if they became overt.
- The evaluation on test datasets generated from a real organization's network and from mathematical network models supports the method's robustness across network sizes.
- The results demonstrate that data coverage ratio is a critical factor in achieving optimal detection performance.
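For readers unfamiliar with the metrics named in these findings, the following minimal sketch shows how precision, recall, and F1 are conventionally computed for a set of flagged records against a ground-truth set. The function name `precision_recall_f1` and the example indices are hypothetical. The abstract refers to an "F value", and this sketch assumes it is the standard harmonic mean of precision and recall.

```python
def precision_recall_f1(flagged, truth):
    """Compute precision, recall, and F1 for flagged items vs. ground truth.

    flagged: items the detector marked as suspicious
    truth:   items that truly involve a covert node
    """
    flagged, truth = set(flagged), set(truth)
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical record indices, for illustration only.
    p, r, f1 = precision_recall_f1(flagged={1, 2, 3, 4}, truth={2, 4, 6})
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Flagging more records tends to raise recall and lower precision, which is why a combined score such as F1 is reported alongside the two.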
This review was created by AI and reviewed by human editors.