Electric Power Research Institute of Yunnan Power Grid Co., Ltd.
The K-means clustering algorithm has been applied to anomaly detection in large-scale distribution network data because it is fast and accurate. However, an unsuitable number of clusters can produce poor clustering results. This paper proposes IES (improved elbow method and silhouette coefficient), an algorithm for selecting the number of clusters. First, the clustering evaluation index of the elbow method and the upper limit of the number of clusters are used to set a threshold that adapts to each data set, and this adaptive threshold determines the lower limit of the number of clusters. Next, silhouette coefficients are calculated for cluster numbers between the lower and upper limits; a “one maximum” rule avoids calculating every silhouette coefficient and so speeds up the algorithm. Finally, the silhouette coefficients are used to select a suitable number of clusters. Anomaly detection performance is evaluated by recall, which illustrates the importance of choosing a suitable number of clusters for K-means anomaly detection. Case study results show that IES adaptively obtains the optimal number of clusters while greatly reducing computation time, improving the accuracy and efficiency of K-means in online monitoring.
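The selection procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the threshold formula or the exact form of the “one maximum” rule, so both are assumptions here: the threshold is taken as the average drop in the elbow-method index (the K-means sum of squared errors, SSE) per added cluster over the range from 1 to the upper limit, and the “one maximum” rule is read as "stop scanning as soon as the silhouette coefficient starts to decrease". The function names `lower_bound_from_sse` and `select_k` are hypothetical, and the input values are made-up toy numbers, not results from the paper.

```python
from typing import Callable, Sequence, Tuple


def lower_bound_from_sse(sse: Sequence[float]) -> int:
    """Return an assumed lower limit for the number of clusters.

    sse[i] is the K-means SSE for k = i + 1, so len(sse) is the upper limit.
    Assumption (not from the paper): the adaptive threshold is the average
    SSE drop per added cluster, and the lower limit is the first k whose
    next drop SSE(k) - SSE(k + 1) falls below that threshold.
    """
    k_max = len(sse)
    if k_max < 3:
        raise ValueError("need SSE values for at least k = 1, 2, 3")
    threshold = (sse[0] - sse[-1]) / (k_max - 1)
    for k in range(1, k_max):
        drop = sse[k - 1] - sse[k]  # SSE(k) - SSE(k + 1)
        if drop < threshold:
            return max(k, 2)  # the silhouette coefficient needs k >= 2
    return k_max


def select_k(
    sse: Sequence[float], silhouette: Callable[[int], float]
) -> Tuple[int, int]:
    """Pick k between the lower and upper limits by silhouette coefficient.

    Assumed reading of the "one maximum" rule: scan k upward and stop at
    the first k whose silhouette coefficient does not exceed the best so far.
    Returns (chosen k, number of silhouette evaluations performed).
    """
    k_max = len(sse)
    k_low = lower_bound_from_sse(sse)
    best_k, best_s = k_low, silhouette(k_low)
    evaluated = 1
    for k in range(k_low + 1, k_max + 1):
        s = silhouette(k)
        evaluated += 1
        if s <= best_s:  # the previous best is a local maximum: stop early
            break
        best_k, best_s = k, s
    return best_k, evaluated


# Toy inputs for illustration only (not data from the paper).
toy_sse = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0, 85.0, 80.0]  # k = 1..8
toy_sil = {2: 0.55, 3: 0.62, 4: 0.71, 5: 0.64, 6: 0.60, 7: 0.50, 8: 0.40}

print(select_k(toy_sse, toy_sil.__getitem__))  # (4, 3)
```

With these toy values the lower limit is 3, and the scan stops after evaluating k = 3, 4 and 5, so only three of the seven possible silhouette coefficients are computed. In practice `silhouette` would wrap a K-means run plus a silhouette computation for the given k, which is the expensive step the early stop is meant to avoid.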