# Trajectory Outlier Detection A Partition-and-Detect ：异常轨迹检测分割与检测.ppt

,April 8, 2007,,Trajectory Outlier Detection: A Partition-and-Detect Framework,Jae-Gil Lee, Jiawei Han, and Xiaolei Li Department of Computer Science University of Illinois at Urbana-Champaign,ICDE 2008,Table of Contents,Motivation Partition-and-Detect Framework Outlier Detection Algorithm: TRAOD Partitioning Phase (Simple) Detection Phase Partitioning Phase (Enhanced) Perance uation Related Work Conclusions,Outlier Detection,Definition: the process of detecting a data object that is grossly different from or inconsistent with the remaining set of data Applications: the detection of credit card fraud, the monitoring of criminal activities in electronic commerce, etc. Algorithms: distribution-based, distance-based, density-based, and deviation-based Target data: previous research has mainly dealt with outlier detection of point data,Analysis on Trajectory Data,Tremendous amounts of trajectory data of moving objects are being collected Example: vehicle positioning data, hurricane tracking data, animal movement data, etc. Trajectory outlier detection has many important, real-world applications Detection of suspicious persons in video surveillance Analysis of unusual air-mass trajectories in meteorology … A powerful outlier detection algorithm for trajectories is needed urgently,Limitations of Existing Algorithms,Knorr et al. [5] have presented one of very few attempts Define the distance between two whole trajectories using the summary ination (e.g., the coordinates of the starting and ending points) Apply a distance-based approach to detection of trajectory outliers Existing algorithms might not be able to detect outlying portions of trajectories Example: TR3 is not detected as an outlier since its overall behavior is similar to those of neighboring trajectories,,,,,,TR5,TR1,TR4,TR3,TR2,An outlying sub-trajectory,,,Discovery of Outlying Sub-Trajectories,Discovery of outlying sub-trajectories is very useful in the real world Example: Sudden changes in hurricane’s path [10] We propose the partition-and-detect framework,The Partition-and-Detect Framework,Consists of two phases: partitioning and detection,TR5,TR1,TR4,TR3,TR2,A set of trajectories,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,(1) Partition,,,,,,,,,,,,,,,,,,,,(2) Detect,,,,,,,TR3,A set of trajectory partitions,An outlier,Outlying trajectory partitions,,Note: A set of outlying trajectory partitions indicates an outlying sub-trajectory,,The Problem Statement,Given a set of trajectories I = {TR1, …, TRn}, our algorithm generates a set of outliers O = {O1, …, Om} with outlying trajectory partitions for each Oi Necessary definitions: A trajectory is a sequence of multi-dimensional points, which is denoted as TRi = p1p2p3 … pj … pleni; a trajectory partition (t-partition for short) is a line segment pipj (i j), where pi and pj are the points chosen from the same trajectory A t-partition is outlying if it does not have a sufficient number of similar neighbors A trajectory is an outlier if it contains a non-negligible amount of outlying t-partitions,The Outlier Detection Algorithm: TRAOD,Based on the partition-and-detect framework,Algorithm TRAOD (TRAjectory Outlier Detection) : A set of trajectories I = {TR1, …, TRn} Output: A set of outliers O = {O1, …, Om} with outlying t-partitions for each Oi Algorithm: /* Partitioning Phase */ 01: for each TR I do 02: Partition TR into a set L of line segments; 03: Accumulate L into a set D; /* Detection Phase */ 04: for each P D do 05: Mark P if it is an outlying t-partition; 06: for each TR I do 07: Output TR if it is an outlier;,,Where We Are Now,/* Partitioning Phase */ 01: for each TR I do 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for each P D do 05: Mark P if it is an outlying t-partition; 06: for each TR I do 07: Output TR if it is an outlier;,by a simple strategy;,by a two-level partitioning strategy;,A Simple Partitioning Strategy (1/2),Careless partitioning (especially, in a long length) could miss possible outliers Example: Even though TRout behaves differently from its neighboring trajectories, these differences are averaged out due to careless partitioning,Neighboring Trajectories,,,,,,,,,,A t-partition,A trajectory TRout,A Simple Partitioning Strategy (2/2),A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application Example: The base unit can be every single point Pros: high detection quality in general Cons: poor perance due to a large number of t-partitions remedied by a two-level partitioning strategy,,Neighboring Trajectories,,,,,,,,,,A t-partition,A trajectory TRout,,,,,,,,An outlying t-partition,,Where We Are Now,/* Partitioning Phase */ 01: for each TR I do 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for each P D do 05: Mark P if it is an outlying t-partition; 06: for each TR I do 07: Output TR if it is an outlier;,by a simple strategy;,by a two-level partitioning strategy;,Distance between T-Partitions,The weighted sum of three components: the perpendicular distance( ), parallel distance( ), and angle distance( ) Adapted from similarity measures used in the domain of pattern recognition [13],Trajectory Outliers Based on Distance (1/2),Def. (a close trajectory): Def. (an outlying t-partition):,,,,,,,,,,,,,,,TRj is close to Li TRj is not close to Li,,,,Li is an outlying t-partition Li is not an outlying t-partition,,,,,,,,,,Not close,,≤ 1‒p,,,,,,,,,,,,,,,,,,Close, 1‒p,Trajectory Outliers Based on Distance (2/2),Def. (an outlier): A trajectory TRi is an outlier if,,,,,,,,,,,,,TRi,TRj,TRi is an outlier,TRj is not an outlier,,,,Incorporation of Density (1/2),The previous definition, as it is, has the local density problem A t-partition in a dense region tends to have relatively a larger number of close trajectories than that in a sparse region,,,T-Partitions in dense regions are favored!,,,Incorporation of Density (2/2),Def. (the density of a t-partition): The density of a t-partition Li is the number of t-partitions within the distance σ from Li, where σ is the standard deviation of pairwise distances between t-partitions Def. (the adjusting coefficient of a t-partition): Adjustment by the density The number of close trajectories is multiplied by the adjusting coefficient adj(Li) adj(Li) 1.0 in a sparse region,,Guidelines for Parameter Values,Three parameters: D corresponds to similar, p to sufficient, and F to non-negligible Remark: There is no universally correct parameter value even for the same data set and application Our guideline: Resorts on user feedback,,Want Many Outliers?,Have Many Trajectories?,Are Trajectories Short?,D p F,0.90,0.99,0.20,0.10,Smaller,Larger,,Where We Are Now,/* Partitioning Phase */ 01: for each TR I do 02: Partition TR into a set L of line segments 03: Accumulate L into a set D; /* Detection Phase */ 04: for each P D do 05: Mark P if it is an outlying t-partition; 06: for each TR I do 07: Output TR if it is an outlier;,by a simple strategy;,by a two-level partitioning strategy;,Two-Level Trajectory Partitioning,Objective Achieves much higher perance than the simple strategy Obtains the same result as that of the simple strategy; i.e., does not lose the quality of the result Basic idea 1. Partition a trajectory in coarse granularity first 2. Partition a coarse t-partition in fine granularity only when necessary Main benefit Narrows the search space that needs to be inspected in fine granularity Many portions of trajectories can be pruned early on,Intuition to Two-Level Trajectory Partitioning,If the distance between coarse t-partitions is very large (or small), the distances between their fine t-partitions is also very large (or small),,,,,,,,,,,,,,,,,,TRi,TRj,,Coarse-Granularity Partitioning,Fine-Granularity Partitioning,Given two coarse t-partitions, can we know if the distance between any two fine t-partitions is greater than (or less than) D?,,Coarse-Granularity Partitioning*,Try to maximize two rivalry measures Preciseness: the difference between a trajectory and a set of its coarse t-partitions should be as small as possible Required for making the bounds tight Conciseness: the number of coarse t-partitions should be as small as possible Required for reducing the number of comparisons ulate this problem using the minimum length description (MDL) principle A good tradeoff between the two measures is found based on the ination theory,* Coarse-granularity partitioning is identical to that in our earlier work on trajectory clustering [15],,Fine-Granularity Partitioning,Identify outlying coarse t-partitions by deriving the distance bounds between two coarse t-partitions Li and Lj Suppose li is a fine t-partition in Li and lj is that in Lj Derive the above bounds separately for (Lemmas 1~3) and combine them (Lemma 4),,,lb(Li, Lj, f) The lower bound of f(li, lj),,ub(Li, Lj, f) The upper bound of f(li, lj),,,Derivation of the Distance Bounds,Lemma 1. Bounds for,Lemma 2. Bounds for,Lemma 3. Bounds for,Lemma 4. Bounds for dist(Li, Lj),,Combine,Pruning Rules for Fine-Granularity Partitioning,Rule 1: If lb(Li, Lj, dist) D, fine-granularity partitioning is not required when comparing Li and Lj Rule 2: If ub(Li, Lj, dist) ≤ D, fine-granularity partitioning is required, but the distance between the fine t-partitions in Li and Lj needs not be computed,,,Li,Lj,,,Li,Lj,Perance uation,Use two real trajectory data sets Hurricane track data set Records the Atlantic hurricanes for the years 1950 through 2006 The entire set: 608 trajectories and 18,951 points; A small set (1990~2006): 221 trajectories and 7,270 points Animal movement data set Records the locations of elk, deer, and cattle for the years 1993 through 1996 (the Starkey Project) Elk1993: 33 trajectories and 15,422 points; Deer1995: 32 trajectories and 20,065 points; Cattle1993: 41 trajectories and 19,556 points Validate the quality of outlier detection uate the effectiveness of the two-level partitioning strategy,Trajectory Outliers for Hurricane Data (Small),,D = 85, p = 0.95, F = 0.2 → # of outliers = 13,Trajectory Outliers for Elk1993,,D = 55, p = 0.95, F = 0.1 → # of outliers = 3,Trajectory Outliers for Deer1995,,D = 80, p = 0.95, F = 0.1 → # of outliers = 3,Effects of Parameter Values,,(a) D = 83, p = 0.95, F = 0.2,(b) D = 87, p = 0.95, F = 0.2,19 outliers,10 outliers,Pruning Power of Two-Level Partitioning,2L-Total: the ratio of the number of pairs pruned by Rule 1 to the total number of pairs of coarse t-partitions 2L-False: the proportion of pairs pruned incorrectly Optimal: the maximum ratio of pairs that can be pruned,Achieves high pruning power (64~88%),Speedup Ratio of Two-Level Partitioning,Shows significant perance improvement,Related Work,Outlier detection algorithms for points Distribution-based [2], distance-based [3, 4, 5, 6], density-based [7, 8], deviation-based [9] Trajectory outlier detection technique using a distance-based approach [5] Not clear whether this technique can detect outlying sub-trajectories from very complicated trajectories Trajectory outlier detection algorithms based on classification [12] Require a good training set and depend on training,Conclusions,Proposed a novel framework, the partition-and-detect framework, for detecting trajectory outliers For the 1st phase, proposed a two-level trajectory partitioning strategy Ensures both high quality and high efficiency For the 2nd phase, proposed a hybrid of the distance-based and density-based approaches Very intuitive, but does not have the local density problem Demonstrated the effectiveness of TRAOD using various real trajectory data,,Thank You!,