Data stream mining
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications, can be read only once or a small number of times using limited computing and storage capabilities. Examples of data streams include computer network traffic, phone conversations, ATM transactions, web searches, and sensor data. Data stream mining can be considered a subfield of data mining, machine learning, and knowledge discovery.
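The read-once, limited-memory constraint can be illustrated with a small sketch. The example below is not taken from any particular stream mining system; it assumes a stream of numeric values and uses Welford's online method to maintain the mean and variance in constant memory. The names `RunningStats` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Mean and sample variance of a stream in O(1) memory (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; returns 0.0 by convention for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for x in stream:  # each record is seen exactly once and then discarded
        stats.update(x)
    return stats


stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(stats.count, stats.mean, stats.variance)
```

Because only three numbers are stored, the memory footprint does not grow with the length of the stream, which is the property that single-pass stream algorithms generally aim for.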
In many data stream mining applications, the goal is to predict the class or value of new instances in the data stream, given some knowledge about the class membership or values of previous instances. Machine learning techniques can be used to learn this prediction task automatically from labeled examples. Often, concepts from the field of incremental learning, a generalization of incremental heuristic search, are applied to cope with structural changes, online learning, and real-time demands. In many applications, especially those operating in non-stationary environments, the distribution underlying the instances or the rules underlying their labeling may change over time; that is, the goal of the prediction (the class or the target value to be predicted) may change over time. This problem is referred to as concept drift.
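The effect of concept drift on an incremental learner can be sketched as follows. This is a toy illustration under stated assumptions, not a published algorithm: the synthetic stream, the `DecayedCounts` learner, and the decay value of 0.99 are all hypothetical. The labeling rule is inverted halfway through the stream, and a learner that gradually forgets old instances is compared with one that never forgets.

```python
import random


def drifting_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labeling rule is inverted at position drift_at."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.uniform(-1.0, 1.0)
        positive = x > 0.0
        y = int(positive) if t < drift_at else int(not positive)
        yield x, y


class DecayedCounts:
    """Per-region label counts; decay < 1 gradually forgets old instances."""

    def __init__(self, decay=1.0):
        self.decay = decay
        # counts[region][label]; region 0 is x <= 0, region 1 is x > 0
        self.counts = [[0.0, 0.0], [0.0, 0.0]]

    def predict(self, x):
        c = self.counts[int(x > 0.0)]
        return int(c[1] > c[0])

    def learn(self, x, y):
        for region in self.counts:  # down-weight everything seen so far
            region[0] *= self.decay
            region[1] *= self.decay
        self.counts[int(x > 0.0)][y] += 1.0


def accuracy_after_drift(model, n=2000, drift_at=1000):
    correct = 0
    for t, (x, y) in enumerate(drifting_stream(n, drift_at)):
        if t >= drift_at:
            correct += int(model.predict(x) == y)  # test on the instance first
        model.learn(x, y)                          # then use it for training
    return correct / (n - drift_at)


static = accuracy_after_drift(DecayedCounts(decay=1.0))       # never forgets
forgetting = accuracy_after_drift(DecayedCounts(decay=0.99))  # gradual forgetting
print(f"static: {static:.2f}  forgetting: {forgetting:.2f}")
```

In this setup the non-forgetting learner keeps predicting the old concept for most of the post-drift period, while the forgetting learner recovers once recent instances outweigh the decayed older ones. Real drift-handling methods are considerably more refined, but the trade-off between stability and adaptation is the same.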
Software for data stream mining
- MOA (Massive Online Analysis): free, open-source software designed specifically for mining data streams with concept drift. It includes machine learning algorithms for classification, regression, clustering, outlier detection, and recommender systems. It also provides a prequential evaluation method, the EDDM concept drift detection method, a reader for real datasets in ARFF format, and artificial stream generators such as SEA concepts, STAGGER, rotating hyperplane, random tree, and random radial basis functions. MOA supports bi-directional interaction with Weka.
- RapidMiner: commercial software for knowledge discovery, data mining, and machine learning. When used in combination with its data stream mining plugin (formerly the concept drift plugin), it also supports data stream mining, learning time-varying concepts, and tracking drifting concepts.
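Prequential (test-then-train) evaluation on an artificial stream can be sketched in a few lines. The code below is not MOA's API or implementation. It assumes the commonly described form of the SEA concepts (three attributes drawn uniformly from [0, 10], with the label determined by whether the sum of the first two is at most a threshold that changes from concept to concept). The threshold values, the orientation of the labels, the omission of class noise, and the names `sea_like_stream`, `MajorityClass`, and `prequential_accuracy` are all assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple

Instance = Tuple[List[float], int]


def sea_like_stream(n_per_concept: int,
                    thresholds=(8.0, 9.0, 7.0, 9.5),
                    seed: int = 1) -> Iterator[Instance]:
    """Yield instances from consecutive concepts, one per threshold (assumed values)."""
    rng = random.Random(seed)
    for theta in thresholds:  # an abrupt drift occurs at each threshold change
        for _ in range(n_per_concept):
            x = [rng.uniform(0.0, 10.0) for _ in range(3)]
            # Only the first two attributes matter; the third is irrelevant.
            yield x, int(x[0] + x[1] <= theta)


class MajorityClass:
    """Baseline learner: predicts the label seen most often so far."""

    def __init__(self) -> None:
        self.counts = [0, 0]

    def predict(self, x: List[float]) -> int:
        return int(self.counts[1] > self.counts[0])

    def learn(self, x: List[float], y: int) -> None:
        self.counts[y] += 1


def prequential_accuracy(model, stream) -> float:
    """Test on each instance before training on it; return overall accuracy."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


acc = prequential_accuracy(MajorityClass(), sea_like_stream(500))
print(f"prequential accuracy of the majority-class baseline: {acc:.3f}")
```

Because every instance is used for testing before it is used for training, no separate hold-out set is needed, which suits streams that can be read only once. Any learner exposing the same `predict` and `learn` methods could be substituted for the baseline.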
Events
- International Workshop on Ubiquitous Data Mining held in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI) in Beijing, China, August 3–5, 2013.
- International Workshop on Knowledge Discovery from Ubiquitous Data Streams held in conjunction with the 18th European Conference on Machine Learning (ECML) and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) in Warsaw, Poland, in September 2007.
- ACM Symposium on Applied Computing Data Streams Track held in conjunction with the 2007 ACM Symposium on Applied Computing (SAC-2007) in Seoul, Korea, in March 2007.
- IEEE International Workshop on Mining Evolving and Streaming Data (IWMESD 2006) held in conjunction with the 2006 IEEE International Conference on Data Mining (ICDM-2006) in Hong Kong in December 2006.
- Fourth International Workshop on Knowledge Discovery from Data Streams (IWKDDS) held in conjunction with the 17th European Conference on Machine Learning (ECML) and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) (ECML/PKDD-2006) in Berlin, Germany, in September 2006.
Master References
- Mohamed Medhat Gaber, Arkady Zaslavsky, and Shonali Krishnaswamy, "Mining Data Streams: A Review", ACM SIGMOD Record, Vol. 34, No. 2, June 2005, pp. 18–26.
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom, "Models and Issues in Data Stream Systems", in Proc. 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2002), Madison, Wisconsin, USA, June 2002.
- Supervised Classification on Data Streams - "A Survey on Supervised Classification on Data Streams"
- Mining Data Streams Bibliography
Bibliographic References
- Amal and Salma: “Novelty Detection In Data Stream Clustering Using The Artificial Immune System”, Proceedings of the 13th European Mediterranean & Middle Eastern Conference on Information Systems (EMCIS), Krakow, Poland, June, 2016.
- Rutkowski, Jaworski, Pietruczuk and Duda: "A New Method for Data Stream Mining Based on the Misclassification Error", IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 5, pp. 1048-1059, 2015.
- Shaker, Ammar and Lughofer, Edwin. "Self-Adaptive and Local Strategies for a Smooth Treatment of Drifts in Data Streams.", Evolving Systems, 5:(4), p. 239-257, 2014.
- Rutkowski, Pietruczuk, Duda, and Jaworski: "Decision Trees for Mining Data Streams Based on the McDiarmid's Bound", IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1272-1279, 2013.
- Minku and Yao. "DDD: A New Ensemble Approach For Dealing With Concept Drift.", IEEE Transactions on Knowledge and Data Engineering, 24:(4), p. 619-633, 2012.
- Lughofer, Edwin and Angelov, Plamen. "Handling Drifts and Shifts in On-line Data Streams with Evolving Fuzzy Systems.", Applied Soft Computing, 11:(2), p. 2057-2068, 2011.
- Hahsler, Michael and Dunham, Margaret H. Temporal structure learning for clustering massive data streams in real-time. In SIAM Conference on Data Mining (SDM11), pages 664-675. SIAM, April 2011.
- Minku, White and Yao. "The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift.", IEEE Transactions on Knowledge and Data Engineering, 22:(5), p. 730-742, 2010.
- Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, Bhavani M. Thuraisingham: Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams. ECML/PKDD (2) 2009: 79-94 (extended version will appear in TKDE journal).
- Scholz, Martin and Klinkenberg, Ralf: Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3–28, March 2007.
- Nasraoui O., Cerwinske J., Rojas C., and Gonzalez F., "Collaborative Filtering in Dynamic Usage Environments", in Proc. of CIKM 2006 – Conference on Information and Knowledge Management, Arlington VA, Nov. 2006
- Nasraoui O., Rojas C., and Cardona C., “A Framework for Mining Evolving Trends in Web Data Streams using Dynamic Learning and Retrospective Validation”, Journal of Computer Networks - Special Issue on Web Dynamics, 50(10), 1425-1652, July 2006
- Scholz, Martin and Klinkenberg, Ralf: An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53–64, Porto, Portugal, 2005.
- Klinkenberg, Ralf: Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, pages 281–300, 2004.
- Klinkenberg, Ralf: Using Labeled and Unlabeled Data to Learn Drifting Concepts. In Kubat, Miroslav and Morik, Katharina (editors), Workshop notes of the IJCAI-01 Workshop on Learning from Temporal and Spatial Data, pages 16–24, IJCAI, Menlo Park, CA, USA, AAAI Press, 2001.
- Maloof M. and Michalski R. Selecting examples for partial memory learning. Machine Learning, 41(11), 2000, pp. 27–52.
- Koychev I. Gradual Forgetting for Adaptation to Concept Drift. In Proceedings of ECAI 2000 Workshop Current Issues in Spatio-Temporal Reasoning. Berlin, Germany, 2000, pp. 101–106
- Klinkenberg, Ralf and Joachims, Thorsten: Detecting Concept Drift with Support Vector Machines. In Langley, Pat (editor), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487–494, San Francisco, CA, USA, Morgan Kaufmann, 2000.
- Koychev I. and Schwab I., Adaptation to Drifting User’s Interests, Proc. of ECML 2000 Workshop: Machine Learning in New Information Age, Barcelona, Spain, 2000, pp. 39–45
- Schwab I., Pohl W. and Koychev I. Learning to Recommend from Positive Evidence, Proceedings of Intelligent User Interfaces 2000, ACM Press, pp. 241–247.
- Klinkenberg, Ralf and Renz, Ingrid: Adaptive Information Filtering: Learning in the Presence of Concept Drifts. In Sahami, Mehran and Craven, Mark and Joachims, Thorsten and McCallum, Andrew (editors), Workshop Notes of the ICML/AAAI-98 Workshop Learning for Text Categorization, pages 33–40, Menlo Park, CA, USA, AAAI Press, 1998.
- Crabtree I. and Soltysiak S. Identifying and Tracking Changing Interests. International Journal of Digital Libraries, Springer Verlag, vol. 2, pp. 38–53.
- Widmer G. Tracking Context Changes through Meta-Learning, Machine Learning 27, 1997, pp. 256–286.
- Maloof, M.A. and Michalski, R.S. Learning Evolving Concepts Using Partial Memory Approach. Working Notes of the 1995 AAAI Fall Symposium on Active Learning, Boston, MA, pp. 70–73, 1995
- Mitchell T., Caruana R., Freitag D., McDermott, J. and Zabowski D. Experience with a Learning Personal Assistant. Communications of the ACM 37(7), 1994, pp. 81–91.
- Widmer G. and Kubat M. Learning in the presence of concept drift and hidden contexts. Machine Learning 23, 1996, pp. 69–101.
- Schlimmer J., and Granger R. Incremental Learning from Noisy Data, Machine Learning, 1(3), 1986, 317-357.
Books
- Gama, João; Gaber, Mohamed Medhat, eds. (2007). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer. p. 244. doi:10.1007/3-540-73679-4. ISBN 9783540736783.
- Ganguly, Auroop R.; Gama, João; Omitaomu, Olufemi A.; Gaber, Mohamed M.; Vatsavai, Ranga R., eds. (2008). Knowledge Discovery from Sensor Data. Industrial Innovation. CRC Press. p. 215. ISBN 9781420082326.
- Gama, João (2010). Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman and Hall. p. 255. ISBN 9781439826119.
- Lughofer, Edwin (2011). Evolving Fuzzy Systems - Methodologies, Advanced Concepts and Applications. Studies in Fuzziness and Soft Computing. 266. Heidelberg: Springer. p. 456. doi:10.1007/978-3-642-18087-3. ISBN 9783642180866.
- Sayed-Mouchaweh, Moamar; Lughofer, Edwin, eds. (2012). Learning in Non-Stationary Environments: Methods and Applications. New York: Springer. p. 440. doi:10.1007/978-1-4419-8020-5. ISBN 9781441980199.
See also
- Concept drift
- Data mining
- Sequence mining
- Streaming algorithm
- Stream processing
- Wireless sensor network
- Lambda architecture
External links
- High-Velocity Data - The Data Firehose
- IBM Spade - Stream Processing Application Declarative Engine
- IBM Infosphere Streams
- StreamIt - programming language and compilation infrastructure by MIT CSAIL
- Decision Trees for stream data mining - new results