These maps are interactive and browsable, and they are the result of a complex data-processing pipeline in which terabytes to petabytes of raw sensor and image data are transformed into databases of automatically detected and annotated objects and information. This type of pipeline involves many steps in which human decisions and insight are critical, such as instrument calibration, removal of outliers, and classification of pixels.

As illustrated in these examples, data science is a complex process, driven by the character of the data being analyzed and by the questions being asked, and it is often highly exploratory and iterative in nature. Domain context can play a key role in these exploratory steps, even in relatively well-defined processes such as predictive modeling (for example, as characterized by CRISP-DM [5]), where human expertise in defining relevant predictor variables can be critical.

The breadth and complexity of these and many other data science scenarios mean the modern data scientist requires broad knowledge and experience across a multitude of topics. Together with an increasing demand for data analysis skills, this has led to a shortage of trained data scientists with appropriate background and experience, and to significant market competition for limited expertise. Considering this bottleneck, it is not surprising there is increasing interest in automating parts, if not all, of the data science process. This desire and potential for automation is the focus of this article.

Figure 1 provides a conceptual framework to guide our discussion of automation in data science, including aspects that are already being automated as well as aspects that are potentially ready for automation. The vertical dimension of the figure reflects the degree to which domain context plays a role in the process.