Challenges in Machine Learning

Geoscience and Machine Learning – EAGE 2017 Workshop

Is it data science? Machine learning? Big Data? Deep learning? Fancy math? Or just chalked up statistics with enough data?

This Monday marked the start of EAGE 2017 in Paris for me. If you have read this blog or the EAGE Student Newsletter before, you may have seen that I am something of a proponent of machine learning. It was therefore natural for me to go to the workshop named “Geoscience and Data Sciences”, which Matt Hall from Agile* usually called “the machine learning workshop”.

There were two keynotes, several good talks and two poster sessions, which sparked some good discussion. Although the EAGE has eased its PowerPoint-only constraint a little, so that it was possible to show some websites in Chrome, the relaxation still seemed fairly limited. Having listened closely, here are my main takeaways:

The hackathon from the weekend before was mentioned several times. It seems to be catching on with the crowd, although an ML hackathon and an ML workshop may naturally have some overlap. I will blog about the amazing hackathon later.

Personally, I was amazed that Shell let on that they are using Scikit-learn, which is a popular machine learning library in Python. Total even hinted at contributing to open source projects. Every talk was clear, however, that open data will not be happening. Possibly ever.

Teradata talked about data lakes. The talk made clear that something has to happen in companies to change their relationship with data. SEG-Y and LAS are great formats for transferring data but sub-par to terrible to work on. More sophisticated formats like HDF5 are a good way to take your seismic data in a better direction; Globe Claritas is using this format, which was developed for storing large scientific datasets.
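To make the point about transfer formats a bit more concrete, here is a minimal sketch of what it takes just to get well-log curves out of a LAS-style text file and into named columns. The snippet, its values and the `parse_las` helper are all made up for illustration, and the null marker is an assumption (real files declare it in their well section). It is not how any of the companies mentioned handle their data.

```python
# Hypothetical, made-up LAS-2.0-style snippet; values are illustrative only.
SAMPLE_LAS = """\
~Version
VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.   NO  : ONE LINE PER DEPTH STEP
~Curve
DEPT.M      : Depth
GR  .GAPI   : Gamma ray
RHOB.G/C3   : Bulk density
~ASCII
1000.0  85.2  2.45
1000.5  90.1  2.47
1001.0  -999.25  2.50
"""

# Assumed null marker for this sketch; real files declare it in the ~Well section.
NULL_VALUE = -999.25


def parse_las(text, null=NULL_VALUE):
    """Parse curve mnemonics and data rows into a dict of column lists."""
    section = None
    names = []
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("~"):
            section = line[1].upper()  # section is identified by its first letter
            continue
        if section == "C":
            # Curve lines look like "MNEM.UNIT : description"
            names.append(line.split(".", 1)[0].strip())
        elif section == "A":
            rows.append([float(v) for v in line.split()])
    columns = {name: [] for name in names}
    for row in rows:
        for name, value in zip(names, row):
            # Replace the null marker with None so it cannot leak into statistics
            columns[name].append(None if value == null else value)
    return columns


logs = parse_las(SAMPLE_LAS)
print(logs["GR"])
```

Every analysis script that starts from the text file has to repeat this kind of parsing and null handling. A self-describing binary container such as HDF5 stores the named arrays directly, which is the appeal for day-to-day work.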

One very interesting talk was about unstructured data. I had heard parts of the talk at the hackathon before, where they won the award for “Originality”. Unstructured data is somewhat feared in data science. It’s much nicer when you can query a database and get your information. Having to sift through badly scanned PDF files is a whole other dimension of complexity. However, the presenter showed some great results using object recognition and segmentation from computer vision to analyze and classify legacy documents with ML. Interestingly, they even built QC directly into the workflow.

My personal favorite was Matt Hall’s keynote. It talked about actual machine learning and how our industry has to adapt to the fast pace of open source development in the deep learning community. He freshened it up using menti.com for audience participation and asked some tough questions.

Challenges in Machine Learning – Matt Hall CC-By 4.0

It’s obvious that data is an issue for machine learning. However, people seem to agree that a cultural and legal shift will also be necessary to accelerate development in our community.

Generally, I would love to see some deep learning at the next workshop and more bleeding-edge research on this. Copenhagen is the next chance for this. I hope I can put my writing where my mouth is and maybe you can join me!

... is a geophysicist by heart. He works at the intersection of machine learning and geoscience. He is the founder of The Way of the Geophysicist and a deep learning enthusiast. Writing mostly about computational geoscience and interesting bits and pieces relevant to post-grad life.

Latest posts by Jesper Dramsch (see all)

Posted in Earth Science, Science.