David Talby
Chief Technology Officer
Architecting a predictive, petabyte-scale, self-learning fraud detection system

Fraud detection is a classic adversarial analytics challenge: as soon as an automated system learns to stop one scheme, fraudsters move on to attack in another way. Each scheme requires looking for different signals (i.e., features); each is relatively rare (one in millions in finance or e-commerce); and a single case may take months to investigate (in healthcare or tax, for example). All of this makes quality training data scarce.

This talk covers key lessons learned while building such real-world software systems over the past few years. We’ll look for fraud signals in public email datasets, using popular Python-based open-source data science libraries to generate graph-based, rule-based, language-based, and time-series-based features, tied together with ensemble learning algorithms.

Apache Spark is used to run these models at scale: in batch mode for model training and with Spark Streaming for production use. We’ll discuss the data model, computation, and feedback workflows, as well as some tools and libraries built on top of the open-source components to enable faster experimentation, optimization, and productization.

David Talby is Atigeo’s chief technology officer, leading the development of its big data analytics platform. David has extensive experience in building and operating web-scale analytics and business platforms, as well as building world-class, agile, distributed teams. Previously he was with Microsoft’s Bing group, where he led business operations for Bing Shopping in the US and Europe, and earlier he worked at Amazon in both Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science along with two master’s degrees, in computer science and business administration.

