Women in Data Science (WiDS) Conference

Author: Kayyyy | Published: 2017-03-08 12:17

Opening

Professor Alice Wong
Associate Dean of Science, HKU

  • WiDS 2017 is a collaboration among Stanford University, SAP, Google, Microsoft and Walmart Labs.
  • 50th anniversary of the HKU Department of Statistics and Actuarial Science
  • the big data research cluster @HKU

Talk 1: Women in Data Science

Speaker:
Anita Varshney
Global Strategy Transformation Lead, SAP Hong Kong

  1. WiDS
  • held by Stanford every February (March in Asia)
  • keynote speakers from various industries that are doing data science now
  • the largest attendance is actually in the Middle East
  2. SAP
  • the world's largest provider of enterprise application software
  • HQ in Germany; founded in 1972
  • career suggestion: look for a good mentor
  • present in 26 industries
  • real-time processes, prediction and simulation, great user experience, agility, and TCO (total cost of ownership)
  3. SAP next-gen
  • Provides a platform for college students to present their ideas directly to business customers.
  • Technologies
    - Machine learning
    - IoT

*Amazing time management throughout the presentation

Talk 2: Big Data Decision Analysis

"Big data is something that breaks Microsoft Excel" (lol)

Research project - Machine Learning for Chinese Suicide Newspaper Articles Classification

Analyze how the media report suicide incidents, in order to figure out how to prevent suicide.

  • WiseNews database: over 220K search results for the keyword "suicide", containing 84 million terms
  • Big data challenges
    • Noisy dataset: e.g. "suicide car bombing attack"
    • Data classification
  • Supervised Machine Learning (use labeled articles to train)
  • Web interface for manual labeling
  • Article features extraction for ML
    • Text Segmentation: Sentence -> Words -> N-grams
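      • Tool: Jieba (结巴), with functionality such as MP (maximum probability) and HMM (Hidden Markov Model) segmentation
        • State transition matrix over character tags (B = beginning, M = middle, E = end of a word): P(M|B) >> P(E|B)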
    • Document Representation
      • Word to Document Matrix (not very efficient)
      • Chosen approach - Word Embedding (Word2Vec)
        • each word is represented by a vector with a fixed number of dimensions (usually 30-500d)
        • Neural network (CBOW or Skip-Gram model): used to learn the values along each dimension of the vectors
        • Cosine similarity
  • Classification (Training)
    • labeled dataset: 70% for training and 30% for testing
    • P(Suicide = Yes): 85.9% accuracy; other targets include P(Student = No), P(HK = Yes), ...
  • Future work
    • Identify any pattern of misclassification
    • Increase dimensions of the word vectors
    • Deep learning approach for other NLP tasks with this dataset
      • Predict the method used for suicide
      • Predict the reasons for suicide
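
To make the feature extraction steps above more concrete, here is a minimal sketch in plain Python. It is not the speaker's code: the toy documents are made up and already segmented into words (standing in for the output of a segmenter such as Jieba), and the function names are my own. It shows n-gram extraction, a word-to-document count matrix, cosine similarity between document vectors, and a 70/30 train/test split. Word2Vec itself is not implemented here; cosine similarity is simply applied to the count vectors.

```python
import math
import random
from collections import Counter


def ngrams(words, n):
    """Return all n-grams (as tuples) from a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def word_document_matrix(docs):
    """Build a word-to-document count matrix.

    Returns (vocab, rows), where rows[i][j] is how often vocab[j]
    appears in docs[i].
    """
    vocab = sorted({w for doc in docs for w in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy, made-up documents (hypothetical, not from the WiseNews data).
docs = [
    ["police", "report", "suicide", "case"],
    ["suicide", "case", "report", "student"],
    ["car", "bombing", "attack", "report"],
]

print(ngrams(docs[0], 2))
vocab, matrix = word_document_matrix(docs)
print(cosine_similarity(matrix[0], matrix[1]))  # 0.75: three shared words
print(cosine_similarity(matrix[0], matrix[2]))  # 0.25: one shared word
train, test = train_test_split(range(10))
print(len(train), len(test))  # 7 3
```

The count matrix grows with the vocabulary size, which is one reason the notes call it "not very efficient" compared with fixed-size word embeddings.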

Talk 3: Predictive Analytics

Vanessa Ko
Head of Presales, SAP Hong Kong

  1. SAP HK
  • Customers: I.T., Cathay Pacific, Pizza Hut, etc.
  • Biggest competitor: none overall, only in some sub-areas.
  2. Predictive Analytics
  • How to make use of digitized historical data
  • Case: Obama for America 2012
    • Data source: Historical voting data, Census, Volunteer collected data, Facebook, etc;
    • Voter segmentation, fundraising prediction, who's persuadable?
    • Data Modeling: VOTING RATE MODEL, SUPPORT RATE MODEL, Persuasive Rating, Overall score;
    • Goal: target voters, donors and volunteers -> especially swing voters (not too supportive or too opposing)
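
As a rough illustration of how per-voter model outputs could be combined to pick out swing voters, here is a small Python sketch. Everything in it is an assumption for illustration only: the talk did not give formulas, so the `Voter` fields, the 0.35 to 0.65 "swing" band, and the product used as an overall score are all hypothetical, and the voter data is made up.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    name: str
    turnout: float         # hypothetical modelled probability of voting (0..1)
    support: float         # hypothetical modelled probability of support (0..1)
    persuadability: float  # hypothetical persuasion rating (0..1)


def is_swing(voter, low=0.35, high=0.65):
    """Swing voter: neither too supportive nor too opposed (assumed band)."""
    return low <= voter.support <= high


def overall_score(voter):
    """Assumed scoring rule: likely to vote AND likely to be persuaded."""
    return voter.turnout * voter.persuadability


def target_swing_voters(voters, top_n=2):
    """Return the top_n swing voters ranked by the assumed overall score."""
    swing = [v for v in voters if is_swing(v)]
    return sorted(swing, key=overall_score, reverse=True)[:top_n]


# Made-up example voters.
voters = [
    Voter("A", turnout=0.9, support=0.95, persuadability=0.1),  # strong supporter
    Voter("B", turnout=0.8, support=0.50, persuadability=0.7),
    Voter("C", turnout=0.6, support=0.40, persuadability=0.9),
    Voter("D", turnout=0.7, support=0.05, persuadability=0.2),  # strong opponent
    Voter("E", turnout=0.3, support=0.55, persuadability=0.4),
]

for v in target_swing_voters(voters):
    print(v.name, round(overall_score(v), 2))
```

In this toy data, A and D are filtered out for being too supportive or too opposed, and the remaining voters are ranked by the assumed score.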
