Led by Manning, Stanford open-sources Stanza, a Python NLP library covering 66 languages

Most readers will know the Stanford NLP library, but it is mainly Java-based. Recently, the Stanford NLP Group, home to Christopher Manning, open-sourced a Python toolkit called Stanza, adding another heavyweight NLP tool to the Python ecosystem.

The Stanford NLP Group's open-source software is a code base containing a wide range of NLP tools. The group has now released a Python version named Stanza. The library ships models for more than 60 languages and handles NLP tasks such as named entity recognition. The release quickly drew attention in the community, and Fei-Fei Li liked the project on Twitter.

The project can currently be installed directly with pip.

Project page: https://github.com/stanfordnlp/stanza

ÏÖÓÐÄ£ÐͺÍÖ§³ÖµÄ NLP ÈÎÎñ

Stanza °üº¬ÁË 60 ¶àÖÖÓïÑÔÄ£ÐÍ£¬ÔÚ Universal Dependencies v2.5 Êý¾Ý¼¯ÉϽøÐÐÁËԤѵÁ·¡£ÕâЩģÐÍ°üÀ¨¼òÌå¡¢·±Ìå¡¢¹ÅÎÄÖÐÎÄ£¬Ó¢Óï¡¢·¨Óï¡¢Î÷°àÑÀÓï¡¢µÂÓï¡¢ÈÕÓï¡¢º«Óï¡¢°¢À­²®ÓïµÈ£¬ÉõÖÁ»¹Óб±ÈøÃ×ÓïµÈ²»Ì«³£¼ûµÄÓïÑÔ¡£

³ýÁËÓïÑÔÄ£ÐÍÍ⣬Stanza »¹Ö§³ÖÁËÊýÊ®ÖÖÓïÑÔµÄÃôÃôʵÌåʶ±ðÄ£ÐÍ¡£ÍêÕûÁбíÈçÏ£º

¾Ý Stanza µÄÂÛÎĽéÉÜ£¬Stanza º­¸ÇÁ˶à¸ö×ÔÈ»ÓïÑÔ´¦ÀíÈÎÎñ£¬Èç·Ö´Ê¡¢´ÊÐÔ±ê×¢¡¢ÒÀ´æ¾ä·¨·ÖÎö¡¢ÃüÃûʵÌåʶ±ðµÈ¡£´ËÍ⣬Ëü»¹ÌṩÁË Pyhton ½çÃ棬ÓÃÀ´ºÍÎÒÃÇÊìϤµÄ Stanford CoreNLP ¿â½øÐн»»¥£¬´Ó¶øÀ©Õ¹ÁËÒÑÓеŦÄÜ¡£

ÁíÍâÖµµÃ×¢ÒâµÄÊÇ£¬Stanza ÊÇÍêÈ«»ùÓÚÉñ¾­ÍøÂç pipeline µÄ¡£Ñо¿ÕßÔÚ 112 ¸öÊý¾Ý¼¯ÉϽøÐÐÁËԤѵÁ·£¬µ«Ê¹ÓõÄÊÇͬһ¸öÄ£Ðͼܹ¹¡£ËûÃÇ·¢ÏÖ£¬Í¬ÑùÒ»¸öÉñ¾­ÍøÂç¼Ü¹¹¿ÉÒÔ·º»¯µÃºÜºÃ¡£ÍøÂçÔÚËùÓÐÓïÑÔÉϵÄÐÔÄܶ¼ºÜºÃ¡£Õû¸öÉñ¾­ÍøÂç pipeline ¶¼ÊÇͨ¹ý PyTorch ʵÏֵġ£

Running Stanza

Getting started with the neural pipeline

To run your first Stanza pipeline, enter the following in a Python interpreter:

    >>> import stanza
    >>> stanza.download('en')  # This downloads the English models for the neural pipeline
    # IMPORTANT: The above line prompts you before downloading, which doesn't work well in a Jupyter notebook.
    # To avoid a prompt when using notebooks, instead use: >>> stanza.download('en', force=True)
    >>> nlp = stanza.Pipeline()  # This sets up a default neural pipeline in English
    >>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
    >>> doc.sentences[0].print_dependencies()

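The last command prints the words in the first sentence of the input string (or Document, as it is represented in Stanza). For each word it also prints the index of the word that governs it in the sentence's Universal Dependencies parse (its "head") and the dependency relation between the two. The output is: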

    ('Barack', '4', 'nsubj:pass')
    ('Obama', '1', 'flat')
    ('was', '4', 'aux:pass')
    ('born', '0', 'root')
    ('in', '6', 'case')
    ('Hawaii', '4', 'obl')
    ('.', '4', 'punct')
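The head indices are easier to read once they are resolved to the words they point at. The sketch below is plain Python over the output shown above and does not call Stanza. `resolve_heads` is a hypothetical helper name, and the code assumes that heads are 1-based positions within the sentence, with 0 marking the root.

```python
# Output of print_dependencies() copied from the article: (word, head index, relation).
TRIPLES = [
    ("Barack", "4", "nsubj:pass"),
    ("Obama", "1", "flat"),
    ("was", "4", "aux:pass"),
    ("born", "0", "root"),
    ("in", "6", "case"),
    ("Hawaii", "4", "obl"),
    (".", "4", "punct"),
]


def resolve_heads(triples):
    """Replace each 1-based head index with the head word; index 0 means the root."""
    words = [word for word, _, _ in triples]
    resolved = []
    for word, head, relation in triples:
        index = int(head)
        head_word = "ROOT" if index == 0 else words[index - 1]
        resolved.append((word, relation, head_word))
    return resolved


for word, relation, head_word in resolve_heads(TRIPLES):
    print(f"{word} --{relation}--> {head_word}")
```

Read this way, "Barack" is the passive subject of "born", and "born" is the root of the sentence.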

See the getting-started guide for more details.
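Beyond `print_dependencies()`, the annotations for the tasks listed earlier can be read directly from the document object. The following is a minimal sketch, assuming the English models have already been downloaded. `word_rows`, `entity_rows`, and `run_demo` are hypothetical helper names, and the processor list is one plausible choice rather than a recommended configuration.

```python
def word_rows(doc):
    """Collect (text, POS tag, lemma, head index, relation) for every word."""
    return [
        (word.text, word.upos, word.lemma, word.head, word.deprel)
        for sentence in doc.sentences
        for word in sentence.words
    ]


def entity_rows(doc):
    """Collect (text, type) for every named entity found in the document."""
    return [(entity.text, entity.type) for entity in doc.ents]


def run_demo(text="Barack Obama was born in Hawaii."):
    """Annotate one string; needs `pip install stanza` and downloaded English models."""
    import stanza  # imported here so the helpers above work without Stanza installed

    nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")
    doc = nlp(text)
    return word_rows(doc), entity_rows(doc)
```

Calling `run_demo()` returns two lists, one row per word and one row per entity. Keeping the extraction helpers separate from the pipeline call makes them easy to reuse on documents produced elsewhere.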

Accessing the Java Stanford CoreNLP software

Besides the neural pipeline, the package includes an official wrapper for accessing the Java Stanford CoreNLP software from Python code.

Initial setup:

  • Download Stanford CoreNLP and the models for the language you want to use;
  • Put the model files in the distribution folder;
  • Tell the Python code where Stanford CoreNLP is located by setting the CORENLP_HOME environment variable (for example, on *nix): export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05

The documentation contains comprehensive examples showing how to use CoreNLP through Stanza and extract annotations from it.
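As a rough illustration of the setup steps, the sketch below checks `CORENLP_HOME` before starting a CoreNLP server through `stanza.server.CoreNLPClient`. It assumes Java and a CoreNLP distribution are installed. `corenlp_home` and `annotate_with_corenlp` are hypothetical helper names, and the annotator list and timeout are illustrative values only.

```python
import os


def corenlp_home(environ=os.environ):
    """Return the CoreNLP directory from CORENLP_HOME, or None if it is not set."""
    return environ.get("CORENLP_HOME") or None


def annotate_with_corenlp(text):
    """Start a CoreNLP server from Python and annotate one string."""
    if corenlp_home() is None:
        raise RuntimeError("Set CORENLP_HOME to the Stanford CoreNLP directory first")
    from stanza.server import CoreNLPClient  # needs Stanza and Java installed

    # The context manager starts the Java server and shuts it down on exit.
    with CoreNLPClient(annotators=["tokenize", "ssplit", "pos"], timeout=30000) as client:
        return client.annotate(text)
```

Checking the environment variable first turns a missing setup step into a clear Python error instead of a failed server launch.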

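Trained models for the neural pipeline

Models are currently provided for all Universal Dependencies v2.5 treebanks, along with NER models for several widely used languages.

Batching to maximize pipeline speed

To get the best speed, the pipeline should be run on batches of documents. Running a for loop over one sentence at a time is very slow. The current workaround is to concatenate the documents, separating each from the next with a blank line (that is, two line breaks, \n\n). The tokenizer treats blank lines as sentence breaks.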
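The concatenation workaround can be sketched in a few lines. This assumes the documents are plain strings and that the English models are already downloaded. `join_documents` and `run_batched` are hypothetical helper names, not part of Stanza.

```python
def join_documents(documents):
    """Join documents with a blank line (two newline characters) between them."""
    return "\n\n".join(doc.strip() for doc in documents)


def run_batched(documents):
    """Run one pipeline call over all documents instead of one call per document."""
    import stanza  # imported here so join_documents works without Stanza installed

    nlp = stanza.Pipeline("en", processors="tokenize")
    return nlp(join_documents(documents))


documents = ["Barack Obama was born in Hawaii.", "He was elected president in 2008."]
print(repr(join_documents(documents)))
```

The resulting single `Document` holds the sentences of all inputs in order. Mapping sentences back to their source documents is left out of this sketch.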

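Training your own neural pipelines

All neural modules in the library can be trained on your own data, including the tokenizer, the multi-word token (MWT) expander, the POS/morphological features tagger, and so on. Training through the Pipeline interface is not currently supported, so you need to clone the git repository and run training from source.

The following shows how to train a neural pipeline. The project provides several bash scripts in the scripts directory to simplify training. To train a model, run: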

    bash scripts/run_${module}.sh ${corpus} ${other_args}

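Here ${module} is one of tokenize, mwt, pos, lemma, or depparse; ${corpus} is the full name of the corpus; and ${other_args} are any other arguments the training script accepts.

For example, the following command trains a tokenizer on the UD_English-EWT corpus with a batch size of 32 and a dropout rate of 0.33: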

    bash scripts/run_tokenize.sh UD_English-EWT --batch_size 32 --dropout 0.33

Note that for the dependency parser you also need to specify gold|predicted for the type of POS tags used in the training/development data:

    bash scripts/run_depparse.sh UD_English-EWT gold

If predicted is used, the trained tagger model is first run on the training/development data to generate the predicted tags.

By default, model files are saved to the saved_models directory during training (this can be changed with the save_dir argument).
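When training is driven from Python, the command pattern above can be assembled and validated before it is run. The following is a small sketch using only the script names and arguments quoted in this article. `build_train_command` is a hypothetical helper, not something the repository provides.

```python
# Module names accepted by the scripts/run_${module}.sh pattern described above.
MODULES = ("tokenize", "mwt", "pos", "lemma", "depparse")


def build_train_command(module, corpus, other_args=()):
    """Build the argument list for bash scripts/run_<module>.sh <corpus> <other_args>."""
    if module not in MODULES:
        raise ValueError(f"unknown module {module!r}; expected one of {MODULES}")
    return ["bash", f"scripts/run_{module}.sh", corpus, *other_args]


tokenizer_cmd = build_train_command(
    "tokenize", "UD_English-EWT", ["--batch_size", "32", "--dropout", "0.33"]
)
parser_cmd = build_train_command("depparse", "UD_English-EWT", ["gold"])
print(" ".join(tokenizer_cmd))
print(" ".join(parser_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the root of a cloned repository. Building a list rather than a shell string avoids quoting problems in corpus names and arguments.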

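Architecture and comparison with existing libraries

The Stanza paper presents the architecture of the whole code base. It takes raw text as input and directly outputs structured results.

Figure: Architecture of Stanza's neural pipeline. Besides the neural pipeline, Stanza also has a Python client interface for interacting with the Java version of Stanford CoreNLP.

The paper also compares Stanza with existing NLP toolkits such as spaCy. According to that comparison, Stanza currently covers the most languages while reaching state-of-the-art (SOTA) performance and being built entirely on a neural network framework.

Figure: Comparison with existing NLP libraries.

Finally, the researchers compared Stanza's performance on NLP tasks against existing baselines and found that Stanza outperforms them in most cases.

Figure: Comparison with existing baseline results. Stanza achieves SOTA on multiple tasks across multiple languages.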

  • Original article: http://news.51cto.com/art/202003/613000.htm