Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach
Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods…
Authors: Masahito Ohue, Takuro Yamazaki, Tomohiro Ban, Yutaka Akiyama
Masahito Ohue 1,2,3,4*, Takuro Yamazaki 3, Tomohiro Ban 4, and Yutaka Akiyama 1,2,3,4*

1 Department of Computer Science, School of Computing, Tokyo Institute of Technology, Japan
2 Advanced Computational Drug Discovery Unit, Institute of Innovative Research, Tokyo Institute of Technology, Japan
3 Department of Computer Science, Faculty of Engineering, Tokyo Institute of Technology, Japan
4 Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Japan
* ohue@c.titech.ac.jp, akiyama@c.titech.ac.jp

Abstract. Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it has poor performance for prediction of new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining link indicator kernel (LIK) and chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time was only increased by a few percent.
Keywords: virtual screening; compound-protein interactions (CPIs); pairwise kernel; link mining; link indicator kernels (LIKs)

1 Introduction

Virtual screening (VS), in which drug candidate compounds are selected by a computational method, is one of the main processes in the early stages of drug discovery. There are three main approaches to VS: ligand-based VS (LBVS) [1] using known activity information for the target protein of the drug; structure-based VS (SBVS) [2] using structural information for the target protein of the drug; and chemogenomics-based VS (CGBVS) [3] based on known interaction information for multiple proteins and multiple compounds (also called drug-target interaction prediction). Neither LBVS nor CGBVS requires a protein tertiary structure, and both depend on statistical machine learning using known experimental activity data. However, CGBVS yields more robust prediction results by handling multiple types of proteins. CGBVS has been well studied in recent years [4][5][6][7][8][9], and a review of CGBVS was recently published by Ding et al. [3].

Fig. 1. Schematic diagram of information used in CGBVS. Information on interaction matrices and interaction profiles can be obtained from CPI data. Kernel matrices were obtained from the relationship between the compounds and the proteins.

In CGBVS, computations are mainly performed using a similarity matrix of proteins, a similarity matrix of compounds, and an interaction profile matrix composed of binary values with and without interactions (Fig. 1). The kernel method is often applied for prediction [5][6][7].
Conventionally, in CGBVS, an interaction profile matrix is used only as labeled training data; however, an increasing number of frameworks have recently been described that utilize interaction profile matrices as their main features. The Gaussian interaction profile (GIP) is one of these frameworks [5]. In addition to information regarding similarity matrices, GIP uses similarities between vectors when an interaction profile matrix is viewed as vertical and horizontal vectors. The GIP kernel functions of proteins and compounds are represented as follows (details are described in Section 2):

$$k_{\mathrm{GIP},c}(c, c') = \exp\left(-\gamma_c \lVert \mathbf{v}_c - \mathbf{v}_{c'} \rVert^2\right), \qquad k_{\mathrm{GIP},p}(p, p') = \exp\left(-\gamma_p \lVert \mathbf{v}_p - \mathbf{v}_{p'} \rVert^2\right). \tag{1}$$

[Fig. 1 panels: compound-protein interaction network; interaction matrix; interaction profile; compound kernel (similarity matrix); protein kernel (similarity matrix); compound kernel (GIP/LIK); protein kernel (GIP/LIK)]

The problem with GIP kernels is that a ‘0’ bit (interaction is unknown) is taken into account in the same way as a ‘1’ bit (interaction). Thus, $k_{\mathrm{GIP}}$ takes its maximum value (equivalent to that of two compounds with identical interaction partners) for two novel compounds whose interactions with all proteins are unknown. Since ‘0’ potentially includes both no interactions and unknown interactions in general CGBVS problems and benchmark datasets, information for ‘1’ should be considered more reliable. Link mining has emerged as a framework that mainly considers ‘1’ rather than ‘0’ for calculation of links within a network. Link mining is a framework applied to analyze networks such as social networks and the World Wide Web. Nodes and edges (links) of a network are used in the calculation.
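The degenerate case described above, in which two compounds with no known interactions receive the maximum GIP value, can be checked with a short sketch. This is only an illustration under stated assumptions: the toy profiles are hypothetical, the function name `gip_kernel` is ours, and the bandwidth is taken as the inverse of the mean squared profile norm, following the normalization given for the GIP kernel in Section 2.3.

```python
import numpy as np

def gip_kernel(profiles):
    """Return the GIP kernel matrix for the rows of a binary profile matrix."""
    V = np.asarray(profiles, dtype=float)
    # Assumed bandwidth: inverse of the mean squared norm of the profiles.
    gamma = 1.0 / np.mean(np.sum(V ** 2, axis=1))
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dist)

# Hypothetical compound profiles over four proteins (illustrative values only).
profiles = np.array([
    [1, 1, 0, 0],  # c1: known interactions with p1 and p2
    [1, 1, 0, 0],  # c2: same partners as c1
    [0, 0, 0, 0],  # c3: novel compound, nothing known
    [0, 0, 0, 0],  # c4: another novel compound
])

K = gip_kernel(profiles)
print("c1 vs c2:", K[0, 1])  # identical non-empty profiles
print("c3 vs c4:", K[2, 3])  # two empty profiles
print("c1 vs c3:", K[0, 2])  # non-empty vs empty profile
```

In this toy case the two empty profiles are scored exactly like the two compounds that share all known partners, which is the behavior that motivates indicators built only from the ‘1’ entries.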
If nodes of the network are proteins/compounds, and edges are drawn between interacting compound-protein pairs, analysis using the framework of link mining becomes possible. Some reports have also applied the method of link mining directly to the problem of CGBVS [8][9]. However, these methods do not use the framework of the kernel method.

Therefore, in this study, we propose to use the link indicator kernel (LIK), based on link indicators used in link mining, as the kernel with an interaction profile matrix such as that formed from GIP kernels. According to a review by Ding et al., the pairwise kernel method (PKM) [7] using a support vector machine (SVM) as a kernel learning scheme is superior in learning performance among CGBVS methods [3]. Thus, we integrated GIP and LIK kernels into the PKM and showed that LIK kernels could capture the effects of interaction profiles.

2 Materials and Methods

2.1 Preliminary

An overview of the compound-protein interaction prediction problem is shown in Figure 1. Similarities were defined between compounds and between proteins, with the Tanimoto coefficient of fingerprints (e.g., ECFP [10], SIMCOMP [11]) or Euclidean distance of physicochemical properties for compounds, and the Euclidean distance of k-mer amino acid sequence profiles or Smith-Waterman alignment scores [12] for proteins. The interaction $y(c, p)$ between a compound $c$ and a protein $p$ is defined as binary {0, 1}, where ‘1’ represents an interaction (e.g., $c$ is the active compound for the protein $p$) and ‘0’ represents no interaction (often including unknowns). For learning, the $n_c \times n_p$ matrix $Y = \{y(c, p)\}_{c,p}$ (called the interaction matrix) was used as training data, where $n_c$ is the number of target compounds and $n_p$ is the number of target proteins.
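As an illustration of the interaction matrix defined above, the following minimal sketch builds $Y$ from a list of known interacting pairs. The compound and protein identifiers, the pair list, and the helper name `build_interaction_matrix` are hypothetical and serve only as an example; every pair not listed is stored as ‘0’, which, as noted above, covers both non-interactions and unknowns.

```python
import numpy as np

def build_interaction_matrix(compounds, proteins, interacting_pairs):
    """Build the n_c x n_p binary matrix Y with y(c, p) = 1 for listed pairs."""
    c_index = {c: i for i, c in enumerate(compounds)}
    p_index = {p: j for j, p in enumerate(proteins)}
    Y = np.zeros((len(compounds), len(proteins)), dtype=int)
    for c, p in interacting_pairs:
        Y[c_index[c], p_index[p]] = 1
    return Y

# Hypothetical identifiers and interactions (illustrative values only).
compounds = ["c1", "c2", "c3"]
proteins = ["p1", "p2", "p3", "p4"]
interacting_pairs = [("c1", "p1"), ("c1", "p3"), ("c2", "p3")]

Y = build_interaction_matrix(compounds, proteins, interacting_pairs)
print(Y)
print("row for c1:   ", Y[0])     # interactions of c1 with every protein
print("column for p3:", Y[:, 2])  # interactions of p3 with every compound
```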
Interactions were predicted for pairs of compounds and proteins using the learned model. When looking at each row and each column of the interaction matrix as a vector, the vector was called the interaction profile. The interaction profile $\mathbf{v}_c$ of compound $c$ was $\mathbf{v}_c = (y(c, p_1), y(c, p_2), \ldots, y(c, p_{n_p}))^{\mathsf{T}}$, and the interaction profile $\mathbf{v}_p$ of protein $p$ was $\mathbf{v}_p = (y(c_1, p), y(c_2, p), \ldots, y(c_{n_c}, p))^{\mathsf{T}}$.

2.2 PKM

The pairwise kernel method (PKM) [7] developed by Jacob et al. is based on pairwise kernels and tensor product representation for compound and protein vectors. Normally, a map $\Phi(c, p)$ for a pair of compounds and proteins $(c, p)$ is required for a learning scheme. In the PKM, the learning scheme involves the tensor product of the map of compound $\Phi_c(c)$ and the map of protein $\Phi_p(p)$. Therefore, $\Phi(c, p)$ is represented as follows:

$$\Phi(c, p) = \Phi_c(c) \otimes \Phi_p(p), \tag{2}$$

where $\otimes$ is the tensor product operator. Pairwise kernel $k$ is defined between two pairs of proteins and compounds $(c, p)$ and $(c', p')$ as follows:

$$\begin{aligned}
k((c, p), (c', p')) &\equiv \Phi(c, p)^{\mathsf{T}} \Phi(c', p') \\
&= (\Phi_c(c) \otimes \Phi_p(p))^{\mathsf{T}} (\Phi_c(c') \otimes \Phi_p(p')) \\
&= \Phi_c(c)^{\mathsf{T}} \Phi_c(c') \times \Phi_p(p)^{\mathsf{T}} \Phi_p(p') \\
&\equiv k_c(c, c') \times k_p(p, p'),
\end{aligned} \tag{3}$$

where $k_c$ is a compound kernel between two compounds, and $k_p$ is a protein kernel between two proteins. Thus, it is possible to find the kernel $k$ between two compound-protein pairs as the product of $k_c$ and $k_p$. If both $k_c$ and $k_p$ are positive definite kernels, $k$ is also a positive definite kernel. The similarity value mentioned in Section 2.1 is often used for $k_c$ and $k_p$ [4].

2.3 GIP

The Gaussian interaction profile (GIP) [5] was developed by van Laarhoven et al. to incorporate interaction profiles into kernel learning. The GIP kernel $k_{\mathrm{GIP}}$ based on the radial basis function shown in Eq. (1) is used for the compound kernel $k_c$ and protein kernel $k_p$ in Eq. (3). Here,

$$\gamma_c = \left(\frac{1}{n_c} \sum_{i=1}^{n_c} \lVert \mathbf{v}_{c_i} \rVert^2\right)^{-1}, \qquad \gamma_p = \left(\frac{1}{n_p} \sum_{i=1}^{n_p} \lVert \mathbf{v}_{p_i} \rVert^2\right)^{-1}. \tag{4}$$

Importantly, $k_{\mathrm{GIP}}$ is never used alone for $k_c$ and $k_p$, but is used as a multiple kernel (simple weighted combination) together with similarity-based kernels:

$$\begin{aligned}
k_c(c, c') &= k_{\mathrm{sim},c}(c, c') + w_{k,c}\, k_{\mathrm{GIP},c}(c, c'), \\
k_p(p, p') &= k_{\mathrm{sim},p}(p, p') + w_{k,p}\, k_{\mathrm{GIP},p}(p, p'),
\end{aligned} \tag{5}$$

where $w_{k,c}$ and $w_{k,p}$ are weight parameters for multiple kernels.

2.4 LIK

The link indicator is an index used for network structural analysis, such as analysis of the hyperlink structure of the World Wide Web and friend relationships in social network services. In this study, we proposed link indicator kernels (LIKs) based on the link indicators for compound-protein interaction networks to incorporate interaction profiles into kernel learning. We selected three link indicators:

Jaccard index

$$k_{\mathrm{LIK\text{-}Jac}}(c, c') = \frac{\mathbf{v}_c^{\mathsf{T}} \mathbf{v}_{c'}}{\lVert \mathbf{v}_c \rVert^2 + \lVert \mathbf{v}_{c'} \rVert^2 - \mathbf{v}_c^{\mathsf{T}} \mathbf{v}_{c'}} \tag{6}$$

Cosine similarity

$$k_{\mathrm{LIK\text{-}cos}}(c, c') = \frac{\mathbf{v}_c^{\mathsf{T}} \mathbf{v}_{c'}}{\lVert \mathbf{v}_c \rVert \, \lVert \mathbf{v}_{c'} \rVert} \tag{7}$$

LHN (Leicht-Holme-Newman index)

$$k_{\mathrm{LIK\text{-}LHN}}(c, c') = \frac{\mathbf{v}_c^{\mathsf{T}} \mathbf{v}_{c'}}{\lVert \mathbf{v}_c \rVert^2 \, \lVert \mathbf{v}_{c'} \rVert^2}. \tag{8}$$

These link indicators become positive definite kernels when used as kernels. Cosine similarity and LHN are positive definite kernels because of the property of kernel functions stated in footnote 1 and the positive definiteness of the inner product between the two vectors, and the Jaccard index was previously proven to be positive definite by Bouchard et al. [13]. There are other link indicators, such as the Adamic-Adar index and graph distance; however, because these are not positive definite kernels, as required for kernel methods, they were not used in this study.

For integration of LIK and PKM (similarity kernels), the same method applied for GIP was adopted.
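The three link indicators above can be computed directly from interaction profiles, as in the following sketch. It is not the implementation used in this study, which relied on NetworkX; it uses NumPy on the profile matrix instead. The function name `link_indicator_kernels` and the toy profiles are hypothetical, and setting 0/0 entries (pairs involving an empty profile) to 0 is an assumption made here for illustration.

```python
import numpy as np

def link_indicator_kernels(profiles):
    """Compute Jaccard, cosine, and LHN kernel matrices from binary profiles."""
    V = np.asarray(profiles, dtype=float)
    inner = V @ V.T              # v^T v': number of shared partners
    sq_norm = np.diag(inner)     # ||v||^2: number of partners of each node
    sq_i, sq_j = sq_norm[:, None], sq_norm[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        kernels = {
            "jaccard": inner / (sq_i + sq_j - inner),
            "cosine": inner / np.sqrt(sq_i * sq_j),
            "lhn": inner / (sq_i * sq_j),
        }
    # Assumption: 0/0 entries (empty profiles) are treated as zero similarity.
    return {name: np.nan_to_num(K) for name, K in kernels.items()}

# Hypothetical compound profiles over four proteins (illustrative values only).
profiles = np.array([
    [1, 1, 0, 0],  # c1: interacts with p1 and p2
    [1, 0, 0, 0],  # c2: interacts with p1 only
    [0, 0, 0, 0],  # c3: no known interactions
])

liks = link_indicator_kernels(profiles)
for name, K in liks.items():
    print(name, "c1 vs c2:", K[0, 1], "| c3 vs c3:", K[2, 2])
```

Under this convention a compound with no known interactions has zero similarity to every compound, including itself, whereas the GIP kernel assigns the maximum value to two empty profiles.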
That is, considering multiple kernels, the kernels were defined as:

$$\begin{aligned}
k_c(c, c') &= k_{\mathrm{sim},c}(c, c') + w_{k,c}\, k_{\mathrm{LIK},c}(c, c'), \\
k_p(p, p') &= k_{\mathrm{sim},p}(p, p') + w_{k,p}\, k_{\mathrm{LIK},p}(p, p'),
\end{aligned} \tag{9}$$

where $k_{\mathrm{LIK},c}$ and $k_{\mathrm{LIK},p}$ are LIKs for two compounds and two proteins, respectively.

2.5 Implementation

In this study, we used scikit-learn [14], a Python library for machine learning, to implement PKM, GIP, and LIK. As a kernel learning method, scikit-learn provides an SVM based on LIBSVM [15]. For the link indicator calculation of LIK, we used the Python library NetworkX [16].

1 Let $k: X \times X \to \mathbb{R}$ be a positive definite kernel and $f: X \to \mathbb{R}$ be an arbitrary function. Then, the kernel $k'(x, y) = f(x)\, k(x, y)\, f(y)$ $(x, y \in X)$ is also positive definite.

2.6 Dataset and Performance Evaluation

We used the benchmark dataset of CPI predictions published by Yamanishi et al. [4] according to the review of Ding et al. [3]. It is a well-known and widely used benchmark dataset in the field. The dataset consisted of four target protein groups (“Nuclear Receptor”, “GPCR”, “Ion Channel”, and “Enzyme”). The SIMCOMP score [11] was used for compound similarity, and the normalized Smith-Waterman score [12] was used for protein similarity, as calculated by Yamanishi et al. [4]. Information on the interaction matrix was also provided by Yamanishi et al. [4].

Evaluation was performed by cross-validation (CV). Three types of CVs were tested: test samples randomly selected from all compound-protein pairs (pairwise CV), randomly selected compounds (compound-wise CV), and randomly selected proteins (protein-wise CV). The outlines of these CVs are shown in Figure 2. According to Ding et al. [3], the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were calculated as the evaluation values of 10-fold CVs. Each accuracy value was averaged over five runs of 10-fold CVs with different random seeds. Note that the cost parameter $C$ of the SVM was optimized from {0.1, 1, 10, 100} in 3-fold CVs according to Ding et al. [3]. The multiple kernel weights $w_k$ of Eqs. (5) and (9) had the same values for proteins and compounds, and we evaluated {0.1, 0.3, 0.5, 1}.

Fig. 2. Conceptual diagram of three types of cross-validations (CVs): compound-wise CV, protein-wise CV, and pairwise CV. A case with a 3-fold CV is shown as an example. The “?” indicates samples to be used for the test set.

[Fig. 2 panels: compound-wise CV; protein-wise CV; pairwise CV. Each panel is a binary interaction matrix with axes labeled c1 to c6 and p1 to p6.]

3 Results and Discussion

3.1 Performance of the Proposed Method for Cross-Validation Benchmarking

Figure 3 shows the results for the prediction accuracy of the average values of three types of CVs in the four Yamanishi datasets (i.e., average values of 12 prediction accuracy values). We tested multiple kernel weights $w_k$ in four patterns, and LIK with cosine similarity was the most accurate for both AUPR and AUROC (AUPR: 0.562 and AUROC: 0.906). In the case of cosine similarity, the weight $w_k = 0.5$ showed the best performance. Overall, LIK showed higher prediction accuracy than GIP.

Fig. 3. Overall prediction accuracy for each CPI prediction method in 10-fold CV tests. The AUPR and AUROC values are averaged values of three types of CVs and four types of datasets (total average for 12 AUPR/AUROC values). For 10-fold CVs, calculations were performed five times with different random seeds, and the accuracy values were then averaged.

[Fig. 3 panels: AUROC and AUPR for the Jaccard index, cosine similarity, LHN, and GIP at w_k = 0.1, 0.3, 0.5, and 1]

Figure 4 shows the mean value of the prediction accuracy for each CV, including compound-wise, protein-wise, and pairwise CVs. Division of the dataset in each CV was randomly performed five times, and the values were averaged. The multiple kernel weight $w_k$ was set to the best value in the cross-validation results (shown in Figure 3). In the evaluation of AUROC, GIP showed accuracy comparable to that of the three LIKs; however, similar results were not observed for AUPR. In particular, we found that LIK showed a high value in the AUPR evaluation for compound-wise and protein-wise CVs, which evaluate predictive performance for novel compounds and proteins, respectively.

Fig. 4. Prediction accuracy of protein-wise, compound-wise, and pairwise CVs. In random prediction cases, an AUROC value of 0.5 and an AUPR value of 0.035 were obtained (averaged values depending on the ratio of positive samples in the dataset).

[Fig. 4 panels: AUPR and AUROC for compound-wise, protein-wise, and pairwise CVs; series: Jaccard index, cosine similarity, LHN, GIP, and random]

3.2 Observed Distribution of Link Indicator Frequency

Distributions of the values of four protein interaction profile similarities ($k_{\mathrm{LIK}}(p, p')$) were calculated using each link indicator to determine why LIKs with cosine similarity showed better results. The histograms of similarity values are shown in Figure 5. From this result, the Jaccard index and LHN were found to have relatively similar distributions of similarity values between 0 and 1 (i.e., the frequency of intermediate values was low). Additionally, the number of pairs whose similarities ranged from 0.95 to 1 was highest for cosine similarity. This may be related to the AUROC value of cosine similarity, which tended to be higher. Conversely, for LHN, which had the fewest pairs with similarities from 0.95 to 1, precision may be higher, and AUPR may tend to be higher. GIP also had few intermediate values, similarly to LHN. Overall, cosine similarity showed the best performance in this study. A gentle distribution that includes intermediate values may be more effective as information for the compound-protein network structure.

Fig. 5. Distribution of values of LIKs (Jaccard index, cosine similarity, and LHN) and GIP given all protein interaction profiles in the Yamanishi dataset.

[Fig. 5 panels: frequency histograms for the Jaccard index, cosine similarity, LHN, and GIP]

3.3 Computational Complexity

The computational complexity for constructing the prediction model for the PKM is $O(n_c^3 n_p^3)$. By comparison, the computational complexity for calculating the link indicators used in this study was $O(n_c n_p (n_c + n_p))$. Thus, the computational complexity of our proposed method was $O(n_c^3 n_p^3 + n_c n_p (n_c + n_p))$. Here, $n_c$ and $n_p$ were greater than 1 in general, and thus $n_c^3 n_p^3$ was greater than $n_c n_p (n_c + n_p)$.
Therefore, the computational complexity was $O(n_c^3 n_p^3)$, which was the same as those of PKM and GIP. Our method could predict CPIs without a major increase in calculation time. The execution time of one run out of 10 runs of 10-fold CVs is shown in Table 1. The results in the table are for computations run on an ordinary personal computer with an Intel Core i5 CPU. Our proposed method showed a slight increase in execution time of several percent compared with that of PKM (“Nuclear Receptor” had the highest rate of increase because of the small dataset).

3.4 Limitations and Challenges

The proposed method can be directly applied to prediction methods based on the GIP (e.g., WNNGIP [6], KBMF2K [17], and KronRLS-MKL [18]), and improvement of prediction accuracy is expected. For example, WNNGIP can provide robust predictions for compounds and proteins with less interaction information by complementing the interaction matrix with the weighted nearest neighbor method in advance. However, kernel-based methods, including the proposed method, are restricted to the framework using kernel functions. For example, it is not possible to simply combine LIK or GIP with methods based on matrix factorization (e.g., NRLMF [19]). Further mathematical ideas and computational experiments are needed to develop integrated methods.

4 Conclusions

In this study, we proposed a kernel method using link indicators from the viewpoint of link mining to utilize the information of the CPI network for machine learning. We attempted to utilize three link indicators (Jaccard index, cosine similarity, and LHN) for construction of positive definite kernels and compared them with the GIP method when combined with the SVM-based PKM method.
As a result, learning by multiple kernels using LIK with cosine similarity and setting the kernel weight $w_k$ to 0.5 showed the best prediction accuracy (averaged AUPR = 0.562). In both AUROC and AUPR, LIK was confirmed to improve accuracy compared with GIP.

Table 1. Comparison of calculation times for the PKM and proposed method in each dataset. The time taken to calculate one time out of 10 calculations of 10-fold CVs is shown.

                                 Nuclear Receptor   GPCR   Ion Channel   Enzyme
Conventional (PKM) [sec]         0.0680             4.86   24.1          232
Proposed (PKM plus LIK) [sec]    0.0850             5.17   24.8          239
Increase rate (%)                25                 6.4    2.9           3.3

Acknowledgments. This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (grant numbers 24240044 and 15K16081), and Core Research for Evolutional Science and Technology (CREST) “Extreme Big Data” (grant number JPMJCR1303) from the Japan Science and Technology Agency (JST).

References

1. Lavecchia, A.: Machine-learning approaches in drug discovery: methods and applications. Drug Discov. Today. 20, 318–331 (2015).
2. Drwal, M.N., Griffith, R.: Combination of ligand- and structure-based methods in virtual screening. Drug Discov. Today Technol. 10, e395–e401 (2013).
3. Ding, H., Takigawa, I., Mamitsuka, H., Zhu, S.: Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Brief. Bioinform. 15, 734–747 (2014).
4. Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W., Kanehisa, M.: Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 24, i232–i240 (2008).
5. van Laarhoven, T., Nabuurs, S.B., Marchiori, E.: Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics. 27, 3036–3043 (2011).
6. van Laarhoven, T., Marchiori, E.: Predicting drug-target interactions for new drug compounds using a weighted nearest neighbor profile. PLoS ONE. 8, e66952 (2013).
7. Jacob, L., Vert, J.P.: Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 24, 2149–2156 (2008).
8. Daminelli, S., Thomas, J.M., Durán, C., Cannistraci, C.V.: Common neighbours and the local-community-paradigm for topological link prediction in bipartite networks. New J. Phys. 17, 113037 (2015).
9. Durán, C., Daminelli, S., Thomas, J.M., Haupt, V.J., Schroeder, M., Cannistraci, C.V.: Pioneering topological methods for network-based drug-target prediction by exploiting a brain-network self-organization theory. Brief. Bioinform. (2017) [Epub ahead of print].
10. Rogers, D., Hahn, M.: Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
11. Hattori, M., Okuno, Y., Goto, S., Kanehisa, M.: Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc. 125, 11853–11865 (2003).
12. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).
13. Bouchard, M., Jousselme, A.-L., Doré, P.-E.: A proof for the positive definiteness of the Jaccard index matrix. Int. J. Approx. Reason. 54, 615–626 (2013).
14. scikit-learn: machine learning in Python, http://scikit-learn.org/stable/ (accessed 27 March 2017).
15. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011).
16. NetworkX: High-productivity software for complex networks, https://networkx.github.io/ (accessed 27 March 2017).
17. Gönen, M.: Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics. 28, 2304–2310 (2012).
18. Nascimento, A.C.A., Prudêncio, R.B.C., Costa, I.G.: A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics. 17, 46 (2016).
19. Liu, Y., Wu, M., Miao, C., Zhao, P., Li, X.-L.: Neighborhood regularized logistic matrix factorization for drug-target interaction prediction. PLoS Comput. Biol. 12, e1004760 (2016).