CrediBench: Building Web-Scale Network Datasets for Information Integrity
Paper • 2509.23340 • Published
| domain (string, length 4 to 36) | scores (float32, 0.05 to 0.96) |
|---|---|
| ae.aratech | 0.364571 |
| ae.chartford | 0.581489 |
| ae.easylease | 0.618067 |
| ae.eurotravel | 0.410364 |
| ae.infinitycare | 0.334609 |
| ae.totalpropertycare | 0.621175 |
| ae.translatedubai | 0.58595 |
| ae.trizac | 0.712081 |
| aero.ecube | 0.553948 |
| af.barg | 0.743987 |
| ai.mistral | 0.668673 |
| al.mcn | 0.713027 |
| am.arabkirjmc | 0.556351 |
| art.hotm | 0.538518 |
| asia.iasas | 0.426042 |
| asia.persian | 0.544783 |
| asia.shorturl | 0.618023 |
| asia.webguru | 0.536485 |
| at.bestattung-eckl | 0.543191 |
| at.bgmedia | 0.631725 |
| at.ewk-zell | 0.538971 |
| at.fotostix | 0.571567 |
| at.intma | 0.562272 |
| at.kutschergwoelb | 0.294618 |
| at.seeya | 0.623948 |
| at.visionline | 0.338984 |
| audio.starsongspk | 0.676526 |
| az.sam | 0.542328 |
| az.teleradio | 0.554272 |
| ba.human | 0.69387 |
| ba.smiljiccompany | 0.391096 |
| ba.travnicki | 0.642656 |
| be.almanach | 0.619887 |
| be.brandactivators | 0.406647 |
| be.brugge | 0.304604 |
| be.bxlblog | 0.590451 |
| be.culeau | 0.492005 |
| be.decrolyschool | 0.410023 |
| be.koo | 0.347498 |
| be.mantagraphic | 0.586069 |
| be.moskenes | 0.415647 |
| be.popcom | 0.279576 |
| be.roulet | 0.667581 |
| be.sante-solidarite | 0.443317 |
| be.vil | 0.589599 |
| be.werelddorpenvoorkinderen | 0.660125 |
| be.wimbou | 0.610059 |
| bg.balkanstudies | 0.568295 |
| biz.bursaasia | 0.472904 |
| biz.newyorkjewelers | 0.384718 |
| biz.pcgamesinsider | 0.560295 |
| biz.stop-wise | 0.671527 |
| biz.ur0 | 0.441452 |
| bo.hab | 0.63471 |
| br.com.preventsenior | 0.579935 |
| br.ufrn | 0.55856 |
| by.belgastechnika | 0.657207 |
| by.bsac | 0.690414 |
| by.impuls-flora | 0.63494 |
| by.mebelpro | 0.650554 |
| by.moto-velo | 0.571941 |
| by.moydom | 0.658479 |
| by.promsegment | 0.586444 |
| by.standartcsk | 0.410872 |
| bz.lnk | 0.653893 |
| bz.onl | 0.634232 |
| ca.123people | 0.642014 |
| ca.411directoryassistance | 0.742585 |
| ca.academy | 0.625304 |
| ca.akufen | 0.549201 |
| ca.alhaadi | 0.251663 |
| ca.archivecdbooks | 0.588784 |
| ca.atelierrestaurant | 0.530516 |
| ca.atlanticcharter | 0.530935 |
| ca.axa | 0.56935 |
| ca.bancgroup | 0.394931 |
| ca.beaubois | 0.603868 |
| ca.betteroffduds | 0.480654 |
| ca.boatdealers | 0.491545 |
| ca.bohc | 0.644731 |
| ca.caep | 0.647944 |
| ca.cafepress | 0.641092 |
| ca.cags-accg | 0.670093 |
| ca.canadianglycomics | 0.619528 |
| ca.canlit | 0.654212 |
| ca.catskiing | 0.565359 |
| ca.chauffagethermopompeclimatisation | 0.616632 |
| ca.cielvariable | 0.509558 |
| ca.circulars | 0.41313 |
| ca.clearconceptinc | 0.604014 |
| ca.crkn | 0.579622 |
| ca.cwc | 0.638451 |
| ca.digital-copyright | 0.650606 |
| ca.dundasdental | 0.601693 |
| ca.eagleeyeconcrete | 0.763872 |
| ca.equinecanada | 0.630074 |
| ca.espacepourlavie | 0.619385 |
| ca.exclaim | 0.657273 |
| ca.family-medicine | 0.669614 |
| ca.fnuniv | 0.568327 |
CrediPred is the set of inferred credibility scores output by our trained GNN-based models. The scores follow the same monthly time granularity as the webgraphs used to train these models, and they are available for all nodes in the corresponding month's webgraph. For more information about our webgraphs, refer to CrediGraph on GitHub.
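As a rough illustration of how the scores might be queried, the sketch below looks up a hostname in a small in-memory sample copied from the preview rows above. It assumes that the `domain` column stores hostnames with their dot-separated labels reversed (for example `ca.axa` for `axa.ca`), which is inferred from the preview rows and not stated in this card. The names `SAMPLE_SCORES`, `to_reversed_key`, and `lookup_score` are hypothetical helpers introduced here, not part of any released tooling.

```python
from typing import Mapping, Optional

# A few (domain, score) pairs copied from the preview rows above.
SAMPLE_SCORES = {
    "ai.mistral": 0.668673,
    "ca.axa": 0.56935,
    "br.com.preventsenior": 0.579935,
    "be.popcom": 0.279576,
}


def to_reversed_key(hostname: str) -> str:
    """Reverse the dot-separated labels of a hostname.

    Assumption: the dataset keys domains in reversed-label order,
    e.g. "preventsenior.com.br" -> "br.com.preventsenior".
    """
    labels = hostname.strip().lower().rstrip(".").split(".")
    return ".".join(reversed(labels))


def lookup_score(hostname: str, scores: Mapping[str, float]) -> Optional[float]:
    """Return the score for a hostname, or None if it is not in the mapping."""
    return scores.get(to_reversed_key(hostname))


if __name__ == "__main__":
    for host in ("mistral.ai", "preventsenior.com.br", "example.org"):
        print(host, lookup_score(host, SAMPLE_SCORES))
```

Returning `None` for unknown hostnames keeps "no score available for this month's webgraph" distinct from a low score.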